How early should you get to the airport?

Nate Silver has put together a spreadsheet with a comprehensive answer to this question, based on records from 800 flights he personally has taken. You have to be a paid subscriber to his site to get the spreadsheet, but there are plenty of clues in the narrative. It is not crystal clear which factors are additive rather than overlapping.

  • First, the “base case” is a solo English-speaking American business traveler. Families and people who don’t speak perfect English are inconveniences that can be treated in a stochastic manner. More specifically, this base case is a solo domestic (U.S.) traveler, with TSA PreCheck or CLEAR, and not checking bags. For this base case, the rule is “60 minutes from walking through the airport door to departure.”
  • For a car commute, round up whatever Uber/Lyft says the trip will take by 30%. [For public transportation, my rule is to take the vehicle before the last vehicle that would get me there just barely on time.]
  • If parking or returning a rental car, add 15-30 minutes.
  • Add 15 minutes for a really big, busy airport (like JFK, O’Hare). 5-10 if you use that airport a lot and know it inside and out.
  • Add 5-10 minutes if you have a connection to make. His reasoning: “This might seem silly since it doesn’t affect the departure time at the originating airport. But it raises the stakes for missing your flight. Also, if you arrive at the very last moment, you’ll likely be asked to gate-check your bag, which can get you off to a slower start when making that tight connection.”
  • Add 20 minutes if you do not have PreCheck/CLEAR, +5 for really big busy airports and -5 for small ones.
  • Add another 20 minutes in bad weather.
  • For international, add 20-40 minutes if you don’t need to check in at the counter and another 15 (business class) to 30 (economy class) if you do.
  • Add 20 minutes if you just enjoy relaxing at the airport with a beverage before flying.
  • Special case: If you’re going somewhere (like Canada) that you need to clear immigration before getting on the plane, you need to allow an extra 30 minutes. Presumably it will save you the same amount at the other end (although in my personal experience, U.S. immigration is about as bad as it gets anywhere I have been.)
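
The rules above can be sketched as a small Python calculator. This is only one reading of them: it assumes every factor is simply additive (which, as noted, is not crystal clear), and the factor names are made up for the sketch rather than taken from the spreadsheet. Ranges are carried through as a (low, high) pair.

```python
import math

BASE_MINUTES = 60  # solo domestic traveler, PreCheck/CLEAR, no checked bags

# (low, high) additions in minutes, transcribed from the bullets above.
# The key names are hypothetical, chosen only for this sketch.
ADJUSTMENTS = {
    "parking_or_rental_return": (15, 30),
    "big_busy_airport": (15, 15),
    "big_busy_airport_known_well": (5, 10),
    "connection": (5, 10),
    "no_precheck": (20, 20),
    "no_precheck_big_airport": (25, 25),     # 20 + 5
    "no_precheck_small_airport": (15, 15),   # 20 - 5
    "bad_weather": (20, 20),
    "international": (20, 40),
    "counter_check_in": (15, 30),            # on top of "international"
    "beverage": (20, 20),
    "immigration_before_boarding": (30, 30),
}


def airport_window(factors):
    """Return (low, high) minutes from the airport door to departure.

    Assumes the listed factors are purely additive.
    """
    low = BASE_MINUTES + sum(ADJUSTMENTS[f][0] for f in factors)
    high = BASE_MINUTES + sum(ADJUSTMENTS[f][1] for f in factors)
    return low, high


def padded_drive(app_estimate_minutes):
    """Pad a ride-hailing trip estimate by 30%, rounded up to a whole minute."""
    return math.ceil(app_estimate_minutes * 13 / 10)


if __name__ == "__main__":
    print(airport_window([]))                              # the base case
    print(airport_window(["no_precheck", "bad_weather"]))
    print(padded_drive(40))
```

Pick one entry per situation (for example, either "no_precheck" or "no_precheck_big_airport", not both), since the sketch does no checking for overlapping factors.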

Okay, let’s try adding this up for my most common travel cases.

Case 1: A domestic business trip, let’s say I’m attending a 2-3 day conference. I have one bag that fits in the overhead compartment or can be gate-checked. I check in online. Weather is reasonable. I’m going to allow 60 minutes at the airport, +10 because my home airport of Philadelphia is pretty big and pretty busy but I know it well, +25 because I don’t have PreCheck and security can be a real cluster-, +20 because I enjoy sitting down and having a beer before flying. That’s 1 hour 55 minutes, so the “2 hour rule” was just about right. I could get PreCheck if I really traveled enough to make it worthwhile, and obviously I don’t need the beer, I just want it.

Case 2: An international trip with the family. I am past the stroller/car seat/diaper phase which would add exponential complication – not part of Nate Silver’s computational framework. Let’s say I am traveling in reasonable weather from the nightmare hell (but relatively easy to get to) hub of Newark. 60 minutes + 15 because it’s a nightmare hell hub + 10 minutes because there’s a connection + 20 minutes for security + 60 for counter check in (! – but yes, it can be this bad). I’ll skip that beer because I don’t want to get even more dehydrated on a long haul flight. I get 2 hours 45 minutes, so the “3 hour rule” is not far off.

So in conclusion, for me the 2 hour rule can maybe be shaved to 1.5 and the 3 hour rule to 2.5 if I want to live adventurously. I try to get to the airport by public transportation when I can though, so that adds another layer of likely delays. My rule there is to take the vehicle before the vehicle that would get me there just in time. Sometimes you just have to try to relax and make the most of wherever you happen to be, and not worry so much about the time. If I spent another half hour at home or the office before I left for the airport, what would I really do – either interact with other people or do something on a screen. Traveling is stressful, and it can be good to take a moment between the mad dash to the airport and security line, and the various inconveniences and indignities of actually flying. At the airport, I am more likely to read a book, have a beverage, or unwind a bit before flying if things aren’t too crazy.

the ggplot2 “ecosystem”

In the beginning there was R. Or, S? I’ve heard that R actually rests on a foundation of C and Fortran. Anyway, then there was the tidyverse, sort of another whole programming language that rests in R (or a metastasizing cancer that has grown to dominate R, if you ask certain people, but I personally am a big fan). Now within the tidyverse was always ggplot2, which I have grown to rely on almost exclusively for plotting. Now ggplot2 itself has grown into an “ecosystem” of related programs and extensions. Here is a useful guide. I’ve always been interested in finding the really good ones for things like interactive charts (plotly) and animations (gganimate). And awesome as ggplot2 is, there are some things that are just clunky, like scales and legends (seriously, legends are a big pain point for me – I hope there is an extension out there that really streamlines legends). But I am also wary of using extensions that might be buggy or not updated/supported long term, which could make my code obsolete sooner. So I usually try to do things with ggplot2 proper first, and if that doesn’t work with a reasonable effort I will try one of the extensions. So this guide seems timely and useful.

April election poll check-in, or “it’s just the fading price shocks in gas and groceries, stupid”

Here’s where we stand as I write this on April 3, 2024. Sure, there are all sorts of reasons the polls might be wrong and it is a long time until election day…but I would rather be ahead in the polls and saying that than behind, wouldn’t you? Or even behind and getting less behind.

STATE           | 2020 RESULT  | Most Recent Real Clear Politics Poll Average (as of 4/3/24)
Arizona         | Biden +0.4%  | Trump +5.2% (March 1: Trump +5.5%)
Georgia         | Biden +0.3%  | Trump +4.5% (March 1: Trump +6.5%)
Wisconsin       | Biden +0.6%  | Trump +0.6% (March 1: Trump +1.0%)
North Carolina  | Trump +1.3%  | Trump +4.6% (March 1: Trump +5.7%)
Pennsylvania    | Biden +1.2%  | Trump +0.6% (March 1: Biden +0.8%)
Michigan        | Biden +2.8%  | Trump +3.4% (March 1: Trump +3.6%)
Nevada          | Biden +2.4%  | Trump +3.2% (March 1: Trump +7.7%)

The electoral college vote, as it stands at the moment, would be 312 for Trump to 226 for Biden. (March 1: 293 for Trump to 245 for Biden)

So the verdict is…Biden behind but getting less behind in every swing state (6 out of 7) except Pennsylvania. The Nevada, Georgia, and North Carolina moves are all more than 1 point towards Biden. Arizona, Wisconsin, and Michigan are less than 1 point towards Biden. The Pennsylvania move is 1.4 points towards Trump, and because this flips the state from slight Biden to slight Trump, Trump now leads all swing states and the electoral college looks even worse for Biden than a month ago.
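
For anyone who wants to check the month-over-month moves and the electoral college totals, here is a minimal Python sketch. The poll margins are transcribed from the table above, stored in tenths of a point to avoid floating-point noise. The electoral-vote counts per state and the totals for the other 43 states plus D.C. are not in this post; they are assumptions of the sketch, based on the 2024 apportionment and the 2020 results.

```python
# Trump's lead in tenths of a percentage point (negative = Biden lead),
# stored as (March 1 margin, April 3 margin, electoral votes).
# Margins come from the table above; the electoral-vote counts are an
# assumption of this sketch (2024 apportionment), not from the post.
POLLS = {
    "Arizona": (55, 52, 11),
    "Georgia": (65, 45, 16),
    "Wisconsin": (10, 6, 10),
    "North Carolina": (57, 46, 16),
    "Pennsylvania": (-8, 6, 19),
    "Michigan": (36, 34, 15),
    "Nevada": (77, 32, 6),
}

# Also an assumption: electoral votes outside these seven states,
# allocated by the 2020 winner.
SAFE_TRUMP, SAFE_BIDEN = 219, 226


def shifts(polls):
    """Change in Trump's lead, in tenths of a point (negative = toward Biden)."""
    return {state: april - march for state, (march, april, _ev) in polls.items()}


def electoral_tally(polls, column):
    """Electoral votes (Trump, Biden) using margin column 0 (March) or 1 (April)."""
    trump, biden = SAFE_TRUMP, SAFE_BIDEN
    for row in polls.values():
        margin, votes = row[column], row[2]
        # An exact tie would need its own handling; none occur in this data.
        if margin > 0:
            trump += votes
        else:
            biden += votes
    return trump, biden


if __name__ == "__main__":
    for state, change in shifts(POLLS).items():
        direction = "Biden" if change < 0 else "Trump"
        print(f"{state}: {abs(change) / 10:.1f} points toward {direction}")
    print("March 1 tally (Trump, Biden):", electoral_tally(POLLS, 0))
    print("April 3 tally (Trump, Biden):", electoral_tally(POLLS, 1))
```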

Have we gone from “it’s the economy, stupid” to “it’s the rate of change in the rate of change in the price of groceries, compared to the rate of change of the rate of change in the price of groceries two years ago, stupid”? Maybe it’s that simple. Sure, there is plenty going on in the world in terms of war and peace and the collapsing biosphere that supports all life. But we are Americans, and we don’t base our votes on these things. At least not enough of us, enough of the time to make a difference compared to the damn price of groceries. All things being equal, I would wager on this trend continuing over the next seven months. Of course, all things will probably not be equal – a significant recession that throws a significant number of voters out of work would be the worst possible thing for Biden. Because it doesn’t matter so much how much the damn groceries cost if you have no money at all. On the other hand, most other crises might tend to give Biden a chance to show some leadership, which at least some voters might like. And of course, Biden and/or Trump could drop dead at any time. I am not predicting any of these things, just defining a range of things that could happen.

weather forecasting

This is interesting. It is not 100% clear to me what the measure of accuracy is below, but the plot shows how much weather forecasting has improved over the last 50 years or so. A 3-5 day forecast is highly accurate now, and the 3-day and 5-day forecasts are not that different. It’s interesting to me that there is such a large drop-off in accuracy between a 7 and 10 day forecast – that is not necessarily intuitive, but useful even in everyday life. A 10-day forecast is basically a coin flip, but check back 3 days later and you are closer to 80/20 odds. This is based on pressure measured at a certain height I think, so it doesn’t necessarily mean forecasts of precipitation depth and intensity, rain vs. snow vs. ice, thunder and lightning, tornadoes, etc. are going to be as accurate as this implies.

Our World in Data

There is some suggestion that AI (meaning purely statistical approaches, or AI choosing any blend of statistics and physics it wants?) might make forecasting much faster, cheaper, and easier yet again.

most popular R books of 2023

Here is something useful (to me, personally, and maybe to others), and thankfully not too pessimistic or morally fraught.

A Crash Course in Geographic Information Systems (GIS) using R – yes, please! We must end the tyranny of the monopolistic Environmental Systems “Research Institute”. Okay, they make some nice products, but just admit you are a rapacious for-profit corporation, please!

A ggplot2 Tutorial for Beautiful Plotting in R – Who doesn’t need to improve their data visualization and communication game?

just start your y-axis at zero

Seriously, just do that and it will work out most of the time. The only exception in my mind is if you are comparing the range or spread of two data sets and neither one is close to zero.

Snopes

I’ve been to Indonesia, and people there are normal human beings who are in fact somewhat shorter than Europeans on average. But their heads were typically around my shoulder height, not my knees. Some political violence has occurred there in the not-so-distant past, but I found the culture warm and hospitable. Like almost any country not at war, the biggest risk to your physical safety is probably being in a car accident or hit by a car. The next biggest if you are there for any length of time might be air pollution and second hand smoke. Once an Indonesian woman yelled at me to not sit next to her on a ferry. The ferry was crowded and there was nowhere else to sit, but I was eventually able to solve the problem by swapping seats with another woman (my gender being what made her uncomfortable apparently.) Other times I had groups of female Indonesian tourists stop me on the street and ask to take vacation pictures with me to show their friends back home. This was when I was quite a bit younger than I am now.

tile maps

Tile maps, which draw regions of unequal geographic area as tiles of equal size, are, somewhat obviously, appropriate when you don’t want the unequal geographic area to distort the message you are trying to communicate. An example might be if you want to show a variable by congressional districts, which have (roughly) equal populations but variable (spatial) areas.

A couple other ideas with tile maps are (1) to use rectangles of equal shape but different length/width ratios, and (2) to use words spatially arranged and with a variety of properties (font, size, color) to denote a variety of variables.

accuracy of a model vs. its “decisional quality”

I like the way the abstract of this paper distinguishes between (1) the accuracy of a model as measured by comparing it to physical observations (always assuming those are an accurate or at least unbiased measurement of the true state of the universe) and (2) the appropriateness of a model to be used in decision making. I find these concepts very, very difficult to get across even to scientists and engineers.

Ecological forecasting models: Accuracy versus decisional quality

We consider here forecasting models in ecology or in agronomy, aiming at decision making based upon exceeding a quantitative threshold. We address specifically how to link the intrinsic quality of the model (its accuracy) with its decisional quality, ie its capacity to avoid false decisions and their associated costs. The accuracy of the model can be evaluated by the [Greek symbol rho – I don’t know what they mean by this just from reading the abstract] of the regression of observed values versus estimated ones or by the determination coefficient. We show that the decisional quality depends not only of this accuracy but also of the threshold retained to make the decision as well as on the state of nature. The two kinds of decisional errors consists either in deciding no action while an action is required (false negatives) or to act while it is useless (false positives). We also prove that the costs associated to those decisions depend also both of the accuracy of the model and of the value of the decision threshold.

Ecological Modeling
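
To make the accuracy-versus-decision distinction concrete, here is a toy Monte Carlo sketch in Python. It is not the paper's method. It assumes the true value and the model estimate are both standard normal with correlation `rho` (which may or may not be what the abstract means by its Greek rho), that action is required when the truth exceeds `truth_threshold`, and that we act when the estimate exceeds `decision_threshold`. All names and cost figures are made up for illustration.

```python
import math
import random


def simulate(rho, truth_threshold, decision_threshold, n=20000, seed=0):
    """Return (false_negative_rate, false_positive_rate) for a toy model.

    Truth and estimate are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - rho * rho)
    false_neg = false_pos = 0
    for _ in range(n):
        truth = rng.gauss(0.0, 1.0)
        estimate = rho * truth + noise_scale * rng.gauss(0.0, 1.0)
        action_needed = truth > truth_threshold
        act = estimate > decision_threshold
        if action_needed and not act:
            false_neg += 1   # did nothing when action was required
        elif act and not action_needed:
            false_pos += 1   # acted when it was useless
    return false_neg / n, false_pos / n


def expected_cost(rates, cost_false_neg, cost_false_pos):
    """Average cost per decision, given error rates and per-error costs."""
    false_neg_rate, false_pos_rate = rates
    return cost_false_neg * false_neg_rate + cost_false_pos * false_pos_rate


if __name__ == "__main__":
    # Hypothetical costs: a missed action is ten times worse than a wasted one.
    for rho in (0.5, 0.9):
        for decision_threshold in (0.0, 1.0):
            rates = simulate(rho, 1.0, decision_threshold)
            cost = expected_cost(rates, cost_false_neg=10.0, cost_false_pos=1.0)
            print(rho, decision_threshold, rates, round(cost, 3))
```

Even in this toy setup, the same model accuracy produces a different mix of false negatives and false positives depending on where the decision threshold sits, so the cheapest threshold depends on the relative costs of the two errors as well as on accuracy.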

(slightly less) depressing stats on the U.S.: suicides

Here are some suicide stats from Our World in Data. It would be nice if they would add some more groupings like OECD, but I have chosen a somewhat arbitrary sample of peer countries. It surprised me that even though we are hearing about “deaths of despair”, the U.S. is not doing terribly on this metric compared to peers. We are doing a bit worse than our close cultural cousins Canada and Australia. The UK does surprisingly well on this metric, even a bit better than Germany and Denmark. Latin America (I picked Mexico because they’re our neighbor and Brazil because they’re big) doesn’t seem to have a big issue with suicide. The two Asian countries I picked do seem to have an issue – Japan has a higher suicide rate than all the European countries I picked. Then there is a big jump to the two worst countries (that I picked arbitrarily), South Korea and Russia. Russia is the worst, but has brought its rate down a lot if you buy into this data analysis.