Nate Silver (of 538 fame) tweeted an interesting problem today. Somebody on Reddit had averaged the birthdays of all the presidents and found it to be July 4th (link). Nate responded that it was wrong and said the real average is sometime in late November. I thought it was an interesting problem and figured I’d work on my Python skills, so I decided to see for myself and threw in a couple of alterations as well.
The problem with the first guy’s approach is that he was treating the calendar as a flat line rather than the circle that it is. If you take enough dates and average them that way, you’ll of course end up with an average near the middle of the calendar. You can see the problem with a simple example pointed out by somebody else on Reddit. Suppose one person was born around January 15 and another around December 15. Averaged on a flat line, the result would be the middle of the year: July 1st or 2nd. But if you just think about it, that’s obviously wrong; those two should average to about New Year’s Day, January 1st. So you have to treat the calendar as a circle rather than a line. Another way to check this: what if we changed the day that we call the first of the year to June 22nd or something weird? The Reddit answer would change but the Nate answer would remain the same. In my opinion, if your numbers change because of something as arbitrary as what we call the first of the year, it isn’t a valid method.
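To make the two-birthday example concrete, here is a minimal sketch. The helper names are mine, and I nudged the second birthday to December 16 so the search has a single best day instead of a tie. It compares the flat-line average with a circular one, then re-labels the calendar so June 22 becomes day 1 to see which answer moves.

```python
YEAR = 365  # ignoring leap days

def circular_distance(a, b):
    """Shortest distance in days between two days of the year on a 365-day circle."""
    diff = abs(a - b) % YEAR
    return min(diff, YEAR - diff)

def flat_average(days):
    """Treat the calendar as a line: plain arithmetic mean."""
    return sum(days) / len(days)

def circular_average(days):
    """Day (1..365) with the smallest sum of squared circular distances."""
    return min(range(1, YEAR + 1),
               key=lambda j: sum(circular_distance(j, d) ** 2 for d in days))

def shift(day, offset):
    """Re-label a day as if the year started `offset` days later (negative undoes it)."""
    return (day - 1 - offset) % YEAR + 1

birthdays = [15, 350]  # January 15 and December 16
print(flat_average(birthdays))      # 182.5, i.e. July 1st or 2nd
print(circular_average(birthdays))  # 365, i.e. December 31, right next to New Year's Day

# Now pretend the year starts on June 22 (day 173), so every label moves by 172.
offset = 172
moved = [shift(d, offset) for d in birthdays]
print(shift(flat_average(moved), -offset))      # 365.0: the flat answer changed
print(shift(circular_average(moved), -offset))  # 365: the circular answer did not
```

The flat average jumps from the middle of summer to the end of December just because the labels changed, while the circular search lands on the same day both times.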
So how did I calculate this? With Python, of course! I made a spreadsheet with the presidents’ names, birthdays, day of the year born, terms served, and days in office. Note that Grover Cleveland only counts once and that I ignore leap days.
President | Name | Days in Office | Terms | Birthday | Birth day-of-year |
1 | George Washington | 2865 | 2 | 2/22/1732 | 53 |
2 | John Adams | 1460 | 1 | 10/30/1735 | 303 |
3 | Thomas Jefferson | 2922 | 2 | 4/13/1743 | 103 |
4 | James Madison | 2922 | 2 | 3/16/1751 | 75 |
5 | James Monroe | 2922 | 2 | 4/28/1758 | 118 |
6 | John Quincy Adams | 1461 | 1 | 7/11/1767 | 192 |
7 | Andrew Jackson | 2922 | 2 | 3/15/1767 | 74 |
8 | Martin Van Buren | 1461 | 1 | 12/5/1782 | 339 |
9 | William Henry Harrison | 31 | 1 | 2/9/1773 | 40 |
10 | John Tyler | 1430 | 1 | 3/29/1790 | 88 |
11 | James K. Polk | 1461 | 1 | 11/2/1795 | 306 |
12 | Zachary Taylor | 492 | 1 | 11/24/1784 | 328 |
13 | Millard Fillmore | 969 | 1 | 1/7/1800 | 7 |
14 | Franklin Pierce | 1461 | 1 | 11/23/1804 | 327 |
15 | James Buchanan | 1461 | 1 | 4/23/1791 | 113 |
16 | Abraham Lincoln | 1503 | 2 | 2/12/1809 | 43 |
17 | Andrew Johnson | 1419 | 1 | 12/29/1808 | 364 |
18 | Ulysses S. Grant | 2922 | 2 | 4/27/1822 | 117 |
19 | Rutherford B. Hayes | 1461 | 1 | 10/4/1822 | 277 |
20 | James A. Garfield | 199 | 1 | 11/19/1831 | 323 |
21 | Chester A. Arthur | 1262 | 1 | 10/5/1829 | 278 |
22 | Grover Cleveland | 2922 | 4 | 3/18/1837 | 77 |
23 | Benjamin Harrison | 1461 | 1 | 8/20/1833 | 232 |
25 | William McKinley | 1654 | 2 | 1/29/1856 | 29 |
26 | Theodore Roosevelt | 2728 | 2 | 10/27/1858 | 300 |
27 | William Howard Taft | 1461 | 1 | 9/15/1843 | 258 |
28 | Woodrow Wilson | 2922 | 2 | 12/28/1856 | 362 |
29 | Warren G. Harding | 881 | 1 | 11/2/1857 | 306 |
30 | Calvin Coolidge | 2041 | 2 | 7/4/1872 | 185 |
31 | Herbert Hoover | 1461 | 1 | 8/10/1874 | 222 |
32 | Franklin D. Roosevelt | 4422 | 4 | 1/30/1882 | 30 |
33 | Harry S. Truman | 2840 | 2 | 5/8/1884 | 128 |
34 | Dwight D. Eisenhower | 2922 | 2 | 10/14/1890 | 287 |
35 | John F. Kennedy | 1036 | 1 | 5/29/1917 | 149 |
36 | Lyndon B. Johnson | 1886 | 1 | 8/27/1908 | 239 |
37 | Richard Nixon | 2027 | 2 | 1/9/1913 | 9 |
38 | Gerald Ford | 895 | 1 | 7/14/1913 | 194 |
39 | James Earl Carter | 1461 | 1 | 10/1/1924 | 274 |
40 | Ronald Reagan | 2922 | 2 | 2/6/1911 | 37 |
41 | George H. W. Bush | 1461 | 1 | 6/12/1924 | 163 |
42 | William Jefferson Clinton | 2922 | 2 | 8/19/1946 | 231 |
43 | George W. Bush | 2922 | 2 | 7/6/1946 | 187 |
44 | Barack Obama | 2922 | 2 | 8/4/1961 | 216 |
45 | Donald Trump | 394 | 1 | 6/14/1946 | 165 |
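The post doesn’t say how the day-of-year column was built, so treat this as one plausible way to reproduce it rather than the method actually used. The function names are my own, and the sketch always uses a non-leap year (2018) to match the “ignore leap days” rule. A spreadsheet that counted the leap day for birthdays after February in a leap year would come out one higher.

```python
import datetime

def day_of_year(month, day):
    """Day of the year (1..365) for a month and day, always in a non-leap year.

    Assumption: nobody in the data was born on February 29, which would
    raise a ValueError here.
    """
    return datetime.date(2018, month, day).timetuple().tm_yday

def day_of_year_from_text(text):
    """Same thing, starting from the table's 'M/D/YYYY' format (the year is ignored)."""
    month, day, _year = (int(part) for part in text.split('/'))
    return day_of_year(month, day)

print(day_of_year(2, 22))                  # 53, matching George Washington's row
print(day_of_year_from_text('7/4/1872'))   # 185, matching Calvin Coolidge's row
```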
Then I wrote a Python script to read in this CSV file and find the day of the year with the smallest total squared distance to the birthdays. Some notes. After you find the number of days between a candidate day and a birthday and make sure it’s positive, you have to subtract it from 365 if it’s more than 182.5 days, because going around the circle the other way is shorter. And once you have the distance from a day to a birthday, you have to square it before adding it to your sum. There are mathematical reasons that I don’t want to get into, but just trust me that it’s similar to finding a line of best fit for a graph.
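One way to read those “mathematical reasons” (my interpretation, not something the post spells out): on an ordinary number line, the point with the smallest sum of squared distances is the mean, while the point with the smallest sum of plain distances is the median. A tiny sketch with made-up numbers and a hypothetical helper called best_point:

```python
def best_point(values, cost):
    """Integer between min(values) and max(values) with the smallest total cost."""
    candidates = range(min(values), max(values) + 1)
    return min(candidates, key=lambda c: sum(cost(c - v) for v in values))

values = [1, 2, 9]  # made-up numbers, not birthdays

print(best_point(values, lambda diff: diff * diff))  # 4, the mean of 1, 2 and 9
print(best_point(values, abs))                       # 2, the median of 1, 2 and 9
```

So squaring is what makes the search behave like an average rather than a middle value, which is the same idea behind least-squares line fitting.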
Here is the CSV file for the presidents and here is the Python file, since WordPress seems intent on screwing up my code.
import csv
import datetime

day = []      # day of the year for each birthday
term = []     # number of terms in office
numDays = []  # number of days in office

# Here's the magic that opens the csv file
# (newline='' is what the csv module recommends; the old 'rU' mode is gone in newer Pythons)
with open('bdays.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    # Skips the header line
    next(reader, None)
    # Reads in a row at a time and, if it's not empty,
    # appends that data to the lists
    for row in reader:
        if row and row[0] != '':
            day.append(int(row[5]))      # day of the year
            term.append(int(row[3]))     # number of terms
            numDays.append(int(row[2]))  # days in office


def average_day(days, weights):
    # Go through every day of the year. For each day, add up the squared
    # distance to every presidential birthday (times its weight) and use
    # the day with the smallest total as the average.
    # Remember that the calendar is a circle, so a distance over 182 days
    # is really 365 minus that distance going the other way.
    avgDay = None
    minSum = None
    for j in range(1, 366):
        total = 0
        for d, w in zip(days, weights):
            val = abs(j - d)
            if val > 182:
                val = 365 - val
            total += val * val * w
        # If it's the first day or the total beats the best so far, keep it
        if minSum is None or total < minSum:
            minSum = total
            avgDay = j
        # Prints day, total, and best average so far for debugging
        # print('j: ' + str(j) + ', sum: ' + str(total) + ' avg: ' + str(avgDay))
    return avgDay


def to_date(dayOfYear):
    # Convert the day of the year to the actual date (2018 is not a leap year)
    start = datetime.datetime(2018, 1, 1)
    return (start + datetime.timedelta(dayOfYear - 1)).strftime('%B %d')


# Calculating just the average birthday (every president gets a weight of 1)
avgDay = average_day(day, [1] * len(day))
print('Average presidential birthday is the ' + str(avgDay) + 'th day of the year: ' + to_date(avgDay))

# Calculating average birthday weighted by number of terms
avgDay = average_day(day, term)
print('Average presidential birthday weighted by term is the ' + str(avgDay) + 'th day of the year: ' + to_date(avgDay))

# Calculating average birthday weighted by number of days in office
avgDay = average_day(day, numDays)
print('Average presidential birthday weighted by days in office is the ' + str(avgDay) + 'th day of the year: ' + to_date(avgDay))
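As a cross-check on the brute-force search, here is a sketch of a different technique: put each birthday on a unit circle, average the points, and read the angle back as a day. This is the usual “mean of angles” idea, not what the script above does, and it is a slightly different definition of average, so it need not land on exactly the same day. The function name vector_mean_day is mine.

```python
import math

YEAR = 365  # ignoring leap days

def vector_mean_day(days, weights=None):
    """Circular mean of days of the year via averaged unit vectors.

    Each day becomes an angle on a circle; the (optionally weighted) sum of
    the unit vectors points at the mean direction. If the vectors cancel out
    almost exactly, the result is not meaningful.
    """
    if weights is None:
        weights = [1] * len(days)
    angles = [2 * math.pi * d / YEAR for d in days]
    x = sum(w * math.cos(a) for a, w in zip(angles, weights))
    y = sum(w * math.sin(a) for a, w in zip(angles, weights))
    day = round(math.atan2(y, x) / (2 * math.pi) * YEAR) % YEAR
    return day or YEAR  # report day 0 as day 365

print(vector_mean_day([15, 350]))  # 365: January 15 and December 16 meet at year's end
```

Passing the terms or days-in-office column as weights gives the weighted versions.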
Run the code and you get November 22nd as the average birthday, where average is defined as the date with the smallest total squared distance to the actual presidential birthdays. You also get March 6th when weighting by number of terms served and March 10th when weighting by number of days in office. And of course, you could find the average birthday including years, which would be March 16, 1840.
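Once the years are included, the dates sit on a line again, so a plain average is fine. The post doesn’t show how that last figure was computed; the sketch below is just one plausible way, using hypothetical helper names and only the first three rows of the table as sample input (so it won’t reproduce the full-table result).

```python
import datetime

def parse_birthday(text):
    """Turn an 'M/D/YYYY' string (the table's format) into a date."""
    month, day, year = (int(part) for part in text.split('/'))
    return datetime.date(year, month, day)

def average_date(dates):
    """Plain arithmetic mean of full dates, computed on their ordinal day numbers."""
    ordinals = [d.toordinal() for d in dates]
    return datetime.date.fromordinal(round(sum(ordinals) / len(ordinals)))

# First three rows of the table, as a small sample
sample = ['2/22/1732', '10/30/1735', '4/13/1743']
print(average_date([parse_birthday(s) for s in sample]).isoformat())
```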
Questions? Comments?
JFK’s birth year in your table is I think wrong, unless you know something we don’t… 🙂