## Tuesday, December 29, 2015

### Calendar Math

One of my favorite questions to ask my statistics students is "What is a leap year?"  Many of them get a general idea, but don't actually know.

A leap year is any non-century year divisible by 4, and any century year that is divisible by 400.

What is nice about determining whether a year is divisible by 4 or not, is that you only have to pay attention to the last two numbers.  For example, 2016 will be a leap year since 16 is divisible by 4.  However, 2100 will not be a leap year. Even though it is divisible by 4 it is a century year, and needs to be divisible by 400, which it is not.

This is the second time I've run across a problem involving calendar math this month. I first encountered one when solving Problem 19 on Project Euler.  It had me determine the number of times during the twentieth century that January 1st was a Sunday.  I won't spoil that one.

FiveThirtyEight posted their 4th riddle which asks a series of questions about calendars.  They are reprinted below for convenience.
1. How many different calendars would you need to represent all possible years — accounting for all day and date combinations? (Don’t forget about leap years!)
2. Now that we have all the calendars we could possibly need, it’d be nice to know how often we’re using them. When is the next time we’ll use the 2015 calendar?
3. What is the smallest total number of years that will pass between using the same non-leap-year calendar twice?
4. What is the largest?
5. What is the smallest total number of years that will pass between using a leap year calendar twice?
6. What is the largest?
Every calendar can be completely defined by the day of the week on which the first day (Jan 1) and the last day (Dec 31) falls.  Using Su, M, T, W, R, F, and Sa for the days of the week, we can represent our 2015 calendar as (R,R) since the first and last day of the 2015 calendar were both Thursdays.

The 2016 calendar will be (F,Sa) because the extra day during this leap year will shift the last day forward one.  The 2017 calendar will then be (Su,Su).  There are 7 calendars in which both days are the same, and you can quickly see that there are also 7 calendars in which the the first day is one day of the week, while the last day is the next day of the week.  This gives us a total of 14 possible calendars.  It doesn't take much analysis to notice that we do indeed use every one of those 14 possible calendars.

If we follow the pattern from 2017 for a while, we find that 2023 will be another (Su,Su) calendar, suggesting the potential answer to #3 being 2023-2017=6.

With 2023=(Su,Su), 2024=(M,T), and 2025=(W,W), we find that 2026=(R,R) will be the next time we use our 2015 calendar, a difference of 11 years, which gives our answer to #2 and suggests a potential answer to #4.

When we explore leap years for a while, we see 2016=(F,Sa), 2020=(W,R), 2024=(M,T), 2028=(Sa,Su), 2032=(R,F), 2036=(T,W), 2040=(Su,M), which brings us back to the 2016 calendar in 2044, a difference in 28 years.

If one were not careful, and forgot about skipping a leap year 3 out of every 4 centuries, we may think that the answer to #5 and #6 were the same number 28, being both the smallest and largest number of years that will pass between using a leap year calendar twice.

So, let's move up closer to 2100.  We'll take the leap year 2032=(R,F) that I analyzed above and add 28 years to it twice to get us to 2088=(R,F).  Let's observe the next several years beyond 2088.

 2088 (R,F) 2092 (T,W) 2096 (Su,M) 2100 (F,F) 2104 (T,W) 2089 (Sa,Sa) 2093 (R,R) 2097 (T,T) 2101 (Sa,Sa) 2105 (R,R) 2090 (Su,Su) 2094 (F,F) 2098 (W,W) 2102 (Su,Su) 2106 (F,F) 2091 (M,M) 2095 (Sa,Sa) 2099 (R,R) 2103 (M,M) 2107 (Sa,Sa) 2108 (Su,M)

The interesting thing that happens here is that we go 8 years between leap years from 2096 to 2104.  What does this do to the calendar?  It allows us to see a (T,W) leap year calendar in 12 years rather than the usual 28, and the (Su,M) calendar of 2096 again in 2108.

When will the next leap year calendar of (R,F) from 2088 be seen?  Usually, it would be 28 years later, or 24 years from the next (T,W) leap year calendar.  Because of the century year in between, we see that the (T,W) leap year calendar repeats on the other side of the century year, and will need another 24 years until a (R,F) calendar. So, a total of 40 years will pass before we see the leap year calendar of (R,F) again.

Also worth observing is the 2090 calendar (Su,Su). Why?  Because we don't see it again until 2102 which is 12 years later.  This increases the maximum number between non-leap year calendars I had previously guessed by 1.  So, it would appear that the answers to #4 and #5 are the same.

So, my answers for this riddle are as follows.
1. 14
2. 2026
3. 6
4. 12
5. 12
6. 40
As always, it is very possible that I've missed something. Please chime in if I have.

## Thursday, December 24, 2015

### Those Distracting Smartphones

The most recent Riddle from FiveThirtyEight has me stumped and feeling what it is like to be a frustrated graduate student again.  No matter how much I worked through different probabilities using different approaches, I couldn't find a pattern that would simplify the problem.  And what's worse, there is no professor I can go visit in office hours. I've even tried Tweeting the writers of the problem for a hint, but to no avail.

The Riddle is quoted below for convenience.
By using a random uniform number generator in R, I created two sequences of random integers from 1 to 5, call them X and Y.  To illustrate, suppose the first set of random sequences were given as X = {4,3,4,1,5,...} and Y={3,5,1,2,5,...}.

I would then ask R to compare the first two values of 4 and 3.  If they happened to match, I would ask for that matching number. If one was smaller than the other (as is the case with this example), I would ask R to take the smaller value and add to it the next number in the sequence.  In this case, we get 3+5=8.

Now, I ask R to compare 4 and 8.  No match?  Then 4+3=7 in the X sequence. Compare 7 and 8.  No match?  Then 7+4 = 11 still in the X sequence.  Compare 8 and 11.  Still no match, so now jump back over to the Y sequence and get 8+1=9.  Compare 9 and 11, and find no match, so 9+2 =11 in the Y sequence.  Finally! Since 11 and 11 match, I have R report that number.

Then, I asked R to do that 1,000 times and give me a mean of all the lengths it produced.  Again.  Again.  Again.

The pattern that results makes me want to guess that 9 is either the correct answer or very close to the correct answer.

Please, please, please, somebody chime in with the trick to solving this problem without simulation.  I'm going insane.

For the coding geeks out there, this is what I did:

## Friday, December 18, 2015

### Gushing Geysers

It's Friday!! Time for "Riddle" 2, which isn't much of a riddle, but that's OK.  It is titled, "Which Geyser Gushes First?"  The riddle is quoted below.
You arrive at the beautiful Three Geysers National Park. You read a placard explaining that the three eponymous geysers — creatively named A, B and C — erupt at intervals of precisely two hours, four hours and six hours, respectively. However, you just got there, so you have no idea how the three eruptions are staggered. Assuming they each started erupting at some independently random point in history, what are the probabilities that A, B and C, respectively, will be the first to erupt after your arrival?
Here is a quick explanation of how independence works.  If there is a 50% chance of rain today, and a 50% chance of rain tomorrow, and the events of rain today and/or tomorrow are independent, then the chance that it rains both today and tomorrow is (0.5)(0.5)=0.25, or a 25% chance.  In other words, independence allows us to multiply probabilities together.

So, to our problem.  We know for a fact that we will definitely see one of the three geysers erupt within 2 hours of our arrival.  So, let's focus on that 2 hour interval.

Case 1: All three geysers erupt within this 2 hour interval.  There is a 100% chance that geyser A will erupt in this interval, a 50% chance geyser B will, and a 33.3% chance that geyser C will (or 1/3 probability).  Thus, the probability that all three will erupt within 2 hours of arrival is 1(1/2)(1/3)=1/6.
Now, if all three are within that interval, they can erupt in the following six orders: ABC, ACB, BAC, BCA, CAB, CBA.  Each of these are equally probable (given the condition), so the probability that A is the first geyser you see is 1/3, and likewise for B and C.  So, the probability that all three geysers erupt in the first two hours AND any one of them go first is (1/6)(1/3)=1/18.

Case 2: Geysers A and B erupt in the first two hours, and C erupts after 2 hours.  There is a 100% chance A will erupt within the first two hours, a 50% chance B will, and a 66.7% chance (or 2/3 probability) that C will not.  Thus, the probability of of this case is 1(1/2)(2/3)=1/3.

Now, if geysers A and B erupt in the first two hours, they can erupt in order AB or BA with the equal conditional probability of 1/2.  So, the probability of Case 2 occurring AND A or B going first is (1/3)(1/2)=1/6.

Case 3: Geysers A and C erupt in the first two hours, and B erupts after 2 hours. There is a 100% chance A will erupt within the first two hours, a 33.3% chance (or 1/3 probability) that geyser C will, and a 50% chance that geyser B will erupt after 2 hours.  Thus, the probability of this case is 1(1/3)(1/2)=1/6.

Analogous to Case 2, we have a 1/2 conditional probability that either A or C goes first. So, the probability of Case 3 occurring AND A or C going first is (1/6)(1/2)=1/12.

Case 4: Geyser A is the only geyser to erupt in the first two hours.  There is a 100% chance that A will erupt within the first two hours, a 50% chance that B will erupt after 2 hours, and a 66.7% (or 2/3 probability) that geyser C will erupt after 2 hours.  Thus, the probability of this is 1(1/2)(2/3)=1/3.

Now, summing the cases that A goes first, we get (1/18)+(1/6)+(1/12)+(1/3)=23/36.  Summing the cases that B goes first, we get (1/18)+(1/6)=8/36.  Summing the cases that C goes first, we get (1/18)+(1/12)=5/36.

And you thought you understood probability.

## Saturday, December 12, 2015

### Smartphone Droppings

On December 11, 2015, FiveThirtyEight began The Riddler!  This is going to be fun.

This week's riddle is quoted below:
You work for a tech firm developing the newest smartphone that supposedly can survive falls from great heights. Your firm wants to advertise the maximum height from which the phone can be dropped without breaking.
You are given two of the smartphones and access to a 100-story tower from which you can drop either phone from whatever story you want. If it doesn’t break when it falls, you can retrieve it and use it for future drops. But if it breaks, you don’t get a replacement phone.
Using the two phones, what is the minimum number of drops you need to ensure that you can determine exactly the highest story from which a dropped phone does not break? (Assume you know that it breaks when dropped from the very top.) What if, instead, the tower were 1,000 stories high?
Think on it a little bit if you want to solve it yourself, but the rest of this post is devoted to explaining the evolution of a solution.

The wording of the problem is a little confusing, but I think I get the jist.  There are two adjacent floors of this tower the lower of which, when the smartphone is dropped it will not break, the higher of which, the smartphone will break.  Our job is to find these floors so that we can report the highest story from which a dropped phone does not break.

Obviously, we could get lucky and guess the correct floor right away, drop a phone that doesn't break, walk up a flight and drop a phone that does.  The minimum number of drops would then be 2. But that isn't what the riddle is really asking.

What it is really asking is that if we were the most unlucky as unlucky can be, and we had an optimal way of dropping smartphones out of building, what is the maximum number of drops that will take place before we get the answer?  That answer I believe is 18.  Here's my train of thought.

A 100 story building has a nice number of stories, since it is a perfect square (10 * 10 = 100).

Drop the first smartphone from the 10th story.  If it breaks, you have a potential nine droppings to go from stories 1-9 (technically, we could skip story 1 since the assumption is that there exists a floor from which the phone doesn't break) with your only remaining smartphone for 10 total droppings before finding your answer.

If it doesn't break, make your 2nd drop from the 20th story.  If it breaks, you have a potential nine more droppings to go with your only remaining smartphone for 11 total droppings before finding the answer.

This process continues.  If you are unlucky, and the answer is the 88th or 89th floor, then you will need 18 droppings.  This is because your 9th drop will be from the 90th story, on which the first smartphone breaks.  Your 10th through 17th drop will be from stories 81-88, none of which will result in a break.  On your 18th drop, if the second phone breaks, the answer is the 88th story.  If it doesn't break, the answer is the 89th story.

You may wonder what if the story is the 98th or 99th floor.  It turns out, because of the assumption that a smartphone will break on the 100th floor, it will take more droppings if it is the 97th or 98th floor.

Remember that your first phone hasn't broken yet after the 9th dropping from the 90th floor.  So, we can enter another optimal subroutine of dropping from the 93rd, 96th, and 99th until a break, and then going though two more droppings with the last remaining smartphone until we find our answer.  Thus, if it is the 97th or 98th floor, the first phone breaks on the 99th floor, which would be your 12th dropping, then you will need 2 more droppings for only 14.

There may be a more optimal solution, so please chime in if I'm wrong.

Using the same idea for 1000 stories, I get a potential 62 total droppings (if we are unlucky and it happens to be on floor 990 or 991).
____________________________________________________________________________

And.... I was wrong.  Instead of rewriting the entry above, let's learn from my mistake and correct it.  Using the idea of doing a subroutine if you get to 90 without having it break, if we instead enter a subroutine each time, we can end with fewer droppings.

To illustrate, here are the first 15 potential droppings: Story 10, 20, 29, 38, 46, 54, 61, 68, 74, 80, 85, 89, 93, 96, 98.  Using this routine, you will only need 16 droppings if the solution is one of floors 92, or 94-99.

I built a program in R to accomplish the task for any number of floors.
_________________________________________________________________________________

This third portion has been added after the first comment below which reveals the correct answer. However, I was lucky enough to think of this solution before reading his comment.  As I was drinking coffee on the morning after giving the solution above, I began thinking that now that I know that 16 drops is all that it takes, why not start at the 16th floor, or better yet, the 15th or 14th to try and get a smaller solution.

By beginning on the 14th floor, you will have at most 14 drops if the phone breaks on the 14th.  With a second drop 13 floors higher on floor 27, you will again have at most 14 drops if the phone breaks on the 27th.  Continue this process and you will find that you need only 14 drops for even a 105 story building.

Since summing 1 through 44 gives 990 and summing 1 through 45 gives 1035, it appears that you need 45 drops for a 1000 story building.

Now, for the generalization.  What if you were given x smartphones, where x ranges from 1 to the number of floors?