The 538 Riddler question this week isn’t especially interesting. What are the chances that somebody’s phone number has the exact same seven digits as your phone number, just in a different order? But as I was driving home thinking about it, I realized that there were several different ways to go about this and I wondered which was faster.
Since this problem isn’t terribly interesting, lets just get the solution out of the way first. Below is the code and when run 100 million times, it shows that about 0.01685% of random phone numbers will match. Dang, that’s small! (The mathematical answer is 0.01676 for what it’s worth. Link here.)
import random import time # Keep track of the number of matches matches = 0 # Start the clock t0 = time.clock() # Do 1000000 trials for i in range(1000000): # Get a random seven digit list for x num = random.randint(1,9999999) strnum = format(num, '07') x = [int(d) for d in strnum] # Get a random seven digit list for y num = random.randint(1,9999999) strnum = format(num, '07') y = [int(d) for d in strnum] # Sort the lists x.sort() y.sort() # If the sorted lists are equal, then they match if x == y: matches+=1 # Stop the clock t1 = time.clock() # Print the data print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds"
The first thing to consider was how to generate the seven digit phone number. The riddle allows all phone numbers from 000-0000 to 999-9999 and ignores the area code because it assume the two people are from the same town.
One method would be to just generate a number from 0 to 9999999, but it would require a little bit of work to convert from a single integer to a list of integers for each digit. This isn’t strictly necessary, since there are ways to pull out the individual numbers to work with, but damn, making it a list is so much easier. This method would also need a way to “pad” the numbers to make sure they have 7 digits, even if the first ones are zero. The function GetNumber() below does this. It first generates a random number from 0 to 9999999 inclusive. Then, it converts to it a string and pads with leading zeros to get to seven digits using the format function. And finally, it converts that string to a list of int digits and returns it.
def GetNumber(): num = random.randint(1,9999999) strnum = format(num, '07') listnum = [int(d) for d in strnum] return listnum
Another method would be to create an empty list, then generate a random digit and add it to the list nine separate times. This has the benefit of not having to worry about converting from an int to a string to a list and not having to worry about padding zeros. And it’s also super straightforward and could be understood (read: maintained) by anybody. This is shown in the function GetNumber2() below.
def GetNumber2(): listnum = [] listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) return listnum
Unfortunately, while GetNumber2() is more straightforward, it is also about 50% slower. If this was something that was only called once or twice, I’d let it be since I like how clear it is. But it’s going to get called twice in every loop so it would be a big waste of time using it rather than GetNumber(). Another interesting thing to note, when I run the program using the GetNumber() function, it runs a few percent faster than if I put that contents of that function in the main function. This is the exact opposite as I expected, since it has the overhead of a function call and return, but I tried many times and it was always the same. If you have any idea what kind of optimizations Python is doing that might cause that, please share them in the comments below.
Now that we know how we’ll be generating numbers, lets get to the fun part. I’ve come up with four methods for calculating the odds and made a function for each of them. Each function has the same basic surrounding code of starting a timer, running through a loop many times, then stopping the timer and printing out info. It’s what’s in those loops that is interesting.
The first method, shown below, gets two random seven digit lists, then sorts them and compares the lists. If they are equal, then that means they have the same number of each digit and they are a match. I initially expected this one to be pretty slow since it’s using two sort functions each iteration. But, as it turns out, the sorts are pretty optimized and since each phone number is only 7 digits long, it’s not too bad.
# This method gets two random numbers, then sorts each list # and then compares those lists to see if they're the same. # This is both the simplest and fastest method. def Method1(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numTrials' number of trials for i in range(numTrials): # Get two random seven digit lists then sort x=GetNumber() y=GetNumber() x.sort() y.sort() # If the sorted lists are equal, then they match if x == y: matches+=1 # Stop the clock t1 = time.clock() # Print the data print "Method 1:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds"
The second method, shown below, gets two random seven digit lists just like the first method did. Then, it creates a list to keep track of how often the number 0-9 appear in the phone numbers. It uses a for loop to iterate through 0-9 and counts the number of times each occurs in the two phone numbers. Afterward, it compares the count lists and if they are equal there is a match. I did initially expect this one to be about the same speed as the first method, but it turns out to be about 40-50% slower. This is probably due to iterating through the list 10 times and having to append to a list, while the first used a sort that never had to iterate through the list that many times.
# This method gets two random phone numbers, then counts the number of # times each number occurs for each phone number and compares those lists. # This is the slowest method, though is relatively straightforward def Method2(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numTrials' number of trials for i in range(numTrials): # Get two random numbers x=GetNumber() y=GetNumber() # Create a list to track each number digitListX = [] digitListY = [] # Iterate from 0 to 9, counting the amount each number # occurs in the phone numbers and appending to their lists for j in range(10): digitListX.append(x.count(j)) digitListY.append(y.count(j)) # If the lists are the same, they are a match if digitListX == digitListY: matches+=1 # Stop the clock t1 = time.clock() # Print info print "Method 2:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds"
The third method is similar to the second, except that instead of adding these counts to a list and comparing after they are all done, it compares each number as it counts it. If they don’t match on one, it sets a flag that it was a failure and breaks out of the checking for loop. This saves a lot of time since most of the phone numbers fail and the earlier you can catch that, the faster and more efficient the code will be. This winds up being about the same speed as method 1, though slightly harder to read.
# This method is similar to Method 2, but instead of waiting until # it is done counting all nubmers to compare, it compares as it goes, # which makes it much faster, about the same as method 1. def Method3(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numtrials' number of trials for i in range(numTrials): # Get two random seven digit numbers in list form x=GetNumber() y=GetNumber() # Creates a flag for matches. 0 is no match flag=1 # Iterates through numbers 0-9 checking to see if counts match. # If counts don't match, the flag is changed to denote no match # and the loop is borken to save time. for j in range(10): if x.count(j) != y.count(j): flag = 0 break # If there was a match, increase match count if flag==1: matches+=1 # Stop clock t1 = time.clock() # Print info print "Method 3:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds"
The fourth method is almost exactly the same as the third but with a slight improvement. Instead of counting in the phone numbers using 0-9, it uses the actual numbers of the first phone number for the counting. So if the number was 867-5309, it would count the 8s then the 6s then the 7s, then the 5s, then the 3s, then the 0s, then the 9s. This has an advantage of only having to iterate through the loop 7 times instead of 10, which makes it faster than method 3. It has the disadvantage of sometimes counting the same numbers multiple times, for instance in the phone number 123-4321, but that’s a pretty small penalty and checking for it takes about as much time as you’d save while also making the code less clear.
# This method is almost the same as Method 3 in that it doesn't wait until # it is done counting all nubmers to compare, it compares as it goes. But # it also only counts the numbers that are in the phone number so it only # has to iterate 7 times, not 10. def Method4(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numtrials' number of trials for i in range(numTrials): # Get two random seven digit numbers in list form x=GetNumber() y=GetNumber() # Creates a flag for matches. 0 is no match flag=1 # Iterates through numbers in the first phone number checking to see if counts match. # If counts don't match, the flag is changed to denote no match # and the loop is borken to save time. for j in x: if x.count(j) != y.count(j): flag = 0 break # If there was a match, increase match count if flag==1: matches+=1 # Stop clock t1 = time.clock() # Print info print "Method 4:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds"
So in the end, Method 4 was the fastest but only by about 3%. If it were me, I’d go with Method 1 just because it’s easier to read while only sacrificing a small time penalty. The full source code is below. If you have any tips or thoughts, please share them in the comments below.
Download python source code here.
import random import time # This method generates a random number, then converts # it to a string and pads with the apprioriate number of # leading zeros, then converts to a list of ints. def GetNumber(): num = random.randint(1,9999999) strnum = format(num, '07') listnum = [int(d) for d in strnum] return listnum # This method generates a seven digit number by generating # one random digit at a time. It is about 50% slower than # just generating the whole number at once. def GetNumber2(): listnum = [] listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) listnum.append(random.randint(1,9)) return listnum # This method gets two random numbers, then sorts each list # and then compares those lists to see if they're the same. # This is both the simpliest and fastest method. def Method1(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numTrials' number of trials for i in range(numTrials): # Get two random seven digit lists then sort x=GetNumber() y=GetNumber() x.sort() y.sort() # If the sorted lists are equal, then they match if x == y: matches+=1 # Stop the clock t1 = time.clock() # Print the data print "Method 1:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds" # This method gets two random phone numbers, then counts the number of # times each number occurs for each phone number and compares those lists. # This is the slowest method, though is relatively straightforward def Method2(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numTrials' number of trials for i in range(numTrials): # Get two random numbers x=GetNumber() y=GetNumber() # Create a list to track each number digitListX = [] digitListY = [] # Iterate from 0 to 9, counting the amount each number # occurs in the phone numbers and appending to their lists for j in range(10): digitListX.append(x.count(j)) digitListY.append(y.count(j)) # If the lists are the same, they are a match if digitListX == digitListY: matches+=1 # Stop the clock t1 = time.clock() # Print info print "Method 2:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds" # This method is similar to Method 2, but instead of waiting until # it is done counting all nubmers to compare, it compares as it goes, # which makes it much faster, about the same as method 1. def Method3(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numtrials' number of trials for i in range(numTrials): # Get two random seven digit numbers in list form x=GetNumber() y=GetNumber() # Creates a flag for matches. 0 is no match flag=1 # Iterates through numbers 0-9 checking to see if counts match. # If counts don't match, the flag is changed to denote no match # and the loop is borken to save time. for j in range(10): if x.count(j) != y.count(j): flag = 0 break # If there was a match, increase match count if flag==1: matches+=1 # Stop clock t1 = time.clock() # Print info print "Method 3:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds" # This method is almost the same as Method 3 in that it doesn't wait until # it is done counting all nubmers to compare, it compares as it goes. But # it also only counts the numbers that are in the phone number so it only # has to iterate 7 times, not 10. def Method4(numTrials): matches = 0 # Start the clock t0= time.clock() # Do 'numtrials' number of trials for i in range(numTrials): # Get two random seven digit numbers in list form x=GetNumber() y=GetNumber() # Creates a flag for matches. 0 is no match flag=1 # Iterates through numbers in the first phone number checking to see if counts match. # If counts don't match, the flag is changed to denote no match # and the loop is borken to save time. for j in x: if x.count(j) != y.count(j): flag = 0 break # If there was a match, increase match count if flag==1: matches+=1 # Stop clock t1 = time.clock() # Print info print "Method 4:" print str(matches) + " matches were made" print matches*100.0/numTrials print "took " + str(t1-t0) + " seconds" if __name__ == '__main__': num = 10000000 Method1(num) #Method2(num) #Method3(num) #Method4(num)