Using Data Science to Improve the Super Bowl Pool

Apart from the game, the halftime show, and the advertisements, there is one more thing associated with the Super Bowl: the Super Bowl Pool. However, the traditional pool has a known problem: the probability of winning is unevenly distributed across squares. Certain squares are much likelier to win than others, and some are very unlikely to win at all. There are news articles like this that try to help people make the right selection when the numbers on the rows and columns are already assigned at the time of picking the squares. But when that is not the case, many players end up saying “My box sucks!”. This is buyer’s remorse, and it reduces interest in participating, because certain squares are perceived to be useless even before the game starts.

The popularity of the Super Bowl Pool and its shortcoming got me thinking about possible modifications. After some pondering, I came up with an idea: what if we pair the most likely digit with the least likely digit, the second most likely with the second least likely, and so on, so that each row and column carries a pair of digits? This would lead to a 5×5 pool with less disparity in win probabilities across squares.

The next challenge was to find out how often each digit from 0 to 9 has come up in past Super Bowls, in order to come up with the pairing.

While looking for the digit frequencies on the web, I came across an article in dataists that not only discussed the frequency disparity but also listed functions in R to grab the score data from Wikipedia using YQL. Borrowing the concept from there, I wrote a process in PHP to grab the frequencies for each digit using YQL.
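Once the scores are in hand, the tally itself is simple. Here is a minimal sketch in Python (not the PHP process described above), assuming the scores have already been fetched into a plain list. The names `last_digit_counts` and `sample_scores` are hypothetical, and the sample values are placeholders rather than the data behind the table.

```python
from collections import Counter


def last_digit_counts(scores):
    """Count how often each digit 0-9 appears as the last digit of a score."""
    counts = Counter(score % 10 for score in scores)
    # Include digits that never occurred so the result always has ten entries.
    return {digit: counts.get(digit, 0) for digit in range(10)}


# Placeholder scores for illustration only; not the data set used in this post.
sample_scores = [35, 10, 33, 14, 16, 7]
digit_counts = last_digit_counts(sample_scores)
print(digit_counts)
```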

Digit   Frequency   Probability (%)
0       101         27.45
1       21          5.71
2       9           2.45
3       56          15.22
4       38          10.33
5       10          2.72
6       30          8.15
7       77          20.92
8       9           2.45
9       17          4.62
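The probability column is simply each frequency divided by the total number of observations (368), expressed as a percentage. A short Python sketch that recomputes it from the frequencies in the table; the variable names are my own.

```python
# Digit frequencies copied from the table above.
frequencies = {0: 101, 1: 21, 2: 9, 3: 56, 4: 38,
               5: 10, 6: 30, 7: 77, 8: 9, 9: 17}

total = sum(frequencies.values())

# Share of each digit as a percentage, rounded to two decimals.
probabilities = {digit: round(100 * count / total, 2)
                 for digit, count in frequencies.items()}

for digit, pct in probabilities.items():
    print(f"{digit}  {frequencies[digit]:>3}  {pct:.2f}")
```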


With that information in hand, here is the digit pairing that gives a more even probability distribution.

Digit Pair   Combined Frequency   Probability (%)
0, 2         110                  29.89
7, 8         86                   23.37
3, 5         66                   17.93
4, 9         55                   14.95
6, 1         51                   13.86
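The pairing rule can be sketched in a few lines of Python: sort the digits by frequency, then match the most frequent with the least frequent, the second most frequent with the second least frequent, and so on. The function name `pair_extremes` is hypothetical, and the tie between 2 and 8 (both seen 9 times) is broken by digit value, which is my assumption; either choice gives the same combined frequencies.

```python
# Digit frequencies copied from the first table.
frequencies = {0: 101, 1: 21, 2: 9, 3: 56, 4: 38,
               5: 10, 6: 30, 7: 77, 8: 9, 9: 17}


def pair_extremes(freqs):
    """Pair the k-th most frequent digit with the k-th least frequent one."""
    # Least frequent first; ties are broken by the smaller digit (assumption).
    ordered = sorted(freqs, key=lambda d: (freqs[d], d))
    half = len(ordered) // 2
    return [(ordered[-1 - k], ordered[k]) for k in range(half)]


pairs = pair_extremes(frequencies)
total = sum(frequencies.values())

for likely, unlikely in pairs:
    combined = frequencies[likely] + frequencies[unlikely]
    print(f"{likely}, {unlikely}  {combined:>3}  {100 * combined / total:.2f}")
```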


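As you can see, the probabilities in the modified pool are much more uniform than in the traditional pool: a row or column now carries between 13.86% and 29.89%, compared with 2.45% to 27.45% for a single digit.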
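To see what this means for individual squares, here is a rough Python sketch that compares the best and worst square in each pool. It rests on an assumption that is mine rather than something the data establishes: both teams follow the same digit distribution and their digits are independent, so a square's chance is the product of its row and column probabilities. The name `square_spread` is hypothetical.

```python
from itertools import product

# Frequencies copied from the two tables above.
digit_freq = {0: 101, 1: 21, 2: 9, 3: 56, 4: 38,
              5: 10, 6: 30, 7: 77, 8: 9, 9: 17}
pair_freq = {"0,2": 110, "7,8": 86, "3,5": 66, "4,9": 55, "6,1": 51}


def square_spread(freqs):
    """Return (lowest, highest) square probability for a pool.

    Assumes row and column labels are independent and share one distribution.
    """
    total = sum(freqs.values())
    probs = [(freqs[row] / total) * (freqs[col] / total)
             for row, col in product(freqs, repeat=2)]
    return min(probs), max(probs)


trad_min, trad_max = square_spread(digit_freq)
mod_min, mod_max = square_spread(pair_freq)

print(f"traditional 10x10: best/worst ratio {trad_max / trad_min:.1f}")
print(f"modified 5x5:      best/worst ratio {mod_max / mod_min:.1f}")
```

Under that assumption, the best square in the traditional pool is roughly 126 times as likely to win as the worst one, while in the modified pool the gap shrinks to under 5 times.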

Below is a sample modified Super Bowl Pool initialized with random pairs on rows and columns. In practice, the pool squares are ‘taken’ by players before the numbers are assigned.
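A random layout of this kind can be drawn with a simple shuffle. The following Python sketch is one way to do it, not necessarily how the sample was produced; `draw_pool_labels` and the seed value are hypothetical choices for illustration.

```python
import random

# Digit pairs from the pairing table.
PAIRS = [(0, 2), (7, 8), (3, 5), (4, 9), (6, 1)]


def draw_pool_labels(pairs, rng=None):
    """Shuffle the pairs independently for the rows and the columns."""
    if rng is None:
        rng = random.Random()
    rows, cols = list(pairs), list(pairs)
    rng.shuffle(rows)
    rng.shuffle(cols)
    return rows, cols


def label(pair):
    return f"{pair[0]},{pair[1]}"


# A fixed seed makes the example repeatable; omit it for a real draw.
rows, cols = draw_pool_labels(PAIRS, random.Random(2012))

print("     " + "  ".join(label(c) for c in cols))
for r in rows:
    print(label(r) + "  " + "  ".join("[ ]" for _ in cols))
```

Running the draw only after every square has been claimed keeps the assignment fair, since no player can pick a square knowing its pairs.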