{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Contingency Tables\n", "\n", "So far, we have introduced basic distributions for discrete random variables, our first _models_! But a model of a single discrete random variable isn't all that interesting... Contingency tables allow us to model and reason about the joint distribution of _two_ categorical random variables. Two might not sound like a lot, but it turns out plenty of important questions boil down to understanding the relationship between two variables.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Demo: College Football Playoffs\n", "Let's start with a little demo. The College Football Playoffs are underway, and the Super Bowl is coming up in a few weeks! If you go to a watch party, you might like to play the following game with your friends. Before the football game starts, create a 10x10 board with the rows and columns numbered 0 through 9. Each cell represents a possible pair of final scores for the home and away teams, each taken mod 10 (that is, the last digit of each score). You and your friends select cells in round-robin order until all 100 cells are taken. Whoever holds the cell matching the final scores (mod 10) wins!\n", "\n", "Let's play together, using the upcoming Cotton Bowl between Ohio State and Texas as our example. Fill out [this poll](https://docs.google.com/forms/d/e/1FAIpQLSec3E2Wi8h5YyxVtnxrbgDH2nnN7XRt5MzJB-xtzUHhvZBChg/viewform?usp=sharing) to enter your guess." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The final scores (mod 10) are discrete random variables!\n", "\n", "Let's look at some data from this season and see if we can make an informed prediction." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "# Setup\n", "import torch\n", "from torch.distributions import Chi2, Multinomial\n", "import matplotlib.pyplot as plt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poll Results" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Timestamp | Name | Net ID | Are you comfortable programming in Python? | Have you used PyTorch before? | How many points (mod 10) do you think the Ohio State Buckeyes will score in the Cotton Bowl Friday Night? | How many points (mod 10) do you think the Texas Longhorns will score in the Cotton Bowl Friday Night? |
|---|---|---|---|---|---|---|---|
| 0 | 1/4/2025 13:40:46 | Scott Linderman | swl1 | Yes | Yes | 4 | 7 |

|  | Id | Season | Week | Season Type | Start Date | Start Time Tbd | Completed | Neutral Site | Conference Game | Attendance | ... | Away Conference | Away Division | Away Points | Away Line Scores | Away Post Win Prob | Away Pregame Elo | Away Postgame Elo | Excitement Index | Highlights | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 401550883 | 2023 | 1 | regular | 2023-08-26T17:00:00.000Z | False | True | False | False | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 401525434 | 2023 | 1 | regular | 2023-08-26T18:30:00.000Z | False | True | True | False | 49000.0 | ... | American Athletic | fbs | 3.0 | NaN | 0.001042 | 1471.0 | 1385.0 | 1.346908 | NaN | NaN |
| 2 | 401540199 | 2023 | 1 | regular | 2023-08-26T19:30:00.000Z | False | True | True | False | NaN | ... | UAC | fcs | 7.0 | NaN | 0.025849 | NaN | NaN | 6.896909 | NaN | NaN |
| 3 | 401520145 | 2023 | 1 | regular | 2023-08-26T21:30:00.000Z | False | True | False | True | 17982.0 | ... | Conference USA | fbs | 14.0 | NaN | 0.591999 | 1369.0 | 1370.0 | 6.821333 | NaN | NaN |
| 4 | 401525450 | 2023 | 1 | regular | 2023-08-26T23:00:00.000Z | False | True | False | False | 15356.0 | ... | FBS Independents | fbs | 41.0 | NaN | 0.760751 | 1074.0 | 1122.0 | 5.311493 | NaN | NaN |
| 5 | 401532392 | 2023 | 1 | regular | 2023-08-26T23:00:00.000Z | False | True | False | False | 23867.0 | ... | Mid-American | fbs | 13.0 | NaN | 0.045531 | 1482.0 | 1473.0 | 6.547378 | NaN | NaN |
| 6 | 401540628 | 2023 | 1 | regular | 2023-08-26T23:00:00.000Z | False | True | False | False | NaN | ... | Patriot | fcs | 13.0 | NaN | 0.077483 | NaN | NaN | 5.608758 | NaN | NaN |
| 7 | 401520147 | 2023 | 1 | regular | 2023-08-26T23:30:00.000Z | False | True | False | False | 21407.0 | ... | Mountain West | fbs | 28.0 | NaN | 0.819154 | 1246.0 | 1241.0 | 5.282033 | NaN | NaN |
| 8 | 401539999 | 2023 | 1 | regular | 2023-08-26T23:30:00.000Z | False | True | True | False | NaN | ... | MEAC | fcs | 7.0 | NaN | 0.001097 | NaN | NaN | 3.122344 | NaN | NaN |
| 9 | 401523986 | 2023 | 1 | regular | 2023-08-27T00:00:00.000Z | False | True | False | False | 63411.0 | ... | Mountain West | fbs | 28.0 | NaN | 0.001769 | 1462.0 | 1412.0 | 1.698730 | NaN | NaN |

10 rows × 33 columns
\n", "