HW1: Logistic Regression

This class is about models and algorithms for discrete data. This homework will have all 3 ingredients:

Data: the results from all college football games in the 2023 season
Model: The Bradley-Terry model for predicting the winners of football games. The Bradley-Terry model is just logistic regression with a particular choice of covariates.
Algorithm: We will implement two ways of fitting logistic regression: gradient descent and Newton’s method

The Bradley-Terry Model#

In the Bradley-Terry model, we give team \(k\) a team effect \(\beta_k\). A higher \(\beta_k\), relative to the other teams, means that team \(k\) is a better team. The Bradley-Terry model formalizes this intuition by modeling the log odds of team \(k\) beating team \(k'\) as the difference in their team effects, \(\beta_k - \beta_{k'}\).

Let \(i = 1,\ldots, n\) index games, and let \(h(i) \in \{1,\ldots,K\}\) and \(a(i) \in \{1,\ldots,K\}\) denote the indices of the home and away teams, respectively. Let \(Y_i \in \{0,1\}\) denote whether the home team won. Under the Bradley-Terry model,

\[\begin{equation*} Y_i \sim \mathrm{Bern}\big(\sigma(\beta_{h(i)} - \beta_{a(i)}) \big), \end{equation*}\]

where \(\sigma(\cdot)\) is the sigmoid function. We can view this model as a logistic regression model with covariates \(x_i \in \mathbb{R}^K\), where

\[\begin{align*} x_{i,k} &= \begin{cases} +1 &\text{if } h(i) = k \\ -1 &\text{if } a(i) = k \\ 0 &\text{o.w.}, \end{cases} \end{align*}\]

and parameters \(\beta \in \mathbb{R}^K\).
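To make the encoding concrete, here is a minimal sketch on a made-up schedule. The three teams, four games, and the values of `beta` are all hypothetical and chosen only for illustration; they are not taken from the course data.

```python
import torch

# Hypothetical toy schedule: 3 teams, 4 games, as (home, away) index pairs.
games = [(0, 1), (1, 2), (2, 0), (0, 2)]
K = 3

X = torch.zeros(len(games), K)
for i, (h, a) in enumerate(games):
    X[i, h] = 1.0   # home team gets +1
    X[i, a] = -1.0  # away team gets -1

# Illustrative team effects (made up, not fitted).
beta = torch.tensor([1.0, 0.0, -1.0])

logits = X @ beta              # beta_{h(i)} - beta_{a(i)} for each game
probs = torch.sigmoid(logits)  # P(home team wins) under the model
print(X)
print(probs)
```

Each row of `X` has exactly one +1 and one -1, so `X @ beta` recovers the difference in team effects for every game.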

Data#

We use the results of college football games in the fall 2023 season, which are available from the course github page and loaded for you below.

The data comes as a list of the outcomes of individual games. You’ll need to wrangle the data to get it into a format that you can feed into the Bradley-Terry model.

import torch
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

allgames = pd.read_csv("https://raw.githubusercontent.com/slinderman/stats305b/winter2024/data/01_allgames.csv")

	Id	Season	Week	Season Type	Start Date	Start Time Tbd	Completed	Neutral Site	Conference Game	Attendance	...	Away Conference	Away Division	Away Points	Away Line Scores	Away Post Win Prob	Away Pregame Elo	Away Postgame Elo	Excitement Index	Highlights	Notes
0	401550883	2023	1	regular	2023-08-26T17:00:00.000Z	False	True	False	False	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	401525434	2023	1	regular	2023-08-26T18:30:00.000Z	False	True	True	False	49000.0	...	American Athletic	fbs	3.0	NaN	0.001042	1471.0	1385.0	1.346908	NaN	NaN
2	401540199	2023	1	regular	2023-08-26T19:30:00.000Z	False	True	True	False	NaN	...	UAC	fcs	7.0	NaN	0.025849	NaN	NaN	6.896909	NaN	NaN
3	401520145	2023	1	regular	2023-08-26T21:30:00.000Z	False	True	False	True	17982.0	...	Conference USA	fbs	14.0	NaN	0.591999	1369.0	1370.0	6.821333	NaN	NaN
4	401525450	2023	1	regular	2023-08-26T23:00:00.000Z	False	True	False	False	15356.0	...	FBS Independents	fbs	41.0	NaN	0.760751	1074.0	1122.0	5.311493	NaN	NaN
5	401532392	2023	1	regular	2023-08-26T23:00:00.000Z	False	True	False	False	23867.0	...	Mid-American	fbs	13.0	NaN	0.045531	1482.0	1473.0	6.547378	NaN	NaN
6	401540628	2023	1	regular	2023-08-26T23:00:00.000Z	False	True	False	False	NaN	...	Patriot	fcs	13.0	NaN	0.077483	NaN	NaN	5.608758	NaN	NaN
7	401520147	2023	1	regular	2023-08-26T23:30:00.000Z	False	True	False	False	21407.0	...	Mountain West	fbs	28.0	NaN	0.819154	1246.0	1241.0	5.282033	NaN	NaN
8	401539999	2023	1	regular	2023-08-26T23:30:00.000Z	False	True	True	False	NaN	...	MEAC	fcs	7.0	NaN	0.001097	NaN	NaN	3.122344	NaN	NaN
9	401523986	2023	1	regular	2023-08-27T00:00:00.000Z	False	True	False	False	63411.0	...	Mountain West	fbs	28.0	NaN	0.001769	1462.0	1412.0	1.698730	NaN	NaN

10 rows × 33 columns

Problem 0: Preprocessing#

Preprocess the data to drop games with NaN scores, construct the covariate matrix \(X\), construct the response vector \(y\), and do any other preprocessing you find useful.
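One possible shape for this step is sketched below on a tiny made-up table rather than on `allgames`. The column names "Home Team", "Away Team", and "Home Points" are assumptions made by analogy with the "Away Points" column visible in the preview above; check them against the real data frame. The sketch also treats a tie as a home loss, which is a simplification.

```python
import pandas as pd
import torch

# Hypothetical stand-in for `allgames`; column names are assumptions.
toy = pd.DataFrame({
    "Home Team": ["A", "B", "C", "A"],
    "Away Team": ["B", "C", "A", "C"],
    "Home Points": [21.0, 14.0, None, 35.0],
    "Away Points": [7.0, 17.0, 10.0, 3.0],
})

# Drop games with a missing score.
games = toy.dropna(subset=["Home Points", "Away Points"])
games = games.reset_index(drop=True)

# Map each team name to an integer index 0, ..., K-1.
teams = sorted(set(games["Home Team"]) | set(games["Away Team"]))
team_index = {name: k for k, name in enumerate(teams)}
n, K = len(games), len(teams)

home_idx = torch.tensor(games["Home Team"].map(team_index).to_numpy())
away_idx = torch.tensor(games["Away Team"].map(team_index).to_numpy())

# Covariates: +1 for the home team, -1 for the away team.
X = torch.zeros(n, K)
rows = torch.arange(n)
X[rows, home_idx] = 1.0
X[rows, away_idx] = -1.0

# Response: 1 if the home team scored more points, else 0.
home_won = games["Home Points"] > games["Away Points"]
y = torch.tensor(home_won.to_numpy(), dtype=torch.float32)
```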

# your code here

Problem 1: Loss function#

Write a function to compute the loss, \(L(\beta)\), defined by

\[\begin{equation*} L(\beta) = -\frac{1}{n} \sum_{i=1}^n \log p(y_i \mid x_i; \beta) + \frac{\gamma}{2} \| \beta \|_2^2 \end{equation*}\]

where \(\gamma\) is a hyperparameter that controls the strength of your \(\ell_2\) regularization.

You may want to use the torch.distributions.Bernoulli class.
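As a rough sketch of how the `Bernoulli` class can be used here, the function below evaluates the loss on a small made-up data set. The name `loss_fn`, its signature, and the toy `X` and `y` are illustrative assumptions, not a prescribed interface.

```python
import torch
from torch.distributions import Bernoulli


def loss_fn(beta, X, y, gamma=0.0):
    """Average negative log likelihood plus (gamma / 2) * ||beta||^2."""
    logits = X @ beta
    # Parameterizing by logits avoids computing log(sigmoid(.)) by hand.
    avg_nll = -Bernoulli(logits=logits).log_prob(y).mean()
    return avg_nll + 0.5 * gamma * torch.sum(beta ** 2)


# Made-up toy data: 3 games among 3 teams.
X = torch.tensor([[1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0]])
y = torch.tensor([1.0, 0.0, 1.0])

print(loss_fn(torch.zeros(3), X, y, gamma=0.1))
```

A quick sanity check: at \(\beta = 0\) every game has win probability \(1/2\) and the penalty vanishes, so the loss should equal \(\log 2\) regardless of the data.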

# your code here

Problem 2: Gradient Descent#

Problem 2.1 Implementing and checking your gradients#

Write a function to compute the gradient of the average negative log likelihood and check your output against the results obtained by PyTorch’s automatic differentiation functionality.
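One way to set up such a check is sketched below on made-up data. It uses the standard closed form for logistic regression, \(\nabla = -\frac{1}{n} X^\top (y - \sigma(X\beta))\), and compares it with `torch.autograd.grad`. The function names and the toy data are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def avg_nll(beta, X, y):
    logits = X @ beta
    # softplus(z) - y * z is the Bernoulli negative log likelihood
    # written in terms of the logit z.
    return (F.softplus(logits) - y * logits).mean()


def avg_nll_grad(beta, X, y):
    # Closed form: -(1/n) X^T (y - sigmoid(X beta))
    return -X.T @ (y - torch.sigmoid(X @ beta)) / X.shape[0]


torch.manual_seed(0)
X = torch.tensor([[1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, 0.0, -1.0]], dtype=torch.float64)
y = torch.tensor([1.0, 1.0, 0.0, 1.0], dtype=torch.float64)
beta = torch.randn(3, dtype=torch.float64, requires_grad=True)

(auto_grad,) = torch.autograd.grad(avg_nll(beta, X, y), beta)
manual_grad = avg_nll_grad(beta.detach(), X, y)
print(torch.allclose(auto_grad, manual_grad))
```

Double precision is used so that the comparison is not dominated by float32 rounding error.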

# your code here

Problem 2.2: Implement Gradient Descent#

Now, use gradient descent to fit your Bradley-Terry model to the provided data.

Deliverables for this question:

Code that implements gradient descent to fit your Bradley-Terry model to the provided data.
A plot of the loss curve of your algorithm and a brief discussion of whether it makes sense.
A plot of the histogram of the fitted values of \(\beta\)
The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not.
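The basic loop can be sketched as follows on made-up data. The step size, number of iterations, regularization strength, and toy games are all illustrative assumptions; suitable values for the real data will likely differ, and this version uses autograd for the gradient rather than a hand-written one.

```python
import torch
import torch.nn.functional as F


def loss_fn(beta, X, y, gamma):
    logits = X @ beta
    nll = F.softplus(logits) - y * logits
    return nll.mean() + 0.5 * gamma * beta.pow(2).sum()


# Made-up toy data: 4 games among 3 teams.
X = torch.tensor([[1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, 0.0, -1.0]])
y = torch.tensor([1.0, 1.0, 0.0, 1.0])

# Illustrative hyperparameters, not tuned for the real data.
gamma, step_size, num_iters = 0.1, 0.5, 200

beta = torch.zeros(3, requires_grad=True)
losses = []
for _ in range(num_iters):
    loss = loss_fn(beta, X, y, gamma)
    (grad,) = torch.autograd.grad(loss, beta)
    with torch.no_grad():
        beta -= step_size * grad  # plain gradient step
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The recorded `losses` list is what you would pass to `plt.plot` for the loss curve.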

# your code here (you can use multiple code and markdown cells to organize your answer)

Problem 3: Newton’s Method#

Now, use Newton’s method to fit your Bradley-Terry model to the provided data.

Problem 3.1 The Hessian#

Problem 3.1.1. Implement and check the Hessian#

Write a function to compute the Hessian of the average negative log likelihood and check your answer against the output of torch.autograd.functional.hessian.
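A check along these lines might look like the sketch below, again on made-up data. It uses the standard closed form \(\frac{1}{n} X^\top \mathrm{diag}\big(p_i (1 - p_i)\big) X\) with \(p = \sigma(X\beta)\); the function names and toy inputs are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from torch.autograd.functional import hessian


def avg_nll(beta, X, y):
    logits = X @ beta
    return (F.softplus(logits) - y * logits).mean()


def avg_nll_hessian(beta, X):
    p = torch.sigmoid(X @ beta)
    w = p * (1 - p)  # per-game Bernoulli variances
    # (X.T * w) scales column i of X.T by w_i, giving X^T diag(w) X.
    return (X.T * w) @ X / X.shape[0]


torch.manual_seed(0)
X = torch.tensor([[1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, 0.0, -1.0]], dtype=torch.float64)
y = torch.tensor([1.0, 1.0, 0.0, 1.0], dtype=torch.float64)
beta = torch.randn(3, dtype=torch.float64)

auto_hess = hessian(lambda b: avg_nll(b, X, y), beta)
manual_hess = avg_nll_hessian(beta, X)
print(torch.allclose(auto_hess, manual_hess))
```

Note that the Hessian does not depend on `y`, which is why `avg_nll_hessian` takes only `beta` and `X`.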

Problem 3.1.2: Positive definiteness#

Compute the Hessian at the point \(\beta = 0\) without regularization (set \(\gamma = 0\)). Unless you’ve done some sort of preprocessing, it’s probably singular.

# your code here

Problem 3.1.3#

Describe, intuitively and mathematically, what it means for the Hessian of the negative log likelihood to be singular in the context of this data and model.

your answer here

Problem 3.1.4#

Give a hypothesis for why the Hessian in this dataset and model is singular, and provide empirical evidence to support your hypothesis.

your answer here

# your code here

Problem 3.1.5#

Explain why the Hessian is invertible when \(\gamma > 0\).

your answer here

Problem 3.2: Implement Newton’s method#

Now, use Newton’s method to fit your \(\ell_2\)-regularized Bradley-Terry model to the provided data.

Deliverables for this question:

Code that implements Newton’s method to fit your Bradley-Terry model to the provided data.
A plot of the loss curves from Newton’s method and from gradient descent, using the same regularization strength \(\gamma\) and initialization \(\beta_0\). Briefly discuss the results and compare their rates of convergence.
A plot of the histogram of the fitted values of \(\beta\)
The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not.
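A bare-bones version of the Newton update for the regularized loss is sketched below on made-up data. It takes full steps with no line search or damping, which is an assumption that works on this toy problem but may need revisiting on the real data; the function name, \(\gamma\), and iteration count are illustrative.

```python
import torch


def newton_step(beta, X, y, gamma):
    """One undamped Newton step on the l2-regularized loss."""
    n, K = X.shape
    p = torch.sigmoid(X @ beta)
    grad = -X.T @ (y - p) / n + gamma * beta
    hess = (X.T * (p * (1 - p))) @ X / n
    hess = hess + gamma * torch.eye(K, dtype=X.dtype)
    # Solve H d = grad rather than forming the inverse explicitly.
    return beta - torch.linalg.solve(hess, grad)


# Made-up toy data: 4 games among 3 teams.
X = torch.tensor([[1.0, -1.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, 0.0, -1.0]], dtype=torch.float64)
y = torch.tensor([1.0, 1.0, 0.0, 1.0], dtype=torch.float64)
gamma = 0.1  # illustrative regularization strength

beta = torch.zeros(3, dtype=torch.float64)
for _ in range(10):
    beta = newton_step(beta, X, y, gamma)

print(beta)
```

Recording the loss after each step, as in gradient descent, gives the second curve for the comparison plot.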

# your code here (you can use multiple code and markdown cells to organize your answer)

Problem 4: Model criticism and revision#

Let’s take another look at the Bradley-Terry model from earlier and think about improvements we can make.

Problem 4.1: Improvements to Bradley-Terry Model#

Choose one way to improve the Bradley-Terry model. Discuss a priori why you think this change will improve the model and implement your change.

your answer here

Problem 4.2: Evaluation#

Assess whether your change was an improvement. Provide empirical evidence by evaluating performance on a held-out test set, and include at least one plot supporting your assessment.

your answer here

Problem 4.3: Reflection#

Reflecting on the analysis we’ve conducted in this assignment, which conference is best? Is there a significant difference? Please justify your answer.

your answer here

Submission Instructions#

Formatting: check that your code does not exceed 80 characters in line width. You can set Tools → Settings → Editor → Vertical ruler column to 80 to see when you’ve exceeded the limit.

Converting to PDF: The simplest way to convert to PDF is to use the “Print to PDF” option in your browser. Make sure that your code and plots aren’t cut off, since long lines may not wrap.

Alternatively, you can download your notebook in .ipynb format and run the following command to convert it to PDF:

jupyter nbconvert --to pdf <yourlastname>_hw<number>.ipynb

(Note that for the above code to work, you need to rename your file <yourlastname>_hw<number>.ipynb)

Installing nbconvert:

If you’re using Anaconda for package management,

conda install -c anaconda nbconvert

Upload your .pdf file to Gradescope. Please tag your questions correctly! That is, for each question, tag all of the relevant sections and only those sections.

Please post on Ed or come to OH if there are any other problems submitting the HW.
