HW1: Logistic Regression#

This class is about models and algorithms for discrete data. This homework will have all 3 ingredients:

  • Data: the results from all college football games in the 2023 season

  • Model: The Bradely-Terry model for predicting the winners of football game. The Bradley-Terry model is just logistic regression.

  • Algorithm: We will implement two ways of fitting logistic regression: gradient descent and Newton’s method

The Bradley-Terry Model#

In the Bradley-Terry Model, we give team \(k\) a team-effect \(\beta_k\). Basically, higher \(\beta_k\) (relatively speaking), means that team \(k\) is a better team. The Bradley-Terry model formalizes this intution by modeling the log odds of team \(k\) beating team \(k'\) by the difference in their team effects, \(\beta_k - \beta_{k'}\).

Let \(i = 1,\ldots, n\) index games, and let \(h(i) \in \{1,\ldots,K\}\) and \(a(i) \in \{1,\ldots,K\}\) denote the indices of the home and away teams, respectively. Let \(Y_i \in \{0,1\}\) denote whether the home team won. Under the Bradley-Terry model,

\[\begin{equation*} Y_i \sim \mathrm{Bern}\big(\sigma(\beta_{h(i)} - \beta_{a(i)}) \big), \end{equation*}\]

where \(\sigma(\cdot)\) is the sigmoid function. We can view this model as a logistic regression model with covariates \(x_i \in \mathbb{R}^K\) where,

\[\begin{align*} x_{i,k} &= \begin{cases} +1 &\text{if } h(i) = k \\ -1 &\text{if } a(i) = k \\ 0 &\text{o.w.}, \end{cases} \end{align*}\]

and parameters \(\beta \in \mathbb{R}^K\).

Data#

We use the results of college football games in the fall 2023 season, which are available from the course github page and loaded for you below.

The data comes as a list of the outcomes of individual games. You’ll need to wrangle the data to get it into a format that you can feed into the Bradley-Terry model.

import torch
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
allgames = pd.read_csv("https://raw.githubusercontent.com/slinderman/stats305b/winter2025/data/01_allgames.csv")

Problem 0: Preprocessing#

Preprocess the data to drop games with nan scores, construct the covariate matrix \(X\), construct the response vector \(y\), and do any other preprocessing you find useful.

# your code here

Problem 1: Loss function#

Write a function to compute the loss, \(L(\beta)\) defined be

\[\begin{equation*} L(\beta) = -\frac{1}{n} \sum_{i=1}^n \log p(y_i \mid x_i; \beta) + \frac{\gamma}{2} \| \beta \|_2^2 \end{equation*}\]

where \(\gamma\) is a hyperparameter that controls the strength of your \(\ell_2\) regularization.

You may want to use the torch.distributions.Bernoulli class.

# your code here

Problem 2: Gradient Descent#

Problem 2.1 Implementing and checking your gradients#

Write a function to compute the gradient of the average negative log likelihood and check your output against the results obtained by PyTorch’s automatic differentiation functionality.

# your code here

Problem 2.2: Implement Gradient Descent#

Now, use gradient descent to fit your Bradley-Terry model to the provided data.

Deliverables for this question:

  1. Code the implements gradient descent to fit your Bradley-Terry model to the provided data.

  2. A plot of the loss curve of your algorithm and a brief discussion if it makes sense or not

  3. A plot of the histogram of the fitted values of \(\beta\)

  4. The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not.

# your code here (you can use multiple code and markdown cells to organize your answer)

Problem 3: Newton’s Method#

Now, use Newton’s method to fit your Bradley-Terry model to the provided data.

Problem 3.1 The Hessian#

Problem 3.1.1. Implement and check the Hessian#

Write a function to compute the Hessian of the average negative log likelihood and check your answer against the output of from torch.autograd.functional.hessian.

Problem 3.1.2: Positive definiteness#

Compute the Hessian at the point \(\beta = 0\) without regularization (set \(\gamma = 0\)). Unless you’ve done sort of pre-processing, it’s probably singular.

# your code here

Problem 3.1.3#

Describe intuitively and mathematically what it means for the Hessian of the negative log likelihood to be singular in the context of this data and model

your answer here

Problem 3.1.4#

Give a hypothesis for why the Hessian in this dataset and model is singular, and provide empirical evidence to support your hypothesis.

your answer here

# your code here

Problem 3.1.5#

Explain why the Hessian is invertible when \(\gamma > 0\).

your answer here

Problem 3.2: Implement Newton’s method#

Now, use Newton’s method to fit your \(\ell_2\)-regularized Bradley-Terry model to the provided data.

Deliverables for this question:

  1. Code the implements Newton’s method to fit your Bradley-Terry model to the provided data.

  2. A plot of the loss curves from Newton’s method and from gradient descent, using the same regularization strength \(\gamma\) and initialization \(\beta_0\). Briefly discuss the results and compare their rates of convergence.

  3. A plot of the histogram of the fitted values of \(\beta\)

  4. The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not.

# your code here (you can use multiple code and markdown cells to organize your answer)

Problem 4: Model criticism and revision#

Let’s take another look the Bradley-Terry model from earlier and think about improvements we can make.

Problem 4.1: Improvements to Bradley-Terry Model#

Choose one way to improve the Bradley-Terry model. Discuss a priori why you think this change will improve the model and implement your change.

your answer here

Problem 4.2: Evaluation#

Assess whether or not your change was an improvement or not. Provide empirical evidence by evaluating performance on a held out test set and include at least one plot supporting your assessment.

your answer here

Problem 4.3: Reflection#

Reflecting on the analysis we’ve conducted in this assignemnt, which conference is best? Is there a significant difference? Please justify your answer.

your answer here

Submission Instructions#

Formatting: check that your code does not exceed 80 characters in line width. You can set Tools → Settings → Editor → Vertical ruler column to 80 to see when you’ve exceeded the limit.

Converting to PDF The simplest way to convert to PDF is to use the “Print to PDF” option in your browser. Just make sure that your code and plots aren’t cut off, as it may not wrap lines.

Alternatively You can download your notebook in .ipynb format and use the following commands to convert it to PDF. Then run the following command to convert to a PDF:

jupyter nbconvert --to pdf <yourlastname>_hw<number>.ipynb

(Note that for the above code to work, you need to rename your file <yourlastname>_hw<number>.ipynb)

Installing nbconvert:

If you’re using Anaconda for package management,

conda install -c anaconda nbconvert

Upload your .pdf file to Gradescope. Please tag your questions correctly! I.e., for each question, all of and only the relevant sections are tagged.

Please post on Ed or come to OH if there are any other problems submitting the HW.