{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "P_JIQE4w9xbB"
   },
   "source": [
    "# HW1: Logistic Regression\n",
    "\n",
    "This class is about models and algorithms for discrete data. This homework will have all 3 ingredients:\n",
    "* **Data**: the results from all college football games in the 2023 season\n",
    "* **Model**: The *Bradely-Terry* model for predicting the winners of football game. The Bradley-Terry model is just logistic regression.\n",
    "* **Algorithm**: We will implement two ways of fitting logistic regression: gradient descent and Newton's method"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oi2v2m5yCJE9"
   },
   "source": [
    "## The Bradley-Terry Model\n",
    "\n",
    "In the Bradley-Terry Model, we give team $k$ a team-effect $\\beta_k$. Basically, higher $\\beta_k$ (relatively speaking), means that team $k$ is a better team.\n",
    "The Bradley-Terry model formalizes this intution by modeling the log odds of team $k$ beating team $k'$ by the difference in their team effects, $\\beta_k - \\beta_{k'}$.\n",
    "\n",
    "Let $i = 1,\\ldots, n$ index games, and let $h(i) \\in \\{1,\\ldots,K\\}$ and $a(i) \\in \\{1,\\ldots,K\\}$ denote the indices of the home and away teams, respectively.\n",
    "Let $Y_i \\in \\{0,1\\}$ denote whether the home team won.\n",
    "Under the Bradley-Terry model,\n",
    "\\begin{equation*}\n",
    "  Y_i \\sim \\mathrm{Bern}\\big(\\sigma(\\beta_{h(i)} - \\beta_{a(i)}) \\big),\n",
    "\\end{equation*}\n",
    "where $\\sigma(\\cdot)$ is the sigmoid function. We can view this model as a logistic regression model with covariates $x_i \\in \\mathbb{R}^K$ where,\n",
    "\\begin{align*}\n",
    "x_{i,k} &=\n",
    "\\begin{cases}\n",
    "+1 &\\text{if } h(i) = k \\\\\n",
    "-1 &\\text{if } a(i) = k \\\\\n",
    "0 &\\text{o.w.},\n",
    "\\end{cases}\n",
    "\\end{align*}\n",
    "and parameters $\\beta \\in \\mathbb{R}^K$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "toIIF0ej-a7I"
   },
   "source": [
    "## Data\n",
    "\n",
    "We use the results of college football games in the fall 2023 season, which are available from the course github page and loaded for you below.\n",
    "\n",
    "The data comes as a list of the outcomes of individual games. You'll need to wrangle the data to get it into a format that you can feed into the Bradley-Terry model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qvTw_232nr-v"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from tqdm import tqdm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 638
    },
    "id": "WIYCdEBqnvJG",
    "outputId": "00e407b9-75af-46de-be25-bec38f06f02d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "  <div id=\"df-3ad19eb9-5130-4bf5-9d47-42f55f23f130\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Id</th>\n",
       "      <th>Season</th>\n",
       "      <th>Week</th>\n",
       "      <th>Season Type</th>\n",
       "      <th>Start Date</th>\n",
       "      <th>Start Time Tbd</th>\n",
       "      <th>Completed</th>\n",
       "      <th>Neutral Site</th>\n",
       "      <th>Conference Game</th>\n",
       "      <th>Attendance</th>\n",
       "      <th>...</th>\n",
       "      <th>Away Conference</th>\n",
       "      <th>Away Division</th>\n",
       "      <th>Away Points</th>\n",
       "      <th>Away Line Scores</th>\n",
       "      <th>Away Post Win Prob</th>\n",
       "      <th>Away Pregame Elo</th>\n",
       "      <th>Away Postgame Elo</th>\n",
       "      <th>Excitement Index</th>\n",
       "      <th>Highlights</th>\n",
       "      <th>Notes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>401550883</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T17:00:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>401525434</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T18:30:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>49000.0</td>\n",
       "      <td>...</td>\n",
       "      <td>American Athletic</td>\n",
       "      <td>fbs</td>\n",
       "      <td>3.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.001042</td>\n",
       "      <td>1471.0</td>\n",
       "      <td>1385.0</td>\n",
       "      <td>1.346908</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>401540199</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T19:30:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>UAC</td>\n",
       "      <td>fcs</td>\n",
       "      <td>7.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.025849</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.896909</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>401520145</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T21:30:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>17982.0</td>\n",
       "      <td>...</td>\n",
       "      <td>Conference USA</td>\n",
       "      <td>fbs</td>\n",
       "      <td>14.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.591999</td>\n",
       "      <td>1369.0</td>\n",
       "      <td>1370.0</td>\n",
       "      <td>6.821333</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>401525450</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T23:00:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>15356.0</td>\n",
       "      <td>...</td>\n",
       "      <td>FBS Independents</td>\n",
       "      <td>fbs</td>\n",
       "      <td>41.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.760751</td>\n",
       "      <td>1074.0</td>\n",
       "      <td>1122.0</td>\n",
       "      <td>5.311493</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>401532392</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T23:00:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>23867.0</td>\n",
       "      <td>...</td>\n",
       "      <td>Mid-American</td>\n",
       "      <td>fbs</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.045531</td>\n",
       "      <td>1482.0</td>\n",
       "      <td>1473.0</td>\n",
       "      <td>6.547378</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>401540628</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T23:00:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>Patriot</td>\n",
       "      <td>fcs</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.077483</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.608758</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>401520147</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T23:30:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>21407.0</td>\n",
       "      <td>...</td>\n",
       "      <td>Mountain West</td>\n",
       "      <td>fbs</td>\n",
       "      <td>28.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.819154</td>\n",
       "      <td>1246.0</td>\n",
       "      <td>1241.0</td>\n",
       "      <td>5.282033</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>401539999</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-26T23:30:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>MEAC</td>\n",
       "      <td>fcs</td>\n",
       "      <td>7.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.001097</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.122344</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>401523986</td>\n",
       "      <td>2023</td>\n",
       "      <td>1</td>\n",
       "      <td>regular</td>\n",
       "      <td>2023-08-27T00:00:00.000Z</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>63411.0</td>\n",
       "      <td>...</td>\n",
       "      <td>Mountain West</td>\n",
       "      <td>fbs</td>\n",
       "      <td>28.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.001769</td>\n",
       "      <td>1462.0</td>\n",
       "      <td>1412.0</td>\n",
       "      <td>1.698730</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 33 columns</p>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3ad19eb9-5130-4bf5-9d47-42f55f23f130')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-3ad19eb9-5130-4bf5-9d47-42f55f23f130 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-3ad19eb9-5130-4bf5-9d47-42f55f23f130');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-9a7b4731-2e5e-4a2b-a602-a0c365e5453b\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-9a7b4731-2e5e-4a2b-a602-a0c365e5453b')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-9a7b4731-2e5e-4a2b-a602-a0c365e5453b button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "          Id  Season  Week Season Type                Start Date  \\\n",
       "0  401550883    2023     1     regular  2023-08-26T17:00:00.000Z   \n",
       "1  401525434    2023     1     regular  2023-08-26T18:30:00.000Z   \n",
       "2  401540199    2023     1     regular  2023-08-26T19:30:00.000Z   \n",
       "3  401520145    2023     1     regular  2023-08-26T21:30:00.000Z   \n",
       "4  401525450    2023     1     regular  2023-08-26T23:00:00.000Z   \n",
       "5  401532392    2023     1     regular  2023-08-26T23:00:00.000Z   \n",
       "6  401540628    2023     1     regular  2023-08-26T23:00:00.000Z   \n",
       "7  401520147    2023     1     regular  2023-08-26T23:30:00.000Z   \n",
       "8  401539999    2023     1     regular  2023-08-26T23:30:00.000Z   \n",
       "9  401523986    2023     1     regular  2023-08-27T00:00:00.000Z   \n",
       "\n",
       "   Start Time Tbd  Completed  Neutral Site  Conference Game  Attendance  ...  \\\n",
       "0           False       True         False            False         NaN  ...   \n",
       "1           False       True          True            False     49000.0  ...   \n",
       "2           False       True          True            False         NaN  ...   \n",
       "3           False       True         False             True     17982.0  ...   \n",
       "4           False       True         False            False     15356.0  ...   \n",
       "5           False       True         False            False     23867.0  ...   \n",
       "6           False       True         False            False         NaN  ...   \n",
       "7           False       True         False            False     21407.0  ...   \n",
       "8           False       True          True            False         NaN  ...   \n",
       "9           False       True         False            False     63411.0  ...   \n",
       "\n",
       "     Away Conference Away Division  Away Points Away Line Scores  \\\n",
       "0                NaN           NaN          NaN              NaN   \n",
       "1  American Athletic           fbs          3.0              NaN   \n",
       "2                UAC           fcs          7.0              NaN   \n",
       "3     Conference USA           fbs         14.0              NaN   \n",
       "4   FBS Independents           fbs         41.0              NaN   \n",
       "5       Mid-American           fbs         13.0              NaN   \n",
       "6            Patriot           fcs         13.0              NaN   \n",
       "7      Mountain West           fbs         28.0              NaN   \n",
       "8               MEAC           fcs          7.0              NaN   \n",
       "9      Mountain West           fbs         28.0              NaN   \n",
       "\n",
       "  Away Post Win Prob Away Pregame Elo  Away Postgame Elo  Excitement Index  \\\n",
       "0                NaN              NaN                NaN               NaN   \n",
       "1           0.001042           1471.0             1385.0          1.346908   \n",
       "2           0.025849              NaN                NaN          6.896909   \n",
       "3           0.591999           1369.0             1370.0          6.821333   \n",
       "4           0.760751           1074.0             1122.0          5.311493   \n",
       "5           0.045531           1482.0             1473.0          6.547378   \n",
       "6           0.077483              NaN                NaN          5.608758   \n",
       "7           0.819154           1246.0             1241.0          5.282033   \n",
       "8           0.001097              NaN                NaN          3.122344   \n",
       "9           0.001769           1462.0             1412.0          1.698730   \n",
       "\n",
       "   Highlights  Notes  \n",
       "0         NaN    NaN  \n",
       "1         NaN    NaN  \n",
       "2         NaN    NaN  \n",
       "3         NaN    NaN  \n",
       "4         NaN    NaN  \n",
       "5         NaN    NaN  \n",
       "6         NaN    NaN  \n",
       "7         NaN    NaN  \n",
       "8         NaN    NaN  \n",
       "9         NaN    NaN  \n",
       "\n",
       "[10 rows x 33 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "allgames = pd.read_csv(\"https://raw.githubusercontent.com/slinderman/stats305b/winter2024/data/01_allgames.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 0: Preprocessing\n",
    "\n",
    "Preprocess the data to drop games with nan scores, construct the covariate matrix $X$, construct the response vector $y$, and do any other preprocessing you find useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZjUJOkAWHWD0"
   },
   "source": [
    "## Problem 1: Loss function\n",
    "\n",
    "Write a function to compute the loss, $L(\\beta)$ defined be\n",
    "\n",
    "\\begin{equation*}\n",
    "  L(\\beta) = -\\frac{1}{n} \\sum_{i=1}^n \\log p(y_i \\mid x_i; \\beta) + \\frac{\\gamma}{2} \\| \\beta \\|_2^2\n",
    "\\end{equation*}\n",
    "where $\\gamma$ is a hyperparameter that controls the strength of your $\\ell_2$ regularization.\n",
    "\n",
    "You may want to use the `torch.distributions.Bernoulli` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "WTaCXlvSHuxh"
   },
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8Cx0wyYytSb7"
   },
   "source": [
    "## Problem 2: Gradient Descent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xuNBMXGsO-7q"
   },
   "source": [
    "### Problem 2.1 Implementing and checking your gradients\n",
    "\n",
    "\n",
    "Write a function to compute the gradient of the average negative log likelihood and check your output against the results obtained by PyTorch's automatic differentiation functionality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ROj5lRuOsASh"
   },
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Cl9CAUpTPtpw"
   },
   "source": [
    "### Problem 2.2: Implement Gradient Descent\n",
    "\n",
    "\n",
    "Now, use gradient descent to fit your Bradley-Terry model to the provided data.\n",
    "\n",
    "Deliverables for this question:\n",
    "1. Code the implements gradient descent to fit your Bradley-Terry model to the provided data.\n",
    "2. A plot of the loss curve of your algorithm and a brief discussion if it makes sense or not\n",
    "3. A plot of the histogram of the fitted values of $\\beta$\n",
    "4. The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kPSNWKE8sKIH"
   },
   "outputs": [],
   "source": [
    "# your code here (you can use multiple code and markdown cells to organize your answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lBPDg-5QtXQV"
   },
   "source": [
    "## Problem 3: Newton's Method\n",
    "\n",
    "Now, use Newton's method to fit your Bradley-Terry model to the provided data.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Pi_R1fgkFbQ0"
   },
   "source": [
    "### Problem 3.1 The Hessian\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RS0kTKtVLDlQ"
   },
   "source": [
    "#### Problem 3.1.1. Implement and check the Hessian\n",
    "Write a function to compute the Hessian of the average negative log likelihood and check your answer against the output of `from torch.autograd.functional.hessian`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TtSlxUAkLE-y"
   },
   "source": [
    "#### Problem 3.1.2: Positive definiteness\n",
    "\n",
    "Compute the Hessian at the point $\\beta = 0$ without regularization (set $\\gamma = 0$). Unless you've done sort of pre-processing, it's probably singular."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "6KQjQZtfsUZ6"
   },
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GKVxT91XLbSL"
   },
   "source": [
    "#### Problem 3.1.3\n",
    "\n",
    "Describe intuitively and mathematically what it means for the Hessian of the negative log likelihood to be singular in the context of this data and model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yAsLFSGXsWXO"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TvClzEjJLk52"
   },
   "source": [
    "#### Problem 3.1.4\n",
    "\n",
    "Give a hypothesis for why the Hessian in this dataset and model is singular, and provide empirical evidence to support your hypothesis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dFphHjnxsjE2"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-wxWOBOQslRc"
   },
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EtKlPKs9LyNw"
   },
   "source": [
    "#### Problem 3.1.5\n",
    "\n",
    "Explain why the Hessian is invertible when $\\gamma > 0$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CgvigoXaspaw"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "szaThYwMMuf4"
   },
   "source": [
    "### Problem 3.2: Implement Newton's method\n",
    "\n",
    "Now, use Newton's method to fit your $\\ell_2$-regularized Bradley-Terry model to the provided data.\n",
    "\n",
    "Deliverables for this question:\n",
    "1. Code the implements Newton's method to fit your Bradley-Terry model to the provided data.\n",
    "2. A plot of the loss curves from Newton's method and from gradient descent, using the same regularization strength $\\gamma$ and initialization $\\beta_0$. Briefly discuss the results and compare their rates of convergence.\n",
    "3. A plot of the histogram of the fitted values of $\\beta$\n",
    "4. The top 10 teams from your ranking, and a discussion of whether this ranking makes sense or not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "FPYUCllcsri7"
   },
   "outputs": [],
   "source": [
    "# your code here (you can use multiple code and markdown cells to organize your answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "J9R91iI5NCMs"
   },
   "source": [
    "## Problem 4: Model criticism and revision\n",
    "\n",
    "Let's take another look the Bradley-Terry model from earlier and think about improvements we can make.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yPSnL3odcj12"
   },
   "source": [
    "### Problem 4.1: Improvements to Bradley-Terry Model\n",
    "Choose one way to improve the Bradley-Terry model. Discuss *a priori* why you think this change will improve the model and implement your change."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gngpLxYpczp0"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Xt9Yn0NPc3nS"
   },
   "source": [
    "### Problem 4.2: Evaluation\n",
    "Assess whether or not your change was an improvement or not. Provide empirical evidence by evaluating performance on a held out test set and include at least one plot supporting your assessment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yQvtv-eHdBM5"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "87F609vpdEq0"
   },
   "source": [
    "### Problem 4.3: Reflection\n",
    "Reflecting on the analysis we've conducted in this assignemnt, which conference is best? Is there a significant difference? Please justify your answer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-4YnbZWVdmWv"
   },
   "source": [
    "*your answer here*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9TL_LAYoyI2T"
   },
   "source": [
    "## Submission Instructions\n",
    "\n",
    "**Formatting:** check that your code does not exceed 80 characters in line width. You can set _Tools &rarr; Settings &rarr; Editor &rarr; Vertical ruler column_ to 80 to see when you've exceeded the limit.\n",
    "\n",
    "**Converting to PDF** The simplest way to convert to PDF is to use the \"Print to PDF\" option in your browser. Just make sure that your code and plots aren't cut off, as it may not wrap lines.\n",
    "\n",
    "**Alternatively** You can download your notebook in .ipynb format and use the following commands to convert it to PDF.  Then run the following command to convert to a PDF:\n",
    "```\n",
    "jupyter nbconvert --to pdf <yourlastname>_hw<number>.ipynb\n",
    "```\n",
    "(Note that for the above code to work, you need to rename your file `<yourlastname>_hw<number>.ipynb`)\n",
    "\n",
    "**Installing nbconvert:**\n",
    "\n",
    "If you're using Anaconda for package management,\n",
    "```\n",
    "conda install -c anaconda nbconvert\n",
    "```\n",
    "\n",
    "**Upload** your .pdf file to Gradescope. Please tag your questions correctly! I.e., for each question, all of and only the relevant sections are tagged.\n",
    "\n",
    "Please post on Ed or come to OH if there are any other problems submitting the HW."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "A100",
   "machine_shape": "hm",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}