Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Lab 1: Bayesian Linear Regression

STATS305C, Stanford University, Spring 2023

Your name:

Collaborators:

Hours spent:

Please let us know how many hours in total you spent on this assignment so we can calibrate for future assignments. Your feedback is always welcome!


# Setup
import torch
from torch.distributions import Normal, Gamma, \
    TransformedDistribution, MultivariateNormal
from torch.distributions.transforms import PowerTransform

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context("notebook")

Bayesian Linear Regression

STATS 305A was all about linear regression. In this assignment, you’ll revisit that classic model from a Bayesian perspective.

Let {xn,yn}n=1N\{\mathbf{x}_n, y_n\}_{n=1}^N denote a dataset with covariates xnRD\mathbf{x}_n \in \mathbb{R}^D and scalar outcomes ynRy_n \in \mathbb{R}. Let XRN×D\mathbf{X} \in \mathbb{R}^{N \times D} denote the design matrix where each row is a vector of covariates and yRN\mathbf{y} \in \mathbb{R}^N denote the vector of outcomes.

We will model the outcomes as conditionally independent Gaussian random variables given the covariates and the parameters,

p(yw,σ2,X)=n=1NN(ynwxn,σ2),\begin{align*} p(\mathbf{y} \mid \mathbf{w}, \sigma^2, \mathbf{X}) &= \prod_{n=1}^N \mathcal{N}(y_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma^2), \end{align*}

where wRD\mathbf{w} \in \mathbb{R}^D are the weights and σ2R+\sigma^2 \in \mathbb{R}_+ is the conditional variance.

In a Bayesian linear regression, we place a prior on the parameters,

p(w,σ2η)=χ2(σ2ν0,σ02)N(wμ0,σ2Λ01),\begin{align*} p(\mathbf{w}, \sigma^2 | \boldsymbol{\eta}) &= \chi^{-2}(\sigma^2 \mid \nu_0, \sigma_0^2) \, \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_0, \sigma^2 \boldsymbol{\Lambda}_0^{-1}), \end{align*}

where ν0R+\nu_0 \in \mathbb{R}_+ sets the degrees of freedom, σ02\sigma_0^2 is the prior mean of the variance parameter, μ0\boldsymbol{\mu}_0 is the prior mean of the weights, and Λ0SD\boldsymbol{\Lambda}_0 \in \mathcal{S}_D is a positive definite D×DD \times D precision matrix. We collect these hyperparameters into the vector η=(ν0,σ02,μ0,Λ0)\boldsymbol{\eta} = (\nu_0, \sigma_0^2, \boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0).

PyTorch

As in the notebooks from class, you will use PyTorch to complete the coding portions of this assignment. If you are unfamiliar with PyTorch, this webpage provides an introductory tutorial to PyTorch tensors. You should also make sure you can solve the problems in homework 0.

Problem 1: Derive the Posterior [Math]

Derive the posterior distribution p(w,σ2y,X,η)p(\mathbf{w}, \sigma^2 \mid \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) where η=(ν0,σ02,μ0,Λ0)\boldsymbol{\eta} = (\nu_0, \sigma_0^2, \boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0). It should be of the same form as the prior,

p(w,σ2y,X,η)=χ2(σ2νN,σN2)N(wμN,σ2ΛN1)\begin{align*} p(\mathbf{w}, \sigma^2 \mid \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) &= \chi^{-2}(\sigma^2 \mid \nu_N, \sigma_N^2) \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_N, \sigma^2 \boldsymbol{\Lambda}_N^{-1}) \end{align*}

for some νN\nu_N, σN2\sigma_N^2, μN\boldsymbol{\mu}_N, and ΛN\boldsymbol{\Lambda}_N.

Hint: Remember that the “standard procedure” for deriving the posterior distribution is to write down the joint distribution (on both parameters and data), and then to only foucs on the terms you care about (the parameters). But, in this setting, you have to be very careful to keep both w\mathbf{w} and σ2\sigma^2, because we are asking for the joint posterior.


Your answer here.


Problem 2: The Posterior Mean [Math]

a. What does the posterior mean E[wy,X,η]\mathbb{E}[\mathbf{w} \mid \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}] equal in the uninformative limit where Λ00\boldsymbol{\Lambda}_0 \to 0 and ν00\nu_0 \to 0?

b. What does the posterior mean E[σ2y,X,η]\mathbb{E}[\sigma^2 \mid \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}] equal in the uninformative limit where Λ00\boldsymbol{\Lambda}_0 \to 0 and ν00\nu_0 \to 0? Write your answer in terms of the hat matrix H=X(XX)1X\mathbf{H} = \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top.


Your answer here.


Synthetic Data

We’ll do some simple analysis of a synthetic dataset with N=20N =20 data points. Each data point has covariates xn=(1,xn)R2\mathbf{x}_n = (1, x_n) \in \mathbb{R}^2 and scalar outcomes ynRy_n \in \mathbb{R}. It looks like this:

!wget -nc https://raw.githubusercontent.com/slinderman/stats305c/main/assignments/hw1/hw1.pt

# Load the data.
# X = [[1, x_1]
#      [1, x_2]
#         ...
#      [1, x_N]]
#
# y = [y_1, ..., y_N]
X, y = torch.load("hw1.pt")

plt.plot(X[:, 1], y, 'ko')
plt.xlabel("$x$")
plt.ylabel("$y$")

Here, the outcomes were simulated from a linear regression with Gaussian noise according to some true parameters (not given). You will compute and visualize the posterior distribution over the weights and variance given the data.

Problem 3: Compute the posterior [Code]

Write a function to compute the posterior parameters given data and hyperparameters.

Hints: You may find the following commands in PyTorch useful:

  • If a is a tensor, a.shape is a tuple containing the shape of a.

  • If a is a tensor, a.T returns the transpose of a.

  • torch.linalg.solve

  • * denotes element-wise multiplication while @ denotes standard matrix-matrix or matrix-vector multiplication.

def compute_posterior(X, y, nu_0, sigmasq_0, mu_0, Lambda_0):
    """
    Compute the posterior parameters nu_N, sigmasq_N, mu_N, and Lambda_N 
    given covariates X, outcomes y, and hyperparameters.

    Args:
        X:          (N, D) tensor of covariates
        y:          (N,) tensor of outcomes
        nu_0:       prior degrees of freedom
        sigmasq_0:  prior mean of the variance parameter
        mu_0:       prior mean of the weights
        Lambda_0:   prior precision of the weights

    Returns:
        nu_N:       posterior degrees of freedom
        sigmasq_N:  posterior scale of the variance parameter
        mu_N:       posterior mean of the weights
        Lambda_N:   posterior precision of the weights
    """
    ##
    # YOUR CODE HERE
    #
    ##
    
    return nu_N, sigmasq_N, mu_N, Lambda_N

Please run the following code to print your answers:

# Test:
hypers = dict(
    nu_0=torch.tensor(1.0),
    sigmasq_0=torch.tensor(1.0),
    mu_0=torch.zeros(2),
    Lambda_0=0.1 * torch.eye(2)
)

nu_N, sigmasq_N, mu_N, Lambda_N = compute_posterior(X, y, **hypers)
print("nu_N:       \n", nu_N)
print("")
print("sigmasq_N:  \n", sigmasq_N)
print("")
print("mu_N:       \n", mu_N)
print("")
print("Lambda_N:   \n", Lambda_N)

Problem 4: Plot the posterior density of the variance [Code]

Plot p(σ2X,y,η)p(\sigma^2 \mid X, y, \eta) vs σ2\sigma^2 over the interval [103,2][10^{-3}, 2], where XX and yy continue to be the synthetic data we downloaded and used in Problem 3.

You may use the ScaledInvChiSq distribution object below, which we copied from the demo for Lecture 1.

Hint: In Python, you can use dir(object) to list the attributes and functions that an object supports.

Hint: To learn more about PyTorch distributions, see the docs.

class ScaledInvChiSq(TransformedDistribution):
    
    def __init__(self, dof, scale):
        """
        Implementation of the scaled inverse \chi^2 distribution,
        
        ..math:
            \chi^{-2}(\nu_0, \sigma_0^2)

        It is equivalent to an inverse gamma distribution, which we implement
        as a transformation of a Gamma distribution. Thus, this class inherits
        functions like `log_prob` from its parent.

        Args:
            dof:   degrees of freedom parameter
            scale: scale of the $\chi^{-2}$ distribution.
        """
        base = Gamma(dof / 2, dof * scale / 2)
        transforms = [PowerTransform(-1)]
        TransformedDistribution.__init__(self, base, transforms)
        self.dof = dof
        self.scale = scale        
##
# YOUR CODE HERE
#
##

Problem 5: Plot posterior samples of the regression function. [Code]

Draw 50 samples from the posterior marginal distribution over the weights wR2\mathbf{w} \in \mathbb{R}^2. For each sample, compute the expected value of yy on a grid of points xx evenly spaced between [3,3][-3, 3]. Remember that our covariates were defined as x=(1,x)\mathbf{x} = (1, x) so that for each sample of the weights you get a line for E[yx,w]\mathbb{E}[y \mid x, \mathbf{w}] as a function of xx. Plot these 50 lines on top of each other to get a sense of the posterior uncertainty in the regression function. (You may want to plot each line with some transparency, like alpha=0.1.) Overlay the observed data points.

Hint: You may find torch.inverse useful.

Hint: Remember that in the generative model we have posited, the distribution of w\mathbf{w} depends on σ2\sigma^2.

##
# YOUR CODE HERE
#
##

Problem 6: Posterior Predictive Distribution [Math]

The subparts of this problem will walk you through deriving the posterior predictive distribution of the outcome at a new input xN+1\mathbf{x}_{N+1}. That is, computing,

p(yN+1xN+1,y,X,η)\begin{align*} p(y_{N+1} \mid x_{N+1}, \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) \end{align*}

integrating over the posterior distribution on the weights w\mathbf{w} and variance σ2\sigma^2. Remember that you found this posterior distribution in Problem 1, but for the purpose of this question it’s enough to leave it in the form

p(w,σ2y,X,η)=χ2(σ2νN,σN2)N(wμN,σ2ΛN1),\begin{align*} p(\mathbf{w}, \sigma^2 \mid \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) &= \chi^{-2}(\sigma^2 \mid \nu_N, \sigma_N^2) \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}_N, \sigma^2 \boldsymbol{\Lambda}_N^{-1}), \end{align*}

i.e. you don’t need to plug in the values for for some νN\nu_N, σN2\sigma_N^2, μN\boldsymbol{\mu}_N, and ΛN\boldsymbol{\Lambda}_N that you found in Problem 1.

Problem 6a

Using the product rule of probability, write out the joint distribution of the posterior over the parameters and the observation of the next data point yN+1y_{N+1}:

p(yN+1,w,σ2xN+1,y,X,η).\begin{align*} p(y_{N + 1}, \mathbf{w}, \sigma^2 | x_{N + 1}, \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}). \end{align*}

You can (and please do) replace any densities from a known family with the notation symbol for the family(variable nameparameters)\text{symbol for the family}( \text{variable name} | \text{parameters}). (We follow this notation in how we write out the posterior above).


Your answer here.


Problem 6b

Now using the sum rule of probability, compute the posterior predictive distribution

p(yN+1,xN+1,y,X,η)=p(yN+1,w,σ2xN+1,y,X,η)dwdσ2.\begin{align*} p(y_{N + 1}, | x_{N + 1}, \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) = \int p(y_{N + 1}, \mathbf{w}, \sigma^2 | x_{N + 1}, \mathbf{y}, \mathbf{X}, \boldsymbol{\eta}) d\mathbf{w} d\sigma^2. \end{align*}

Hint: You can do this integral without taking any integrals! Thinks about conjugate families and how the Student T distribution came up in lectures.


Your answer here.


Submission Instructions

Formatting: check that your code does not exceed 80 characters in line width. You can set Tools → Settings → Editor → Vertical ruler column to 80 to see when you’ve exceeded the limit.

Download your notebook in .ipynb format and use the following commands to convert it to PDF. Then run the following command to convert to a PDF:

jupyter nbconvert --to pdf <yourlastname>_hw1.ipynb

(Note that for the above code to work, you need to rename your file <yourlastname>_hw1.ipynb)

Possible causes for errors:

  • the “Open in colab” button. Just delete the code that creates this button (go to the top cell and delete it)

  • Latex errors: many latex errors aren’t visible in the notebook. Try binary search: comment out half of the latex at a time, until you find the bugs

Getting this HW into PDF form isn’t meant to be a burden. One quick and easy approach is to open it as a Jupyter notebook, print, save to pdf. Just make sure your latex math answers aren’t cut off so I can grade them.

Please post on Ed or come to OH if there are any other problems submitting the HW.

Installing nbconvert:

If you’re using Anaconda for package management,

conda install -c anaconda nbconvert

Upload your .pdf file to Gradescope. Please tag your questions!