
Bayesian Modeling

This course is about probabilistic modeling and inference with high-dimensional data.

Throughout this course we will encounter data in many forms — vectors of measurements, images, documents, time series, spike trains — and our goal will be to build probabilistic models of that data and use them to reason about the world. Depending on the application, we might want to predict unseen data, simulate new observations, or quantify our uncertainty about latent quantities.

A central theme is that probabilistic models provide a unified language for all of these tasks. By specifying a joint distribution over observed and latent variables, we can address prediction, simulation, and uncertainty quantification within a single coherent framework.

Box’s Loop

How should we go about building and using probabilistic models? A useful guiding framework is Box’s loop (Blei, 2014), named after the statistician George Box (of “all models are wrong, but some are useful” fame). The loop has three stages:

  1. Build: propose a probabilistic model — a joint distribution over data and parameters — that encodes your assumptions about how the data were generated.

  2. Compute: perform inference to find the posterior distribution of the parameters given the observed data.

  3. Critique: evaluate how well the model explains the data, check for systematic failures, and use those failures to motivate improvements.

Then repeat. Good probabilistic modeling is an iterative process: a simpler model helps us understand the data structure, and that understanding guides us toward richer, more accurate models.
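As a concrete sketch of one pass through the loop, the following purely illustrative example builds a conjugate Gaussian model of some synthetic data, computes the exact posterior over the mean, and critiques the fit with a simple posterior predictive check. All modeling choices here (the model, prior, and check statistic) are assumptions made for the sake of the demo, not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100)  # stand-in for observed data

# Build: Gaussian model with unknown mean, known variance 1,
# and a N(0, 10^2) prior on the mean (all illustrative choices).
prior_mean, prior_var, lik_var = 0.0, 10.0**2, 1.0

# Compute: the conjugate update gives the exact posterior over the mean.
n = len(data)
post_var = 1.0 / (1.0 / prior_var + n / lik_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / lik_var)

# Critique: posterior predictive check — does data simulated from the
# fitted model resemble the observed data on a summary statistic?
sim_means = [
    rng.normal(rng.normal(post_mean, np.sqrt(post_var)),
               np.sqrt(lik_var), size=n).mean()
    for _ in range(1000)
]
ppc_pvalue = np.mean(np.array(sim_means) >= data.mean())
```

A predictive p-value far from 0.5 would flag a systematic mismatch on that statistic and motivate the next turn of the loop — for example, revisiting the known-variance assumption.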


Figure 1: Box’s loop: the iterative cycle of model building, inference, and criticism. Figure from Blei, 2014.

The Bayesian Approach

The Bayesian approach to statistical modeling has three core components:

  1. A model is a joint distribution of parameters $\boldsymbol{\theta}$ and data $\mathbf{X}$,

    $$p(\boldsymbol{\theta}, \mathbf{X} \mid \boldsymbol{\eta}) = p(\boldsymbol{\theta} \mid \boldsymbol{\eta}) \, p(\mathbf{X} \mid \boldsymbol{\theta}, \boldsymbol{\eta}),$$

    where $p(\boldsymbol{\theta} \mid \boldsymbol{\eta})$ is the prior distribution encoding beliefs about the parameters before seeing data, and $p(\mathbf{X} \mid \boldsymbol{\theta}, \boldsymbol{\eta})$ is the likelihood of the data given parameters. The symbol $\boldsymbol{\eta}$ denotes hyperparameters — parameters of the prior that we treat as fixed and known.

  2. An inference algorithm computes the posterior distribution of parameters given data — a complete probabilistic description of what we have learned about $\boldsymbol{\theta}$ after observing $\mathbf{X}$.

  3. Model criticism and downstream tasks are based on posterior expectations — averages of quantities of interest under the posterior.
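For instance, once an inference algorithm has produced samples from the posterior, any posterior expectation reduces to a Monte Carlo average over those samples. The sketch below fakes the posterior samples with draws from a known Beta distribution — a hypothetical coin-flip posterior, chosen only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend inference produced samples from the posterior over theta
# (here faked as draws from a known Beta(8, 4) posterior, for illustration).
theta_samples = rng.beta(8.0, 4.0, size=100_000)

# Posterior expectations are approximated by Monte Carlo averages.
post_mean = theta_samples.mean()                     # E[theta | X]
prob_biased = np.mean(theta_samples > 0.5)           # P(theta > 0.5 | X)
```

The same pattern — average a function of the samples — covers point estimates, event probabilities, and predictive quantities alike.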

The fundamental formula connecting all three components is Bayes’ rule,

$$\underbrace{p(\boldsymbol{\theta} \mid \mathbf{X}; \boldsymbol{\eta})}_{\text{posterior}} = \frac{\overbrace{p(\boldsymbol{\theta}; \boldsymbol{\eta})}^{\text{prior}} \; \overbrace{p(\mathbf{X} \mid \boldsymbol{\theta}; \boldsymbol{\eta})}^{\text{likelihood}}}{\underbrace{p(\mathbf{X}; \boldsymbol{\eta})}_{\text{marginal likelihood}}} = \frac{p(\boldsymbol{\theta}, \mathbf{X}; \boldsymbol{\eta})}{\int p(\boldsymbol{\theta}, \mathbf{X}; \boldsymbol{\eta}) \, \mathrm{d}\boldsymbol{\theta}}.$$

The marginal likelihood $p(\mathbf{X}; \boldsymbol{\eta}) = \int p(\boldsymbol{\theta}, \mathbf{X}; \boldsymbol{\eta}) \, \mathrm{d}\boldsymbol{\theta}$ is the probability of the data averaged over all parameter values. It plays a key role in model comparison and hyperparameter selection, but computing it is often the hardest part — most of this course is about methods for dealing with this integral.
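The Beta–Bernoulli model is one of the rare cases where every term in Bayes’ rule is available in closed form, which makes it a useful sanity check. In this hypothetical example — 7 heads in 10 tosses under a uniform Beta(1, 1) prior — the posterior is Beta(8, 4) by conjugacy, and the marginal likelihood of the observed sequence is a ratio of Beta functions:

```python
from math import lgamma, exp

def log_beta(a, b):
    # log of the Beta function B(a, b), computed via log-gamma
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Hypothetical coin-flip data: k = 7 heads in N = 10 tosses.
N, k = 10, 7
a, b = 1.0, 1.0  # Beta(1, 1) prior, i.e., uniform over theta

# Posterior: Beta(a + k, b + N - k) by conjugacy.
post_a, post_b = a + k, b + N - k
post_mean = post_a / (post_a + post_b)  # = 8/12

# Marginal likelihood of this particular sequence of tosses:
# the integral over theta collapses to B(a + k, b + N - k) / B(a, b).
log_marginal = log_beta(post_a, post_b) - log_beta(a, b)
```

For non-conjugate models this integral has no closed form, which is exactly the difficulty the rest of the course confronts.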

Notation. Throughout these notes: lowercase bold letters denote vectors (e.g., $\mathbf{x}, \boldsymbol{\theta}$); uppercase bold letters denote matrices (e.g., $\mathbf{X}, \boldsymbol{\Sigma}$); and regular characters denote scalars (e.g., $x, \mu, \sigma^2$). We write $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ for a dataset of $N$ observations.

TODO: Worked Example(s)

TODO: Project Description

Conclusion

This chapter introduced the Bayesian approach to probabilistic modeling. The core framework — prior, likelihood, posterior, and marginal likelihood — provides a unified language for learning from data under uncertainty. Box’s loop (build, compute, critique) captures the iterative nature of good probabilistic modeling: no model is final, and systematic criticism of model fit drives improvements. The hardest computational step is evaluating the marginal likelihood integral, and most of the course is devoted to algorithms that handle this challenge.

References
  1. Blei, D. M. (2014). Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models. Annual Review of Statistics and Its Application, 1, 203–232.