Stochastic differential equations (SDEs) describe the continuous-time evolution of systems driven by random noise. They are the continuous-time counterpart of the linear dynamical systems studied in Part IV and underpin a broad range of models in statistics, physics, finance, and machine learning. In this course they arise most prominently in two places: as the principled mathematical framework for denoising diffusion models, and as a lens through which many stationary Gaussian processes can be formulated and computed efficiently.
## Brownian Motion
The fundamental source of randomness in continuous-time stochastic systems is Brownian motion (also called the Wiener process): a process $W_t$ with $W_0 = 0$, continuous sample paths, and independent Gaussian increments $W_t - W_s \sim \mathcal{N}(0, t - s)$ for $s < t$.
A $d$-dimensional Brownian motion $W_t = \left( W_t^{(1)}, \dots, W_t^{(d)} \right)$ is a vector of $d$ independent scalar Brownian motions, so $W_t \sim \mathcal{N}(0, t\, I)$.
Brownian motion is nowhere differentiable almost surely: sample paths have unbounded variation on every interval and are far too rough for a classical derivative to exist. This is not a pathology; it is essential for the central limit theorem intuition underlying the construction. As we add finer and finer independent noise increments, the accumulated noise over an interval of length $\Delta t$ scales as $\sqrt{\Delta t}$ (not $\Delta t$), and this scaling of increments is what forces us to treat Brownian motion differently from smooth driving signals.
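This $\sqrt{\Delta t}$ scaling is easy to check numerically. The sketch below (a NumPy illustration; the step and path counts are arbitrary choices, not from the text) builds Brownian paths by summing independent Gaussian increments and confirms that the variance of $W_t$ grows linearly in $t$, so the typical magnitude of $W_t$ grows like $\sqrt{t}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Brownian motion on [0, 1] by summing independent Gaussian
# increments: W_{t+dt} - W_t ~ N(0, dt).
n_steps, n_paths = 1_000, 5_000
dt = 1.0 / n_steps
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)  # W[:, k] approximates W at t = (k+1) dt

# The variance of W_t grows like t, so the typical size of W_t is sqrt(t).
var_at_half = W[:, n_steps // 2 - 1].var()
var_at_one = W[:, -1].var()
print(var_at_half, var_at_one)  # close to 0.5 and 1.0
```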
## The Ito Integral

Because $W_t$ is not differentiable, integrals of the form $\int_0^T g_t\, dW_t$ must be defined with care. The Ito integral constructs this as an $L^2$ limit using left-endpoint Riemann sums:

$$\int_0^T g_t\, dW_t = \lim_{n \to \infty} \sum_{k=0}^{n-1} g_{t_k} \left( W_{t_{k+1}} - W_{t_k} \right), \qquad 0 = t_0 < t_1 < \cdots < t_n = T.$$

The integrand $g_t$ must be adapted to the Brownian filtration $\mathcal{F}_t$, meaning $g_t$ may depend only on the history up to time $t$, not on future values of $W$.
Two key properties of the Ito integral follow directly from the construction.
**Zero mean.** Each summand $g_{t_k} \left( W_{t_{k+1}} - W_{t_k} \right)$ has zero conditional expectation given $\mathcal{F}_{t_k}$, because the Brownian increment is independent of the past and has mean zero. By the tower property, every partial sum has mean zero, and the $L^2$ limit inherits this:

$$\mathbb{E}\!\left[ \int_0^T g_t\, dW_t \right] = 0.$$
**Ito isometry.** Cross-terms $\mathbb{E}\!\left[ g_{t_j} \Delta W_j\, g_{t_k} \Delta W_k \right]$ vanish for $j \neq k$ by the same independence argument. Only the diagonal terms contribute, using $\mathbb{E}\!\left[ \Delta W_k^2 \right] = \Delta t_k$:

$$\mathbb{E}\!\left[ \left( \int_0^T g_t\, dW_t \right)^{\!2} \right] = \mathbb{E}\!\left[ \int_0^T g_t^2\, dt \right].$$

The isometry says that the $L^2$ norm of the integral equals the $L^2$ norm of the integrand in the product space $L^2(\Omega \times [0, T])$.
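Both properties can be verified by Monte Carlo on the left-endpoint construction. A minimal sketch, assuming NumPy and the (arbitrary, but adapted) integrand $g_t = W_t$, for which the isometry predicts $\mathbb{E}\big[\big(\int_0^1 W_t\, dW_t\big)^2\big] = \int_0^1 t\, dt = 1/2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Approximate the Ito integral of g_t = W_t against dW_t on [0, 1]
# with left-endpoint Riemann sums, across many independent paths.
n_steps, n_paths = 1_000, 20_000
dt = 1.0 / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Left endpoint: use W_{t_k}, never W_{t_{k+1}} (adaptedness).
ito = np.sum(W[:, :-1] * dW, axis=1)

mean = ito.mean()                  # ~ 0   (zero-mean property)
second_moment = (ito ** 2).mean()  # ~ 1/2 (Ito isometry)
print(mean, second_moment)
```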
## Stochastic Differential Equations

Starting from Euler's method for the ODE $\dot{x}_t = f(x_t, t)$,

$$x_{t + \Delta t} = x_t + f(x_t, t)\, \Delta t,$$

we augment each step with a Gaussian noise term whose variance scales as $\Delta t$, matching the variance of a Brownian increment over the same interval:

$$x_{t + \Delta t} = x_t + f(x_t, t)\, \Delta t + g(x_t, t)\, \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Delta t).$$

Recognizing $\varepsilon_t$ as a Brownian increment $W_{t + \Delta t} - W_t$ and taking $\Delta t \to 0$ yields the stochastic differential equation

$$dx_t = f(x_t, t)\, dt + g(x_t, t)\, dW_t,$$

interpreted rigorously as the integral equation

$$x_t = x_0 + \int_0^t f(x_s, s)\, ds + \int_0^t g(x_s, s)\, dW_s.$$

The function $f$ is the drift coefficient and $g$ is the diffusion coefficient. The process $x_t$ solving the SDE is called an Ito process or diffusion process. Under standard Lipschitz and growth conditions on $f$ and $g$, existence and uniqueness of a strong solution are guaranteed.
Why Gaussian noise? Beyond analytical tractability, the central limit theorem gives physical justification: a large number of small independent perturbations accumulate into something Gaussian. The SDE framework can be generalized to other driving processes (any semimartingale), but Gaussian noise is by far the most common choice.
## Ito’s Lemma

The ordinary chain rule says: if $\phi$ is a smooth function and $x_t$ is smooth, then $\frac{d}{dt} \phi(x_t) = \phi'(x_t)\, \dot{x}_t$. For an Ito process, there is an additional correction term arising from the quadratic variation of Brownian motion.
**Derivation sketch.** Expand $\phi(x_t + dx_t)$ to second order via Taylor's theorem and substitute $dx_t = f\, dt + g\, dW_t$:

$$d\phi = \phi'(x_t)\, dx_t + \tfrac{1}{2}\, \phi''(x_t)\, (dx_t)^2 + \cdots$$

The key step is evaluating $(dx_t)^2 = f^2\, dt^2 + 2 f g\, dt\, dW_t + g^2\, (dW_t)^2$. Using the multiplication rules $dt \cdot dt = 0$, $dt \cdot dW_t = 0$, and $dW_t \cdot dW_t = dt$, which follow from the $\sqrt{dt}$ scaling of Brownian increments, gives $(dx_t)^2 = g^2\, dt$. The higher-order terms vanish, leaving

$$d\phi(x_t) = \left( \phi'(x_t)\, f + \tfrac{1}{2}\, \phi''(x_t)\, g^2 \right) dt + \phi'(x_t)\, g\, dW_t.$$

The extra $\tfrac{1}{2}\, \phi''\, g^2\, dt$ term, absent in ordinary calculus, is called the Ito correction.
**Multivariate form.** For $x_t \in \mathbb{R}^d$ satisfying $dx_t = f(x_t, t)\, dt + G(x_t, t)\, dW_t$ and a smooth function $\phi : \mathbb{R}^d \to \mathbb{R}$,

$$d\phi(x_t) = \left( \nabla \phi^\top f + \tfrac{1}{2} \operatorname{tr}\!\left[ G G^\top \nabla^2 \phi \right] \right) dt + \nabla \phi^\top G\, dW_t.$$
**Example: geometric Brownian motion.** The SDE $dx_t = \mu\, x_t\, dt + \sigma\, x_t\, dW_t$ is the classical model for asset prices. Applying Ito's lemma to $\phi(x) = \log x$ gives

$$d \log x_t = \left( \mu - \tfrac{1}{2} \sigma^2 \right) dt + \sigma\, dW_t,$$

so $\log x_t$ drifts linearly and $x_t$ is log-normally distributed. The $-\tfrac{1}{2}\sigma^2$ correction relative to the drift $\mu$ is a direct consequence of the Ito term.
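A quick numerical check of this, assuming NumPy (parameter values are illustrative): the closed-form solution $x_t = x_0 \exp\big( (\mu - \sigma^2/2)\, t + \sigma W_t \big)$ is log-normal, and despite the $-\sigma^2/2$ correction in the exponent its mean is still $x_0 e^{\mu t}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ito's lemma gives the closed-form GBM solution
#   x_t = x0 * exp((mu - sigma^2/2) t + sigma W_t),
# so x_t is log-normal with E[x_t] = x0 * e^{mu t}.
x0, mu, sigma, t = 1.0, 0.3, 0.5, 1.0
W_t = rng.normal(0.0, np.sqrt(t), size=200_000)
x_t = x0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W_t)

print(x_t.mean(), x0 * np.exp(mu * t))  # both ~ e^{0.3} ≈ 1.35
```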
## The Fokker-Planck Equation

Instead of tracking individual trajectories, one can describe the evolution of the marginal density of the process. For the SDE $dx_t = f(x_t, t)\, dt + G(x_t, t)\, dW_t$, the marginal density $p(x, t)$ satisfies the Fokker-Planck equation (also called the forward Kolmogorov equation):

$$\frac{\partial p}{\partial t} = -\sum_{i} \frac{\partial}{\partial x_i} \left[ f_i(x, t)\, p(x, t) \right] + \frac{1}{2} \sum_{i, j} \frac{\partial^2}{\partial x_i\, \partial x_j} \left[ \left( G G^\top \right)_{ij}(x, t)\, p(x, t) \right].$$

In the scalar case with state-independent diffusion $g(t)$, this simplifies to

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x} \left[ f(x, t)\, p(x, t) \right] + \frac{g(t)^2}{2}\, \frac{\partial^2 p}{\partial x^2}.$$
The first term describes probability transport due to the drift; it has the form of a continuity equation. The second term is a diffusion term that spreads probability mass. Together they give a precise PDE governing how the distribution of $x_t$ evolves over time.
The backward Kolmogorov equation is the adjoint of the Fokker-Planck operator and describes how expected functions of the terminal state, $u(x, t) = \mathbb{E}\left[ \phi(x_T) \mid x_t = x \right]$, evolve backward in time from $T$ to $t$. It is the continuous-time analogue of the backward recursions in dynamic programming.
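The Fokker-Planck equation can also be integrated numerically. The sketch below (an illustrative explicit finite-difference scheme, not part of the course material; grid and step sizes are chosen to keep the explicit scheme stable) evolves the scalar equation for the drift $f(x) = -x$ with $g^2/2 = 1$, and checks the resulting moments against the known Gaussian (OU-type) solution:

```python
import numpy as np

# Finite-difference solve of the scalar Fokker-Planck equation
#   p_t = -(f p)_x + (g^2/2) p_xx,   f(x) = -x,   g^2/2 = 1,
# i.e. the density evolution of dx = -x dt + sqrt(2) dW.
x = np.linspace(-6.0, 6.0, 201)
dx = x[1] - x[0]
dt, n_steps = 5e-4, 4000  # integrate to t = 2 (dt below the stability limit)

# Initial condition: N(2, 0.5^2).
p = np.exp(-(x - 2.0) ** 2 / (2 * 0.25)) / np.sqrt(2 * np.pi * 0.25)
f = -x
for _ in range(n_steps):
    flux = f * p
    dflux = np.zeros_like(p)
    lap = np.zeros_like(p)
    dflux[1:-1] = (flux[2:] - flux[:-2]) / (2 * dx)   # central difference
    lap[1:-1] = (p[2:] - 2 * p[1:-1] + p[:-2]) / dx ** 2
    p = p + dt * (-dflux + lap)                       # g^2/2 = 1
    p[0] = p[-1] = 0.0                                # absorbing far boundary

mass = np.sum(p) * dx
mean = np.sum(x * p) * dx / mass
var = np.sum((x - mean) ** 2 * p) * dx / mass
# Exact OU moments: mean 2 e^{-t}, variance 1 + (0.25 - 1) e^{-2t}; at t = 2:
print(mean, var)  # ~0.271, ~0.986
```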
## Simulation: Euler–Maruyama

Given initial condition $x_0$ and a time grid $t_k = k\, \Delta t$ with step $\Delta t$, the Euler–Maruyama method approximates the SDE solution by

$$\hat{x}_{k+1} = \hat{x}_k + f(\hat{x}_k, t_k)\, \Delta t + g(\hat{x}_k, t_k)\, \Delta W_k, \qquad \Delta W_k \sim \mathcal{N}(0, \Delta t).$$

This is the direct stochastic analogue of Euler's method and has strong convergence rate $\mathcal{O}(\Delta t^{1/2})$. The Milstein scheme adds a correction obtained by applying Ito's lemma to $g$,

$$\hat{x}_{k+1} = \hat{x}_k + f\, \Delta t + g\, \Delta W_k + \tfrac{1}{2}\, g\, \frac{\partial g}{\partial x} \left( \Delta W_k^2 - \Delta t \right),$$

improving the strong convergence rate to $\mathcal{O}(\Delta t)$ when $g$ depends on the state. When the transition density is known in closed form, as it is for linear SDEs, exact simulation is possible without any discretization error.
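The two schemes are easy to compare on geometric Brownian motion, whose exact solution is available path-by-path. In the sketch below (NumPy; parameter values are illustrative) both schemes consume the same Brownian increments, so the comparison measures strong (pathwise) error:

```python
import numpy as np

rng = np.random.default_rng(3)

# Euler-Maruyama and Milstein for GBM, dx = mu x dt + sigma x dW,
# compared path-by-path against the exact solution
#   x_T = x0 * exp((mu - sigma^2/2) T + sigma W_T).
mu, sigma, x0, T = 0.3, 0.5, 1.0, 1.0
n_steps, n_paths = 50, 50_000
dt = T / n_steps

x_em = np.full(n_paths, x0)
x_mil = np.full(n_paths, x0)
W_T = np.zeros(n_paths)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    W_T += dW
    x_em = x_em + mu * x_em * dt + sigma * x_em * dW
    # Milstein correction: (1/2) g (dg/dx) (dW^2 - dt) with g = sigma x.
    x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dW
             + 0.5 * sigma ** 2 * x_mil * (dW ** 2 - dt))

x_exact = x0 * np.exp((mu - 0.5 * sigma ** 2) * T + sigma * W_T)
em_err = np.abs(x_em - x_exact).mean()
mil_err = np.abs(x_mil - x_exact).mean()
print(em_err, mil_err)  # Milstein's pathwise error is noticeably smaller
```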
## Linear SDEs and the Ornstein–Uhlenbeck Process

The most tractable family of SDEs has a drift that is linear in the state:

$$dx_t = \left( A(t)\, x_t + b(t) \right) dt + G(t)\, dW_t.$$

Because the drift is linear in the state and the noise enters additively, solutions have Gaussian transition densities and therefore define Gaussian processes.
**Solution via integrating factor.** For the scalar time-invariant case $dx_t = a\, x_t\, dt + g\, dW_t$, apply Ito's lemma to $\phi(x_t, t) = e^{-at} x_t$:

$$d\!\left( e^{-at} x_t \right) = -a\, e^{-at} x_t\, dt + e^{-at}\, dx_t = e^{-at} g\, dW_t.$$

Integrating both sides from $0$ to $t$ and rearranging,

$$x_t = e^{at} x_0 + \int_0^t e^{a(t - s)}\, g\, dW_s.$$

The Ito integral is Gaussian (as an $L^2$ limit of Gaussian sums), so the transition density is

$$x_t \mid x_0 \sim \mathcal{N}\!\left( e^{at} x_0,\; \frac{g^2}{2a} \left( e^{2at} - 1 \right) \right).$$

For $a < 0$, the mean decays exponentially toward zero and the variance saturates at $g^2 / (2 |a|)$ as $t \to \infty$.
**The Ornstein–Uhlenbeck process.** The canonical mean-reverting SDE is

$$dx_t = \theta \left( \mu - x_t \right) dt + \sigma\, dW_t.$$

The drift pulls $x_t$ toward the long-run mean $\mu$ at rate $\theta$; $\sigma$ controls the noise level. The transition density is

$$x_t \mid x_0 \sim \mathcal{N}\!\left( \mu + (x_0 - \mu)\, e^{-\theta t},\; \frac{\sigma^2}{2\theta} \left( 1 - e^{-2\theta t} \right) \right).$$

The marginal distribution converges to the stationary distribution $\mathcal{N}\!\left( \mu,\; \sigma^2 / (2\theta) \right)$ regardless of the initial condition, and the stationary process has the exponential covariance kernel $k(t, t') = \frac{\sigma^2}{2\theta}\, e^{-\theta |t - t'|}$.
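Because the transition density is Gaussian and known in closed form, the OU process can be simulated exactly, with no discretization error, by sampling each transition directly. A NumPy sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

# Exact OU simulation via the Gaussian transition density:
#   x_{t+dt} | x_t ~ N(mu + (x_t - mu) e^{-theta dt},
#                      sigma^2/(2 theta) (1 - e^{-2 theta dt})).
theta, mu, sigma = 2.0, 1.0, 1.5
dt, n_steps, n_paths = 0.1, 100, 20_000

decay = np.exp(-theta * dt)
step_var = sigma ** 2 / (2 * theta) * (1 - np.exp(-2 * theta * dt))
x = np.full(n_paths, -3.0)  # start far from the long-run mean
for _ in range(n_steps):
    x = mu + (x - mu) * decay + rng.normal(0.0, np.sqrt(step_var), size=n_paths)

# After t = 10 >> 1/theta the marginal is the stationary N(mu, sigma^2/(2 theta)).
print(x.mean(), x.var())  # ~1.0 and ~0.5625
```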
**Multivariate case.** For $dx_t = A\, x_t\, dt + G\, dW_t$ with stable $A$ (all eigenvalues have negative real part), the solution is

$$x_t = e^{At} x_0 + \int_0^t e^{A(t - s)}\, G\, dW_s,$$

with transition covariance $\Sigma_t = \int_0^t e^{A(t - s)} G G^\top e^{A^\top (t - s)}\, ds$. The stationary covariance $\Sigma_\infty$ is the unique positive definite solution to the continuous-time Lyapunov equation

$$A\, \Sigma_\infty + \Sigma_\infty A^\top + G G^\top = 0.$$
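The Lyapunov equation is linear in $\Sigma_\infty$, so it can be solved by vectorizing: $(I \otimes A + A \otimes I)\, \mathrm{vec}(\Sigma_\infty) = -\mathrm{vec}(G G^\top)$. A NumPy sketch for an illustrative stable $A$ (dedicated solvers, e.g. in SciPy, exist as well):

```python
import numpy as np

# Stationary covariance of dx = A x dt + G dW for an illustrative stable A:
# solve A S + S A^T + G G^T = 0 by vectorizing the linear equation.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
G = np.array([[1.0, 0.0],
              [0.3, 0.8]])
Q = G @ G.T
n = A.shape[0]
I = np.eye(n)
M = np.kron(I, A) + np.kron(A, I)   # acts on vec(S)
S = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)

residual = np.abs(A @ S + S @ A.T + Q).max()
print(residual)               # ~0: S satisfies the Lyapunov equation
print(np.linalg.eigvalsh(S))  # positive eigenvalues: S is positive definite
```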
## Stationary Distributions

When does the density $p(x, t)$ converge as $t \to \infty$? A stationary distribution $p_\infty(x)$ satisfies the time-independent Fokker-Planck equation (right-hand side set to zero). For the scalar constant-diffusion case $dx_t = f(x_t)\, dt + g\, dW_t$, one can integrate the Fokker-Planck equation twice to obtain

$$p_\infty(x) \propto \exp\!\left( \frac{2}{g^2} \int^x f(y)\, dy \right).$$

This is the Boltzmann distribution $p_\infty(x) \propto e^{-E(x)}$ with energy $E(x) = -\frac{2}{g^2} \int^x f(y)\, dy$.
**Langevin dynamics.** This connection between drift and stationary distribution is the foundation of gradient-based MCMC. To sample from a target distribution $p^*(x)$, one runs the SDE

$$dx_t = \nabla \log p^*(x_t)\, dt + \sqrt{2}\, dW_t,$$

which has $p^*$ as its unique stationary distribution. In the Bayesian context, $p^*$ is the posterior distribution, and Langevin dynamics provides a continuous-time limit of gradient-based MCMC. Discretization via Euler–Maruyama gives the unadjusted Langevin algorithm (ULA); adding a Metropolis–Hastings correction step recovers the Metropolis-adjusted Langevin algorithm (MALA).
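ULA is only a few lines of code. The sketch below (NumPy; step size and chain count are illustrative choices) targets a standard normal, whose score is simply $-x$, and reaches the correct moments up to the expected $\mathcal{O}(\epsilon)$ discretization bias:

```python
import numpy as np

rng = np.random.default_rng(5)

# Unadjusted Langevin algorithm (ULA) targeting p*(x) = N(0, 1):
#   x_{k+1} = x_k + eps * grad log p*(x_k) + sqrt(2 eps) * z_k.
def grad_log_p(x):
    return -x  # score of a standard normal

eps, n_steps, n_chains = 0.01, 2_000, 10_000
x = rng.normal(0.0, 3.0, size=n_chains)  # over-dispersed initialization
for _ in range(n_steps):
    x = x + eps * grad_log_p(x) + np.sqrt(2 * eps) * rng.normal(size=n_chains)

print(x.mean(), x.var())  # ~0 and ~1, up to a small O(eps) bias
```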
## Linear SDEs as Gaussian Processes

Since solutions to linear SDEs have Gaussian finite-dimensional distributions, they are Gaussian processes. The GP mean and covariance functions

$$m(t) = \mathbb{E}\left[ x_t \right], \qquad k(t, t') = \operatorname{Cov}\left( x_t, x_{t'} \right),$$

can be computed from the SDE coefficients; both satisfy ODEs derivable from the SDE.
Conversely, many classical stationary covariance kernels correspond exactly to the stationary GPs generated by specific linear SDEs:
| Kernel | Corresponding SDE |
|---|---|
| Exponential: $k(r) = \sigma^2 e^{-\ell \lvert r \rvert}$ | First-order linear SDE (Ornstein–Uhlenbeck process) |
| Matérn-3/2: $k(r) = \sigma^2 \left( 1 + \sqrt{3}\, \lvert r \rvert / \ell \right) e^{-\sqrt{3}\, \lvert r \rvert / \ell}$ | Second-order linear SDE (two-state system) |
| Matérn-5/2: $k(r) = \sigma^2 \left( 1 + \sqrt{5}\, \lvert r \rvert / \ell + 5 r^2 / (3 \ell^2) \right) e^{-\sqrt{5}\, \lvert r \rvert / \ell}$ | Third-order linear SDE (three-state system) |
| Squared exponential (approximate) | Infinite-order SDE, approximated by truncating a series expansion |
This SDE–GP equivalence has an important computational consequence. Standard GP regression with $n$ observations requires $\mathcal{O}(n^3)$ time to compute (and invert) the covariance matrix. When the covariance function corresponds to a linear SDE, the Kalman filter performs the same inference in $\mathcal{O}(n)$ time, by exploiting the Markov structure of the SDE solution. This makes SDE-based GP approximation a key technique for large-scale time series (Solin & Särkkä, 2020).
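The equivalence can be demonstrated directly for the exponential (OU) kernel. The sketch below (NumPy; the data, kernel convention $k(r) = s^2 e^{-|r|/\ell}$, and hyperparameters are illustrative) runs a scalar Kalman filter plus Rauch-Tung-Striebel smoother over the OU state-space model and checks that the smoothed means match the $\mathcal{O}(n^3)$ GP posterior mean to numerical precision:

```python
import numpy as np

rng = np.random.default_rng(6)

# O(n) GP regression with the exponential (OU) kernel
#   k(r) = s2 * exp(-|r| / ell)
# via Kalman filtering + RTS smoothing, checked against the O(n^3) solve.
s2, ell, r2 = 1.0, 0.5, 0.1            # kernel variance, lengthscale, noise var
t = np.sort(rng.uniform(0.0, 5.0, size=200))
y = np.sin(2 * t) + rng.normal(0.0, np.sqrt(r2), size=t.size)

n = t.size
mf, Pf = np.zeros(n), np.zeros(n)      # filtered means and variances
mp, Pp = np.zeros(n), np.zeros(n)      # one-step predicted means and variances
m, P = 0.0, s2                         # stationary prior at the first input
for k in range(n):
    if k > 0:                          # predict across the gap t[k] - t[k-1]
        phi = np.exp(-(t[k] - t[k - 1]) / ell)
        m, P = phi * m, phi ** 2 * P + s2 * (1 - phi ** 2)
    mp[k], Pp[k] = m, P
    K = P / (P + r2)                   # update with observation y_k = x_k + noise
    m, P = m + K * (y[k] - m), (1 - K) * P
    mf[k], Pf[k] = m, P

ms = mf.copy()                         # RTS smoother: backward pass
for k in range(n - 2, -1, -1):
    phi = np.exp(-(t[k + 1] - t[k]) / ell)
    gain = Pf[k] * phi / Pp[k + 1]
    ms[k] = mf[k] + gain * (ms[k + 1] - mp[k + 1])

# Direct O(n^3) GP posterior mean at the training inputs for comparison.
K_mat = s2 * np.exp(-np.abs(t[:, None] - t[None, :]) / ell)
gp_mean = K_mat @ np.linalg.solve(K_mat + r2 * np.eye(n), y)

max_diff = np.abs(ms - gp_mean).max()
print(max_diff)  # tiny: the two posterior means agree
```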
## The Reverse-Time SDE

A foundational result due to Anderson (1982) shows that the time-reverse of a diffusion process is also a diffusion. If $x_t$ satisfies the forward SDE

$$dx_t = f(x_t, t)\, dt + g(t)\, dW_t,$$

then the reverse-time process satisfies

$$dx_t = \left[ f(x_t, t) - g(t)^2\, \nabla_x \log p_t(x_t) \right] dt + g(t)\, d\bar{W}_t,$$

where $\bar{W}_t$ is a new Brownian motion (running backward) and $\nabla_x \log p_t(x)$ is the score function of the marginal density $p_t$ at time $t$.

The reverse SDE has the same marginal distributions as the forward SDE, but run in reverse time. Starting from $x_T \sim p_T$ (a simple noise distribution) and simulating the reverse SDE back to time $0$ produces a sample from $p_0$ (the data distribution).
This result is the mathematical heart of score-based generative modeling: design a forward SDE that converts data into noise, estimate the score using a neural network trained by denoising score matching, then run the reverse SDE to generate new data. The connection between the reverse-time correction and the score function explains why learning to denoise is equivalent to learning to generate — a remarkable duality discussed further in the Denoising Diffusion Models chapter.
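The mechanism can be illustrated in a toy setting where the score is available in closed form, so no neural network is needed. For an OU-type forward SDE with Gaussian data, every marginal $p_t$ is Gaussian and the score is linear; simulating the reverse SDE then recovers the data distribution (a NumPy sketch with illustrative parameters, not the text's construction):

```python
import numpy as np

rng = np.random.default_rng(7)

# Forward SDE: dx = -x dt + sqrt(2) dW with data x_0 ~ N(2, 0.25), so
#   p_t = N(2 e^{-t}, 0.25 e^{-2t} + 1 - e^{-2t})
# and the score of p_t is available in closed form.
m0, v0, T = 2.0, 0.25, 4.0

def score(x, t):
    mean = m0 * np.exp(-t)
    var = v0 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - mean) / var

# Integrate the reverse SDE
#   dx = [f(x) - g^2 score(x, t)] dt + g dW_bar,  f(x) = -x,  g^2 = 2,
# backward from t = T to t = 0 with Euler-Maruyama, starting from p_T.
n_steps, n_paths = 2_000, 20_000
dt = T / n_steps
var_T = v0 * np.exp(-2 * T) + 1.0 - np.exp(-2 * T)
x = rng.normal(m0 * np.exp(-T), np.sqrt(var_T), size=n_paths)
for k in range(n_steps):
    t_k = T - k * dt
    drift = -x - 2.0 * score(x, t_k)
    x = x - drift * dt + np.sqrt(2 * dt) * rng.normal(size=n_paths)

print(x.mean(), x.var())  # ~2.0 and ~0.25: the data distribution is recovered
```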
## Conclusion
Stochastic differential equations extend ordinary differential equations by adding Brownian noise, yielding continuous-time models whose sample paths are nowhere differentiable. The Ito integral and Ito’s lemma provide the calculus needed to work with these processes. Linear SDEs are especially tractable: their solutions are Gaussian processes with kernels that correspond to classical covariance functions (exponential, Matérn), and the Kalman filter performs GP inference in linear time by exploiting the Markov structure. Stationary distributions of SDEs are Boltzmann densities, connecting SDEs to Langevin MCMC. The reverse-time SDE of Anderson (1982) is the mathematical foundation of score-based generative modeling, providing a principled framework for understanding denoising diffusion models.
- Solin, A., & Särkkä, S. (2020). Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing, 30(2), 419–446.
- Anderson, B. D. O. (1982). Reverse-time diffusion equation models. Stochastic Processes and Their Applications, 12(3), 313–326.