Poisson processes#

Definition#

Poisson processes are stochastic processes that generate random sets of points \(\{\mbx_n\}_{n=1}^N \subset \cX\).

Poisson processes are governed by an intensity function, \(\lambda(\mbx): \cX \to \reals_+\).

Property #1: The number of points in any region \(\cA \subseteq \cX\) is a Poisson random variable,

\[\begin{align*} N(\cA) &\sim \mathrm{Po}\left(\int_{\cA} \lambda(\mbx) \dif \mbx \right) \end{align*}\]

Property #2: The numbers of points in disjoint regions are independent,

\[\begin{align*} \cA \cap \cB = \varnothing \implies N(\cA) \perp N(\cB) \end{align*}\]

Example applications of Poisson processes#

  • Modeling neural firing rates

  • Locations of trees in a forest

  • Locations of stars in astronomical surveys

  • Arrival times of customers in a queue (or HTTP requests to a server)

  • Locations of bombs in London during World War II

  • Times of photon detections on a light sensor

  • Others?

Four ways to sample a Poisson process#

We will derive four ways of sampling a Poisson process, each of which provides different insight into the generative model and alternative pathways for inference:

  1. The top-down approach

  2. The interval approach

  3. The time-rescaling approach

  4. The thinning approach

Top-down sampling of a Poisson process#

Given \(\lambda(\mbx)\) (and a domain \(\cX\)):

  1. Sample the total number of points

    \[\begin{align*} N &\sim \mathrm{Po} \left( \int_{\cX} \lambda(\mbx) \dif \mbx \right) \end{align*}\]
  2. Sample the locations of the points

    \[\begin{align*} \mbx_n &\iid{\sim} \frac{\lambda(\mbx)}{\int_{\cX} \lambda(\mbx') \dif \mbx'} \end{align*}\]

    for \(n=1,\ldots,N\).
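
As a concrete illustration, here is a minimal Python sketch of top-down sampling for the homogeneous special case, \(\lambda(\mbx) \equiv \lambda\) on \([0, T]\), where both steps are available in closed form. (The function name and interface are our own choices.)

```python
import numpy as np

def sample_pp_topdown(lam, T, rng=None):
    """Top-down sample of a homogeneous Poisson process on [0, T].

    With constant intensity, the integrated intensity is lam * T and the
    normalized intensity is the uniform density on [0, T].
    """
    rng = np.random.default_rng() if rng is None else rng
    N = rng.poisson(lam * T)                # Property #1: the total count is Poisson
    return np.sort(rng.uniform(0.0, T, N))  # iid draws from the normalized intensity
```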

Question

What assumptions are necessary for this procedure to be tractable?

Deriving the Poisson process likelihood#

Exercise

From the top-down sampling process, derive the Poisson process likelihood,

\[\begin{align*} p\left(\{\mbx_n\}_{n=1}^N \mid \lambda(\mbx) \right) \end{align*}\]

Intervals of a homogeneous Poisson process#

A Poisson process is homogeneous if its intensity is constant, \(\lambda(\mbx) \equiv \lambda\).

Property #3: A homogeneous Poisson process on \([0, T] \subset \reals\) (e.g. where points correspond to arrival times) has independent, exponentially distributed intervals,

\[\begin{align*} \Delta_n = x_n - x_{n-1} &\iid{\sim} \mathrm{Exp}(\lambda) \end{align*}\]

Property #4: A homogeneous Poisson process is memoryless — the amount of time until the next point arrives is independent of the time elapsed since the previous point arrived.

Sampling a homogeneous Poisson process by simulating intervals#

We can sample a homogeneous Poisson process on \([0, T]\) by simulating intervals as follows:

  1. Initialize \(\mbX = \varnothing\) and \(x_0 = 0\)

  2. For \(n=1, 2, \ldots\):

    1. Sample \(\Delta_n \sim \mathrm{Exp}(\lambda)\).

    2. Set \(x_n = x_{n-1} + \Delta_n\).

    3. If \(x_n > T\), break and return \(\mbX\).

    4. Otherwise, set \(\mbX \leftarrow \mbX \cup \{x_n\}\).
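
A minimal Python sketch of this loop (note that numpy parameterizes the exponential by its scale, \(1/\lambda\), rather than its rate):

```python
import numpy as np

def sample_pp_intervals(lam, T, rng=None):
    """Sample a homogeneous Poisson process on [0, T] via its intervals."""
    rng = np.random.default_rng() if rng is None else rng
    xs, x = [], 0.0
    while True:
        x += rng.exponential(scale=1.0 / lam)  # Delta_n ~ Exp(lam)
        if x > T:
            return np.array(xs)
        xs.append(x)
```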

Deriving the likelihood of a homogeneous Poisson process#

Exercise

From the interval sampling process, derive the likelihood of a homogeneous Poisson process. Show that it is the same as what you derived from the top-down sampling process.

Sampling an inhomogeneous Poisson process by time-rescaling#

Now consider an inhomogeneous Poisson process on \([0, T]\); i.e. one with a non-constant intensity.

Apply the change of variables,

\[\begin{align*} x &\mapsto \int_0^x \lambda(t) \dif t \triangleq \Lambda(x) \end{align*}\]

Note that this is an invertible transformation when \(\lambda(x) > 0\).

Sample a homogeneous Poisson process with unit rate on \([0, \Lambda(T)]\) to get points \(\mbU = \{u_n\}_{n=1}^N\). Then set,

\[\begin{align*} \mbX &= \{\Lambda^{-1}(u_n): u_n \in \mbU\}. \end{align*}\]

Sanity check: what is the expected value of \(N\)?
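
Here is a sketch of the time-rescaling sampler, assuming \(\Lambda\) and \(\Lambda^{-1}\) are available in closed form (the helper names are our own):

```python
import numpy as np

def sample_pp_rescaling(Lambda, Lambda_inv, T, rng=None):
    """Sample an inhomogeneous Poisson process on [0, T] by time-rescaling.

    Lambda:     cumulative intensity, Lambda(x) = int_0^x lambda(t) dt
    Lambda_inv: its inverse (assumed available in closed form)
    """
    rng = np.random.default_rng() if rng is None else rng
    us, u = [], 0.0
    while True:                      # unit-rate homogeneous process on [0, Lambda(T)]
        u += rng.exponential(1.0)
        if u > Lambda(T):
            break
        us.append(u)
    return Lambda_inv(np.array(us))  # map back through the inverse rescaling

# Example: lambda(x) = 2x on [0, 10], so Lambda(x) = x**2 and Lambda_inv(u) = sqrt(u)
xs = sample_pp_rescaling(lambda x: x**2, np.sqrt, T=10.0)
```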

A goodness-of-fit test for inhomogeneous Poisson processes#

Brown et al. [BBV+02] used the time-rescaling sampling procedure to develop a goodness-of-fit test for inhomogeneous Poisson processes.

Suppose you observe a set of points \(\{x_n\}_{n=1}^N \subset [0,T]\) and you want to test whether they are well-modeled by an inhomogeneous Poisson process with rate \(\lambda(x)\). Let \(\Delta_n = \Lambda(x_n) - \Lambda(x_{n-1})\) with \(\Lambda(x_0) = 0\). If the model is a good fit, then \(\Delta_n \iid{\sim} \mathrm{Exp}(1)\). Now apply the further transformation \(z_n = 1 - e^{-\Delta_n}\), so that \(z_n \iid{\sim} \mathrm{Unif}([0, 1])\).

Now sort the \(z_n\)’s in increasing order into \((z_{(1)}, \ldots, z_{(N)})\), so \(z_{(1)}\) is the smallest value. Intuitively, the points \(\big(\tfrac{n-1/2}{N}, z_{(n)}\big)\) should lie along a 45\(^\circ\) line, and we can check for significant departures from that line with a simple visual test.

For a more rigorous test, note that the order statistics \(z_{(n)}\) are marginally beta distributed,

\[\begin{align*} z_{(n)} &\sim \mathrm{Beta}(n, N-n+1) \end{align*}\]

Its mean is \(\tfrac{n}{N+1}\) and its mode is \(\tfrac{n-1}{N-1}\). We can then use the 2.5% and 97.5% quantiles of this beta distribution to obtain pointwise confidence bands around the 45\(^\circ\) line.
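
A sketch of the test statistics and bands, assuming \(\Lambda\) accepts numpy arrays (the function name is our own):

```python
import numpy as np
from scipy.stats import beta

def time_rescaling_qq(xs, Lambda):
    """Q-Q points and pointwise 95% bands for the time-rescaling test."""
    taus = np.diff(np.concatenate([[0.0], Lambda(np.sort(np.asarray(xs)))]))
    z = np.sort(1.0 - np.exp(-taus))      # should be ~ Unif(0, 1) under the model
    N = len(z)
    n = np.arange(1, N + 1)
    expected = (n - 0.5) / N              # the 45-degree line
    lo = beta.ppf(0.025, n, N - n + 1)    # 2.5% quantile of z_(n) ~ Beta(n, N-n+1)
    hi = beta.ppf(0.975, n, N - n + 1)    # 97.5% quantile
    return expected, z, lo, hi
```

Plot `z` against `expected` and flag points that fall outside `[lo, hi]`.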

The Poisson Superposition Principle#

Property #5: The union (a.k.a. superposition) of independent Poisson processes is also a Poisson process.

Suppose we have two independent Poisson processes on the same domain \(\cX\),

\[\begin{align*} \{\mbx_n\}_{n=1}^N &\sim \mathrm{PP}(\lambda_1(\mbx)) \\ \{\mbx'_m\}_{m=1}^M &\sim \mathrm{PP}(\lambda_2(\mbx)) \end{align*}\]

Then

\[\begin{align*} \{\mbx_n\}_{n=1}^N \cup \{\mbx'_m\}_{m=1}^M &\sim \mathrm{PP}(\lambda_1(\mbx) + \lambda_2(\mbx)) \end{align*}\]

This is the analog of the fact that the sum of independent Poisson random variables is Poisson.

Poisson thinning#

The opposite of Poisson superposition is Poisson thinning.

Suppose we have points \(\{\mbx_n\}_{n=1}^N \sim \mathrm{PP}(\lambda(\mbx))\) where \(\lambda(\mbx) = \lambda_1(\mbx) + \lambda_2(\mbx)\).

Sample independent binary variables

\[\begin{align*} z_n \sim \mathrm{Bern}\left(\frac{\lambda_1(\mbx_n)}{\lambda_1(\mbx_n) + \lambda_2(\mbx_n)} \right). \end{align*}\]

Then \(\{\mbx_n: z_n = 1\} \sim \mathrm{PP}(\lambda_1(\mbx))\) and \(\{\mbx_n: z_n = 0\} \sim \mathrm{PP}(\lambda_2(\mbx))\).

Sampling a Poisson process by thinning#

Exercise

Use Poisson thinning to sample an inhomogeneous Poisson process with a bounded intensity, \(\lambda(\mbx) \leq \lambda_{\mathsf{max}}\).

Question

What Monte Carlo sampling method is this akin to?
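
One way to implement the thinning sampler from the exercise above, assuming a known bound \(\lambda_{\mathsf{max}}\) and an intensity function that accepts numpy arrays (names are our own):

```python
import numpy as np

def sample_pp_thinning(lam, lam_max, T, rng=None):
    """Sample an inhomogeneous Poisson process on [0, T] by thinning.

    Propose from a homogeneous process at rate lam_max, then keep each
    proposed point x with probability lam(x) / lam_max.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = rng.poisson(lam_max * T)
    xs = np.sort(rng.uniform(0.0, T, N))            # proposals at rate lam_max
    keep = rng.uniform(size=N) < lam(xs) / lam_max  # Bernoulli thinning
    return xs[keep]
```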

Beyond Poisson Processes#

What’s not to love about Poisson processes? Well, the independence assumptions that lead to such nice properties are also fundamentally limiting from a modeling perspective. What if the points are not independent?

Conditional intensity functions#

One way of introducing dependence is via an autoregressive model. Consider a point process on a time interval \([0, T]\).

Let \(\lambda(t \mid \cH_t)\) denote a conditional intensity function where \(\cH_t\) is the history of points before time \(t\).

Technically, the collection of histories \(\{\cH_t\}\) forms a filtration in the language of stochastic processes.

Allowing past points to influence the intensity function enables more complex, non-Poisson models.

Hawkes processes#

Hawkes processes [Haw71] are self-exciting point processes.

Their conditional intensity function is modeled as,

\[\begin{align*} \lambda(t \mid \cH_t) &= \lambda_0 + \sum_{t_n \in \cH_t} h(t - t_n), \end{align*}\]

where \(h: \reals_+ \mapsto \reals_+\) is a positive impulse response or influence function.

For example, the impulse responses could be modeled as exponential functions,

\[\begin{align*} h(\Delta t) &= \tfrac{w}{\tau} e^{-\tfrac{\Delta t}{\tau}} = w \cdot g(\Delta t), \end{align*}\]

where \(w \in \reals_+\) is a weight and \(g\) is a normalized probability density on \(\Delta t \in \reals_+\). For example, we could choose an exponential density,

\[\begin{align*} g(\Delta t) &= \mathrm{Exp}(\Delta t; \tau) \end{align*} \]

with time constant \(\tau \in \reals_+\) governing the rate of decay.

If \(g\) is a normalized density, then \(\int_0^\infty h(\Delta t) \dif \Delta t = w\).
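
For concreteness, a small sketch of evaluating this conditional intensity with the exponential impulse response (names are our own):

```python
import numpy as np

def hawkes_intensity(t, history, lam0, w, tau):
    """Evaluate lambda(t | H_t) for a Hawkes process with exponential impulses."""
    dt = t - np.asarray(history)
    dt = dt[dt > 0]                                   # only past events contribute
    return lam0 + (w / tau) * np.exp(-dt / tau).sum()
```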

Maximum likelihood estimation for Hawkes processes#

Suppose we observe a collection of time points \(\{t_n\}_{n=1}^N \subset [0, T]\) and want to estimate the parameters \(\mbtheta = (\lambda_0, w)\) of a Hawkes process with an exponential impulse response function. (Consider \(\tau\) to be fixed.)

The Hawkes process log likelihood is just like that of a Poisson process,

\[\begin{align*} \log p(\{t_n\}_{n=1}^N \mid \mbtheta) &= -\int_0^T \lambda_{\mbtheta}(t \mid \cH_t) \dif t + \sum_{n=1}^N \log \lambda_{\mbtheta}(t_n \mid \cH_{t_n}) \end{align*}\]

Substituting in the form of the conditional intensity, we can simplify the log likelihood to,

\[\begin{align*} \log p(\{t_n\}_{n=1}^N \mid \mbtheta) &= -\int_0^T \Big[\lambda_0 + w\sum_{t_n \in \cH_t} g(t-t_n) \Big] \dif t + \sum_{n=1}^N \log \Big(\lambda_0 + w \sum_{t_m \in \cH_{t_n}} g(t_n - t_m) \Big) \\ &\approx -\mbtheta^\top \mbphi_0 + \sum_{n=1}^N \log \Big(\mbtheta^\top \mbphi_n \Big) \end{align*}\]

where \(\mbphi_0 = (T, N)^\top\) and \(\mbphi_n = \Big(1, \, \sum_{t_m \in \cH_{t_n}} g(t_n - t_m) \Big)^\top\).

Question

What approximation did we make? How would you maximize the log likelihood as a function of \(\mbtheta\)?
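
A sketch of this approximate maximum likelihood procedure, using the feature representation above (the approximation \(\int_0^T g(t - t_n) \dif t \approx 1\) is baked into \(\mbphi_0\); names and the scipy-based optimization are our own choices):

```python
import numpy as np
from scipy.optimize import minimize

def hawkes_features(ts, T, tau):
    """Build phi_0 and the per-event features phi_n."""
    ts = np.sort(np.asarray(ts))
    g = lambda dt: np.exp(-dt / tau) / tau
    phi0 = np.array([T, float(len(ts))])  # assumes int_0^T g(t - t_n) dt ~= 1
    Phi = np.array([[1.0, g(t - ts[ts < t]).sum()] for t in ts])
    return phi0, Phi

def hawkes_nll(theta, phi0, Phi):
    """Negative log likelihood: theta^T phi_0 - sum_n log(theta^T phi_n)."""
    return theta @ phi0 - np.log(Phi @ theta).sum()

# Fit theta = (lambda_0, w) subject to nonnegativity:
# phi0, Phi = hawkes_features(ts, T, tau)
# res = minimize(hawkes_nll, x0=np.ones(2), args=(phi0, Phi),
#                bounds=[(1e-8, None)] * 2, method="L-BFGS-B")
```

Note that the negative log likelihood is convex in \(\mbtheta\), so this optimization is well-behaved.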

Nonlinear Hawkes processes#

Hawkes processes only allow for excitatory impulse responses (\(w \in \reals_+\)). What if we want to model self-inhibitory influences as well? A natural generalization of the Hawkes process is,

\[\begin{align*} \lambda(t \mid \cH_t) &= f(\beta_0 + \beta \sum_{t_n \in \cH_t} g(t - t_n)) \end{align*}\]

where \(f: \reals \mapsto \reals_+\) is a rectifying nonlinear function and \(\beta_0, \beta \in \reals\) can be positive or negative.

These models are a bit more challenging because exact computation of the Poisson process likelihood requires the integrated intensity function, which is generally intractable.

Instead, we can approximate the integral with a Riemann sum,

\[\begin{align*} \int \lambda(t \mid \cH_t) \dif t &\approx \sum_{i=0}^{\lfloor {T}/{\Delta}\rfloor - 1} \lambda(i \Delta \mid \cH_{i \Delta}) \Delta \end{align*}\]

which essentially treats the intensity as piecewise constant within each interval of length \(\Delta\).

In that approximation, the Poisson process likelihood is equivalent, up to a scale factor, to a product of Poisson likelihoods,

\[\begin{align*} p(\{t_n\}_{n=1}^N) &\approx \prod_{i=0}^{\lfloor {T}/{\Delta}\rfloor - 1} \mathrm{Po}(N_i \mid \lambda(i \Delta \mid \cH_{i\Delta}) \Delta) \\ &= \prod_{i=0}^{\lfloor {T}/{\Delta}\rfloor - 1} \mathrm{Po}\left(N_i \mid f\left(\beta_0 + \beta \sum_{t_n \in \cH_{i\Delta}} g(i\Delta - t_n) \right) \Delta \right) \end{align*}\]

where \(N_i\) is the number of events in the \(i\)-th interval, \([i\Delta, (i+1)\Delta)\).

This is a Poisson GLM with weights \(\mbbeta = (\beta_0, \beta)\)! We can fit it using all the techniques we developed earlier in the course.
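
A sketch of the discretization step, assuming an exponential kernel \(g\) and evaluating the feature at each bin's left edge, consistent with the piecewise-constant approximation (names are our own):

```python
import numpy as np

def bin_events(ts, T, delta, tau):
    """Bin counts N_i and the summed-kernel feature for each bin."""
    ts = np.sort(np.asarray(ts))
    edges = np.arange(0.0, T + delta, delta)
    counts, _ = np.histogram(ts, bins=edges)
    g = lambda dt: np.exp(-dt / tau) / tau
    feats = np.array([g(e - ts[ts < e]).sum() for e in edges[:-1]])
    return counts, feats  # regress counts on feats with a Poisson GLM, exposure = delta
```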

Marked point processes#

Now suppose we observe points from \(S\) different sources.

We can represent the points as a set of tuples, \(\{(t_n, s_n)\}_{n=1}^N\) where \(t_n \in [0,T]\) denotes the time and \(s_n \in \{1,\ldots,S\}\) denotes the source of the \(n\)-th point.

We will model them as a marked point process.

Like before, we have a (conditional) intensity function, but this time it is defined over both times and marks,

\[\begin{align*} \lambda(t, s \mid \cH_t): \; [0, T] \times \{1, \ldots, S\} \mapsto \reals_+ \end{align*}\]

When \(s\) takes on a discrete set of values, we often use the shorthand,

\[\begin{align*} \lambda_s(t \mid \cH_t) &\triangleq \lambda(t, s \mid \cH_t) \end{align*}\]

to denote the intensity for the \(s\)-th source.

Multivariate Hawkes processes#

A multivariate Hawkes process is a marked point process with mutually excitatory interactions.

It is defined by the conditional intensity functions,

\[\begin{align*} \lambda_s(t \mid \cH_t) &= \lambda_{s,0} + \!\!\!\!\sum_{(t_n, s_n) \in \cH_t}\!\!\!\! h_{s_n, s}(t - t_n), \end{align*}\]

where \(h_{s, s'}(\Delta t)\) is a directed impulse response from points on source \(s\) to the intensity of source \(s'\).

Again, it is common to model the impulse responses as weighted probability densities; e.g.,

\[\begin{align*} h_{s, s'}(\Delta t) &= w_{s,s'} \cdot \mathrm{Exp}(\Delta t; \tau_{s,s'}) \end{align*}\]

where \(w_{s,s'}\) is the weight.

Like before, the weights can be estimated using maximum likelihood estimation.

Discovering latent network structure in point process data#

We can think of the weights as defining a directed network,

\[\begin{align*} \mbW &= \begin{bmatrix} w_{1,1} & \ldots & w_{1,S} \\ \vdots & & \vdots \\ w_{S,1} & \ldots & w_{S,S} \end{bmatrix} \end{align*}\]

where \(w_{s,s'} \in \reals_+\) is the strength of influence that events (points) on source \(s\) induce on the intensity of source \(s'\).

However, we don’t directly observe the network; we only observe it indirectly through the point process.

Question

When is a multivariate Hawkes process stable, in that the intensity tends to a finite value in the infinite time limit?

Multivariate Hawkes processes as Poisson clustering processes#

Note that the conditional intensity above is the sum of a background intensity and a collection of non-negative impulse responses.

\[\begin{align*} \lambda_s(t \mid \cH_t) &= \lambda_{s,0} + \!\!\!\!\sum_{(t_n, s_n) \in \cH_t}\!\!\!\! h_{s_n, s}(t - t_n). \end{align*}\]

Question

Which property of Poisson processes applies to such intensities?

Using the Poisson superposition principle, we can partition the points \(\cT_s = \{t_n: s_n=s\}\) from source \(s\) into clusters attributed either to the background or to one of the impulse responses.

\[\begin{align*} \cT_s = \bigcup_{n=0}^N \cT_{s,n} \end{align*} \]

where

\[\begin{align*} \cT_{s,0} &\sim \mathrm{PP}(\lambda_{s,0}) & & \text{[background points]}\\ \cT_{s,n} &\sim \mathrm{PP}(h_{s_n,s}(t - t_n)) & & \text{[points induced by $(t_n, s_n)$]} \end{align*}\]

Now the weights have an intuitive interpretation. Plugging in the definition of the impulse response,

\[\begin{align*} \cT_{s,n} &\sim \mathrm{PP}\Big(w_{s_n,s} \cdot \mathrm{Exp}(t - t_n; \tau_{s_n,s}) \Big). \end{align*}\]

Question

What is the expected number of points induced by this impulse response, i.e. \(\mathbb{E}[|\cT_{s,n}|]\)?

Conjugate Bayesian inference for multivariate Hawkes processes#

Let’s put a gamma prior on the weights,

\[\begin{align*} w_{s,s'} &\sim \mathrm{Ga}(\alpha, \beta). \end{align*}\]

Question

Suppose we knew the partition of points, i.e., the clusters \(\cT_{s,n}\). What is the conditional distribution,

\[\begin{align*} p\left(w_{s,s'} \mid \{\{\cT_{s,n}\}_{n=0}^N\}_{s=1}^S \right) \end{align*}\]

We don’t know the partition of spikes in general, but we do know its conditional distribution!

Let \(z_n \in \{0, \ldots, n-1\}\) denote the cluster to which the \(n\)-th spike is assigned, with \(z_n=0\) denoting the background cluster. With this notation,

\[\begin{align*} \cT_{s,n} &= \{(t_{n'}, s_{n'}): s_{n'}=s \wedge z_{n'}=n\}. \end{align*}\]

Question

What is the conditional distribution of the cluster assignment,

\[\begin{align*} p(z_n \mid \{(t_n, s_n)\}_{n=1}^N; \mbtheta) \end{align*}\]

Using these two conditional distributions, we can derive a simple Gibbs sampling algorithm that alternates between sampling the weights given the clusters and the clusters given the weights.
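
A schematic single Gibbs sweep under simplifying assumptions (a shared exponential time constant \(\tau\), and boundary effects ignored so that each cluster size is \(\mathrm{Po}(w_{s_m, s_n})\); all names are our own):

```python
import numpy as np

def gibbs_sweep(ts, ss, z, lam0, W, tau, alpha, beta_, rng):
    """One Gibbs sweep: resample parent assignments z, then weights W."""
    ts, ss = np.asarray(ts), np.asarray(ss)
    N, S = len(ts), W.shape[0]
    g = lambda dt: np.exp(-dt / tau) / tau

    # 1. Resample each event's parent: z_n = 0 (background) or an earlier event
    for n in range(N):
        probs = np.empty(n + 1)
        probs[0] = lam0[ss[n]]                        # background term lambda_{s,0}
        for m in range(n):
            probs[m + 1] = W[ss[m], ss[n]] * g(ts[n] - ts[m])
        z[n] = rng.choice(n + 1, p=probs / probs.sum())

    # 2. Resample weights from their gamma full conditionals
    for s in range(S):
        n_parents = int(np.sum(ss == s))              # each event on s spawns a cluster
        for sp in range(S):
            cnt = sum(1 for n in range(N)             # events on sp with a parent on s
                      if ss[n] == sp and z[n] > 0 and ss[z[n] - 1] == s)
            W[s, sp] = rng.gamma(alpha + cnt, 1.0 / (beta_ + n_parents))
    return z, W
```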

Doubly stochastic processes#

Hawkes processes are only one way of going beyond Poisson processes.

Whereas Hawkes processes take an autoregressive approach, doubly stochastic point processes (a.k.a. Cox processes) take a latent variable approach.

In these models, the intensity itself is modeled as a stochastic process,

\[\begin{align*} \lambda(\mbx) &\sim p(\lambda). \end{align*}\]

For example, consider the model,

\[\begin{align*} \lambda(\mbx) &= g(f(\mbx)) \quad \text{where} \quad f \sim \mathrm{GP}(\mu(\cdot), \, K(\cdot, \cdot)). \end{align*}\]

When \(g\) is the exponential function, this is called a log Gaussian Cox process. When \(g\) is the sigmoid function, this is called a sigmoidal Gaussian Cox process [AMM09].

Alternatively, take \(\lambda\) to be a convolution of a Poisson process with a non-negative kernel; this is called a Neyman-Scott process [WDWL22].

Conclusion#

Poisson processes are stochastic processes that generate discrete sets of points.

They are defined by an intensity function \(\lambda(\mbx)\), which specifies the expected number of points in each interval of time or space.

We can build in dependencies by conditioning on past points or introducing latent variables.

Poisson process modeling boils down to inferring the intensity. We can take various parametric and nonparametric approaches.

The difficulty arises when the integral in the Poisson process likelihood is intractable.

As we will see next time, Poisson processes are also mathematical building blocks for Bayesian nonparametric models with random measures.