Denoising diffusion models (DDMs) are currently the state-of-the-art approach for image generation, but can they be used to generate discrete data like language and protein sequences? We derived the basic principles for DDMs with continuous-valued data in Denoising Diffusion Models. Here, we show how these concepts extend to discrete data, and how ideas like continuous-time diffusion and the score function in the reverse diffusion SDE carry over to the discrete setting.
where $p_{0 \mid t+1}(x_0 \mid x_{t+1}; \theta)$ is a learned, approximate denoising distribution. We then sample from the approximate reverse process one step at a time, from $t = T$ down to $t = 0$:
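The step-by-step sampling loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: `model` is a hypothetical denoiser that, given the current sequence and timestep, returns a categorical distribution over tokens for each position under the approximate reverse kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reverse(model, T, seq_len, vocab_size):
    """Ancestral sampling of a discrete DDM, one step at a time from T down to 0.

    `model(x, t)` is a hypothetical reverse kernel returning, for each
    position, a categorical distribution over next tokens
    (shape: seq_len x vocab_size).
    """
    # Start from the noise distribution; here we assume it is uniform
    # over the vocabulary.
    x = rng.integers(vocab_size, size=seq_len)
    for t in range(T, 0, -1):
        probs = model(x, t)  # approximate reverse transition probabilities
        # Sample each position from its categorical distribution.
        x = np.array([rng.choice(vocab_size, p=p) for p in probs])
    return x
```

In practice the model predicts the denoising distribution $p_{0 \mid t}$ and the reverse transition is assembled from it and the known noising process, but the outer loop has this shape.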
In the continuous-time limit, the noising and reverse processes become continuous-time Markov chains (CTMCs) — for a self-contained treatment see Continuous-Time Markov Chains. The reversal of a CTMC is another CTMC, and Campbell et al. (2022) showed how to parameterize the reverse process of a discrete-state, continuous-time DDM in terms of the backward rates.
where the density ratio $q_t(x_t = j) / q_t(x_t = i)$ is the discrete analog of the score function.
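For a finite state space with known marginals, the backward rates can be computed directly. The sketch below uses the standard CTMC reversal formula, $\hat{R}_t(i, j) = R_t(j, i)\, q_t(j) / q_t(i)$ for $i \neq j$; the forward rate matrix `R` and marginals `q` are inputs we assume are available in closed form.

```python
import numpy as np

def backward_rates(R, q):
    """Reverse-time rate matrix of a CTMC.

    Off-diagonal backward rate from i to j is
        R_hat[i, j] = R[j, i] * q[j] / q[i],
    i.e. the forward rate in the opposite direction weighted by the
    density ratio q_t(x_t = j) / q_t(x_t = i).
    """
    R_hat = R.T * (q[None, :] / q[:, None])
    # Diagonal chosen so each row sums to zero, as for any rate matrix.
    np.fill_diagonal(R_hat, 0.0)
    np.fill_diagonal(R_hat, -R_hat.sum(axis=1))
    return R_hat
```

As a sanity check: for a reversible chain run at stationarity, the backward rates coincide with the forward rates, since the density ratio then satisfies detailed balance.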
Sampling the backward process is tricky because the reverse rate is inhomogeneous, and Gillespie’s algorithm for inhomogeneous processes requires integrating rate matrices. Campbell et al. (2022) propose tau-leaping to approximately sample the backward process, followed by corrector steps to compensate for discretization error. Recent work shows how to develop more informative correctors for discrete diffusion with masking processes (Zhao et al., 2024).
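The idea behind tau-leaping can be sketched for a single discrete variable. This is a hedged simplification of the general scheme: we freeze the rates over a window of length $\tau$, draw Poisson jump counts to each destination, and resolve multiple proposed jumps with one simple convention (picking a destination in proportion to its count); Campbell et al. (2022) apply tau-leaping across dimensions of the full state.

```python
import numpy as np

rng = np.random.default_rng(0)

def tau_leap_step(x, rates, tau):
    """One tau-leaping step for a single discrete variable (a sketch).

    `rates[j]` is the current jump rate from state x to state j, with
    rates[x] = 0. Rates are held fixed over the window of length tau.
    """
    counts = rng.poisson(rates * tau)  # proposed jumps to each destination
    total = counts.sum()
    if total == 0:
        return x  # no jump fired in this window
    # With multiple proposed jumps, pick one destination in proportion
    # to its count (one simple convention; others are possible).
    return int(rng.choice(len(rates), p=counts / total))
```

Corrector steps then run additional dynamics at a fixed time to pull the samples back toward the correct marginal, compensating for the error introduced by freezing the rates.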
Discrete DDMs extend the denoising diffusion framework to discrete state spaces. The discrete-time formulation mirrors the continuous case: a noising Markov chain gradually corrupts data toward a noise distribution, and the model learns to approximate the reverse denoising distribution. The ELBO decomposes into per-step KL divergences, so the choice of noising process — in particular whether it admits closed-form marginals and interpolating distributions — is critical. Masking diffusion, where tokens are absorbed into a MASK state, is currently among the most effective approaches and has been successfully applied to language modeling and protein sequence design.
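The closed-form marginals that make masking diffusion convenient can be sketched directly. This is a minimal illustration under an assumed survival schedule: each token independently survives to time $t$ with probability $\alpha_t$ (decreasing from 1 to 0) and is otherwise absorbed into the MASK state, so no step-by-step simulation of the chain is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

MASK = -1  # hypothetical token id for the absorbing MASK state

def mask_noising(x0, alpha_t):
    """Closed-form marginal of a masking (absorbing-state) noising process.

    Each token survives with probability alpha_t and is otherwise absorbed
    into MASK. As alpha_t -> 0, the marginal converges to the all-MASK
    sequence, the noise distribution of masking diffusion.
    """
    keep = rng.random(x0.shape) < alpha_t
    return np.where(keep, x0, MASK)
```

Because the marginal at any $t$ is available in closed form, the per-step KL terms in the ELBO are tractable, which is part of why masking processes are a popular choice of noising process.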
In the continuous-time limit, the noising and reverse processes become CTMCs. The backward rate matrices — expressible as density ratios that are the discrete analog of the score function — characterize the reverse process, but sampling it exactly is intractable and requires approximate methods such as tau-leaping and corrector steps (Campbell et al., 2022). Recent work shows that the structure of masking diffusion enables more informed correctors that substantially improve sample quality (Zhao et al., 2024).
Campbell, A., Benton, J., De Bortoli, V., Rainforth, T., Deligiannidis, G., & Doucet, A. (2022). A Continuous Time Framework for Discrete Denoising Models. Advances in Neural Information Processing Systems, 35.
Zhao, Y., Shi, J., Mackey, L., & Linderman, S. (2024). Informed correctors for discrete diffusion models. arXiv preprint arXiv:2407.21243.
Rao, V., & Teh, Y. W. (2013). Fast MCMC sampling for Markov jump processes and extensions. The Journal of Machine Learning Research, 14(1), 3295–3320.