References

[Agr02]

Alan Agresti. Categorical Data Analysis. Volume 792. John Wiley & Sons, 2002. URL: https://onlinelibrary.wiley.com/doi/book/10.1002/0471249688.

[AC93]

James H Albert and Siddhartha Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, pages 669–679, 1993.

[Ald81]

David J Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.

[Bis06]

Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[Ble14]

David M Blei. Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application, 1:203–232, 2014.

[BH19]

Joseph K Blitzstein and Jessica Hwang. Introduction to Probability. CRC Press, 2019.

[BRO18]

Benjamin Bloem-Reddy and Peter Orbanz. Random-walk models of network formation and sequential Monte Carlo methods for graphs. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(5):871–898, 2018.

[BV04]

Stephen P Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[CBDB+22]

Andrew Campbell, Joe Benton, Valentin De Bortoli, Thomas Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models. Advances in Neural Information Processing Systems, 35:28266–28279, 2022.

[Efr22]

Bradley Efron. Exponential Families in Theory and Practice. Cambridge University Press, 2022.

[FHT10]

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1, 2010.

[GCS+95]

Andrew Gelman, John B Carlin, Hal S Stern, Aki Vehtari, and Donald B Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 1995.

[GBC16]

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. URL: http://www.deeplearningbook.org.

[GD23]

Albert Gu and Tri Dao. Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.

[HJA20]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

[Hof07]

Peter Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. Advances in Neural Information Processing Systems, 2007.

[HRH02]

Peter D Hoff, Adrian E Raftery, and Mark S Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.

[Hoo79]

Douglas Hoover. Relations on probability spaces and arrays of random variables. Technical Report, Institute for Advanced Study, Princeton, NJ, 1979.

[KW19]

Diederik P Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.

[LSS14]

Jason D Lee, Yuekai Sun, and Michael A Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.

[MMA+23]

Ann C McKee, Jesse Mez, Bobak Abdolmohammadi, Morgane Butler, Bertrand Russell Huber, Madeline Uretsky, Katharine Babcock, Jonathan D Cherry, Victor E Alvarez, Brett Martin, et al. Neuropathologic and clinical findings in young contact sport athletes exposed to repetitive head impacts. JAMA Neurology, 80(10):1037–1050, 2023.

[OR13]

Peter Orbanz and Daniel M Roy. Bayesian models of graphs, arrays, and other exchangeable random structures. arXiv preprint, 2013.

[PSW13]

Nicholas G Polson, James G Scott, and Jesse Windle. Bayesian inference for logistic models using Pólya–gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.

[SWL23]

Jimmy T.H. Smith, Andrew Warrington, and Scott Linderman. Simplified state space layers for sequence modeling. In The Eleventh International Conference on Learning Representations. 2023.

[SDWMG15]

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.

[SE19]

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 2019.

[SSDK+20]

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

[Tie08]

Tijmen Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th International Conference on Machine Learning, pages 1064–1071, 2008.

[Tur23]

Richard E Turner. An introduction to transformers. arXiv preprint arXiv:2304.10557, 2023.

[TDM+24]

Richard E Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew YK Foong, and Bruno Mlodozeniec. Denoising diffusion probabilistic models in six simple steps. arXiv preprint arXiv:2402.04384, 2024.

[WPS14]

Jesse Windle, Nicholas G Polson, and James G Scott. Sampling Pólya–gamma random variates: alternate and approximate techniques. arXiv preprint arXiv:1405.0506, 2014.