References#
Alan Agresti. Categorical data analysis. Volume 792. John Wiley & Sons, 2002. URL: https://onlinelibrary.wiley.com/doi/book/10.1002/0471249688.
James H Albert and Siddhartha Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association, pages 669–679, 1993.
Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
David M Blei. Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application, 1:203–232, 2014.
Stephen P Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge university press, 2004.
Bradley Efron. Exponential families in theory and practice. Cambridge University Press, 2022.
Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1):1, 2010.
Andrew Gelman, John B Carlin, Hal S Stern, Aki Vehtari, and Donald B Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 1995.
Albert Gu and Tri Dao. Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
Diederik P Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.
Jason D Lee, Yuekai Sun, and Michael A Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.
Nicholas G Polson, James G Scott, and Jesse Windle. Bayesian inference for logistic models using Pólya–gamma latent variables. Journal of the American statistical Association, 108(504):1339–1349, 2013.
Jimmy T.H. Smith, Andrew Warrington, and Scott Linderman. Simplified state space layers for sequence modeling. In The Eleventh International Conference on Learning Representations. 2023.
Richard E Turner. An introduction to transformers. arXiv preprint arXiv:2304.10557, 2023.
Richard E Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew YK Foong, and Bruno Mlodozeniec. Denoising diffusion probabilistic models in six simple steps. arXiv preprint arXiv:2402.04384, 2024.
Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1–2):1–305, 2008.
Jesse Windle, Nicholas G Polson, and James G Scott. Sampling pólya-gamma random variates: alternate and approximate techniques. arXiv preprint arXiv:1405.0506, 2014.