Instructor: Aaron Schein
TAs: Jimmy Lederman, Sean O'Hagan, Tannistha Mondal
Term: Spring 2026
The University of Chicago
- Time: Tuesday and Thursday, 3:30pm-4:50pm
- Place: Eckhart room 133
- TA office hours (starting week of March 30):
  - Jimmy: Thu 6-7pm (Jones 204B)
  - Sean: Fri 11am-12pm (Jones 226)
  - Tannistha: Tue 4-5pm (Harper Memorial 151)
- Assignment 1: Bayesian linear regression. Due Sunday April 5 at 11:59pm on GradeScope.
- Assignment 2: Hierarchical models and Gibbs sampling. Due Monday April 13 at 11:59pm on GradeScope.
- Assignment 3: Mixture models and EM. Due Monday April 20 at 11:59pm on GradeScope.
- Assignment 4: HMMs and the sum-product algorithm. Due Monday April 27 at 11:59pm on GradeScope.
- Assignment 5: Poisson matrix factorization and CAVI. Due Sunday May 10 at 11:59pm on GradeScope.
- Assignment 6: BBVI for non-conjugate models. Due Sunday May 17 at 11:59pm on GradeScope.
- Assignment 7: Neural networks and VAEs. Due Sunday May 24 at 11:59pm on GradeScope.
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Materials for L1-L2 from Matthew Stephens' STAT 348 (2021) on the two-class problem and decision theory
- Section 8.6 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on generative vs discriminative classifiers
- Section 3.5 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on Naive Bayes classifiers
- Wikipedia on "Additive smoothing" aka "Laplace smoothing"
- Section 3.3 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on the beta-binomial model
- Chapter VI "On Induction" of Bertrand Russell's Problems of Philosophy on "Bertrand's chicken"
- Chapter 2.2 "The meaning of probability" of David MacKay's Information Theory, Inference, and Learning Algorithms (2003), on frequentist versus subjectivist interpretations of probability
- "Probabilities as betting odds and the Dutch book" by Caves (2000)
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 9 "Linear Regression" of Deisenroth et al.'s Mathematics for Machine Learning which contains many derivations for quantities in Bayesian linear regression
- Jeffrey Miller's slides on Bayesian linear regression
- Scott Linderman's slides on Bayesian analysis of Gaussian models
- "Conjugate Bayesian analysis of the Gaussian distribution" by Murphy (2007)
- Chapter 28 "Model Comparison and Occam’s Razor" of David MacKay's Information Theory, Inference, and Learning Algorithms (2003)
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Materials for L4 from Matthew Stephens' STAT 348 (2021) on shrinkage, empirical Bayes, and the "Normal means" problem
- Scott Linderman's slides on Bayesian analysis of Gaussian models
- Chapter 5 "Hierarchical models" of Andrew Gelman et al.'s Bayesian Data Analysis
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Parts 1 and 2 of David Blei's Lecture materials on the basics of directed PGMs
- Chapters 11.2-11.3 of Bishop (2006) Pattern Recognition and Machine Learning on MCMC and Gibbs sampling
- Matthew Stephens' vignette on Gibbs sampling
- Scott Linderman's slides on MCMC
- "Getting it Right: Joint Distribution Tests of Posterior Simulators" by Geweke (2004) (the original Geweke testing paper)
- Roger Grosse's blogpost on Geweke testing
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapters 9.1-9.2 of Bishop (2006) Pattern Recognition and Machine Learning on mixture models
- Scott Linderman's slides on Bayesian mixture models
- David Blei's Lecture materials on Bayesian mixture models
- "Dealing with label switching in mixture models" by Stephens (2000)
- David Blei's lecture notes on conjugacy and exponential families
- Jeffrey Miller's slides on conjugate priors
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 9 of Bishop (2006) Pattern Recognition and Machine Learning on mixture models and EM
- Scott Linderman's slides on EM
- Section 6.2.1 (and related sections) of "Graphical models, exponential families, and variational inference" by Wainwright & Jordan (2008) on EM in exponential families
- "Homeomorphic-Invariance of EM..." by Kunstner et al. (2021) on the convergence properties of EM
-
Lecture materials:
Lecture 7 (April 14): Inference and learning in Hidden Markov models (HMMs)
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Scott Linderman's slides on HMMs
- Chapter 17 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on Markov and hidden Markov models
- Chapter 15 "The Navy Searches" of The Theory That Would Not Die on the search for the USS Scorpion
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- David Blei's Lecture materials on the basics of directed and undirected PGMs
- David Blei's Lecture materials on inference in PGMs
- Chapter 2 of Michael Jordan's Lecture materials on the basics of directed and undirected PGMs
- Chapter 3 of Michael Jordan's Lecture materials on the variable elimination algorithm
- Chapter 4 of Michael Jordan's Lecture materials on sum-product and belief propagation
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 1 of David MacKay's Information Theory, Inference, and Learning Algorithms (2005) on an intro to information theory
- Chapter 4 of David MacKay's Information Theory, Inference, and Learning Algorithms (2005) on the source coding theorem
- James Gleick's The Information (2011); a fantastically entertaining general-audience book on the history / context of information theory.
-
Lecture materials:
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 8 of David MacKay's Information Theory, Inference, and Learning Algorithms (2005) on mutual information
- Chapter 28.3 of David MacKay's Information Theory, Inference, and Learning Algorithms (2005) on the minimum description length
- Chapter 14.3 of John Duchi's lecture materials on exponential families as maximum entropy distributions
- David Blei's lecture notes on variational inference
-
Lecture materials:
- iPad notes (from last time)
- iPad notes
Lecture 11 (April 28): Coordinate ascent variational inference (CAVI) and latent Dirichlet allocation (LDA)
-
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- "Latent Dirichlet Allocation" by Blei, Ng, Jordan (2003) the original LDA paper
- "Inference of Population Structure Using Multilocus Genotype Data" by Pritchard, Stephens, Donnelly (2000) the other original LDA paper
- Chapters 10.1-10.4 of Bishop (2006) Pattern Recognition and Machine Learning on variational inference
- David Blei's lecture notes on variational inference
- "Variational inference: A review for statisticians" by Blei, Kucukelbir & McAuliffe (2017) an excellent review paper on VI
- Scott Linderman's slides on CAVI for LDA
- Jeffrey Miller's slides on CAVI for LDA
-
Lecture materials:
Lecture 12 (May 5): Poisson matrix factorization, data augmentation, stochastic variational inference (SVI)
-
Reading / resources:
- "Variational inference: A review for statisticians" by Blei, Kucukelbir & McAuliffe (2017) an excellent review paper on VI and SVI
- Slides from STAT 451 on CAVI and SVI
- "Scalable Recommendation with Poisson factorization" by Gopalan et al. (2014) Poisson MF for recommendation
- "Codebook-based Scalable Music Tagging with Poisson Matrix Factorization" by Liang, Paisley & Ellis (2014) great example of CAVI/SVI for Poisson MF
- "On the Connection Between Non-Negative Matrix Factorization and Latent Dirichlet Allocation" by Geiger (2024) on the connection
-
Lecture materials:
- iPad notes (from last time; updated)
- iPad notes
-
Reading / resources:
- Scott Linderman's slides on gradient-based VI
- "Black box variational inference" by Ranganath et al. (2013) introduced VI with score function gradients
- "Advances in Variational Inference" by Zhang et al. (2019) excellent survey of modern VI
-
Lecture materials:
-
Reading / resources:
- Scott Linderman's slides on PCA and connection to VAEs
- Scott Linderman's slides on VAEs and amortized VI
- "Auto-Encoding Variational Bayes" by Kingma & Welling (2014) one of the two original VAE papers
- "Stochastic Backpropagation and Approximate Inference in Deep Generative Models" by Rezende et al. (2014) one of the two original VAE papers
- "Advances in Variational Inference" by Zhang et al. (2019) excellent survey of modern VI
- "Amortized Variational Inference: When and Why?" by Margossian & Blei (2024) on the amortization gap
- "Backprop is not just the chain rule" by Tim Vieira famous blogpost on backprop
- "Lossless compression with latent variable models using bits-back coding" by Brian Keng blogpost explaining the "bits-back" argument
-
Lecture materials:
-
Reading / resources:
- "Understanding diffusion models: A unified perspective" by Luo (2022) excellent tutorial on diffusion math
- "Denoising Diffusion Probabilistic Models" by Ho et al. (2020) introduced notion of a forward noising process and learned reverse process
- "Score-Based Generative Modeling through Stochastic Differential Equations" by Song et al. (2021) introduced continuous-time limit using SDEs and connected diffusion to score-based generative modeling
- "Variational diffusion models" by Kingma et al. (2021) introduced variational inference interpretation of diffusion
- Scott Linderman's slides on SDEs and diffusion
-
Lecture materials:
-
Reading / resources:
- Scott Linderman's slides on Poisson point processes
- Scott Linderman's slides on Dirichlet process mixture models
- Peter Orbanz's lecture notes on Bayesian nonparametrics
-
Lecture materials:
-
Reading / resources:
- "Nonparametric learning from Bayesian models with randomized objective functions" by Lyddon, Walker, Holmes (2018)
- "Martingale posterior distributions" by Fong, Holmes, Walker (2023)
- "Exchangeability, Prediction and Predictive Modeling in Bayesian Statistics" by Fortini & Petrone (2025)
- Post-Bayes seminar series
-
Lecture materials: