Instructor: Aaron Schein
TAs: Jimmy Lederman, Sean O'Hagan, Jinwen Yang
Term: Spring 2025
The University of Chicago
- Time: Tuesday and Thursday, 3:30pm-4:50pm
- Place: Eckhart room 133
- TA office hours (starting week of March 31):
  - Jimmy: Fri 9-10am (Jones 304)
  - Sean: Wed 10-11am (Jones 304)
  - Jinwen: Thu 11am-12pm (Jones 303)
- Assignment 1: Bayesian linear regression. Due Sunday April 6 at 11:59pm on GradeScope.
- Assignment 2: Hierarchical models and Gibbs sampling. Due Sunday April 13 at 11:59pm on GradeScope.
- Assignment 3: Mixture models and EM. Due Sunday April 20 at 11:59pm on GradeScope.
- Assignment 4: HMMs and the sum-product algorithm. Due Tuesday May 6 at 11:59pm on GradeScope.
- Assignment 5: Poisson matrix factorization and CAVI. Due Wednesday May 14 at 11:59pm on GradeScope.
- Assignment 6: Neural networks and variational autoencoders (VAEs). Due Wednesday May 21 at 11:59pm on GradeScope.
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Materials for L1-L2 from Matthew Stephens' STAT 348 (2021) on the two-class problem and decision theory
- Section 8.6 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on generative vs discriminative classifiers
- Section 3.5 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on Naive Bayes classifiers
- Wikipedia on "Additive smoothing" aka "Laplace smoothing"
- Section 3.3 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on the beta-binomial model
- Chapter VI "On Induction" of Bertrand Russell's The Problems of Philosophy on "Bertrand's chicken"
- Chapter 2.2 "The meaning of probability" of David MacKay's Information Theory, Inference, and Learning Algorithms (2003), on frequentist versus subjectivist interpretations of probability
Lecture materials:
- iPad notes (apologies for the handwriting)
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 9 "Linear Regression" of Deisenroth et al.'s Mathematics for Machine Learning, which contains many derivations for quantities in Bayesian linear regression
- Jeffrey Miller's slides on Bayesian linear regression
- Scott Linderman's slides on Bayesian analysis of Gaussian models
- "Conjugate Bayesian analysis of the Gaussian distribution" by Murphy (2007)
- Chapter 28 "Model Comparison and Occam’s Razor" of David MacKay's Information Theory, Inference, and Learning Algorithms (2003)
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Materials for L4 from Matthew Stephens' STAT 348 (2021) on shrinkage, empirical Bayes, and the "Normal means" problem
- Scott Linderman's slides on Bayesian analysis of Gaussian models
- Chapter 5 "Hierarchical models" of Andrew Gelman et al.'s Bayesian Data Analysis
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Parts 1 and 2 of David Blei's lecture materials on the basics of directed PGMs
- Chapters 11.2-11.3 of Bishop (2006) Pattern Recognition and Machine Learning on MCMC and Gibbs sampling
- Matthew Stephens' vignette on Gibbs sampling
- Scott Linderman's slides on MCMC
- "Getting it Right: Joint Distribution Tests of Posterior Simulators" by Geweke (2004) (the original Geweke testing paper)
- Roger Grosse's blogpost on Geweke testing
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapters 9.1-9.2 of Bishop (2006) Pattern Recognition and Machine Learning on mixture models
- Scott Linderman's slides on Bayesian mixture models
- David Blei's lecture materials on Bayesian mixture models
- "Dealing with label switching in mixture models" by Stephens (2000)
- David Blei's lecture notes on conjugacy and exponential families
- Jeffrey Miller's slides on conjugate priors
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 9 of Bishop (2006) Pattern Recognition and Machine Learning on mixture models and EM
- Scott Linderman's slides on EM
- Section 6.2.1 (and related sections) of "Graphical models, exponential families, and variational inference" by Wainwright & Jordan (2008) on EM in exponential families
- "Homeomorphic-Invariance of EM..." by Kunstner et al. (2021) on the convergence properties of EM
Lecture materials:
Lecture 7 (April 15): Inference and learning in Hidden Markov models (HMMs)
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Scott Linderman's slides on HMMs
- Chapter 17 of Kevin Murphy's Machine Learning: a Probabilistic Perspective (2012) on Markov and hidden Markov models
- Chapter 15 "The Navy Searches" of The Theory That Would Not Die on the search for the USS Scorpion
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- David Blei's lecture materials on the basics of directed and undirected PGMs
- David Blei's lecture materials on inference in PGMs
- Chapter 2 of Michael Jordan's lecture materials on the basics of directed and undirected PGMs
- Chapter 3 of Michael Jordan's lecture materials on the variable elimination algorithm
- Chapter 4 of Michael Jordan's lecture materials on sum-product and belief propagation
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 1 of David MacKay's Information Theory, Inference, and Learning Algorithms (2003) on an intro to information theory
- Chapter 4 of David MacKay's Information Theory, Inference, and Learning Algorithms (2003) on the source coding theorem
- James Gleick's The Information (2011), a fantastically entertaining general-audience book on the history / context of information theory
Lecture materials:
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- Chapter 8 of David MacKay's Information Theory, Inference, and Learning Algorithms (2003) on mutual information
- Chapter 28.3 of David MacKay's Information Theory, Inference, and Learning Algorithms (2003) on the minimum description length
- Chapter 14.3 of John Duchi's lecture materials on exponential families as maximum entropy distributions
- David Blei's lecture notes on variational inference
Lecture materials:
- iPad notes (from last time)
- iPad notes
Lecture 11 (May 1): Coordinate ascent variational inference (CAVI) and latent Dirichlet allocation (LDA)
Reading / resources (optional; for reference) roughly in the order they appeared in lecture:
- "Latent Dirichlet Allocation" by Blei, Ng, Jordan (2003) the original LDA paper
- "Inference of Population Structure Using Multilocus Genotype Data" by Pritchard, Stephens, Donnelly (2000) the other original LDA paper
- Chapters 10.1-10.4 of Bishop (2006) Pattern Recognition and Machine Learning on variational inference
- David Blei's lecture notes on variational inference
- "Variational inference: A review for statisticians" by Blei, Kucukelbir & McAuliffe (2017) an excellent review paper on VI
- Scott Linderman's slides on CAVI for LDA
- Jeffrey Miller's slides on CAVI for LDA
Lecture materials:
- iPad notes (updated)
Lecture 12 (May 6): Poisson matrix factorization, data augmentation, stochastic variational inference (SVI)
Reading / resources:
- "Variational inference: A review for statisticians" by Blei, Kucukelbir & McAuliffe (2017) an excellent review paper on VI and SVI
- Slides from STAT 451 on CAVI and SVI
- "Scalable Recommendation with Poisson factorization" by Gopalan et al. (2014) Poisson MF for recommendation
- "Codebook-based Scalable Music Tagging with Poisson Matrix Factorization" by Liang, Paisley & Ellis (2014) great example of CAVI/SVI for Poisson MF
- "On the Connection Between Non-Negative Matrix Factorization and Latent Dirichlet Allocation" by Geiger (2024) on the connection
Lecture materials:
- iPad notes (from last time; updated)
- iPad notes
Reading / resources:
- Chapters 1, 4, 9 of Ballard & Kolda's Tensor Decompositions for Data Science (2024) great intro / reference for tensor decomposition
- "The ALL0CORE Tensor Decomposition..." by Hood & Schein (2024) the central paper of the talk; see references therein
Lecture materials:
Reading / resources:
- Scott Linderman's slides on VAEs and amortized VI
- Scott Linderman's slides on PCA and connection to VAEs
- "Auto-Encoding Variational Bayes" by Kingma & Welling (2014) one of the two original VAE papers
- "Stochastic Backpropagation and Approximate Inference in Deep Generative Models" by Rezende et al. (2014) one of the two original VAE papers
- "Advances in Variational Inference" by Zhang et al. (2019) excellent survey of modern VI
- "Inference Suboptimality in VAEs" by Cremer et al. (2018) on the amortization gap
- "Amortized Variational Inference: When and Why?" by Margossian & Blei (2024) on the amortization gap
- "Backprop is not just the chain rule" by Tim Vieira famous blogpost on backprop
- "Lossless compression with latent variable models using bits-back coding" by Brian Keng blogpost explaining the "bits-back" argument
Lecture materials:
Reading / resources:
- Scott Linderman's slides on gradient-based VI
- "Black box variational inference" by Ranganath et al. (2013) introduced VI with score function gradients
- "Advances in Variational Inference" by Zhang et al. (2019) excellent survey of modern VI
Reading / resources:
- "Understanding diffusion models: A unified perspective" by Luo (2022) excellent tutorial on diffusion math
- "Denoising Diffusion Probabilistic Models" by Ho et al. (2020) introduced notion of a forward noising process and learned reverse process
- "Score-Based Generative Modeling through Stochastic Differential Equations" by Song et al. (2021) introduced continuous-time limit using SDEs and connected diffusion to score-based generative modeling
- "Variational diffusion models" by Kingma et al. (2021) introduced variational inference interpretation of diffusion
- Scott Linderman's slides on SDEs and diffusion
Lecture materials:
Reading / resources:
- "The illustrated transformer" by Alammar (2025) friendly overview of transformer architecture
- "Understanding and Coding the Self-Attention Mechanism..." by Raschka (2023) friendly walkthrough of self-attention
- "A mathematical framework for transformer circuits" by Elhage et al. (2021) excellent formal introduction to transformers
- "Whose Opinions Do Language Models Reflect?" by Santurkar et al. (2023) on the political biases of LLMs
- "Measurement in the age of LLMs..." by Hood & Schein (2023) on using LLMs to measure ideology
- "Linear representations of political perspective..." by Kim et al. (2025) on probing LLMs for ideology
- "Text-based ideal points" by Vafa et al. (2020) on ideal point modeling