Quasi-gradients of discrete parameters

December 20, 2022 — May 17, 2024

calculus
classification
probabilistic algorithms
optimization
probability
statistics

Notes on estimating gradients through functions that appear non-differentiable because their arguments are discrete. TBC.

See also Pólya-Gamma.

1 Stochastic gradients via REINFORCE

The classic REINFORCE/score-function method for estimating gradients of expectations of functions of random variables works just as well when the random variables are discrete, since the identity

$$\nabla_\theta \,\mathbb{E}_{z\sim p_\theta}[f(z)] = \mathbb{E}_{z\sim p_\theta}\!\left[f(z)\,\nabla_\theta \log p_\theta(z)\right]$$

requires differentiating only the sampling distribution, never $f$ itself. The raw estimator is notoriously high-variance, though, and there are variance-reduction tricks specific to the discrete case; see (Grathwohl et al. 2018; Liu et al. 2019; Mnih and Gregor 2014; Tucker et al. 2017).
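A minimal sketch for a Bernoulli variable (the function names and toy objective here are my own illustration, not from any of the cited papers): with $z \sim \operatorname{Bernoulli}(\sigma(\theta))$ the score simplifies to $z - \sigma(\theta)$, and even a constant baseline cuts the variance substantially.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforce_grad(theta, f, n_samples=100_000):
    """Score-function estimate of d/dtheta E_{z ~ Bernoulli(sigmoid(theta))}[f(z)]."""
    p = sigmoid(theta)
    z = (rng.random(n_samples) < p).astype(float)
    fz = f(z)
    score = z - p          # d/dtheta log p(z) for a Bernoulli with logit theta
    baseline = fz.mean()   # simple baseline; introduces only O(1/n) bias
    return np.mean((fz - baseline) * score)

# Toy objective with a closed-form gradient to check against:
# E[f] = p(1-c)^2 + (1-p)c^2, so dE/dtheta = ((1-c)^2 - c^2) * p * (1-p).
c = 0.45
f = lambda z: (z - c) ** 2
theta = 0.3
p = sigmoid(theta)
print(reinforce_grad(theta, f))               # stochastic estimate
print(((1 - c) ** 2 - c ** 2) * p * (1 - p))  # exact value
```

The REBAR/RELAX line of work (Tucker et al. 2017; Grathwohl et al. 2018) replaces the constant baseline above with a learned control variate built from a continuous relaxation.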

2 Gumbel-(soft)max

See Gumbel-max.
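As a reminder of the mechanics (a sketch with my own naming, not any particular library's API): perturb the log-probabilities with Gumbel(0, 1) noise and take a temperature-$\tau$ softmax; as $\tau \to 0$ this approaches a one-hot categorical sample while remaining differentiable in the logits.

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax(logits, tau=0.5):
    """Differentiable relaxation of a categorical sample."""
    u = rng.random(logits.shape)
    g = -np.log(-np.log(u + 1e-12) + 1e-12)  # Gumbel(0, 1) noise, clamped for safety
    y = (logits + g) / tau
    y = np.exp(y - y.max())                  # numerically stable softmax
    return y / y.sum()

logits = np.log(np.array([0.2, 0.5, 0.3]))
print(gumbel_softmax(logits, tau=0.1))  # nearly one-hot
print(gumbel_softmax(logits, tau=5.0))  # much smoother
```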

3 Other methods

What even are (Grathwohl et al. 2021; Zhang, Liu, and Liu 2022)? As far as I can tell they are gradient-assisted samplers rather than gradient estimators: both exploit gradients of the unnormalised log-probability, evaluated by extending it to continuous arguments, to build locally informed MCMC proposals over discrete states (see the sketch below). Presumably the trick is most natural when the discrete variables embed nicely in a continuous space, e.g. quantised continuous or ordinal variables?
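For concreteness, here is my rough reading of the Gibbs-with-Gradients proposal of Grathwohl et al. (2021) for binary $x \in \{0,1\}^d$: a first-order Taylor expansion of the energy estimates the effect of flipping each bit, a bit is flipped with probability proportional to that estimate, and a Metropolis-Hastings step corrects the approximation. Treat this as a hedged sketch, not a faithful reimplementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def gwg_step(x, f, grad_f):
    """One (approximate) Gibbs-with-Gradients step targeting pi(x) proportional to exp(f(x))."""
    d = len(x)
    def flip_logits(x):
        # Taylor estimate of f(flip_i(x)) - f(x), halved as in locally informed proposals
        return (1 - 2 * x) * grad_f(x) / 2.0
    logits = flip_logits(x)
    q = np.exp(logits - logits.max()); q /= q.sum()
    i = rng.choice(d, p=q)                  # propose flipping bit i
    x_new = x.copy(); x_new[i] = 1 - x_new[i]
    logits_new = flip_logits(x_new)
    q_new = np.exp(logits_new - logits_new.max()); q_new /= q_new.sum()
    # Metropolis-Hastings correction for the asymmetric proposal
    log_accept = f(x_new) - f(x) + np.log(q_new[i]) - np.log(q[i])
    return x_new if np.log(rng.random()) < log_accept else x

# Toy quadratic (Ising-like) energy: f(x) = x^T W x / 2 + b^T x
d = 8
W = rng.normal(size=(d, d)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.normal(size=d)
f = lambda x: x @ W @ x / 2 + b @ x
grad_f = lambda x: W @ x + b                # analytic gradient of the energy
x = (rng.random(d) < 0.5).astype(float)
for _ in range(200):
    x = gwg_step(x, f, grad_f)
print(x)
```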

4 Examples

5 References

Arya, Schauer, Schäfer, et al. 2022. “Automatic Differentiation of Programs with Discrete Randomness.” In.
Grathwohl, Choi, Wu, et al. 2018. “Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation.” In Proceedings of ICLR.
Grathwohl, Swersky, Hashemi, et al. 2021. “Oops I Took A Gradient: Scalable Sampling for Discrete Distributions.”
Liu, Regier, Tripuraneni, et al. 2019. “Rao-Blackwellized Stochastic Gradients for Discrete Distributions.” In.
Mnih, and Gregor. 2014. “Neural Variational Inference and Learning in Belief Networks.” In Proceedings of The 31st International Conference on Machine Learning. ICML’14.
Prillo, and Eisenschlos. 2020. “SoftSort: A Continuous Relaxation for the Argsort Operator.”
Shi, Zhou, Hwang, et al. 2022. “Gradient Estimation with Discrete Stein Operators.” In Advances in Neural Information Processing Systems.
Tucker, Mnih, Maddison, et al. 2017. “REBAR: Low-Variance, Unbiased Gradient Estimates for Discrete Latent Variable Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Zhang, Liu, and Liu. 2022. “A Langevin-Like Sampler for Discrete Distributions.” In Proceedings of the 39th International Conference on Machine Learning.