Quasi-gradients of discrete parameters

December 20, 2022 — May 17, 2024

calculus
classification
probabilistic algorithms
optimization
probability
statistics

Notes on estimating gradients through functions that appear non-differentiable because their arguments are discrete. TBC.

See also Pólya-Gamma.

1 Stochastic gradients via REINFORCE

The classic REINFORCE/score-function method for estimating gradients of expectations of functions of random variables works just as well when the random variables are discrete, since the identity

$$\nabla_\theta \,\mathbb{E}_{z\sim p_\theta}[f(z)] = \mathbb{E}_{z\sim p_\theta}\!\left[f(z)\,\nabla_\theta \log p_\theta(z)\right]$$

requires differentiating only the sampling distribution, never $f$ itself. The raw estimator is notoriously high-variance, though, and there are variance-reduction tricks specific to the discrete case; see (Grathwohl et al. 2018; Liu et al. 2019; Mnih and Gregor 2014; Tucker et al. 2017).
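A minimal sketch for a Bernoulli variable (the function names and toy objective here are my own illustration, not from any of the cited papers): with $z \sim \operatorname{Bernoulli}(\sigma(\theta))$ the score simplifies to $z - \sigma(\theta)$, and even a constant baseline cuts the variance substantially.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reinforce_grad(theta, f, n_samples=100_000):
    """Score-function estimate of d/dtheta E_{z ~ Bernoulli(sigmoid(theta))}[f(z)]."""
    p = sigmoid(theta)
    z = (rng.random(n_samples) < p).astype(float)
    fz = f(z)
    score = z - p          # d/dtheta log p(z) for a Bernoulli with logit theta
    baseline = fz.mean()   # simple baseline; introduces only O(1/n) bias
    return np.mean((fz - baseline) * score)

# Toy objective with a closed-form gradient to check against:
# E[f] = p(1-c)^2 + (1-p)c^2, so dE/dtheta = ((1-c)^2 - c^2) * p * (1-p).
c = 0.45
f = lambda z: (z - c) ** 2
theta = 0.3
p = sigmoid(theta)
print(reinforce_grad(theta, f))               # stochastic estimate
print(((1 - c) ** 2 - c ** 2) * p * (1 - p))  # exact value
```

The REBAR/RELAX line of work (Tucker et al. 2017; Grathwohl et al. 2018) replaces the constant baseline above with a learned control variate built from a continuous relaxation.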

2 Gumbel-(soft)max

See Gumbel-max.
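As a reminder of the mechanics (a sketch with my own naming, not any particular library's API): perturb the log-probabilities with Gumbel(0, 1) noise and take a temperature-$\tau$ softmax; as $\tau \to 0$ this approaches a one-hot categorical sample while remaining differentiable in the logits.

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax(logits, tau=0.5):
    """Differentiable relaxation of a categorical sample."""
    u = rng.random(logits.shape)
    g = -np.log(-np.log(u + 1e-12) + 1e-12)  # Gumbel(0, 1) noise, clamped for safety
    y = (logits + g) / tau
    y = np.exp(y - y.max())                  # numerically stable softmax
    return y / y.sum()

logits = np.log(np.array([0.2, 0.5, 0.3]))
print(gumbel_softmax(logits, tau=0.1))  # nearly one-hot
print(gumbel_softmax(logits, tau=5.0))  # much smoother
```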

3 Other methods

What even are (Grathwohl et al. 2021; Zhang, Liu, and Liu 2022)? As far as I can tell they are gradient-assisted samplers rather than gradient estimators: both exploit gradients of the unnormalised log-probability, evaluated by extending it to continuous arguments, to build locally informed MCMC proposals over discrete states (see the sketch below). Presumably the trick is most natural when the discrete variables embed nicely in a continuous space, e.g. quantised continuous or ordinal variables?
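For concreteness, here is my rough reading of the Gibbs-with-Gradients proposal of Grathwohl et al. (2021) for binary $x \in \{0,1\}^d$: a first-order Taylor expansion of the energy estimates the effect of flipping each bit, a bit is flipped with probability proportional to that estimate, and a Metropolis-Hastings step corrects the approximation. Treat this as a hedged sketch, not a faithful reimplementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def gwg_step(x, f, grad_f):
    """One (approximate) Gibbs-with-Gradients step targeting pi(x) proportional to exp(f(x))."""
    d = len(x)
    def flip_logits(x):
        # Taylor estimate of f(flip_i(x)) - f(x), halved as in locally informed proposals
        return (1 - 2 * x) * grad_f(x) / 2.0
    logits = flip_logits(x)
    q = np.exp(logits - logits.max()); q /= q.sum()
    i = rng.choice(d, p=q)                  # propose flipping bit i
    x_new = x.copy(); x_new[i] = 1 - x_new[i]
    logits_new = flip_logits(x_new)
    q_new = np.exp(logits_new - logits_new.max()); q_new /= q_new.sum()
    # Metropolis-Hastings correction for the asymmetric proposal
    log_accept = f(x_new) - f(x) + np.log(q_new[i]) - np.log(q[i])
    return x_new if np.log(rng.random()) < log_accept else x

# Toy quadratic (Ising-like) energy: f(x) = x^T W x / 2 + b^T x
d = 8
W = rng.normal(size=(d, d)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.normal(size=d)
f = lambda x: x @ W @ x / 2 + b @ x
grad_f = lambda x: W @ x + b                # analytic gradient of the energy
x = (rng.random(d) < 0.5).astype(float)
for _ in range(200):
    x = gwg_step(x, f, grad_f)
print(x)
```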

4 Examples

5 References

Arya, Schauer, Schäfer, et al. 2022. “Automatic Differentiation of Programs with Discrete Randomness.” In.
Grathwohl, Choi, Wu, et al. 2018. “Backpropagation Through the Void: Optimizing Control Variates for Black-Box Gradient Estimation.” In Proceedings of ICLR.
Grathwohl, Swersky, Hashemi, et al. 2021. “Oops I Took A Gradient: Scalable Sampling for Discrete Distributions.”
Liu, Regier, Tripuraneni, et al. 2019. “Rao-Blackwellized Stochastic Gradients for Discrete Distributions.” In.
Mnih, and Gregor. 2014. “Neural Variational Inference and Learning in Belief Networks.” In Proceedings of The 31st International Conference on Machine Learning. ICML’14.
Prillo, and Eisenschlos. 2020. “SoftSort: A Continuous Relaxation for the Argsort Operator.”
Shi, Zhou, Hwang, et al. 2022. “Gradient Estimation with Discrete Stein Operators.” In Advances in Neural Information Processing Systems.
Tucker, Mnih, Maddison, et al. 2017. “REBAR: Low-Variance, Unbiased Gradient Estimates for Discrete Latent Variable Models.” In Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17.
Zhang, Liu, and Liu. 2022. “A Langevin-Like Sampler for Discrete Distributions.” In Proceedings of the 39th International Conference on Machine Learning.