Lagrangian and Hamiltonian dynamics for probabilities on the statistical bundle
Abstract
We provide an Information-Geometric formulation of accelerated natural gradient on the Riemannian manifold of probability distributions, which is an affine manifold endowed with a dually-flat connection. In a non-parametric formalism, we consider the full set of positive probability functions on a finite sample space, and we provide a specific expression for the tangent and cotangent spaces over the statistical manifold, in terms of a Hilbert bundle structure that we call the Statistical Bundle. In this setting, we compute velocities and accelerations of a one-dimensional statistical model using the canonical dual pair of parallel transports and define a coherent formalism for Lagrangian and Hamiltonian mechanics on the bundle. We show how our formalism provides a consistent framework for accelerated natural gradient dynamics on the probability simplex, paving the way for direct applications in optimization.
References
- 1. , Foundations of Mechanics,
Advanced Book Program , 2nd edn. (Benjamin/Cummings Publishing Company, Reading, Massachusetts, 1978), MR 515141. Google Scholar - 2. , Optimization Algorithms on Matrix Manifolds (Princeton University Press, 2008), With a foreword by Paul Van Dooren. Crossref, Google Scholar
- 3. , From Nesterov’s estimate sequence to Riemannian acceleration, Proc. Thirty Third Conf. Learning Theory,
Proceedings of Machine Learning Research , Vol. 125, eds. J. Abernethy and S. Agarwal (PMLR, 2020), pp. 84–118. Google Scholar - 4. , The Statistical Analysis of Compositional Data,
Monographs on Statistics and Applied Probability (Chapman & Hall, London, 1986), MR 865647. Crossref, Google Scholar - 5. , A continuous-time perspective for modeling acceleration in Riemannian optimization, Proc. Twenty Third Int. Conf. Artificial Intelligence and Statistics (PMLR, 2019), pp. 1297–1307. Google Scholar
- 6. , Differential-Geometrical Methods in Statistics,
Lecture Notes in Statistics , Vol. 28 (Springer-Verlag, 1985), MR 86m:62053. Crossref, Google Scholar - 7. ,
Dual connections on the Hilbert bundles of statistical models , Geometrization of Statistical Theory (ULDM, Lancaster, 1987), pp. 123–151. Google Scholar - 8. , Natural gradient works efficiently in learning, Neural Comput. 10(2) (1998) 251–276. Crossref, Web of Science, Google Scholar
- 9. , Methods of Information Geometry, (American Mathematical Society, 2000), Translated from the 1993 Japanese original by D. Harada, MR 1 800 071. Google Scholar
- 10. , Mathematical Methods of Classical Mechanics,
Graduate Texts in Mathematics , Vol. 60 (Springer-Verlag, New York, 1989), Translated from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second edition (1989), MR 1345386. Crossref, Google Scholar - 11. , Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Math. Program. 168(1) (2018) 123–175. Crossref, Web of Science, Google Scholar
- 12. , Information Geometry,
Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics] , Vol. 64 (Springer, Cham, 2017), MR 3701408. Crossref, Google Scholar - 13. , On dissipative systems and related variational principles, Phys. Rev. 38 (1931) 815–819. Crossref, Google Scholar
- 14. M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo, preprint (2017), arXiv:1701.02434. Google Scholar
- 15. , Optimization methods for large-scale machine learning, SIAM Rev. 60(2) (2018) 223–311. Crossref, Web of Science, Google Scholar
- 16. , Rényi Relative Entropy from Homogeneous Kullback-Leibler Divergence Lagrangian,
Geometric Science of Information (Cham) , eds. F. Nielsen and F. Barbaresco (Springer International Publishing, 2021), pp. 744–751. Crossref, Google Scholar - 17. , Statistical mechanics of reparametrization-invariant systems. It takes three to tango., Class. Quantum Grav. 33(4) (2016) 045005. Crossref, Web of Science, Google Scholar
- 18. , Hamilton-Jacobi approach to potential functions in information geometry, J. Math. Phys. 58(6) (2017) 063506. Crossref, Web of Science, Google Scholar
- 19. , Computer Age Statistical Inference: Algorithms, Evidence, and Data Science,
Institute of Mathematical Statistics (IMS) Monographs , Vol. 5 (Cambridge University Press, New York, 2016), MR 3523956. Crossref, Google Scholar - 20. D. Felice and N. Ay, Dynamical systems induced by canonical divergence in dually flat manifolds, preprint (2018), arXiv:1812.04461. Google Scholar
- 21. , On dissipative symplectic integration with applications to gradient-based optimization, J. Stat. Mech. 2021 (2021) 043402. Crossref, Web of Science, Google Scholar
- 22. G. França, J. Sulam, D. P. Robinson and R. Vidal, Conformal symplectic and relativistic optimization, preprint (2019), arXiv:1903.04100. Google Scholar
- 23. , Connections on non-parametric statistical manifolds by Orlicz space geometry, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1(2) (1998) 325–347, MR 1 628 177. Link, Web of Science, Google Scholar
- 24. , A critique of the major approaches to damping in quantum theory, J. Math. Phys. 20(5) (1979) 762–770. Crossref, Web of Science, Google Scholar
- 25. , A variational principle and the classical and quantum mechanics of the damped harmonic oscillator, Am. J. Phys. 54 (1986) 273. Crossref, Web of Science, Google Scholar
- 26. , Geometrical Foundations of Asymptotic Inference,
Wiley Series in Probability and Statistics: Probability and Statistics (John Wiley & Sons, New York, 1997), MR 1461540 (99b:62032). Crossref, Google Scholar - 27. , Riemannian Geometry,
De Gruyter Studies in Mathematics , Vol. 1, 2nd edn. (Walter de Gruyter & Company, Berlin, 1995), MR 1330918. Crossref, Google Scholar - 28. , Accelerated Mirror Descent in Continuous and Discrete Time,
Advances in Neural Information Processing Systems , Vol. 28, eds. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama and R. Garnett (Curran Associates, 2015), pp. 2845–2853. Google Scholar - 29. , Differential geometry of testing hypothesis — a higher order asymptotic theory in multi-parameter curved exponential family, J. Fac. Eng. Univ. Tokyo Ser. B 39(3) (1988) 241–273, MR 1407894. Google Scholar
- 30. , Differential and Riemannian Manifolds,
Graduate Texts in Mathematics , Vol. 160, 3rd edn. (Springer-Verlag, 1995), MR 96d:53001. Crossref, Google Scholar - 31. , Connecting information geometry and geometric mechanics, Entropy 19(10) (2017) 518. Crossref, Web of Science, Google Scholar
- 32. , Accelerated first-order methods for geodesically convex optimization on Riemannian manifolds, Proc. 31st Int. Conf. Neural Information Processing Systems, NIPS’17 (Curran Associates, Red Hook, NY, 2017), p. 4875–4884. Crossref, Google Scholar
- 33. , Combinatorial optimization with information geometry: Newton method, Entropy 16 (2014) 4260–4289. Crossref, Web of Science, Google Scholar
- 34. , Exponential varieties, Proc. Lond. Math. Soc. (3) 112(1) (2016) 27–56, MR 3458144. Crossref, Web of Science, Google Scholar
- 35. , Problem Complexity and Method Efficiency in Optimization (Wiley, Chichester, 1983). Google Scholar
- 36. , A method for solving the convex programming problem with convergence rate , Proc. USSR Acad. Sci. 269 (1983) 543–547. Web of Science, Google Scholar
- 37. , Smooth minimization of non-smooth functions, Math. Program. 103(1) (2005) 127–152. Crossref, Web of Science, Google Scholar
- 38. Y. Nesterov, Gradient methods for minimizing composite objective function, LIDAM Discussion Papers CORE 2007076 (Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2007). Google Scholar
- 39. , Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. (Springer Publishing Company, 2014). Google Scholar
- 40. , Examples of the application of nonparametric information geometry to statistical physics, Entropy 15(10) (2013) 4042–4065. Crossref, Web of Science, Google Scholar
- 41. , Nonparametric information geometry, Geometric Science of Information, Proc. First Int. Conf., GSI 2013,
Lecture Notes in Computer Science , Vol. 8085, eds. F. Nielsen and F. Barbaresco (Springer, Heidelberg, 2013), pp. 5–36, MR 3126029. Crossref, Google Scholar - 42. , Lagrangian function on the finite state space statistical bundle, Entropy 20(2) (2018) 139. Crossref, Web of Science, Google Scholar
- 43. , Information geometry of the probability simplex: A short course, Nonlinear Phenom. Complex Syst. 23(2) (2020) 221–242. Crossref, Web of Science, Google Scholar
- 44. , An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one, Ann. Statist. 23(5) (1995) 1543–1561, MR 97j:62006. Crossref, Web of Science, Google Scholar
- 45. , On measures of entropy and information, Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability,
Contributions to the Theory of Statistics , Vol. 1 (University of California Press, Berkeley, California, 1961), pp. 547–561. Google Scholar - 46. , The Geometry of Hessian Structures (World Scientific Publishing Company, Hackensack, NJ, 2007), MR 2293045. Link, Google Scholar
- 47. , Structure des Systèmes Dynamiques (Dunod, 1970), Réimpression autorisées, tirage (Jacques Gabay, 2012). Google Scholar
- 48. , A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights, J. Mach. Learn. Res. 17(153) (2016) 1–43. Google Scholar
- 49. , Accelerated flow for probability distributions, Proc. Machine Learning Research (Long Beach, California, USA), Vol. 97, eds. K. Chaudhuri and R. Salakhutdinov (PMLR, 2019), pp. 6076–6085. Google Scholar
- 50. Y. Wang and W. Li, Accelerated information gradient flow, preprint (2019), arXiv:1909.02102. Google Scholar
- 51. , A variational perspective on accelerated methods in optimization, Proc. Nat. Acad. Sci. 113(47) (2016) E7351–E7358. Crossref, Web of Science, Google Scholar
- 52. H. Zhang and S. Sra, Towards Riemannian accelerated gradient methods, preprint (2018), arXiv:1806.02812. Google Scholar
Remember to check out the Most Cited Articles! |
---|
Check out new Mathematical Physics books in our Mathematics 2021 catalogue |