World Scientific
  • Search
  •   
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

Lagrangian and Hamiltonian dynamics for probabilities on the statistical bundle

    https://doi.org/10.1142/S0219887822502140Cited by:7 (Source: Crossref)

    We provide an Information-Geometric formulation of accelerated natural gradient on the Riemannian manifold of probability distributions, which is an affine manifold endowed with a dually-flat connection. In a non-parametric formalism, we consider the full set of positive probability functions on a finite sample space, and we provide a specific expression for the tangent and cotangent spaces over the statistical manifold, in terms of a Hilbert bundle structure that we call the Statistical Bundle. In this setting, we compute velocities and accelerations of a one-dimensional statistical model using the canonical dual pair of parallel transports and define a coherent formalism for Lagrangian and Hamiltonian mechanics on the bundle. We show how our formalism provides a consistent framework for accelerated natural gradient dynamics on the probability simplex, paving the way for direct applications in optimization.

    AMSC: 53B12, 62B11, 65K10

    References

    • 1. R. Abraham and J. E. Marsden, Foundations of Mechanics, Advanced Book Program, 2nd edn. (Benjamin/Cummings Publishing Company, Reading, Massachusetts, 1978), MR 515141. Google Scholar
    • 2. P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds (Princeton University Press, 2008), With a foreword by Paul Van Dooren. CrossrefGoogle Scholar
    • 3. K. Ahn and S. Sra, From Nesterov’s estimate sequence to Riemannian acceleration, Proc. Thirty Third Conf. Learning Theory, Proceedings of Machine Learning Research, Vol. 125, eds. J. Abernethy and S. Agarwal (PMLR, 2020), pp. 84–118. Google Scholar
    • 4. J. Aitchison, The Statistical Analysis of Compositional Data, Monographs on Statistics and Applied Probability (Chapman & Hall, London, 1986), MR 865647. CrossrefGoogle Scholar
    • 5. F. Alimisis, A. Orvieto, G. Bécigneul and A. Lucchi, A continuous-time perspective for modeling acceleration in Riemannian optimization, Proc. Twenty Third Int. Conf. Artificial Intelligence and Statistics (PMLR, 2019), pp. 1297–1307. Google Scholar
    • 6. S.-i. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Vol. 28 (Springer-Verlag, 1985), MR 86m:62053. CrossrefGoogle Scholar
    • 7. S.-i. Amari, Dual connections on the Hilbert bundles of statistical models, Geometrization of Statistical Theory (ULDM, Lancaster, 1987), pp. 123–151. Google Scholar
    • 8. S.-i. Amari, Natural gradient works efficiently in learning, Neural Comput. 10(2) (1998) 251–276. Crossref, Web of ScienceGoogle Scholar
    • 9. S.-i. Amari and H. Nagaoka, Methods of Information Geometry, (American Mathematical Society, 2000), Translated from the 1993 Japanese original by D. Harada, MR 1 800 071. Google Scholar
    • 10. V. I. Arnold, Mathematical Methods of Classical Mechanics, Graduate Texts in Mathematics, Vol. 60 (Springer-Verlag, New York, 1989), Translated from the 1974 Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second edition (1989), MR 1345386. CrossrefGoogle Scholar
    • 11. H. Attouch, Z. Chbani, J. Peypouquet and P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Math. Program. 168(1) (2018) 123–175. Crossref, Web of ScienceGoogle Scholar
    • 12. N. Ay, J. Jost, H. V. Lê and L. Schwachhöfer, Information Geometry, Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics], Vol. 64 (Springer, Cham, 2017), MR 3701408. CrossrefGoogle Scholar
    • 13. H. Bateman, On dissipative systems and related variational principles, Phys. Rev. 38 (1931) 815–819. CrossrefGoogle Scholar
    • 14. M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo, preprint (2017), arXiv:1701.02434. Google Scholar
    • 15. L. Bottou, F. E. Curtis and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev. 60(2) (2018) 223–311. Crossref, Web of ScienceGoogle Scholar
    • 16. G. Chirco, Rényi Relative Entropy from Homogeneous Kullback-Leibler Divergence Lagrangian, Geometric Science of Information (Cham), eds. F. Nielsen and F. Barbaresco (Springer International Publishing, 2021), pp. 744–751. CrossrefGoogle Scholar
    • 17. G. Chirco, T. Josset and C. Rovelli, Statistical mechanics of reparametrization-invariant systems. It takes three to tango., Class. Quantum Grav. 33(4) (2016) 045005. Crossref, Web of ScienceGoogle Scholar
    • 18. F. M. Ciaglia, F. Di Cosmo, D. Felice, S. Mancini, G. Marmo and J. M. Pérez-Pardo, Hamilton-Jacobi approach to potential functions in information geometry, J. Math. Phys. 58(6) (2017) 063506. Crossref, Web of ScienceGoogle Scholar
    • 19. B. Efron and T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Institute of Mathematical Statistics (IMS) Monographs, Vol. 5 (Cambridge University Press, New York, 2016), MR 3523956. CrossrefGoogle Scholar
    • 20. D. Felice and N. Ay, Dynamical systems induced by canonical divergence in dually flat manifolds, preprint (2018), arXiv:1812.04461. Google Scholar
    • 21. G. França, M. I. Jordan and R. Vidal, On dissipative symplectic integration with applications to gradient-based optimization, J. Stat. Mech. 2021 (2021) 043402. Crossref, Web of ScienceGoogle Scholar
    • 22. G. França, J. Sulam, D. P. Robinson and R. Vidal, Conformal symplectic and relativistic optimization, preprint (2019), arXiv:1903.04100. Google Scholar
    • 23. P. Gibilisco and G. Pistone, Connections on non-parametric statistical manifolds by Orlicz space geometry, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1(2) (1998) 325–347, MR 1 628 177. Link, Web of ScienceGoogle Scholar
    • 24. D. M. Greenberger, A critique of the major approaches to damping in quantum theory, J. Math. Phys. 20(5) (1979) 762–770. Crossref, Web of ScienceGoogle Scholar
    • 25. L. Herrera, L. Núñez, A. Patiño and H. Rago, A variational principle and the classical and quantum mechanics of the damped harmonic oscillator, Am. J. Phys. 54 (1986) 273. Crossref, Web of ScienceGoogle Scholar
    • 26. R. E. Kass and P. W. Vos, Geometrical Foundations of Asymptotic Inference, Wiley Series in Probability and Statistics: Probability and Statistics (John Wiley & Sons, New York, 1997), MR 1461540 (99b:62032). CrossrefGoogle Scholar
    • 27. W. P. A. Klingenberg, Riemannian Geometry, De Gruyter Studies in Mathematics, Vol. 1, 2nd edn. (Walter de Gruyter & Company, Berlin, 1995), MR 1330918. CrossrefGoogle Scholar
    • 28. W. Krichene, A. Bayen and P. L. Bartlett, Accelerated Mirror Descent in Continuous and Discrete Time, Advances in Neural Information Processing Systems, Vol. 28, eds. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama and R. Garnett (Curran Associates, 2015), pp. 2845–2853. Google Scholar
    • 29. M. Kumon and S.-i. Amari, Differential geometry of testing hypothesis — a higher order asymptotic theory in multi-parameter curved exponential family, J. Fac. Eng. Univ. Tokyo Ser. B 39(3) (1988) 241–273, MR 1407894. Google Scholar
    • 30. S. Lang, Differential and Riemannian Manifolds, Graduate Texts in Mathematics, Vol. 160, 3rd edn. (Springer-Verlag, 1995), MR 96d:53001. CrossrefGoogle Scholar
    • 31. M. Leok and J. Zhang, Connecting information geometry and geometric mechanics, Entropy 19(10) (2017) 518. Crossref, Web of ScienceGoogle Scholar
    • 32. Y. Liu, F. Shang, J. Cheng, H. Cheng and L. Jiao, Accelerated first-order methods for geodesically convex optimization on Riemannian manifolds, Proc. 31st Int. Conf. Neural Information Processing Systems, NIPS’17 (Curran Associates, Red Hook, NY, 2017), p. 4875–4884. CrossrefGoogle Scholar
    • 33. L. Malagò and G. Pistone, Combinatorial optimization with information geometry: Newton method, Entropy 16 (2014) 4260–4289. Crossref, Web of ScienceGoogle Scholar
    • 34. M. Michałek, B. Sturmfels, C. Uhler and P. Zwiernik, Exponential varieties, Proc. Lond. Math. Soc. (3) 112(1) (2016) 27–56, MR 3458144. Crossref, Web of ScienceGoogle Scholar
    • 35. A. Semenovich, A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization (Wiley, Chichester, 1983). Google Scholar
    • 36. Y. Nesterov, A method for solving the convex programming problem with convergence rate o(1/k2), Proc. USSR Acad. Sci. 269 (1983) 543–547. Web of ScienceGoogle Scholar
    • 37. Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program. 103(1) (2005) 127–152. Crossref, Web of ScienceGoogle Scholar
    • 38. Y. Nesterov, Gradient methods for minimizing composite objective function, LIDAM Discussion Papers CORE 2007076 (Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2007). Google Scholar
    • 39. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. (Springer Publishing Company, 2014). Google Scholar
    • 40. G. Pistone, Examples of the application of nonparametric information geometry to statistical physics, Entropy 15(10) (2013) 4042–4065. Crossref, Web of ScienceGoogle Scholar
    • 41. G. Pistone, Nonparametric information geometry, Geometric Science of Information, Proc. First Int. Conf., GSI 2013, Lecture Notes in Computer Science, Vol. 8085, eds. F. Nielsen and F. Barbaresco (Springer, Heidelberg, 2013), pp. 5–36, MR 3126029. CrossrefGoogle Scholar
    • 42. G. Pistone, Lagrangian function on the finite state space statistical bundle, Entropy 20(2) (2018) 139. Crossref, Web of ScienceGoogle Scholar
    • 43. G. Pistone, Information geometry of the probability simplex: A short course, Nonlinear Phenom. Complex Syst. 23(2) (2020) 221–242. Crossref, Web of ScienceGoogle Scholar
    • 44. G. Pistone and C. Sempi, An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one, Ann. Statist. 23(5) (1995) 1543–1561, MR 97j:62006. Crossref, Web of ScienceGoogle Scholar
    • 45. A. Rényi, On measures of entropy and information, Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, Contributions to the Theory of Statistics, Vol. 1 (University of California Press, Berkeley, California, 1961), pp. 547–561. Google Scholar
    • 46. H. Shima, The Geometry of Hessian Structures (World Scientific Publishing Company, Hackensack, NJ, 2007), MR 2293045. LinkGoogle Scholar
    • 47. J.-M. Souriau, Structure des Systèmes Dynamiques (Dunod, 1970), Réimpression autorisées, 2e tirage (Jacques Gabay, 2012). Google Scholar
    • 48. W. Su, S. Boyd and E. J. Candès, A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights, J. Mach. Learn. Res. 17(153) (2016) 1–43. Google Scholar
    • 49. A. Taghvaei and P. Mehta, Accelerated flow for probability distributions, Proc. Machine Learning Research (Long Beach, California, USA), Vol. 97, eds. K. Chaudhuri and R. Salakhutdinov (PMLR, 2019), pp. 6076–6085. Google Scholar
    • 50. Y. Wang and W. Li, Accelerated information gradient flow, preprint (2019), arXiv:1909.02102. Google Scholar
    • 51. A. Wibisono, A. C. Wilson and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proc. Nat. Acad. Sci. 113(47) (2016) E7351–E7358. Crossref, Web of ScienceGoogle Scholar
    • 52. H. Zhang and S. Sra, Towards Riemannian accelerated gradient methods, preprint (2018), arXiv:1806.02812. Google Scholar
    Remember to check out the Most Cited Articles!

    Check out new Mathematical Physics books in our Mathematics 2021 catalogue
    Featuring authors Bang-Yen Chen, John Baez, Matilde Marcolli and more!