For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
Abstract
In this paper, we study kernel ridgeless regression, including the case of interpolating solutions. We prove that maximizing the leave-one-out (LOO) stability minimizes the expected error. Further, we also prove that the minimum norm solution, to which gradient algorithms are known to converge, is the most stable solution. More precisely, we show that the minimum norm interpolating solution minimizes a bound on LOO stability, which in turn is controlled by the smallest singular value, and hence the condition number, of the empirical kernel matrix. These quantities can be characterized in the asymptotic regime where both the dimension (d) and the cardinality (n) of the data go to infinity (with n/d tending to a constant as n, d go to infinity). Our results suggest that stability of the learning algorithm with respect to perturbations of the training set may provide a more general framework than the classical theory of Empirical Risk Minimization (ERM). While ERM was developed for the classical regime, in which the architecture of the learning network is fixed and n goes to infinity, the modern regime focuses on interpolating regressors and overparameterized models, where both n and d go to infinity. Since the stability framework is known to be equivalent to the classical theory in the classical regime, our results suggest that it may be worth extending it beyond kernel regression to other overparameterized algorithms such as deep networks.
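To make the quantities in the abstract concrete, the following Python sketch computes the minimum-norm ridgeless kernel interpolant, the condition number of the empirical kernel matrix, and an empirical leave-one-out perturbation of the predictor. It is a minimal illustration under assumed choices (a Gaussian kernel with unit bandwidth, synthetic Gaussian data, and a simple LOO perturbation measure), not an implementation of the paper's bound.

```python
# Minimal numerical sketch (not code from the paper). It illustrates the objects the
# abstract refers to: the minimum-norm (ridgeless) kernel interpolant c = K^+ y, the
# condition number of the empirical kernel matrix K, and an empirical leave-one-out
# (LOO) perturbation of the predictor. The Gaussian kernel, its bandwidth, and the
# synthetic Gaussian data below are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X, Z, gamma=1.0):
    """K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def min_norm_interpolant(X, y, gamma=1.0):
    """Minimum-norm interpolating coefficients c = K^+ y (ridgeless limit)."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.pinv(K) @ y, K

# High-dimensional regime discussed in the abstract: n and d of comparable size.
n, d = 50, 30
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

c, K = min_norm_interpolant(X, y)
singular_values = np.linalg.svd(K, compute_uv=False)
print("condition number of K:", singular_values[0] / singular_values[-1])

# LOO stability proxy: drop point i, refit, and measure how much the prediction
# at x_i changes. (A qualitative check, not the paper's exact bound.)
loo_changes = []
for i in range(n):
    keep = np.arange(n) != i
    c_loo, _ = min_norm_interpolant(X[keep], y[keep])
    f_full = (gaussian_kernel(X[i:i + 1], X) @ c).item()           # full-sample prediction at x_i
    f_loo = (gaussian_kernel(X[i:i + 1], X[keep]) @ c_loo).item()  # prediction with x_i removed
    loo_changes.append(abs(f_full - f_loo))

print("max LOO change over the sample:", max(loo_changes))
print("1 / smallest singular value of K:", 1.0 / singular_values[-1])
```

In this sketch, shrinking the smallest singular value of K (for instance by nearly duplicating a training point) typically inflates both the condition number and the LOO perturbation, which is the qualitative relationship between stability and conditioning that the abstract describes.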