Non-Vacuous Generalisation Bounds for Shallow Neural Networks
Document type :
Preprint or working paper
Title :
Non-Vacuous Generalisation Bounds for Shallow Neural Networks
Author(s) :
Biggs, Felix [Author]
Department of Computer Science [University College London] [UCL-CS]
The Inria London Programme [Inria-London]
Guedj, Benjamin [Author]
Department of Computer Science [University College London] [UCL-CS]
The Alan Turing Institute
The Inria London Programme [Inria-London]
MOdel for Data Analysis and Learning [MODAL]
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Machine Learning [stat.ML]
Statistics [stat]/Theory [stat.TH]
English abstract : [en]
We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function ("erf") activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
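As an illustration only (not part of the HAL record, and not the authors' code), the following minimal Python/NumPy sketch shows the network class the abstract describes: a single hidden layer acting on $L_2$-normalised inputs, with either an erf or a GELU activation and deterministic weights. All names, sizes and initialisation choices below are hypothetical.

# Minimal sketch (assumed, not the authors' implementation) of a shallow network
# with L2-normalised inputs and an erf or GELU activation, as in the abstract.
import numpy as np
from scipy.special import erf  # Gaussian error function

def l2_normalise(X):
    """Scale each input vector to unit L2 norm, as assumed in the paper's setting."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.clip(norms, 1e-12, None)

def shallow_erf_net(X, W1, b1, W2, b2):
    """One hidden layer with a sigmoid-shaped erf activation, then a linear output layer."""
    H = erf(X @ W1 + b1)
    return H @ W2 + b2

def shallow_gelu_net(X, W1, b1, W2, b2):
    """The same architecture with the exact (erf-based) GELU activation."""
    Z = X @ W1 + b1
    H = 0.5 * Z * (1.0 + erf(Z / np.sqrt(2.0)))
    return H @ W2 + b2

# Hypothetical usage on MNIST-sized inputs: 784 features, 100 hidden units, 10 classes.
rng = np.random.default_rng(0)
X = l2_normalise(rng.normal(size=(5, 784)))
W1, b1 = 0.05 * rng.normal(size=(784, 100)), np.zeros(100)
W2, b2 = 0.05 * rng.normal(size=(100, 10)), np.zeros(10)
print(shallow_erf_net(X, W1, b1, W2, b2).shape)   # (5, 10)
print(shallow_gelu_net(X, W1, b1, W2, b2).shape)  # (5, 10)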
Language :
English
Comment :
25 pages, 12 figures
Files
- document (Open access)
- 2202.01627.pdf (Open access)
- 2202.01627 (Open access)