Interpreting Neural Networks as Majority Votes through the PAC-Bayesian Theory
Document type :
Other scientific communication (conference without proceedings - poster - seminar...): Communication in a conference without proceedings
Permalink :
Title :
Interpreting Neural Networks as Majority Votes through the PAC-Bayesian Theory
Author(s) :
Viallard, Paul [Author]
Emonet, Rémi [Author]
Germain, Pascal [Author]
Habrard, Amaury [Author]
Morvant, Emilie [Author]
Conference title :
Workshop on Machine Learning with guarantees @ NeurIPS 2019
City :
Vancouver
Country :
Canada
Start date of the conference :
2019-12-14
Publication date :
2019
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We propose a PAC-Bayesian theoretical study of the two-phase learning procedure of a neural network introduced by Kawaguchi et al. (2017). In this procedure, a network is expressed as a weighted combination of all the paths of the network (from the input layer to the output layer), which we reformulate as a PAC-Bayesian majority vote. Starting from this observation, their learning procedure consists in (1) learning a "prior" network to fix some parameters, then (2) learning a "posterior" network by only allowing modification of the weights over the paths of the prior network. This allows us to derive a PAC-Bayesian generalization bound that involves the empirical individual risks of the paths (known as the Gibbs risk) and the empirical diversity between pairs of paths. Note that, similarly to classical PAC-Bayesian bounds, our result involves a KL-divergence term between the "prior" network and the "posterior" network. We show that this term is computable by dynamic programming without assuming any distribution on the network weights.
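As a rough illustration of the "weighted combination of paths" view mentioned in the abstract, here is a minimal NumPy sketch. It is not taken from the paper: the tiny two-layer linear network, the weight names, and the omission of activations and of the PAC-Bayesian vote itself are all assumptions made for illustration. It only checks that a purely linear network's output equals the sum, over all input-to-output paths, of the input coordinate multiplied by the product of the edge weights along that path.

import itertools
import numpy as np

# Tiny fully-connected linear network: input (2) -> hidden (3) -> output (1).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # hidden x input weights
W2 = rng.normal(size=(1, 3))   # output x hidden weights
x = rng.normal(size=2)

# Standard forward pass (no activations, for illustration only).
forward = W2 @ (W1 @ x)

# Path-wise expansion: each path is an (input unit i, hidden unit j) pair,
# and its contribution is x[i] times the product of its edge weights.
path_sum = sum(
    x[i] * W1[j, i] * W2[0, j]
    for i, j in itertools.product(range(2), range(3))
)

print(forward[0], path_sum)   # the two values coincide

In the paper's setting, each such path plays the role of a voter in a PAC-Bayesian majority vote; the sketch above only verifies the path decomposition of the output, not the vote or the bound itself.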
Language :
English
Audience :
International
Popular science :
No
Administrative institution(s) :
CNRS
Université de Lille
Submission date :
2020-06-08T14:10:29Z