Document type:
Journal article: Original article
PMID:
Permanent URL:
Title:
Capsule networks as recurrent models of grouping and segmentation
Author(s):
Doerig, Adrien [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Schmittwilken, Lynn [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Technical University of Berlin / Technische Universität Berlin [TUB]
Sayim, Bilge [Author]
Laboratoire Sciences Cognitives et Sciences Affectives - UMR 9193 [SCALab]
Manassi, Mauro [Author]
University of Aberdeen
Herzog, Michael H [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Journal title:
PLoS Computational Biology
Journal short name:
PLoS Comput Biol
Number:
16
Pagination:
e1008017
Publication date:
2020-07-21
ISSN:
1553-7358
Keyword(s) in English:
Algorithms
Computational Biology
Computer Simulation
Female
Humans
Image Processing, Computer-Assisted
Male
Models, Biological
Neural Networks, Computer
Normal Distribution
Pattern Recognition, Visual
Reproducibility of Results
Vision, Ocular
HAL discipline(s):
Cognitive sciences
Abstract in English: [en]
Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Institution(s):
Université de Lille
CNRS
CHU Lille
Research team(s):
Équipe Action, Vision et Apprentissage (AVA)
Deposit date:
2020-12-30T22:22:55Z
2021-01-04T15:03:46Z
Files
- Doerig 2020 PlosCompBio.pdf
- Publisher's version
- Open access
- Access the document