Document type :
Journal article: Original article
PMID :
Permalink :
Title :
Capsule networks as recurrent models of grouping and segmentation
Author(s) :
Doerig, Adrien [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Schmittwilken, Lynn [Author]
Technical University of Berlin / Technische Universität Berlin [TU]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Sayim, Bilge [Author]
Laboratoire Sciences Cognitives et Sciences Affectives - UMR 9193 [SCALab]
Manassi, Mauro [Author]
University of Aberdeen
Herzog, Michael H [Author]
Ecole Polytechnique Fédérale de Lausanne [EPFL]
Journal title :
PLoS Computational Biology
Abbreviated title :
PLoS Comput Biol
Volume number :
16
Pages :
e1008017
Publication date :
2020-07-21
ISSN :
1553-7358
English keyword(s) :
Algorithms
Computational Biology
Computer Simulation
Female
Humans
Image Processing, Computer-Assisted
Male
Models, Biological
Neural Networks, Computer
Normal Distribution
Pattern Recognition, Visual
Reproducibility of Results
Vision, Ocular
HAL domain(s) :
Cognitive sciences
English abstract : [en]
Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Administrative institution(s) :
Université de Lille
CNRS
CHU Lille
Research team(s) :
Équipe Action, Vision et Apprentissage (AVA)
Submission date :
2020-12-30T22:22:55Z
2021-01-04T15:03:46Z
Files
- Doerig 2020 PlosCompBio.pdf
- Publisher's version
- Open access