Document type:
Conference paper with published proceedings
Title:
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Author(s):
Ailem, Melissa [Author]
USC Viterbi School of Engineering
Machine Learning in Information Networks [MAGNET]
Zhang, Bowen [Author]
USC Viterbi School of Engineering
Bellet, Aurelien [Author]
Machine Learning in Information Networks [MAGNET]
Denis, Pascal [Author]
Machine Learning in Information Networks [MAGNET]
Sha, Fei [Author]
USC Viterbi School of Engineering
Conference:
Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
City:
Brussels
Country:
Belgium
Conference start date:
2018
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Machine Learning [stat.ML]
English abstract:
Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g., features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains results that are competitive with or stronger than those of other state-of-the-art multimodal models.
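To make the coupling described in the abstract concrete, the sketch below illustrates one way per-word latent visual factors could tie a skip-gram negative-sampling objective to a generative model of CNN image features. It is a minimal PyTorch sketch of the general idea, not the authors' model: the class and parameter names (JointTextImageModel, couple, latent_dim, ...), the linear decoder, and the Gaussian (MSE) reconstruction term are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointTextImageModel(nn.Module):
    # Hypothetical sketch: NOT the published model. A shared per-word latent
    # visual factor enters both the textual embedding and an image-feature
    # generator, so text and vision are learned jointly.
    def __init__(self, vocab_size, embed_dim=100, visual_dim=4096, latent_dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_dim)   # target-word vectors
        self.ctx_emb = nn.Embedding(vocab_size, embed_dim)    # context-word vectors
        # Per-word latent visual factors, plus a linear "decoder" standing in
        # for the generative latent variable model over CNN image features.
        self.visual_factor = nn.Embedding(vocab_size, latent_dim)
        self.decoder = nn.Linear(latent_dim, visual_dim)
        # Coupling: the latent visual factors also feed the textual embeddings.
        self.couple = nn.Linear(latent_dim, embed_dim, bias=False)

    def word_vec(self, words):
        # Word representation shaped by both linguistic and visual evidence.
        return self.word_emb(words) + self.couple(self.visual_factor(words))

    def skipgram_loss(self, words, contexts, negatives):
        # Skip-gram with negative sampling over observed word-context pairs.
        v = self.word_vec(words)                               # (B, E)
        pos = (self.ctx_emb(contexts) * v).sum(-1)             # (B,)
        neg = torch.bmm(self.ctx_emb(negatives), v.unsqueeze(2)).squeeze(2)  # (B, K)
        return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

    def image_loss(self, words, image_feats):
        # Gaussian log-likelihood (up to constants) of CNN features given the
        # word's latent visual factor.
        return F.mse_loss(self.decoder(self.visual_factor(words)), image_feats)

    def forward(self, words, contexts, negatives, img_words, img_feats):
        # Joint objective: both terms share the latent visual factors.
        return (self.skipgram_loss(words, contexts, negatives)
                + self.image_loss(img_words, img_feats))

Under this reading, minimizing the joint loss pulls a word's embedding toward contexts it co-occurs with while its visual factor must also explain the image features of pictures depicting it, which is the "work in concert" behavior the abstract describes.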
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- https://hal.inria.fr/hal-01922985/document (open access)
- emnlp18.pdf (open access)