A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Document type :
Conference paper with published proceedings
Title :
A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images
Author(s) :
Ailem, Melissa [Auteur]
USC Viterbi School of Engineering
Machine Learning in Information Networks [MAGNET]
Zhang, Bowen [Auteur]
USC Viterbi School of Engineering
Bellet, Aurelien [Auteur]
Machine Learning in Information Networks [MAGNET]
Denis, Pascal [Auteur]
Machine Learning in Information Networks [MAGNET]
Sha, Fei [Auteur]
USC Viterbi School of Engineering
Conference title :
Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
City :
Brussels
Country :
Belgium
Start date of the conference :
2018
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g. features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains equally competitive or stronger results when compared to other state-of-the-art multimodal models.
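
The abstract describes an objective in which latent visual factors tie a skip-gram model of word-context co-occurrence to a generative latent variable model of image features. As a rough illustration only, not the authors' actual model or released code, the PyTorch sketch below shows one way such a coupling could look: a negative-sampling skip-gram loss plus a reconstruction term that stands in for the generative visual component. All class names, dimensions, and the MSE surrogate for the visual likelihood are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointTextVisionEmbedding(nn.Module):
    """Hypothetical sketch of coupling a skip-gram objective with a
    generative model of visual features via shared latent factors.
    Names and sizes are illustrative, not the paper's parameterization."""

    def __init__(self, vocab_size, embed_dim, visual_dim, n_factors):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_dim)  # target words
        self.ctx_emb = nn.Embedding(vocab_size, embed_dim)   # context words
        # Latent visual factors: project a word embedding to factors that
        # decode into the mean of a distribution over visual features.
        self.factors = nn.Linear(embed_dim, n_factors, bias=False)
        self.decoder = nn.Linear(n_factors, visual_dim, bias=False)

    def skipgram_loss(self, target, context, negatives):
        """Negative-sampling skip-gram loss over word-context pairs."""
        t = self.word_emb(target)                      # (B, D)
        pos = (t * self.ctx_emb(context)).sum(-1)      # (B,)
        neg = torch.bmm(self.ctx_emb(negatives),       # (B, K, D)
                        t.unsqueeze(-1)).squeeze(-1)   # (B, K)
        return -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()

    def visual_loss(self, target, visual_feats):
        """MSE reconstruction of visual (e.g. CNN) features from the
        word's latent factors; a stand-in for a Gaussian likelihood."""
        recon = self.decoder(self.factors(self.word_emb(target)))
        return F.mse_loss(recon, visual_feats)

    def forward(self, target, context, negatives, visual_feats, alpha=1.0):
        # Joint objective: linguistic co-occurrence plus visual evidence.
        return (self.skipgram_loss(target, context, negatives)
                + alpha * self.visual_loss(target, visual_feats))

# Minimal usage with random data, purely to show the expected shapes.
model = JointTextVisionEmbedding(vocab_size=10000, embed_dim=100,
                                 visual_dim=4096, n_factors=50)
target = torch.randint(0, 10000, (32,))
context = torch.randint(0, 10000, (32,))
negatives = torch.randint(0, 10000, (32, 5))
visual = torch.randn(32, 4096)
loss = model(target, context, negatives, visual)
loss.backward()
```

In the paper itself, the visual term is a proper generative latent variable model learned jointly with the skip-gram likelihood; the reconstruction loss above merely stands in for that component in this sketch.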
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
Source :
Files
- emnlp18.pdf: https://hal.inria.fr/hal-01922985/document
- Open access