MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Document type:
Conference paper with proceedings
Title:
MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Author(s):
Oguz, Cennet [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Denis, Pascal [Author]
Machine Learning in Information Networks [MAGNET]
Ostermann, Simon [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Skachkova, Natalia [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Vincent, Emmanuel [Author]
Speech Modeling for Facilitating Oral-Based Communication [MULTISPEECH]
van Genabith, Josef [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Conference title:
Findings of the 2024 Conference on Empirical Methods in Natural Language Processing
City:
Miami
Country:
United States of America
Conference start date:
2024-11-12
English keyword(s):
Multilingual, Multimodal, Parallel, Anaphoricity, Zero-Pronoun
HAL discipline(s):
Computer Science [cs]/Computation and Language [cs.CL]
English abstract: [en]
Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links them to antecedents across several languages. In the most challenging setting, cross-lingual anaphora resolution, training and test data are in different languages. As knowledge needs to be transferred across languages, this task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic; cross- and multilingual anaphora resolution should therefore benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ∼10% for unseen languages.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files:
- oguz_EMNLP24.pdf (open access)