MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Document type:
Conference paper with published proceedings
Title:
MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Author(s):
Oguz, Cennet [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Denis, Pascal [Author]
Machine Learning in Information Networks [MAGNET]
Ostermann, Simon [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Skachkova, Natalia [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Vincent, Emmanuel [Author]
Speech Modeling for Facilitating Oral-Based Communication [MULTISPEECH]
van Genabith, Josef [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Conference title:
Findings of the 2024 Conference on Empirical Methods in Natural Language Processing
City:
Miami
Country:
United States of America
Start date of the conference:
2024-11-12
English keyword(s):
Multilingual, Multimodal, Parallel, Anaphoricity, Zero-Pronoun
HAL domain(s):
Computer Science [cs]/Computation and Language [cs.CL]
English abstract: [en]
Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links them to antecedents across several languages. In the most challenging setting, cross-lingual anaphora resolution, training data and test data are in different languages. Because knowledge needs to be transferred across languages, the task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic, so cross- and multilingual anaphora resolution should benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ∼10% for unseen languages.
Language:
English
Peer reviewed article:
Yes
Audience:
International
Popular science:
No
Files:
- oguz_EMNLP24.pdf (Open access)