MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Document type:
Conference paper with proceedings
Title:
MMAR: Multilingual and multimodal anaphora resolution in instructional videos
Author(s):
Oguz, Cennet [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Denis, Pascal [Author]
Machine Learning in Information Networks [MAGNET]
Ostermann, Simon [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Skachkova, Natalia [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Vincent, Emmanuel [Author]
Speech Modeling for Facilitating Oral-Based Communication [MULTISPEECH]
van Genabith, Josef [Author]
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence [DFKI]
Conference title:
Findings of the 2024 Conference on Empirical Methods in Natural Language Processing
City:
Miami
Country:
United States of America
Conference start date:
2024-11-12
English keyword(s):
Multilingual, Multimodal, Parallel, Anaphoricity, Zero-Pronoun
HAL discipline(s):
Computer Science [cs]/Computation and Language [cs.CL]
English abstract: [en]
Multilingual anaphora resolution identifies referring expressions and implicit arguments in texts and links them to antecedents across several languages. In the most challenging setting, cross-lingual anaphora resolution, training and test data are in different languages. As knowledge needs to be transferred across languages, this task is challenging in both the multilingual and the cross-lingual setting. We hypothesize that one way to alleviate some of the difficulty of the task is to include multimodal information in the form of images (i.e., frames extracted from instructional videos). Such visual inputs are by nature language-agnostic; cross- and multilingual anaphora resolution should therefore benefit from visual information. In this paper, we provide the first multilingual and multimodal dataset annotated with anaphoric relations and present experimental results for end-to-end multimodal and multilingual anaphora resolution. Given gold mentions, multimodal features improve anaphora resolution results by ∼10% for unseen languages.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files:
- oguz_EMNLP24.pdf (open access)