Document type:
Conference paper with proceedings
Title:
ROSETTA: Resources for Endangered languages through translated texts
Author(s):
Jenn, Ronald [Author]
Centre d'Études en Civilisations, Langues et Lettres Étrangères - ULR 4074 [CECILLE]
Fraisse, Amel [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Zhang, Zheng [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Fisher Fishkin, Shelley [Author]
Stanford University
Conference title:
The Center for Spatial and Textual Analysis (CESTA) Seminar Series, Stanford University, USA
Conference organizer(s):
The Center for Spatial and Textual Analysis (CESTA)
City:
Stanford
Country:
United States of America
Conference start date:
2019-04-09
Publication date:
2019
Keyword(s) (English):
Digital humanities
Transnational bibliographic data
multilingual bibliographic data
HAL discipline(s):
Humanities and Social Sciences/Information and Communication Sciences
Abstract (English):
Out of the world’s 6,000+ languages, only a small fraction currently enjoys the benefits of modern language technologies. The languages left behind are called endangered or technologically low-resourced (even though they may have millions of speakers). This collaborative, interdisciplinary digital humanities research project aims to help salvage those languages by combining computational linguistics, American literature, and translation studies. Much as the Rosetta Stone helped decipher the demotic and hieroglyphic scripts thanks to the presence of the Greek translation, our project intends to preserve contemporary endangered languages and assist in their survival through translation. It draws on the extant translations of a single work of fiction, Mark Twain’s Adventures of Huckleberry Finn, into a number of low-resourced languages, produced over a period of nearly a century and a half. The project relies on human participants for data collection, while natural language processing tools generate language resources (corpora, dictionaries, thesauri, lexicons) for those endangered languages.
Language:
English
Peer-reviewed:
No
Audience:
International
Popular science:
No
Source: