Document type:
Other scientific communication (conference without proceedings, poster, seminar...)
Title:
Preserving Endangered European Cultural Heritage and Languages Through Translated Literary Texts
Author(s):
Fraisse, Amel [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Zhang, Zheng [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Fisher Fishkin, Shelley [Author]
Stanford University
Jenn, Ronald [Author]
Centre d'Études en Civilisations, Langues et Lettres Étrangères - ULR 4074 [CECILLE]
Conference title:
First International Conference Language Technologies for All (LT4All): Enabling Language Diversity & Multilingualism Worldwide
City:
Paris
Country:
France
Conference start date:
2019-12-04
English keyword(s):
under-resourced languages
parallel corpus
translated literary text
HAL discipline(s):
Humanities and Social Sciences/Information and Communication Sciences
English abstract: [en]
We present the interdisciplinary ROSETTA project, which consists of collecting all the translations worldwide of one fictional text in order to build multilingual parallel corpora for a large number of under-resourced languages. Building such corpora is vital to help preserve and expand the diversity of languages and traditional knowledge. These corpora will be useful for handling under-resourced languages in a number of interconnected research fields such as computational linguistics, translation studies and corpus linguistics. Our project taps into a wealth of translated versions of a single fictional text spanning a period of over a century. It consists of collecting, digitizing, transcribing and aligning translations of this text. Our data collection process relies on volunteer work from the scientific and scholarly communities, the power of the crowd, and national libraries and archives. Our first experiment was conducted on the world-famous and well-traveled novel "Adventures of Huckleberry Finn" by the American author Mark Twain. This paper reports on the parallel corpus, now sentence-aligned, that pairs English with Basque.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Source:
Files
- https://hal.archives-ouvertes.fr/hal-03086002/document
- Open access
- Access the document
- poster-88.pdf
- Open access
- Access the document