Document type :
Conference paper with published proceedings
Title :
Building Multilingual Parallel Corpora for Under-Resourced Languages Using Translated Fictional Texts
Author(s) :
Fraisse, Amel [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Jenn, Ronald [Author]
Centre d'Études en Civilisations, Langues et Lettres Étrangères - ULR 4074 [CECILLE]
Fisher Fishkin, Shelley [Author]
Scientific editor(s) :
Claudia Soria
Laurent Besacier
Laurette Pretorius
Conference title :
The 3rd Workshop on Collaboration and Computing for Under-Resourced Languages: Sustaining Knowledge Diversity in the Digital Age (CCURL 2018)
City :
Miyazaki
Country :
Japan
Start date of the conference :
2018-05-12
Book title :
Proceedings of the LREC 2018 Workshop CCURL2018 – Sustaining Knowledge Diversity in the Digital Age
Publication date :
2018-05-12
English keyword(s) :
Transnational texts
Multilingual Bibliographic data
Preserving knowledge diversity
HAL domain(s) :
Humanities and Social Sciences/Information and Communication Sciences
Computer Science [cs]
Humanities and Social Sciences
English abstract : [en]
In this paper, we present an ongoing research project that consists of collecting all the translations worldwide of one fictional text in order to build multilingual parallel corpora for a large number of under-resourced languages. Building such corpora is vital to help preserve and expand language and traditional knowledge diversity. These corpora will be useful for handling under-resourced languages in a number of interconnected research fields such as computational linguistics, translation studies and corpus linguistics. Our project taps into a wealth of translated versions of a single fictional text spanning a period of over a century. It consists of collecting, digitizing, transcribing and aligning translations of this text. Our data collection process is fluid and collaborative. It relies on volunteer work from the scientific and scholarly communities, the power of the crowd, and national libraries and archives. Our first experiment was conducted on the world-famous and well-traveled American novel “Adventures of Huckleberry Finn” by Mark Twain. This paper reports on 10 parallel corpora, now aligned at the chapter level, that pair English with Arabic, Basque, Bengali, Bulgarian, Dutch, Hungarian, Polish, Russian, Turkish and Ukrainian. They were processed out of a total of 20 collected translations.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Source :