Document type :
Conference paper with proceedings
Title :
TransLiTex: A Parallel Corpus of Translated Literary Texts
Author(s) :
Fraisse, Amel [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Tran, Quoc-Tan [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Jenn, Ronald [Author]
Centre d'Études en Civilisations, Langues et Lettres Étrangères - ULR 4074 [CECILLE]
Paroubek, Patrick [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Fishkin, Shelley [Author]

Scientific editor(s) :
Erhong Yang
Le Sun
Conference title :
Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Conference organizer(s) :
Beijing Advanced Innovation Center for Language Resources
City :
Miyazaki
Country :
Japan
Start date of the conference :
2018-05-08
Journal title :
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Publisher :
European Language Resources Association (ELRA)
English keyword(s) :
Multilingual corpus
Comparable corpus
Transnational texts
Multilingual Bibliographic data
HAL domain(s) :
Humanities and Social Sciences
Computer Science [cs]
Humanities and Social Sciences/Information and communication sciences
English abstract : [en]
In this paper, we present our ongoing research work to create a massively parallel corpus of translated literary texts which is useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn published in 23 languages including less-resourced languages. We report on the current status of the corpus, with 5 chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish and English-Russian). We evaluated the correctness of chapter alignment by computing the percentage of common words between the English version and the translated ones. Results show high percentages that vary between 43% and 64%, proving the high correctness of chapter alignment.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.archives-ouvertes.fr/hal-01827884/document
- Open access
- 11_W34.pdf
- Open access