Document type:
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings
Title:
Creation of a multilingual aligned corpus with Ukrainian as the target language and its exploitation
Author(s):
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Hamon, Thierry [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Université Paris 13 [UP13]
Title of the scientific event:
Computational Linguistics and Intelligent Systems
City:
Kharkiv
Country:
Ukraine
Start date of the scientific event:
2017-04-21
Publication date:
2017-04-21
Keyword(s) in English:
Parallel corpora
Ukrainian
Natural Language Processing
HAL discipline(s):
Computer Science [cs]
Computer Science [cs]/Computation and Language [cs.CL]
Abstract in English: [en]
The creation of linguistic resources (such as corpora, lexica or terminologies) occupies an important place in research areas related to linguistics, Natural Language Processing, Computer Science, psycholinguistics, etc. In this paper, we describe a multilingual corpus in which Ukrainian is the target language, while the source languages are Polish, French and English. The corpus contains literary texts and a small subset of texts from the medical domain. In total, the corpus is composed of 62 literary texts and 129 medical texts. It contains over 1 million words in the target Ukrainian language, and at least as many in the source languages taken together. This is a directional corpus aligned at the sentence level. After describing the corpus, we introduce some possible uses and first results. We then conclude and indicate some directions for future work. The corpus presented in this work is available for research purposes: http://natalia.grabar.free.fr/resources.php
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No