Document type:
Review and critical book review
Title:
Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French
Author(s):
de Clercq, Orphée [Author]
de Sutter, Gert [Author]
Loock, Rudy [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Cappelle, Bert [Author]
Plevoets, Koen [Author]
Journal title:
Translation Quarterly
Pages:
21-45
Publisher:
The Hong Kong Translation Society
Publication date:
2021
ISSN:
1027-8559
English keyword(s):
Machine translation
Corpus-based translation studies
Translation
Translation quality
Corpus linguistics
Machine-translationese
HAL discipline(s):
Humanities and Social Sciences/Linguistics
English abstract: [en]
This paper investigates the linguistic characteristics of English-to-French machine-translated texts in comparison with original, untranslated French texts in order to uncover what has been called “machine translationese”. In the same vein as corpus-based translation studies that have focused on human-translated texts, and using a corpus-based statistical approach (Principal Component Analysis), we analyzed a ca. 1.8-million-word corpus of English-to-French translations of press texts, corresponding to the output of four machine translation systems: one statistical (SMT) and three neural (NMT) systems, namely DeepL, Google Translate, and the European Commission’s eTranslation MT tool, in both its SMT and NMT versions. In particular, to complement a previous study on language-specific features in French (e.g. derived adverbs, existential constructions, coordinator et, preposition avec), a series of language-independent linguistic features was extracted for each text in our corpus, ranging from superficial text characteristics such as average word and sentence length to frequencies of closed-class lexical categories and measures of lexical diversity. Our results, which compare the machine-translated data with a corpus of untranslated French data, allow us to uncover linguistic features in French machine-translated texts that clearly deviate from the observed norms in original French (e.g. average sentence length, n-gram features, lexical diversity), and which might serve as information for the post-editing process in order to optimize translation quality.
Language:
English
Popular science:
No
Files
- document
- Open access
- Uncovering%20Machine%20Translationese%20Using%20Corpus%20Analysis%20Techniques.pdf
- Open access