Document type :
Article in a scientific journal
Title :
Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French
Author(s) :
de Clercq, Orphée [Author]
de Sutter, Gert [Author]
Loock, Rudy [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Cappelle, Bert [Author]
Plevoets, Koen [Author]
Journal title :
Translation Quarterly
Pages :
21-45
Publisher :
The Hong Kong Translation Society
Publication date :
2021
ISSN :
1027-8559
English keyword(s) :
Machine translation
Corpus-based translation studies
Translation
Translation quality
Corpus linguistics
Machine-translationese
HAL domain(s) :
Humanities and Social Sciences/Linguistics
English abstract : [en]
This paper investigates the linguistic characteristics of English to French machine-translated texts in comparison with French original, untranslated texts in order to uncover what has been called “machine translationese”. In the same vein as corpus-based translation studies which have focused on human-translated texts, and using a corpus-based statistical approach (Principal Component Analysis), we analyzed a ca. 1.8-million-word corpus of English to French translations of press texts, corresponding to the output of four machine translation systems: one statistical (SMT) and three neural (NMT) systems, namely DeepL, Google Translate, and the European Commission’s eTranslation MT tool, in both its SMT and NMT versions. In particular, to complement a previous study on language-specific features in French (e.g. derived adverbs, existential constructions, coordinator et, preposition avec), a series of language-independent linguistic features were extracted for each text in our corpus, ranging from superficial text characteristics such as average word and sentence length to frequencies of closed-class lexical categories and measures of lexical diversity. Our results, which compare the machine-translated data with a corpus of French untranslated data, allow us to uncover linguistic features in French machine-translated texts that clearly deviate from the observed norms in original French (e.g. average sentence length, n-gram features, lexical diversity), and which might serve as information for the post-editing process in order to optimize translation quality.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- Uncovering%20Machine%20Translationese%20Using%20Corpus%20Analysis%20Techniques.pdf (open access)