Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French

de Clercq, Orphée; de Sutter, Gert; Loock, Rudy; Cappelle, Bert; Plevoets, Koen

Document type:

Report and critical book review

Title:

Uncovering Machine Translationese Using Corpus Analysis Techniques to Distinguish between Original and Machine-Translated French

Author(s):

de Clercq, Orphée [Author]
de Sutter, Gert [Author]
Loock, Rudy [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Cappelle, Bert [Author]
Plevoets, Koen [Author]

Journal title:

Translation Quarterly

Pages:

21-45

Publisher:

The Hong Kong Translation Society

Publication date:

2021

ISSN:

1027-8559

English keywords:

Machine translation
Corpus-based translation studies
Translation
Translation quality
Corpus linguistics
Machine-translationese

HAL discipline(s):

Humanities and Social Sciences/Linguistics

English abstract:

This paper investigates the linguistic characteristics of English-to-French machine-translated texts in comparison with original, untranslated French texts in order to uncover what has been called “machine translationese”. In the same vein as corpus-based translation studies focusing on human-translated texts, and using a corpus-based statistical approach (Principal Component Analysis), we analyzed a corpus of ca. 1.8 million words of English-to-French translations of press texts, corresponding to the output of four machine translation systems: one statistical (SMT) system and three neural (NMT) systems, namely DeepL, Google Translate, and the European Commission’s eTranslation MT tool, the last of which was used in both its SMT and NMT versions. In particular, to complement a previous study on language-specific features in French (e.g., derived adverbs, existential constructions, the coordinator et, the preposition avec), we extracted a series of language-independent linguistic features for each text in our corpus, ranging from superficial text characteristics such as average word and sentence length to frequencies of closed-class lexical categories and measures of lexical diversity. Our results, which compare the machine-translated data with a corpus of untranslated French, allow us to uncover linguistic features in French machine-translated texts that clearly deviate from the norms observed in original French (e.g., average sentence length, n-gram features, lexical diversity) and that could inform the post-editing process in order to optimize translation quality.
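The abstract describes a pipeline in which language-independent features (surface measures such as average word and sentence length, closed-class frequencies, and lexical diversity) are extracted per text and then fed into a Principal Component Analysis. The Python sketch below illustrates that kind of pipeline under loudly stated assumptions: the tokenizer and sentence splitter are naive regular expressions, `FRENCH_FUNCTION_WORDS` is a hypothetical stand-in for a closed-class category, the plain type-token ratio stands in for the paper's lexical-diversity measures, and PCA is computed with NumPy's SVD on z-scored features. It is not the authors' feature set or code.

```python
import re
import numpy as np

# HYPOTHETICAL closed-class word list, for illustration only; the study's
# actual feature inventory is not reproduced here.
FRENCH_FUNCTION_WORDS = {"le", "la", "les", "de", "des", "et", "un", "une", "à", "en", "avec"}

def extract_features(text):
    """Return [avg word length, avg sentence length, type-token ratio,
    function-word rate] for one text (a simplified, assumed feature set)."""
    # Naive sentence split on runs of . ! ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Naive tokenizer: Unicode word characters, so accented letters are kept.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    n = len(tokens)
    avg_word_len = sum(len(t) for t in tokens) / n
    avg_sent_len = n / len(sentences)
    ttr = len(set(tokens)) / n  # raw type-token ratio (length-sensitive)
    func_rate = sum(t in FRENCH_FUNCTION_WORDS for t in tokens) / n
    return [avg_word_len, avg_sent_len, ttr, func_rate]

def pca(features, n_components=2):
    """Project texts onto the principal components of z-scored features.
    Returns (scores, explained-variance ratios of the kept components)."""
    X = np.asarray(features, dtype=float)
    # Standardize each feature so that differing units do not dominate.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features become all-zero columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes; S holds the singular values.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / (S**2).sum()
    return scores, explained[:n_components]

if __name__ == "__main__":
    texts = [
        "Le chat dort. Il fait beau et chaud aujourd'hui.",
        "La traduction automatique produit des phrases longues et régulières, avec une syntaxe proche de l'anglais.",
        "Les journaux publient des articles. Les lecteurs les lisent avec attention.",
    ]
    X = [extract_features(t) for t in texts]
    scores, explained = pca(X)
    print(scores.shape)  # (3, 2)
```

In a real comparison of original and machine-translated French, each row would be one text and the component scores would be inspected for separation between the two groups. Note that the raw type-token ratio falls as texts get longer, so a more careful sketch would control for text length or use a length-robust diversity measure.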

Language:

English

Popular science:

No

Collections:

Savoirs, Textes, Langage (STL) - UMR 8163

Source:

Harvested from HAL

Files

Uncovering Machine Translationese Using Corpus Analysis Techniques.pdf
Open access
