Document type :
Other scientific communication (conference without proceedings, poster, seminar...): Conference paper with proceedings
Title :
Identification of Parallel Sentences in Comparable Monolingual Corpora from Different Registers
Author(s) :
Cardon, Rémi [Author]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Conference title :
LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis
City :
Brussels
Country :
Belgium
Start date of the conference :
2018-10-31
HAL domain(s) :
Humanities and Social Sciences/Information and communication sciences
Humanities and Social Sciences
English abstract : [en]
Parallel aligned sentences provide useful information for different NLP applications. Yet this kind of data is seldom available, especially for languages other than English. We propose to exploit comparable corpora in French, which are distinguished by their registers (specialized and simplified versions), to detect and align parallel sentences. These corpora are related to the biomedical area. Our purpose is to determine whether a given pair of specialized and simplified sentences is to be aligned or not. Manually created reference data show 0.76 inter-annotator agreement. We exploit a set of features and several automatic classifiers. The automatic alignment reaches up to 0.93 Precision, Recall and F-measure. In order to better evaluate the method, it is applied to data in English from the SemEval STS competitions. The same features and models are applied in monolingual and cross-lingual contexts, in which they show up to 0.90 and 0.73 F-measure, respectively.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
ANR Project :
Communication, Literacy, Education, Accessibility, Readability
Collections :
  • Savoirs, Textes, Langage (STL) - UMR 8163
Source :
Harvested from HAL
Files
  • https://halshs.archives-ouvertes.fr/halshs-01968351/document
  • Open access
  • Access the document