Document type:
Other scientific communication (conference without proceedings, poster, seminar...): Conference paper with proceedings
Title:
Automatic detection of parallel sentences from comparable biomedical texts
Author(s):
Cardon, Rémi [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Conference title:
CICLING 2019
City:
La Rochelle
Country:
France
Conference start date:
2019-04-07
HAL discipline(s):
Computer Science [cs]
Humanities and Social Sciences
English abstract:
Parallel sentences provide semantically similar information that can vary along a given dimension, such as language or register. Parallel sentences with register variation (such as those drawn from expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make information easier to access and understand. In the biomedical field, simplification may help patients understand medical and health texts. Yet, no such resources are currently available. We propose to exploit comparable corpora that are distinguished by their register (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and relate to the biomedical domain. Our goal is to determine whether a given pair of specialized and simplified sentences should be aligned. Manually created reference data show an inter-annotator agreement of 0.76. We treat this task as binary classification (alignment/non-alignment). We perform experiments on balanced and imbalanced data. The results on balanced data reach up to 0.96 F-measure. On imbalanced data, the results are lower but remain competitive when using classification models trained on balanced data. Moreover, among the three datasets exploited (semantic equivalence and inclusions), the detection of equivalence pairs is the most effective.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Files
- cardon-CICLING2019.pdf (open access)
- https://hal.archives-ouvertes.fr/hal-02430419/document