Document type :
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings
Title :
Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings
Author(s) :
Pylieva, Hanna [Author]
Chernodub, Artem [Author]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Hamon, Thierry [Author]
Université Paris 13 [UP13]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Conference title :
1st International Workshop on Informatics & Data-Driven Medicine (IDDM 2018)
City :
Lviv
Country :
Ukraine
Start date of the conference :
2018-11-28
English keyword(s) :
text simplification
difficulty detection
word embeddings
HAL domain(s) :
Humanities and Social Sciences/Information and Communication Sciences
Computer Science [cs]/Artificial Intelligence [cs.AI]
English abstract : [en]
Detecting words that are difficult to understand is a crucial task for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. In this paper, we study the use of recently developed word embeddings, which contain context information for words, together with other linguistic and non-linguistic features, for improving the detection of difficult medical words. We propose new cross-validation scenarios to test the generalization ability of medical word difficulty detection from different perspectives, and we provide an experimental study of previously used feature extraction methods together with the recently proposed FastText embeddings. We found that, for known words and unknown users, FastText embeddings improve the detection of word understandability, reaching an F-score of 85.9 (an improvement of up to 2.9 F-score points).
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://halshs.archives-ouvertes.fr/halshs-01968357/document
- Open access
- pylieva-IDDM2018.pdf