Document type:
Journal article: Original article
DOI:
Title:
Combining an expert-based medical entity recognizer to a machine-learning system: methods and a case-study
Author(s):
Zweigenbaum, Pierre [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Lavergne, Thomas [Author]
Université Paris-Sud - Paris 11 [UP11]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Grabar, Natalia [Author]
Savoirs, Textes, Langage (STL) - UMR 8163 [STL]
Hamon, Thierry [Author]
Laboratoire d'Informatique Médicale et de BIOinformatique [LIM&BIO]
Rosset, Sophie [Author]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Grouin, Cyril [Author]
Laboratoire de Santé Publique et Informatique Médicale [SPIM]
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [LIMSI]
Journal title:
Biomedical Informatics Insights
Pagination:
BII.S11770
Publication date:
2013
Keyword(s) in English:
Natural Language Processing
Information Extraction
Medical records
Machine Learning
Hybrid Methods
Overfitting
HAL discipline(s):
Computer Science [cs]
Computer Science [cs]/Computation and Language [cs.CL]
Abstract in English: [en]
Medical entity recognition is currently generally performed by data-driven methods based on supervised machine learning. Expert-based systems, where linguistic and domain expertise are directly provided to the system, for instance in the form of lexicons and pattern-based rules, are often combined with data-driven systems. We present here a case study where an existing expert-based medical entity recognition system, Ogmios, is combined with a data-driven system, Caramba, based on a linear-chain Conditional Random Field (CRF) classifier. We examine different methods to combine two such systems and test the most relevant ones through experiments performed on the i2b2/VA 2012 challenge data. Our case study specifically highlights the risk of overfitting incurred by an expert-based system. We observe that it prevents the combination of the two systems from obtaining improvements in precision, recall, or F-measure, and analyse the underlying mechanisms through a post-hoc feature-level analysis. We also observe that wrapping the expert-based system alone as attributes input to a CRF classifier does boost its F-measure from 0.603 to 0.710 (strict matching of types and boundaries, as per the conlleval program), bringing it on par with the data-driven system. The generality of this method remains to be further investigated.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
ANR project:
Collections:
Source: