Document type:
Report and critical book review
Title:
A machine learning based framework to identify and classify long terminal repeat retrotransposons
Author(s):
Schietgat, Leander [Author]
Catholic University of Leuven = Katholieke Universiteit Leuven [KU Leuven]
Vens, Celine [Author]
Catholic University of Leuven = Katholieke Universiteit Leuven [KU Leuven]
Ramon, Jan [Author]
Machine Learning in Information Networks [MAGNET]
Cerri, Ricardo [Author]
Federal University of São Carlos [UFSCar]
Fischer, Carlos [Author]
Universidade Estadual Paulista Júlio de Mesquita Filho = São Paulo State University [UNESP]
Costa, Eduardo [Author]
Catholic University of Leuven = Katholieke Universiteit Leuven [KU Leuven]
Universidade de São Paulo = University of São Paulo [USP]
Carareto, Claudia [Author]
Universidade Estadual Paulista Júlio de Mesquita Filho = São Paulo State University [UNESP]
Blockeel, Hendrik [Author]
Declarative Languages and Artificial Intelligence [DTAI]
Catholic University of Leuven = Katholieke Universiteit Leuven [KU Leuven]
Journal title:
PLoS Computational Biology
Pages:
1-21
Publisher:
PLOS
Publication date:
2018-04-23
ISSN:
1553-734X
HAL discipline(s):
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Bioinformatics [q-bio.QM]
Mathematics [math]/Statistics [math.ST]
Physics [physics]/Physics [physics]/Data Analysis, Statistics and Probability [physics.data-an]
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-LEARNER, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TEs characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: REPEATMASKER, CENSOR and LTRDIGEST. In contrast to these methods, TE-LEARNER is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-LEARNER's predictions which did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.
Language:
English
Popular science:
No
Collections:
Source:
Files
- https://hal.inria.fr/hal-01814669/document
- Open access
- Access the document
- http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006097&type=printable
- Open access
- Access the document