Document type:
Journal article: Original article
DOI :
PMID :
Title:
Improved search heuristics find 20 000 new alignments between human and mouse genomes.
Author(s):
Frith, Martin [Corresponding author]
National Institute of Advanced Industrial Science and Technology [AIST]
Noé, Laurent [Author]
Bioinformatics and Sequence Analysis [BONSAI]
Laboratoire d'Informatique Fondamentale de Lille [LIFL]
Journal title:
Nucleic Acids Research
Pages:
e59
Publisher:
Oxford University Press
Publication date:
2014-02-03
ISSN :
0305-1048
HAL discipline(s):
Computer Science [cs]/Bioinformatics [q-bio.QM]
Life Sciences [q-bio]/Bioinformatics, Systems Biology [q-bio.QM]
English abstract: [en]
Sequence similarity search is a fundamental way of analyzing nucleotide sequences. Despite decades of research, this is not a solved problem because there exist many similarities that are not found by current methods. Search methods are typically based on a seed-and-extend approach, which has many variants (e.g. spaced seeds, transition seeds), and it remains unclear how to optimize this approach. This study designs and tests seeding methods for inter-mammal and inter-insect genome comparison. By considering substitution patterns of real genomes, we design sets of multiple complementary transition seeds, which have better performance (sensitivity per run time) than previous seeding strategies. Often the best seed patterns have more transition positions than those used previously. We also point out that recent computer memory sizes (e.g. 60 GB) make it feasible to use multiple (e.g. eight) seeds for whole mammal genomes. Interestingly, the most sensitive settings achieve diminishing returns for human-dog and melanogaster-pseudoobscura comparisons, but not for human-mouse, which suggests that we still miss many human-mouse alignments. Our optimized heuristics find ∼20 000 new human-mouse alignments that are missing from the standard UCSC alignments. We tabulate seed patterns and parameters that work well so they can be used in future research.
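As a rough illustration of the spaced and transition seeds mentioned in the abstract, the following Python sketch checks whether two sequence windows satisfy a seed pattern. It is not the authors' implementation: the pattern alphabet (`1` for a required match, `0` for a don't-care position, `T` for a match or a transition) and the function names `seed_hit` and `find_seed_hits` are assumptions made for this example, and the brute-force scan stands in for the index-based lookup a real aligner would use.

```python
# A transition swaps bases within one chemical class: A<->G or C<->T.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hit(pattern, a, b):
    """Return True if equal-length windows a and b satisfy the seed pattern.

    Assumed pattern symbols (hypothetical notation for this sketch):
      '1' -> bases must be identical
      'T' -> bases must be identical or differ by a transition
      '0' -> any pair of bases is accepted (spaced-seed gap)
    """
    for symbol, x, y in zip(pattern, a, b):
        if symbol == "1" and x != y:
            return False
        if symbol == "T" and x != y and (x, y) not in TRANSITIONS:
            return False
    return True


def find_seed_hits(pattern, query, target):
    """Brute-force scan: list (query_pos, target_pos) pairs where the seed hits."""
    span = len(pattern)
    hits = []
    for i in range(len(query) - span + 1):
        for j in range(len(target) - span + 1):
            if seed_hit(pattern, query[i:i + span], target[j:j + span]):
                hits.append((i, j))
    return hits
```

In a seed-and-extend search, each hit returned by such a scan would then be passed to an extension step that tries to grow it into a full alignment; using several complementary patterns amounts to running this check once per pattern and pooling the hits.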
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections :
Source :
Files
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985675/pdf
- Open access