Document type :
Journal article: Original article
DOI :
10.1186/1471-2105-9-534
Title :
Optimal neighborhood indexing for protein similarity search
Author(s) :
Peterlongo, Pierre [Corresponding author]
Noé, Laurent [Author]
Sequential Learning [SEQUOIA]
Laboratoire d'Informatique Fondamentale de Lille [LIFL]
Lavenier, Dominique [Author]
Biological systems and models, bioinformatics and sequences [SYMBIOSE]
Nguyen, van Hoa [Author]
Biological systems and models, bioinformatics and sequences [SYMBIOSE]
Kucherov, Gregory [Author]
Sequential Learning [SEQUOIA]
Laboratoire d'Informatique Fondamentale de Lille [LIFL]
Giraud, Mathieu [Author]
Laboratoire d'Informatique Fondamentale de Lille [LIFL]
Journal title :
BMC Bioinformatics
Publisher :
BioMed Central
Publication date :
2008
ISSN :
1471-2105
HAL domain(s) :
Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Computer Science [cs]/Bioinformatics [q-bio.QM]
Life Sciences [q-bio]/Bioinformatics, Systems Biology [q-bio.QM]
English abstract : [en]
Similarity inference, one of the main bioinformatics tasks, must cope with the exponential growth of biological data. A classical approach to handling this data flow relies on heuristics with large seed indexes. To speed up this technique, the index can be enhanced by storing additional information that limits the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. The paper presents two main contributions. First, we show that an optimal neighborhood indexing, combining an alphabet reduction with a longer neighborhood, reduces the memory involved in the process by 35% without sacrificing the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrix and the associated E-value parameters. In contrast to usual matrices, these matrices are rectangular, since they compare amino acid groups from different alphabets. We describe the method used for computing those matrices and provide some typical examples that can be used in such comparisons. We propose a practical reduction of the index size of the neighborhood data that does not negatively affect the performance of large-scale search in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.
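As a rough illustration of why an alphabet reduction can pay for a longer neighborhood, the following minimal Python sketch packs a neighborhood into an integer and counts the bits it needs. The grouping in `GROUPS`, the function names, and the neighborhood lengths are assumptions chosen for the example; they are not the reduced alphabet, the index layout, or the parameters reported in the paper.

```python
# Hypothetical grouping of the 20 amino acids into 8 classes (illustration only;
# not the reduced alphabet used in the paper).
GROUPS = ["AGST", "C", "DENQ", "FWY", "H", "ILMV", "KR", "P"]

# Map each amino acid letter to the index of its group.
REDUCE = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest number of bits that can encode one symbol of the alphabet."""
    return (alphabet_size - 1).bit_length()


def neighborhood_bits(length: int, alphabet_size: int) -> int:
    """Bits needed to store a neighborhood with fixed-width symbol codes."""
    return length * bits_per_symbol(alphabet_size)


def encode_neighborhood(residues: str) -> int:
    """Pack a neighborhood into one integer using the reduced alphabet."""
    width = bits_per_symbol(len(GROUPS))
    code = 0
    for aa in residues:
        code = (code << width) | REDUCE[aa]
    return code


if __name__ == "__main__":
    # 4 residues over the full 20-letter alphabet vs. 6 residues over 8 groups.
    print(neighborhood_bits(4, 20), neighborhood_bits(6, 8))
    print(encode_neighborhood("KR"))
```

With fixed-width codes, a residue from the full 20-letter alphabet takes 5 bits while a symbol from an 8-group alphabet takes 3, so in this toy setting a 6-residue reduced neighborhood fits in fewer bits than a 4-residue full-alphabet one. Whether such a trade preserves search quality is exactly what the paper evaluates; the sketch only shows the storage arithmetic.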
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.inria.fr/inria-00340510/document (Open access)
  • https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/1471-2105-9-534 (Open access)
  • journal2.pdf (Open access)
Université de Lille
