Document type:
Journal article: Original article
DOI:
PMID:
Permanent URL:
Title:
Stereoisomers Are Not Machine Learning's Best Friends.
Author(s):
Tahıl, G. [Author]
Centre de Recherche en Informatique de Lens [CRIL]
Unité de Catalyse et Chimie du Solide - UMR 8181 [UCCS]
Delorme, F. [Author]
Centre de Recherche en Informatique de Lens [CRIL]
Le Berre, D. [Author]
Centre de Recherche en Informatique de Lens [CRIL]
Monflier, Eric [Author]
Unité de Catalyse et Chimie du Solide - UMR 8181 [UCCS]
Sayede, Adlane [Author]
Unité de Catalyse et Chimie du Solide - UMR 8181 [UCCS]
Tilloy, Sebastien [Author]
Unité de Catalyse et Chimie du Solide - UMR 8181 [UCCS]
Journal title:
J Chem Inf Model
Journal short name:
J Chem Inf Model
Number:
64
Pages:
5451–5469
Publication date:
2024-07-05
ISSN:
1549-960X
English keyword(s):
Algorithms
Molecular modeling
Molecular structure
Molecules
Stereochemistry
HAL discipline(s):
Chemistry/Catalysis
English abstract:
This study addresses the challenge of accurately identifying stereoisomers in cheminformatics, which originates from our objective to apply machine learning to predict the association constant between cyclodextrin and a guest. Identifying stereoisomers is indeed crucial for machine learning applications. Current tools offer various molecular descriptors, including their textual representation as Isomeric SMILES that can distinguish stereoisomers. However, such representation is text-based and does not have a fixed size, so a conversion is needed to make it usable to machine learning approaches. Word embedding techniques can be used to solve this problem. Mol2vec, a word embedding approach for molecules, offers such a conversion. Unfortunately, it cannot distinguish between stereoisomers due to its inability to capture the spatial configuration of molecular structures. This study proposes several approaches that use word embedding techniques to handle molecular discrimination using stereochemical information on molecules or considering Isomeric SMILES notation as a text in Natural Language Processing. Our aim is to generate a distinct vector for each unique molecule, correctly identifying stereoisomer information in cheminformatics. The proposed approaches are then compared to our original machine learning task: predicting the association constant between cyclodextrin and a guest molecule.
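The point the abstract makes about Isomeric SMILES can be illustrated with a minimal sketch. It is not the method of the article: the regex tokenizer, the `strip_stereo` helper, and the token-count vector are all assumptions introduced here, and the count vector is only a simple stand-in for a learned word embedding. The sketch treats two enantiomers of alanine as text, and shows that stereo-aware tokens yield distinct fixed-size vectors while a stereo-blind view collapses both molecules into the same string.

```python
import re
from collections import Counter

# Illustrative regex tokenizer (an assumption, not the article's tokenizer).
# Bracket atoms such as [C@@H] are kept whole, so chirality marks stay
# attached to the atom they describe.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:~*$@]")


def tokenize(smiles):
    """Split a SMILES string into a list of text tokens."""
    return TOKEN_RE.findall(smiles)


def strip_stereo(smiles):
    """Crude stereo-blind view: drop chirality marks and bond-direction marks."""
    without_bond_dirs = re.sub(r"[/\\]", "", smiles)
    return re.sub(r"@+", "", without_bond_dirs)


def count_vector(tokens, vocab):
    """Fixed-size vector: one token count per vocabulary entry."""
    counts = Counter(tokens)
    return [counts[token] for token in vocab]


# Isomeric SMILES of the two enantiomers of alanine.
alanine_a = "C[C@H](N)C(=O)O"
alanine_b = "C[C@@H](N)C(=O)O"

tokens_a = tokenize(alanine_a)
tokens_b = tokenize(alanine_b)

# A shared vocabulary gives both molecules vectors of the same length.
vocab = sorted(set(tokens_a) | set(tokens_b))
vec_a = count_vector(tokens_a, vocab)
vec_b = count_vector(tokens_b, vocab)

# Stereo-aware tokens separate the enantiomers; the stereo-blind strings do not.
stereo_aware_distinct = vec_a != vec_b
stereo_blind_identical = strip_stereo(alanine_a) == strip_stereo(alanine_b)
```

In this toy setting the only tokens that differ between the two molecules are `[C@H]` and `[C@@H]`, which is why any representation that discards them cannot tell the enantiomers apart.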
Language:
English
Audience:
International
Popular science:
No
Institution(s):
Université de Lille
CNRS
Centrale Lille
ENSCL
Univ. Artois
Collections:
Deposit date:
2024-07-23T21:01:50Z
2024-08-23T09:20:03Z