Document type:
Preprint or working paper
Title:
A Genetic and Graph-Guided Feature Learning Strategy for Improving Decision Tree Construction
Author(s):
Karabadji, Nour El Islem [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Amara Korba, Abdelaziz [Author]
Laboratoire Informatique, Image et Interaction - EA 2118 [L3I]
Assi, Ali [Author]
Seridi, Hassina [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Aimen, Mohamed [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Ghamri-Doudane, Yacine [Author]
Laboratoire d'Informatique Gaspard-Monge [LIGM]
Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise [ENSIIE]
Lakhdari, Abdelghani [Author]
Elati, Mohamed [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Dhifli, Wajdi [Author]
Hétérogénéité, Plasticité et Résistance aux Thérapies des Cancers = Cancer Heterogeneity, Plasticity and Resistance to Therapies - UMR 9020 - U 1277 [CANTHER]
English keyword(s):
Decision Tree
Feature selection
Genetic Algorithm
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
English abstract: [en]
Machine learning algorithms have offered unprecedented solutions for many real-world problems. These algorithms frequently rely on a large number of features. However, several of these features may not be very informative because of data uncertainties such as noise and residual variation. Decision trees are among the most preferred classification models, owing to their simplicity, explainability, and readability. However, data inaccuracies can affect the construction of decision trees and thus degrade their results. Feature selection and feature construction are a promising research direction for enhancing the performance of decision tree models. In this paper, we present a strategy that combines feature selection and construction, in which new features are constructed from the ones that were not chosen during the selection step. However, the search space of combinations of selected and constructed features is extremely large. To find the best solution, we developed a genetic algorithm combined with an approach guided by a covering set of graph vertices. Results obtained on a large number of datasets from the UCI Repository demonstrate that our approach outperforms both recent and classical decision tree construction techniques. We also present a successful use case of our approach in detecting botnet traffic in the Internet of Vehicles.
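The combination of selection and construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a bit mask marks the selected features, the unselected ones are folded into a single constructed feature by taking their mean, a depth-1 decision stump stands in for the decision tree learner, and a small genetic algorithm with truncation selection searches over masks. The graph-guided component is not modeled, and the names `build_dataset`, `stump_accuracy`, and `genetic_search` are hypothetical.

```python
import random

def build_dataset(mask, rows):
    """Keep selected columns; fold the unselected ones into one
    constructed feature (their mean, an illustrative assumption)."""
    out = []
    for row in rows:
        kept = [v for v, bit in zip(row, mask) if bit]
        rest = [v for v, bit in zip(row, mask) if not bit]
        if rest:
            kept.append(sum(rest) / len(rest))
        out.append(kept)
    return out

def stump_accuracy(rows, labels):
    """Training accuracy of the best single-threshold split on binary
    labels (a depth-1 stand-in for a full decision tree)."""
    best = 0.0
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            hits = sum(max(s.count(0), s.count(1)) for s in (left, right))
            best = max(best, hits / n)
    return best

def genetic_search(rows, labels, pop_size=12, generations=15,
                   mutation_rate=0.1, seed=0):
    """Search over feature masks; assumes at least two features."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        return stump_accuracy(build_dataset(mask, rows), labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # include the "keep every feature" baseline
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    # Synthetic toy data: the label depends on the sum of the first two columns.
    data_rng = random.Random(1)
    X = [[data_rng.random() for _ in range(4)] for _ in range(40)]
    y = [int(r[0] + r[1] > 1.0) for r in X]
    mask, acc = genetic_search(X, y)
    print("mask:", mask, "training accuracy:", round(acc, 3))
```

Because the all-ones mask is placed in the initial population and the best half of each generation is carried over unchanged, the returned accuracy can never fall below that of the unmodified feature set. A realistic experiment would replace the stump with a full decision tree and score masks on held-out data rather than training accuracy.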
Language:
English
Collections:
Source:
Files
- document
- Open access
- Access the document
- informationSciences%20%283%29.pdf
- Open access
- Access the document