Document type :
Preprint or working paper
Title :
A Genetic and Graph-Guided Feature Learning Strategy for Improving Decision Tree Construction
Author(s) :
Karabadji, Nour El Islem [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Amara Korba, Abdelaziz [Author]
Laboratoire Informatique, Image et Interaction - EA 2118 [L3I]
Assi, Ali [Author]
Seridi, Hassina [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Aimen, Mohamed [Author]
Laboratoire de Gestion Electronique de Document [Annaba] [LabGED]
Ghamri-Doudane, Yacine [Author]
Laboratoire d'Informatique Gaspard-Monge [LIGM]
Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise [ENSIIE]
Lakhdari, Abdelghani [Author]
Elati, Mohamed [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Dhifli, Wajdi [Author]
Hétérogénéité, Plasticité et Résistance aux Thérapies des Cancers = Cancer Heterogeneity, Plasticity and Resistance to Therapies - UMR 9020 - U 1277 [CANTHER]
English keyword(s) :
Decision Tree
Feature selection
Genetic Algorithm
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
English abstract : [en]
Machine learning algorithms have offered unprecedented solutions for many real-world problems. These algorithms frequently rely on a large number of features. However, several of these features may not be very informative because of data uncertainties such as noise and residual variation. Decision trees are among the most widely preferred classification models because of their simplicity, explainability, and readability. However, data inaccuracies can affect the construction of decision trees and thus degrade their results. Feature selection and feature construction are promising research directions for enhancing the performance of decision tree models. In this paper, we present a strategy that combines feature selection and construction, in which new features are constructed from the ones that were not chosen during the selection step. However, the search space of combinations of selected and constructed features is extremely large. To find the best solution, we developed a genetic algorithm combined with an approach guided by a covering vertex set of a graph. Results obtained on a large number of datasets from the UCI Repository demonstrate that our approach outperforms both recent and classical decision tree construction techniques. We also present a successful use case of our approach in detecting botnet traffic in the Internet of Vehicles.
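The strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes a binary chromosome (1 = feature selected), a construction operator that simply sums the unselected features into one new feature, and a depth-1 tree (a single-threshold stump) as a stand-in for a full decision tree. The graph-guided component is omitted because the abstract does not describe it in enough detail to sketch.

```python
import random

def build_features(row, mask):
    """Keep the selected features; fold the unselected ones into one new feature."""
    selected = [x for x, keep in zip(row, mask) if keep]
    rest = [x for x, keep in zip(row, mask) if not keep]
    if rest:
        selected.append(sum(rest))  # assumed construction operator: plain sum
    return selected

def stump_accuracy(X, y):
    """Training accuracy of the best depth-1 tree (one threshold on one feature)."""
    best, n = 0.0, len(y)
    for j in range(len(X[0])):
        for t in sorted({r[j] for r in X}):
            # Count rows where "value <= t" agrees with "label is 0".
            hits = sum((r[j] <= t) == (label == 0) for r, label in zip(X, y))
            best = max(best, hits / n, 1 - hits / n)  # allow either leaf labelling
    return best

def fitness(mask, X, y):
    """Score a chromosome by the stump accuracy on its selected + constructed features."""
    return stump_accuracy([build_features(r, mask) for r in X], y)

def evolve(X, y, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Tiny elitist genetic algorithm over selection masks (needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))
```

On a toy dataset whose label depends on the sum of two features, a mask that leaves both unselected lets the constructed feature separate the classes with a single split, which no original feature can do alone. This is the intuition behind building new features from the ones the selection step discards.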
Language :
English
Files
- document
- Open access
- Access the document
- informationSciences%20%283%29.pdf
- Open access
- Access the document