Evolutionary machine learning

Description

No description is available for this research axis.

Publications

  • 2025
    Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Efrén Mezura-Montes

    Evolutionary optimization of the area under precision-recall curve for classifying imbalanced multi-class data

    J. Heuristics 31(1): 9 (2025)

    Abstract

    Classification of imbalanced multi-class data remains one of the most challenging issues in machine learning and data mining. The task becomes even harder when the classes containing fewer instances lie in overlapping regions. Several approaches have been proposed in the literature to deal with these two issues, such as the use of decomposition, the design of ensembles, the employment of misclassification costs, and the development of ad-hoc strategies. Despite these efforts, the number of existing works dealing with imbalance in multi-class data is much smaller than in the binary classification case. Moreover, existing approaches still suffer from many limitations, including difficulties in handling imbalance across multiple classes, challenges in adapting sampling techniques, limitations of certain classifiers, the need for specialized evaluation metrics, the complexity of data representation, and increased computational costs. Motivated by these observations, we propose a multi-objective evolutionary induction approach that evolves a population of NLM-DTs (Non-Linear Multivariate Decision Trees) using the -NSGA-III (-Non-dominated Sorting Genetic Algorithm-III) as a search engine. The resulting algorithm, termed EMO-NLM-DT (Evolutionary Multi-objective Optimization of NLM-DTs), is designed to optimize the construction of NLM-DTs for imbalanced multi-class data classification by simultaneously maximizing the Macro-Average-Precision and the Macro-Average-Recall as two possibly conflicting objectives. The choice of these two measures as objective functions is motivated by a recent study on the appropriateness of performance metrics for imbalanced data classification, which suggests that the mAURPC (mean Area Under Recall Precision Curve) satisfies all necessary conditions for imbalanced multi-class classification. Moreover, adopting the NLM-DT as the baseline classifier to be optimized allows the generation of non-linear hyperplanes that are well adapted to the geometrical shapes of the class boundaries. The statistical analysis of the comparative experimental results on more than twenty imbalanced multi-class data sets shows that EMO-NLM-DT outperforms seven relevant and recent state-of-the-art methods in building NLM-DTs that are highly effective at classifying imbalanced multi-class data.
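
    As an illustration of the two objectives named in this abstract, the following is a minimal sketch, assuming scikit-learn is available, of how the Macro-Average-Precision and Macro-Average-Recall of a candidate classifier's predictions could be computed; the function name evaluate_objectives and the toy labels are illustrative and not taken from the paper.

      # Minimal sketch: the two objectives maximized by EMO-NLM-DT, computed
      # with scikit-learn. Names and data are illustrative.
      from sklearn.metrics import precision_score, recall_score

      def evaluate_objectives(y_true, y_pred):
          """Return (Macro-Average-Precision, Macro-Average-Recall) to be maximized."""
          macro_precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
          macro_recall = recall_score(y_true, y_pred, average="macro", zero_division=0)
          return macro_precision, macro_recall

      # Toy imbalanced multi-class example
      y_true = [0, 0, 0, 0, 0, 1, 1, 2]
      y_pred = [0, 0, 0, 1, 0, 1, 2, 2]
      print(evaluate_objectives(y_true, y_pred))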

  • 2023
    Rihab Said, Slim Bechikh, Carlos A. Coello Coello, Lamjed Ben Said

    Solving the Discretization-based Feature Construction Problem using Bi-level Evolutionary Optimization

    2023 IEEE Congress on Evolutionary Computation (CEC), Chicago, IL, USA, pp. 1-8 (2023)

    Abstract

    Feature construction is a crucial data preprocessing technique in machine learning applications because it creates new informative features from the original ones, which improves classification performance and reduces the problem dimensionality. Since many feature construction methods require discrete data, discretization is needed to transform constructed features that take continuous values into their corresponding discrete versions. To deal with this situation, the aim of this paper is to perform feature construction and feature discretization jointly, in a synchronous manner, in order to benefit from the advantages of each process. Thus, we propose to model the discretization-based feature construction task as a bi-level optimization problem in which the constructed features are evaluated based on their optimized sequence of cut-points. The resulting algorithm is termed Discretization-based Feature Construction (Bi-DFC); the proposed model is solved using an improved version of an existing co-evolutionary algorithm, named I-CEMBA, which ensures the variation of concatenation trees. Bi-DFC selects original attributes and creates constructed features at the upper level, and evaluates each constructed feature based on its corresponding optimal sequence of cut-points found at the lower level. The experimental results obtained on ten high-dimensional datasets show that Bi-DFC outperforms relevant state-of-the-art approaches in terms of classification results.
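
    To make the nested structure concrete, below is a minimal sketch, under simplifying assumptions, of the bi-level evaluation loop described in this abstract: the upper level ranks candidate constructed features, and the lower level searches for a cut-point sequence for each of them. The random search and the bin-balance score are stand-ins for I-CEMBA and the paper's actual fitness function; all names are illustrative.

      # Illustrative bi-level loop: the upper level ranks constructed features by
      # the quality of their best lower-level discretization. Not the paper's method.
      import random

      def discretization_quality(values, cuts):
          """Stand-in objective: prefer cut-points that yield balanced bins."""
          edges = [float("-inf")] + list(cuts) + [float("inf")]
          bins = [sum(1 for v in values if lo < v <= hi)
                  for lo, hi in zip(edges, edges[1:])]
          mean = sum(bins) / len(bins)
          return -sum((b - mean) ** 2 for b in bins)  # lower variance -> higher score

      def lower_level_best_cutpoints(values, n_cuts=3, trials=50):
          """Lower level: random search over cut-point sequences."""
          lo, hi = min(values), max(values)
          best_cuts, best_score = None, float("-inf")
          for _ in range(trials):
              cuts = sorted(random.uniform(lo, hi) for _ in range(n_cuts))
              score = discretization_quality(values, cuts)
              if score > best_score:
                  best_cuts, best_score = cuts, score
          return best_cuts, best_score

      def upper_level_select(candidate_features):
          """Upper level: keep the constructed feature with the best lower-level score."""
          return max(candidate_features,
                     key=lambda feature: lower_level_best_cutpoints(feature)[1])

      # Toy usage: two "constructed features" given as lists of continuous values
      random.seed(0)
      f1 = [random.gauss(0, 1) for _ in range(100)]
      f2 = [random.expovariate(1.0) for _ in range(100)]
      best_feature = upper_level_select([f1, f2])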

    Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Efrén Mezura-Montes

    Imbalanced multi-label data classification as a bi-level optimization problem: application to miRNA-related diseases diagnosis

    Neural Comput. Appl. 35(22): 16285-16303 (2023)

    Abstract

    In multi-label classification, each instance can be assigned multiple labels at the same time. In such a situation, the relationships between labels and the class imbalance are two serious issues that must be addressed. Despite the large number of existing multi-label classification methods, the widespread class imbalance among labels has not been adequately addressed. Two main issues must be solved to come up with an effective classifier for imbalanced multi-label data. On the one hand, the imbalance can occur between labels and/or within a label: the “between-labels imbalance” arises when the imbalance is between labels, whereas the “within-label imbalance” arises within a label itself and can occur across multiple labels. On the other hand, the label processing order heavily influences the quality of a multi-label classifier. To deal with these challenges, we propose in this paper a bi-level evolutionary approach for the optimized induction of multivariate decision trees, in which the upper level designs the classifiers while the lower level approximates the optimal label ordering for each classifier. Our proposed method, named BIMLC-GA (Bi-level Imbalanced Multi-Label Classification Genetic Algorithm), is compared to several state-of-the-art methods on a variety of imbalanced multi-label data sets from several application fields and then applied to the miRNA-related diseases case study. The statistical analysis of the obtained results shows the merits of our proposal.
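
    As a rough illustration of the lower-level idea described in this abstract (searching for a good label processing order), the sketch below scores candidate orderings with scikit-learn's ClassifierChain and a macro-F1 measure. The decision-tree base learner, the random search, and the synthetic data are simplifying stand-ins for the paper's multivariate decision trees and genetic search; none of these names come from BIMLC-GA itself.

      # Illustrative only: evaluating label orderings with a classifier chain.
      import random
      from sklearn.datasets import make_multilabel_classification
      from sklearn.model_selection import train_test_split
      from sklearn.multioutput import ClassifierChain
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.metrics import f1_score

      X, Y = make_multilabel_classification(n_samples=300, n_classes=5,
                                            n_labels=3, random_state=0)
      X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

      def score_order(order):
          """Fitness of one label ordering: macro-F1 of a chain trained with it."""
          chain = ClassifierChain(DecisionTreeClassifier(random_state=0),
                                  order=list(order))
          chain.fit(X_tr, Y_tr)
          return f1_score(Y_te, chain.predict(X_te), average="macro")

      # Random search over orderings as a stand-in for the lower-level search
      rng = random.Random(0)
      candidate_orders = [rng.sample(range(Y.shape[1]), Y.shape[1]) for _ in range(10)]
      best_order = max(candidate_orders, key=score_order)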

  • 2019
    Slim Bechikh, Maha Elarbi, Chih-Cheng Hung, Sabrine Hamdi, Lamjed Ben Said

    A Hybrid Evolutionary Algorithm with Heuristic Mutation for Multi-objective Bi-clustering

    2019 IEEE Congress on Evolutionary Computation (CEC), pp. 2323-2330, IEEE (2019)

    Abstract

    Bi-clustering is one of the main tasks in data mining, with several application domains. It consists of partitioning a data set along both its rows and its columns simultaneously. One of the main difficulties in bi-clustering is determining the number of bi-clusters, which is usually a user-specified parameter. Recently, in 2017, a new multi-objective evolutionary clustering algorithm called MOCK-II showed its effectiveness in data clustering while automatically determining the number of clusters. Motivated by the promising results of MOCK-II, we propose in this paper a hybrid extension of this algorithm for the case of bi-clustering. Our new algorithm, called MOBICK, uses an efficient solution encoding, an effective crossover operator, and a heuristic mutation strategy. Like MOCK-II, MOBICK is able to automatically determine the number of bi-clusters. Our algorithm is shown to outperform several existing state-of-the-art works on a set of real gene expression data sets. Moreover, to compare MOBICK with MOCK-I and MOCK-II, we designed two basic extensions of MOCK-I and MOCK-II for the case of bi-clustering, named B-MOCK-I and B-MOCK-II. Again, the experimental results confirm the merits of our proposal.
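
    For readers unfamiliar with bi-clustering, here is a minimal sketch of a bi-cluster represented as a pair of row and column index sets, scored with the classical mean squared residue of Cheng and Church. This is a common coherence measure for gene expression data, but the abstract does not state that MOBICK uses this exact encoding or objective.

      # Illustrative bi-cluster quality measure (mean squared residue); not
      # necessarily one of MOBICK's objectives.
      import numpy as np

      def mean_squared_residue(data, rows, cols):
          """Lower values indicate a more coherent bi-cluster."""
          sub = data[np.ix_(rows, cols)]
          row_means = sub.mean(axis=1, keepdims=True)
          col_means = sub.mean(axis=0, keepdims=True)
          overall = sub.mean()
          residue = sub - row_means - col_means + overall
          return float((residue ** 2).mean())

      # Toy usage on a random expression-like matrix
      rng = np.random.default_rng(0)
      data = rng.normal(size=(20, 10))
      print(mean_squared_residue(data, rows=[0, 2, 5, 7], cols=[1, 3, 4]))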