Data Mining

Description

No description is available for this research area.

Publications

  • 2025
    Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Efrén Mezura-Montes

    Evolutionary optimization of the area under precision-recall curve for classifying imbalanced multi-class data

    J. Heuristics 31(1): 9 (2025), 2025

    Abstract

    Classification of imbalanced multi-class data remains one of the most challenging problems in machine learning and data mining. The task becomes harder when the classes containing fewer instances lie in overlapping regions. Several approaches have been proposed in the literature to deal with these two issues, such as decomposition, ensemble design, misclassification costs, and ad-hoc strategies. Despite these efforts, far fewer works address imbalance in multi-class data than in binary classification. Moreover, existing approaches still suffer from many limitations: difficulties in handling imbalances across multiple classes, challenges in adapting sampling techniques, limitations of certain classifiers, the need for specialized evaluation metrics, the complexity of data representation, and increased computational costs. Motivated by these observations, we propose a multi-objective evolutionary induction approach that evolves a population of NLM-DTs (Non-Linear Multivariate Decision Trees) using θ-NSGA-III (θ-Non-dominated Sorting Genetic Algorithm-III) as a search engine. The resulting algorithm, termed EMO-NLM-DT (Evolutionary Multi-objective Optimization of NLM-DTs), is designed to optimize the construction of NLM-DTs for imbalanced multi-class data classification by simultaneously maximizing the Macro-Average-Precision and the Macro-Average-Recall as two possibly conflicting objectives. The choice of these two measures as objective functions is motivated by a recent study on the appropriateness of performance metrics for imbalanced data classification, which suggests that the mAURPC (mean Area Under Recall Precision Curve) satisfies all necessary conditions for imbalanced multi-class classification. Moreover, adopting the NLM-DT as the baseline classifier to be optimized allows the generation of non-linear hyperplanes that are well adapted to the geometrical shapes of the class boundaries. The statistical analysis of the comparative experimental results on more than twenty imbalanced multi-class data sets shows that EMO-NLM-DT builds NLM-DTs that are highly effective in classifying imbalanced multi-class data, outperforming seven relevant and recent state-of-the-art methods.
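
    As an illustration of the two objectives named in the abstract above, the following minimal Python sketch computes macro-averaged precision and recall from true and predicted labels. It is not the authors' implementation: the function name `macro_precision_recall`, the toy labels, and the convention of scoring an undefined per-class ratio as 0.0 are assumptions made for this example.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return (macro-average precision, macro-average recall).

    Each class contributes equally to the average, whatever its size.
    Assumption: a class with no predictions (or no true instances)
    gets a precision (or recall) of 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the true class but was missed

    precisions, recalls = [], []
    for c in labels:
        predicted_c = tp[c] + fp[c]
        actual_c = tp[c] + fn[c]
        precisions.append(tp[c] / predicted_c if predicted_c else 0.0)
        recalls.append(tp[c] / actual_c if actual_c else 0.0)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical imbalanced toy data: class "a" dominates, "c" is rare.
y_true = ["a", "a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "c", "c"]
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
print(f"macro precision = {macro_p:.3f}, macro recall = {macro_r:.3f}")
```

    Because every class carries the same weight in the average, an error on a minority class costs as much as an error on a majority class, which is why such measures suit imbalanced data.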

  • Thouraya Sakouhi, Jalel Akaichi

    Clustering-based multidimensional sequential pattern mining of semantic trajectories

    International Journal of Data Mining, Modelling and Management, 16(2), 148-175, 2024

    Abstract

    Knowledge discovery from mobility data is about identifying behaviours from trajectories. Mining large masses of trajectories is required to gain an overview of this data and, notably, to investigate the relationships between the movements of different entities. Most state-of-the-art work on this problem operates on raw trajectories. Nevertheless, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose to apply sequential pattern mining after a clustering pre-processing step that alleviates the temporal complexity of the mining. The mining considers the spatial and temporal dimensions at different levels of granularity, thereby providing richer and more insightful patterns of human behaviour. We evaluate our work on tourists' semantic trajectories in Kyoto. The results show the effectiveness and efficiency of our model compared to state-of-the-art work.
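
    To make the notion of a sequential pattern over semantic trajectories concrete, here is a minimal Python sketch that measures the support of a pattern, that is, the fraction of trajectories in which its stops occur in order. It covers only this basic building block, not the clustering step or the multidimensional mining of the paper; the place labels, the helper names, and the toy trajectories are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same
    order, not necessarily contiguously."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator past the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, trajectories):
    """Fraction of trajectories that contain `pattern` as a subsequence."""
    hits = sum(1 for t in trajectories if is_subsequence(pattern, t))
    return hits / len(trajectories)


# Hypothetical semantic trajectories: ordered lists of visited place types.
trajectories = [
    ["station", "temple", "restaurant", "hotel"],
    ["station", "museum", "temple", "hotel"],
    ["hotel", "temple", "station"],
    ["station", "restaurant", "hotel"],
]

support_station_hotel = support(["station", "hotel"], trajectories)
support_temple_hotel = support(["temple", "hotel"], trajectories)
print(support_station_hotel, support_temple_hotel)
```

    A pattern is usually reported as frequent when its support reaches a user-chosen minimum threshold.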

  • Thouraya Sakouhi, Jalel Akaichi

    Dynamic and multi-source semantic annotation of raw mobility data using geographic and social media data

    Pervasive and Mobile Computing, 71, 101310, 2021

    Abstract

    Positioning technologies have become widely available, producing large datasets of individuals’ mobility data. Annotating raw traces with contextual information brings semantics to them and provides a better understanding of people's behavior. To this end, prior work has explored techniques that enrich raw mobility data with contextual information using either the geographic context, represented by landmarks and points of interest, or widely used social media feeds. In this work, we present a novel approach that integrates three data sources (raw mobility data, geographic information, and social media feeds) in a two-fold process for the semantic annotation of trajectories. In the first step, structured trajectories are constructed using geographic information. In the second step, these trajectories are annotated with event-related words extracted from social media. Combining both data sources can result in a more complete annotation of trajectories. The proposed approach is evaluated on datasets of tourists in Kyoto. Results showed that the proposed approach quantitatively performed well compared to previous work in terms of precision of annotation words that maintained  when recall reached 50%, while improving annotation quality by consolidating both sources of semantics.

  • Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Chih-Cheng Hung

    Multi-objective evolution of oblique decision trees for imbalanced data binary classification

    Swarm Evol. Comput. 49: 1-22 (2019), 2019

    Abstract

    Imbalanced data classification is one of the most challenging problems in data mining. In this kind of problem, there are two types of classes: the majority class and the minority class. The former has a relatively high number of instances, while the latter contains far fewer. As most traditional classifiers assume that data is evenly distributed across classes, they may fail considerably to recognize instances of the minority class. Several interesting approaches have been proposed in the literature to handle the class imbalance issue, and the Oblique Decision Tree (ODT) is one of them. Nevertheless, most standard ODT construction algorithms use a greedy search process, and the very few works that have addressed this induction problem with an evolutionary approach did so without really considering the class imbalance issue. To cope with this limitation, we propose in this paper a multi-objective evolutionary approach to find optimized ODTs for imbalanced binary classification. Our approach, called ODT-Θ-NSGA-III (ODT-based-Θ-Nondominated Sorting Genetic Algorithm-III), is motivated by its abilities: (a) to escape local optima in the ODT search space and (b) to maximize both Precision and Recall simultaneously. Thanks to these two features, ODT-Θ-NSGA-III provides competitive or better results when compared to many state-of-the-art classification algorithms on commonly used imbalanced benchmark data sets.
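
    For readers unfamiliar with oblique decision trees, the following minimal Python sketch contrasts an oblique split, which tests a linear combination of features, with the single-feature threshold test of a standard tree. It illustrates the node test only, not the evolutionary search described above; the function names, weights, and toy points are assumptions chosen for the example.

```python
def oblique_split(x, weights, bias):
    """Oblique node test: send x left when w . x + b <= 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias <= 0.0


def separable_by_axis(group_a, group_b, feature):
    """True if a single threshold on `feature` separates the two groups,
    which is all an axis-parallel split can do."""
    a = [p[feature] for p in group_a]
    b = [p[feature] for p in group_b]
    return max(a) < min(b) or max(b) < min(a)


# Hypothetical 2-D points: one class lies above the diagonal x2 = x1,
# the other below it.
above = [(1.0, 2.0), (2.0, 3.0)]
below = [(2.0, 1.0), (3.0, 2.0)]

# No single-feature threshold separates the two groups...
axis_ok = any(separable_by_axis(above, below, f) for f in (0, 1))

# ...but the oblique hyperplane x1 - x2 = 0 does.
weights, bias = (1.0, -1.0), 0.0
left = [p for p in above + below if oblique_split(p, weights, bias)]
right = [p for p in above + below if not oblique_split(p, weights, bias)]
print(axis_ok, left, right)
```

    An axis-parallel tree would need several nested splits to approximate this diagonal boundary, whereas one oblique node captures it directly.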

  • Chedi Abdelkarim, Lilia Rejeb, Lamjed Ben Said, Maha Elarbi

    Evidential learning classifier system

    In Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 123-124), 2017

    Abstract

    Over the last decades, Learning Classifier Systems have seen many advances that highlight their potential to solve complex problems. Despite the advantages offered by these algorithms, it is important to tackle other aspects, such as uncertainty, to improve their performance. In this paper, we present a new Learning Classifier System (LCS) that deals with uncertainty in class selection, in particular imprecision. Our idea is to integrate belief function theory into the sUpervised Classifier System (UCS) for classification purposes. The new approach proved efficient at solving several classification problems.
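
    As background on belief function theory, the following minimal Python sketch applies Dempster's rule of combination to two mass functions over a set of class labels. The abstract does not state which combination rule the system uses, so this rule is shown only as the standard one in the theory; the function name, class labels, and mass values are assumptions. Mass assigned to a set of several classes is how the theory expresses imprecision.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    A mass function maps frozensets of class labels to masses summing
    to 1. Products of masses whose sets do not intersect count as
    conflict and are normalized away.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical two-class frame; EITHER carries the imprecise mass.
POS = frozenset({"pos"})
NEG = frozenset({"neg"})
EITHER = POS | NEG

m1 = {POS: 0.6, EITHER: 0.4}   # first source leans towards "pos"
m2 = {NEG: 0.5, EITHER: 0.5}   # second source leans towards "neg"
fused = dempster_combine(m1, m2)
print(fused)
```

    Here the conflicting mass is 0.6 × 0.5 = 0.3, so the remaining products are rescaled by 1 / 0.7 and the fused masses again sum to 1.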