Data Mining

Description

No description is available for this research area.

Members

Thouraya Sakouhi

Wassim Ayadi

Amel ZIDI

Marwa Chabbouh

Salah Ghodhbani

Publications

2025

Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Efrén Mezura-Montes
Evolutionary optimization of the area under precision-recall curve for classifying imbalanced multi-class data

J. Heuristics 31(1): 9, 2025

Abstract

Classification of imbalanced multi-class data remains one of the most challenging issues in machine learning and data mining. The task becomes harder still when the classes containing fewer instances lie in overlapping regions. Several approaches have been proposed in the literature to deal with these two issues, such as decomposition, ensemble design, misclassification costs, and ad-hoc strategies. Despite these efforts, far fewer works address imbalance in multi-class data than in binary classification. Moreover, existing approaches still suffer from several limitations: difficulties in handling imbalance across multiple classes, challenges in adapting sampling techniques, limitations of certain classifiers, the need for specialized evaluation metrics, the complexity of data representation, and increased computational costs. Motivated by these observations, we propose a multi-objective evolutionary induction approach that evolves a population of NLM-DTs (Non-Linear Multivariate Decision Trees) using θ-NSGA-III (θ-Non-dominated Sorting Genetic Algorithm-III) as a search engine. The resulting algorithm, termed EMO-NLM-DT (Evolutionary Multi-objective Optimization of NLM-DTs), is designed to optimize the construction of NLM-DTs for imbalanced multi-class classification by simultaneously maximizing the Macro-Average-Precision and the Macro-Average-Recall as two possibly conflicting objectives. These two objectives were chosen because of a recent study on the appropriateness of performance metrics for imbalanced data classification, which suggests that the mAURPC (mean Area Under Recall Precision Curve) satisfies all the necessary conditions for imbalanced multi-class classification.
Moreover, adopting the NLM-DT as the baseline classifier to be optimized allows the generation of non-linear hyperplanes that are well adapted to the geometric shapes of the class boundaries. The statistical analysis of comparative experimental results on more than twenty imbalanced multi-class data sets shows that EMO-NLM-DT outperforms seven relevant and recent state-of-the-art methods, building NLM-DTs that are highly effective in classifying imbalanced multi-class data.

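
The two objectives named above can be made concrete with a minimal sketch. It assumes plain Python lists of class labels. For each candidate classifier it computes macro-averaged precision and recall, then keeps the candidates that no other candidate Pareto-dominates on these two maximized objectives. The helper names and the "0 when the denominator is empty" convention are assumptions. The tree encoding and the reference-point niching and θ-dominance of θ-NSGA-III are not reproduced; only plain Pareto dominance is shown.

```python
from typing import Dict, List, Sequence, Tuple


def macro_precision_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Macro-averaged precision and recall: every class weighs the same,
    so minority classes count as much as majority ones."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions: List[float] = []
    recalls: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Assumed convention: a ratio with an empty denominator counts as 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for maximized objectives: a is no worse than b
    on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(o, s) for other, o in scores.items() if other != name)]


# Usage with hypothetical candidate trees scored as (macro precision, macro recall).
print(macro_precision_recall(["a", "a", "a", "b", "c"], ["a", "a", "b", "b", "c"]))
print(pareto_front({"t1": (0.9, 0.6), "t2": (0.7, 0.8), "t3": (0.6, 0.5)}))
```

Here "t3" is dropped because "t1" is at least as good on both objectives. "t1" and "t2" are kept because each wins on one objective, which is exactly the precision/recall trade-off the two-objective formulation exposes.
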
2024

Thouraya Sakouhi, Jalel Akaichi
Clustering-based multidimensional sequential pattern mining of semantic trajectories

International Journal of Data Mining, Modelling and Management 16(2): 148-175, 2024

Abstract

Knowledge discovery from mobility data is about identifying behaviours from trajectories. Mining large volumes of trajectories is required to obtain an overview of this data, notably to investigate the relationships between the movements of different entities. Most state-of-the-art work on this issue operates on raw trajectories. However, behaviours discovered from raw trajectories are not as rich and meaningful as those discovered from semantic trajectories. In this paper, we establish a mining approach to extract patterns from semantic trajectories. We propose applying sequential pattern mining after a clustering pre-processing step, which alleviates the time complexity of the mining. Mining considers the spatial and temporal dimensions at different levels of granularity, thus providing richer and more insightful patterns about human behaviour. We evaluate our work on semantic trajectories of tourists in Kyoto. Results showed the effectiveness and efficiency of our model compared to state-of-the-art work.
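
The idea of clustering before sequential pattern mining can be sketched as follows. Each mining run only scans the sequences of one cluster instead of the whole dataset. This is a minimal sketch under heavy assumptions:
- A semantic trajectory is reduced to a list of place labels.
- Cluster labels are taken as already produced by some prior clustering step; the paper's clustering method is not reproduced.
- Patterns are limited to length 2, where a pattern (a, b) means a occurs before b, not necessarily adjacently.

The multidimensional, multi-granularity aspect of the paper is not shown. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]


def frequent_pairs(sequences: List[List[str]], min_support: int) -> Dict[Pattern, int]:
    """Support of each length-2 sequential pattern (a, b): the number of
    sequences in which a occurs before b (gaps allowed)."""
    support: Dict[Pattern, int] = defaultdict(int)
    for seq in sequences:
        # Use a set so a pattern is counted at most once per sequence.
        patterns = {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}
        for p in patterns:
            support[p] += 1
    return {p: s for p, s in support.items() if s >= min_support}


def mine_by_cluster(trajectories: List[List[str]], labels: List[int],
                    min_support: int) -> Dict[int, Dict[Pattern, int]]:
    """Group trajectories by (precomputed) cluster label, then mine each group separately."""
    clusters: Dict[int, List[List[str]]] = defaultdict(list)
    for traj, label in zip(trajectories, labels):
        clusters[label].append(traj)
    return {label: frequent_pairs(seqs, min_support) for label, seqs in clusters.items()}


trajectories = [
    ["station", "temple", "restaurant"],
    ["station", "temple", "shop"],
    ["hotel", "museum"],
    ["hotel", "park", "museum"],
]
print(mine_by_cluster(trajectories, [0, 0, 1, 1], min_support=2))
```

With this illustrative input, cluster 0 yields ("station", "temple") and cluster 1 yields ("hotel", "museum"), each with support 2.
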

Salah Ghodhbani, Sabeur Elkosantini
A Spatial-Temporal DL Approach for Traffic Flow Prediction Using Attention Fusion Method

2024

Abstract

In recent years, traffic flow prediction has posed challenges for the management of transportation systems. It is a crucial part of Intelligent Transportation Systems (ITS). The complexity of transportation data, the spatial and temporal dependencies on road networks, and multimodality (public transit, pedestrian flow, and bike sharing) make accurate traffic flow forecasting a challenging task. Numerous works have been introduced to address these challenges, but few have considered these factors simultaneously, resulting in limited success. In this study, a model is proposed that integrates Graph Convolutional Networks (GCN) and Bidirectional Long Short-Term Memory (BiLSTM). It combines the strength of GCN in handling spatial data and capturing dependencies in road networks with the ability of BiLSTM to learn temporal dynamics. The proposed model can extract comprehensive features from various transportation data and effectively capture spatial-temporal dependencies. By merging these features, it aims to generate more accurate and robust traffic flow predictions. This method addresses the limitations of existing methods that fail to consider spatial-temporal dependencies and multimodality, leading to improved prediction accuracy and efficiency.

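
The spatial half of such a model can be illustrated with a single graph-convolution step over a road network. Each road segment's features are mixed with those of its neighbours through a normalized adjacency matrix. This sketch assumes the widely used propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W); the abstract does not state which GCN variant is used. It covers only the spatial step. The BiLSTM over time steps and the attention fusion are not shown, and the function names are hypothetical. Plain lists are used to keep it dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: road-network adjacency (n x n), h: node features (n x f),
    w: weight matrix (f x f_out), normally learned during training.
    """
    n = len(adj)
    # Self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization keeps feature scales comparable across nodes.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


# Two connected road segments with one feature each (e.g. a flow reading).
print(gcn_layer([[0, 1], [1, 0]], [[2.0], [4.0]], [[1.0]]))
```

In a GCN-BiLSTM pipeline of the kind described, a step like this would typically be applied to the graph snapshot at every time step. The resulting sequence of node embeddings would then feed the bidirectional recurrent layer.
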
2021

Thouraya Sakouhi, Jalel Akaichi
Dynamic and multi-source semantic annotation of raw mobility data using geographic and social media data

Pervasive and Mobile Computing 71: 101310, 2021

Abstract

Nowadays, positioning technologies are widely available, providing large datasets of individuals' mobility data. Annotating raw traces with contextual information brings semantics to them and thus provides a better understanding of people's behavior. To this end, previous work has explored novel techniques to enrich raw mobility data with contextual information, using either geographic context represented by landmarks/points of interest or widely used social media feeds. In this work, we present a novel approach that integrates three data sources (raw mobility data, geographic information, and social media feeds) in a two-fold trajectory semantic annotation process. In a first step, structured trajectories are constructed using geographic information. These are then annotated with event-related words extracted from social media. Combining both data sources can result in a more complete annotation of trajectories. The proposed approach is evaluated on datasets of tourists in Kyoto. Results showed that it performed well quantitatively compared to previous work in terms of the precision of annotation words, which stayed at ≃ 0.9 when recall reached 50%, while improving annotation quality by consolidating both sources of semantics.

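
The two-fold annotation described above can be sketched for a single stop of a trajectory. The sketch rests on these assumptions:
- Step 1 (geographic context) labels the stop with the nearest point of interest within a radius.
- Step 2 (social context) attaches the words of posts that are close to the stop in both space and time.
- The `Poi`/`Post` types, the 200 m radius, the one-hour window, and nearest-neighbour matching are hypothetical choices, not the paper's method.
- The extraction of event-related words from the posts is assumed to have been done already.
- The coordinates are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Poi:
    name: str
    lat: float
    lon: float


@dataclass
class Post:
    lat: float
    lon: float
    time: float  # seconds; assumed to share the clock of the trajectory stops
    words: List[str]  # event-related words, assumed already extracted


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def annotate_stop(lat: float, lon: float, t: float, pois: List[Poi], posts: List[Post],
                  radius_m: float = 200.0,
                  window_s: float = 3600.0) -> Tuple[Optional[str], List[str]]:
    # Step 1 (geographic context): nearest POI, kept only if within the radius.
    best = min(pois, key=lambda p: haversine_m(lat, lon, p.lat, p.lon), default=None)
    place: Optional[str] = None
    if best is not None and haversine_m(lat, lon, best.lat, best.lon) <= radius_m:
        place = best.name
    # Step 2 (social context): pool words from posts near the stop in space and time.
    words = sorted({
        w
        for post in posts
        if abs(post.time - t) <= window_s
        and haversine_m(lat, lon, post.lat, post.lon) <= radius_m
        for w in post.words
    })
    return place, words


pois = [Poi("temple_A", 35.0394, 135.7292), Poi("station_B", 34.9858, 135.7588)]
posts = [
    Post(35.0396, 135.7290, 1500.0, ["festival", "lanterns"]),  # near, within the hour
    Post(34.9858, 135.7588, 1200.0, ["train"]),                 # several km away
    Post(35.0394, 135.7292, 10000.0, ["late"]),                 # near, but hours later
]
print(annotate_stop(35.0395, 135.7293, 1000.0, pois, posts))
```
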
2019

Marwa Chabbouh, Slim Bechikh, Lamjed Ben Said, Chih-Cheng Hung
Multi-objective evolution of oblique decision trees for imbalanced data binary classification

Swarm Evol. Comput. 49: 1-22, 2019

Abstract

Imbalanced data classification is one of the most challenging problems in data mining. In this kind of problem, there are two types of classes: the majority class and the minority class. The former has a relatively high number of instances, while the latter contains far fewer. As most traditional classifiers assume that data is evenly distributed across classes, they may fail considerably to recognize instances of the minority class because of the imbalance. Several interesting approaches have been proposed in the literature to handle the class imbalance issue, and the Oblique Decision Tree (ODT) is one of them. Nevertheless, most standard ODT construction algorithms use a greedy search process, while only very few works have addressed this induction problem with an evolutionary approach, and without really considering the class imbalance issue. To cope with this limitation, we propose in this paper a multi-objective evolutionary approach to find optimized ODTs for imbalanced binary classification. Our approach, called ODT-Θ-NSGA-III (ODT-based-Θ-Nondominated Sorting Genetic Algorithm-III), is motivated by its abilities (a) to escape local optima in the ODT search space and (b) to maximize Precision and Recall simultaneously. Thanks to these two features, ODT-Θ-NSGA-III provides competitive and better results compared to many state-of-the-art classification algorithms on commonly used imbalanced benchmark data sets.

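
What makes a decision tree "oblique" can be shown with a small sketch. Each internal node tests a linear combination of all features (w · x + b ≤ 0) instead of a single feature against a threshold. The sketch also computes the two objectives the abstract names, Precision and Recall on the minority (positive) class, for one candidate tree. It is a minimal illustration only. The `Node` layout and function names are hypothetical, and the evolutionary search itself (tree encoding, variation operators, Θ-NSGA-III selection) is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    # Internal node: go left if sum(w_i * x_i) + bias <= 0, else right.
    weights: Optional[List[float]] = None
    bias: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None  # set only on leaves


def predict(node: Node, x: Sequence[float]) -> int:
    """Follow oblique (hyperplane) tests down to a leaf."""
    while node.label is None:
        s = sum(w * xi for w, xi in zip(node.weights, x)) + node.bias
        node = node.left if s <= 0 else node.right
    return node.label


def precision_recall(tree: Node, xs: Sequence[Sequence[float]], ys: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """The two objectives, computed on the minority (positive) class."""
    preds = [predict(tree, x) for x in xs]
    tp = sum(1 for p, y in zip(preds, ys) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(preds, ys) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(preds, ys) if p != positive and y == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One oblique split x0 + x1 - 1 <= 0 separating class 0 (left) from class 1 (right).
tree = Node(weights=[1.0, 1.0], bias=-1.0, left=Node(label=0), right=Node(label=1))
xs = [[0.2, 0.3], [0.9, 0.8], [0.6, 0.5], [0.1, 0.1]]
ys = [0, 1, 0, 0]
print(precision_recall(tree, xs, ys))
```

An axis-parallel tree would need several splits to approximate the diagonal boundary that this single node draws. That gain in expressiveness comes with a larger, continuous search space for the weights, which is where a global evolutionary search can help compared with greedy induction.
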
2017

Chedi Abdelkarim, Lilia Rejeb, Lamjed Ben Said, Maha Elarbi
Evidential learning classifier system

In Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 123-124), 2017

Abstract

During the last decades, Learning Classifier Systems have seen many advances that highlight their potential to solve complex problems. Despite the advantages offered by these algorithms, other aspects such as uncertainty must be tackled to improve their performance. In this paper, we present a new Learning Classifier System (LCS) that deals with uncertainty in class selection, in particular imprecision. Our idea is to integrate belief function theory into the sUpervised Classifier System (UCS) for classification purposes. The new approach proved efficient in solving several classification problems.
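
Two standard operations from belief function theory illustrate how imprecise class evidence can be represented and used for class selection:
- Dempster's rule combines two mass functions over sets of classes.
- The pignistic transformation turns the combined masses into a probability used to pick a single class.

The abstract does not say how UCS integrates belief functions. Treating each rule's class advocacy as a mass function is an assumption made only for this sketch, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by 1 - K, where K is the mass of empty intersections (conflict)."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def pignistic(m: Mass) -> Dict[str, float]:
    """Smets' pignistic transformation: share each set's mass equally among its classes."""
    betp: Dict[str, float] = {}
    for s, w in m.items():
        for c in s:
            betp[c] = betp.get(c, 0.0) + w / len(s)
    return betp


# Two sources over classes {A, B}; mass on {A, B} expresses imprecision.
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.4}
m2 = {frozenset({"B"}): 0.5, frozenset({"A", "B"}): 0.5}
m = dempster_combine(m1, m2)
betp = pignistic(m)
print(max(betp, key=betp.get))
```

With these illustrative masses, 0.3 of the joint mass is conflicting. After renormalization, {A} receives 3/7, and {B} and {A, B} receive 2/7 each, so the pignistic probability favours class A.
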
