Sofian Boutaib

General Information

Rank

Assistant Professor

Short Biography

Sofian Boutaib is an Assistant Professor in Computer Science. He received his Ph.D. in Computer Science from the University of Tunis in 2022, where he conducted research on code smell detection and identification in software systems under the supervision of Dr. Slim Bechikh at the Strategies for Modelling and ARtificial inTelligence (SMART) Lab. He obtained his Master’s degree in Computer Science with a "Very Good" distinction from the Higher Institute of Management of Tunis in 2017, defending a thesis on the Incremental Possibilistic Decision Tree (IPDT) advised by Prof. Zied Elouedi. He received his Bachelor’s degree in Computer Science from the Higher Institute of Management of Tunis in 2015. His research interests include software maintenance and evolution, source code quality, machine learning, and uncertainty theories. Since 2020, he has served as a Review Board Member for IEEE Access, Elsevier’s Applied Soft Computing (ASOC), and Information Sciences.

Teams

SMART-Optimization

Research Areas

Optimization

Soft Computing

Software Engineering

Publications

2025

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Carlos A Coello Coello, Lamjed Ben Said
Cross-Project Code Smell Detection as a Dynamic Optimization Problem: An Evolutionary Memetic Approach

IEEE Congress on Evolutionary Computation (CEC), 2025

Abstract

Code smells signal poor software design that can hinder maintainability and scalability. Identifying code smells is difficult because of the large volume of code, the considerable cost of detection, and the substantial effort needed for manual labeling. Although current techniques perform well in within-project settings, they frequently struggle to adapt to cross-project environments with varying data distributions. In this paper, we introduce CLADES (Cross-project Learning and Adaptation for Detection of Code Smells), a hybrid evolutionary approach consisting of three main modules: Initialization, Evolution, and Adaptation. The initialization module generates an initial population of decision tree detectors using labeled within-project data and evaluates their quality through fitness functions based on structural code metrics. The evolution module applies genetic operators (selection, crossover, and mutation) to create new offspring solutions. To handle cross-project scenarios, the adaptation module employs a clustering-based instance selection technique that identifies representative instances from new projects; these instances are added to the dataset and used to repair the decision trees through simulated annealing. The locally refined decision trees are then evolved using a genetic algorithm, enabling continuous adaptation to new project instances. The resulting optimized decision tree detectors are finally used to predict labels for the unlabeled instances of the new project. We assess CLADES on five open-source projects and show that it outperforms baseline techniques in terms of weighted F1-score and AUC-PR. These results emphasize its capacity to adjust effectively to different project environments, facilitating precise and scalable detection of code smells while minimizing the need for manual review and contributing to more robust and maintainable software systems.
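The clustering-based instance selection step mentioned in this abstract can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so the code below swaps in plain k-means and keeps, for each cluster, the instance closest to the centroid. The function name `select_representatives`, the deterministic initialization, and the toy data in the usage note are assumptions made for illustration; this is not the CLADES implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def select_representatives(instances: List[Point], k: int, iters: int = 20) -> List[int]:
    """Return the index of one representative instance per k-means cluster.

    Assumption: k-means stands in for the unspecified clustering technique,
    and the first k instances seed the centroids (illustrative only).
    """
    if not 0 < k <= len(instances):
        raise ValueError("k must be between 1 and the number of instances")
    centroids = [list(p) for p in instances[:k]]
    assign = [0] * len(instances)
    for _ in range(iters):
        # Assignment step: attach every instance to its nearest centroid.
        assign = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in instances]
        # Update step: recompute centroids; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, a in zip(instances, assign) if a == c]
            if members:
                centroids[c] = _mean(members)
    representatives = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            # The representative is the cluster member closest to the centroid.
            representatives.append(min(idx, key=lambda i: _dist(instances[i], centroids[c])))
    return sorted(representatives)
```

For two well-separated groups of metric vectors such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the function returns one index from each group. In a cross-project setting, only these selected instances would then need to be labeled and added to the training data.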
2022

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Fabio Palomba, Lamjed Ben Said
A bi-level evolutionary approach for the multi-label detection of smelly classes

Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO), 2022

Abstract

This paper presents a new evolutionary method and tool called BMLDS (Bi-level Multi-Label Detection of Smells) that optimizes a population of classifier chains for the multi-label detection of smells. As a chain is sensitive to the order of its labels (i.e., smell types), the chain induction task is framed as a bi-level optimization problem, where the upper level searches for the optimal label order of each considered chain while the lower level generates the chains. This allows the interactions between smells to be taken into account in the multi-label detection process. The statistical analysis of the experimental results reveals the merits of our proposal with respect to several existing works.
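The nested structure described in this abstract, an upper level choosing a label order and a lower level building a chain for that order, can be sketched as follows. This is only a skeleton under strong assumptions: the upper level is an exhaustive enumeration of permutations rather than an evolutionary search, and `build_chain` and `score` are hypothetical callables supplied by the caller. It is not the BMLDS tool.

```python
from itertools import permutations
from typing import Callable, Hashable, Optional, Sequence, Tuple

Order = Tuple[Hashable, ...]


def bilevel_chain_search(
    labels: Sequence[Hashable],
    build_chain: Callable[[Order], object],
    score: Callable[[object], float],
) -> Tuple[float, Order, object]:
    """Return (quality, label order, chain) for the best-scoring order.

    Assumption: exhaustive enumeration replaces the evolutionary upper level,
    which is only practical for a handful of smell types.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    best: Optional[Tuple[float, Order, object]] = None
    for order in permutations(labels):      # upper level: candidate label order
        chain = build_chain(order)          # lower level: induce a chain for it
        quality = score(chain)              # evaluate the induced chain
        if best is None or quality > best[0]:
            best = (quality, order, chain)
    assert best is not None
    return best
```

In practice `build_chain` would train one classifier per smell type, each receiving the predictions of the classifiers that precede it in the order, and `score` would be a multi-label quality measure computed on validation data.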

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Fabio Palomba, Lamjed Ben Said
Handling uncertainty in SBSE: a possibilistic evolutionary approach for code smells detection

Empirical Software Engineering, 2022

Abstract

Code smells, also known as anti-patterns, are poor design or implementation choices that hinder program comprehensibility and maintainability. While several code smell detection methods have been proposed, Mäntylä et al. identified uncertainty as one of the major individual human factors that may affect developers’ decisions about the smelliness of software classes: developers may hold different opinions, mainly because of their different knowledge and expertise. Unfortunately, almost all existing approaches assume perfect data and neglect uncertainty when identifying the labels of software classes. Ignoring or rejecting any form of uncertainty could lead to a considerable loss of information, which could significantly deteriorate the effectiveness of the detection and identification processes. Inspired by our previous works and motivated by the good performance of the PDT (Possibilistic Decision Tree) in classifying uncertain data, we propose ADIPE (Anti-pattern Detection and Identification using Possibilistic decision tree Evolution), a new tool that evolves and optimizes a set of detectors (PDTs) able to deal effectively with the uncertainty of software class labels using concepts from Possibility theory. ADIPE uses a PBE (Possibilistic Base of Examples: a dataset with possibilistic labels) that is built using a set of opinion-based classifiers (i.e., a set of probabilistic classifiers) with the aim of simulating the uncertainty of human developers. A set of advisors and probabilistic classifiers is employed to mimic the subjectivity and doubtfulness of software engineers. A detailed experimental study is conducted to show the merits of ADIPE in dealing with uncertainty in code smell detection and identification with respect to four relevant state-of-the-art methods, including the baseline PDT.
The experimental study was performed in uncertain and certain environments based on two suitable metrics, PF-measure_dist (Possibilistic F-measure_Distance) and IAC (Information Affinity Criterion), which correspond, respectively, to the F-measure and accuracy (PCC) in the certain case. The results obtained for the uncertain environment reveal that, for the detection process, the PF-measure_dist of ADIPE lies within [0.9047, 0.9285] and its IAC within [0.9288, 0.9557], while for the identification process, the PF-measure_dist of ADIPE lies within [0.8545, 0.9228] and its IAC within [0.8751, 0.933]. ADIPE is able to find 35% more code smells with uncertain data than the second-best algorithm (i.e., BLOP). In addition, ADIPE decreases the number of false alarms (i.e., misclassified smelly instances) by 12%. Our proposed approach is also able to identify 43% more smell types than BLOP and decreases the number of false alarms by 32%. Similar results were obtained for the certain environment, which demonstrates that ADIPE can also deal with certain data.
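To make the idea of possibilistic labels more concrete, the sketch below converts the class probabilities produced by a probabilistic classifier into a possibility distribution using the Dubois-Prade transformation, where the possibility of a class is the sum of all probabilities that do not exceed its own. The abstract does not say which transformation ADIPE applies when building its PBE, so this choice, the function name `prob_to_poss`, and the example labels are assumptions.

```python
from typing import Dict


def prob_to_poss(probs: Dict[str, float]) -> Dict[str, float]:
    """Map class probabilities to possibility degrees (Dubois-Prade transformation).

    Assumption: this transformation is used for illustration only; the
    source does not state how the possibilistic labels are derived.
    """
    return {
        label: sum(q for q in probs.values() if q <= p)
        for label, p in probs.items()
    }


# A classifier that is 75% sure a class is smelly yields a label that is
# fully possible for "smelly" and possible to degree 0.25 for "non-smelly".
label = prob_to_poss({"smelly": 0.75, "non-smelly": 0.25})
```

The most probable class always receives a possibility of 1, while less probable classes keep a non-zero degree, so doubtful instances stay in the dataset instead of being rejected.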

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Carlos A Coello Coello, Lamjed Ben Said
Uncertainty-wise software anti-patterns detection: A possibilistic evolutionary machine learning approach

Applied Soft Computing, 2022

Abstract

Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that can deteriorate software maintainability and evolution. Existing works have not taken into account the issue of uncertain class labels, which is an important inherent characteristic of the smell detection problem. More precisely, two human experts may have different degrees of uncertainty about the smelliness of a particular software class, not only for the smell detection task but also for the smell type identification task. Unfortunately, existing approaches usually reject or ignore uncertain data, that is, software classes (i.e., dataset instances) with uncertain labels. Discarding or disregarding the uncertainty factor could considerably degrade the effectiveness of the detection and identification processes. From a solution viewpoint, no work in the literature has proposed a method that is able to detect or identify code smells while preserving the uncertainty aspect. The main goal of our research is to handle the uncertainty that stems from human experts when detecting or identifying code smells, by proposing an evolutionary approach that is able to classify anti-patterns with uncertain labels. We propose Bi-ADIPOK, an effective search-based tool that is capable of tackling this challenge for both the detection and identification cases. The proposed method is an EA (Evolutionary Algorithm) that optimizes a set of detectors encoded as PK-NNs (Possibilistic K-nearest neighbors) based on a bi-level hierarchy, in which the upper level finds the optimal PK-NN parameters while the lower level generates the PK-NNs. A new fitness function, PomAURPC-OVA_dist (Possibilistic modified Area Under Recall Precision Curve One-Versus-All_distance, abbreviated PAURPC_d in this paper), is also proposed.
Bi-ADIPOK is able to deal with label uncertainty using concepts stemming from Possibility Theory. Furthermore, PomAURPC-OVA_dist is capable of handling uncertainty even with imbalanced data. Note that Bi-ADIPOK is first built and then validated using a possibilistic base of smell examples that simulates the subjectivity of software engineers’ opinions. The statistical analysis of the results obtained in a set of comparative experiments against four relevant state-of-the-art methods shows the merits of our proposal. The detection results demonstrate that, for the uncertain environment, the PomAURPC-OVA_dist of Bi-ADIPOK ranges between 0.902 and 0.932 and its IAC lies between 0.9108 and 0.9407, while for the certain environment, the PomAURPC-OVA_dist lies between 0.928 and 0.955 and the IAC ranges between 0.9477 and 0.9622. Similarly, the identification results for the uncertain environment indicate that the PomAURPC-OVA_dist of Bi-ADIPOK varies between 0.8576 and 0.9273 and its IAC lies between 0.8693 and 0.9318. For the certain environment, the PomAURPC-OVA_dist lies between 0.8613 and 0.9351 and the IAC values are between 0.8672 and 0.9476. With uncertain data, Bi-ADIPOK can find 35% more code smells than the second-best approach (i.e., BLOP). Furthermore, Bi-ADIPOK reduces the number of false alarms (i.e., misclassified smelly instances) by 12%. In addition, our proposed approach can identify 43% more smell types than BLOP and reduces the number of false alarms by 32%. Similar results were obtained for the certain environment, demonstrating Bi-ADIPOK’s ability to deal with such an environment.
2021

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Chih-Cheng Hung, Lamjed Ben Said
Software Anti-patterns Detection Under Uncertainty Using a Possibilistic Evolutionary Approach

24th European Conference on Genetic Programming, 2021

Abstract

Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that could deteriorate software maintainability and evolution. Despite the high number of existing detection methods, the issue of class label uncertainty is usually omitted. Indeed, two human experts may have different degrees of uncertainty about the smelliness of a particular software class, not only for the smell detection task but also for the smell type identification task. This uncertainty should therefore be taken into account and processed by detection tools. Unfortunately, such tools usually reject or ignore uncertain data, that is, software classes (i.e., dataset instances) with uncertain labels. This practice could considerably degrade the effectiveness of the detection and identification processes. Motivated by this observation and by the good performance of the Possibilistic K-NN (PK-NN) classifier in dealing with uncertain data, we propose a new possibilistic evolutionary detection method, named ADIPOK (Anti-patterns Detection and Identification using Possibilistic Optimized K-NNs), that is able to deal with label uncertainty using concepts stemming from Possibility theory. ADIPOK is validated using a possibilistic base of smell examples that simulates the subjectivity and uncertainty of software engineers’ opinions. The statistical analysis of the results obtained in a set of comparative experiments against four state-of-the-art methods shows the merits of our proposed method.

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Mohamed Makhlouf, Lamjed Ben Said
Dealing with Label Uncertainty in Web Service Anti-patterns Detection using a Possibilistic Evolutionary Approach

IEEE International Conference on Web Services (ICWS), 2021

Abstract

As with any software, developers of Web Services (WSs) may introduce anti-patterns through lack of experience or badly planned changes. During the last decade, search-based approaches have outperformed other approaches, mainly thanks to their global search ability. Unfortunately, these approaches do not consider the uncertainty of class labels. In fact, two experts could be uncertain not only about the smelliness of a particular WS interface but also about the smell type. Existing works currently reject uncertain data that correspond to WS interfaces with doubtful labels. Motivated by this observation and by the good performance of the possibilistic K-NN classifier in handling uncertain data, we propose a new evolutionary detection approach, named Web Services Anti-patterns Detection and Identification using Possibilistic Optimized K-NNs (WS-ADIPOK), which copes with uncertainty using Possibility Theory. The experimental results reveal the merits of our proposal with respect to four relevant state-of-the-art approaches.

Sofian Boutaib, Maha Elarbi, Slim Bechikh, Fabio Palomba, Lamjed Ben Said
A Possibilistic Evolutionary Approach to Handle the Uncertainty of Software Metrics Thresholds in Code Smells Detection

IEEE International Conference on Software Quality, Reliability and Security (QRS), 2021

Abstract

A code smell detection rule is a combination of metrics with their corresponding crisp thresholds and labels. The goal of this paper is to deal with the uncertainty of metric thresholds, since such thresholds usually cannot be determined exactly when judging the smelliness of a particular software class. To deal with this issue, we first propose to encode each metric value into a binary possibility distribution with respect to a threshold computed from a discretization technique, using the Possibilistic C-means classifier. Then, we propose ADIPOK-UMT, an evolutionary algorithm that evolves a population of PK-NN classifiers for the detection of smells under threshold uncertainty. The experimental results reveal that the possibility distribution-based encoding allows the implicit weighting of software metrics (features) with respect to their computed discretization thresholds. Moreover, ADIPOK-UMT is shown to outperform four relevant state-of-the-art approaches on a set of commonly adopted benchmark software systems.
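The encoding of a metric value into a binary possibility distribution can be illustrated with the typicality formula of Possibilistic C-means, t = 1 / (1 + (d² / η)^(1/(m−1))), where d is the distance to a cluster center, η a bandwidth parameter, and m the fuzzifier. The sketch below applies it to two cluster centers on either side of a threshold. The function names, the shared η, and the example centers are assumptions; the abstract does not give the exact encoding used by ADIPOK-UMT.

```python
from typing import Dict


def pcm_typicality(value: float, center: float, eta: float, m: float = 2.0) -> float:
    """Possibilistic C-means typicality of a value with respect to one cluster center."""
    if eta <= 0 or m <= 1:
        raise ValueError("eta must be positive and m must be greater than 1")
    squared_distance = (value - center) ** 2
    return 1.0 / (1.0 + (squared_distance / eta) ** (1.0 / (m - 1.0)))


def encode_metric(value: float, low_center: float, high_center: float,
                  eta: float, m: float = 2.0) -> Dict[str, float]:
    """Encode a metric value as possibility degrees over two intervals.

    Assumption: "low" and "high" stand for the two sides of a discretization
    threshold, and both clusters share the same eta (illustrative only).
    """
    return {
        "low": pcm_typicality(value, low_center, eta, m),
        "high": pcm_typicality(value, high_center, eta, m),
    }
```

With hypothetical centers 10 and 14 and η = 4, a metric value of 10 is fully typical of the low side and only weakly typical of the high side, whereas a value close to the threshold receives comparable degrees on both sides instead of being forced into one crisp interval.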
2020

Sofian Boutaib, Slim Bechikh, Fabio Palomba, Maha Elarbi, Mohamed Makhlouf, Lamjed Ben Said
Code smell detection and identification in imbalanced environments

Expert Systems with Applications, 2020

Abstract

Code smells are sub-optimal design choices that could lower software maintainability. Previous literature did not consider an important characteristic of the smell detection problem, namely data imbalance. When considering a high number of code smell types, the number of smelly classes is likely to largely exceed the number of non-smelly ones, and vice versa. Moreover, most studies did not address the smell identification problem, which is more likely to present a higher imbalance, as the number of smelly classes is much smaller than the number of non-smelly ones. A further research gap in the literature is that the number of smell type identification methods is very small compared to the number of detection methods. The main challenges in smell detection and identification in an imbalanced environment are: (1) the structuring of the smell detector, which should be able to deal with complex splitting boundaries and small disjuncts, (2) the design of the detector quality evaluation function, which should take data imbalance into account, and (3) the efficient search for effective software metric thresholds that characterize the different smells well. We propose ADIODE, an effective search-based engine that is able to deal with all the above-described challenges, not only for smell detection but also for identification. ADIODE is an EA (Evolutionary Algorithm) that evolves a population of detectors encoded as ODTs (Oblique Decision Trees) using the F-measure as a fitness function. This allows ADIODE to efficiently approximate globally optimal detectors with effective oblique splitting hyperplanes and metric thresholds.
We note that, to build the base of examples (BE), each software class is parsed using a dedicated tool to extract its metric values, based on which the class is labeled by means of a set of existing advisors; this can be seen as a two-step construction process. A comparative experimental study on six open-source software systems demonstrates the merits of our approach compared to four of the most representative and prominent baseline techniques available in the literature. The detection results show that the F-measure of ADIODE ranges between 91.23% and 95.24%, and its AUC lies between 0.9273 and 0.9573. Similarly, the identification results indicate that the F-measure of ADIODE varies between 86.26% and 94.5%, and its AUC is between 0.8653 and 0.9531.
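Two ingredients named in this abstract lend themselves to a short illustration: an oblique split, which tests a weighted combination of several metrics against a threshold instead of a single metric, and the F-measure used as fitness. The sketch below is a minimal version of both. The function names, the weights, and the two metrics in the usage note are hypothetical, and the code is not taken from ADIODE.

```python
from typing import Sequence


def oblique_split(x: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Return True if x lies on the positive side of the hyperplane w.x + b = 0.

    An axis-parallel split is the special case where a single weight is non-zero.
    """
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure of the positive (smelly) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    # Equivalent to the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn)
```

For example, with two hypothetical metrics and weights `(1.0, -1.0)`, the split separates classes whose first metric exceeds the second from the rest, a boundary that a single-metric threshold cannot express. Because the F-measure ignores true negatives, a detector that labels everything as non-smelly scores 0 even when non-smelly classes dominate the data, which is why it suits imbalanced settings better than plain accuracy.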

Sofian Boutaib, Slim Bechikh, Carlos A Coello Coello, Chih-Cheng Hung, Lamjed Ben Said
Handling uncertainty in code smells detection using a possibilistic SBSE approach

Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, 2020

Abstract

Code smells, also known as anti-patterns, are indicators of bad design solutions. However, two different experts may have different opinions not only about the smelliness of a particular software class but also about the smell type. This causes an uncertainty problem that should be taken into account. Unfortunately, existing works reject uncertain data that correspond to software classes with doubtful labels. Rejecting uncertain data could cause a significant loss of information that could considerably degrade the performance of the detection process. Motivated by this observation and by the good performance of the possibilistic K-NN classifier in handling uncertain data, we propose in this paper a new evolutionary detection method, named ADIPOK (Anti-pattern Detection and Identification using Possibilistic Optimized K-NN), that is able to cope with the uncertainty factor using possibility theory. The comparative experimental results reveal the merits of our proposal with respect to four relevant state-of-the-art approaches.
