MAHAKIL: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction

Kwabena E. Bennin, Jacky Keung, Passakorn Phannachitta, Akito Monden, Solomon Mensah

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

5 Citations (Scopus)

Abstract

This study presents MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets that is based on the chromosomal theory of inheritance. Exploiting this theory, MAHAKIL interprets two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent, thereby contributing to the diversity of the data distribution. We extensively compare MAHAKIL with five other sampling approaches, using 20 releases of defect datasets from the PROMISE repository and five prediction models. Our experiments indicate that MAHAKIL improves the prediction performance of all the models and, according to robust statistical tests, achieves significantly better pf (probability of false alarm) values than the other oversampling approaches.

Original language: English
Title of host publication: Proceedings of the 40th International Conference on Software Engineering, ICSE 2018
Publisher: IEEE Computer Society
Number of pages: 1
Volume: Part F137142
ISBN (Electronic): 9781450356381
Publication status: Published - May 27 2018
Event: 40th International Conference on Software Engineering, ICSE 2018 - Gothenburg, Sweden
Duration: May 27 2018 to Jun 3 2018


Keywords

  • Class imbalance learning
  • Classification problems
  • Data sampling methods
  • Software defect prediction
  • Synthetic sample generation

ASJC Scopus subject areas

  • Software
