On the relative value of data resampling approaches for software defect prediction

Kwabena Ebo Bennin, Jacky W. Keung, Akito Monden

Research output: Contribution to journalArticle

2 Citations (Scopus)

Abstract

Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both classes in the data set to be equally balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority class distribution. However, very little is known about the best distribution for attaining high performance especially in a more practical scenario. There are still inconclusive results pertaining to the suitable ratio of defect and clean instances (Pfp), the statistical and practical impacts of resampling approaches on prediction performance and the more stable resampling approach across several performance measures. To assess the impact of resampling approaches, we investigated the bias and effect of commonly used resampling approaches on prediction accuracy in software defect prediction. Analyzes of six resampling approaches on 40 releases of 20 open-source projects across five performance measures and five imbalance rates were performed. The experimental results obtained indicate that there were statistical differences between the prediction results with and without resampling methods when evaluated with the geometric-mean, recall(pd), probability of false alarms(pf ) and balance performance measures. However, resampling methods could not improve the AUC values across all prediction models implying that resampling methods can help in defect classification but not defect prioritization. A stable Pfp rate was dependent on the performance measure used. Lower Pfp rates are required for lower pf values while higher Pfp values are required for higher pd values. Random Under-Sampling and Borderline-SMOTE proved to be the more stable resampling method across several performance measures among the studied resampling methods. Performance of resampling methods are dependent on the imbalance ratio, evaluation measure and to some extent the prediction model. Newer oversampling methods should aim at generating relevant and informative data samples and not just increasing the minority samples.

Original languageEnglish
Pages (from-to)1-35
Number of pages35
JournalEmpirical Software Engineering
DOIs
Publication statusAccepted/In press - Jun 21 2018

Fingerprint

Defects
Sampling

Keywords

  • Class imbalance
  • Data resampling approaches
  • Empirical study
  • Imbalanced data
  • Software defect prediction

ASJC Scopus subject areas

  • Software

Cite this

On the relative value of data resampling approaches for software defect prediction. / Bennin, Kwabena Ebo; Keung, Jacky W.; Monden, Akito.

In: Empirical Software Engineering, 21.06.2018, p. 1-35.

Research output: Contribution to journalArticle

@article{5f6808a977ef41299f1784401191d6c4,
title = "On the relative value of data resampling approaches for software defect prediction",
abstract = "Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both classes in the data set to be equally balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority class distribution. However, very little is known about the best distribution for attaining high performance especially in a more practical scenario. There are still inconclusive results pertaining to the suitable ratio of defect and clean instances (Pfp), the statistical and practical impacts of resampling approaches on prediction performance and the more stable resampling approach across several performance measures. To assess the impact of resampling approaches, we investigated the bias and effect of commonly used resampling approaches on prediction accuracy in software defect prediction. Analyzes of six resampling approaches on 40 releases of 20 open-source projects across five performance measures and five imbalance rates were performed. The experimental results obtained indicate that there were statistical differences between the prediction results with and without resampling methods when evaluated with the geometric-mean, recall(pd), probability of false alarms(pf ) and balance performance measures. However, resampling methods could not improve the AUC values across all prediction models implying that resampling methods can help in defect classification but not defect prioritization. A stable Pfp rate was dependent on the performance measure used. Lower Pfp rates are required for lower pf values while higher Pfp values are required for higher pd values. Random Under-Sampling and Borderline-SMOTE proved to be the more stable resampling method across several performance measures among the studied resampling methods. Performance of resampling methods are dependent on the imbalance ratio, evaluation measure and to some extent the prediction model. Newer oversampling methods should aim at generating relevant and informative data samples and not just increasing the minority samples.",
keywords = "Class imbalance, Data resampling approaches, Empirical study, Imbalanced data, Software defect prediction",
author = "Bennin, {Kwabena Ebo} and Keung, {Jacky W.} and Akito Monden",
year = "2018",
month = "6",
day = "21",
doi = "10.1007/s10664-018-9633-6",
language = "English",
pages = "1--35",
journal = "Empirical Software Engineering",
issn = "1382-3256",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - On the relative value of data resampling approaches for software defect prediction

AU - Bennin, Kwabena Ebo

AU - Keung, Jacky W.

AU - Monden, Akito

PY - 2018/6/21

Y1 - 2018/6/21

N2 - Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both classes in the data set to be equally balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority class distribution. However, very little is known about the best distribution for attaining high performance especially in a more practical scenario. There are still inconclusive results pertaining to the suitable ratio of defect and clean instances (Pfp), the statistical and practical impacts of resampling approaches on prediction performance and the more stable resampling approach across several performance measures. To assess the impact of resampling approaches, we investigated the bias and effect of commonly used resampling approaches on prediction accuracy in software defect prediction. Analyzes of six resampling approaches on 40 releases of 20 open-source projects across five performance measures and five imbalance rates were performed. The experimental results obtained indicate that there were statistical differences between the prediction results with and without resampling methods when evaluated with the geometric-mean, recall(pd), probability of false alarms(pf ) and balance performance measures. However, resampling methods could not improve the AUC values across all prediction models implying that resampling methods can help in defect classification but not defect prioritization. A stable Pfp rate was dependent on the performance measure used. Lower Pfp rates are required for lower pf values while higher Pfp values are required for higher pd values. Random Under-Sampling and Borderline-SMOTE proved to be the more stable resampling method across several performance measures among the studied resampling methods. Performance of resampling methods are dependent on the imbalance ratio, evaluation measure and to some extent the prediction model. Newer oversampling methods should aim at generating relevant and informative data samples and not just increasing the minority samples.

AB - Software defect data sets are typically characterized by an unbalanced class distribution where the defective modules are fewer than the non-defective modules. Prediction performances of defect prediction models are detrimentally affected by the skewed distribution of the faulty minority modules in the data set since most algorithms assume both classes in the data set to be equally balanced. Resampling approaches address this concern by modifying the class distribution to balance the minority and majority class distribution. However, very little is known about the best distribution for attaining high performance especially in a more practical scenario. There are still inconclusive results pertaining to the suitable ratio of defect and clean instances (Pfp), the statistical and practical impacts of resampling approaches on prediction performance and the more stable resampling approach across several performance measures. To assess the impact of resampling approaches, we investigated the bias and effect of commonly used resampling approaches on prediction accuracy in software defect prediction. Analyzes of six resampling approaches on 40 releases of 20 open-source projects across five performance measures and five imbalance rates were performed. The experimental results obtained indicate that there were statistical differences between the prediction results with and without resampling methods when evaluated with the geometric-mean, recall(pd), probability of false alarms(pf ) and balance performance measures. However, resampling methods could not improve the AUC values across all prediction models implying that resampling methods can help in defect classification but not defect prioritization. A stable Pfp rate was dependent on the performance measure used. Lower Pfp rates are required for lower pf values while higher Pfp values are required for higher pd values. Random Under-Sampling and Borderline-SMOTE proved to be the more stable resampling method across several performance measures among the studied resampling methods. Performance of resampling methods are dependent on the imbalance ratio, evaluation measure and to some extent the prediction model. Newer oversampling methods should aim at generating relevant and informative data samples and not just increasing the minority samples.

KW - Class imbalance

KW - Data resampling approaches

KW - Empirical study

KW - Imbalanced data

KW - Software defect prediction

UR - http://www.scopus.com/inward/record.url?scp=85048837828&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048837828&partnerID=8YFLogxK

U2 - 10.1007/s10664-018-9633-6

DO - 10.1007/s10664-018-9633-6

M3 - Article

AN - SCOPUS:85048837828

SP - 1

EP - 35

JO - Empirical Software Engineering

JF - Empirical Software Engineering

SN - 1382-3256

ER -