The Significant Effects of Data Sampling Approaches on Software Defect Prioritization and Classification

Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Passakorn Phannachitta, Solomon Mensah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Context: Recent studies have shown that the performance of defect prediction models can be affected when data sampling approaches are applied to the imbalanced training data used to build them. However, the magnitude (degree and power) of the effect of these sampling methods on the classification and prioritization performance of defect prediction models is still unknown. Goal: To investigate the statistical and practical significance of using resampled data to construct defect prediction models. Method: We examine the practical effects of six data sampling methods on the performance of five defect prediction models. The prediction performance of models trained on default datasets (no sampling method) is compared with that of models trained on resampled datasets (with sampling methods applied). To decide whether the performance changes are significant, robust statistical tests are performed and effect sizes are computed. Twenty releases of ten open source projects extracted from the PROMISE repository are considered and evaluated using the AUC, pd, pf and G-mean performance measures. Results: There are statistically significant differences and practical effects on classification performance (pd, pf and G-mean) between models trained on resampled datasets and those trained on the default datasets. However, sampling methods have no statistically or practically significant effect on defect prioritization performance (AUC), with small or negligible effect sizes obtained from the models trained on the resampled datasets. Conclusions: Existing sampling methods can properly set the threshold between buggy and clean samples, but they cannot improve the prediction of defect-proneness itself. Sampling methods are highly recommended for defect classification purposes when all faulty modules are to be considered for testing.
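The Method described in the abstract can be read as a small train/evaluate loop over default and resampled training data. The sketch below is not the authors' code: SMOTE stands in for one of the six sampling methods, a random forest for one of the five prediction models, and a synthetic dataset for a PROMISE release; it only illustrates how pd, pf, G-mean and AUC would be obtained for each training-data variant.

# Minimal sketch (assumptions noted above) of the default-vs-resampled comparison.
import numpy as np
from imblearn.over_sampling import SMOTE              # pip install imbalanced-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

def classification_scores(y_true, y_pred):
    """pd (probability of detection), pf (probability of false alarm) and their G-mean."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    pd_ = tp / (tp + fn)
    pf = fp / (fp + tn)
    g_mean = np.sqrt(pd_ * (1.0 - pf))
    return pd_, pf, g_mean

# Synthetic stand-in for an imbalanced defect dataset (~15% buggy modules).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for label, (X_fit, y_fit) in {
    "default":   (X_tr, y_tr),                                   # no sampling method
    "resampled": SMOTE(random_state=0).fit_resample(X_tr, y_tr), # sampling method applied
}.items():
    model = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)
    pd_, pf, g_mean = classification_scores(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])   # prioritization measure
    print(f"{label:9s}  pd={pd_:.2f}  pf={pf:.2f}  G-mean={g_mean:.2f}  AUC={auc:.2f}")

In the study itself, such scores would be collected over the twenty releases and the paired default/resampled results compared with a robust statistical test and an effect-size measure; scipy.stats.wilcoxon, for example, handles the paired comparison.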

Original language: English
Title of host publication: Proceedings - 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017
Publisher: IEEE Computer Society
Pages: 364-373
Number of pages: 10
Volume: 2017-November
ISBN (Electronic): 9781509040391
DOIs: https://doi.org/10.1109/ESEM.2017.50
Publication status: Published - Dec 7 2017
Event: 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017 - Toronto, Canada
Duration: Nov 9 2017 - Nov 10 2017

Other

Other: 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017
Country: Canada
City: Toronto
Period: 11/9/17 - 11/10/17


Keywords

  • Defect prediction
  • Empirical software engineering
  • Imbalanced data
  • Sampling methods
  • Statistical significance

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Bennin, K. E., Keung, J., Monden, A., Phannachitta, P., & Mensah, S. (2017). The Significant Effects of Data Sampling Approaches on Software Defect Prioritization and Classification. In Proceedings - 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017 (Vol. 2017-November, pp. 364-373). IEEE Computer Society. https://doi.org/10.1109/ESEM.2017.50
