Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models

Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Yasutaka Kamei, Naoyasu Ubayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

To prioritize software quality assurance efforts, fault prediction models have been proposed to distinguish faulty modules from clean modules. The performance of such models is often biased due to the skewness or class imbalance of the datasets considered. To improve the prediction performance of these models, sampling techniques have been employed to rebalance the distribution of fault-prone and non-fault-prone modules. The effects of these techniques have been evaluated in terms of accuracy, geometric mean, and F1-measure in previous studies; however, these measures do not consider the effort needed to fix faults. To empirically investigate the effect of sampling techniques on the performance of software fault prediction models in a more realistic setting, this study employs Norm(Popt), an effort-aware measure that considers the testing effort. We performed two sets of experiments aimed at (1) assessing the effects of sampling techniques on effort-aware models and finding the appropriate class distribution for training datasets, and (2) investigating the role of balanced training and testing datasets on the performance of predictive models. Of the four sampling techniques applied, the over-sampling techniques outperformed the under-sampling techniques, with Random Over-sampling performing best with respect to the Norm(Popt) evaluation measure. Also, the performance of all the prediction models improved when sampling techniques were applied at rates of 20-30% on the training datasets, implying that a strictly balanced dataset (50% faulty modules and 50% clean modules) does not result in the best performance for effort-aware models. Our results also indicate that the performance of effort-aware models depends significantly on the proportions of the two classes in the testing dataset. Models trained on moderately balanced datasets are more likely to withstand fluctuations in performance as the class distribution in the testing data varies.
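The effort-aware evaluation the abstract describes can be sketched as follows. Norm(Popt) is commonly computed from the cumulative percentage of faults found versus the cumulative percentage of inspection effort (typically lines of code) as modules are inspected in the model's predicted order, normalized between the optimal and worst possible orderings. The helper below is a hypothetical illustration of that idea under these assumptions, not the authors' implementation; the name `norm_popt` and its inputs are our own.

```python
def norm_popt(loc, faults, scores):
    """Sketch of the effort-aware Norm(Popt) measure: the area under the
    cumulative %-of-faults vs. %-of-effort (LOC) curve for the model's
    ranking, normalized between the worst and optimal rankings."""
    def curve_area(order):
        total_loc, total_faults = sum(loc), sum(faults)
        x = y = area = 0.0
        for i in order:
            nx = x + loc[i] / total_loc        # effort spent so far
            ny = y + faults[i] / total_faults  # faults found so far
            area += (nx - x) * (y + ny) / 2.0  # trapezoidal rule
            x, y = nx, ny
        return area

    idx = range(len(loc))
    model = sorted(idx, key=lambda i: scores[i], reverse=True)
    # The optimal ranking inspects the most fault-dense modules first;
    # the worst ranking is its reverse.
    optimal = sorted(idx, key=lambda i: faults[i] / loc[i], reverse=True)
    worst = list(reversed(optimal))
    return (curve_area(model) - curve_area(worst)) / (
        curve_area(optimal) - curve_area(worst))
```

A ranking that matches the fault-density ordering scores 1.0 and its reverse scores 0.0, so higher values mean more faults found for less inspection effort.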

Original language: English
Title of host publication: Proceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016
Publisher: IEEE Computer Society
Pages: 154-163
Number of pages: 10
Volume: 1
ISBN (Electronic): 9781467388450
DOIs: 10.1109/COMPSAC.2016.144
Publication status: Published - Aug 24, 2016
Event: 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016 - Atlanta, United States
Duration: Jun 10, 2016 - Jun 14, 2016

Keywords

  • class imbalance
  • empirical study
  • sampling techniques
  • software fault prediction
  • software quality

ASJC Scopus subject areas

  • Software

Cite this

Bennin, K. E., Keung, J., Monden, A., Kamei, Y., & Ubayashi, N. (2016). Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models. In Proceedings - 2016 IEEE 40th Annual Computer Software and Applications Conference, COMPSAC 2016 (Vol. 1, pp. 154-163). [7552003] IEEE Computer Society. https://doi.org/10.1109/COMPSAC.2016.144
