TY - GEN
T1 - Using Bandit Algorithms for Project Selection in Cross-Project Defect Prediction
AU - Asano, Takuya
AU - Tsunoda, Masateru
AU - Toda, Koji
AU - Tahir, Amjed
AU - Bennin, Kwabena Ebo
AU - Nakasai, Keitaro
AU - Monden, Akito
AU - Matsumoto, Kenichi
N1 - Funding Information:
This research is partially supported by the Japan Society for the Promotion of Science [Grants-in-Aid for Scientific Research (C) and (S) (No. 21K11840 and No. 20H05706)].
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Background: A defect prediction model is built using historical data from previous versions/releases of the same project. However, such historical data may not exist in the case of newly developed projects. Alternatively, one can train a model using data obtained from external projects. This approach is known as cross-project defect prediction (CPDP). In CPDP, it is still difficult to utilize external projects' data or to decide which particular project to use to train a model. Aim: To address this issue, we apply a bandit algorithm (BA) to CPDP in order to select the most suitable training project from a set of projects. Method: BA-based prediction iteratively reselects the project after each module is tested, considering the accuracy of the predictions. As baselines, we used simple CPDP methods such as training a model with a randomly selected project. All models were built using logistic regression. Results: We evaluated our approach on two datasets (NASA and DAMB, with a total of 12 projects). The BA-based defect prediction models achieved, on average, higher accuracy (AUC and F1 score) than the baselines. Conclusion: In this preliminary study, we demonstrate the feasibility of using BA in the context of CPDP. Our initial assessment shows that the use of BA for predicting defects in CPDP is promising and may outperform existing approaches.
AB - Background: A defect prediction model is built using historical data from previous versions/releases of the same project. However, such historical data may not exist in the case of newly developed projects. Alternatively, one can train a model using data obtained from external projects. This approach is known as cross-project defect prediction (CPDP). In CPDP, it is still difficult to utilize external projects' data or to decide which particular project to use to train a model. Aim: To address this issue, we apply a bandit algorithm (BA) to CPDP in order to select the most suitable training project from a set of projects. Method: BA-based prediction iteratively reselects the project after each module is tested, considering the accuracy of the predictions. As baselines, we used simple CPDP methods such as training a model with a randomly selected project. All models were built using logistic regression. Results: We evaluated our approach on two datasets (NASA and DAMB, with a total of 12 projects). The BA-based defect prediction models achieved, on average, higher accuracy (AUC and F1 score) than the baselines. Conclusion: In this preliminary study, we demonstrate the feasibility of using BA in the context of CPDP. Our initial assessment shows that the use of BA for predicting defects in CPDP is promising and may outperform existing approaches.
KW - CPFP
KW - external validity
KW - fault prediction
KW - multi-armed bandit
KW - online optimization
KW - risk-based testing
UR - http://www.scopus.com/inward/record.url?scp=85123343328&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123343328&partnerID=8YFLogxK
U2 - 10.1109/ICSME52107.2021.00074
DO - 10.1109/ICSME52107.2021.00074
M3 - Conference contribution
AN - SCOPUS:85123343328
T3 - Proceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
SP - 649
EP - 653
BT - Proceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
Y2 - 27 September 2021 through 1 October 2021
ER -