Impact of the Distribution Parameter of Data Sampling Approaches on Software Defect Prediction Models

Kwabena Ebo Bennin, Jacky Keung, Akito Monden

Research output

7 Citations (Scopus)

Abstract

Sampling methods are known to impact defect prediction performance. These sampling methods have configurable parameters that can significantly affect prediction performance. It is, however, impractical to assess the effect of every possible setting in the parameter space for each of the many existing sampling methods. One parameter that is present in all sampling methods and is easy to tweak is the distribution of defective and non-defective modules in the dataset, known as Pfp (% of fault-prone modules). In this paper, we investigate and assess the performance of defect prediction models when the Pfp parameter of sampling methods is tweaked. An empirical experiment and assessment of seven sampling methods on five prediction models over 20 releases of 10 static metric projects indicate that (1) Area Under the Receiver Operating Characteristic Curve (AUC) performance is not improved after tweaking the Pfp parameter, (2) pf (false alarm) performance degrades as Pfp is increased, and (3) a stable predictor is difficult to achieve across different Pfp rates. Hence, we conclude that the Pfp parameter setting can have a large impact on the performance (except AUC) of defect prediction models. We therefore recommend that researchers experiment with the Pfp parameter of the sampling method, since the distribution of training datasets varies.
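The abstract defines Pfp as the percentage of fault-prone modules in the training data and reports pf (false alarms) as one of the affected measures. The sketch below is illustrative only and is not the authors' procedure: it assumes plain random oversampling (duplicating defective modules) as one way to reach a target Pfp, and the helper names `pfp`, `oversample_to_pfp` and `false_alarm_rate` are hypothetical. pf is computed with the usual definition FP / (FP + TN).

```python
import math
import random


def pfp(labels):
    """Percentage of fault-prone (defective, label 1) modules in a dataset."""
    if not labels:
        raise ValueError("labels must not be empty")
    return 100.0 * sum(labels) / len(labels)


def oversample_to_pfp(rows, labels, target_pfp, seed=0):
    """Hypothetical helper: randomly duplicate defective modules until they
    make up roughly target_pfp % of the data (plain random oversampling,
    assumed here for illustration only)."""
    if not 0 < target_pfp < 100:
        raise ValueError("target_pfp must be between 0 and 100 (exclusive)")
    rng = random.Random(seed)
    defective = [i for i, y in enumerate(labels) if y == 1]
    n_clean = len(labels) - len(defective)
    # Solve d / (d + n_clean) = p for d, the number of defective modules needed.
    p = target_pfp / 100.0
    needed = math.ceil(p * n_clean / (1 - p))
    extra = max(0, needed - len(defective))
    if extra and not defective:
        raise ValueError("cannot oversample: no defective modules to duplicate")
    picks = [rng.choice(defective) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [1] * extra
    return new_rows, new_labels


def false_alarm_rate(y_true, y_pred):
    """pf = FP / (FP + TN): share of non-defective modules flagged as defective."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if fp + tn else 0.0
```

For example, a release with 2 defective and 8 non-defective modules has a Pfp of 20%; oversampling to a target of 50% adds 6 duplicated defective modules. Other sampling methods, such as undersampling or synthetic oversampling, would reach the same target Pfp by different means.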

Original language: English
Host publication title: Proceedings - 24th Asia-Pacific Software Engineering Conference, APSEC 2017
Editors: Jian Lv, He Zhang, Xiao Liu, Mike Hinchey
Publisher: IEEE Computer Society
Pages: 630-635
Number of pages: 6
ISBN (electronic): 9781538636817
DOI
Publication status: Published - Mar 1 2018
Event: 24th Asia-Pacific Software Engineering Conference, APSEC 2017 - Nanjing, Jiangsu
Duration: Dec 4 2017 → Dec 8 2017

Publication series

Name: Proceedings - Asia-Pacific Software Engineering Conference, APSEC
Volume: 2017-December
ISSN (print): 1530-1362

Other

Other: 24th Asia-Pacific Software Engineering Conference, APSEC 2017
Country/Territory: China
City: Nanjing, Jiangsu
Period: 12/4/17 → 12/8/17

ASJC Scopus subject areas

  • Software

