TY - JOUR
T1 - Influence of outliers on estimation accuracy of software development effort
AU - Ono, Kenichi
AU - Tsunoda, Masateru
AU - Monden, Akito
AU - Matsumoto, Kenichi
N1 - Funding Information:
This research was partially supported by the Japan Society for the Promotion of Science (JSPS) [Grants-in-Aid for Scientific Research (A) (No.17H00731)]
Publisher Copyright:
Copyright © 2021 The Institute of Electronics, Information and Communication Engineers
PY - 2021/1/1
Y1 - 2021/1/1
N2 - When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
AB - When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
KW - Case-based reasoning
KW - Effort prediction
KW - Multiple linear regression
KW - Outlier
UR - http://www.scopus.com/inward/record.url?scp=85099248776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099248776&partnerID=8YFLogxK
U2 - 10.1587/transinf.2020MPP0005
DO - 10.1587/transinf.2020MPP0005
M3 - Article
AN - SCOPUS:85099248776
VL - E104D
SP - 91
EP - 105
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
SN - 0916-8532
IS - 1
ER -