TY - GEN
T1 - An empirical evaluation of outlier deletion methods for analogy-based cost estimation
AU - Tsunoda, Masateru
AU - Monden, Akito
AU - Kakimoto, Takeshi
AU - Matsumoto, Ken Ichi
PY - 2011
Y1 - 2011
AB - Background: Software project datasets sometimes include outliers that affect the accuracy of effort estimation. Outlier deletion methods are often used to eliminate them. However, few case studies apply outlier deletion methods to analogy-based estimation, so it is not clear which method is most suitable for it. Aim: To clarify the effects of existing outlier deletion methods (Cook's distance based deletion, LTS based deletion, k-means based deletion, Mantel's correlation based deletion, and EID based deletion) and our method on analogy-based estimation. Method: In the experiment, outlier deletion methods were applied to three datasets (the ISBSG, Kitchenham, and Desharnais datasets), and their estimation accuracy was evaluated based on BRE (Balanced Relative Error). Our method eliminates outliers from the neighborhoods of a target project when their effort is extremely different from that of the other neighborhoods. Results: Deletion methods designed for analogy-based estimation (i.e., Mantel's correlation based deletion, EID based deletion, and our method) showed stable performance. In particular, only our method showed over 10% improvement in the average BRE on two datasets. Conclusions: It is reasonable to apply deletion methods designed for analogy-based estimation, and preferable to apply our method to analogy-based estimation.
KW - Abnormal value
KW - Case based reasoning
KW - Effort prediction
KW - Productivity
KW - Project management
UR - http://www.scopus.com/inward/record.url?scp=80054075985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80054075985&partnerID=8YFLogxK
U2 - 10.1145/2020390.2020407
DO - 10.1145/2020390.2020407
M3 - Conference contribution
AN - SCOPUS:80054075985
SN - 9781450307093
T3 - ACM International Conference Proceeding Series
BT - PROMISE 2011 - 7th International Conference on Predictive Models in Software Engineering, Co-located with ESEM 2011
T2 - 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011
Y2 - 20 September 2011 through 21 September 2011
ER -