An empirical evaluation of outlier deletion methods for analogy-based cost estimation

Masateru Tsunoda, Akito Monden, Takeshi Kakimoto, Ken Ichi Matsumoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Background: Software project datasets sometimes include outliers that degrade the accuracy of effort estimation. Outlier deletion methods are often used to eliminate them. However, few case studies have applied outlier deletion methods to analogy-based estimation, so it is not clear which method suits it best. Aim: To clarify the effects on analogy-based estimation of existing outlier deletion methods (Cook's distance based deletion, LTS based deletion, k-means based deletion, Mantel's correlation based deletion, and EID based deletion) and of our method. Method: In the experiment, the outlier deletion methods were applied to three datasets (the ISBSG, Kitchenham, and Desharnais datasets), and their estimation accuracy was evaluated using BRE (Balanced Relative Error). Our method eliminates a project from the neighborhood of a target project when its effort is extremely different from that of the other neighbors. Results: Deletion methods designed for analogy-based estimation (i.e., Mantel's correlation based deletion, EID based deletion, and our method) showed stable performance. In particular, only our method improved the average BRE by more than 10% on two datasets. Conclusions: It is reasonable to apply deletion methods designed for analogy-based estimation, and preferable to apply our method.
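As a rough illustration of the evaluation described in the abstract, the sketch below computes BRE using its commonly cited definition, |actual - estimated| / min(actual, estimated), and pairs it with a simplified neighbor filter for analogy-based estimation. The filter is not the authors' algorithm, whose exact criterion is not given here: the function names, the median-based rule, the `ratio_limit` threshold, and the sample projects are all assumptions made for illustration.

```python
import math
import statistics


def balanced_relative_error(actual, estimated):
    """BRE = |actual - estimated| / min(actual, estimated)."""
    if actual <= 0 or estimated <= 0:
        raise ValueError("effort values must be positive")
    return abs(actual - estimated) / min(actual, estimated)


def estimate_by_analogy(target, projects, k=3, ratio_limit=None):
    """Estimate effort as the mean effort of the k nearest projects.

    projects: list of (features, effort) pairs; distance is Euclidean.
    ratio_limit: hypothetical threshold (assumption, not from the paper).
    When set, a neighbor is dropped if its effort is more than ratio_limit
    times larger or smaller than the median effort of the k neighbors.
    """
    neighbors = sorted(projects, key=lambda p: math.dist(target, p[0]))[:k]
    efforts = [effort for _, effort in neighbors]
    if ratio_limit is not None:
        reference = statistics.median(efforts)
        kept = [e for e in efforts
                if reference / ratio_limit <= e <= reference * ratio_limit]
        # Fall back to all neighbors if the filter would remove everything.
        efforts = kept or efforts
    return statistics.mean(efforts)


# Made-up projects: ((size, team size), effort). Illustrative data only.
projects = [
    ((100, 5), 1000),
    ((110, 5), 1100),
    ((105, 6), 5000),   # similar features, but a very different effort
    ((300, 10), 3200),
]
target = (104, 5)
actual_effort = 1200

plain = estimate_by_analogy(target, projects, k=3)
filtered = estimate_by_analogy(target, projects, k=3, ratio_limit=3.0)
print("BRE without neighbor filtering:", balanced_relative_error(actual_effort, plain))
print("BRE with neighbor filtering:", balanced_relative_error(actual_effort, filtered))
```

Because BRE divides by the smaller of the two values, it penalizes underestimates and overestimates symmetrically, which is why it is convenient for comparing deletion methods across datasets.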

Original language: English
Title of host publication: ACM International Conference Proceeding Series
DOI: https://doi.org/10.1145/2020390.2020407
Publication status: Published - 2011
Externally published: Yes
Event: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011 - Banff, AB, Canada
Duration: Sep 20 2011 to Sep 21 2011

Other

Other: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011
Country: Canada
City: Banff, AB
Period: 9/20/11 to 9/21/11

Fingerprint

Costs
Experiments

Keywords

  • Abnormal value
  • Case based reasoning
  • Effort prediction
  • Productivity
  • Project management

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Tsunoda, M., Monden, A., Kakimoto, T., & Matsumoto, K. I. (2011). An empirical evaluation of outlier deletion methods for analogy-based cost estimation. In ACM International Conference Proceeding Series [2020407] https://doi.org/10.1145/2020390.2020407

@inproceedings{738ec287055f4475b3aeb3ae9d0bf9a4,
title = "An empirical evaluation of outlier deletion methods for analogy-based cost estimation",
keywords = "Abnormal value, Case based reasoning, Effort prediction, Productivity, Project management",
author = "Masateru Tsunoda and Akito Monden and Takeshi Kakimoto and Matsumoto, {Ken Ichi}",
year = "2011",
doi = "10.1145/2020390.2020407",
language = "English",
isbn = "9781450307093",
booktitle = "ACM International Conference Proceeding Series",

}

TY - GEN

T1 - An empirical evaluation of outlier deletion methods for analogy-based cost estimation

AU - Tsunoda, Masateru

AU - Monden, Akito

AU - Kakimoto, Takeshi

AU - Matsumoto, Ken Ichi

PY - 2011

Y1 - 2011

KW - Abnormal value

KW - Case based reasoning

KW - Effort prediction

KW - Productivity

KW - Project management

UR - http://www.scopus.com/inward/record.url?scp=80054075985&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80054075985&partnerID=8YFLogxK

U2 - 10.1145/2020390.2020407

DO - 10.1145/2020390.2020407

M3 - Conference contribution

AN - SCOPUS:80054075985

SN - 9781450307093

BT - ACM International Conference Proceeding Series

ER -