Empirical evaluation of cost overrun prediction with imbalance data

Masateru Tsunoda, Akito Monden, Jun Ichiro Shibata, Ken Ichi Matsumoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

To prevent cost overruns in software projects, project managers need to identify projects with a high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often drops when the dataset used for prediction is imbalanced, i.e., when the number of cost overrun projects differs greatly from the number of non-overrun projects. In this paper, we compared the accuracy of linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering while varying the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering had the highest accuracy among the five methods. When the numbers of cost overrun and non-overrun projects in the dataset were balanced, linear discriminant analysis had the second-highest accuracy; when they were not balanced, the Mahalanobis-Taguchi method had the second-highest accuracy.
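The abstract does not say how the percentage of cost overrun projects was varied. As a minimal sketch, assuming simple random undersampling of one class, a dataset could be brought to a target share of overrun projects as shown below. The function name `resample_to_overrun_share`, the `(features, is_overrun)` record layout, and the sample data are hypothetical and are not taken from the paper.

```python
import random


def resample_to_overrun_share(projects, overrun_share, seed=0):
    """Undersample `projects` so that roughly `overrun_share` are overruns.

    `projects` is a list of (features, is_overrun) pairs (assumed layout).
    """
    if not 0.0 < overrun_share < 1.0:
        raise ValueError("overrun_share must be strictly between 0 and 1")

    rng = random.Random(seed)
    overrun = [p for p in projects if p[1]]
    normal = [p for p in projects if not p[1]]

    # First try to keep every overrun project and shrink the other class.
    n_over = len(overrun)
    n_norm = round(n_over * (1.0 - overrun_share) / overrun_share)
    if n_norm > len(normal):
        # Not enough non-overrun projects: keep all of them instead
        # and shrink the overrun class.
        n_norm = len(normal)
        n_over = min(
            len(overrun),
            round(n_norm * overrun_share / (1.0 - overrun_share)),
        )

    subset = rng.sample(overrun, n_over) + rng.sample(normal, n_norm)
    rng.shuffle(subset)
    return subset


if __name__ == "__main__":
    # Hypothetical data: 100 projects, the first 30 flagged as cost overruns.
    data = [((i,), i < 30) for i in range(100)]
    for share in (0.1, 0.5):
        subset = resample_to_overrun_share(data, share, seed=1)
        n_flagged = sum(1 for _, is_overrun in subset if is_overrun)
        print(f"target={share:.0%} size={len(subset)} overruns={n_flagged}")
```

Each resampled subset could then be passed to the classifiers under comparison. Undersampling discards data, so the subset size changes with the target share; other schemes, such as oversampling, would be equally plausible readings of the abstract.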

Original language: English
Title of host publication: Proceedings - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
Pages: 415-420
Number of pages: 6
DOI: https://doi.org/10.1109/ICIS.2011.71
Publication status: Published - 2011
Externally published: Yes
Event: 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011 - Sanya, Hainan Island, China
Duration: May 16 2011 to May 18 2011

Other

Other: 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
Country: China
City: Sanya, Hainan Island
Period: 5/16/11 to 5/18/11

Fingerprint

Costs
Discriminant analysis
Collaborative filtering
Taguchi methods
Logistics
Trees (mathematics)
Managers

Keywords

  • biased data
  • Collaborative Filtering
  • failure prone project
  • Mahalanobis-Taguchi method
  • risk management

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems

Cite this

Tsunoda, M., Monden, A., Shibata, J. I., & Matsumoto, K. I. (2011). Empirical evaluation of cost overrun prediction with imbalance data. In Proceedings - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011 (pp. 415-420). [6086505] https://doi.org/10.1109/ICIS.2011.71

@inproceedings{1d32747bc93d4d34947e4a111141eb46,
title = "Empirical evaluation of cost overrun prediction with imbalance data",
abstract = "To prevent cost overruns in software projects, project managers need to identify projects with a high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often drops when the dataset used for prediction is imbalanced, i.e., when the number of cost overrun projects differs greatly from the number of non-overrun projects. In this paper, we compared the accuracy of linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering while varying the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering had the highest accuracy among the five methods. When the numbers of cost overrun and non-overrun projects in the dataset were balanced, linear discriminant analysis had the second-highest accuracy; when they were not balanced, the Mahalanobis-Taguchi method had the second-highest accuracy.",
keywords = "biased data, Collaborative Filtering, failure prone project, Mahalanobis-Taguchi method, risk management",
author = "Masateru Tsunoda and Akito Monden and Shibata, {Jun Ichiro} and Matsumoto, {Ken Ichi}",
year = "2011",
doi = "10.1109/ICIS.2011.71",
language = "English",
isbn = "9780769544014",
pages = "415--420",
booktitle = "Proceedings - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011",

}

TY  - GEN
T1  - Empirical evaluation of cost overrun prediction with imbalance data
AU  - Tsunoda, Masateru
AU  - Monden, Akito
AU  - Shibata, Jun Ichiro
AU  - Matsumoto, Ken Ichi
PY  - 2011
Y1  - 2011
AB  - To prevent cost overruns in software projects, project managers need to identify projects with a high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often drops when the dataset used for prediction is imbalanced, i.e., when the number of cost overrun projects differs greatly from the number of non-overrun projects. In this paper, we compared the accuracy of linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering while varying the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering had the highest accuracy among the five methods. When the numbers of cost overrun and non-overrun projects in the dataset were balanced, linear discriminant analysis had the second-highest accuracy; when they were not balanced, the Mahalanobis-Taguchi method had the second-highest accuracy.
KW  - biased data
KW  - Collaborative Filtering
KW  - failure prone project
KW  - Mahalanobis-Taguchi method
KW  - risk management
UR  - http://www.scopus.com/inward/record.url?scp=84055184267&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84055184267&partnerID=8YFLogxK
U2  - 10.1109/ICIS.2011.71
DO  - 10.1109/ICIS.2011.71
M3  - Conference contribution
AN  - SCOPUS:84055184267
SN  - 9780769544014
SP  - 415
EP  - 420
BT  - Proceedings - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
ER  -