TY - GEN
T1 - Empirical evaluation of cost overrun prediction with imbalance data
AU - Tsunoda, Masateru
AU - Monden, Akito
AU - Shibata, Jun Ichiro
AU - Matsumoto, Ken Ichi
PY - 2011
Y1 - 2011
N2 - To prevent cost overruns in software projects, project managers need to identify projects with a high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often becomes low when the dataset used for prediction is imbalanced, i.e., when there is a large difference between the number of cost overrun projects and non-cost-overrun projects. In this paper, we compared the accuracy of linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering while changing the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering had the highest accuracy among the five methods. When the numbers of cost overrun and non-cost-overrun projects were balanced in the dataset, linear discriminant analysis had the second highest accuracy, and when they were not balanced, the Mahalanobis-Taguchi method had the second highest accuracy among the five methods.
AB - To prevent cost overruns in software projects, project managers need to identify projects with a high risk of cost overrun in the early phase. So far, discriminant methods such as linear discriminant analysis and logistic regression have been used to predict cost overrun projects. However, the accuracy of discriminant methods often becomes low when the dataset used for prediction is imbalanced, i.e., when there is a large difference between the number of cost overrun projects and non-cost-overrun projects. In this paper, we compared the accuracy of linear discriminant analysis, logistic regression, classification tree, the Mahalanobis-Taguchi method, and collaborative filtering while changing the percentage of cost overrun projects in the dataset. The results showed that collaborative filtering had the highest accuracy among the five methods. When the numbers of cost overrun and non-cost-overrun projects were balanced in the dataset, linear discriminant analysis had the second highest accuracy, and when they were not balanced, the Mahalanobis-Taguchi method had the second highest accuracy among the five methods.
KW - Collaborative Filtering
KW - Mahalanobis-Taguchi method
KW - biased data
KW - failure prone project
KW - risk management
UR - http://www.scopus.com/inward/record.url?scp=84055184267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84055184267&partnerID=8YFLogxK
U2 - 10.1109/ICIS.2011.71
DO - 10.1109/ICIS.2011.71
M3 - Conference contribution
AN - SCOPUS:84055184267
SN - 9780769544014
T3 - Proceedings - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
SP - 415
EP - 420
BT - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
T2 - 2011 10th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2011
Y2 - 16 May 2011 through 18 May 2011
ER -