Filter-INC: Handling effort-inconsistency in software effort estimation datasets

Passakorn Phannachitta, Jacky Keung, Kwabena Ebo Bennin, Akito Monden, Kenichi Matsumoto

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Effort-inconsistency is a situation where historical software project data used for software effort estimation (SEE) are contaminated by many project cases that have similar characteristics but were completed with significantly different amounts of effort. Using such data for SEE generally produces inaccurate results; however, an effective technique for handling the problem is not yet available. This study approaches the problem differently from common solutions, which typically attempt to remove every project case they have detected as an outlier. Instead, we hypothesize that data inconsistency is caused by only a few deviant project cases, and that any attempt to remove the other cases will reduce accuracy, largely due to the loss of useful information and data diversity. Filter-INC (short for Filtering technique for handling effort-INConsistency in SEE datasets) implements this hypothesis to decide whether a project case detected by an existing technique should be subject to removal. The evaluation compares the performance of 2 filtering techniques before and after Filter-INC is applied. The results, produced from 8 real-world datasets together with 3 machine-learning models and evaluated by 4 performance measures, show a significant accuracy improvement at the 95% confidence level. Based on the results, we recommend our proposed hypothesis as an important instrument for designing a data preprocessing technique that handles effort-inconsistency in SEE datasets, an important step forward in preprocessing data for a more accurate SEE model.
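As an illustration of the definition given in the abstract, the following Python sketch flags pairs of project cases whose characteristics are close but whose recorded efforts differ widely. It is not the Filter-INC procedure, which the abstract does not specify. The function name `inconsistent_pairs`, the use of Euclidean distance, the thresholds `max_feature_dist` and `min_effort_ratio`, and the sample data are all assumptions made for illustration.

```python
from math import dist


def inconsistent_pairs(projects, max_feature_dist=0.1, min_effort_ratio=2.0):
    """Return index pairs of projects with similar features but very different effort.

    `projects` is a list of (features, effort) tuples, where `features` is a
    tuple of normalized numeric characteristics and `effort` is positive.
    Both thresholds are hypothetical and would need tuning per dataset.
    """
    pairs = []
    for i in range(len(projects)):
        for j in range(i + 1, len(projects)):
            features_i, effort_i = projects[i]
            features_j, effort_j = projects[j]
            # "Similar characteristics": small Euclidean distance (assumed metric).
            similar = dist(features_i, features_j) <= max_feature_dist
            # "Significantly different effort": large ratio of the two efforts.
            ratio = max(effort_i, effort_j) / min(effort_i, effort_j)
            if similar and ratio >= min_effort_ratio:
                pairs.append((i, j))
    return pairs


# Made-up sample data: three near-identical projects, one of which took
# roughly four times the effort, plus one unrelated project.
projects = [
    ((0.20, 0.30), 100.0),
    ((0.21, 0.31), 105.0),
    ((0.22, 0.29), 400.0),
    ((0.90, 0.80), 900.0),
]
print(inconsistent_pairs(projects))  # [(0, 2), (1, 2)]
```

In this toy data, project 2 appears in every flagged pair, which matches the abstract's hypothesis that a few deviant cases may account for the inconsistency; removing projects 0 and 1 as well would discard otherwise consistent data.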

Original language: English
Title of host publication: Proceedings - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016
Publisher: IEEE Computer Society
Pages: 185-192
Number of pages: 8
ISBN (Electronic): 9781509055753
DOIs: https://doi.org/10.1109/APSEC.2016.035
Publication status: Published - Mar 30 2017
Event: 23rd Asia-Pacific Software Engineering Conference, APSEC 2016 - Hamilton, New Zealand
Duration: Dec 6 2016 to Dec 9 2016

Other

Other: 23rd Asia-Pacific Software Engineering Conference, APSEC 2016
Country: New Zealand
City: Hamilton
Period: 12/6/16 to 12/9/16

Fingerprint

Learning systems

Keywords

  • Data preprocessing
  • Effort-inconsistency
  • Empirical software engineering
  • Software effort estimation

ASJC Scopus subject areas

  • Software

Cite this

Phannachitta, P., Keung, J., Bennin, K. E., Monden, A., & Matsumoto, K. (2017). Filter-INC: Handling effort-inconsistency in software effort estimation datasets. In Proceedings - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016 (pp. 185-192). [7890587] IEEE Computer Society. https://doi.org/10.1109/APSEC.2016.035

@inproceedings{d18c789e1efa4430a0c5e635969c7d48,
title = "Filter-INC: Handling effort-inconsistency in software effort estimation datasets",
keywords = "Data preprocessing, Effort-inconsistency, Empirical software engineering, Software effort estimation",
author = "Passakorn Phannachitta and Jacky Keung and Bennin, {Kwabena Ebo} and Akito Monden and Kenichi Matsumoto",
year = "2017",
month = "3",
day = "30",
doi = "10.1109/APSEC.2016.035",
language = "English",
pages = "185--192",
booktitle = "Proceedings - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016",
publisher = "IEEE Computer Society",
address = "United States",

}

TY - GEN

T1 - Filter-INC

T2 - Handling effort-inconsistency in software effort estimation datasets

AU - Phannachitta, Passakorn

AU - Keung, Jacky

AU - Bennin, Kwabena Ebo

AU - Monden, Akito

AU - Matsumoto, Kenichi

PY - 2017/3/30

Y1 - 2017/3/30

KW - Data preprocessing

KW - Effort-inconsistency

KW - Empirical software engineering

KW - Software effort estimation

UR - http://www.scopus.com/inward/record.url?scp=85018504275&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018504275&partnerID=8YFLogxK

U2 - 10.1109/APSEC.2016.035

DO - 10.1109/APSEC.2016.035

M3 - Conference contribution

AN - SCOPUS:85018504275

SP - 185

EP - 192

BT - Proceedings - 23rd Asia-Pacific Software Engineering Conference, APSEC 2016

PB - IEEE Computer Society

ER -