Comparison of outlier detection methods in fault-proneness models

Shinsuke Matsumoto, Yasutaka Kamei, Akito Monden, Ken Ichi Matsumoto

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

8 Citations (Scopus)

Abstract

In this paper, we experimentally evaluated the effect of outlier detection methods on the prediction performance of fault-proneness models. Detected outliers were removed from the fit dataset before building a model. In the experiment, we compared three outlier detection methods (Mahalanobis outlier analysis (MOA), the local outlier factor method (LOFM), and rule-based modeling (RBM)), each applied to three well-known fault-proneness models (linear discriminant analysis (LDA), logistic regression analysis (LRA), and classification tree (CT)). As a result, MOA and RBM improved the F1-values of all models (0.04 at minimum, 0.17 at maximum, and 0.10 on average), while the improvements by LOFM were relatively small (-0.01 at minimum, 0.04 at maximum, and 0.01 on average).
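The record does not include the authors' implementation, so the following is a minimal sketch of the pipeline the abstract describes: detect outliers in the fit (training) dataset, remove them, build a fault-proneness model, and score it with the F1-value. It assumes a chi-square cutoff for the Mahalanobis criterion, scikit-learn's LogisticRegression as the model, and synthetic placeholder data; mahalanobis_outliers, X_fit, y_fit, X_test, and y_test are illustrative names, not the paper's.

```python
# Sketch only: Mahalanobis-distance outlier removal before fitting a
# logistic regression fault-proneness model (assumptions noted above).
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def mahalanobis_outliers(X, alpha=0.975):
    """Mark rows whose squared Mahalanobis distance from the sample mean
    exceeds the chi-square quantile (the cutoff here is an assumption)."""
    mean = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X - mean
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return d2 > chi2.ppf(alpha, df=X.shape[1])

# Placeholder data standing in for software-metric features and fault labels.
rng = np.random.default_rng(0)
X_fit, y_fit = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_test, y_test = rng.normal(size=(50, 5)), rng.integers(0, 2, 50)

keep = ~mahalanobis_outliers(X_fit)            # drop detected outliers from the fit dataset
model = LogisticRegression().fit(X_fit[keep], y_fit[keep])
print("F1-value:", f1_score(y_test, model.predict(X_test)))
```

The local outlier factor and rule-based variants compared in the paper would slot into the same place as the detection step; only the mask computation changes.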

Original language: English
Title of host publication: Proceedings - 1st International Symposium on Empirical Software Engineering and Measurement, ESEM 2007
Pages: 461-463
Number of pages: 3
DOI: 10.1109/ESEM.2007.34
ISBN (Print): 0769528864, 9780769528861
Publication status: Published - 2007
Externally published: Yes
Event: 1st International Symposium on Empirical Software Engineering and Measurement, ESEM 2007 - Madrid, Spain
Duration: Sep 20, 2007 - Sep 21, 2007



ASJC Scopus subject areas

  • Computer Science (all)
  • Software

Cite this

Matsumoto, S., Kamei, Y., Monden, A., & Matsumoto, K. I. (2007). Comparison of outlier detection methods in fault-proneness models. In Proceedings - 1st International Symposium on Empirical Software Engineering and Measurement, ESEM 2007 (pp. 461-463, Article 4343779). https://doi.org/10.1109/ESEM.2007.34
