Defect data analysis based on extended association rule mining

Shuji Morisaki, Akito Monden, Tomoko Matsumura, Haruaki Tamada, Ken Ichi Matsumoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Citations (Scopus)

Abstract

This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristics of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in cases of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.
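The idea of a rule whose consequent is summarized by a mean and a standard deviation can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact definitions, so the sketch assumes that "lift" of a statistic is its value over the defects matching the antecedent divided by its value over all defects, that the population standard deviation is used, and that candidate antecedents are enumerated by brute force. The attribute names, the sample records, and the names `ExtendedRule` and `mine_extended_rules` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean, pstdev


@dataclass(frozen=True)
class ExtendedRule:
    antecedent: tuple   # ((attribute, value), ...) conditions on nominal attributes
    support: float      # fraction of all defects matching the antecedent
    mean: float         # mean effort of the matching defects
    stdev: float        # population standard deviation of their effort
    mean_lift: float    # assumed definition: mean / overall mean
    stdev_lift: float   # assumed definition: stdev / overall stdev


def mine_extended_rules(defects, target="effort", max_items=2, min_support=0.1):
    """Enumerate antecedents and describe the target by mean and stdev."""
    efforts = [d[target] for d in defects]
    base_mean, base_stdev = mean(efforts), pstdev(efforts)
    # All (attribute, value) items seen in the data; values are assumed to be strings.
    items = sorted({(k, v) for d in defects for k, v in d.items() if k != target})
    rules = []
    for size in range(1, max_items + 1):
        for antecedent in combinations(items, size):
            # An attribute cannot take two values at once, so skip such antecedents.
            if len({k for k, _ in antecedent}) < size:
                continue
            matched = [d[target] for d in defects
                       if all(d.get(k) == v for k, v in antecedent)]
            support = len(matched) / len(defects)
            if not matched or support < min_support:
                continue
            m, s = mean(matched), pstdev(matched)
            rules.append(ExtendedRule(
                antecedent, support, m, s,
                m / base_mean if base_mean else float("nan"),
                s / base_stdev if base_stdev else float("nan"),
            ))
    return rules


# Hypothetical records for illustration only; these are not data from the study.
SAMPLE_DEFECTS = [
    {"phase": "coding", "type": "output", "effort": 1.0},
    {"phase": "coding", "type": "output", "effort": 1.0},
    {"phase": "coding", "type": "handling", "effort": 2.0},
    {"phase": "design", "type": "handling", "effort": 12.0},
]

for rule in mine_extended_rules(SAMPLE_DEFECTS, min_support=0.5):
    print(rule.antecedent, round(rule.mean_lift, 3), round(rule.stdev_lift, 3))
```

Under these assumptions, a rule with a mean lift well below 1 corresponds to defects that are cheap to correct on average, while a standard deviation lift well above 1 flags conditions where effort is unpredictable, which is the distinction the findings draw.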

Original language: English
Title of host publication: Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007
DOIs: https://doi.org/10.1109/MSR.2007.5
Publication status: Published - 2007
Externally published: Yes
Event: ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007 - Minneapolis, MN, United States
Duration: May 20 2007 → May 26 2007

Other

Other: ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007
Country: United States
City: Minneapolis, MN
Period: 5/20/07 → 5/26/07

Fingerprint

Association rules
Defects
Testing
Information systems
Statistics

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software

Cite this

Morisaki, S., Monden, A., Matsumura, T., Tamada, H., & Matsumoto, K. I. (2007). Defect data analysis based on extended association rule mining. In Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007 [4228640] https://doi.org/10.1109/MSR.2007.5

Defect data analysis based on extended association rule mining. / Morisaki, Shuji; Monden, Akito; Matsumura, Tomoko; Tamada, Haruaki; Matsumoto, Ken Ichi.

Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007. 2007. 4228640.

Morisaki, S, Monden, A, Matsumura, T, Tamada, H & Matsumoto, KI 2007, Defect data analysis based on extended association rule mining. in Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007., 4228640, ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007, Minneapolis, MN, United States, 5/20/07. https://doi.org/10.1109/MSR.2007.5
Morisaki S, Monden A, Matsumura T, Tamada H, Matsumoto KI. Defect data analysis based on extended association rule mining. In Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007. 2007. 4228640 https://doi.org/10.1109/MSR.2007.5
Morisaki, Shuji ; Monden, Akito ; Matsumura, Tomoko ; Tamada, Haruaki ; Matsumoto, Ken Ichi. / Defect data analysis based on extended association rule mining. Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007. 2007.
@inproceedings{d3efd76680e547e486bc717a3fa83b0d,
title = "Defect data analysis based on extended association rule mining",
abstract = "This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristics of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7{\%} of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in cases of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.",
author = "Shuji Morisaki and Akito Monden and Tomoko Matsumura and Haruaki Tamada and Matsumoto, {Ken Ichi}",
year = "2007",
doi = "10.1109/MSR.2007.5",
language = "English",
isbn = "076952950X",
booktitle = "Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007",

}

TY - GEN

T1 - Defect data analysis based on extended association rule mining

AU - Morisaki, Shuji

AU - Monden, Akito

AU - Matsumura, Tomoko

AU - Tamada, Haruaki

AU - Matsumoto, Ken Ichi

PY - 2007

Y1 - 2007

AB - This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristics of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in cases of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.

UR - http://www.scopus.com/inward/record.url?scp=34548725593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548725593&partnerID=8YFLogxK

U2 - 10.1109/MSR.2007.5

DO - 10.1109/MSR.2007.5

M3 - Conference contribution

SN - 076952950X

SN - 9780769529509

BT - Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007

ER -