TY - GEN
T1 - Defect data analysis based on extended association rule mining
AU - Morisaki, Shuji
AU - Monden, Akito
AU - Matsumura, Tomoko
AU - Tamada, Haruaki
AU - Matsumoto, Ken Ichi
PY - 2007
Y1 - 2007
AB - This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristics of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in cases of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.
UR - http://www.scopus.com/inward/record.url?scp=34548725593&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548725593&partnerID=8YFLogxK
U2 - 10.1109/MSR.2007.5
DO - 10.1109/MSR.2007.5
M3 - Conference contribution
AN - SCOPUS:34548725593
SN - 076952950X
SN - 9780769529509
T3 - Proceedings - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007
BT - Proceedings - ICSE 2007 Workshops
T2 - ICSE 2007 Workshops: Fourth International Workshop on Mining Software Repositories, MSR 2007
Y2 - 20 May 2007 through 26 May 2007
ER -