Unsupervised Bug Report Categorization Using Clustering and Labeling Algorithm

Nachai Limsettho, Hideaki Hata, Akito Monden, Kenichi Matsumoto

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Bug reports are one of the most crucial information sources in software engineering, offering answers to many questions. Yet getting these answers is not always easy: the information in bug reports is often implicit, and some processing is required to extract the meaning of these reports. Most research in this area employs a supervised learning approach to classify bug reports so that the required types of reports can be identified. However, this approach often requires an immense amount of time and effort, resources that are already too scarce in many projects. We aim to develop an automated framework that can categorize bug reports according to their grammatical structure, without the need for labeled data. Our framework categorizes bug reports according to their text similarity using topic modeling and a clustering algorithm. Each group of bug reports is labeled with our new cluster labeling algorithm, designed specifically for clusters in the topic space. Our framework is highly customizable, with a modular design and options to incorporate available background knowledge to improve its performance, while our cluster labeling approach makes use of natural language processing (NLP) chunking to create representative labels. Our experimental results demonstrate that the performance of our unsupervised framework is comparable to that of a supervised learning one. We also show that our labeling process is capable of labeling each cluster with phrases that are representative of that cluster's characteristics. Our framework can be used to automatically categorize incoming bug reports without any prior knowledge, as an automated labeling suggestion system, or as a tool for obtaining knowledge about the structure of the bug report repository.

Original language: English
Pages (from-to): 1027-1053
Number of pages: 27
Journal: International Journal of Software Engineering and Knowledge Engineering
Volume: 26
Issue number: 7
DOI: 10.1142/S0218194016500352
Publication status: Published - Sep 1 2016

Fingerprint

Labeling
Supervised learning
Clustering algorithms
Labels
Software engineering
Experiments

Keywords

  • Automated bug report categorization
  • cluster labeling
  • clustering
  • topic modeling

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

Cite this

Unsupervised Bug Report Categorization Using Clustering and Labeling Algorithm. / Limsettho, Nachai; Hata, Hideaki; Monden, Akito; Matsumoto, Kenichi.

In: International Journal of Software Engineering and Knowledge Engineering, Vol. 26, No. 7, 01.09.2016, p. 1027-1053.

@article{bf161e117273486ebb73e9b00063ef11,
title = "Unsupervised Bug Report Categorization Using Clustering and Labeling Algorithm",
abstract = "Bug reports are one of the most crucial information sources in software engineering, offering answers to many questions. Yet getting these answers is not always easy: the information in bug reports is often implicit, and some processing is required to extract the meaning of these reports. Most research in this area employs a supervised learning approach to classify bug reports so that the required types of reports can be identified. However, this approach often requires an immense amount of time and effort, resources that are already too scarce in many projects. We aim to develop an automated framework that can categorize bug reports according to their grammatical structure, without the need for labeled data. Our framework categorizes bug reports according to their text similarity using topic modeling and a clustering algorithm. Each group of bug reports is labeled with our new cluster labeling algorithm, designed specifically for clusters in the topic space. Our framework is highly customizable, with a modular design and options to incorporate available background knowledge to improve its performance, while our cluster labeling approach makes use of natural language processing (NLP) chunking to create representative labels. Our experimental results demonstrate that the performance of our unsupervised framework is comparable to that of a supervised learning one. We also show that our labeling process is capable of labeling each cluster with phrases that are representative of that cluster's characteristics. Our framework can be used to automatically categorize incoming bug reports without any prior knowledge, as an automated labeling suggestion system, or as a tool for obtaining knowledge about the structure of the bug report repository.",
keywords = "Automated bug report categorization, cluster labeling, clustering, topic modeling",
author = "Nachai Limsettho and Hideaki Hata and Akito Monden and Kenichi Matsumoto",
year = "2016",
month = "9",
day = "1",
doi = "10.1142/S0218194016500352",
language = "English",
volume = "26",
pages = "1027--1053",
journal = "International Journal of Software Engineering and Knowledge Engineering",
issn = "0218-1940",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "7",

}

TY - JOUR

T1 - Unsupervised Bug Report Categorization Using Clustering and Labeling Algorithm

AU - Limsettho, Nachai

AU - Hata, Hideaki

AU - Monden, Akito

AU - Matsumoto, Kenichi

PY - 2016/9/1

Y1 - 2016/9/1

N2 - Bug reports are one of the most crucial information sources in software engineering, offering answers to many questions. Yet getting these answers is not always easy: the information in bug reports is often implicit, and some processing is required to extract the meaning of these reports. Most research in this area employs a supervised learning approach to classify bug reports so that the required types of reports can be identified. However, this approach often requires an immense amount of time and effort, resources that are already too scarce in many projects. We aim to develop an automated framework that can categorize bug reports according to their grammatical structure, without the need for labeled data. Our framework categorizes bug reports according to their text similarity using topic modeling and a clustering algorithm. Each group of bug reports is labeled with our new cluster labeling algorithm, designed specifically for clusters in the topic space. Our framework is highly customizable, with a modular design and options to incorporate available background knowledge to improve its performance, while our cluster labeling approach makes use of natural language processing (NLP) chunking to create representative labels. Our experimental results demonstrate that the performance of our unsupervised framework is comparable to that of a supervised learning one. We also show that our labeling process is capable of labeling each cluster with phrases that are representative of that cluster's characteristics. Our framework can be used to automatically categorize incoming bug reports without any prior knowledge, as an automated labeling suggestion system, or as a tool for obtaining knowledge about the structure of the bug report repository.

KW - Automated bug report categorization

KW - cluster labeling

KW - clustering

KW - topic modeling

UR - http://www.scopus.com/inward/record.url?scp=84989220596&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989220596&partnerID=8YFLogxK

U2 - 10.1142/S0218194016500352

DO - 10.1142/S0218194016500352

M3 - Article

AN - SCOPUS:84989220596

VL - 26

SP - 1027

EP - 1053

JO - International Journal of Software Engineering and Knowledge Engineering

JF - International Journal of Software Engineering and Knowledge Engineering

SN - 0218-1940

IS - 7

ER -