A fast algorithm for combinatorial hotspot mining based on spatial scan statistic

Shin ichi Minato, Jun Kawahara, Fumio Ishioka, Masahiro Mizuta, Koji Kurihara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Detecting a hotspot cluster in statistical data partitioned by geographical regions, such as prefectures or cities, is a popular and classical problem. The spatial scan statistic is a standard likelihood-ratio measure that has been widely used for testing hotspot clusters. In this work, we propose a very fast algorithm to enumerate all combinatorial regions that are more significant than a given threshold value. Our algorithm explores quickly by pruning the search space based on the partial monotonicity of the spatial scan statistic. Experimental results for a nationwide dataset of 47 prefectures show that our method generates the highest-ranked hotspot cluster a million or more times faster than the previous naive search method. Our method works in practice for datasets with several hundred regions, and it will drastically accelerate hotspot analysis in various fields.
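
The naive search baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the Poisson-model form of the spatial scan statistic (Kulldorff's log-likelihood ratio) and assumes that a combinatorial region means a connected set of adjacent regions; neither detail is stated in this record. The function names, the four-region toy data and the adjacency map are hypothetical, and the pruning by partial monotonicity that the paper proposes is not implemented here.

```python
from itertools import combinations
from math import log


def poisson_llr(c, e, total):
    """Log-likelihood ratio for one zone under an assumed Poisson model.

    c: observed cases in the zone, e: expected cases in the zone,
    total: total cases (expected counts are scaled to sum to total).
    """
    if c <= e:
        return 0.0  # only zones with excess cases count as hotspots
    inside = c * log(c / e)
    # 0 * log(0) is taken as 0 when every case falls inside the zone
    outside = (total - c) * log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def is_connected(zone, adjacency):
    """Depth-first search restricted to the regions in `zone`."""
    zone = set(zone)
    start = next(iter(zone))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour in zone and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == zone


def naive_hotspots(cases, expected, adjacency, threshold):
    """Enumerate every connected proper subset and keep those above threshold."""
    total = sum(cases.values())
    regions = sorted(cases)
    hits = []
    for size in range(1, len(regions)):
        for zone in combinations(regions, size):
            if not is_connected(zone, adjacency):
                continue
            c = sum(cases[r] for r in zone)
            e = sum(expected[r] for r in zone)
            score = poisson_llr(c, e, total)
            if score > threshold:
                hits.append((score, zone))
    hits.sort(reverse=True)  # highest-ranked zone first
    return hits


# Hypothetical toy data: four regions on a path A - B - C - D.
cases = {"A": 10, "B": 30, "C": 25, "D": 5}
expected = {"A": 20, "B": 15, "C": 15, "D": 20}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

hits = naive_hotspots(cases, expected, adjacency, threshold=0.0)
for score, zone in hits:
    print(f"{score:.2f}", zone)
```

This loop inspects all 2^n subsets of n regions, which is why exhaustive search becomes impractical well before 47 prefectures and why a pruning strategy such as the one described in the abstract matters.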

Original language: English
Title of host publication: SIAM International Conference on Data Mining, SDM 2019
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 91-99
Number of pages: 9
ISBN (Electronic): 9781611975673
Publication status: Published - Jan 1 2019
Event: 19th SIAM International Conference on Data Mining, SDM 2019 - Calgary, Canada
Duration: May 2 2019 to May 4 2019

Publication series

Name: SIAM International Conference on Data Mining, SDM 2019

Conference

Conference: 19th SIAM International Conference on Data Mining, SDM 2019
Country: Canada
City: Calgary
Period: 5/2/19 to 5/4/19

Fingerprint

Statistics
Geographical regions
Testing

Keywords

  • Combinatorial problem
  • Data mining
  • Enumeration algorithm
  • Hotspot detection
  • Scan statistic

ASJC Scopus subject areas

  • Software

Cite this

Minato, S. I., Kawahara, J., Ishioka, F., Mizuta, M., & Kurihara, K. (2019). A fast algorithm for combinatorial hotspot mining based on spatial scan statistic. In SIAM International Conference on Data Mining, SDM 2019 (pp. 91-99). (SIAM International Conference on Data Mining, SDM 2019). Society for Industrial and Applied Mathematics Publications.

A fast algorithm for combinatorial hotspot mining based on spatial scan statistic. / Minato, Shin ichi; Kawahara, Jun; Ishioka, Fumio; Mizuta, Masahiro; Kurihara, Koji.

SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications, 2019. p. 91-99 (SIAM International Conference on Data Mining, SDM 2019).


Minato, SI, Kawahara, J, Ishioka, F, Mizuta, M & Kurihara, K 2019, A fast algorithm for combinatorial hotspot mining based on spatial scan statistic. in SIAM International Conference on Data Mining, SDM 2019. SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics Publications, pp. 91-99, 19th SIAM International Conference on Data Mining, SDM 2019, Calgary, Canada, 5/2/19.
Minato SI, Kawahara J, Ishioka F, Mizuta M, Kurihara K. A fast algorithm for combinatorial hotspot mining based on spatial scan statistic. In SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications. 2019. p. 91-99. (SIAM International Conference on Data Mining, SDM 2019).
Minato, Shin ichi ; Kawahara, Jun ; Ishioka, Fumio ; Mizuta, Masahiro ; Kurihara, Koji. / A fast algorithm for combinatorial hotspot mining based on spatial scan statistic. SIAM International Conference on Data Mining, SDM 2019. Society for Industrial and Applied Mathematics Publications, 2019. pp. 91-99 (SIAM International Conference on Data Mining, SDM 2019).
@inproceedings{867a83162c2640229ab1e0fb7a7a444a,
title = "A fast algorithm for combinatorial hotspot mining based on spatial scan statistic",
abstract = "It is a popular and classical problem to detect a hotspot cluster from a statistical data which is partitioned by geographical regions such as prefectures or cities. Spatial scan statistic is a standard measure of likelihood ratio which has been widely used for testing hotspot clusters. In this work, we propose a very fast algorithm to enumerate all combinatorial regions which are more significant than a given threshold value. Our algorithm features the fast exploration by pruning the search space based on the partial monotonicity of the spatial scan statistic. Experimental results for a nation-wide 47 prefectures dataset show that our method generates the highest-ranked hotspot cluster in a time a million or more times faster than the previous naive search method. Our method works practically for a dataset with several hundreds of regions, and it will drastically accelerate hotspot analysis in various fields.",
keywords = "Combinatorial problem, Data mining, Enumeration algorithm, Hotspot detection, Scan statistic",
author = "Minato, {Shin ichi} and Jun Kawahara and Fumio Ishioka and Masahiro Mizuta and Koji Kurihara",
year = "2019",
month = "1",
day = "1",
language = "English",
series = "SIAM International Conference on Data Mining, SDM 2019",
publisher = "Society for Industrial and Applied Mathematics Publications",
pages = "91--99",
booktitle = "SIAM International Conference on Data Mining, SDM 2019",
address = "United States",

}

TY - GEN
T1 - A fast algorithm for combinatorial hotspot mining based on spatial scan statistic
AU - Minato, Shin ichi
AU - Kawahara, Jun
AU - Ishioka, Fumio
AU - Mizuta, Masahiro
AU - Kurihara, Koji
PY - 2019/1/1
Y1 - 2019/1/1

N2 - It is a popular and classical problem to detect a hotspot cluster from a statistical data which is partitioned by geographical regions such as prefectures or cities. Spatial scan statistic is a standard measure of likelihood ratio which has been widely used for testing hotspot clusters. In this work, we propose a very fast algorithm to enumerate all combinatorial regions which are more significant than a given threshold value. Our algorithm features the fast exploration by pruning the search space based on the partial monotonicity of the spatial scan statistic. Experimental results for a nation-wide 47 prefectures dataset show that our method generates the highest-ranked hotspot cluster in a time a million or more times faster than the previous naive search method. Our method works practically for a dataset with several hundreds of regions, and it will drastically accelerate hotspot analysis in various fields.

AB - It is a popular and classical problem to detect a hotspot cluster from a statistical data which is partitioned by geographical regions such as prefectures or cities. Spatial scan statistic is a standard measure of likelihood ratio which has been widely used for testing hotspot clusters. In this work, we propose a very fast algorithm to enumerate all combinatorial regions which are more significant than a given threshold value. Our algorithm features the fast exploration by pruning the search space based on the partial monotonicity of the spatial scan statistic. Experimental results for a nation-wide 47 prefectures dataset show that our method generates the highest-ranked hotspot cluster in a time a million or more times faster than the previous naive search method. Our method works practically for a dataset with several hundreds of regions, and it will drastically accelerate hotspot analysis in various fields.

KW - Combinatorial problem
KW - Data mining
KW - Enumeration algorithm
KW - Hotspot detection
KW - Scan statistic
UR - http://www.scopus.com/inward/record.url?scp=85066086672&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066086672&partnerID=8YFLogxK
M3 - Conference contribution
T3 - SIAM International Conference on Data Mining, SDM 2019
SP - 91
EP - 99
BT - SIAM International Conference on Data Mining, SDM 2019
PB - Society for Industrial and Applied Mathematics Publications
ER -