TY - JOUR
T1 - Clustering Regions Based on Socio-Economic Factors Which Affected the Number of COVID-19 Cases in Java Island
AU - Rahardiantoro, Septian
AU - Sakamoto, Wataru
N1 - Publisher Copyright: © Published under licence by IOP Publishing Ltd.
PY - 2021/4/19
Y1 - 2021/4/19
N2 - Around 60% of COVID-19 positive cases in Indonesia have occurred on Java Island. This study clusters adjacent regions (cities and regencies) on Java Island into groups based on socio-economic factors suspected to affect the COVID-19 infection rate (positive cases per 100,000 residents), which could be useful for government decision making. The factors considered are poverty percentage, Human Development Index (HDI), average monthly expenditure, and open unemployment rate. The data analysis has two steps: first, we determined the factors that significantly affected the infection rate by using the lasso, and then we estimated region-specific effects of each significant factor by using the generalized lasso. In the generalized lasso, two types of spatial structure were considered: regions divided by province, and neighbourhood regions based on k-means clustering and Voronoi tessellation. The tuning parameter in both the lasso and the generalized lasso was selected by 5-fold cross-validation. In the first step, three variables were found to affect the infection rate significantly. In the second step, the three variables had spatially varying coefficients in the generalized lasso using regions divided by province. On the other hand, HDI had a spatially varying coefficient in the generalized lasso using regions based on k-means clustering and Voronoi tessellation.
AB - Around 60% of COVID-19 positive cases in Indonesia have occurred on Java Island. This study clusters adjacent regions (cities and regencies) on Java Island into groups based on socio-economic factors suspected to affect the COVID-19 infection rate (positive cases per 100,000 residents), which could be useful for government decision making. The factors considered are poverty percentage, Human Development Index (HDI), average monthly expenditure, and open unemployment rate. The data analysis has two steps: first, we determined the factors that significantly affected the infection rate by using the lasso, and then we estimated region-specific effects of each significant factor by using the generalized lasso. In the generalized lasso, two types of spatial structure were considered: regions divided by province, and neighbourhood regions based on k-means clustering and Voronoi tessellation. The tuning parameter in both the lasso and the generalized lasso was selected by 5-fold cross-validation. In the first step, three variables were found to affect the infection rate significantly. In the second step, the three variables had spatially varying coefficients in the generalized lasso using regions divided by province. On the other hand, HDI had a spatially varying coefficient in the generalized lasso using regions based on k-means clustering and Voronoi tessellation.
UR - http://www.scopus.com/inward/record.url?scp=85104793323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104793323&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/1863/1/012014
DO - 10.1088/1742-6596/1863/1/012014
M3 - Conference article
AN - SCOPUS:85104793323
SN - 1742-6588
VL - 1863
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012014
T2 - International Conference on Mathematics, Statistics and Data Science 2020, ICMSDS 2020
Y2 - 11 November 2020 through 12 November 2020
ER -