BioCaster: Detecting public health rumors with a Web-based text mining system

Nigel Collier, Son Doan, Ai Kawazoe, Reiko Matsuda Goodwin, Mike Conway, Yoshio Tateno, Quoc Hung Ngo, Dinh Dien, Asanee Kawtrakul, Koichi Takeuchi, Mika Shigematsu, Kiyosu Taniguchi

Research output: Contribution to journal (Article)

145 Citations (Scopus)

Abstract

Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents from over 1700 RSS feeds, classifies them for topical relevance and plots them on a Google map using geocoded information. The background knowledge for bridging the gap between layman's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens, as well as geographical locations with their latitudes and longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher-order event analysis is used to detect more precisely specified warning signals, which can then be sent to registered users via email alerts. The system is evaluated for topic recognition and entity identification on a gold-standard corpus of annotated news articles.
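The four stages named in the abstract can be illustrated with a toy sketch. This is not the BioCaster implementation: the keyword lists, the approximate coordinates, and every function name below are hypothetical stand-ins, and simple substring matching replaces the ontology lookup and the classifiers a real system would presumably use.

```python
from dataclasses import dataclass

# Hypothetical mini-gazetteer standing in for the BioCaster ontology:
# layman's terms map to a canonical disease name.
DISEASE_TERMS = {
    "bird flu": "avian influenza",
    "avian influenza": "avian influenza",
    "cholera": "cholera",
}
# Approximate (latitude, longitude) pairs, for illustration only.
LOCATIONS = {"hanoi": (21.03, 105.85), "bangkok": (13.75, 100.50)}
# Assumed cue words that suggest an outbreak report.
OUTBREAK_CUES = ("outbreak", "cases", "infected", "died")


@dataclass
class Event:
    disease: str
    location: str
    lat: float
    lon: float


def is_relevant(text):
    """Stage 1, topic classification: a keyword test, not a trained classifier."""
    t = text.lower()
    return any(term in t for term in DISEASE_TERMS) and any(
        cue in t for cue in OUTBREAK_CUES
    )


def recognize_entities(text):
    """Stage 2, NER: substring lookup against the toy gazetteer."""
    t = text.lower()
    entities = [("DISEASE", term) for term in DISEASE_TERMS if term in t]
    entities += [("LOCATION", name) for name in LOCATIONS if name in t]
    return entities


def detect_disease_location(entities):
    """Stage 3: normalize disease mentions and collect location names."""
    diseases = sorted({DISEASE_TERMS[v] for kind, v in entities if kind == "DISEASE"})
    locations = sorted({v for kind, v in entities if kind == "LOCATION"})
    return diseases, locations


def recognize_events(text):
    """Stage 4: pair each disease with each location as a geocoded event."""
    if not is_relevant(text):
        return []
    diseases, locations = detect_disease_location(recognize_entities(text))
    return [Event(d, loc, *LOCATIONS[loc]) for d in diseases for loc in locations]
```

Calling `recognize_events` on a headline that mentions both a listed disease term and a listed place yields `Event` records whose coordinates could be plotted on a map; text that fails the relevance test yields an empty list.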

Original language: English
Pages (from-to): 2940-2941
Number of pages: 2
Journal: Bioinformatics
Volume: 24
Issue number: 24
DOI: https://doi.org/10.1093/bioinformatics/btn534
Publication status: Published - Dec 2008

Fingerprint

Geographic Mapping
Data Mining
Text Mining
Public Health
Linguistics
Web-based
Disease Outbreaks
Ontology
Language
RSS
Electronic mail
Pathogens
Named Entity Recognition
Infectious Diseases
Gold
Coding
Classify
Higher Order

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability

Cite this

Collier, N., Doan, S., Kawazoe, A., Goodwin, R. M., Conway, M., Tateno, Y., ... Taniguchi, K. (2008). BioCaster: Detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24), 2940-2941. https://doi.org/10.1093/bioinformatics/btn534

@article{23607dc629414f948b0b183848f567cb,
title = "BioCaster: Detecting public health rumors with a Web-based text mining system",
author = "Nigel Collier and Son Doan and Ai Kawazoe and Goodwin, {Reiko Matsuda} and Mike Conway and Yoshio Tateno and Ngo, {Quoc Hung} and Dinh Dien and Asanee Kawtrakul and Koichi Takeuchi and Mika Shigematsu and Kiyosu Taniguchi",
year = "2008",
month = "12",
doi = "10.1093/bioinformatics/btn534",
language = "English",
volume = "24",
pages = "2940--2941",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "24"
}

TY - JOUR

T1 - BioCaster: Detecting public health rumors with a Web-based text mining system

AU - Collier, Nigel

AU - Doan, Son

AU - Kawazoe, Ai

AU - Goodwin, Reiko Matsuda

AU - Conway, Mike

AU - Tateno, Yoshio

AU - Ngo, Quoc Hung

AU - Dien, Dinh

AU - Kawtrakul, Asanee

AU - Takeuchi, Koichi

AU - Shigematsu, Mika

AU - Taniguchi, Kiyosu

PY - 2008/12

Y1 - 2008/12

UR - http://www.scopus.com/inward/record.url?scp=57249114504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57249114504&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btn534

DO - 10.1093/bioinformatics/btn534

M3 - Article

C2 - 18922806

AN - SCOPUS:57249114504

VL - 24

SP - 2940

EP - 2941

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 24

ER -