Cost evaluation of CRF-based bibliography extraction from reference strings

Naomichi Kawakami, Manabu Ohta, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

The effective use of digital libraries demands well-maintained bibliographic databases. In particular, the reference sections of academic papers are rich in useful bibliographic information such as authors' names and paper titles. We therefore propose a method for automatically extracting bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are needed to train the CRF to high extraction accuracy. In this paper, we propose using active sampling and pseudo-training data to reduce the amount of training data required, and we evaluate the associated training costs experimentally.
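The abstract does not state which sampling criterion is used. As a rough illustration only, the sketch below shows one common form of active sampling (uncertainty sampling): from a pool of unlabeled reference strings, pick those the current model is least confident about and send them for manual labeling. The function names, the confidence measure, and the per-token label probabilities are all hypothetical and are not taken from the paper. In practice the probabilities would come from the trained CRF's marginals.

```python
from typing import Dict, List, Sequence


def sequence_confidence(token_marginals: Sequence[Sequence[float]]) -> float:
    """Confidence of one reference string: the lowest best-label
    probability over its tokens (an assumed measure, not the paper's)."""
    return min(max(probs) for probs in token_marginals)


def select_for_labeling(
    pool: Dict[str, Sequence[Sequence[float]]], k: int
) -> List[str]:
    """Return the ids of the k reference strings with the lowest confidence."""
    ranked = sorted(pool, key=lambda ref_id: sequence_confidence(pool[ref_id]))
    return ranked[:k]


# Hypothetical per-token label probabilities for three unlabeled references.
pool = {
    "ref_a": [[0.9, 0.1], [0.8, 0.2]],
    "ref_b": [[0.55, 0.45], [0.95, 0.05]],
    "ref_c": [[0.99, 0.01], [0.97, 0.03]],
}

# The two least confident references are the candidates for manual labeling.
to_label = select_for_labeling(pool, 2)
print(to_label)
```

Under this scheme only the selected strings are labeled by hand and added to the training set, which is the general way active sampling reduces labeling cost.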

Original language: English
Title of host publication: The Emergence of Digital Libraries - Research and Practices - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014, Proceedings
Publisher: Springer Verlag
Pages: 268-278
Number of pages: 11
Volume: 8839
ISBN (Print): 9783319128221
Publication status: Published - 2014
Event: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014 - Chiang Mai, Thailand
Duration: Nov 5 2014 to Nov 7 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8839
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014
Country: Thailand
City: Chiang Mai
Period: 11/5/14 to 11/7/14

Keywords

  • Active sampling
  • CRF
  • Information extraction
  • Pseudo-training data
  • Reference string

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Kawakami, N., Ohta, M., Takasu, A., & Adachi, J. (2014). Cost evaluation of CRF-based bibliography extraction from reference strings. In The Emergence of Digital Libraries - Research and Practices - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014, Proceedings (Vol. 8839, pp. 268-278). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8839). Springer Verlag.