Error detection of CRF-based bibliography extraction from reference strings

Manabu Ohta, Daiki Arauchi, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

2 Citations (Scopus)

Abstract

We have proposed a method for parsing reference strings, which are usually listed at the end of research papers, to extract bibliographic elements such as the title. The method uses a conditional random field (CRF) to estimate the correct bibliographic label for each token in the token sequence generated from a reference string. Although the method achieved reasonable parsing accuracy on a Japanese academic journal, errors are inevitable. This paper therefore proposes confidence measures for CRF-based bibliography parsing that can be used to detect such parsing errors. It also reports an empirical evaluation of the proposed parsing in terms of both its accuracy and how easily its errors can be detected. The experiments showed that the proposed measures indicated parsing errors reasonably well and could be used to improve the quality of extracted bibliographies at a moderate manual post-editing cost.
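
The labeling and error-detection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which measures the paper uses, so the example assumes one common choice, the posterior probability of the best (Viterbi) label sequence under a linear-chain CRF. The label set, the scores in `EMIT` and `TRANS`, the function names, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical label set and toy scores; a trained CRF would supply these.
LABELS = ["AUTHOR", "TITLE", "YEAR"]
# EMIT[t][j]: score of label j for token t (three tokens here).
EMIT = [[2.0, 0.1, 0.0], [0.2, 1.5, 0.1], [0.0, 0.3, 2.5]]
# TRANS[i][j]: score of moving from label i to label j.
TRANS = [[0.5, 0.5, -1.0], [-1.0, 0.5, 0.5], [-1.0, -1.0, 0.0]]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(emit, trans, path):
    """Unnormalized log-score of one complete label sequence."""
    s = emit[0][path[0]]
    for t in range(1, len(path)):
        s += trans[path[t - 1]][path[t]] + emit[t][path[t]]
    return s


def viterbi(emit, trans):
    """Return the highest-scoring label sequence and its log-score."""
    k = len(emit[0])
    best = list(emit[0])
    back = []
    for t in range(1, len(emit)):
        prev, best, ptr = best, [], []
        for j in range(k):
            i = max(range(k), key=lambda i: prev[i] + trans[i][j])
            best.append(prev[i] + trans[i][j] + emit[t][j])
            ptr.append(i)
        back.append(ptr)
    last = max(range(k), key=lambda j: best[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


def log_partition(emit, trans):
    """Forward algorithm: log of the summed scores of all label sequences."""
    k = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(k)]) + emit[t][j]
            for j in range(k)
        ]
    return logsumexp(alpha)


def sequence_confidence(emit, trans):
    """Posterior probability of the Viterbi path, used as a confidence score."""
    path, score = viterbi(emit, trans)
    return path, math.exp(score - log_partition(emit, trans))


def needs_review(confidence, threshold=0.9):
    """Flag a parsed reference for manual checking (threshold is an assumption)."""
    return confidence < threshold


path, conf = sequence_confidence(EMIT, TRANS)
print([LABELS[j] for j in path], round(conf, 3), needs_review(conf))
```

Under this assumption, a reference string whose confidence falls below the threshold would be routed to manual post-editing. Raising the threshold catches more parsing errors at the price of more references to check by hand.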

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 229-238
Number of pages: 10
Volume: 7634 LNCS
DOI: https://doi.org/10.1007/978-3-642-34752-8_29
Publication status: Published - 2012
Event: 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012 - Taipei, Taiwan, Province of China
Duration: Nov 12 2012 to Nov 15 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7634 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012
Country: Taiwan, Province of China
City: Taipei
Period: 11/12/12 to 11/15/12

Keywords

  • bibliography extraction
  • conditional random field (CRF)
  • confidence measure
  • digital library
  • error detection
  • reference string

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Ohta, M., Arauchi, D., Takasu, A., & Adachi, J. (2012). Error detection of CRF-based bibliography extraction from reference strings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7634 LNCS, pp. 229-238). https://doi.org/10.1007/978-3-642-34752-8_29