Empirical evaluation of CRF-based bibliography extraction from research papers

Manabu Ohta, Ryohei Inoue, Atsuhiro Takasu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We proposed an automatic bibliography extraction method for research papers scanned with OCR markup. The method uses conditional random fields (CRF) to label serially OCRed text lines in the article title page as appropriate bibliographic element names. Although we achieved good extraction accuracies for some Japanese academic journals, extraction errors are inevitable. Therefore, this paper proposes three confidence measures for bibliography labeling to detect such extraction errors. This paper also reports an empirical evaluation of CRF-based page analysis for research papers on the basis not only of labeling accuracy but also of labeling error detection. We applied the three confidence measures to labeling three academic journals published in Japan. The experiments showed that the proposed confidence measures reasonably indicated the labeling accuracies and could be used for error detection.
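The abstract does not define the three confidence measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a linear-chain CRF whose log-scores are already available and derives two generic confidence values: the posterior marginal of each line's assigned label, and the probability of the whole Viterbi label sequence. The label set, the scores, the threshold, and all function names are hypothetical.

```python
import math

LABELS = ["title", "author", "abstract"]  # hypothetical label set, not from the paper


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_backward(emit, trans):
    """emit[t][j]: log-score of label j on line t; trans[i][j]: log-score of i -> j."""
    n, k = len(emit), len(emit[0])
    alpha = [emit[0][:]]
    for t in range(1, n):
        alpha.append([
            emit[t][j] + logsumexp([alpha[t - 1][i] + trans[i][j] for i in range(k)])
            for j in range(k)
        ])
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([trans[i][j] + emit[t + 1][j] + beta[t + 1][j] for j in range(k)])
            for i in range(k)
        ]
    log_z = logsumexp(alpha[-1])  # log of the partition function
    marginals = [
        [math.exp(alpha[t][j] + beta[t][j] - log_z) for j in range(k)]
        for t in range(n)
    ]
    return marginals, log_z


def viterbi(emit, trans):
    """Return the highest-scoring label index path and its unnormalized log-score."""
    n, k = len(emit), len(emit[0])
    score = emit[0][:]
    back = []
    for t in range(1, n):
        prev = [max(range(k), key=lambda i: score[i] + trans[i][j]) for j in range(k)]
        score = [score[prev[j]] + trans[prev[j]][j] + emit[t][j] for j in range(k)]
        back.append(prev)
    last = max(range(k), key=lambda j: score[j])
    best = score[last]
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path, best


def label_with_confidence(emit, trans, threshold=0.9):
    """Label each line and flag lines whose marginal confidence is below threshold."""
    marginals, log_z = forward_backward(emit, trans)
    path, best = viterbi(emit, trans)
    seq_conf = math.exp(best - log_z)  # probability of the whole label sequence
    line_conf = [marginals[t][y] for t, y in enumerate(path)]
    suspects = [t for t, c in enumerate(line_conf) if c < threshold]
    return [LABELS[y] for y in path], line_conf, seq_conf, suspects


# Made-up scores for three OCRed title-page lines (illustration only).
emit = [[2.0, 0.1, 0.0], [0.2, 1.5, 0.3], [0.0, 0.9, 1.0]]
trans = [[0.0, 1.0, 0.2], [-1.0, 0.5, 1.0], [-2.0, -1.0, 1.0]]
labels, line_conf, seq_conf, suspects = label_with_confidence(emit, trans)
print(labels, [round(c, 3) for c in line_conf], round(seq_conf, 3), suspects)
```

Lines listed in `suspects` would be candidates for manual checking. In a real system the log-scores would come from a trained CRF, and the threshold would be tuned against the observed labeling accuracy.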

Original language: English
Title of host publication: Proceedings of the IADIS International Conference Information Systems 2012, IS 2012
Editors: Pedro Isaias, Luis Rodrigues, Miguel Baptista Nunes, Philip Powell
Publisher: IADIS
Pages: 18-26
Number of pages: 9
ISBN (Electronic): 9789728939687
Publication status: Published - Jan 1 2012
Event: IADIS International Conference on Information Systems 2012, IS 2012 - Berlin, Germany
Duration: Mar 10 2012 to Mar 12 2012

Publication series

Name: Proceedings of the IADIS International Conference Information Systems 2012, IS 2012

Other

Other: IADIS International Conference on Information Systems 2012, IS 2012
Country: Germany
City: Berlin
Period: 3/10/12 to 3/12/12

Keywords

  • Bibliography Extraction
  • Conditional Random Fields (CRF)
  • Digital Library
  • Error Detection
  • OCR

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Software

  • Cite this

    Ohta, M., Inoue, R., & Takasu, A. (2012). Empirical evaluation of CRF-based bibliography extraction from research papers. In P. Isaias, L. Rodrigues, M. B. Nunes, & P. Powell (Eds.), Proceedings of the IADIS International Conference Information Systems 2012, IS 2012 (pp. 18-26). (Proceedings of the IADIS International Conference Information Systems 2012, IS 2012). IADIS.