CRF-based bibliography extraction from reference strings focusing on various token granularities

Manabu Ohta, Daiki Arauchi, Atsuhiro Takasu, Jun Adachi

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

6 Citations (Scopus)

Abstract

The reference lists of academic articles contain important bibliographic elements such as authors' names and article titles. Extracting these elements automatically is useful because they can serve various purposes, including search. This paper proposes a method for automatically extracting bibliographic elements from reference strings. The method assigns bibliographic labels to reference strings using linguistic information and conditional random fields (CRFs). Experimental results showed extraction accuracies above 96% for the major bibliographic elements.
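The abstract describes the labeling step only at a high level. As a rough illustration of what CRF-style sequence labeling of a reference string involves, here is a minimal sketch of linear-chain Viterbi decoding. It is not the authors' method: the label set (AUTHOR, YEAR, TITLE), the tokenizer that splits delimiters into their own tokens, the features, and all weights are hypothetical, and a real CRF would learn its weights from annotated references rather than use hand-set values.

```python
import re
from typing import Dict, List, Tuple

LABELS = ["AUTHOR", "YEAR", "TITLE"]

# Hypothetical hand-set weights; a trained CRF would learn these from labeled data.
EMIT: Dict[Tuple[str, str], float] = {
    ("bias", "YEAR"): -2.0,    # ordinary tokens rarely belong to the YEAR field
    ("is_year", "YEAR"): 6.0,  # a four-digit year strongly suggests YEAR
}
# Transition weights favor staying in a field and moving forward AUTHOR -> YEAR -> TITLE.
TRANS: Dict[Tuple[str, str], float] = {
    ("START", "AUTHOR"): 2.0, ("START", "YEAR"): -5.0, ("START", "TITLE"): -5.0,
    ("AUTHOR", "AUTHOR"): 1.0, ("AUTHOR", "YEAR"): 0.0, ("AUTHOR", "TITLE"): -5.0,
    ("YEAR", "AUTHOR"): -5.0, ("YEAR", "YEAR"): 1.0, ("YEAR", "TITLE"): 0.0,
    ("TITLE", "AUTHOR"): -5.0, ("TITLE", "YEAR"): -5.0, ("TITLE", "TITLE"): 1.0,
}


def tokenize(ref: str) -> List[str]:
    """Split into word tokens and single delimiter characters (one possible granularity)."""
    return re.findall(r"\w+|[^\w\s]", ref)


def features(tok: str) -> List[str]:
    """Tiny illustrative feature set: a bias term plus a year-pattern indicator."""
    feats = ["bias"]
    if re.fullmatch(r"(19|20)\d\d", tok):
        feats.append("is_year")
    return feats


def emit_score(tok: str, label: str) -> float:
    """Sum the weights of the token's active features for the given label."""
    return sum(EMIT.get((f, label), 0.0) for f in features(tok))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not tokens:
        return []
    # score[i][y]: best total score of any labeling of tokens[:i+1] ending in label y
    score: List[Dict[str, float]] = [
        {y: TRANS[("START", y)] + emit_score(tokens[0], y) for y in LABELS}
    ]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for y in LABELS:
            # Best previous label given that token i is labeled y
            prev = max(LABELS, key=lambda p: score[i - 1][p] + TRANS[(p, y)])
            score[i][y] = score[i - 1][prev] + TRANS[(prev, y)] + emit_score(tokens[i], y)
            back[i][y] = prev
    # Follow back-pointers from the best final label
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    toks = tokenize("Ohta, M. 2012. CRF extraction.")
    for tok, lab in zip(toks, viterbi(toks)):
        print(f"{tok}\t{lab}")
```

With these toy weights, the tokens before "2012" are labeled AUTHOR, "2012" is labeled YEAR, and the remaining tokens are labeled TITLE. Treating each delimiter as its own token is just one choice of granularity; coarser or finer tokenizations would change what the model sees at each position.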

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 276-281
Number of pages: 6
DOI: 10.1109/DAS.2012.28
Publication status: Published - 2012
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: Mar 27 2012 to Mar 29 2012

Keywords

  • bibliography extraction
  • conditional random field (CRF)
  • delimiter
  • reference
  • tokenization

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Ohta, M., Arauchi, D., Takasu, A., & Adachi, J. (2012). CRF-based bibliography extraction from reference strings focusing on various token granularities. In Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 (pp. 276-281). [6195378] https://doi.org/10.1109/DAS.2012.28