CRF-based bibliography extraction from reference strings focusing on various token granularities

Manabu Ohta, Daiki Arauchi, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

The references of academic articles include important bibliographic elements such as authors' names and article titles. Automatic extraction of these elements is useful because they can be used for various purposes, including searching. In this paper, a method for automatically extracting bibliographic elements from the text of reference strings is proposed. The proposed method assigns bibliographic labels to reference strings by using linguistic information and conditional random fields. Experimental results indicated that the extraction accuracies of major bibliographies were more than 96%.
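The abstract does not spell out how reference strings are tokenized or which features feed the conditional random field, so the following is only an illustrative sketch of what "various token granularities" could look like in practice. It splits a reference string at two granularities (single words and punctuation marks versus delimiter-separated chunks) and builds a simple per-token feature dictionary of the kind a CRF toolkit typically consumes. The function names, the delimiter set, and the feature names are all assumptions made for illustration, not the authors' implementation.

```python
import re


def tokenize_words(ref):
    """Fine granularity (assumed): every word and every punctuation mark is a token."""
    return re.findall(r"\w+|[^\w\s]", ref)


def tokenize_by_delimiters(ref, delimiters=",.;:()\""):
    """Coarse granularity (assumed): text between delimiters stays together.

    Delimiters are kept as separate tokens so a sequence labeler can still see them.
    """
    pattern = "([" + re.escape(delimiters) + "])"
    parts = re.split(pattern, ref)
    return [p.strip() for p in parts if p.strip()]


def token_features(tokens, i):
    """Hypothetical feature set for token i; a real system would use richer cues."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_capitalized": tok[:1].isupper(),
        # True for a single non-alphanumeric character such as "," or "."
        "is_punct": len(tok) == 1 and not tok.isalnum(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Made-up reference string used only to exercise the functions.
ref = "M. Ohta, J. Adachi. Bibliography extraction. DAS, 2012."
fine = tokenize_words(ref)
coarse = tokenize_by_delimiters(ref)
features = [token_features(coarse, i) for i in range(len(coarse))]
```

At the coarse granularity the title fragment "Bibliography extraction" survives as one token, while the fine granularity splits it into two words. A sequence labeler trained on the coarse tokens therefore makes fewer decisions per reference, at the cost of depending more heavily on the delimiters being reliable.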

Original language: English
Title of host publication: Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012
Pages: 276-281
Number of pages: 6
DOIs: https://doi.org/10.1109/DAS.2012.28
Publication status: Published - 2012
Event: 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 - Gold Coast, QLD, Australia
Duration: Mar 27 2012 to Mar 29 2012

Fingerprint

Bibliographies
Linguistics
Labels

Keywords

  • bibliography extraction
  • conditional random field (CRF)
  • delimiter
  • reference
  • tokenization

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Ohta, M., Arauchi, D., Takasu, A., & Adachi, J. (2012). CRF-based bibliography extraction from reference strings focusing on various token granularities. In Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012 (pp. 276-281). [6195378] https://doi.org/10.1109/DAS.2012.28

@inproceedings{e916d0133179444e9176165dcd800b1f,
title = "CRF-based bibliography extraction from reference strings focusing on various token granularities",
abstract = "The references of academic articles include important bibliographic elements such as authors' names and article titles. Automatic extraction of these elements is useful because they can be used for various purposes, including searching. In this paper, a method for automatically extracting bibliographic elements from the text of reference strings is proposed. The proposed method assigns bibliographic labels to reference strings by using linguistic information and conditional random fields. Experimental results indicated that the extraction accuracies of major bibliographies were more than 96{\%}.",
keywords = "bibliography extraction, conditional random field (CRF), delimiter, reference, tokenization",
author = "Manabu Ohta and Daiki Arauchi and Atsuhiro Takasu and Jun Adachi",
year = "2012",
doi = "10.1109/DAS.2012.28",
language = "English",
isbn = "9780769546612",
pages = "276--281",
booktitle = "Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012",

}
