Examination of effective features for CRF-based bibliography extraction from reference strings

Daiki Matsuoka, Manabu Ohta, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Metadata such as bibliographic information about documents is indispensable for the effective use of digital libraries. In particular, the reference fields of academic papers contain a great deal of bibliographic information, such as authors' names and document titles. We are therefore developing a method for automatically extracting bibliographic information from reference strings using a conditional random field (CRF). The features used by the CRF determine the accuracy of this method. We examine which features are effective for accurate extraction by experimentally varying the features used. The experiments showed that lexical features were quite effective for accurate extraction and that properly augmenting the lexicons could lead to further improvements in accuracy.
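As a rough illustration of what per-token features for a CRF over reference strings might look like, the sketch below builds a feature dictionary for each token, combining surface features with lexicon lookups. It is not the authors' feature set: the tokenizer, the feature names, and the two toy lexicons (`NAME_LEXICON`, `VENUE_LEXICON`) are all assumptions made for this example, since the abstract does not list the features or lexicons actually used.

```python
import re

# Hypothetical toy lexicons, invented for this sketch; the lexicons used in
# the paper are not described in the abstract.
NAME_LEXICON = {"matsuoka", "ohta", "takasu", "adachi"}
VENUE_LEXICON = {"conference", "proceedings", "journal", "transactions"}


def tokenize(reference):
    """Split a reference string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Build a feature dict for tokens[i] from surface and lexical cues."""
    tok = tokens[i]
    low = tok.lower()
    feats = {
        # Surface features
        "lower": low,
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "is_year": re.fullmatch(r"(19|20)\d{2}", tok) is not None,
        "is_punct": re.fullmatch(r"[^\w\s]", tok) is not None,
        # Lexical features: membership in the (assumed) lexicons
        "in_name_lexicon": low in NAME_LEXICON,
        "in_venue_lexicon": low in VENUE_LEXICON,
    }
    # Context features: neighbouring tokens, or sequence boundary markers
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def reference_features(reference):
    """Return the tokens of a reference string and one feature dict per token."""
    tokens = tokenize(reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]
```

Under these assumptions, "augmenting the lexicons" would amount to adding entries to the lexicon sets, which changes the lexical features without touching the rest of the extractor. A list of per-token feature dictionaries like this is a common input shape for CRF toolkits, though the toolkit used in the paper is not named in the abstract.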

Original language: English
Title of host publication: 2016 11th International Conference on Digital Information Management, ICDIM 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 243-248
Number of pages: 6
ISBN (Electronic): 9781509026401
DOIs: https://doi.org/10.1109/ICDIM.2016.7829774
Publication status: Published - 2016
Event: 2016 11th International Conference on Digital Information Management, ICDIM 2016 - Porto, Portugal
Duration: Sep 19 2016 to Sep 21 2016

Other

Other: 2016 11th International Conference on Digital Information Management, ICDIM 2016
Country: Portugal
City: Porto
Period: 9/19/16 to 9/21/16

Fingerprint

Bibliographies
Digital libraries
Metadata
Experiments
Conditional random fields

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems and Management

Cite this

Matsuoka, D., Ohta, M., Takasu, A., & Adachi, J. (2016). Examination of effective features for CRF-based bibliography extraction from reference strings. In 2016 11th International Conference on Digital Information Management, ICDIM 2016 (pp. 243-248). [7829774] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICDIM.2016.7829774

@inproceedings{17e3ca305a6f4665bbcd4338318aaed7,
title = "Examination of effective features for CRF-based bibliography extraction from reference strings",
author = "Daiki Matsuoka and Manabu Ohta and Atsuhiro Takasu and Jun Adachi",
year = "2016",
doi = "10.1109/ICDIM.2016.7829774",
language = "English",
pages = "243--248",
booktitle = "2016 11th International Conference on Digital Information Management, ICDIM 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - GEN
T1 - Examination of effective features for CRF-based bibliography extraction from reference strings
AU - Matsuoka, Daiki
AU - Ohta, Manabu
AU - Takasu, Atsuhiro
AU - Adachi, Jun
PY - 2016
Y1 - 2016
UR - http://www.scopus.com/inward/record.url?scp=85014316433&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014316433&partnerID=8YFLogxK
U2 - 10.1109/ICDIM.2016.7829774
DO - 10.1109/ICDIM.2016.7829774
M3 - Conference contribution
SP - 243
EP - 248
BT - 2016 11th International Conference on Digital Information Management, ICDIM 2016
PB - Institute of Electrical and Electronics Engineers Inc.
ER -