CRF-based bibliography extraction from reference strings using a small amount of training data

Daiki Namikoshi, Manabu Ohta, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The effective use of digital libraries demands maintenance of bibliographic databases. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method for automatic extraction of bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to learn an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the required training data for CRFs. We evaluate extraction accuracies and the associated training cost by experiments.
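The active learning idea in the abstract can be illustrated with a minimal sketch. It assumes a least-confidence selection strategy: the reference strings whose predicted labeling is least certain are sent to a human annotator first. This is not necessarily the authors' method. The function names, the choice of the minimum token marginal as the confidence measure, and all probability values below are illustrative assumptions, and the reference strings are placeholders.

```python
from typing import Dict, List


def sequence_confidence(token_marginals: List[float]) -> float:
    """Confidence of a labeling, taken here as its weakest token-level probability.

    Assumption: the per-token marginal probabilities would come from a trained
    CRF; this sketch receives them as plain numbers.
    """
    return min(token_marginals)


def select_for_annotation(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` reference strings with the lowest confidence."""
    ranked = sorted(pool, key=lambda ref: sequence_confidence(pool[ref]))
    return ranked[:budget]


# Made-up marginals for three placeholder reference strings (illustrative only).
pool = {
    "A. Author. Some title. Journal, 2010.": [0.99, 0.97, 0.95, 0.98],
    "B. Writer, Another title, Proc. of X, pp. 1-5": [0.90, 0.55, 0.60, 0.92],
    "C. Person (2001) Third title": [0.80, 0.85, 0.40],
}

selected = select_for_annotation(pool, budget=2)
print(selected)
```

Labeling only the selected strings and retraining is what would reduce annotation cost in such a loop. Other confidence measures, such as the probability of the whole best label sequence, could be substituted for `sequence_confidence`.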

Original language: English
Title of host publication: 2017 12th International Conference on Digital Information Management, ICDIM 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 59-64
Number of pages: 6
Volume: 2018-January
ISBN (Electronic): 9781538606643
DOIs: https://doi.org/10.1109/ICDIM.2017.8244665
Publication status: Published - Jan 2 2018
Event: 12th International Conference on Digital Information Management, ICDIM 2017 - Fukuoka, Japan
Duration: Sep 12 2017 to Sep 14 2017

Other

Other: 12th International Conference on Digital Information Management, ICDIM 2017
Country: Japan
City: Fukuoka
Period: 9/12/17 to 9/14/17

Fingerprint

Bibliographies
Digital libraries
Costs
Experiments
Conditional random fields
Problem-Based Learning
Data base
Transfer learning
Experiment
Active learning

Keywords

  • active learning
  • bibliography extraction
  • confidence measure
  • CRF
  • transfer learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Cite this

Namikoshi, D., Ohta, M., Takasu, A., & Adachi, J. (2018). CRF-based bibliography extraction from reference strings using a small amount of training data. In 2017 12th International Conference on Digital Information Management, ICDIM 2017 (Vol. 2018-January, pp. 59-64). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDIM.2017.8244665

CRF-based bibliography extraction from reference strings using a small amount of training data. / Namikoshi, Daiki; Ohta, Manabu; Takasu, Atsuhiro; Adachi, Jun.

2017 12th International Conference on Digital Information Management, ICDIM 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. p. 59-64.


Namikoshi, D, Ohta, M, Takasu, A & Adachi, J 2018, CRF-based bibliography extraction from reference strings using a small amount of training data. in 2017 12th International Conference on Digital Information Management, ICDIM 2017. vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 59-64, 12th International Conference on Digital Information Management, ICDIM 2017, Fukuoka, Japan, 9/12/17. https://doi.org/10.1109/ICDIM.2017.8244665

Namikoshi D, Ohta M, Takasu A, Adachi J. CRF-based bibliography extraction from reference strings using a small amount of training data. In 2017 12th International Conference on Digital Information Management, ICDIM 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc. 2018. p. 59-64. https://doi.org/10.1109/ICDIM.2017.8244665

Namikoshi, Daiki ; Ohta, Manabu ; Takasu, Atsuhiro ; Adachi, Jun. / CRF-based bibliography extraction from reference strings using a small amount of training data. 2017 12th International Conference on Digital Information Management, ICDIM 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 59-64
@inproceedings{45bd13ce172642408dbbd93465a150ec,
title = "CRF-based bibliography extraction from reference strings using a small amount of training data",
abstract = "The effective use of digital libraries demands maintenance of bibliographic databases. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method for automatic extraction of bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to learn an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the required training data for CRFs. We evaluate extraction accuracies and the associated training cost by experiments.",
keywords = "active learning, bibliography extraction, confidence measure, CRF, transfer learning",
author = "Daiki Namikoshi and Manabu Ohta and Atsuhiro Takasu and Jun Adachi",
year = "2018",
month = "1",
day = "2",
doi = "10.1109/ICDIM.2017.8244665",
language = "English",
volume = "2018-January",
pages = "59--64",
booktitle = "2017 12th International Conference on Digital Information Management, ICDIM 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - GEN
T1  - CRF-based bibliography extraction from reference strings using a small amount of training data
AU  - Namikoshi, Daiki
AU  - Ohta, Manabu
AU  - Takasu, Atsuhiro
AU  - Adachi, Jun
PY  - 2018/1/2
Y1  - 2018/1/2
N2  - The effective use of digital libraries demands maintenance of bibliographic databases. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method for automatic extraction of bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to learn an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the required training data for CRFs. We evaluate extraction accuracies and the associated training cost by experiments.
AB  - The effective use of digital libraries demands maintenance of bibliographic databases. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method for automatic extraction of bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to learn an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the required training data for CRFs. We evaluate extraction accuracies and the associated training cost by experiments.
KW  - active learning
KW  - bibliography extraction
KW  - confidence measure
KW  - CRF
KW  - transfer learning
UR  - http://www.scopus.com/inward/record.url?scp=85049371656&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85049371656&partnerID=8YFLogxK
U2  - 10.1109/ICDIM.2017.8244665
DO  - 10.1109/ICDIM.2017.8244665
M3  - Conference contribution
AN  - SCOPUS:85049371656
VL  - 2018-January
SP  - 59
EP  - 64
BT  - 2017 12th International Conference on Digital Information Management, ICDIM 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -