TY - GEN
T1 - Empirical evaluation of CRF-based bibliography extraction from research papers
AU - Ohta, Manabu
AU - Inoue, Ryohei
AU - Takasu, Atsuhiro
N1 - Publisher Copyright: © 2012 IADIS.
PY - 2012
Y1 - 2012
AB - We proposed an automatic bibliography extraction method for research papers scanned with OCR markup. The method uses conditional random fields (CRF) to label serially OCRed text lines in the article title page as appropriate bibliographic element names. Although we achieved good extraction accuracies for some Japanese academic journals, extraction errors are inevitable. Therefore, this paper proposes three confidence measures for bibliography labeling to detect such extraction errors. This paper also reports an empirical evaluation of CRF-based page analysis for research papers on the basis not only of labeling accuracy but also of labeling error detection. We applied the three confidence measures to labeling three academic journals published in Japan. The experiments showed that the proposed confidence measures reasonably indicated the labeling accuracies and could be used for error detection.
KW - Bibliography Extraction
KW - Conditional Random Fields (CRF)
KW - Digital Library
KW - Error Detection
KW - OCR
UR - http://www.scopus.com/inward/record.url?scp=84869037046&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869037046&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84869037046
T3 - Proceedings of the IADIS International Conference Information Systems 2012, IS 2012
SP - 18
EP - 26
BT - Proceedings of the IADIS International Conference Information Systems 2012, IS 2012
A2 - Isaias, Pedro
A2 - Rodrigues, Luis
A2 - Nunes, Miguel Baptista
A2 - Powell, Philip
PB - IADIS
T2 - IADIS International Conference on Information Systems 2012, IS 2012
Y2 - 10 March 2012 through 12 March 2012
ER -