An approach to estimating cited sentences in academic papers using Doc2vec

Shunsuke Tanabe, Atsuhiro Takasu, Manabu Ohta, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most academic authors refer to the literature when introducing their proposed methods and the data used in their experiments. These references can be very helpful when trying to understand a paper; however, authors do not always state clearly which part of the referenced work they are pointing the reader to, and reading the whole document to find the relevant information can be quite labor-intensive. In this paper, we propose a method for estimating the appropriate parts of a referenced work as the “cited parts,” with the aim of reducing this burden. We first extract the sentences in an academic paper that cite references to the literature as “citing sentences.” We then vectorize the citing sentences and all the sentences in the cited papers using doc2vec and estimate the most appropriate cited part as the sentence whose feature vector is most similar to that of the citing sentence. To evaluate the proposed method, we conducted experiments using English-language papers and a questionnaire survey that asked subjects to evaluate the appropriateness of the cited parts estimated by the method. The experiments showed that how well this approach estimated the appropriate cited parts of a cited paper depended on the citation intention of the citing sentences.
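The matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: to stay dependency-free, a bag-of-words count vector stands in for the doc2vec embedding, and cosine similarity is assumed as the similarity measure (the abstract only says "most similar feature vector"). The function names and the toy sentences are hypothetical. In practice, `vectorize` would be replaced by vectors inferred from a trained doc2vec model.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Stand-in for a doc2vec embedding: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def estimate_cited_sentence(citing_sentence, cited_sentences):
    """Return (index, sentence, score) of the most similar cited-paper sentence."""
    if not cited_sentences:
        raise ValueError("cited_sentences must not be empty")
    query = vectorize(citing_sentence)
    scores = [cosine(query, vectorize(s)) for s in cited_sentences]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, cited_sentences[best], scores[best]


# Hypothetical toy data, not taken from the paper's experiments.
citing = "We embed each sentence with the paragraph vector model [3]."
cited = [
    "Neural networks have been applied to many language tasks.",
    "We propose the paragraph vector model to embed each sentence or document.",
    "Experiments were run on a sentiment analysis benchmark.",
]

index, sentence, score = estimate_cited_sentence(citing, cited)
print(index, round(score, 3))
```

Keeping the vectorizer behind a single function means the ranking logic stays the same whichever embedding is plugged in; only `vectorize` (and, for dense vectors, `cosine`) would need to change.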

Original language: English
Title of host publication: MEDES 2018 - 10th International Conference on Management of Digital EcoSystems
Publisher: Association for Computing Machinery, Inc
Pages: 118-125
Number of pages: 8
ISBN (Electronic): 9781450356220
DOIs: https://doi.org/10.1145/3281375.3281391
Publication status: Published - Sep 25 2018
Event: 10th International Conference on Management of Digital EcoSystems, MEDES 2018 - Tokyo, Japan
Duration: Sep 25 2018 to Sep 28 2018

Other

Other: 10th International Conference on Management of Digital EcoSystems, MEDES 2018
Country: Japan
City: Tokyo
Period: 9/25/18 to 9/28/18

Fingerprint

Experiments
Personnel

Keywords

  • Academic paper
  • Browsing support
  • Citation
  • Doc2vec
  • Reference

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Networks and Communications
  • Environmental Engineering

Cite this

Tanabe, S., Takasu, A., Ohta, M., & Adachi, J. (2018). An approach to estimating cited sentences in academic papers using Doc2vec. In MEDES 2018 - 10th International Conference on Management of Digital EcoSystems (pp. 118-125). Association for Computing Machinery, Inc. https://doi.org/10.1145/3281375.3281391

@inproceedings{f6a93d555dab431bb54b84970161afc0,
title = "An approach to estimating cited sentences in academic papers using Doc2vec",
keywords = "Academic paper, Browsing support, Citation, Doc2vec, Reference",
author = "Shunsuke Tanabe and Atsuhiro Takasu and Manabu Ohta and Jun Adachi",
year = "2018",
month = "9",
day = "25",
doi = "10.1145/3281375.3281391",
language = "English",
pages = "118--125",
booktitle = "MEDES 2018 - 10th International Conference on Management of Digital EcoSystems",
publisher = "Association for Computing Machinery, Inc",
}

TY  - GEN
T1  - An approach to estimating cited sentences in academic papers using Doc2vec
AU  - Tanabe, Shunsuke
AU  - Takasu, Atsuhiro
AU  - Ohta, Manabu
AU  - Adachi, Jun
PY  - 2018/9/25
Y1  - 2018/9/25
KW  - Academic paper
KW  - Browsing support
KW  - Citation
KW  - Doc2vec
KW  - Reference
UR  - http://www.scopus.com/inward/record.url?scp=85058649582&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85058649582&partnerID=8YFLogxK
U2  - 10.1145/3281375.3281391
DO  - 10.1145/3281375.3281391
M3  - Conference contribution
SP  - 118
EP  - 125
BT  - MEDES 2018 - 10th International Conference on Management of Digital EcoSystems
PB  - Association for Computing Machinery, Inc
ER  -