Predicting author’s native language using abstracts of scholarly papers

Takahiro Baba, Kensuke Baba, Daisuke Ikeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Predicting an author’s attributes is useful for understanding the implicit meanings of documents. The target problem of this paper is predicting the author’s native language for each document. The authors used surface-level features of documents and tried to clarify the practical tendencies of writing style in terms of word occurrences. They classified the English-language abstracts of approximately 85,000 scholarly papers written in English or in Japanese. In the experiment, the accuracy of the binary classification was 0.97, and a number of the distinctive phrases used in the classification were related to typical writing styles of Japanese.
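As a rough illustration of binary classification from word occurrences, the sketch below trains a small multinomial naive Bayes model on word unigrams and bigrams and ranks the n-grams that most favour one class. The abstract does not name the classifier or the exact features, so the choice of naive Bayes, the n-gram range, the helper names (`ngrams`, `NaiveBayes`, `distinctive`) and the toy sentences are all assumptions made for illustration. The sentences are invented and are not data or findings from the paper.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercased word n-grams (n = 1..n_max) used as surface-level features."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_prob(self, feature, c):
        # Smoothed log-probability of a feature under class c.
        return math.log((self.counts[c][feature] + 1) /
                        (self.totals[c] + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.docs.values())

        def score(c):
            prior = math.log(self.docs[c] / n_docs)
            # Features never seen in training are ignored.
            return prior + sum(self.log_prob(f, c)
                               for f in ngrams(text) if f in self.vocab)

        return max(self.classes, key=score)

    def distinctive(self, c, other, k=3):
        """Top-k features ranked by log-likelihood ratio in favour of class c."""
        return sorted(self.vocab,
                      key=lambda f: self.log_prob(f, c) - self.log_prob(f, other),
                      reverse=True)[:k]


# Invented toy data; the labels "ja" and "en" are hypothetical stand-ins for
# the two classes and the sentences make no claim about real writing styles.
train = [
    ("in this paper we propose a method for clustering", "ja"),
    ("in this paper we propose a system for retrieval", "ja"),
    ("we present an approach to clustering documents", "en"),
    ("we present an approach to ranking results", "en"),
]
texts, labels = zip(*train)
model = NaiveBayes().fit(texts, labels)
prediction = model.predict("in this paper we propose a method for ranking")
print(prediction)
print(model.distinctive("ja", "en"))
```

Ranking features by log-likelihood ratio is one simple way to surface the phrases that drive a decision. Whether it resembles the analysis behind the distinctive phrases reported in the paper is not stated in the abstract.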

Original language: English
Title of host publication: Foundations of Intelligent Systems - 24th International Symposium, ISMIS 2018, Proceedings
Editors: Nathalie Japkowicz, George A. Papadopoulos, Michelangelo Ceci, Zbigniew W. Ras, Jiming Liu
Publisher: Springer Verlag
Number of pages: 6
ISBN (Print): 9783030018504
Publication status: Published - 2018
Externally published: Yes
Event: 24th International Symposium on Methodologies for Intelligent Systems, ISMIS 2018 - Limassol, Cyprus
Duration: Oct 29 2018 – Oct 31 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11177 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 24th International Symposium on Methodologies for Intelligent Systems, ISMIS 2018


Keywords

  • Document classification
  • Machine learning
  • Native language identification
  • Text analysis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

