Fast plagiarism detection based on simple document similarity

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Detecting plagiarism across a large number of documents requires efficient methods. This paper proposes a plagiarism detection algorithm based on approximate string matching, specialized for 'copy and paste'-type plagiarism, together with a speed improvement to an implementation of the algorithm. The improvement omits most of the computation the algorithm requires by applying two kinds of approximation to the output used for plagiarism detection, while keeping the resulting loss of accuracy acceptable. The effect of the improvement on processing time and accuracy is evaluated experimentally on a data set. The results show that the improvement reduces processing time to approximately one-twentieth of that of the normal implementation, at the cost of a 6.4% decrease in accuracy.
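As a rough illustration of the general idea of scoring 'copy and paste' overlap by approximate matching, the following minimal sketch counts, for every alignment of two word sequences, how many positions carry the same word. It is not the paper's algorithm or its accelerated implementation: the function names `match_counts` and `copy_paste_score`, the whitespace tokenization, and the normalization by the shorter document are all assumptions made for this example, and the loop is a direct quadratic-time computation with no approximation applied.

```python
def match_counts(doc_a, doc_b):
    """For every alignment offset, count positions where the words agree.

    Hypothetical helper: words are lower-cased and split on whitespace.
    """
    a = doc_a.lower().split()
    b = doc_b.lower().split()
    counts = {}
    # shift is the index in `a` that the first word of `b` is aligned with
    for shift in range(-(len(b) - 1), len(a)):
        hits = 0
        for j, word in enumerate(b):
            i = shift + j
            if 0 <= i < len(a) and a[i] == word:
                hits += 1
        counts[shift] = hits
    return counts


def copy_paste_score(doc_a, doc_b):
    """Best alignment score, normalised by the shorter document's length."""
    a_len = len(doc_a.split())
    b_len = len(doc_b.split())
    if a_len == 0 or b_len == 0:
        return 0.0
    return max(match_counts(doc_a, doc_b).values()) / min(a_len, b_len)


source = "the quick brown fox jumps over the lazy dog"
suspect = "a quick brown fox leaps over the lazy cat"
print(copy_paste_score(source, suspect))
```

A score near 1 suggests that one document contains a long run of words lined up with the other, which is the pattern a copied passage with a few substituted words would produce. Any decision threshold would have to be chosen empirically.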

Original language: English
Title of host publication: 2017 12th International Conference on Digital Information Management, ICDIM 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 54-58
Number of pages: 5
ISBN (Electronic): 9781538606643
Publication status: Published - Jun 28 2017
Externally published: Yes
Event: 12th International Conference on Digital Information Management, ICDIM 2017 - Fukuoka, Japan
Duration: Sep 12 2017 → Sep 14 2017

Publication series

Name: 2017 12th International Conference on Digital Information Management, ICDIM 2017
Volume: 2018-January

Other

Other: 12th International Conference on Digital Information Management, ICDIM 2017
Country: Japan
City: Fukuoka
Period: 9/12/17 → 9/14/17

Keywords

  • approximate string matching
  • discrete Fourier transform
  • document similarity
  • plagiarism detection
  • vector representation of words

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
