Project

Automatic Plagiarism Detection

Keywords: Plagiarism, Copy Detection, Paraphrasing, N-grams, WordNet, Text-preprocessing, Multilingual Processing, Latent Semantic Analysis, Singular Value Decomposition
Description: This project focuses on automatic plagiarism detection in written text. Overlapping parts of documents are identified on the basis of common phrases, which are represented as word N-grams. We employ Latent Semantic Analysis as a mathematical framework to infer the associations among the N-grams contained in the examined text documents. The project also addresses Text Pre-processing, Multilingual Processing, and Feature Selection.
Status: Finished
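The idea of finding overlapping passages through shared word N-grams can be illustrated with a short Python sketch. This is not the project's implementation. The tokenizer (lowercasing and splitting on word characters), the default N = 3, and the helper names `preprocess`, `word_ngrams`, `common_ngrams` and `overlap_ratio` are all assumptions made for illustration.

```python
import re

def preprocess(text):
    """Lowercase and split into word tokens.
    Hypothetical, deliberately simple stand-in for real pre-processing;
    \\w matches Unicode letters, so e.g. Czech words stay intact."""
    return re.findall(r"\w+", text.lower())

def word_ngrams(tokens, n=3):
    """Set of contiguous word N-grams, each stored as a tuple of words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def common_ngrams(doc_a, doc_b, n=3):
    """N-grams occurring in both documents: candidate overlapping phrases."""
    return word_ngrams(preprocess(doc_a), n) & word_ngrams(preprocess(doc_b), n)

def overlap_ratio(doc_a, doc_b, n=3):
    """Fraction of doc_a's distinct N-grams that also occur in doc_b (0.0 to 1.0)."""
    grams_a = word_ngrams(preprocess(doc_a), n)
    if not grams_a:
        return 0.0
    return len(grams_a & word_ngrams(preprocess(doc_b), n)) / len(grams_a)

a = "The quick brown fox jumps over the lazy dog"
b = "A quick brown fox jumps over a sleeping dog"
print(sorted(common_ngrams(a, b)))
print(overlap_ratio(a, b))
```

Here the two sentences share three word trigrams ("quick brown fox", "brown fox jumps", "fox jumps over"), which marks the copied phrase. Plain set intersection only catches verbatim overlap. Inferring looser associations among N-grams is the job of the LSA step.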


People on this project:


Zdeněk Češka


E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska

Zdeněk has been working for various international companies in the field of Software Engineering. He holds a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and related fields.

Related Downloads:


Publication

SVDPlag v1.0

Size: 2 kB
Desc: This tool identifies cases of plagiarism in written text. It employs an advanced technique based on the Latent Semantic Analysis (LSA) framework to perform large-scale statistical computations: Singular Value Decomposition (SVD) is used to infer the associations among the common N-grams contained in the examined documents. The tool also supports various text pre-processing techniques. The library was developed in C# and requires the .NET Framework 3.5 and a 64-bit operating system; the supported architecture is x86-64. It uses the 64-bit Extreme Optimization Numerical Libraries for .NET, version 3.5; older versions and 32-bit builds of these libraries are not supported.
Related: Automatic Plagiarism Detection
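The following Python/NumPy sketch shows the general LSA-over-N-grams technique described above. It is not SVDPlag's C# code and does not use the Extreme Optimization libraries. The following choices are all assumptions made for illustration:

- word bigrams as the N-grams;
- raw counts with no term weighting;
- an N-gram counts as "common" when it occurs in at least two documents;
- a rank of k = 2;
- cosine similarity in the reduced space;
- the function names.

```python
import re
import numpy as np

def word_ngrams(text, n=2):
    """Lowercase, tokenize on word characters, return word N-grams as strings."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_document_matrix(docs, n=2):
    """Count matrix A: rows are N-grams occurring in at least two documents
    (assumed definition of 'common' N-grams), columns are documents."""
    per_doc = [word_ngrams(d, n) for d in docs]
    doc_freq = {}
    for grams in per_doc:
        for g in set(grams):
            doc_freq[g] = doc_freq.get(g, 0) + 1
    vocab = sorted(g for g, df in doc_freq.items() if df >= 2)
    if not vocab:
        raise ValueError("no N-gram is shared by two or more documents")
    row = {g: i for i, g in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, grams in enumerate(per_doc):
        for g in grams:
            if g in row:
                A[row[g], j] += 1.0
    return A, vocab

def lsa_similarity(docs, n=2, k=2):
    """Pairwise cosine similarity of documents in a rank-k LSA space."""
    A, _ = ngram_document_matrix(docs, n)
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Document j is represented by column j of diag(s_k) @ Vt_k.
    D = (np.diag(s[:k]) @ Vt[:k, :]).T          # shape: (n_docs, k)
    norms = np.linalg.norm(D, axis=1)
    # A document with no common N-grams lands (numerically) at the origin;
    # treat it as a zero vector instead of normalizing rounding noise.
    tiny = norms <= 1e-10 * max(norms.max(), 1.0)
    norms[tiny] = 1.0
    D = D / norms[:, None]
    D[tiny] = 0.0
    return D @ D.T

docs = [
    "the cat sat on the mat today",
    "the cat sat on the mat yesterday",
    "stock prices fell sharply on monday",
]
print(np.round(lsa_similarity(docs), 3))
```

Truncating the SVD to k dimensions is what lets LSA relate documents through N-grams that tend to co-occur, rather than only through exact matches. A production tool would also apply the pre-processing and feature selection steps mentioned above before building the matrix.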