An Improved Algorithm For Plagiarism Detection Using N-Gram And String Matching

ABSTRACT

To plagiarise is to present another person’s work, ideas, methods or results without due acknowledgement. In this work, we developed an algorithm to detect external plagiarism in documents using a combination of word n-grams and string matching. The suspicious document was split into sentences, each sentence was split into word n-grams, and each n-gram was used to perform a pattern match against the standard PAN-09 corpus. We found that the best detection of heavily disguised plagiarised passages occurred with word 2-grams and word 3-grams. The detected passages were then tagged according to the number of texts retrieved from the documents. To gauge the level of similarity, we scored the plagiarised passages according to the number of passages in which the plagiarised text occurs. A high precision of 0.93 to 0.95 was achieved, but the low recall indicated that many documents were left out of the retrieval process; those that were retrieved were highly relevant to the subject being investigated. This novel approach detects plagiarised passages that have been re-worded and returns very close matches to the source passages that were plagiarised. It was found to be useful for monolingual plagiarism detection and for mild to medium obfuscation, but very weak on heavily obfuscated documents and on cross-lingual plagiarism detection unless a language translator is integrated.

Keywords: Plagiarism Detection, N-grams, String Matching, Algorithm
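
The pipeline described in the abstract (sentence splitting, word n-gram extraction, then string matching against source texts) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the naive sentence splitter, the per-sentence score (fraction of matched n-grams) and the 0.5 threshold are all hypothetical, and a small in-memory dictionary stands in for the PAN-09 corpus.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase and keep only word-like tokens, so punctuation is ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(sentence, n):
    # Overlapping word n-grams of one sentence, each joined into a string.
    words = tokenize(sentence)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def detect(suspicious, sources, n=3, threshold=0.5):
    # `sources` maps a document name to its text (stand-in for a corpus).
    # Padding with spaces makes the substring test match whole words only.
    normalised = {name: " " + " ".join(tokenize(body)) + " "
                  for name, body in sources.items()}
    hits = []
    for sentence in split_sentences(suspicious):
        grams = word_ngrams(sentence, n)
        if not grams:
            continue  # sentence shorter than n words
        for name, body in normalised.items():
            matched = sum(1 for g in grams if f" {g} " in body)
            score = matched / len(grams)  # hypothetical similarity score
            if score >= threshold:
                hits.append((sentence, name, score))
    return hits


if __name__ == "__main__":
    sources = {"src1": "The quick brown fox jumps over the lazy dog near the river bank."}
    suspicious = ("A quick brown fox jumps over a lazy dog. "
                  "Completely unrelated sentence about cooking pasta tonight.")
    for n in (2, 3):
        print(n, detect(suspicious, sources, n=n))
```

In this toy input the re-worded sentence shares five of its eight word 2-grams with the source but only three of its seven word 3-grams, so at the assumed threshold it is flagged for n = 2 and missed for n = 3. Shorter n-grams tolerate more re-wording at the cost of more chance matches, which is one plausible reading of why small n matters for disguised passages.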

APA

CHIMEBUKA, E. (2021). An Improved Algorithm For Plagiarism Detection Using N-Gram And String Matching. Afribary. Retrieved from https://track.afribary.com/works/an-improved-algorithm-for-plagiarism-detection-using-n-gram-and-string-matching

MLA 8th

CHIMEBUKA, EGONU. "An Improved Algorithm For Plagiarism Detection Using N-Gram And String Matching." Afribary. Afribary, 07 Apr. 2021, https://track.afribary.com/works/an-improved-algorithm-for-plagiarism-detection-using-n-gram-and-string-matching. Accessed 23 Nov. 2024.

MLA7

CHIMEBUKA, EGONU. "An Improved Algorithm For Plagiarism Detection Using N-Gram And String Matching". Afribary, Afribary, 07 Apr. 2021. Web. 23 Nov. 2024. <https://track.afribary.com/works/an-improved-algorithm-for-plagiarism-detection-using-n-gram-and-string-matching>.

Chicago

CHIMEBUKA, EGONU. "An Improved Algorithm For Plagiarism Detection Using N-Gram And String Matching" Afribary (2021). Accessed November 23, 2024. https://track.afribary.com/works/an-improved-algorithm-for-plagiarism-detection-using-n-gram-and-string-matching