Thesis

利用多個相似度演算法實作程式碼抄襲系統

A Source Code Plagiarism Detection System Using Multiple Similarity Algorithms

Advisor: 郭忠義

Abstract


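Assignment plagiarism has long been a serious problem in education. The purpose of this study is to provide a system that objectively detects plagiarism in students' programming assignments. The system can check programs written in procedural and object-oriented languages, as well as plain text. Most existing research on source code plagiarism relies on a single method to find plagiarism, but every single method has weaknesses that can reduce the accuracy and objectivity of detection. This study therefore computes similarity with three different methods so that plagiarized assignments can be identified objectively: a text analysis method, a structure analysis method, and an attribute analysis method. In the text analysis method, a text-processing pipeline is built and the winnowing algorithm is applied to compute similarity. In the structure analysis method, an enhanced F(p) algorithm converts class structures into text, which is then compared with the winnowing algorithm. In the attribute analysis method, a variable analysis method is proposed to compare the similarity of the variables in two programs, and a statistical method is used to compare class similarity. To demonstrate the effectiveness of the proposed methods, test files were created using ten commonly used plagiarism techniques and run through the developed system, JPlag, and Wcopyfind; the results show that the proposed system detects plagiarism more effectively. In addition, the system was evaluated with three information retrieval measures, Precision, Recall, and F-Measure, and the results show that the proposed method detects plagiarism more effectively than JPlag.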
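The abstract names winnowing as the comparison step but does not give its parameters or scoring rule. As a rough illustration only, the sketch below shows the general winnowing technique (Schleimer et al., 2003): hash every k-gram of a normalized text, keep the minimum hash from each window of `w` consecutive hashes as a fingerprint, and compare fingerprint sets. Everything specific here is an assumption, not the thesis's method: the `normalize()` preprocessing (lower-casing and whitespace removal), the values `k=5` and `w=4`, the truncated SHA-1 hash, and Jaccard similarity as the score are all hypothetical choices.

```python
from __future__ import annotations

import hashlib
import re


def normalize(source: str) -> str:
    # Hypothetical preprocessing: lower-case the text and drop all whitespace,
    # so re-indenting or re-spacing a copied program does not change it.
    return re.sub(r"\s+", "", source.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    # Hash every contiguous k-gram. A truncated SHA-1 is used for clarity;
    # practical implementations usually use a rolling hash instead.
    return [
        int.from_bytes(hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes: list[int], w: int) -> set[int]:
    # Winnowing: from every window of w consecutive k-gram hashes, keep the
    # minimum hash as a fingerprint. Positions are discarded because only the
    # set of fingerprint values is needed for a similarity score.
    if len(hashes) < w:
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def winnowing_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    # Jaccard similarity of the two fingerprint sets (an assumed scoring rule).
    fa = winnow(kgram_hashes(normalize(a), k), w)
    fb = winnow(kgram_hashes(normalize(b), k), w)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


if __name__ == "__main__":
    original = "int total = 0;\nfor (int i = 0; i < n; i++) total += i;"
    copied = "int total=0; for(int i=0;i<n;i++)   total+=i;"
    print(winnowing_similarity(original, copied))  # prints 1.0
```

Because the hypothetical normalization removes all whitespace, a copy that only changes spacing and line breaks yields identical fingerprints and a score of 1.0. Defeating renaming or statement reordering would need stronger preprocessing (for example, tokenization), which this sketch does not attempt.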

Keywords

Plagiarism, Plagiarism Detection, Similarity

Parallel Abstract


Plagiarism is a serious issue in education. This study proposes a system for objectively detecting plagiarism in students' programming assignments. Most previous studies used a single method to identify plagiarized programs. However, every single method has drawbacks that may undermine the accuracy and objectivity of detection. This research therefore proposes three methods, namely text-based, structure-based, and attribute-based methods, to compute similarity and detect plagiarism fairly. In the text-based method, a text-processing flow is built and the winnowing algorithm is employed. In the structure-based method, a proposed algorithm translates the class structure into text, and the winnowing algorithm is employed to compare the translated text. Furthermore, in the attribute-based method, a variable analysis method is proposed to analyze variable similarity, and a statistical method is used to measure class similarity. To demonstrate the effectiveness of the proposed approach, ten benchmark files constructed from commonly used plagiarism tricks are fed to the proposed system, JPlag, and Wcopyfind, respectively. The results show that the proposed system is more effective at detecting plagiarism. Next, information retrieval measures, including Precision, Recall, and F-Measure, are employed to evaluate the system; by these measures, the proposed system is more effective than JPlag at plagiarism detection.
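Precision, Recall, and F-Measure have standard definitions: Precision = TP / (TP + FP), Recall = TP / (TP + FN), and the balanced F-Measure is their harmonic mean. The sketch below applies them to plagiarism detection by treating each unordered pair of submissions as one item. It assumes the balanced F1 form, since the abstract does not say which F-Measure weighting was used, and the submission names `s1` through `s9` and the helper `pair()` are hypothetical.

```python
from __future__ import annotations


def pair(a: str, b: str) -> frozenset[str]:
    # An unordered pair of submissions; pair("A", "B") equals pair("B", "A").
    return frozenset((a, b))


def precision_recall_f(
    flagged: set[frozenset[str]], actual: set[frozenset[str]]
) -> tuple[float, float, float]:
    # flagged: pairs the detector reports; actual: pairs known to be plagiarized.
    tp = len(flagged & actual)  # reported and truly plagiarized
    fp = len(flagged - actual)  # reported but not plagiarized
    fn = len(actual - flagged)  # plagiarized but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    flagged = {pair("s1", "s2"), pair("s1", "s3"), pair("s4", "s5")}
    actual = {pair("s2", "s1"), pair("s4", "s5"), pair("s6", "s7"), pair("s8", "s9")}
    # 2 true positives, 1 false positive, 2 false negatives:
    # precision = 2/3, recall = 1/2, F1 = 4/7
    print(precision_recall_f(flagged, actual))
```

In this toy example the detector reports three pairs, two of which are real, and misses two real pairs. Precision therefore rewards avoiding false accusations, while Recall rewards catching every copied pair.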

Parallel Keywords

Plagiarism, Plagiarism Detection, Similarity
