
A Study of Code Similarity in Student Programs: The Case of Plagiarism Detection

Research on Detection of Similarity in Student Programs – with Application to Detection of Plagiarism

Advisor: 魏世杰

Abstract


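Judging the similarity of programs is less complicated than judging the similarity of text: compared with the syntax of natural-language text, the grammar of a program is more regular. Program similarity detection has many applications; in teaching practice it is most often used to detect plagiarism. However, we found that in exams or assignments where students may consult examples, submissions are often modified from the same example programs. Many similar fragment pairs are then found, but much of that similarity comes from the example itself and does not indicate plagiarism. Information retrieval, on the other hand, offers the concept of IDF (Inverse Document Frequency): fragments that occur frequently carry less meaning, and fragments that occur rarely carry more. We therefore propose a new IDF-based method that helps find similar fragment pairs with low occurrence frequency, which we treat as more likely to indicate plagiarism. We validate the method on programs from one open-book exam and one homework assignment.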

Parallel Abstract


Similarity detection on programs is simpler than on text documents, because the grammar of a programming language is easier to define than that of natural language. As a result, more and more applications have been developed to detect program similarity. One practical use of these applications is plagiarism detection in education. In particular, when students take a test or do homework in which they may consult example programs, they may copy from the examples with little thought or rewriting. In this case a detector finds many similar code tiles that were copied from the same example and are of little value for plagiarism detection. In this paper, based on the IDF (Inverse Document Frequency) concept from information retrieval, we propose a new method that reduces the influence of high-frequency code tiles, and we compare it with the traditional non-IDF result using datasets from an open-book programming test and a homework assignment.
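The IDF idea described in the abstract can be illustrated with a small sketch. This is not the thesis's actual method, which the abstract does not specify. It assumes each submission has already been reduced to a set of code tiles (represented here as plain strings), uses the textbook weight log(N / df), and scores a pair of submissions by summing the weights of the tiles they share. The names `idf_weights`, `pair_scores`, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def idf_weights(submissions):
    """Return log(N / df) for every tile, where df is the number of
    submissions containing that tile (textbook IDF, an assumption here)."""
    n = len(submissions)
    df = {}
    for tiles in submissions.values():
        for tile in tiles:
            df[tile] = df.get(tile, 0) + 1
    return {tile: math.log(n / count) for tile, count in df.items()}


def pair_scores(submissions):
    """Score each pair of submissions by the summed IDF of shared tiles.

    Tiles that appear in every submission (for example, code taken from
    a shared example program) get weight 0 and contribute nothing.
    """
    weights = idf_weights(submissions)
    scores = {}
    for a, b in combinations(sorted(submissions), 2):
        shared = submissions[a] & submissions[b]
        scores[(a, b)] = sum(weights[tile] for tile in shared)
    return scores


# Hypothetical toy data: every student reused the tile "example_loop",
# while only s1 and s2 share the rarer tile "custom_sort".
example = {
    "s1": {"example_loop", "custom_sort"},
    "s2": {"example_loop", "custom_sort"},
    "s3": {"example_loop", "own_parser"},
    "s4": {"example_loop", "own_menu"},
}

for pair, score in pair_scores(example).items():
    print(pair, round(score, 3))
```

In this toy run, pairs that share only the common example tile score zero, while the pair sharing the rare tile ranks first. A non-IDF count of shared tiles would instead give every pair a nonzero score.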
