
Detecting Online Plagiarism Using Lexical, Syntactic, and Semantic Information

Online Plagiarized Detection Through Exploiting Lexical, Syntactic, and Semantic Information

Advisor: Shou-De Lin (林守德)

Abstract


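Many traditional plagiarism detection systems focus only on the lexical statistics of a document; at most, they also consider syntactic structure or use WordNet to capture semantic information, and most of them work offline. Our system instead integrates a search engine and introduces structural features at three levels: lexical, syntactic, and semantic. From each pair of suspicious sentences it extracts the duplication rate, reordering rate, and continuity of words; the part-of-speech and phrase tags of each word in the sentence; and the latent topics labeled by Latent Dirichlet Allocation (LDA), which represent the semantic information the text may carry. The resulting six plagiarism detection models are combined by merging their predictions with a weighting method we designed, yielding an online system that automatically detects web plagiarism. Experimental results show that, for both English and Chinese documents, our system successfully detects a considerable number of likely plagiarism sources and outperforms several state-of-the-art algorithms.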
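As a rough illustration of the lexical duplication feature mentioned in the abstract, the Python sketch below computes the fraction of a suspicious sentence's n-grams that also appear in a candidate source sentence. It is not the thesis's implementation: the function names, the choice of bigrams, and whitespace tokenization are all assumptions made for the example (Chinese text would need a proper word segmenter).

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def duplication_rate(suspect, source, n=2):
    """Fraction of the suspect sentence's n-grams that also occur in the source.

    Hypothetical helper: whitespace tokenization and lowercasing are
    simplifying assumptions, not details taken from the thesis.
    """
    suspect_grams = ngrams(suspect.lower().split(), n)
    if not suspect_grams:
        return 0.0  # sentence shorter than n tokens: no n-grams to compare
    source_grams = set(ngrams(source.lower().split(), n))
    hits = sum(1 for gram in suspect_grams if gram in source_grams)
    return hits / len(suspect_grams)


if __name__ == "__main__":
    print(duplication_rate("the cat sat on the mat", "the cat sat on a rug"))
```

A score near 1 means most of the suspicious sentence is copied verbatim; reordered or paraphrased text lowers it, which is why the abstract pairs this kind of feature with reordering, syntactic, and semantic ones.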

Keywords

Plagiarism Detection, Lexical, Syntactic, Semantic

Parallel Abstract


In this paper, we introduce a framework that identifies sentence-level and document-level online plagiarism by exploiting lexical, syntactic, and semantic features, including duplicated n-grams, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We also enhance plagiarism detection by establishing an ensemble framework that combines the prediction scores of the individual models. Experiments on English and Chinese corpora demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms.
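The ensemble step can be pictured with a small Python sketch. The abstract does not specify how the prediction scores are weighted, so the normalized weighted average below, together with the model names and weight values, is purely an assumption chosen for illustration.

```python
def combine_scores(scores, weights):
    """Combine per-model plagiarism scores into one score in [0, 1].

    Assumption: a normalized weighted average. The thesis designs its own
    weighting method, which the abstract does not describe.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one model needs a non-zero weight")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores for one suspicious sentence pair.
    scores = {"lexical": 0.8, "syntactic": 0.5, "semantic": 0.2}
    # Hypothetical weights; in practice they would be tuned on labeled data.
    weights = {"lexical": 2.0, "syntactic": 1.0, "semantic": 1.0}
    print(combine_scores(scores, weights))
```

Normalizing by the total weight keeps the combined score on the same scale as the individual scores, so a single decision threshold can be applied regardless of how many models contribute.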

Parallel Keywords

Plagiarism Detection Lexical Syntactic Semantic

