

Automatic Keyword Extraction and Chinese-English Word Alignment based on Law Database

Advisor: 黃仁竑


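As international exchange grows more frequent and times change, comparative research on international law is receiving increasing attention, but the field currently faces two major challenges. First, legal language differs from everyday language in being more polished and concise, so obtaining legal terminology has become an important task. Second, because national circumstances and legal frameworks differ from country to country, the English translation of legal terms is also an important issue if misunderstandings in exchange are to be avoided. However, our country does not yet have a unified legal keyword database or a unified set of translations. This study therefore applies natural language processing techniques to a bilingual parallel corpus of laws provided by experts, and builds a system that automatically extracts legal keywords and their translations. For keyword extraction, the study uses a support vector machine (SVM) to classify words, and exploits the characteristics of legal texts to build feature vectors that improve the quality of the extracted terms. The study also proposes a two-stage word alignment method that, without any bilingual dictionary and based solely on the bilingual parallel corpus of laws, uses statistical methods to obtain translations of the Chinese keywords. Experimental results show that the legal keyword extraction system, which combines SVM with information from the statutes, achieves a precision above 90%. In addition, the two-stage word alignment method outperforms related methods proposed in the literature, such as the IBM Model.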


Research on comparative international law is receiving considerable attention because of the abundance of multilingual laws and legal information on the Internet. However, it faces two difficult challenges. First, legal terms are the essential element of law comparison, yet they form only a small subset of ordinary words, so extracting legal terms from the text of laws is very challenging. Second, translating legal terms from one language to another is also critical to law comparison. Unfortunately, there is in general no bilingual dictionary of legal terms, and word alignment without a bilingual dictionary is itself a very challenging problem. In this thesis, we propose an automatic Chinese legal term extraction algorithm based on Support Vector Machine (SVM) technology. Specific features are defined, based on the characteristics of legal terms selected by experts, to improve the performance of the SVM. A two-stage statistical word alignment algorithm is then proposed to translate a Chinese legal term into an English word. Experimental results show that our SVM-based Chinese legal term extraction achieves a precision of more than 90%. Furthermore, the accuracy of the proposed two-stage algorithm is significantly higher than that of algorithms proposed in the literature, such as the IBM Model.
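
The abstract does not spell out the two-stage alignment algorithm, so the following is only a minimal sketch of the general idea of dictionary-free, statistics-based word alignment over a parallel corpus. It swaps in a plain Dice co-occurrence score, which is an assumption made for illustration and not the method of the thesis. The function name `dice_align`, the toy corpus, and the pre-tokenized input format are all hypothetical.

```python
from collections import Counter


def dice_align(pairs, term):
    """Rank English words by Dice co-occurrence with a Chinese term.

    pairs: list of (chinese_tokens, english_tokens) aligned sentence pairs.
    This is an illustrative stand-in, not the thesis's two-stage method.
    """
    zh_count = 0          # sentence pairs whose Chinese side contains the term
    en_count = Counter()  # sentence pairs containing each English word
    co_count = Counter()  # sentence pairs containing both the term and the word

    for zh_tokens, en_tokens in pairs:
        en_types = set(en_tokens)  # count each word once per sentence pair
        en_count.update(en_types)
        if term in zh_tokens:
            zh_count += 1
            co_count.update(en_types)

    # Dice coefficient: 2 * |both| / (|term| + |word|)
    scores = {
        word: 2 * co_count[word] / (zh_count + en_count[word])
        for word in co_count
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus, already segmented into tokens.
corpus = [
    (["契約", "成立"], ["the", "contract", "is", "formed"]),
    (["契約", "解除"], ["the", "contract", "is", "rescinded"]),
    (["損害", "賠償"], ["compensation", "is", "paid", "for", "the", "damage"]),
]

ranking = dice_align(corpus, "契約")
best_word, best_score = ranking[0]
print(best_word, best_score)
```

On this toy corpus, "contract" ranks first because it appears in exactly the sentence pairs that contain 契約, while frequent function words such as "the" are penalized for also appearing elsewhere. A real system would need Chinese word segmentation and far more data before such scores become reliable.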


