  • Thesis


A Study of Combining Automatic Document Classification-Example on Patent Document

Advisor: 劉士豪


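As intellectual property rights receive growing attention, patents, which reflect an enterprise's technical capability and development strategy, have an increasing influence on competition between enterprises. How to use patents effectively as a competitive weapon and gain an advantage is something no enterprise can afford to ignore. Previous research has shown that information technologies such as Automatic Document Categorization can effectively help patent engineers classify patent documents. However, there are many such techniques: different document classifiers have different characteristics and perform differently in different situations, so classification performance is unstable. Although classifiers are numerous and each has its own characteristics, there is still no consensus on which classifier suits which situation. This study therefore takes three classifiers, Naïve Bayes, KNN, and Rocchio, as its basis and combines them using the voting mechanism proposed in earlier literature and the Sampling method proposed in this study, with the aim of reducing the instability of automatic document classification on patent documents, where results are sometimes good and sometimes poor. The experiments in this study show that both the voting mechanism and the Sampling method do effectively reduce this instability. Because the two approaches differ in nature, however, they also differ in how much they improve results and in the situations they suit. When the individual classifiers perform similarly, the voting mechanism yields the most effective improvement; otherwise, the Sampling method has the more noticeable effect. Once this instability is reduced, automatic document classification becomes more practical, so that it can genuinely help patent engineers complete patent classification quickly and move on to deeper patent analysis.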


As intellectual property rights grow in importance, patents have become a source of keen competition among enterprises, and using patents to gain a competitive advantage matters more and more. Past work has shown that Automatic Document Categorization can help patent engineers classify patent documents more effectively. However, there are many kinds of document classifiers, each with its own characteristics, and each performs differently in different situations, so classification performance is unstable. There is still no firm conclusion about which classifier should be used in which situation. In this study, we combine the Naïve Bayes, KNN, and Rocchio classifiers using a voting measure and the Sampling method to address this unstable classification performance. The experimental results show that both voting and the Sampling method are effective: each makes classification performance more stable. The two approaches suit different cases, however. When the individual classifiers perform similarly, voting gives the greater improvement; otherwise, the Sampling method is a good choice. Reducing the instability of classification makes Automatic Document Categorization technology more useful for patent engineers, who then have more time for more advanced analysis.
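The abstract does not spell out the voting rule, so the following is only a minimal sketch of one common form, simple majority voting over the labels predicted by three classifiers. The function name `majority_vote`, the tie-breaking rule (the earliest classifier in the list wins among tied labels), and the example labels are all assumptions made for illustration; they are not taken from the thesis, and the Sampling method is not sketched because the abstract gives no details of it.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by simple majority.

    predictions: list of equal-length label lists, one list per classifier.
    Ties are broken in favour of the tied label proposed by the classifier
    that appears earliest in `predictions` (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_docs = len(predictions[0])
    if any(len(p) != n_docs for p in predictions):
        raise ValueError("all classifiers must label the same documents")

    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top_count = max(counts.values())
        # Labels sharing the highest vote count for this document.
        tied = {label for label, c in counts.items() if c == top_count}
        # First tied label in classifier order wins.
        combined.append(next(label for label in labels if label in tied))
    return combined


# Hypothetical predictions for four patent documents (made-up labels).
naive_bayes = ["A", "B", "C", "A"]
knn = ["A", "C", "C", "B"]
rocchio = ["B", "C", "A", "C"]

combined = majority_vote([naive_bayes, knn, rocchio])
print(combined)  # ['A', 'C', 'C', 'A']
```

In the last document all three classifiers disagree, so the result falls back to the first classifier in the list. This hints at why voting helps most when the individual classifiers are of similar quality: a vote among comparably accurate classifiers can correct isolated errors, while a much weaker classifier mostly adds noise.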




