A Study of Technology Clustering Using Patent Classification Codes and the Classification Hierarchy

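Every patent classification code in a patent document denotes a well-defined technical field. This study takes U.S. patent documents as its research subject and uses U.S. patent classification (USPC) codes as the representative features of each document, grouping patents with high technical similarity to build patent technology clusters. Because each U.S. patent document carries one or more USPC codes, a patent may reflect several technical viewpoints and can be represented as a sequence of feature values. The USPC system consists of multiple hierarchical classification schedules, each of which represents a collection belonging to one major technical field. Technology clustering rests on using the classification-code feature sequences of the patent documents to compute the technical similarity between them. The similarity can be computed solely from the conceptual distance between classification codes within the same schedule, or it can additionally take into account a second factor: the degree to which classification codes under different schedules co-occur in the same patent documents. The clustering itself follows a two-stage procedure: the first stage uses hierarchical clustering to obtain an appropriate number of clusters, and the second stage uses non-hierarchical clustering to determine which patents belong to each cluster. Finally, case analyses examine whether the different factors considered in computing patent similarity lead to different clustering results for patent sets whose technical features differ in composition.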
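The first similarity factor described above, conceptual distance within one classification hierarchy, can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so everything here is an assumption made for illustration: each code is written as a hypothetical root-to-node path (the schedule first, then each ancestor subclass), distance is the number of tree edges between two codes, similarity is `1 / (1 + distance)`, and codes in different schedules score zero. The function names and the example codes are invented.

```python
def concept_distance(a, b):
    """Tree edges between two codes given as root-to-node tuples.

    Returns None when the codes share no root, i.e. they sit in
    different schedules (assumed encoding, not the thesis's own).
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    if common == 0:
        return None
    return (len(a) - common) + (len(b) - common)


def code_similarity(a, b):
    """Assumed mapping from distance to similarity: 1 / (1 + distance)."""
    d = concept_distance(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def patent_similarity(p, q):
    """Symmetric best-match average over two non-empty feature lists."""
    def directed(src, dst):
        return sum(max(code_similarity(a, b) for b in dst) for a in src) / len(src)
    return 0.5 * (directed(p, q) + directed(q, p))


# Hypothetical patents: each is a list of code paths.
patent_a = [("705", "1"), ("706", "2")]
patent_b = [("705", "1", "14")]
print(patent_similarity(patent_a, patent_b))
```

The second factor in the abstract, co-occurrence of codes from different schedules, would replace the zero returned for cross-schedule pairs; it is left out here because the abstract does not say how it is weighted.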

Keywords

patent clustering; patent classification code; U.S. patent classification schedule; patent similarity; technology cluster

Parallel Abstract

Each classification code in a patent document represents a definite technical domain. This thesis takes U.S. patents as its research source and selects the U.S. patent classification (USPC) code as the feature that represents a patent. Based on these features, technology clusters are formed from highly similar patents. Because a U.S. patent carries one or more USPC codes, it may have one or several technical viewpoints and can be represented by a feature list. The U.S. classification system is composed of many USPC schedules, each of which represents a set of technical domains. The technology clusters are formed from the similarities between the patents' feature lists, and these similarities can be measured in two ways: one considers only the conceptual distance between two USPC codes under the same USPC schedule, while the other additionally considers the pairwise co-occurrence rate of two USPC codes under different USPC schedules. This thesis uses a two-stage clustering algorithm to obtain the technology clusters: the first stage uses a hierarchical clustering algorithm to determine the number of clusters, and the second stage uses a non-hierarchical clustering algorithm to determine the members of each cluster. Finally, case studies examine whether different patent similarity measures result in different technology clusters.
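The two-stage procedure can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the abstract names neither the linkage rule, the rule for choosing the number of clusters, nor the non-hierarchical algorithm. The sketch assumes average linkage, picks the cluster count just before the largest jump in merge height, and uses a simple k-medoids pass as the non-hierarchical stage because it works directly on a pairwise distance matrix (for example `1 - similarity`). All function names are hypothetical.

```python
def agglomerate(dist):
    """Stage 1: average-linkage agglomerative clustering.

    Returns the partition at every cluster count and the merge heights.
    """
    clusters = [[i] for i in range(len(dist))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions, heights


def choose_k(heights):
    """Assumed rule: stop merging just before the largest jump in height."""
    n = len(heights) + 1
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    if not gaps:
        return 1
    t = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (t + 1)


def medoid(members, dist):
    """Member with the smallest total distance to the rest of its group."""
    return min(members, key=lambda m: sum(dist[m][o] for o in members))


def k_medoids(dist, seeds, max_iter=100):
    """Stage 2: non-hierarchical refinement, seeded by the stage-1 partition."""
    medoids = [medoid(c, dist) for c in seeds]
    groups = [c[:] for c in seeds]
    for _ in range(max_iter):
        groups = [[] for _ in medoids]
        for p in range(len(dist)):
            nearest = min(range(len(medoids)), key=lambda c: dist[p][medoids[c]])
            groups[nearest].append(p)
        updated = [medoid(g, dist) if g else m for g, m in zip(groups, medoids)]
        if updated == medoids:
            break
        medoids = updated
    return groups


def two_stage_cluster(dist):
    partitions, heights = agglomerate(dist)
    k = choose_k(heights)
    return k_medoids(dist, partitions[k])


# Toy data: six items on a line, standing in for pairwise patent distances.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
print(two_stage_cluster(toy_dist))
```

Seeding the second stage with the first-stage partition is one reasonable way to connect the two stages; random seeding with the chosen cluster count would be an equally valid reading of the abstract.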

Parallel Keywords

patent cluster; patent classification; USPC schedule; patent similarity; technology clustering

