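Testing the Congruity between Link-Derived Factors and Content-Derived Clusters: An Empirical Study Validated on Multiple Document Collections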

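Intellectual structure construction is one of the methods researchers use to understand the structure of a particular academic field. Earlier studies applying this method have used the co-citation relationship as the association between documents. Co-citation is a citation relationship derived from two documents being cited together by other documents, and it serves as the substrate on which the intellectual structure is built. Factor analysis is then applied to the co-citation network, dividing the dataset into a number of factors. In theory, the documents ascribed to the same factor should be similar in content. This study therefore aims to provide empirical evidence of the congruity between the document sets ascribed to the same factor, which are produced by the link-based approach, and the document clusters produced by the content-based approach. We first used a self-developed intellectual structure system to collect and analyze two datasets from Microsoft Academic Search, obtaining 21 and 20 factors, respectively. To compare the factor-based clusters with the content-based clusters, all documents in the intellectual structure were pooled into a single collection. The TF-IDF values of each document's terms, reduced to their base forms, were used as the weights of the document vector, cosine similarity was computed from these vectors, and the documents were re-clustered accordingly. The resulting clusters were then matched against the original factor clusters so that the congruity between the two could be verified. Kappa analysis was used to test the congruity between the link-based factor document clusters and the content-based document clusters. The Kappa values indicate fair to moderate agreement between the link-derived factors and the content-derived clusters.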
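As a rough illustration of the content-based step described above, the sketch below weights each document's terms by TF-IDF and compares documents by cosine similarity. It is not the system used in this study: the tokenized toy documents, the function names tfidf_vectors and cosine, and the particular TF-IDF variant (relative term frequency times log(N/df)) are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a vector with no weighted terms is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical, already tokenized and base-form documents.
toy_docs = [
    ["citation", "network", "factor", "analysis"],
    ["citation", "network", "cluster"],
    ["text", "mining", "cluster", "term"],
]
toy_vectors = tfidf_vectors(toy_docs)
sim_01 = cosine(toy_vectors[0], toy_vectors[1])
sim_02 = cosine(toy_vectors[0], toy_vectors[2])
```

In this toy collection the first two documents share two terms and the first and third share none, so sim_01 is positive and sim_02 is zero. A similarity matrix built this way could then be passed to any clustering procedure; the abstract does not say which one was used.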

Keywords

Intellectual structure; factor analysis; text mining; congruity test

Parallel Abstract

Intellectual Structure (IS) is a method developed by information scientists for science mapping. The method uses the co-citation relationship, which is derived from the direct citations between documents, as the substrate on which the intellectual structure is built. The intellectual structure consists of clusters of documents ascribed to factors, which are derived by applying factor analysis to the co-citation network. Documents ascribed to a factor are generally related to a common research theme. As such, the contents of documents ascribed to a factor are theorized to be similar to each other. This study seeks to provide evidence that link-based relatedness implies content-based similarity. We used the home-grown Intellectual Structurer to analyze two datasets retrieved from Microsoft Academic Search. Twenty-one and twenty factors were derived from these two datasets, respectively. The documents ascribed to a factor are referred to as a factor-based document cluster, against which the content-based document clusters are compared. All documents in the intellectual structure were re-clustered based on their content similarity, which is the cosine of their document vectors encoded with TF-IDF weighted terms. The factor-based document clusters were then checked against the content-based clusters for congruity. We used the Kappa coefficient to measure the congruity between the link-derived factor document clusters and the content-based clusters. The Kappa coefficient indicates fair to moderate agreement between the factor-ascribed document clusters and the content-derived document clusters.
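To make the agreement measure concrete, the following is a minimal sketch of Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of documents given the same label by both groupings and p_e is the proportion expected by chance. It assumes that each content-based cluster has already been matched to a factor, so that both groupings share one label set. The function name cohen_kappa and the toy labels are hypothetical and do not come from the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two label assignments over the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of documents with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both groupings put every document under one label
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten documents: the factor each one is ascribed to,
# and the matched content-based cluster it falls into.
factor_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
cluster_labels = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
kappa = cohen_kappa(factor_labels, cluster_labels)
```

On the commonly used Landis and Koch scale, values from 0.21 to 0.40 are read as fair agreement and values from 0.41 to 0.60 as moderate agreement, which is the range this abstract reports.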

Parallel Keywords

Intellectual Structure; Factor Analysis; Text Mining; Kappa Coefficient

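Cited By

王彥叡 (2014). A clustering method for constructing hierarchical concept clusters using latent semantic analysis [Master's thesis, National Taipei University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0023-2811201414225026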
