
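Applying Normalized Co-citation Relationships to Improve Theme Identification in Intellectual Structure Construction: Validation by Within-Factor Textual Coherence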

The Application of Normalized Co-citation Metrics for Better Intellectual Structure Derivation – An Experimental Study using Multiple Document Corpuses

Advisor: 陳宗天

Abstract


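The intellectual structure construction process makes it possible to understand the research areas and directions within a given topic. It builds a co-citation matrix from co-citation counts, derives the relatedness between documents from that matrix, and then applies factor analysis to group the documents into a number of factors, which scholars can use as a reference when exploring a research field and its content. However, because documents are published at different times, newer documents have lower citation and co-citation counts, so the factor analysis results are biased toward older documents. This study therefore applies a normalized co-citation matrix in the intellectual structure construction system and compares it with the raw, unprocessed co-citation matrix. The factors derived from the normalized matrix are matched against the factors derived from the raw matrix by a cluster similarity measure, the documents of each matched factor are collected, and the two are then compared in a textual coherence test.

Using an intellectual structure construction system developed in-house, this study collected data on two research topics from Microsoft Academic. After threshold filtering, 20 factors were obtained from the raw matrix for each topic. To compare factors for coherence, the two sets of factor clusters produced from the co-citation matrix and the normalized co-citation matrix were first compared with the Jaccard similarity measure to produce a Jaccard matrix, and factors with high similarity were paired. Feature selection was then used to obtain the feature terms of all documents in each factor, the feature terms appearing in both factors of a pair were extracted, and the Jensen-Shannon divergence was used to compute the distance between documents as the metric for comparing the textual coherence of the two. Judging from both the textual coherence comparison and a reading of the document sets produced by factor analysis, the normalized co-citation matrix outperformed the raw co-citation matrix in every case.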
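The factor matching and coherence measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the toy factors, and the rule of pairing each raw factor with its highest-scoring normalized factor are all hypothetical, since the abstract only says that factors with high similarity were paired. The Jensen-Shannon divergence is computed with base-2 logarithms, so it ranges from 0 (identical term distributions) to 1 (no shared terms).

```python
from math import log2


def jaccard(a, b):
    """Jaccard similarity between two sets of document identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(raw_factors, norm_factors):
    """Rows are raw-matrix factors, columns are normalized-matrix factors."""
    return [[jaccard(r, n) for n in norm_factors] for r in raw_factors]


def best_matches(matrix):
    """Assumed pairing rule: each row takes the column with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


def term_distribution(counts, vocab):
    """Turn term counts into probabilities over a fixed shared vocabulary."""
    total = sum(counts.get(t, 0) for t in vocab)
    if total == 0:
        raise ValueError("no term from the shared vocabulary occurs in counts")
    return [counts.get(t, 0) / total for t in vocab]


def _kl(p, m):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    return sum(pi * log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Hypothetical toy factors: sets of document identifiers.
raw_factors = [{"d1", "d2", "d3"}, {"d4", "d5"}]
norm_factors = [{"d4", "d5", "d6"}, {"d1", "d2"}]
matrix = jaccard_matrix(raw_factors, norm_factors)
pairs = best_matches(matrix)

# Hypothetical term counts for two documents over a shared vocabulary.
vocab = ["citation", "factor", "matrix"]
doc_a = term_distribution({"citation": 3, "factor": 1}, vocab)
doc_b = term_distribution({"citation": 1, "matrix": 3}, vocab)
distance = js_divergence(doc_a, doc_b)
```

A lower average divergence among the documents of a factor would indicate higher textual coherence under this sketch.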

Parallel Abstract


Intellectual structure construction is a method developed by information scientists that helps scholars identify research themes in the document corpus of a research field. The method takes the co-citation relationships between objects, such as documents and authors, as the substrate on which an intellectual structure is built, and its foundation is co-citation analysis. Co-citation is an induced relationship derived from the act of citation: two articles are co-cited if another article cites both of them. Co-citation analysis relates bibliographic data by co-citation strength, which is usually represented by the raw co-citation count between documents. The co-citation relationships are abstracted into a co-citation matrix, which is fed into a factor analysis procedure that groups sets of co-cited documents into research themes (factors). The main problem with using raw co-citation counts as the relatedness measure is that they tend to emphasize older literature, which has accumulated more citations and therefore higher co-citation counts. Several normalized co-citation metrics have been proposed to remedy this bias. To check whether a normalized metric performs better than the raw count, we constructed intellectual structures from two document corpora using raw co-citation counts and compared them with the corresponding structures built from a normalized co-citation measure (CC-Cosine). The intellectual structures derived from the normalized co-citation metric appear to be superior in two respects: the factors contain more recent literature, and they show higher textual coherence.
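To make the contrast between raw and normalized co-citation concrete, here is a small sketch. It assumes that CC-Cosine is the Salton-style cosine, that is, the co-citation count of a pair divided by the square root of the product of the two documents' citation counts; the abstract does not give the formula, so this definition, the function names, and the toy data are all assumptions.

```python
from itertools import combinations
from math import sqrt


def cocitation_counts(reference_lists):
    """Count citations per document and co-citations per document pair.

    Each element of reference_lists is the reference list of one citing paper.
    """
    cited = {}
    cocited = {}
    for refs in reference_lists:
        unique = sorted(set(refs))  # ignore duplicate references in one paper
        for doc in unique:
            cited[doc] = cited.get(doc, 0) + 1
        for a, b in combinations(unique, 2):
            cocited[(a, b)] = cocited.get((a, b), 0) + 1
    return cited, cocited


def cosine_normalize(cited, cocited):
    """Assumed CC-Cosine: co-citation count / sqrt(citations_a * citations_b)."""
    return {
        (a, b): n / sqrt(cited[a] * cited[b])
        for (a, b), n in cocited.items()
    }


# Hypothetical toy corpus: A is heavily cited, D is cited only twice.
papers = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["C", "D"],
    ["C", "D"],
]
cited, raw = cocitation_counts(papers)
norm = cosine_normalize(cited, raw)
```

In this toy corpus the pairs (A, C) and (C, D) have the same raw co-citation count, but the normalized value is higher for (C, D) because D has fewer citations overall. This is the kind of correction that keeps less-cited, typically newer, documents from being overshadowed.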

