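Intellectual structure construction methods let researchers understand the knowledge structure of topics in an academic field. Factor analysis divides a field's topic into several subtopics, and the Pearson correlation coefficient matrix it produces can be drawn as an intellectual structure diagram, so that researchers can explore the knowledge structure visually. However, factor analysis assumes that the topic data are normally distributed and therefore ignores cases where the data do not follow a normal distribution. In addition, when factor analysis divides a topic into subtopics, it uses a factor loading threshold to filter articles and assign them to subtopics. This can discard important information, and an article whose factor loadings are too low cannot be assigned to any subtopic. Finally, when the Pearson correlation coefficient matrix is used as the input to the intellectual structure diagram, its values range from -1 to 1, so negative correlation coefficients must somehow be converted into positive distances. This study therefore replaces factor analysis with the Expectation Maximization (EM) algorithm to address these problems in the intellectual structure construction method in common use today.

Using a self-developed intellectual structure system, this study collected data on two topics from the Microsoft Academic database. Factor analysis was used to produce rotated factor clusters, and the co-citation matrix and factor loading matrix generated during the intellectual structure construction process were extracted as inputs to the EM algorithm, producing co-citation EM clusters and rotated EM clusters. To evaluate the clustering results more rigorously, this study also applied text mining, using the cosine similarity matrix of the article texts as EM input to produce cosine EM clusters. The clustering results were then evaluated with cluster agreement measures (Jaccard Similarity, Rand Index, Kappa, and Gwet's agreement coefficient) and with a textual coherence measure based on Jensen-Shannon Divergence (JSD), while the intellectual structure diagrams of the clusters were evaluated with an intellectual network agreement measure based on the Quadratic Assignment Procedure (QAP). The results show that the co-citation EM clusters, obtained by applying the EM algorithm to the co-citation matrix, have higher textual coherence, and that the intellectual network agreement between the co-citation clusters and the factor clusters is the highest. The intellectual structure diagram of the co-citation EM clusters therefore gives the best results.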
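The cluster agreement and coherence measures named above can be illustrated with a minimal pure-Python sketch. It assumes the pair-counting definitions of the Rand Index and Jaccard similarity, where every pair of documents is checked for whether both clusterings place it in the same cluster, and the base-2 Jensen-Shannon divergence between two discrete word distributions. The abstract does not state which variants the study uses, and Kappa, Gwet's coefficient, and QAP are not shown. The function names are hypothetical.

```python
from itertools import combinations
from math import log2


def pair_counts(labels_a, labels_b):
    """Count document pairs by whether each clustering puts them together."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same documents")
    same_both = same_a_only = same_b_only = diff_both = 0
    for i, j in combinations(range(len(labels_a)), 2):
        in_a = labels_a[i] == labels_a[j]
        in_b = labels_b[i] == labels_b[j]
        if in_a and in_b:
            same_both += 1
        elif in_a:
            same_a_only += 1
        elif in_b:
            same_b_only += 1
        else:
            diff_both += 1
    return same_both, same_a_only, same_b_only, diff_both


def rand_index(labels_a, labels_b):
    """Share of document pairs on which the two clusterings agree."""
    a, b, c, d = pair_counts(labels_a, labels_b)
    return (a + d) / (a + b + c + d)


def jaccard_index(labels_a, labels_b):
    """Pairs clustered together by both, over pairs clustered together by either."""
    a, b, c, _ = pair_counts(labels_a, labels_b)
    return a / (a + b + c) if (a + b + c) else 1.0


def jensen_shannon_divergence(p, q):
    """JSD in bits between two discrete distributions of equal length."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


factor_clusters = [0, 0, 1, 1, 2]  # toy labels, not data from the study
em_clusters = [0, 0, 1, 1, 1]
print(rand_index(factor_clusters, em_clusters))     # 0.8
print(jaccard_index(factor_clusters, em_clusters))  # 0.5
```

The pair-counting form compares clusterings without first matching cluster IDs, which suits comparisons between factor clusters and EM clusters whose numbering is arbitrary.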
Intellectual Structure (IS) analysis is a method that has been widely applied in knowledge domain analysis and science mapping. IS methods often apply factor analysis to reduce the dimensionality of the data by ascribing multiple documents to single factors, each of which generally represents one research theme. The Pearson correlation coefficient matrix derived from the factor analysis is used to construct the intellectual structure diagram that supports knowledge domain visualization. However, factor analysis assumes that the input data are normally distributed, a premise that usually goes untested. In addition, an article whose factor loadings are too low is not assigned to any factor. To address the untested normality premise and the unassigned articles, we replace factor analysis with Expectation Maximization (EM), carry out several experiments, and compare the results. The experiments provide empirical evidence that the EM-based intellectual structure method generates more coherent document clusters than the conventional one. These results suggest that the intellectual structure methodology can be improved by using Expectation Maximization in place of the conventionally used factor analysis.
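The core substitution described above, assigning every article to a cluster through EM posterior probabilities instead of a factor loading cutoff, can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the study's implementation: each input row stands for one article's co-citation profile, the mixture components are Gaussians with diagonal covariance, the means are initialized with a deterministic farthest-point rule, and EM runs for a fixed number of iterations. The function name `em_gaussian_mixture` and its parameters are hypothetical.

```python
import math


def em_gaussian_mixture(rows, k, iterations=100, min_var=1e-3):
    """Fit a k-component diagonal Gaussian mixture to `rows` with EM.

    Hypothetical helper. Returns (labels, responsibilities); every row
    receives a label, unlike a factor loading cutoff.
    """
    n, dim = len(rows), len(rows[0])
    # Farthest-point initialization: start at rows[0], then repeatedly add
    # the row with the largest squared distance to its nearest chosen mean.
    means = [list(rows[0])]
    while len(means) < k:
        farthest = max(rows, key=lambda r: min(
            sum((a - b) ** 2 for a, b in zip(r, m)) for m in means))
        means.append(list(farthest))
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[0.0] * k for _ in range(n)]

    for _ in range(iterations):
        # E-step: posterior probability of each component for each row.
        for i, x in enumerate(rows):
            log_p = []
            for c in range(k):
                lp = math.log(weights[c])
                for v, mu, var in zip(x, means[c], variances[c]):
                    lp -= 0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)
                log_p.append(lp)
            top = max(log_p)  # subtract the max before exp to avoid underflow
            probs = [math.exp(lp - top) for lp in log_p]
            total = sum(probs)
            resp[i] = [p / total for p in probs]

        # M-step: re-estimate weights, means, and variances.
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n))
            weights[c] = max(nc / n, 1e-12)  # floor keeps log() defined
            if nc < 1e-12:  # empty component: keep its old mean and variance
                continue
            means[c] = [sum(resp[i][c] * rows[i][d] for i in range(n)) / nc
                        for d in range(dim)]
            variances[c] = [
                max(min_var, sum(resp[i][c] * (rows[i][d] - means[c][d]) ** 2
                                 for i in range(n)) / nc)
                for d in range(dim)
            ]

    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, resp


# Toy co-citation profiles with two obvious groups (not data from the study).
rows = [[5, 4, 0, 0], [4, 5, 1, 0], [5, 5, 0, 1],
        [0, 1, 5, 4], [0, 0, 4, 5], [1, 0, 5, 5]]
labels, resp = em_gaussian_mixture(rows, k=2)
print(labels)
```

Because each row receives a full posterior distribution over components, the arg-max label always exists, which is the property contrasted above with low-loading, unassigned articles. A fuller version would add a log-likelihood convergence check and a way to choose k.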