  • Degree thesis

Building Massive-Scale Citation Analysis with a Database Management System

A DBMS Based Co-citation Computation Platform

Advisor: 陳宗天

Abstract


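Using citation relationships among publications to examine the related knowledge of an academic field is a proven approach. Advances in information technology have digitized large volumes of scholarly articles, so collecting literature on a field is no longer a problem for researchers; filtering the data has instead become the important issue. An automated method for using citation analysis to reveal the structure of academic research has been built in the past, but because of hardware limits the collected data had to be reduced substantially before analysis. That reduction used only the simplest criterion, a citation-count threshold, which may exclude important information and prevent a full view of a field. A database management system can compute over and store large volumes of data quickly, so combining a database with citation analysis tools should remedy the threshold method's exclusion of important data.

Because software systems often have to be rebuilt or modified in response to changing requirements, updates to the underlying technical infrastructure, changes in government regulations, and similar causes, issues of system reconstruction have become increasingly important in software engineering. This study therefore takes "reverse engineering" as its subject of exploration. As the tool for exploring intellectual structure, we use WebLA, developed at the University of Lisbon, as the base tool. To validate it, we first explore the intellectual structure of "reverse engineering" with the existing threshold-based tool and then explore the same field with WebLA. After comparing and confirming the intellectual structure built by WebLA, we use a database management system to build a massive-scale citation analysis system, and we compare the important factors that the threshold tool filters out with those that the massive-scale system extracts.

This study uses the Citeseer electronic literature database to collect seed documents and builds the intellectual structure of "reverse engineering" within Citeseer. We also compare other solutions for computing very large matrices and show that, in computing capacity and performance, the massive-scale citation analysis system can indeed build intellectual factors that current analysis tools cannot.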
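The threshold-based reduction described above can be illustrated with a minimal Python sketch. It assumes citations are given as (citing, cited) pairs; the function name `filter_by_citation_threshold` and the toy data are hypothetical and are not taken from the thesis or its tools.

```python
from collections import Counter


def filter_by_citation_threshold(citations, threshold):
    """Split cited documents into kept and dropped sets.

    `citations` is an iterable of (citing, cited) pairs (assumed layout).
    A cited document is kept only if it is cited at least `threshold` times.
    """
    counts = Counter(cited for _, cited in citations)
    kept = {doc for doc, n in counts.items() if n >= threshold}
    dropped = set(counts) - kept
    return kept, dropped


# Hypothetical toy data: document "D" is cited only once.
example = [
    ("p1", "A"), ("p1", "B"), ("p1", "C"),
    ("p2", "A"), ("p2", "B"),
    ("p3", "B"), ("p3", "C"), ("p3", "D"),
]
kept, dropped = filter_by_citation_threshold(example, threshold=2)
```

With a threshold of 2, document "D" is removed before any co-citation analysis, no matter how relevant it is to the field. This is the kind of loss the abstract attributes to the threshold method.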

Parallel Abstract


Deriving the co-citation matrix from an adjacency matrix is an important but time-consuming step in constructing an intellectual structure. Because a high-dimensional matrix requires a large amount of memory, traditional matrix computation methods are practical only for smaller matrices. A large matrix is usually reduced by filtering out less important elements, using citation or co-citation counts as threshold values. When the dimension is large, the memory of the computation platform is easily exhausted during processing, so a different approach is needed to ease the memory demand of large-matrix computation. We took the co-citation calculation method of an open-source package and replaced its HashMap-based matrix data structure with a storage structure based on a relational DBMS. The resulting DBMS-based co-citation computation platform was successfully applied to derive a 20,000 by 20,000 co-citation matrix. We derived the intellectual structure of the research field of “Software Engineering”, compared it with the structure derived by the conventional threshold method, and discussed the differences between the two. The merit of the structure derived from the new approach is inconclusive, because the structures are difficult to compare and each reveals different research themes.
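As a rough illustration of moving co-citation counting from an in-memory matrix into a relational store, the sketch below keeps citation pairs in a table and computes co-citation counts with a self-join. This is equivalent to the off-diagonal entries of A^T A for an adjacency matrix A whose rows are citing documents and whose columns are cited documents. It uses Python's built-in `sqlite3` as a stand-in; the abstract does not say which DBMS or schema the thesis used, so the table name `cites`, its columns, and the function name are assumptions.

```python
import sqlite3


def cocitation_counts(citations):
    """Return {(a, b): n} with a < b, where n is the number of citing
    documents that cite both a and b.

    `citations` is an iterable of (citing, cited) pairs (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    # One row per citation link; the primary key removes duplicate links.
    conn.execute(
        "CREATE TABLE cites ("
        " citing TEXT NOT NULL,"
        " cited TEXT NOT NULL,"
        " PRIMARY KEY (citing, cited))"
    )
    conn.executemany("INSERT OR IGNORE INTO cites VALUES (?, ?)", citations)
    # Self-join on the citing document: every pair of rows that shares a
    # citing document contributes 1 to the co-citation count of the pair.
    rows = conn.execute(
        "SELECT x.cited, y.cited, COUNT(*)"
        " FROM cites AS x"
        " JOIN cites AS y"
        "   ON x.citing = y.citing AND x.cited < y.cited"
        " GROUP BY x.cited, y.cited"
    ).fetchall()
    conn.close()
    return {(a, b): n for a, b, n in rows}


# Hypothetical toy data: three citing papers and three cited papers.
example = [
    ("p1", "A"), ("p1", "B"), ("p1", "C"),
    ("p2", "A"), ("p2", "B"),
    ("p3", "B"), ("p3", "C"),
]
result = cocitation_counts(example)
```

Only non-zero pairs are materialized, and the database engine decides how to stage the join, so the full dense matrix never has to be held in application memory. A file-backed database in place of `":memory:"` would be the natural choice for a large matrix.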

Cited by


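曾瀛巧 (2011). A Social Network Analysis of Citation Relationships in Business and Management Disciplines [Master's thesis, National Taipei University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0023-0308201116574900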
