
Identifying Scientific Names and Aliases with Normalized Google Distance: The Case of Chemical Substances

Identifying Alias of Chemical Material based on Normalized Google Distance

Advisor: 許秉瑜

Abstract


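Chemical substance names are complex and varied, and they are hard to describe adequately with a few keywords. Most general users lack specialized chemical knowledge, so when they meet an unfamiliar chemical name they usually turn to search engines or online chemical dictionaries, where a large amount of information is easy to obtain. However, most chemical names used in academia are translated from English, and a single substance often has many different aliases, which causes problems for information retrieval. Recent research has proposed the NGD algorithm, which uses the number of search results returned in real time by the Google search engine to compute an abstract distance between two terms and thereby judge how semantically related they are. This study therefore proposes two methods for measuring how closely the scientific name of a chemical substance is related to an alias. The "simple method" computes the NGD between the scientific name and the alias. The "category-affixed method" appends the substance's category name to the scientific name and then computes the NGD against the alias. The average distance of the correct answers is computed under each method, and the two methods are compared on that basis. The experimental results show that the category-affixed method, by appending the category name to the scientific name, obtains more accurate result counts from the Google search engine, so the average distance of the correct answers is shorter.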
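The two methods described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it uses the standard NGD formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f is a result count and N is the assumed number of indexed pages. The function names, the `hits` callable, the way the combined query is built (terms joined by a space), and every count in `TOY_COUNTS` are assumptions made for illustration only; no real search-engine API is called.

```python
import math
from typing import Callable


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance computed from result counts.

    fx, fy: result counts for each term alone
    fxy:    result count for both terms together
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No (co-)occurrence at all: treat the terms as unrelated.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


def simple_method(name: str, alias: str,
                  hits: Callable[[str], float], n: float) -> float:
    """Simple method: NGD between the scientific name and the alias."""
    # Assumption: the joint query is the two terms joined by a space.
    return ngd(hits(name), hits(alias), hits(f"{name} {alias}"), n)


def category_affixed_method(name: str, category: str, alias: str,
                            hits: Callable[[str], float], n: float) -> float:
    """Category-affixed method: append the category name to the
    scientific name, then compute NGD against the alias."""
    query = f"{name} {category}"
    return ngd(hits(query), hits(alias), hits(f"{query} {alias}"), n)


# Invented counts for placeholder terms; these are NOT real search results
# and say nothing about which method performs better in practice.
TOY_COUNTS = {
    "NAME": 1000,
    "ALIAS": 100,
    "NAME ALIAS": 10,
    "NAME CATEGORY": 100,
    "NAME CATEGORY ALIAS": 10,
}
TOY_N = 1_000_000


def toy_hits(query: str) -> float:
    """Hypothetical stand-in for a search-engine result count lookup."""
    return TOY_COUNTS.get(query, 0)


d_simple = simple_method("NAME", "ALIAS", toy_hits, TOY_N)
d_category = category_affixed_method("NAME", "CATEGORY", "ALIAS", toy_hits, TOY_N)
```

Passing the count lookup in as a callable keeps the distance computation separate from however the counts are actually obtained, so the same functions work with cached counts or any other source.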

Keywords

NGD Text mining

Parallel Abstract


Because the names of chemical materials can be very complex and lay people mostly lack relevant chemical expertise, they usually look for related information through search engines or online chemical dictionaries. However, the chemical names used in academia are usually translated from English, and the same chemical often has many different aliases. This English-to-Chinese translation creates many problems when querying information about chemicals. Recent studies have proposed using NGD to determine the semantic relevance between two words. This study therefore proposes two NGD-based methods for finding aliases: the simple method and the category-affixed method. The experimental results show that the latter method yields better results.

Parallel Keywords

NGD Text mining

