Information Extraction and Classification for Person Search

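This thesis proposes a Web-based method for automatically collecting the career history and professional domain of people with Chinese names. Extracting a person's career information and classifying his or her professional domain helps resolve personal name disambiguation, and the domain classification also lets personal information be presented to users in a systematic and consistent way. In the training phase, we use linguistic knowledge and statistical techniques to collect surface patterns of career information from the Web; these patterns serve as the basis for gathering Web pages about a name and for extracting personal information from them. We also apply the bootstrapping method of Yarowsky (1995) to train a document classifier from Web resources. At runtime, surface patterns guide the collection of career information for an input name, and the career information and domain classification are then used to separate information about different people who share the same name. We also describe a system implementation of the method. Experimental results show that the method effectively extracts career information for a name and distinguishes same-name individuals working in different domains, making the collection of personal information from the Web more effective.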
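As a rough illustration of how surface patterns can drive the extraction of career information, the sketch below matches a queried name against a few pattern templates. The three templates, the function name `extract_experience`, and the sample sentence are assumptions made for this example only; the thesis learns its patterns from Web data using linguistic and statistical cues, which this sketch does not attempt to reproduce.

```python
import re

# Hypothetical, hand-written surface patterns (not the learned patterns of
# the thesis). "{name}" marks the slot for the queried personal name, and the
# named group "exp" captures the phrase up to the next punctuation mark.
PATTERNS = [
    "{name}現任(?P<exp>[^，。；]+)",    # "<name> currently serves as ..."
    "{name}曾任(?P<exp>[^，。；]+)",    # "<name> formerly served as ..."
    "{name}畢業於(?P<exp>[^，。；]+)",  # "<name> graduated from ..."
]


def extract_experience(name, text):
    """Return the phrases that follow `name` in any of the patterns."""
    found = []
    for template in PATTERNS:
        regex = re.compile(template.format(name=re.escape(name)))
        found.extend(match.group("exp") for match in regex.finditer(text))
    return found


if __name__ == "__main__":
    sample = "王小明現任台灣大學教授，王小明畢業於清華大學。"
    print(extract_experience("王小明", sample))
```

Results are grouped by pattern rather than by position in the text, which is enough for a sketch; a fuller system would presumably also score patterns by reliability.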

Keywords

person name retrieval; information extraction; text categorization

Parallel Abstract

We introduce a method for automatically collecting a person's personal information and professional domain. In our approach, personal information is extracted and the domain is identified from Web data in order to disambiguate personal names. In the training phase, the method generates surface patterns for personal information extraction from linguistic and statistical information on the Web, and uses an unsupervised algorithm to build a Web-based text categorizer. At runtime, a person name is submitted to a search engine, personal information is extracted, and the domain of each retrieved passage is identified with respect to the queried name; finally, the referents are sorted by domain, personal information, and degree of popularity. We also describe an implementation of the proposed method. Blind evaluation on a set of names shows that the method extracts personal information effectively and cleanly classifies individuals by domain. The method can help users quickly find information about a person by displaying personal information in a systematic and consistent way.
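The unsupervised categorization step can be pictured as a seed-driven self-training loop in the spirit of Yarowsky (1995). The following is a minimal sketch under stated assumptions: the function name `bootstrap_classify`, the seed-word dictionary, the `min_margin` threshold, and the rule of keeping only words seen in a single domain are all illustrative choices, not details taken from the thesis.

```python
from collections import Counter, defaultdict


def bootstrap_classify(docs, seeds, rounds=5, min_margin=1):
    """Seed-driven self-training over unlabeled passages.

    docs:  list of token lists (unlabeled passages)
    seeds: dict mapping a domain label to a set of seed words
    Returns {passage index: domain label} for the passages that got labeled.
    """
    rules = {label: set(words) for label, words in seeds.items()}
    labels = {}
    for _ in range(rounds):
        # Step 1: label each passage whose best domain beats the runner-up
        # by at least `min_margin` matching words.
        new_labels = {}
        for i, tokens in enumerate(docs):
            scores = Counter({label: sum(t in words for t in tokens)
                              for label, words in rules.items()})
            ranked = scores.most_common(2)
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            if best_score - runner_up >= min_margin:
                new_labels[i] = best
        if new_labels == labels:
            break  # converged: no label changed in this round
        labels = new_labels
        # Step 2: extend each domain's word list with words that occur
        # only in passages currently labeled with that domain.
        seen = defaultdict(set)
        for i, label in labels.items():
            for t in docs[i]:
                seen[t].add(label)
        for t, domains in seen.items():
            if len(domains) == 1:
                rules[next(iter(domains))].add(t)
    return labels


if __name__ == "__main__":
    passages = [
        ["professor", "university", "research"],
        ["legislator", "election", "party"],
        ["university", "paper", "research"],
        ["election", "candidate", "party"],
    ]
    seed_words = {"academia": {"professor"}, "politics": {"legislator"}}
    print(bootstrap_classify(passages, seed_words))
```

In this toy run the two passages without seed words are labeled only in the second round, after words from the seeded passages have been added to each domain's list; that gradual spread from a few seeds is the core idea of the bootstrapping approach.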

Parallel Keywords

person search; information extraction; text categorization

References

Al-Kamha, R. and Embley, D. W. Grouping Search-Engine Returned Citations for Person-Name Queries. In WIDM'04, pp. 96-103, Washington, DC, USA, 2004.

Bekkerman, R. and McCallum, A. Disambiguating Web Appearances of People in a Social Network. In Proceedings of the 14th International World Wide Web Conference (WWW 2005), ACM Press, pp. 463-470, Chiba, Japan, 2005.

Bollegala, D., Matsuo, Y., and Ishizuka, M. Extracting Key Phrases to Disambiguate Personal Names on the Web. In Proceedings of CICLing, 2006.

Manning, C. D. and Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999, pp. 232, 249-252, 494, 575.

Yarowsky, D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189-196, 1995.

Cited by

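黃揚耀 (2009). Information Analysis of Ancient Traditional Chinese Medicine Literature: A Method for Extracting Paragraph Attributes with Keywords [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215455521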

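高誌謙 (2009). A Study of Keyword Strings for Attribute Fields in the Discourse of Chinese Materia Medica Classics [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916272048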
