White Page Construction from Web Pages for Finding People on the Internet

Abstract


This paper proposes a method for automatically extracting proper names and their associated information from web pages for Internet/Intranet users. The information extracted from World Wide Web documents includes proper nouns, e-mail addresses, and home page URLs. Natural language processing techniques are employed to identify and classify proper nouns, which are usually unknown words. For proper nouns that appear in the anchor parts of a page, the associated information (i.e., home page URLs or e-mail addresses) can be extracted directly from the enclosing anchor tags. For proper nouns in the non-anchor part of a web page, several kinds of clues, such as the spelling method, the adjacency principle, and HTML tags, are used to relate each proper noun to its corresponding e-mail address and/or URL. Because the method relies on the semantics of both the content and the HTML tags, the extracted information is more accurate than the results obtained using traditional search engines. The results can be used to construct white pages for Internet/Intranet users or to build databases for finding people and organizations on the Internet. Such search services are very useful for human communication and the dissemination of information.
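The anchor-part case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the proper noun is simply the visible anchor text, skips the natural language processing step entirely, and uses hypothetical names (`AnchorExtractor`, `extract_anchor_pairs`) together with Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (anchor text, kind, target) triples from <a href=...> elements.

    Hypothetical helper for illustration only; the paper's own method also
    identifies and classifies the proper nouns, which is not done here.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []     # finished (text, kind, target) triples
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            if text:
                if self._href.lower().startswith("mailto:"):
                    target = self._href[len("mailto:"):]
                    self.pairs.append((text, "email", target))
                else:
                    self.pairs.append((text, "url", self._href))
            self._href = None


def extract_anchor_pairs(html):
    """Return every (text, kind, target) triple found in the HTML string."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


sample = (
    '<p><a href="http://example.org/~doe/">Jane Doe</a> '
    '(<a href="mailto:doe@example.org">Jane  Doe</a>)</p>'
)
for triple in extract_anchor_pairs(sample):
    print(triple)
```

A sketch like this covers only names that sit inside anchors. The non-anchor case in the abstract needs the additional clues it lists (spelling method, adjacency principle, HTML tags), which are not modelled here.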

Cited by


胡志祥 (2005). Searching for Chinese Example Sentences Using Meta-Search [master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611362758
Budiansyah, A. (2010). Text Trend Analysis via Significant Term A Based on Indonesia News [master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465544
Davis, D. (2014). SociRank: Ranking News Importance Based on Social Media Influence [master's thesis, National Tsing Hua University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0016-2912201413552242
