

A New Approach to Organizing Search Results for People Search

Advisor: 簡立峰
Co-advisor: 王柏堯


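When using a search engine to find information, users often have to spend considerable effort browsing an excessive number of search results. When searching by a person's name, there is the additional problem of name ambiguity: the results may include pages about different people who share the queried name. These search results can therefore be organized further to help users browse and review them. This thesis proposes a new approach to organizing the search results of people-name searches, so that similar results are clustered together and the results in each cluster all refer to the same person. We extract key phrases from the search results, filter them, and use them as one feature for clustering: they are used to compute the similarity between search results, which are then clustered accordingly. To present the clustered results, we additionally use the extracted key phrases to label each cluster. We also use classification to try to assign each cluster to one of a set of predefined categories, and use the resulting category as an additional label for the cluster. Analysis of the experimental results shows that clustering with key phrases as a feature does perform better than clustering with only arbitrary unigrams and bigrams as features, although overall performance still has room for improvement. Labeling each cluster with key phrases and categories does present users with more useful information. Finally, we suggest directions for future work on this research topic.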


Web search engines are among the most important interfaces to the Internet, which has grown into a collection of billions of web pages. Because of ambiguities in queries and documents, search engines return many irrelevant pages. When searching for a person's name, we may receive web pages about different people who share that name. In this thesis we present a new approach to organizing search results for people search, such that the search results in each cluster belong to the same person. To cluster the search results, we first extract key phrases from them and then use the key phrases as a clustering feature. To present the information in each cluster to the user, we label clusters with categories and key phrases. The output is a set of clusters of search results, labeled with names and categories, such that the search results in each cluster relate to the same person. The experimental results show that using key phrases as a clustering feature does perform better than using only n-grams (n <= 2) as features. However, the overall performance is not yet satisfactory. In the final section, we draw conclusions and propose future work.
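The pipeline described above (key phrases as features, clustering by similarity, then labeling each cluster with key phrases) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm the thesis uses, so Jaccard similarity, single-link merging with a fixed threshold, and the function names `jaccard`, `cluster_results`, and `label_cluster` are all assumptions made here for illustration. The key phrases are assumed to have been extracted already, and the sample data is invented.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of key phrases (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_results(phrase_sets, threshold=0.2):
    """Single-link clustering: merge results whose similarity >= threshold.

    The threshold value is a hypothetical choice, not taken from the thesis.
    Returns a sorted list of clusters, each a list of result indices.
    """
    parent = list(range(len(phrase_sets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrase_sets)), 2):
        if jaccard(phrase_sets[i], phrase_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(phrase_sets)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def label_cluster(members, phrase_sets, top_k=2):
    """Label a cluster with its most frequent key phrases."""
    counts = Counter(p for i in members for p in phrase_sets[i])
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Invented key-phrase sets for four search results returned for one name.
results = [
    {"machine learning", "stanford university", "professor"},
    {"professor", "machine learning", "neural networks"},
    {"jazz", "saxophone", "album"},
    {"saxophone", "jazz", "tour dates"},
]

clusters = cluster_results(results)
labels = [label_cluster(c, results) for c in clusters]
for members, label in zip(clusters, labels):
    print(members, label)
```

In this toy input the first two results share two key phrases and the last two share two others, so the sketch groups them into two clusters, one per person. The category labels obtained by classification in the thesis are not modeled here.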


