An Expert Finding System Based on Multi-Word Term Extension and Hybrid Candidate Selection

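This thesis addresses the expert finding problem and proposes two models to solve it: a query extension model and a hybrid candidate selection model. Both models are document-based and search for experts by extending the query terms and incorporating profile data. To compare the influence of each extension term on the system, two methods are designed for each model: one gives the extension terms equal weights, and the other gives them different weights. Chapter 3 explains that the expert finding system built on the query extension model operates in three phases. In the first phase, the system builds the dataset from which extension terms are drawn. This requires sentence segmentation and part-of-speech analysis of the original dataset, after which the C-value method is used to find extension terms related to the user's query string. In the second phase, the system computes the association between the relevant extension terms and the relevant documents with a vector dot product, and ranks the relevant documents by their relevance to the user's query terms. The third phase matches relevant documents with candidates: it compares the ranking value of each relevant document with the quantified profile value of each candidate in the original dataset to obtain the recommended experts. The expert finding system built on the hybrid candidate selection model operates in four phases. The first three are the same as in the query extension model. The fourth phase additionally uses each candidate's profile data found in the relevant documents and computes a hybrid score from documents and profiles to rank the final recommended candidates. According to the P@n, R@n, MRR, and MAP results in Chapters 4 and 5, the precision of the hybrid candidate selection model is roughly 3% or more higher than that of the query extension model. Differently weighted extension terms have little effect on the precision of the query extension model but a positive effect on the hybrid candidate selection model. Overall, the hybrid candidate selection method with weighted extension terms achieves higher precision among the four methods and, in most cases, maintains similar precision as the number of candidates increases. The conclusion in Chapter 6 summarizes seven key points and proposes five directions for future research.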
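The second phase described above (ranking relevant documents by a dot product with the extension terms) can be illustrated with a minimal sketch. The function name `rank_documents`, the sparse dictionary representation, and all sample terms and weights are assumptions made for illustration; the abstract does not specify the thesis's actual vector construction or weighting scheme.

```python
from typing import Dict, List, Tuple


def rank_documents(
    term_weights: Dict[str, float],
    doc_vectors: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Score each document by the dot product of its term vector
    with the (possibly weighted) extension-term vector."""
    scores = []
    for doc_id, vec in doc_vectors.items():
        # Terms missing from a document contribute 0 to the dot product.
        score = sum(w * vec.get(term, 0.0) for term, w in term_weights.items())
        scores.append((doc_id, score))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical term frequencies per document (not taken from the thesis).
docs = {
    "d1": {"expert": 2.0, "finding": 1.0},
    "d2": {"expert": 1.0, "query extension": 3.0},
}

# Method 1: every extension term has the same weight.
equal_weights = {"expert": 1.0, "finding": 1.0, "query extension": 1.0}
# Method 2: extension terms carry different (hypothetical) weights.
different_weights = {"expert": 1.0, "finding": 0.5, "query extension": 0.2}

equal_ranking = rank_documents(equal_weights, docs)
weighted_ranking = rank_documents(different_weights, docs)
```

With this toy data the two weighting methods order the documents differently, which is the kind of effect the equal-weight versus different-weight comparison is meant to expose.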

Keywords

expert finding system; term extension; C-value

Parallel Abstract

This thesis proposes two models for the expert finding problem: a query extension model and a hybrid candidate selection model. Both models are document-based and find experts using extension query terms together with profile data. To compare the influence of the extension query terms on the system, two methods are designed for each model: one gives the extension query terms equal weights, and the other gives them different weights. Chapter 3 shows that the query extension model finds experts in three phases. In the first phase, the system constructs the dataset of extension query terms, which involves splitting sentences, analyzing parts of speech, and retrieving relevant extension query terms with the C-value method. In the second phase, the system computes the relationship between the extension query terms and the relevant documents as the dot product of two vectors, and then ranks the relevant documents by the result. In the third phase, the system combines the ranking from the second phase with profiles from the original dataset to compute a score for every candidate, ranks the candidates by score, and produces a recommendation list. The hybrid candidate selection model has four phases, the first three of which are the same as in the query extension model. In the fourth phase, the system collects, for every candidate, the document scores and the profile scores obtained from the relevant documents, then ranks these scores and selects the recommended experts. The P@n, R@n, MRR, and MAP results in Chapters 4 and 5 show that the precision of the hybrid candidate selection model is more than 3% higher than that of the query extension model. Differently weighted extension query terms have a positive influence on the hybrid candidate selection model but little impact on the query extension model. Overall, the weighted hybrid candidate selection method achieves higher precision among the four proposed methods, and in most cases it maintains similar precision as the number of candidates increases. The final chapter summarizes seven points and suggests five directions for future work.
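For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of their standard textbook definitions for a single query. MRR and MAP are the means of the reciprocal rank and average precision over all queries. The function names and the sample ranking are assumptions for illustration and are not taken from the thesis.

```python
from typing import List, Set


def precision_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """P@n: fraction of the top-n recommended candidates that are true experts."""
    if n <= 0:
        return 0.0
    hits = sum(1 for c in ranked[:n] if c in relevant)
    return hits / n


def recall_at_n(ranked: List[str], relevant: Set[str], n: int) -> float:
    """R@n: fraction of all true experts that appear in the top n."""
    if not relevant:
        return 0.0
    hits = sum(1 for c in ranked[:n] if c in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first true expert; averaged over queries this gives MRR."""
    for rank, c in enumerate(ranked, start=1):
        if c in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of P@k at each rank k holding a true expert; averaged over queries this gives MAP."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, c in enumerate(ranked, start=1):
        if c in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical recommendation list and ground-truth experts for one query.
ranked_candidates = ["a", "b", "c", "d"]
true_experts = {"a", "c"}

p_at_2 = precision_at_n(ranked_candidates, true_experts, 2)
r_at_2 = recall_at_n(ranked_candidates, true_experts, 2)
rr = reciprocal_rank(ranked_candidates, true_experts)
ap = average_precision(ranked_candidates, true_experts)
```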

Parallel Keywords

expert finding system; query extension; C-value

