Context-Aware Learning of User Browsing Behavior for Web Page Category Prediction

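The Internet and web services are growing very rapidly. Most web page category prediction approaches mine the relationship between the keyword queries users submit to portals and search engines and the web pages they click among the related query results, and use that relationship to predict user intent. Analyzing users' access data and results on websites can improve the accuracy of the results search engines return, raise search engine performance through web caching and prefetching of clicked pages, and support query-related web page recommendation and personalized website ranking; it can also be applied to product recommendation in commercial advertising and to information filtering. Predicting user intent is therefore an important issue and challenge. Most studies analyze user intent and browsing behavior by observing users' query keywords and their clicks on related result pages. This thesis instead observes users' web browsing access logs together with the categories of the pages they visit, and infers user intent by predicting the category of the page a user will click next. We implement two models for this prediction: a Top-Level Domain Model (TLD), which uses the top-level domain of the page URL, and a Hidden Markov Model (HMM). Building on these two models, we propose a Mixture Model that combines the HMM with the TLD model over the browsed URLs and further optimizes the prediction using the associations between domains. Experiments show that: (1) for certain top-level domains, information from the URL itself does improve the accuracy of web page category prediction; (2) categories predicted from context-aware information about users' browsing behavior are more accurate; and (3) the more previous accesses in a user's browsing history are observed, the higher the accuracy (comparing HMM 1-gram, HMM 2-gram, HMM 3-gram, and HMM 4-gram).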
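The abstract names the three components but not their formulas, so the following is only a minimal Python sketch, under our own assumptions, of how they could fit together. `TLDModel` estimates P(category | top-level domain) from labelled URLs; `NgramCategoryModel` estimates P(next category | last n categories) with a visible order-n Markov chain, a deliberately simpler stand-in for the thesis's HMM whose `n` loosely mirrors the 1-gram to 4-gram comparison; `mixture` combines the two by linear interpolation with a weight `lam`. All class and function names, the interpolation scheme, `lam`, and the toy data are hypothetical, and the domain-association optimization mentioned in the abstract is not modelled.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tld_of(url: str) -> str:
    """Top-level domain = last label of the hostname ('com' for www.example.com)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def normalize(counter: Counter) -> dict:
    """Turn raw counts into a probability distribution (empty if there are no counts)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


class TLDModel:
    """Hypothetical URL-based model: P(category | TLD) from (url, category) pairs."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        for url, category in pairs:
            self.counts[tld_of(url)][category] += 1
        return self

    def predict(self, url: str) -> dict:
        return normalize(self.counts.get(tld_of(url), Counter()))


class NgramCategoryModel:
    """Hypothetical context model: P(next category | last n categories).
    A visible Markov chain over observed categories, standing in for an HMM."""

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.counts = defaultdict(Counter)

    def fit(self, sessions):
        # Each session is the category sequence of one user's browsing log.
        for cats in sessions:
            for i in range(self.n, len(cats)):
                self.counts[tuple(cats[i - self.n:i])][cats[i]] += 1
        return self

    def predict(self, history) -> dict:
        if len(history) < self.n:
            return {}  # not enough context observed yet
        return normalize(self.counts.get(tuple(history[-self.n:]), Counter()))


def mixture(p_seq: dict, p_tld: dict, lam: float = 0.5) -> dict:
    """Linear interpolation lam * P_seq + (1 - lam) * P_tld, renormalised."""
    cats = set(p_seq) | set(p_tld)
    mixed = {c: lam * p_seq.get(c, 0.0) + (1 - lam) * p_tld.get(c, 0.0) for c in cats}
    total = sum(mixed.values())
    return {c: v / total for c, v in mixed.items()} if total else {}


# Toy usage with invented data.
pairs = [
    ("http://shop.example.com/a", "shopping"),
    ("http://news.example.com/b", "news"),
    ("http://www.example.com/c", "shopping"),
    ("http://univ.example.edu/d", "education"),
]
sessions = [
    ["news", "news", "shopping"],
    ["news", "shopping", "shopping"],
    ["education", "news", "news"],
]
tld = TLDModel().fit(pairs)
seq = NgramCategoryModel(n=1).fit(sessions)
p = mixture(seq.predict(["education", "news"]), tld.predict("http://video.example.com/x"))
print(max(p, key=p.get))  # shopping
```

Here the context model alone is undecided between `news` and `shopping` (0.5 each), and the `.com` prior tips the mixture toward `shopping`, which is the kind of complementarity a mixture of URL-based and sequence-based evidence is meant to exploit.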

Keywords

User Intent; Web Page Category Prediction; Access Log; Information Retrieval; Click Behavior

English Abstract

Web activities and services are growing rapidly. In recent years, most work has predicted user intent from the relationship between query keywords and the result pages returned by search engines or portals. Analyzing users' access data and activities on a website can help web service providers improve the accuracy of the result pages returned for a query, improve website performance by caching query result pages and prefetching web pages, improve web page recommendation and personalized web page ranking, improve commercial product advertising, and support information filtering applications. Capturing the context of a user's previous browsing behavior to predict user intent is therefore an important issue and challenge. Most studies focus on users' query keywords and on the relationship between those keywords and the pages users click next in the query results. We implement two models: a Top-Level Domain (TLD) model trained on URL-based features, and a Hidden Markov Model (HMM) trained on context-aware category sequences derived from users' browsed URLs. We then propose a mixture model that combines the TLD model and the HMM to predict the category of the user's next accessed page. We also apply the proposed context-aware web page category prediction model to two filtering applications: objectionable web content filtering and web security threat prevention.
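The abstract says the predictor is applied to objectionable content filtering and security threat prevention but does not give the decision rule. One hypothetical way to connect a predicted distribution over the next page's category to a filter is a probability threshold: sum the predicted mass on blocked categories and act when it reaches the threshold. The sketch below assumes that rule; the category names, the 0.5 threshold, and checking threat categories before objectionable ones are all illustrative choices, not the thesis's configuration.

```python
from typing import Dict, Iterable


def blocked_mass(pred: Dict[str, float], blocked: Iterable[str]) -> float:
    """Total predicted probability assigned to the given (blocked) categories."""
    blocked = set(blocked)
    return sum(p for c, p in pred.items() if c in blocked)


def filter_decision(pred: Dict[str, float], objectionable: Iterable[str],
                    threats: Iterable[str], threshold: float = 0.5) -> str:
    """Return 'block-threat', 'block-content', or 'allow' for the predicted next access.
    Assumption: security-threat categories take priority over objectionable content."""
    if blocked_mass(pred, threats) >= threshold:
        return "block-threat"
    if blocked_mass(pred, objectionable) >= threshold:
        return "block-content"
    return "allow"


# Hypothetical predicted distribution for a user's next page.
pred = {"gambling": 0.4, "adult": 0.2, "news": 0.4}
print(filter_decision(pred, objectionable={"adult", "gambling"},
                      threats={"malware", "phishing"}))  # block-content
```

Summing mass across categories, rather than acting only on the single most likely category, lets the filter react when several objectionable categories are individually unlikely but jointly probable, as in the example above where no blocked category is the top prediction on its own.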

English Keywords

User Intent; Web Page Category Prediction; User Browsing Log; User Click Behavior


Cited By

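阮彥程 (2013). 探索使用者瀏覽行為於不當內容過濾 [Exploring user browsing behavior for objectionable content filtering; Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2013.00443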
