A Study of Behavior Prediction for Objectionable Content Filtering

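This dissertation proposes prediction models based on user behavior for identifying and filtering objectionable web content, such as pornography, gambling, violence, and drugs, so that children and other users are kept from accessing objectionable content while browsing the web. From the perspective of user behavior, the dissertation is divided into two parts. The first part studies search-behavior mining for collaborative filtering of web pornography. We propose search-intent-based methods to generate and update blacklists of pornographic URLs, which are used to filter the major category of objectionable content. The search and click behaviors stored in server-side query logs can be effectively exploited, without analyzing page content, to label the categories of the URLs that users click. Our methods can help search engines hide search results that contain pornographic content, making the web content that children access more suitable. The second part studies browsing-behavior mining for objectionable content filtering. Besides retrieving information through search engines, users have other ways to access different types of objectionable content. We further propose ways of exploring users' browsing intents to predict the category of the web content a user clicks, and we apply the predictions to filter objectionable content. Client-side click data is used to demonstrate that predicting the category of web content is feasible without crawling page content for machine learning. Traditional filtering techniques treat this research problem as a classification problem and rely on analyzing page content; departing from that line of research, we propose a behavior-mining direction. In practice, the proposed prediction models start from user behavior and complement existing solutions.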

Keywords

user behavior mining; context-aware prediction; web content filtering; child protection; classification problem

Parallel Abstract

This dissertation proposes user-behavior-based models to identify and filter objectionable content, such as pornography, gambling, violence, and drugs, to protect children and other users from inappropriate material while they browse the web. From the perspective of user behavior, the dissertation is divided into two parts. The first part mines search behavior for collaborative cyberporn filtering. We present search-intent-based methods to generate and update pornographic blacklists for filtering the major objectionable category. The searches and clicks kept in server-side query logs can be effectively exploited to tag the categories of users’ clicked URLs without analyzing any page content. The proposed methods can help search engines mask objectionable results for child suitability. The second part mines browsing behavior for objectionable content filtering. In addition to retrieving information via search engines, users have many alternative ways to access other kinds of objectionable web content. We further explore users’ browsing intents to predict the category of a user’s next access and apply the results to filter objectionable content. Client-side click-through data is evaluated to demonstrate the feasibility of predicting categories without crawling page content for machine learning. Traditional filtering techniques regard this research problem as categorization through intelligent content analysis. Departing from this line of research, we propose another direction based on behavioral mining. In practice, the proposed prediction models complement existing solutions by mining users’ behaviors.

Parallel Keywords

users’ behavioral mining; context-aware prediction; web content filtering; child protection; categorization
