In web mining, statistical clustering techniques are commonly used to analyze the page preferences of website visitors. However, such techniques assign each user session (browsing path) to exactly one cluster; that is, they assume in advance that a session reflects a single preference and ignore the possibility that one session contains several. Fuzzy clustering techniques have been proposed to address this shortcoming, but existing studies of this kind measure the similarity between sessions solely from the distance between pages. As a result, when visitors reach the same web pages through different browsing paths, the analysis tends to produce incorrect results. To address this problem, this thesis proposes a web mining framework that combines fuzzy clustering with association rules. The approach first filters out hyperlink pages in a session that may introduce analysis errors, then uses association rules to compute the association between pages (URLs). Finally, it extends the session similarity computation of fuzzy clustering by replacing page distance with the confidence of the association rules between pages, so that appropriate clustering reveals users' actual browsing preferences.
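The core idea, replacing page distance with association-rule confidence when comparing sessions, can be illustrated with a minimal Python sketch. It is not the formulation used in this work, which the abstract does not specify. The function names, the symmetric `page_affinity` (the larger of the two rule confidences), the averaging in `session_similarity`, and the normalized `fuzzy_memberships` are all assumptions made for illustration; the noise-filtering step is omitted.

```python
from collections import Counter
from itertools import permutations


def rule_confidence(sessions):
    """confidence(a -> b) = (#sessions containing a and b) / (#sessions containing a)."""
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)  # visit order and repeats are ignored here
        page_count.update(pages)
        pair_count.update(permutations(pages, 2))  # all ordered pairs (a, b), a != b
    return {(a, b): n / page_count[a] for (a, b), n in pair_count.items()}


def page_affinity(conf, a, b):
    """Assumed symmetric page affinity: the larger of the two rule confidences."""
    if a == b:
        return 1.0
    return max(conf.get((a, b), 0.0), conf.get((b, a), 0.0))


def session_similarity(conf, s1, s2):
    """Assumed session similarity: mean page affinity over all cross-session page pairs."""
    p1, p2 = set(s1), set(s2)
    if not p1 or not p2:
        return 0.0
    total = sum(page_affinity(conf, a, b) for a in p1 for b in p2)
    return total / (len(p1) * len(p2))


def fuzzy_memberships(conf, session, prototypes):
    """Simplified soft assignment: similarities to each prototype, normalized to sum to 1.

    This stands in for a full fuzzy clustering update, which is not shown.
    """
    sims = [session_similarity(conf, session, p) for p in prototypes]
    total = sum(sims)
    if total == 0:
        return [1.0 / len(prototypes)] * len(prototypes)
    return [s / total for s in sims]


if __name__ == "__main__":
    # Hypothetical sessions; page names are placeholders.
    sessions = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["C", "D"]]
    conf = rule_confidence(sessions)
    print(fuzzy_memberships(conf, ["A", "C"], [["A", "B"], ["C", "D"]]))
```

Because each session receives a membership degree for every cluster prototype rather than a single label, a session that mixes several preferences is not forced into one group, and two sessions that reach related pages by different paths can still score as similar.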