品質預測之位置相關特徵擷取與關聯分類

Location-based Feature Extraction and Associative Classification for Quality Prediction

Advisor: 吳宜鴻

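Abstract


This study aims to find spatial associations between a target object and surrounding objects of other types. We take hotels as the target objects, since the types and numbers of other objects around a hotel may affect its rating. We propose a method that extracts environmental features at different distances: for each hotel, we count the environmental objects of each type within a given distance, and these counts form the hotel's environmental features. From each type of environmental object we then select a subset of features and mine association rules with respect to hotel ratings; a classifier is built once these rules have been ranked and pruned. The experimental data consist of the ratings of legal hotels published by the Tourism Bureau of the Ministry of Transportation and Communications, together with 50 types of environmental objects on Google Maps and their distance information. We extract features according to different distances or different degrees of correlation with hotel ratings, then generate the corresponding association rules and classifiers, in order to observe how distance or degree of correlation affects classification accuracy. The best feature set reaches an average accuracy of 93%. To verify whether the discovered association rules agree with the general understanding of what affects hotel quality, we manually inspect and label the association rules in the classifier; for the best feature set, 85% of the rules in the classifier are judged reasonable.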

Keywords

Feature selection; Clustering; Associative classification

Parallel Abstract


This thesis aims to find spatial relationships between target objects and surrounding objects of various types. We consider hotels as the target objects. The types of other objects surrounding a hotel, and their quantities, may affect the hotel's rating. We propose an approach that extracts environmental features based on different distances. In this approach, the number of objects of each type within a certain distance of each hotel is counted to form the hotel's environmental features. Then, for each type of surrounding object, only a subset of the features is chosen to discover association rules with respect to hotel rating. A classifier is built after these rules are sorted and pruned. The experimental data are the ratings of legal hotels announced by the Tourism Bureau in Taiwan and objects of 50 types on Google Maps, together with their distance information. We extract features according to different distances, or according to different degrees of correlation with hotel ratings, and then generate association rules and the corresponding classifier. In this way, we can observe the influence of distance or degree of correlation on classification accuracy. The best set of features achieves an average accuracy of 93%. To verify whether the discovered association rules are in line with the general understanding of the factors affecting hotel quality, we label the association rules in the classifier by manual inspection. The results for the best set of features show that 85% of the association rules in the classifier are considered reasonable.
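
The pipeline summarized in the abstract (count surrounding objects by type within a distance, mine rules toward the rating, then sort and prune them into a classifier) can be illustrated with a minimal Python sketch. It is not the thesis implementation. Everything specific in it is an assumption made for illustration: planar coordinates instead of map distances, the `type>=1` discretization, brute-force rule enumeration in place of an Apriori-style miner, the confidence/support ordering, the coverage-based pruning, and the toy data.

```python
from collections import Counter
from itertools import combinations
from math import hypot

def extract_features(hotel_xy, objects, radius):
    """Count surrounding objects of each type within `radius` of a hotel.
    Assumption: planar (x, y) coordinates; real map data would need geodesic distance."""
    counts = Counter()
    for obj_type, x, y in objects:
        if hypot(x - hotel_xy[0], y - hotel_xy[1]) <= radius:
            counts[obj_type] += 1
    return counts

def to_items(counts, threshold=1):
    """Hypothetical discretization: one item per type with at least `threshold` objects."""
    return frozenset(f"{t}>={threshold}" for t, c in counts.items() if c >= threshold)

def mine_rules(transactions, labels, min_support, min_confidence, max_len=2):
    """Enumerate rules 'itemset -> rating' by brute force and rank them."""
    n = len(transactions)
    body_count, rule_count = Counter(), Counter()
    for items, label in zip(transactions, labels):
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                rule_count[(body, label)] += 1
    rules = []
    for (body, label), cnt in rule_count.items():
        support, confidence = cnt / n, cnt / body_count[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((body, label, support, confidence))
    # Assumed ordering: higher confidence, then higher support, then shorter body.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules

def prune(rules, transactions, labels):
    """Coverage-based pruning (an assumption): keep a rule only if it correctly
    classifies at least one training instance not covered by an earlier rule."""
    remaining = list(zip(transactions, labels))
    kept = []
    for rule in rules:
        body, label = set(rule[0]), rule[1]
        if any(body <= t and l == label for t, l in remaining):
            kept.append(rule)
            remaining = [(t, l) for t, l in remaining if not body <= t]
    return kept

def classify(rules, items, default):
    """Return the rating of the first ranked rule whose body matches the hotel."""
    for body, label, _, _ in rules:
        if set(body) <= items:
            return label
    return default

# Toy data, invented for illustration only.
OBJECTS = [("cafe", 0.1, 0.0), ("cafe", 0.2, 0.1),
           ("factory", 5.0, 5.0), ("factory", 5.1, 5.0)]
HOTELS = [((0.0, 0.0), "high"), ((0.1, 0.1), "high"),
          ((5.0, 5.1), "low"), ((5.2, 5.0), "low")]

transactions = [to_items(extract_features(xy, OBJECTS, 0.5)) for xy, _ in HOTELS]
labels = [rating for _, rating in HOTELS]
rules = prune(mine_rules(transactions, labels, 0.5, 0.8), transactions, labels)
```

Rerunning the last three lines with a different radius, or with a different subset of object types, is one way to compare feature sets by distance in the manner the abstract describes.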
