
Mining Feature Clusters in Multimedia Data

Clustering Strategy for Multimedia Data

Advisor: 柯佳伶

Abstract


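Cluster analysis groups data with similar attribute values into clusters without predefining data classes. For multimedia objects, the clustering results can be used to build a catalog of multimedia objects automatically, which supports browsing and searching for similar objects. Because the average distribution density of multimedia objects in a high-dimensional attribute space is quite low, previously proposed clustering algorithms have difficulty finding clusters in that space. This thesis proposes the HBP (Histogram-Based Partition) clustering algorithm, which finds clusters formed by the values of a subset of the attribute dimensions. The algorithm uses an evaluation function that measures how strongly the values on each dimension aggregate, and selects the attribute dimension most closely related to cluster formation. It then partitions the high-dimensional attribute space into partial-dimension intervals according to the histogram of objects on that dimension. The partial-dimension intervals are partitioned recursively in the same way until each has a high object density. Finally, neighboring partial-dimension intervals are merged to form clusters. In addition, HBP stores the objects' attribute values in an attribute-value linked-list data structure so that the data need not be read repeatedly during clustering. To verify the effectiveness of HBP, the thesis uses synthetic data and features of landscape images as test data. The results show that HBP can be applied in high-dimensional attribute spaces and finds object clusters in very short computation time.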
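The recursive partitioning described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the evaluation function or the density criterion, so the `score` function (share of points in the fullest bin), the `min_score` stopping rule, the `min_points` noise filter, and the assumption that all values lie in [0, 1] are hypothetical stand-ins. The final step of merging neighboring intervals is omitted.

```python
def bin_of(value, bins):
    """Index of the equal-width bin holding `value` (values assumed in [0, 1])."""
    return min(int(value * bins), bins - 1)


def histogram(points, dim, bins):
    """Count points per bin along one dimension."""
    counts = [0] * bins
    for p in points:
        counts[bin_of(p[dim], bins)] += 1
    return counts


def score(counts):
    """Assumed evaluation function: share of points in the fullest bin."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


def partition(points, n_dims, bins=4, min_score=0.4, min_points=3, fixed=None):
    """Return a list of (region, points); region maps dimension -> bin index."""
    fixed = fixed or {}
    free = [d for d in range(n_dims) if d not in fixed]
    if not free:
        return [(fixed, points)]
    # Pick the unconstrained dimension whose histogram is most concentrated.
    best = max(free, key=lambda d: score(histogram(points, d, bins)))
    if score(histogram(points, best, bins)) < min_score:
        # Stand-in stopping rule: no remaining dimension shows a concentration.
        return [(fixed, points)]
    regions = []
    for b in range(bins):
        subset = [p for p in points if bin_of(p[best], bins) == b]
        if len(subset) >= min_points:  # sparse bins are treated as noise
            regions += partition(subset, n_dims, bins, min_score, min_points,
                                 {**fixed, best: b})
    return regions
```

Each returned region constrains only the dimensions that were actually split on, which is how a sketch like this can report clusters that exist in part of the dimensions rather than in the full space.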

Parallel Abstract


Cluster analysis groups data with similar features into clusters without requiring predefined cluster labels. For multimedia data, clusters are the basic units for automatically constructing data categories that support browsing and retrieving similar data. Most multimedia data are described by a large number of features, so the distribution density of the data is very low in the vast feature space. Previously proposed clustering algorithms do not find clusters well in such high-dimensional feature spaces. In this thesis, the HBP (Histogram-Based Partition) algorithm is proposed to find data clusters over a subset of the dimensions of the feature space. First, a cluster evaluation function selects, among all dimensions, the feature dimension whose values are most suitable for forming clusters. The high-dimensional space is then partitioned into subinterval spaces according to the histogram on the selected dimension. The subinterval spaces are partitioned recursively in the same way until each has a high object density, and nearby subinterval spaces are then merged to form clusters. Moreover, the algorithm constructs an attribute-object mapping table to avoid scanning the data repeatedly. Synthetic data and image data, both with high-dimensional features, are used to test the performance of the proposed algorithm. The experimental results show that the HBP algorithm is appropriate for finding clusters in high-dimensional feature spaces.
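One way to picture a mapping table that avoids rescanning the data is an inverted index from (dimension, bin) to object ids, built in a single pass. The sketch below is an assumption about the general idea, not the structure used in the thesis: the names `build_mapping_table` and `histogram_from_table`, the equal-width binning, and the [0, 1] value range are all hypothetical.

```python
from collections import defaultdict


def build_mapping_table(points, bins):
    """Single pass over the data: table[dim][bin] -> set of object ids.

    Values are assumed to lie in [0, 1]; bins are equal-width.
    """
    n_dims = len(points[0])
    table = [defaultdict(set) for _ in range(n_dims)]
    for obj_id, p in enumerate(points):
        for dim, value in enumerate(p):
            table[dim][min(int(value * bins), bins - 1)].add(obj_id)
    return table


def histogram_from_table(table, dim, bins, members=None):
    """Bin counts along `dim`, optionally restricted to a set of object ids."""
    counts = []
    for b in range(bins):
        ids = table[dim].get(b, set())
        counts.append(len(ids if members is None else ids & members))
    return counts
```

With such a table, the histogram of a subinterval space is obtained by intersecting id sets (passing the ids of that space as `members`) instead of reading the raw feature vectors again.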

Keywords

clustering multimedia data

