
資料串流上以滑動視窗探勘子空間天際線

Mining Subspace Skylines with a Sliding Window over Data Streams

Advisor: 吳宜鴻

Abstract


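In applications involving multi-dimensional data, skyline objects stand out from all other objects. They are divided into the full skyline, which considers all dimensions at once, and subspace skylines, which consider only a subset of the dimensions. This thesis studies the problem of mining subspace skylines with a sliding window over data streams. Objects may move continuously in the multi-dimensional space, causing the skyline to change, so the differences in dimension values between objects must be computed frequently to keep track of the skyline in every subspace. In practice, non-skyline objects account for a large share of the data, so we first record, for each non-skyline object, information about how it is dominated, which avoids unnecessary computation on non-skyline objects. In addition, based on the dominance and equal-value relationships among full skyline objects on each dimension, we use logical operations to quickly derive the subspaces in which a full skyline object is also a skyline object, and we use the recorded dominance information of non-skyline objects to mine all non-skyline objects that become subspace skyline objects. Experimental results show that avoiding unnecessary computation on non-skyline objects reduces execution time by about 30% on average while achieving accuracy above 90%, and that subspace skyline mining performs particularly well when the number of dimensions is low.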
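To make the definitions above concrete, the following is a minimal brute-force sketch of dominance, the full skyline, and subspace skylines over a count-based sliding window. It is not the method proposed in the thesis. The function names (`dominates`, `skyline`, `subspace_skylines`), the "smaller is better" convention, and the window size of 3 are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def dominates(a, b, dims):
    """True if a dominates b on dims (assumption: smaller values are better).

    a must be no worse than b on every dimension in dims and strictly
    better on at least one of them.
    """
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)


def skyline(objects, dims):
    """Objects that no other object dominates on the given dimensions."""
    return [p for p in objects if not any(dominates(q, p, dims) for q in objects)]


def subspace_skylines(objects, num_dims):
    """Map every non-empty subset of dimensions to its skyline (brute force)."""
    result = {}
    for k in range(1, num_dims + 1):
        for dims in combinations(range(num_dims), k):
            result[dims] = skyline(objects, dims)
    return result


# Hypothetical stream of 2-D objects; the window keeps the 3 most recent ones.
window = deque(maxlen=3)
for point in [(1, 4), (2, 2), (3, 3), (4, 1)]:
    window.append(point)  # the oldest object expires automatically

final = subspace_skylines(list(window), 2)
```

Here `final[(0, 1)]` is the full skyline of the current window, while `final[(0,)]` and `final[(1,)]` are the skylines of the two one-dimensional subspaces. The sketch recomputes everything from the current window contents and enumerates all 2^d - 1 subspaces, which is the cost that recording dominance information for non-skyline objects is meant to reduce.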

Parallel Abstract (English)


In multi-dimensional data applications, skyline objects stand out from all other objects. They are classified into the full skyline, which takes all dimensions into account, and subspace skylines, which consider only a subset of the dimensions. We study the problem of mining subspace skylines with a sliding window over data streams. As objects move in the multi-dimensional space, the skyline objects change over time, so the differences in dimension values among objects must be computed frequently to keep track of the skyline objects in every subspace. In real cases, non-skyline objects are in the majority. To avoid unnecessary computation on non-skyline objects, we record the full skyline objects that dominate them. In addition, based on the dominance and coincidence (equal-value) relationships among the full skyline objects on each dimension, we employ logical operations to compute the subspaces in which they are also skyline objects. The non-skyline objects that are subspace skyline objects can likewise be discovered from the recorded information about the full skyline objects dominating them. The experimental results show that avoiding unnecessary computation on non-skyline objects reduces execution time by about 30% on average while achieving accuracy above 90%. The mining of subspace skylines performs especially well when the number of dimensions is low.
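The abstract does not spell out which logical operations are used. One plausible way to read the idea, shown here only as a sketch under stated assumptions, is to encode the per-dimension relationship between two objects as bitmasks and to test each subspace with bitwise operations. The names `relation_masks` and `skyline_subspaces`, the bitmask encoding of subspaces, and the "smaller is better" convention are hypothetical and not taken from the thesis.

```python
def relation_masks(p, q):
    """Return (better, equal) bitmasks describing q relative to p.

    Bit i of `better` is set where q[i] < p[i] (q is strictly better on
    dimension i); bit i of `equal` is set where q[i] == p[i].
    Assumption: smaller values are better.
    """
    better = equal = 0
    for i, (pv, qv) in enumerate(zip(p, q)):
        if qv < pv:
            better |= 1 << i
        elif qv == pv:
            equal |= 1 << i
    return better, equal


def skyline_subspaces(p, others, num_dims):
    """List the subspaces (as bitmasks) in which no object in `others` dominates p.

    q dominates p in subspace s exactly when every dimension of s lies in
    better | equal and at least one dimension of s lies in better.
    """
    masks = [relation_masks(p, q) for q in others]
    result = []
    for s in range(1, 1 << num_dims):
        dominated = any(
            (s & ~(better | equal)) == 0 and (s & better) != 0
            for better, equal in masks
        )
        if not dominated:
            result.append(s)
    return result


# (2, 2) against (4, 1): dominated only in the subspace {dimension 1}.
subspaces = skyline_subspaces((2, 2), [(4, 1)], 2)
```

In this example `subspaces` contains the masks `0b01` and `0b11`: the object `(2, 2)` remains a skyline object on dimension 0 alone and on both dimensions together, but not on dimension 1 alone. The masks are computed once per pair of objects, after which each subspace test costs only a few bitwise operations, although the loop still visits every one of the 2^d - 1 subspaces.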

