In applications over multi-dimensional data, skyline objects are those that stand out from all other objects. They fall into two classes: the full skyline, which takes all dimensions into account, and subspace skylines, which consider only a subset of the dimensions. This thesis studies the problem of mining subspace skylines over data streams with a sliding window. Because objects may move continuously in the multi-dimensional space, the skyline changes over time, so the differences in dimension values between objects must be computed frequently to keep track of the skyline in every subspace. In practice, non-skyline objects make up a large majority. We therefore first record, for each non-skyline object, information about the full skyline objects that dominate it, which avoids unnecessary computation on non-skyline objects. In addition, based on the dominance and equal-value (coincidence) relationships among the full skyline objects in each dimension, we use logical operations to quickly derive the subspaces in which each full skyline object is also a skyline object. The recorded dominance information is then used to discover all non-skyline objects that are subspace skyline objects. Experimental results show that avoiding unnecessary computation on non-skyline objects reduces execution time by about 30% on average while achieving an accuracy above 90%, and that the mining of subspace skylines performs especially well when the number of dimensions is low.
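As an illustration of how dominance and equal-value relationships can be turned into logical operations over subspaces, the following is a minimal sketch and not the method of this thesis. It assumes that smaller values are preferred in every dimension, encodes a subspace as a bitmask over the dimensions, and checks every pair of objects exhaustively. It ignores the sliding window and the recorded dominance information. The names `relation_masks`, `dominates_in`, and `subspace_skylines` are hypothetical.

```python
def relation_masks(q, p):
    """Return (better, equal) bitmasks comparing q against p.

    Bit i of `better` is set where q[i] < p[i] (assumption: smaller is better);
    bit i of `equal` is set where q[i] == p[i].
    """
    better = equal = 0
    for i, (a, b) in enumerate(zip(q, p)):
        if a < b:
            better |= 1 << i
        elif a == b:
            equal |= 1 << i
    return better, equal


def dominates_in(better, equal, subspace):
    """q dominates p in `subspace` iff q is nowhere worse and somewhere better there."""
    nowhere_worse = (subspace & ~(better | equal)) == 0
    somewhere_better = (subspace & better) != 0
    return nowhere_worse and somewhere_better


def subspace_skylines(points):
    """Map each non-empty subspace bitmask to the indices of its skyline objects."""
    n, d = len(points), len(points[0])
    # Pairwise relations are computed once and reused for every subspace.
    masks = {
        (i, j): relation_masks(points[i], points[j])
        for i in range(n) for j in range(n) if i != j
    }
    result = {}
    for subspace in range(1, 1 << d):
        result[subspace] = [
            j for j in range(n)
            if not any(
                dominates_in(*masks[(i, j)], subspace)
                for i in range(n) if i != j
            )
        ]
    return result


# Three objects in two dimensions; subspace 0b01 is dimension 0 only,
# 0b10 is dimension 1 only, and 0b11 is the full space.
points = [(1, 4), (2, 2), (3, 3)]
skylines = subspace_skylines(points)
```

In this example, object 2 is dominated by object 1 in the full space, while objects 0 and 1 are each the sole skyline object of one single-dimension subspace. Once the pairwise masks are computed, each dominance test in a subspace costs only a few bitwise operations.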