Techniques for Selecting Influential Objects in Probabilistic Databases

In this dissertation, we study how to select influential objects in probabilistic databases. Given an uncertain dataset whose attribute values are probabilistic, we aim to identify the objects whose exact attribute values, if acquired, would most improve query or mining results. We study this problem for both clustering and the Probabilistic k-Nearest-Neighbor (k-PNN) query. For each, we define metrics for evaluating the quality of the results and design algorithms that select objects according to these metrics. For the k-PNN query, we provide optimal acquisition solutions for the nearest-neighbor case (1-PNN) and propose a scalable acquisition algorithm for k > 1. In addition, for a social network dataset with edge probabilities, we aim to identify the neighboring nodes of a query node that, if asked, can best help gather specific information. We formulate this problem according to the motivating scenario, and the proposed approach considers both the strength and the diversity of a node's influence. Experiments on various datasets demonstrate the effectiveness and efficiency of the proposed approaches.
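The abstract does not state the quality metrics or the selection algorithms, so the following is only a minimal sketch of the general idea of choosing which object to acquire for a 1-PNN query. It assumes that each uncertain object is a discrete distribution of its distance to the query point, that objects are independent, and that no two possible distances tie. It also measures result quality by the Shannon entropy of the 1-PNN probabilities, which is an illustrative metric and not necessarily the one defined in the dissertation. Under these assumptions, a greedy step picks the object whose exact value is expected to lower that entropy the most. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# ASSUMPTION: an uncertain object is a discrete distribution of its distance
# to the query point, given as (distance, probability) pairs summing to 1.
UncertainObject = List[Tuple[float, float]]


def prob_farther(obj: UncertainObject, d: float) -> float:
    """P(distance of obj > d)."""
    return sum(p for dist, p in obj if dist > d)


def pnn_probabilities(objs: List[UncertainObject]) -> List[float]:
    """1-PNN probability of each object (assumes independence, no distance ties)."""
    probs = []
    for i, obj in enumerate(objs):
        total = 0.0
        for d, p in obj:
            # Object i is the nearest neighbor at distance d iff all others are farther.
            others = math.prod(prob_farther(o, d) for j, o in enumerate(objs) if j != i)
            total += p * others
        probs.append(total)
    return probs


def entropy(probs: List[float]) -> float:
    """ASSUMED quality metric: Shannon entropy (bits) of the 1-PNN answer."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_entropy_after_acquiring(objs: List[UncertainObject], k: int) -> float:
    """Expected answer entropy if the exact distance of object k were acquired."""
    expected = 0.0
    for d, p in objs[k]:
        # Replace object k by the certain outcome d, weighted by its probability.
        cleaned = objs[:k] + [[(d, 1.0)]] + objs[k + 1:]
        expected += p * entropy(pnn_probabilities(cleaned))
    return expected


def select_object_to_acquire(objs: List[UncertainObject]) -> int:
    """Greedy choice: the object whose acquisition minimizes expected entropy."""
    return min(range(len(objs)), key=lambda k: expected_entropy_after_acquiring(objs, k))


# Hypothetical example: objects A, B (uncertain) and C (certain, far away).
example = [
    [(1.0, 0.5), (4.0, 0.5)],  # A
    [(2.0, 0.8), (5.0, 0.2)],  # B
    [(10.0, 1.0)],             # C
]
print(pnn_probabilities(example))         # approximately [0.6, 0.4, 0.0]
print(select_object_to_acquire(example))  # prints 0 (object A)
```

In this example, acquiring A gives an expected entropy of about 0.36 bits, acquiring B gives 0.8 bits, and acquiring the already-certain C leaves the current entropy (about 0.97 bits) unchanged, so A is selected. Recomputing every probability from scratch costs quadratic time per candidate, which is fine for a sketch but is not the scalable k-PNN algorithm the abstract refers to.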

Keywords

probabilistic database; clustering; nearest-neighbor query; social networks
