Generating Summaries for Frequent Univariate Uncertain Pattern

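Abstract


In big data research and applications, discovering useful knowledge is an important topic. Most studies on this topic focus on developing methods for extracting knowledge from big data, yet how to present the discovered knowledge remains a key issue. In the literature, univariate uncertain data is one kind of big data, and the frequent univariate uncertain patterns extracted from it are useful knowledge. However, the number of frequent univariate uncertain patterns is usually so large that people cannot readily use them, so a good way of presenting these patterns is needed. We propose generating summaries for frequent univariate uncertain patterns, using a hierarchical clustering technique to produce the summaries. Users need only examine tens or hundreds of representative patterns rather than deal with a large number of frequent patterns. Experiments show that the summary quality of our method is higher than that provided by maximal frequent univariate uncertain patterns.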

Parallel Abstract


In big-data research and applications, discovering useful knowledge from big data is an important topic. While most studies on this topic focus on developing methods for retrieving knowledge, presenting the discovered knowledge remains a critical issue. A good method of presentation allows ordinary people to quickly understand and use the discovered knowledge. In the literature, univariate uncertain data is one kind of big data, and frequent patterns retrieved from univariate uncertain data, i.e., frequent univariate uncertain patterns, represent useful knowledge. However, the number of frequent univariate uncertain patterns is often too large for an ordinary user to deal with, so a good method of presenting them is required. To this end, we propose a novel way of summarizing frequent univariate uncertain patterns: we use a hierarchical clustering technique to generate a summary of a set of such patterns. Instead of examining a large number of frequent univariate uncertain patterns, a user only needs to check tens, or perhaps hundreds, of representative patterns. Experimental results show that the summarization quality of our method is better than that of maximal frequent univariate uncertain patterns.
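The abstract does not specify the distance measure, the linkage criterion, or how representatives are chosen, so the following is only a minimal sketch of the general idea of summarizing a pattern set by hierarchical clustering. It makes several simplifying assumptions that are not from the paper: each pattern is treated as a plain, non-empty set of items (the uncertainty information is ignored entirely), distance is Jaccard distance, clusters are merged by average linkage, and each cluster is represented by its medoid. The names `jaccard_distance` and `summarize` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two non-empty item sets (an assumed measure)."""
    return 1.0 - len(a & b) / len(a | b)


def summarize(patterns, k):
    """Agglomerative (bottom-up) clustering of patterns down to k clusters.

    Returns one representative pattern (the medoid) per cluster.
    Assumes every pattern is a non-empty frozenset and 1 <= k <= len(patterns).
    """
    # Start with every pattern in its own cluster.
    clusters = [[p] for p in patterns]

    def average_linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(jaccard_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until k remain.
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    def medoid(cluster):
        # The member with the smallest total distance to the other members.
        return min(cluster, key=lambda p: sum(jaccard_distance(p, q) for q in cluster))

    return [medoid(c) for c in clusters]


if __name__ == "__main__":
    # Toy frequent patterns over single-character items (illustrative data only).
    frequent = [frozenset(s) for s in ("ab", "abc", "abd", "xy", "xyz")]
    for representative in summarize(frequent, k=2):
        print(sorted(representative))
```

In this toy setting the user would inspect `k` representatives instead of the full pattern list; a faithful implementation would need a distance that accounts for the uncertainty attached to each item.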

