Title

資料串流環境中之頻繁時間樣式探勘

Translated Titles

Mining of Frequent Temporal Patterns on Data Streams

DOI

10.6342/NTU.2004.02412

Authors

鄧維光

Key Words

Frequent Temporal Pattern ; Data Stream ; Data Mining ; Wavelet Transform

PublicationName

Dissertations and Theses, Graduate Institute of Electrical Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2004

Academic Degree Category

Doctoral (Ph.D.)

Advisor

陳銘憲

Content Language

English

Chinese Abstract

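In recent years, many data query and data mining problems have been introduced into the data stream environment. Among the various data mining problems, discovering frequent patterns from market basket (shopping transaction) data is widely recognized as a highly important research direction. This dissertation addresses three main research topics: (1) starting from static transaction databases, studying models of frequent pattern mining and generalizing their concepts to the transaction data streams generated in online environments; (2) studying how to use limited resources effectively in a data stream environment; and (3) developing quality-guarantee mechanisms for tracking online data streams. The specific research contents are summarized as follows.

Building on the process of discovering frequent patterns and then deriving association rules, we develop a new data mining technique: substitution rules. Substitution refers to the choices and trade-offs a customer makes among items when making a purchase. Discovering substitution rules can provide valuable knowledge for purchase prediction, customer behavior analysis, and decision support. Specifically, by establishing the theoretical foundation of substitution rules, we further derive relationships among the occurrence frequencies of itemsets to make the computation of negative itemsets more efficient, and thereby obtain statistically significant results.

To discover frequent temporal patterns in a data stream environment, we first develop a frequency counting framework applicable to various temporal patterns and design an algorithm with two main features: it collects statistics online by scanning each record only once, and it uses regression theory to produce compact pattern representations. In this way, an online transaction data stream can be transformed in real time into all potentially frequent patterns, and the frequency variation of each pattern can be tracked by multi-segment regression. The techniques of segment tuning and segment relaxation are further used to keep memory usage bounded. With these features combined, the algorithm can not only mine over variable time intervals but also perform trend detection effectively.

Another issue that deserves attention in a data stream environment is how to make good use of limited resources, such as memory space and computing power, to produce accurate estimation models. By considering different granularities when tracking online time series, system resources can be devoted to the parts users care about most. For example, temporal granularity means that, as time passes, people are more interested in recent events, so more system resources should be used to explore recent data in detail. In addition, when mining frequent temporal patterns, more resources should be devoted to so-called borderline patterns (patterns whose occurrence frequencies are very close to the threshold), so that the truly frequent patterns can be identified correctly. Accordingly, we develop a wavelet-based algorithm to perform this resource-aware pattern mining, and by allocating memory space dynamically, temporal patterns generated from multiple dynamic data streams can also be discovered correctly.

To track time series collected by sensors or generated by data mining algorithms, we use the energy-preservation property of the wavelet transform to derive theoretical relationships for the L1 and L2 errors. These relationships provide error guarantees on the reconstructed series when less significant wavelet coefficients are discarded to save precious system resources. In addition, to handle infinitely long online data streams, we propose an improved data structure suited to dynamic synopsis maintenance. Experiments show that this resolution-adaptive incremental decomposition method requires very little memory to retain the important features of a time series and obtains near-optimal solutions when performing incremental data updates.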
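The substitution-rule work summarized above derives the supports of negative itemsets from the frequencies of positive itemsets. The abstract does not spell out that derivation. The following is therefore only a minimal Python sketch of one standard identity that makes such a derivation possible (inclusion-exclusion over subsets). It is not the SRM algorithm itself, and the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def count_without(x, y, transactions):
    """Number of transactions containing all of `x` and none of `y`,
    computed only from positive counts via inclusion-exclusion:
    sum over subsets S of y of (-1)^|S| * count(x | S)."""
    y = list(y)
    total = 0
    for k in range(len(y) + 1):
        for subset in combinations(y, k):
            total += (-1) ** k * count(set(x) | set(subset), transactions)
    return total


# Toy market-basket data (hypothetical).
baskets = [
    {"coke", "chips"},
    {"pepsi", "chips"},
    {"coke"},
    {"chips"},
    {"pepsi"},
]

print(count_without({"chips"}, {"coke"}, baskets))           # 2
print(count_without({"chips"}, {"coke", "pepsi"}, baskets))  # 1
```

Every term on the right-hand side is a positive count. A miner that already tracks the needed positive itemsets can therefore evaluate negative patterns such as "bought chips but not coke" from those counts alone. Whether SRM uses exactly this identity is not stated in the abstract.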

English Abstract

In recent years, several query problems and mining capabilities have been explored for the data stream environment. Among the various data mining capabilities, one that has received significant research attention is the mining of frequent patterns over market basket data. In this dissertation, we first explore the model of frequent itemsets in static transaction databases and generalize the relevant concepts to the discovery of temporal relationships in online transaction flows. Then, we investigate resource utilization issues in a data stream environment. Finally, we study the problem of quality guarantees when tracking online data streams.

For the problem of mining frequent itemsets to derive association rules, we first develop a new mining capability, called mining of substitution rules, by extending the concepts of association rule mining. Substitution refers to the choice a customer makes to replace the purchase of some items with that of others. Like the discovery of association rules, the discovery of substitution rules leads to valuable knowledge in various areas, including market prediction, user behavior analysis, and decision support. Specifically, we first derive theoretical properties for the model of substitution rule mining. We then devise a technique that induces the supports of negative itemsets from positive itemset supports, which makes support counting more efficient. In light of these properties, algorithm SRM (Substitution Rule Mining) is designed and implemented to discover substitution rules efficiently while attaining good statistical significance.

To mine frequent temporal patterns on data streams, we devise a regression-based algorithm called algorithm FTP-DS (Frequent Temporal Patterns of Data Streams). Besides providing a general framework for pattern frequency counting, algorithm FTP-DS has two major features: a single data scan for online statistics collection, and a regression-based compact pattern representation.
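Running sums can illustrate the single-scan, regression-based representation described for FTP-DS. A least-squares line over (time, frequency) points needs only a handful of accumulators. Each new observation is therefore absorbed in O(1) time and never revisited. The Python sketch below illustrates that idea under stated assumptions. The class name `OnlineLineFit` and its methods are hypothetical, and the sketch is not the FTP-DS segmentation procedure itself.

```python
from dataclasses import dataclass


@dataclass
class OnlineLineFit:
    """Running sums for fitting y = a + b * t by least squares in one pass."""
    n: int = 0
    st: float = 0.0   # sum of t
    sy: float = 0.0   # sum of y
    stt: float = 0.0  # sum of t^2
    sty: float = 0.0  # sum of t * y
    syy: float = 0.0  # sum of y^2 (needed for the residual error)

    def add(self, t: float, y: float) -> None:
        """Absorb one (time, frequency) observation; the point is not stored."""
        self.n += 1
        self.st += t
        self.sy += y
        self.stt += t * t
        self.sty += t * y
        self.syy += y * y

    def coefficients(self) -> tuple[float, float]:
        """Return (intercept a, slope b) of the least-squares line."""
        denom = self.n * self.stt - self.st ** 2
        if denom == 0:
            raise ValueError("need at least two distinct time points")
        b = (self.n * self.sty - self.st * self.sy) / denom
        a = (self.sy - b * self.st) / self.n
        return a, b

    def sse(self) -> float:
        """Residual sum of squares of the fitted line, from the sums alone."""
        a, b = self.coefficients()
        return self.syy - a * self.sy - b * self.sty

    def merged(self, other: "OnlineLineFit") -> "OnlineLineFit":
        """Combine two segments' statistics in O(1); all sums are additive."""
        return OnlineLineFit(
            self.n + other.n, self.st + other.st, self.sy + other.sy,
            self.stt + other.stt, self.sty + other.sty, self.syy + other.syy,
        )


fit = OnlineLineFit()
for t, y in enumerate([2, 5, 8, 11, 14]):  # a pattern's frequency over 5 windows
    fit.add(t, y)
print(fit.coefficients())  # (2.0, 3.0)
print(fit.sse())           # 0.0
```

The `merged` method shows why such statistics suit segment-based compaction: two adjacent segments can be merged without rescanning, and `sse` reports how much fit quality the merge costs. The abstract does not describe how FTP-DS actually decides when to tune or relax segments.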
To attain the single-scan feature, data segmentation and pattern growth scenarios are explored for frequency counting. Algorithm FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. The second important feature of algorithm FTP-DS is its regression-based compact pattern representation, in which the frequency variation of each pattern is tracked by multi-segment regression. In addition, we develop the techniques of segment tuning and segment relaxation to enhance FTP-DS and keep its memory usage bounded. With these features, algorithm FTP-DS can not only conduct mining with variable time intervals but also perform trend detection effectively.

We also address the fundamental problem of how limited resources, e.g., memory space and computation power, can be well utilized to produce accurate estimates in a data stream environment. Two important features for tracking mined patterns with properly utilized resources are examined. The first is temporal granularity, which refers to the phenomenon that, as time advances, people are more interested in recent events. More resources can therefore be devoted to exploring recent data at finer granularities. Second, when mining frequent temporal patterns, more resources are expected to be allocated to borderline patterns, whose statistics (e.g., occurrence frequencies) are close to the specified threshold, so that frequent itemsets are identified correctly. This feature is called mining with support count granularity. Accordingly, we devise a wavelet-based algorithm called algorithm RAM-DS (Resource-Aware Mining for Data Streams), which performs general pattern mining tasks for data streams by exploiting both temporal and support count granularities. Algorithm RAM-DS is designed not only to reduce the memory required for data storage but also to retain a good approximation of the target time series.
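A simple budgeting rule can illustrate support count granularity. The abstract does not say how RAM-DS divides its memory, so the sketch below is hypothetical. It splits a fixed number of wavelet coefficients among tracked patterns and gives each pattern at least one. It then weights the remainder by how close each pattern's support is to the minimum-support threshold, so borderline patterns are represented most precisely. The function name, the weight `1 / (|s - threshold| + eps)`, and the parameter `eps` are all assumptions.

```python
def allocate_budget(supports, threshold, budget, eps=0.01):
    """Hypothetical rule: split `budget` coefficients among patterns, giving
    borderline patterns (support close to `threshold`) the largest shares.
    Every pattern receives at least one coefficient."""
    n = len(supports)
    if budget < n:
        raise ValueError("budget must cover at least one coefficient per pattern")
    # Weight grows as a pattern's support approaches the threshold.
    weights = [1.0 / (abs(s - threshold) + eps) for s in supports]
    total_w = sum(weights)
    spare = budget - n  # coefficients left after the one-per-pattern minimum
    shares = [spare * w / total_w for w in weights]
    alloc = [1 + int(share) for share in shares]
    # Hand out what rounding down left over, largest fractional part first.
    leftover = budget - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Supports of four tracked patterns against a 5% minimum-support threshold.
print(allocate_budget([0.02, 0.049, 0.051, 0.2], threshold=0.05, budget=20))
# [3, 8, 8, 1]
```

The two patterns straddling the threshold receive most of the budget, and the clearly frequent pattern (0.2) receives the minimum. This matches the abstract's motivation that precise tracking matters most where the frequent/infrequent decision is uncertain.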
In addition, algorithm RAM-DS can support a varying number of data streams by allocating memory space adaptively when tracking patterns generated from online transactions.

To track online time series data that are collected directly from sensors or generated by stream mining algorithms, we exploit the energy preservation property of the wavelet transform. In our framework, theoretical guarantees on the commonly used L1 and L2 error metrics are provided when insignificant coefficients are discarded to save precious resources. In addition, to handle infinite online data flows, we propose an enhanced data structure, the RAID-tree, which is based on the error tree, for dynamic synopsis maintenance over data streams. Specifically, we develop algorithm RAID, which performs incremental decomposition with resolution adaptability. Experimental results show that the memory required to store the significant features of time series data is very small and that the quality of approximation remains stable during incremental data updates.
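The energy-preservation argument can be checked numerically. With an orthonormal Haar transform, the squared L2 error of a reconstruction equals the energy of the discarded coefficients (Parseval's identity). The Cauchy-Schwarz inequality then bounds the L1 error by sqrt(N) times the L2 error for a length-N series. The Python sketch below assumes a power-of-two length and a keep-the-k-largest thresholding rule. It does not reproduce the RAID-tree or the dissertation's exact bounds, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar(x):
    """Orthonormal Haar transform; returns [overall average, coarse -> fine details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = [float(v) for v in x], []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details


def inverse_haar(c):
    """Invert `haar`, rebuilding one resolution level at a time."""
    approx, pos = [c[0]], 1
    while pos < len(c):
        det = c[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, det)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


def keep_largest(c, k):
    """Zero all but the k coefficients with the largest magnitude."""
    keep = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(c)]


series = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar(series)
kept = keep_largest(coeffs, 3)
approx = inverse_haar(kept)

l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(series, approx)))
l1 = sum(abs(a - b) for a, b in zip(series, approx))
dropped = math.sqrt(sum(v * v for v, w in zip(coeffs, kept) if w == 0.0))

print(round(l2, 6), round(dropped, 6))    # 1.732051 1.732051
print(l1 <= math.sqrt(len(series)) * l2)  # True
```

Under an orthonormal basis, keeping the largest-magnitude coefficients minimizes the L2 error for a given coefficient budget. The resolution-adaptive, incremental maintenance that RAID performs on the error tree is beyond this sketch.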

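Topic Category

College of Electrical Engineering and Computer Science > Graduate Institute of Electrical Engineering
Engineering > Electrical Engineering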