A Study on Mining and Summarizing Business Intelligence Rules from Enterprise Databases

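With the growth of e-commerce, enterprises face unprecedented global competition, and using information technology to create competitive advantage has become a new challenge for every enterprise. Researchers and practitioners have therefore worked to mine business intelligence from the data accumulated in enterprise information systems. Most current data mining techniques, however, preprocess the data first: transaction data are exported from the information system, converted into files of a fixed format, and only then mined with a suitable algorithm. Some studies indicate that this preprocessing is the most resource-intensive part of the whole data mining process. This study therefore proposes methods that integrate data mining techniques into the enterprise transaction database, so that the raw transaction data are mined directly. Because the resources spent on data format conversion are greatly reduced, business intelligence can be delivered more efficiently and in real time to help managers make sound decisions. The study comprises two methods. The first, FPN, finds frequent patterns directly in an enterprise's raw transaction tables and then generates association rules efficiently. Its main features are that it accounts for the data-conversion preprocessing within the existing information system and that it introduces a more compact data structure, the FPN-tree, to store and look up frequent-pattern information, together with an efficient algorithm for generating frequent patterns, so that enterprises can obtain useful information more quickly. In addition, the method does not need to rebuild the data structure when the support threshold is adjusted, and it can be extended to find frequent patterns for a particular product. The second method, Char, is motivated by the wide use of relational databases in enterprise information systems and addresses how to summarize characteristic rules from a relational table. It computes a redundancy value (Redundancy) so that a business user only needs to set one intuitive threshold to find the main characteristic rules of the table. Decision makers can then use the discovered rules in various sales analyses to strengthen the enterprise's competitiveness.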

Keywords

data mining; enterprise databases; characteristic rules

Parallel Abstract

As data mining techniques are explored extensively, incorporating discovered knowledge into business leads to superior competitive advantages. Most data mining techniques today are designed to work on transformed data files: the raw data tables must be transformed into specific formats before mining methods can be applied, and previous works have pointed out that such data transformation usually consumes a lot of resources. Therefore, this study proposes new methods that integrate mining algorithms directly with enterprise transaction databases. Two methods are proposed to discover knowledge from the raw data of enterprise systems. The first, named FPN, is developed to mine frequent patterns from transaction tables. Traditionally, data mining techniques have seldom been applied in real time. In many cases, however, decisions have to be made in a short time; for example, decisions on promoting fresh agricultural goods in retail stores must be made daily, within one or two hours. The FPN method has the following advantages in supporting real-time mining in enterprise systems: (i) the raw data of enterprise systems are used directly; (ii) when the threshold is tuned, only newly qualified data are read and the data structure built for the original data is kept intact; (iii) product assortments centered on a particular product can be performed effectively; and (iv) the mining algorithm performs better than popular mining algorithms. The second method, Char, is proposed to find characteristics from database tables. It can be applied to find the characteristics of customer tables, product tables, and so on. In contrast to traditional data generalization or induction methods, Char does not need a concept tree in advance and can generate a manual set of characteristic rules that are precise enough to describe the main characteristics of the data.
The simulation results show that the characteristic rules found by Char are efficient as well as consistent regardless of the number of records and attributes in the dataset.
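
To make points (i) to (iii) concrete, the following is a minimal sketch of counting frequent patterns straight from raw transaction rows. It is a brute-force illustration and not the FPN-tree or the FPN algorithm, whose details the abstract does not give. The table layout `(order_id, product)`, the sample rows, and the function names `count_itemsets` and `frequent` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction table: one row per (order_id, product) line item.
rows = [
    (1, "milk"), (1, "bread"), (1, "eggs"),
    (2, "milk"), (2, "bread"),
    (3, "bread"), (3, "eggs"),
    (4, "milk"), (4, "bread"), (4, "eggs"),
]


def count_itemsets(rows, max_size=3):
    """Group line items by order and count every itemset up to max_size.

    Brute-force enumeration for illustration only; it is exponential in
    basket size and stands in for a compact structure such as a tree.
    """
    baskets = {}
    for order_id, product in rows:
        baskets.setdefault(order_id, set()).add(product)
    counts = Counter()
    for items in baskets.values():
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                counts[combo] += 1
    return counts, len(baskets)


def frequent(counts, n_baskets, min_support, containing=None):
    """Filter the stored counts by a support threshold.

    The counts are reused as-is, so changing min_support needs no recount.
    If `containing` is given, only itemsets with that product are kept.
    """
    return {
        itemset: c / n_baskets
        for itemset, c in counts.items()
        if c / n_baskets >= min_support
        and (containing is None or containing in itemset)
    }


counts, n = count_itemsets(rows)
print(frequent(counts, n, 0.75))                     # stricter threshold
print(frequent(counts, n, 0.5, containing="eggs"))   # patterns around one product
```

In this sketch the counts are computed once and the threshold is applied only at query time, which loosely mirrors the idea of tuning the support threshold without rebuilding the data structure. A real implementation would avoid enumerating all itemsets.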

Parallel Keywords

frequent patterns; data mining; characteristic rules

