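Owing to advances in information technology and the development of network communication technology in recent years, the volume of data accumulated in databases has become very large. Together with the arrival of many new computer-based analysis tools, this has made discovering valuable knowledge in data a systematic and feasible process, and has made data mining one of the most active application areas in the database field. The decision tree is a common data mining technique: with it, large amounts of data can be transformed into information and rules that support decision making.

The data used in this study are taken from the Taiwan Economic Journal Data Bank (TEJ). The study object and scope are the technical indicators and trading volume, from September 2007 to April 2008, of 28 listed and OTC-traded companies in 13 industries among the Morgan constituent stocks. For data preparation, the study applies three normalization methods from the data mining preprocessing stage: Decimal Scaling Normalization, Min-Max Normalization, and Standard Deviation Normalization. The Classification and Regression Tree (CART) is then used to compare the three preprocessing methods, to determine whether different normalization methods lead to different results when CART's classification criteria are used to estimate the magnitude of stock price rises and falls from technical indicators and trading volume, so that the findings can serve as a reference for investment.

The empirical analysis shows the following. For decimal scaling normalization, a one-way ANOVA on the prediction results gives P = 0.475742, indicating no significant difference among the prediction periods. For min-max normalization, the one-way ANOVA gives P = 0.004526, indicating significant differences among the prediction periods. For standard deviation normalization, the one-way ANOVA gives P = 0.012564, also indicating significant differences among the periods; the differences are most pronounced between the second day and the fourth and fifth days.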
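The three normalization methods can be sketched as below, applied to one indicator column at a time. This is a minimal illustration using the textbook definitions, not the study's own code: the function names are hypothetical, standard deviation normalization is assumed to use the population standard deviation, min-max normalization is assumed to target the range [0, 1] by default, and constant columns are assumed to map to a single value.

```python
import statistics
from typing import List, Sequence


def decimal_scaling(values: Sequence[float]) -> List[float]:
    """Divide by 10**j, where j is the smallest integer with max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


def min_max(values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Linearly map the data range [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to new_min (an assumption)
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:  # constant column: every value equals the mean (an assumption)
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical daily trading volumes for a single stock.
volumes = [1200, 3400, 560, 9870]
print(decimal_scaling(volumes))  # j = 4, so every value is divided by 10000
print(min_max(volumes))
print(z_score(volumes))
```

Each function rescales a column independently, so the same CART model can be trained on three differently prepared copies of the data and the resulting predictions compared.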
Thanks to the large databases accumulated over recent years and the arrival of computer-based analysis tools, seeking valuable information in these databases has become a realistic and practicable process, which has made data mining a popular approach in this field. The decision tree is currently a common data mining method, with which large amounts of data are converted into usable rules for decision making. This study takes from the Taiwan Economic Journal Data Bank (TEJ) the technical indicators and trading volume of 28 listed companies in 13 industries among the Morgan constituent stocks, from September 2007 to April 2008, as its object and scope. For data preparation, it applies three normalization methods from data mining preprocessing: Decimal Scaling Normalization (DSN), Min-Max Normalization (MMN), and Standard Deviation Normalization (SDN). Using the Classification and Regression Tree (CART), it then estimates how much share prices rise and fall under different technical indicator and trading volume conditions for each normalization, and draws conclusions that can serve as a reference for stock investment. The empirical results show that a one-way ANOVA on the DSN predictions gives P = 0.475742, indicating no significant difference among the prediction periods; for MMN, P = 0.004526, indicating a significant difference; and for SDN, P = 0.012564, indicating significant differences among the predictions for the different periods, especially between the 2nd day and the 4th and 5th days.
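The one-way ANOVA behind the reported P values compares the variance between groups with the variance within groups. The sketch below computes the F statistic and its degrees of freedom; it assumes, since the abstract does not say, that each group holds one prediction period's results (for example, one group per forecast day). The function name and the sample numbers are hypothetical. The P value is the upper tail of the F distribution with these degrees of freedom; `scipy.stats.f_oneway` returns both the statistic and that P value if SciPy is available.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means: List[float] = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical prediction scores grouped by forecast day (three days here).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, df1, df2 = one_way_anova_f(groups)
print(f"F({df1}, {df2}) = {f:.2f}")  # prints "F(2, 6) = 13.00"
```

A large F (and hence a small P value, as with MMN and SDN above) means the group means differ by more than the within-group noise would explain; an F near 1 or below, as the DSN result suggests, means no detectable difference between periods.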