  • Thesis


Application of multiple logistic regression to select the CHAID classification in terms of comparative analysis

Advisor: 陳雲岫


A Customer Relationship Management (CRM) system is a customer database that holds each customer's basic information and interaction history, together with aggregated customer-group data. Users of the system can apply classification techniques such as data mining and decision trees to find valuable information in the database. Classification tree algorithms play an important role in database analysis, yet most papers take a single dataset and apply several similar classification tree algorithms to compare which one performs best; few examine the classification conditions themselves. The purpose of this study is therefore to propose a benchmark model that can be used, once the database has been classified, to evaluate the subsets within each classification condition.

This study uses the CHAID classification tree algorithm because CHAID can form multiple decision points at once, which is closer to real-world situations. Multiple logistic regression is then used to obtain the expected probability π(x), and π(x) is used to assess which subset under a given classification condition best matches the analyst's expectations.

Finally, we find that using the expected probability after classification reaches the target number of records more quickly than not using it, because the expected probability takes the correlation between two classification conditions into account.
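The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the standard multiple logistic regression form, π(x) = 1 / (1 + exp(-(β0 + β1·x1 + ... + βp·xp))), with two binary indicators x1 and x2 standing in for membership in a subset of two classification conditions. The counts, the names `cells`, `fit_logistic`, and `expected_probability`, and the plain gradient-ascent fit are all hypothetical choices for illustration; the abstract does not give the thesis's actual data, variables, or fitting procedure.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def expected_probability(beta, x):
    """pi(x) for covariate vector x; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Estimate beta by batch gradient ascent on the average log-likelihood."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - expected_probability(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data (not from the thesis):
# (x1, x2) -> (number of responders, number of records in that subset)
cells = {(0, 0): (2, 20), (0, 1): (6, 20), (1, 0): (8, 20), (1, 1): (16, 20)}

# Expand the counts into one row per record.
X, y = [], []
for x, (hits, total) in cells.items():
    X += [list(x)] * total
    y += [1] * hits + [0] * (total - hits)

beta = fit_logistic(X, y)

# Score every subset with pi(x) and rank from most to least promising.
probs = {x: expected_probability(beta, x) for x in cells}
ranking = sorted(probs, key=probs.get, reverse=True)
for x in ranking:
    print(x, round(probs[x], 3))
```

Under these assumptions, an analyst would take records from the subsets in ranked order until the target number of records is reached. Because both conditions enter the same model, each subset's score reflects the two conditions jointly rather than one at a time.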


