Data Mining and Hotspot Detection in an Urban Development Project

Abstract


Modern statistical analysis often involves large amounts of data from many application areas, with diverse data types and complicated data structures. This paper gives a brief survey of certain large-scale applications. It also compares a number of data mining tools on a specific data set with 1.4 million cases, 14 predictors, and a binary response variable. The study focuses on four predictive models: Classification Tree, Neural Network, Stochastic Gradient Boosting, and Multivariate Adaptive Regression Splines. It finds that the variable importance scores generated by different data mining tools vary widely, so users should apply these scores with caution. By contrast, the response surfaces and classification accuracies of most models are relatively similar, yet the financial implications can be profound when the models select the top 10% of cases and when cost and profit are incorporated into the calculation. Finally, Decision Tree, Predictor Importance, and Geographic Information Systems (GIS) are used for Hotspot Detection to raise the profit to 95.5% of its full potential.
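To illustrate why models with similar accuracy can differ financially once only the top-scoring cases are selected, here is a minimal Python sketch. It is not the paper's method: the function names, the per-case cost, the per-hit profit, and the definition of "full potential" (the profit of a perfect ranking at the same selection size) are all assumptions made for illustration.

```python
from typing import Sequence


def profit_at_top_fraction(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.10,
    profit_per_hit: float = 100.0,   # hypothetical revenue per true positive
    cost_per_case: float = 10.0,     # hypothetical cost of acting on one case
) -> float:
    """Profit from acting on the highest-scoring `fraction` of cases."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one case is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Number of cases selected; at least one so the result is defined.
    k = max(1, int(len(scores) * fraction))

    # Rank cases by model score, highest first, and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])

    return hits * profit_per_hit - k * cost_per_case


def share_of_full_potential(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.10,
    profit_per_hit: float = 100.0,
    cost_per_case: float = 10.0,
) -> float:
    """Realized profit divided by the profit of a perfect ranking.

    Assumption: "full potential" means ranking every positive case first,
    which is simulated by using the labels themselves as scores.
    """
    realized = profit_at_top_fraction(
        scores, labels, fraction, profit_per_hit, cost_per_case
    )
    best = profit_at_top_fraction(
        labels, labels, fraction, profit_per_hit, cost_per_case
    )
    if best <= 0:
        raise ValueError("perfect-ranking profit is not positive")
    return realized / best
```

Two models with the same overall accuracy can place different numbers of positives in the selected top fraction, so their values from `profit_at_top_fraction` can differ substantially even when their accuracies do not.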
