

Sampling Method for Population Weighting in Data Mining Techniques


In data mining, being able to process the complete data set is the ideal situation for the analyst. In practice, however, the work is often constrained by budget, by time, or even by the hardware and software themselves. Under such constraints, sampling is the best compromise between completeness and feasibility. Its advantages are lower analysis cost and shorter analysis time: processing a smaller amount of data costs less and takes less time than processing a large volume, which makes sampling especially attractive for commercial activities that depend on efficiency. In addition, a rigorously designed sampling procedure improves the precision of the results and, with the sampling error kept under control, reduces bias in the data itself. In this study we use SAS Macros to build tools for three important problems that arise when sampling from a database: stratified random sampling, integer expansion (weighting) coefficients, and computing the cut points for equal quantile groups.
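The three tasks named in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library, not in SAS, and it is not the authors' macro: the function names `stratified_sample` and `cut_points`, the proportional allocation rule, the rounding of the expansion coefficient, and the customer data are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw a simple random sample of about `fraction` of each stratum.

    Each sampled record gets an integer expansion weight, assumed here
    to be the stratum size N_h divided by the sample size n_h, rounded.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    sample = []
    for members in strata.values():
        n_h = max(1, round(len(members) * fraction))  # at least one per stratum
        weight = round(len(members) / n_h)            # integer expansion coefficient
        for rec in rng.sample(members, n_h):          # sampling without replacement
            sample.append({**rec, "weight": weight})
    return sample


def cut_points(values, groups):
    """Return the groups - 1 cut points that split `values` into equal-sized groups."""
    return statistics.quantiles(values, n=groups, method="inclusive")


# Hypothetical data: 1,000 customers in three regions of unequal size.
customers = (
    [{"region": "north", "spend": 10 + i % 50} for i in range(600)]
    + [{"region": "south", "spend": 20 + i % 80} for i in range(300)]
    + [{"region": "east", "spend": 5 + i % 30} for i in range(100)]
)

sample = stratified_sample(customers, lambda r: r["region"], fraction=0.1)
quartiles = cut_points([r["spend"] for r in sample], groups=4)
```

With a 10% sampling fraction, each stratum contributes one tenth of its records and every sampled record carries a weight of 10, so the weights sum back to the population size. Real designs may use unequal fractions per stratum, in which case the weights differ by stratum and rounding them to integers introduces a small approximation.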


