
基於中國餐廳過程之貝氏探索分群法

Bayesian Exploratory Clustering Based on Chinese Restaurant Process

Advisors: 李嘉晃, 劉建良

Abstract


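As technology advances rapidly, the volume of data has grown explosively, and data clustering has become increasingly important. Data at this scale can no longer be processed by hand, so computers must automate what human effort cannot accomplish, which is faster, cheaper, and less labor-intensive. This thesis proposes a Bayesian non-parametric unsupervised learning method. A Bayesian non-parametric method does not require the number of clusters to be fixed before clustering; instead, the data determine the number of clusters during the clustering process. The thesis also draws on the concept of Chebyshev's inequality from statistics to adjust the parameters of the prior distribution and improve clustering quality. It thus extends the Chinese Restaurant Process (CRP) with a statistically grounded computation that further improves the clustering results. In addition, for adjusting the number of clusters, the thesis proposes a method that differs from the original CRP: when deciding which cluster a data point belongs to, if none of the existing clusters suits the point, the point forms a cluster of its own. Finally, experimental results show that the proposed method outperforms other unsupervised learning methods.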
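For reference, the two textbook results named in the abstract can be written out as follows. This is only a sketch of the standard statements; the abstract does not say how the thesis combines them, and the symbols ($X$, $\mu$, $\sigma$, $t$, $z_n$, $n_k$, $\alpha$) are notation assumed here rather than taken from the thesis.

```latex
% Chebyshev's inequality: for a random variable X with finite mean \mu
% and finite standard deviation \sigma > 0, and any t > 0,
\[
  P\bigl(\lvert X - \mu \rvert \ge t\sigma\bigr) \le \frac{1}{t^{2}} .
\]

% Standard CRP seating rule: the n-th customer joins existing table k,
% which already holds n_k customers, or opens a new table, with
% concentration hyperparameter \alpha > 0.
\[
  P(z_n = k \mid z_{1:n-1}) = \frac{n_k}{n - 1 + \alpha},
  \qquad
  P(z_n = \text{new} \mid z_{1:n-1}) = \frac{\alpha}{n - 1 + \alpha} .
\]
```

In the standard CRP the chance of opening a new cluster depends only on $\alpha$ and $n$; the abstract states that the proposed method instead bases that decision on the existing clusters.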

Parallel Abstract


In the big data era, data exploration is essential to data analytics because it gives analysts insight into the data. Data clustering therefore plays an important role. This thesis proposes a Bayesian non-parametric unsupervised learning method in which the number of clusters does not need to be given before clustering. The proposed method lets the data speak for themselves: the number of clusters is determined automatically from the observed data. We use the concept of Chebyshev's inequality to set the prior parameters, which yields better clustering results. In addition, this work proposes a novel entropy-based way to create a new cluster. The main difference between the proposed method and the Chinese Restaurant Process is that the creation of a new cluster is determined by the existing clusters rather than by a hyperparameter. The experimental results show that the proposed algorithm outperforms other unsupervised learning algorithms.
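As a point of comparison, the following is a minimal sketch of sampling from the standard Chinese Restaurant Process prior, in which a new cluster is opened with a weight given by a hyperparameter. It is not the thesis's method: the entropy-based rule and the Chebyshev-based prior setting are not specified in the abstract and are not reproduced here. The names `crp_assignments`, `alpha`, and `seed` are illustrative assumptions.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample a partition from the standard CRP prior (assumed baseline,
    not the thesis's variant). Requires alpha > 0."""
    rng = random.Random(seed)
    assignments = []  # cluster (table) index chosen for each point
    table_sizes = []  # number of points currently in each cluster
    for _ in range(n_points):
        # Existing cluster k has weight n_k; a new cluster has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new cluster
        else:
            table_sizes[k] += 1    # join an existing cluster
        assignments.append(k)
    return assignments, table_sizes


if __name__ == "__main__":
    labels, sizes = crp_assignments(n_points=50, alpha=1.0)
    print("number of clusters:", len(sizes))
    print("cluster sizes:", sizes)
```

Larger values of `alpha` tend to produce more clusters. That dependence on a hyperparameter is what the abstract says the proposed method replaces with a decision based on the existing clusters.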
