A Density Estimation Algorithm Based on Generalized Gaussian Components and Its Applications

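Among common classification algorithms, logic-based classifiers mainly give users interpretable classification models, whereas kernel-based classifiers mainly pursue accurate predictions. In this thesis, we propose a density estimation algorithm based on generalized Gaussian components (the generalized Gaussian density estimation algorithm, G2DE). It builds a classification model from a small number of generalized, unconstrained Gaussian components in order to narrow the gap in prediction performance and interpretability between logic-based and kernel-based classifiers. The most notable advantage of a G2DE classification model is that, by analyzing the eigenvectors and eigenvalues of the covariance matrices of its Gaussian components, users can see the characteristics and overall layout of the data distribution, which supports deeper data analysis. We also show how the learning procedure of a G2DE classification model can be parameterized as a parameter optimization problem, and we solve for the optimal model with an efficient rank-based adaptive mutation evolutionary method. Experiments on synthetic and real-world test data show that a G2DE model using only a few Gaussian components as kernel functions predicts more accurately than conventional logic-based classifiers and the expectation-maximization (EM) based classifier. The results also show that the G2DE model does help users better understand the characteristics and overall layout of the existing data distribution. Finally, we use G2DE to build a two-stage G2DE framework and apply it to microRNA identification. Comparison experiments with several common classification algorithms show that, for microRNA identification, the proposed method achieves both 1) accurate prediction and 2) a valuable, interpretable classification model.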
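The abstract does not give the model equations, so the following is only a minimal sketch of the general idea, assuming each class is represented by a weighted sum of full-covariance (unconstrained) Gaussian components and a point is assigned to the class with the largest mixture density. It also shows how eigen-decomposing a component's covariance matrix exposes the directions and magnitudes of spread that the abstract says users can inspect. The function names, the dictionary layout of `model`, and the example parameters are hypothetical, and parameter fitting is not shown.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a full-covariance (unconstrained) Gaussian at each row of x, shape (n, d)."""
    d = mean.shape[0]
    diff = x - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every row of x from the component mean
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def class_density(x, components):
    """Weighted sum of Gaussian components; weights are assumed to include the class prior."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in components)

def predict(x, model):
    """Assign each point to the class whose mixture density is largest."""
    labels = list(model)
    scores = np.column_stack([class_density(x, model[k]) for k in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]

def principal_axes(cov):
    """Eigenvalues (variances) in descending order, with matching eigenvectors as columns."""
    vals, vecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues for symmetric matrices
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Hypothetical two-class model: one elongated, tilted component for "A", one round for "B"
model = {
    "A": [(1.0, np.array([0.0, 0.0]), np.array([[3.0, 1.0], [1.0, 1.0]]))],
    "B": [(1.0, np.array([4.0, 4.0]), np.eye(2))],
}
points = np.array([[0.0, 0.0], [4.0, 4.0], [0.5, -0.5]])
print(predict(points, model))  # → ['A' 'B' 'A']

vals, vecs = principal_axes(model["A"][0][2])
print(vals)        # variances along the principal axes, largest first
print(vecs[:, 0])  # direction of greatest spread for class "A"
```

The largest eigenvalue and its eigenvector give the direction in which a component's data are most spread out, which is the kind of summary a small number of unconstrained components can offer and a large number of narrow kernels cannot.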

Keywords

Machine Learning; Gaussian Mixture Model; Classification Algorithm; Evolutionary Computation; Density Estimation Algorithm; MicroRNA

Parallel Abstract

In this thesis, aiming to close the gap between the interpretability of logic-based classifiers and the prediction accuracy of kernel-based classifiers, we propose a generalized Gaussian Density Estimation algorithm (G2DE) that carries out density estimation with a mixture model composed of a limited number of generalized Gaussian components. One of the most distinctive features of a classifier constructed with the proposed approach is that users can easily obtain an overall picture of the data distribution by examining the eigenvectors and eigenvalues of the covariance matrices associated with the generalized Gaussian components. The learning process of the proposed method is parameterized and modeled as a large-scale parameter optimization problem, which is solved by an efficient rank-based adaptive mutation evolutionary approach. Experiments on standard benchmarks and synthesized data show that the proposed G2DE, with just a few kernels, outperforms conventional logic-based classifiers and the EM (Expectation Maximization) based classifier in prediction accuracy. Furthermore, the proposed classifier has the major advantage of giving users an overall picture of the underlying distributions. An application of the proposed method to distinguishing microRNA precursors from pseudo hairpins is also demonstrated. Using a two-stage G2DE framework, we show that the proposed method provides i) prediction accuracy comparable to several well-known classifiers and ii) valuable information for interpreting the data.
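The abstract names a rank-based adaptive mutation evolutionary approach but does not specify its operators. As a stand-in only, the sketch below is a textbook (μ, λ) evolution strategy: offspring are ranked by fitness and the μ best survive (truncation selection on ranks), and each individual carries per-coordinate step sizes that are mutated log-normally (self-adaptation). It minimizes a toy sphere function rather than a G2DE training objective. The names `evolve` and `sphere` and all default settings are hypothetical and are not the thesis's method.

```python
import numpy as np

def evolve(objective, dim, mu=5, lam=30, generations=200, sigma0=1.0, seed=0):
    """Minimize `objective` with a (mu, lambda) evolution strategy.

    Offspring are ranked by fitness and the mu best survive (rank-based truncation
    selection); each individual carries its own per-coordinate mutation step sizes,
    which are themselves mutated log-normally (self-adaptation).
    """
    rng = np.random.default_rng(seed)
    tau_global = 1.0 / np.sqrt(2.0 * dim)            # learning rate shared by all coordinates
    tau_local = 1.0 / np.sqrt(2.0 * np.sqrt(dim))    # per-coordinate learning rate
    x = rng.normal(scale=3.0, size=(mu, dim))        # initial parent positions
    s = np.full((mu, dim), sigma0)                   # initial step sizes
    best_x, best_f = None, np.inf
    for _ in range(generations):
        idx = rng.integers(0, mu, size=lam)          # pick a random parent for each offspring
        # Mutate the step sizes first, then use them to mutate the positions
        child_s = s[idx] * np.exp(tau_global * rng.normal(size=(lam, 1))
                                  + tau_local * rng.normal(size=(lam, dim)))
        child_x = x[idx] + child_s * rng.normal(size=(lam, dim))
        fitness = np.array([objective(c) for c in child_x])
        ranked = np.argsort(fitness)[:mu]            # rank by fitness, keep the mu best offspring
        x, s = child_x[ranked], child_s[ranked]
        if fitness[ranked[0]] < best_f:
            best_f = fitness[ranked[0]]
            best_x = child_x[ranked[0]].copy()
    return best_x, best_f

def sphere(v):
    """Toy objective standing in for a real G2DE training loss."""
    return float(np.sum(v ** 2))

best_x, best_f = evolve(sphere, dim=5)
print(best_f)
```

Self-adaptive step sizes matter for a large parameter vector such as a set of means and covariances: each coordinate can shrink or grow its own mutation scale without a hand-tuned schedule.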

Parallel Keywords

Machine Learning; Gaussian Mixture Model; Classification Algorithm; Evolutionary Computing; Kernel Density Estimation Algorithm; MicroRNA

