Measuring Verifiability using Monte Carlo Methods for Linear Separators

Advisor: Tian-Li Yu (于天立)

Abstract

Verifiability, the probability that an unknown instance is misclassified in the worst case, offers a novel perspective on learning. However, no effective method yet exists for measuring the verifiability of linear separators. Although linear separators are relatively simple hypotheses, measuring their verifiability remains challenging because of the curse of dimensionality. I therefore propose several Monte Carlo methods that aim to measure verifiability accurately and efficiently. The methods are designed for different situations so that they can meet different users' needs. Most specific and general boundaries measurement measures verifiability in low-dimensional spaces by finding the boundaries of the version space. Synthesized convex hull measurement measures verifiability accurately in time polynomial in the dimensionality and the number of labeled instances. Estimated measurement by locality learning estimates verifiability efficiently even when both the dimensionality and the number of labeled instances are large. Based on the experimental results, I also give a decision-making process that helps users select the most suitable method, which increases the applicability of the methods in this thesis. Experimental results on several test cases show that the methods obtain the verifiability accurately and quickly in high-dimensional spaces, indicating that they can overcome the curse of dimensionality and measure verifiability well.
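The abstract does not spell out how the Monte Carlo estimators work. As one illustrative reading only, assume that the worst-case error probability is the probability mass of the region where hypotheses consistent with the labeled data (the version space) disagree, with instances drawn uniformly from [-1, 1]^d. The naive sketch below, which is not one of the thesis's methods, rejection-samples random linear separators, keeps those consistent with every labeled instance, and then estimates the fraction of random instances on which the kept separators disagree. All names (`predict`, `sample_version_space`, `estimate_disagreement`) and the uniform-instance assumption are hypothetical. Because it sees only a finite sample of the version space, it tends to slightly underestimate the disagreement region.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: a linear separator is (w, b) and
# predicts +1 when w . x + b >= 0, otherwise -1.
Hypothesis = Tuple[List[float], float]
Labeled = Tuple[Sequence[float], int]


def predict(h: Hypothesis, x: Sequence[float]) -> int:
    w, b = h
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def random_hypothesis(d: int, rng: random.Random) -> Hypothesis:
    # Gaussian (w, b) gives a uniformly random direction in R^(d+1);
    # the scale is irrelevant because only the sign of w . x + b matters.
    return [rng.gauss(0.0, 1.0) for _ in range(d)], rng.gauss(0.0, 1.0)


def sample_version_space(data: Sequence[Labeled], d: int, n_tries: int,
                         rng: random.Random) -> List[Hypothesis]:
    # Rejection sampling: keep only separators that label every instance correctly.
    kept = []
    for _ in range(n_tries):
        h = random_hypothesis(d, rng)
        if all(predict(h, x) == y for x, y in data):
            kept.append(h)
    return kept


def estimate_disagreement(data: Sequence[Labeled], d: int, n_hyp_tries: int = 2000,
                          n_points: int = 2000, seed: int = 0) -> Tuple[float, int]:
    """Fraction of uniform points in [-1, 1]^d on which the sampled consistent
    separators disagree, plus the number of separators kept (assumed measure)."""
    rng = random.Random(seed)
    version_space = sample_version_space(data, d, n_hyp_tries, rng)
    if len(version_space) < 2:
        raise ValueError("too few consistent separators sampled; raise n_hyp_tries")
    disagree = 0
    for _ in range(n_points):
        x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        first = predict(version_space[0], x)
        # The point counts as unverifiable if any consistent separator labels it differently.
        if any(predict(h, x) != first for h in version_space):
            disagree += 1
    return disagree / n_points, len(version_space)


if __name__ == "__main__":
    data = [([-0.5], -1), ([0.5], 1)]
    est, kept = estimate_disagreement(data, d=1)
    print(f"kept {kept} separators, estimated disagreement mass {est:.3f}")
```

With the 1-D data in the usage example, the consistent separators are thresholds between -0.5 and 0.5, so the estimate should come out close to 0.5. In higher dimensions the fraction of random separators that survive the consistency check drops quickly as labeled instances are added, so this naive sampler becomes impractical; that is the kind of difficulty the curse of dimensionality poses for the problem described in the abstract.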


延伸閱讀