
基於隨機子集之穩健監督式降維法

A Robust Random Subspace-based Supervised Dimension Reduction Method

Advisor: 洪弘

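Abstract


Many methods for precise cancer prediction have been developed in recent years. Because the data are mostly high-dimensional, the dimension must be reduced before analysis. A local and global preserving dimensionality reduction method based on random subspaces uses random subspaces to build a semi-supervised model. However, it relies on a manually set parameter to add the supervised and unsupervised information together directly, and then constructs a Laplacian matrix that represents the relationships among data points. Under this manual setting there is no criterion for choosing the parameter; it can only be set from experience with each dataset, so the parameter setting strongly affects accuracy across different kinds of data. In this study, to address the problem that the parameter cannot be fixed, we propose a robust random subspace-based supervised dimension reduction method, RRS-SDR. It uses γ-logistic regression to estimate directly the probability that each data point belongs to a given class, then computes the probability that two data points belong to the same class and substitutes it into the Laplacian matrix. This replaces the semi-supervised learning algorithm that requires a mixing-proportion parameter. In addition, RRS-SDR gives better classification performance on datasets with mislabeled samples.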

Parallel Abstract


Many methods for precise cancer classification have been developed in recent years. Because the data are high-dimensional, dimensionality reduction is an essential preprocessing step. The local and global preserving semi-supervised dimensionality reduction based on random subspace algorithm (RSLGSSDR) uses random subspaces for semi-supervised dimensionality reduction. It uses a tuning parameter to combine the supervised and unsupervised information when constructing the Laplacian matrix, which encodes the relationships between data points. However, there is no principle for selecting this tuning parameter, and the characteristics of datasets can be diverse, so its value strongly influences classification accuracy. In this thesis, to resolve the instability caused by the tuning parameter, we develop the Robust Random Subspace-based Supervised Dimension Reduction method (RRS-SDR). We use γ-logistic regression to estimate the label probabilities and then calculate the probability that two data points belong to the same class. By substituting this probability into the Laplacian matrix, we replace the semi-supervised learning step with our new method. We show that RRS-SDR has superior classification performance on datasets with mislabeled samples.
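The step of turning class probabilities into a Laplacian matrix can be illustrated with a minimal sketch. It is not the thesis implementation: the γ-logistic regression fit is not shown, the probabilities below are made-up placeholders, and the pairwise formula (a sum over classes of the product of the two points' class probabilities, which treats their labels as independent) is an assumption about how the same-class probability might be computed. The function names are hypothetical.

```python
import numpy as np


def same_class_affinity(class_probs):
    """Pairwise same-class probability W[i, j] = sum_k p_i(k) * p_j(k).

    Assumes the labels of points i and j are independent given their
    features; this formula is an illustrative assumption.
    """
    P = np.asarray(class_probs, dtype=float)
    W = P @ P.T
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W


# Placeholder class probabilities for four points and two classes,
# standing in for the output of a fitted probabilistic classifier.
probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])

W = same_class_affinity(probs)
L = graph_laplacian(W)
```

Because the weights come from estimated probabilities, no mixing parameter between supervised and unsupervised terms appears in this construction; how the resulting Laplacian is used within each random subspace is left to the thesis itself.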

