With the rapid development of mass spectrometry, proteomics has attracted growing attention in biomedicine for the diagnosis and detection of cancer. However, the classification of mass-spectrometry data is often hampered by very high dimensionality and by noise, so the preprocessing of proteomic data is a central part of this thesis. The cancers studied are hepatocellular carcinoma and ovarian cancer. The raw data for both were produced by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The hepatocellular carcinoma spectra were first analyzed with Ciphergen ProteinChip Software. They were then processed with a feature-vector generation method proposed in this thesis, which effectively addresses the problems of high dimensionality and noise. The raw ovarian cancer spectra are considerably more complex. A second goal of this thesis is therefore to reduce the dimensionality of these spectra and extract meaningful peak features, using methods such as peak detection and spectral alignment. Finally, the selected peak features are classified with a neural network, and the recognition rate exceeds 90% for both datasets. Future work could combine other feature-selection methods to reduce the dimensionality further and possibly improve classification accuracy.
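The sketch below is a rough illustration of the peak detection and spectral alignment steps mentioned above. It turns raw spectra into fixed-length peak feature vectors that a classifier could consume. It is not the method used in the thesis. The following are all hypothetical choices made only for this example:

- the median-based noise estimate;
- the signal-to-noise factor `snr` and the window size `half_window`;
- the relative m/z tolerance `tol`;
- the function names `detect_peaks`, `align_peaks` and `to_feature_vector`.

```python
from statistics import median
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def detect_peaks(mz: Sequence[float], intensity: Sequence[float],
                 snr: float = 3.0, half_window: int = 2) -> List[Peak]:
    """Keep local maxima whose intensity exceeds snr times a crude noise level.

    Assumption: the noise level is the median intensity of the spectrum.
    """
    noise = median(intensity)
    n = len(intensity)
    peaks: List[Peak] = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        is_local_max = intensity[i] == max(intensity[lo:hi])
        # Require a strict rise from the left neighbour so a flat top yields one peak.
        rises = i == 0 or intensity[i] > intensity[i - 1]
        if is_local_max and rises and intensity[i] > snr * noise:
            peaks.append((mz[i], intensity[i]))
    return peaks


def align_peaks(spectra_peaks: List[List[Peak]], tol: float = 0.003) -> List[float]:
    """Group peak m/z values from all spectra into shared reference positions.

    Peaks are sorted by m/z. A new group starts whenever a peak lies more than
    `tol` (relative) above the first peak of the current group. Each group's
    reference position is the mean m/z of its members.
    """
    all_mz = sorted(m for peaks in spectra_peaks for m, _ in peaks)
    groups: List[List[float]] = []
    for m in all_mz:
        if groups and m <= groups[-1][0] * (1 + tol):
            groups[-1].append(m)
        else:
            groups.append([m])
    return [sum(g) / len(g) for g in groups]


def to_feature_vector(peaks: List[Peak], reference: List[float],
                      tol: float = 0.003) -> List[float]:
    """For each reference m/z, take the strongest peak within `tol`; use 0.0 if none."""
    vector: List[float] = []
    for ref in reference:
        hits = [inten for m, inten in peaks if abs(m - ref) <= tol * ref]
        vector.append(max(hits) if hits else 0.0)
    return vector


# Usage: two toy spectra whose shared peak is shifted by 0.5 m/z units.
mz_a = [1000.0 + i for i in range(10)]
mz_b = [1000.5 + i for i in range(10)]
int_a = [1, 1, 9, 1, 1, 1, 1, 7, 1, 1]
int_b = [1, 1, 8, 1, 1, 1, 1, 1, 1, 1]
peaks_a = detect_peaks(mz_a, int_a)
peaks_b = detect_peaks(mz_b, int_b)
reference = align_peaks([peaks_a, peaks_b])
features = [to_feature_vector(p, reference) for p in (peaks_a, peaks_b)]
print(reference, features)
```

After alignment, every spectrum is described by the same short list of reference peaks instead of thousands of raw m/z points. This reduces the dimensionality, and vectors of that kind can then be passed to a neural-network classifier. In practice the noise model and the tolerance would need tuning to the instrument's mass accuracy.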