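Proteins are essential building blocks of living organisms. Different proteins shape what a cell does and how it behaves, and, more importantly, a protein must appear in the right place at the right time to function. The factor that determines where a protein acts is the signal sequence at its front end, but this signal sequence is short and is difficult to identify through comparative analysis of gene sequences. Previous studies indicate that when gene sequences are similar, the distributions of the corresponding proteins are also similar. This study therefore proposes that classifying protein distribution images, that is, coarsely grouping proteins with similar sequence structure by their distribution images and then performing sequence alignment, can be a more efficient way to find the target sequence. The study focuses on images of endoplasmic reticulum (ER) proteins: it uses digital image processing to obtain ER structural features and builds a classification system based on ER gene type, so that researchers can obtain a coarse grouping by distribution pattern before analyzing protein sequences. The system uses the original image, the skeleton image, and the brighter-region image of ER proteins to extract a total of 23 ER image features, covering texture features, features of the peripheral network, and features of the ribosome-studded sheet structures. After all features are obtained, SDA is used to find the best feature combination, and an SVM is used to build the classification model and the unknown group. Across all training sets, the selected features span texture features, network skeleton features, and bright-region morphological features, which indicates that the features extracted by the system are meaningful for ER classification. The system currently reaches an accuracy of 93.4% on the known training images, with 21.4% of the images assigned to the unknown group, and 86.8% on the test images, with 26.5% assigned to the unknown group. Compared with the results obtained without the unknown group, accuracy is about 7% higher, and because the unknown group stays within 30%, the system can substantially reduce the time that downstream researchers need for analysis.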
Proteins are essential for maintaining the normal function of living organisms. A cell's functions differ with the proteins it contains, and a protein works only when it is in the right place at the right time. A signal peptide at the front of a protein determines its location. Because the signal peptide sequence is very short and hard to find by sequence analysis alone, analyzing the protein distribution in images is a good way to perform a coarse classification of protein functions. The structure of the endoplasmic reticulum (ER) is closely related to the cell's function, and the signal peptide might be found by analyzing the ER distribution. This research presents a system that combines digital image processing, feature extraction, and classification model building. Texture features are first extracted from the original image; image processing then produces a skeletonized image and a brighter-area image, from which features of the network structure and the ribosome-studded sheet structure are extracted. There are 23 features in total; one feature is not used by any training set, and all the others are meaningful for ER classification. The results show that the features acquired by this system are useful for ER classification. The best classification accuracy is 93.4% for the training set, with 21.4% of the images in the unknown group, and 86.8% for the testing set, with 26.5% of the images in the unknown group. The accuracy with the unknown group is about 7% higher than the accuracy without it.
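The trade-off between accuracy and the size of the unknown group can be illustrated with a minimal sketch. The abstract does not say how images are assigned to the unknown group, so the score threshold used here is an assumption. The function names and toy scores are hypothetical, and the sketch assumes accuracy is computed over the images that were not sent to the unknown group.

```python
def classify_with_unknown(scores, labels, threshold):
    """Pick the top-scoring class per image, or "unknown" if its score is below threshold.

    The thresholding rule is an assumption made for illustration only.
    """
    predictions = []
    for row in scores:
        best = max(range(len(row)), key=lambda k: row[k])
        predictions.append(labels[best] if row[best] >= threshold else "unknown")
    return predictions


def evaluate(predictions, truth):
    """Return (accuracy over decided images, fraction of images left as unknown)."""
    decided = [(p, t) for p, t in zip(predictions, truth) if p != "unknown"]
    unknown_fraction = (len(predictions) - len(decided)) / len(predictions)
    accuracy = sum(p == t for p, t in decided) / len(decided) if decided else 0.0
    return accuracy, unknown_fraction


# Hypothetical per-class scores for five images and two classes.
scores = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
labels = ["A", "B"]
truth = ["A", "B", "B", "A", "A"]

without_unknown = evaluate(classify_with_unknown(scores, labels, 0.0), truth)
with_unknown = evaluate(classify_with_unknown(scores, labels, 0.65), truth)
print("without unknown group:", without_unknown)
print("with unknown group:   ", with_unknown)
```

In this toy data, the two low-confidence images are the misclassified ones, so moving them to the unknown group raises accuracy on the remaining images at the cost of leaving some images for manual analysis.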