Proteins are important molecules in living organisms and take part in many biological reactions. When a protein misfolds or fails to fold, biological mechanisms cannot function normally, which leads to disease. Identifying the factors that control protein conformation would therefore make targeted treatment possible. Many studies have shown that the location and conformation of a protein are controlled by a signal sequence (signal peptide) at the front of the protein, and that protein synthesis is associated with the endoplasmic reticulum (ER). The ER produces, transports, and screens proteins, and its morphology varies with the proteins it handles. This study therefore hypothesizes that ER with similar morphology has similar protein control sequences, and that a model that groups ER images by morphology can help biologists decipher those sequences.

This study extracts 23 features describing intensity, texture, skeleton, and bright spots from ER microscopy images and uses stepwise discriminant analysis (SDA) to select the significant features. Hierarchical cluster analysis is then used to establish the basis for grouping, and a decision tree derives classification rules from that basis. At present, the model assigns 14 groups to two major clusters, each with an accuracy above 70%. The classification rules were validated with ten-fold cross-validation and leave-one-out cross-validation, and four groups of ER images from a different set were used as test data. One of the four test groups, which does not belong to either of the two clusters, was misclassified; the other three test groups and both validation methods reached accuracies above 80%. The framework thus provides a coarse classification of ER images that can help researchers compare sequences more quickly.
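The validation step described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes synthetic two-feature data in place of the 23 image features, substitutes a one-level decision tree (a decision stump) for the full decision tree, and all function names are hypothetical. It only shows how ten-fold and leave-one-out cross-validation estimate the accuracy of a rule that separates two clusters.

```python
import random


def make_synthetic(n_per_cluster=20, seed=1):
    """Assumed toy data: two clusters separated on feature 0, noise on feature 1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((0, 0.2), (1, 0.8)):
        for _ in range(n_per_cluster):
            xs.append([rng.gauss(center, 0.05), rng.random()])
            ys.append(label)
    return xs, ys


def fit_stump(xs, ys):
    """Fit a one-level decision tree: (feature, threshold, label for x <= threshold)."""
    best = None
    for f in range(len(xs[0])):
        values = sorted(set(x[f] for x in xs))
        # Candidate thresholds are midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            for left in (0, 1):
                errors = sum(
                    (left if x[f] <= t else 1 - left) != y for x, y in zip(xs, ys)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left)
    _, f, t, left = best
    return f, t, left


def predict(stump, x):
    f, t, left = stump
    return left if x[f] <= t else 1 - left


def cross_validate(xs, ys, k, seed=0):
    """k-fold cross-validation accuracy; k == len(xs) gives leave-one-out."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        stump = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(stump, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


xs, ys = make_synthetic()
ten_fold_acc = cross_validate(xs, ys, k=10)
loo_acc = cross_validate(xs, ys, k=len(xs))
print(f"10-fold accuracy: {ten_fold_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

In both schemes every sample is held out exactly once, so the reported accuracy reflects only predictions made on data the rule was not fitted to. Leave-one-out is simply the case where the number of folds equals the number of samples.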