
Designing Optimization Methods to Predict Protein Functions and Recognize Neuron Images

Advisor: Shinn-Ying Ho (何信瑩)

Abstract


The massive growth of protein sequence and neuron image datasets has created a need for computational methods to predict and analyse their biological functions. Machine-learning-based classifiers are commonly proposed for predicting protein functions and recognizing neuron images. At present, a good predictor of protein functions should offer both efficient prediction and knowledge discovery. Meanwhile, identifying informative features for recognizing neuron images is difficult because of the large number of available image features. This dissertation develops optimization methodologies, based on an intelligent genetic algorithm (IGA), for both predicting protein functions and recognizing neuron images. The scoring card method (SCM) is a simple and highly interpretable method for predicting and analysing protein functions. The SCM calculates dipeptide propensity scores for a protein function of interest from the difference in dipeptide composition between positive and negative sequences. The propensity scores of the 400 dipeptides are then optimized by the IGA to improve prediction accuracy while conserving the original characteristics of the amino acid composition. A score for each sequence is derived from these propensity scores and used to predict the sequence's protein function. Two SCM-based methods, SCMSOL and SCMCRYS, are proposed for predicting and analysing protein solubility and crystallizability. Their test accuracies are 84.3% and 76.1%, respectively, which are comparable to those of support-vector-machine-based methods using the same dipeptide composition features. Moreover, biological knowledge discovery and mutagenesis analysis for soluble and crystallizable proteins, based on the propensity scores, are illustrated. The procedure for developing SCM-based methods can also be applied to design other protein function predictors with high prediction performance and highly interpretable results.
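The scoring step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the rescaling of scores to the range 0 to 1000, and the toy sequences are assumptions made for illustration, and the IGA optimization of the scores is omitted entirely.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
VALID = set(DIPEPTIDES)


def dipeptide_composition(seq):
    """Fraction of each dipeptide among the overlapping pairs of seq."""
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in VALID)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DIPEPTIDES}


def mean_composition(seqs):
    """Average dipeptide composition over a set of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}


def initial_propensity_scores(positive, negative):
    """Initial scores from the positive-minus-negative composition difference.

    Rescaling to [0, 1000] is an assumption of this sketch.
    """
    pos, neg = mean_composition(positive), mean_composition(negative)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = hi - lo
    return {d: (1000.0 * (diff[d] - lo) / span if span else 500.0)
            for d in DIPEPTIDES}


def sequence_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the dissertation.
    positive = ["AAAA", "AAAC"]
    negative = ["CCCC", "CCCA"]
    scores = initial_propensity_scores(positive, negative)
    for s in positive + negative:
        print(s, round(sequence_score(s, scores), 1))
```

A sequence would then be assigned to the positive class when its score exceeds a threshold chosen on training data; in the SCM as described, the IGA would refine the 400 initial scores before this step.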
This dissertation also presents an automated neuron image feature identification system (Auto-NIFI), a user-friendly tool that automatically extracts and identifies a small set of informative neuron image features using an inheritable bi-objective combinatorial genetic algorithm (IBCGA). The feature selection in Auto-NIFI allows biologists to construct a suitable classifier for a particular neuron image classification problem. To identify neuron image features, Auto-NIFI provides a comprehensive set of image feature extraction modules together with the IBCGA feature selection modules. Notably, owing to the large collection of image feature extraction modules available in the tool, the system can also be applied to a wide variety of biological image classification problems. Two methods, HCS-Neurons and DescNeuro, are proposed for neuron image classification. In the HCS-Neurons method, the usefulness of Auto-NIFI is demonstrated by identifying phenotypic changes in multi-neuron images in response to drug treatments in high-content screening. The three identified morphological features achieved an independent test accuracy of 90.28% in classifying neurons into six classes corresponding to six different nocodazole concentrations. Using Auto-NIFI, DescNeuro can recognize a neuron in the 3D Drosophila neuron database from a 2D image, with promising recognition results.


