A Study of Data Mining for Bio-chip Data Analysis

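Research on and applications of biotechnology and bio-chips have flourished over the past decade. The most notable results include breakthroughs in bio-chip gene microarray technology and in gene cloning, to which advances in information technology have contributed substantially. However, many of the resulting data processing and analysis problems remain to be solved, in particular the fact that bio-chip data have many variables but few samples. This study develops data mining methods and models tailored to the characteristics of bio-chip data in order to explore and identify relationships between diseases and specific genes and to construct the corresponding rules. It then designs a cluster analysis algorithm for bio-chip data to build genome-wide relationships and extract valuable information as decision support for medical diagnosis. The approach is validated on breast cancer chip data from Stanford University's microarray database: from more than 40,000 genes and 64 samples, Significance Analysis of Microarrays (SAM) and decision tree analysis are used to identify influential genes and diagnostic decision rules, and a complete genome-wide cluster analysis result is provided for the breast cancer gene data.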

Keywords

Bio-chip; Data Mining; Decision Tree Analysis; Cluster Analysis; Gene Microarray Technology; Significance Analysis

Parallel Abstract

Owing to a growing number of breakthroughs in microarray bio-chip and gene cloning technologies, biotechnology is now an emerging and promising industry worldwide. Although advances in information technology enable the complex computation and large-scale data storage that biotechnology requires, a number of critical issues still need to be addressed in both practice and research. This study develops a data mining framework that incorporates a proposed cluster analysis algorithm for analyzing large bio-chip datasets, which differ from the data typically handled in manufacturing and service industries. Bio-chip data are high-dimensional, with far more attributes than specimens, so feature selection and extraction are critical for removing noisy features and reducing dimensionality in microarray analysis. In particular, genes that differ between normal and abnormal individuals are extracted into decision rules to clarify the relationships between genes and diseases, and the within-group and between-group relationships among genes also need to be established. We use a cDNA microarray dataset from breast cancer patients to validate the proposed approach. We first extract significant genes from more than 44,000 genes, then use a decision tree to derive classification rules, and finally use the proposed algorithm to build cluster relationships, presented as a table, to support medical diagnosis. The results show the practical viability of the framework.

Parallel Keywords

Bio-chip; Data Mining; Decision Tree Technology; Cluster Analysis; Microarray Technology; Significance Analysis of Microarrays
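
The abstract describes a two-stage pipeline: rank genes by a significance statistic, then derive decision rules from the selected genes. The following is a minimal sketch of that idea, not the authors' implementation. It assumes synthetic data (64 samples to echo the abstract, but only 200 genes, with gene index 7 made informative by construction), uses a SAM-style relative-difference score without the permutation-based false discovery rate estimation of the full method, and substitutes a depth-1 decision stump for a full decision tree. All function names, the fudge factor `s0`, and the toy data are hypothetical.

```python
import math
import random


def sam_like_score(a, b, s0=0.1):
    """SAM-style relative difference: mean gap over (pooled standard error + s0)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s = math.sqrt((1 / na + 1 / nb) * ss / (na + nb - 2))
    return (ma - mb) / (s + s0)  # s0 (assumed value) damps genes with tiny variance


def select_genes(X, y, k):
    """Return the indices of the k genes with the largest absolute score."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sam_like_score(pos, neg)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def fit_stump(X, y, genes):
    """Depth-1 decision tree: the (gene, threshold) rule with fewest training errors."""
    best = None
    for g in genes:
        values = sorted({row[g] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between neighbours
            for high_label in (0, 1):
                errors = sum(
                    (high_label if row[g] > t else 1 - high_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, g, t, high_label)
    return best  # (training errors, gene index, threshold, label when above threshold)


def make_toy_data(n_samples=64, n_genes=200, informative=7, seed=0):
    """Synthetic expression matrix: one gene is shifted upward in class 1."""
    rng = random.Random(seed)
    y = [1 if i < n_samples // 2 else 0 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
    for row, label in zip(X, y):
        if label == 1:
            row[informative] += 4.0
    return X, y


X, y = make_toy_data()
top_genes = select_genes(X, y, k=10)
errors, gene, threshold, high_label = fit_stump(X, y, top_genes)
print("selected genes:", top_genes)
print(f"rule: if gene[{gene}] > {threshold:.2f} then class {high_label} "
      f"else class {1 - high_label} ({errors} training errors)")
```

Filtering genes before rule induction keeps the search over split points small when attributes vastly outnumber specimens, which is the setting the abstract emphasizes. On real microarray data the selection step would need a multiple-testing correction, and the rules should be validated on held-out samples.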
