
由基因表現之時間序列資料探勘基因群集之調節關係

Mining Regulation Relationships between Gene Clusters by Using Time-Series Gene Expression Data

Advisor: 李瑞庭

Abstract


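Analyzing time-series gene expression data can reveal regulation relationships among genes and gene clusters. However, existing methods such as the event method and the q-cluster method have limitations. The event method can only find regulation relationships between gene pairs and cannot provide detailed information about those relationships; the q-cluster method is constrained by a limit on the length of regulation patterns. We therefore propose an efficient data mining method that finds all significant regulation patterns without any restriction on pattern length. Using the information in the patterns found, we can further analyze the regulation relationships among genes and gene clusters.

First, we transform the microarray data matrix into a gene change-tendency matrix. Then we group genes that share the same regulation pattern over some consecutive time span and record their detailed information, including gene identifiers and times of occurrence. With this detailed information, a level-wise combination lets us extend the length of the regulation patterns and find all significant ones. Finally, we analyze the characteristics of these gene clusters and identify the regulation relationships among them.

To evaluate the proposed method, we conduct two experiments. First, we use synthetic data to evaluate its efficiency and scalability. Next, we use Gene Ontology and 439 regulatory gene pairs confirmed by biologists to evaluate its effectiveness. The results show that the proposed method is not only efficient and scalable but can also effectively find the regulation relationships among gene clusters.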

Parallel Abstract


Analyzing time-series gene expression data provides a great opportunity to discover regulation relationships among genes and gene clusters. However, existing methods of mining gene regulation relationships, such as the event method and the q-cluster method, have their own limitations. The event method can only identify relationships between gene pairs and provides no detailed time-lagged information, while the q-cluster method, limited by its pattern length, can only find localized patterns. Therefore, in this thesis, we propose an approach that efficiently mines all frequent regulation patterns without any limit on pattern length and discovers the regulation relationships among gene clusters together with detailed time-lagged information. We first transform the raw data into a tendency matrix. Next, we group together genes sharing the same expression tendency over certain consecutive time points and obtain their patterns and detailed information. Then, we extend the patterns obtained into longer patterns by a level-wise combination approach. Finally, we analyze the characteristics of the gene clusters and infer the regulation relationships among them. The experimental results demonstrate that the proposed method is efficient and scalable. Moreover, we use Gene Ontology and 439 regulation relationships confirmed by biologists to evaluate the effectiveness of the proposed method. The experimental results show that the proposed method can reliably find those regulation relationships among gene clusters.
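The first three steps described above (tendency matrix, grouping by shared pattern, level-wise extension) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The symbols `U`/`D`/`N`, the `threshold` parameter, the function names, the toy expression values, and the definition of support as the number of distinct genes showing a pattern are all assumptions made for illustration. The final step, inferring regulation relationships among clusters, is not sketched here.

```python
from collections import defaultdict


def tendency_matrix(expr, threshold=0.0):
    """Encode each gene's change between consecutive time points.

    'U' = up, 'D' = down, 'N' = no change (assumed encoding).
    """
    rows = {}
    for gene, values in expr.items():
        symbols = []
        for prev, curr in zip(values, values[1:]):
            diff = curr - prev
            if diff > threshold:
                symbols.append("U")
            elif diff < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        rows[gene] = "".join(symbols)
    return rows


def seed_patterns(tend, length=2):
    """Group genes by short pattern, recording (gene, start index) per occurrence."""
    occ = defaultdict(set)
    for gene, s in tend.items():
        for start in range(len(s) - length + 1):
            occ[s[start:start + length]].add((gene, start))
    return occ


def support(occurrences):
    """Assumed support measure: number of distinct genes showing the pattern."""
    return len({gene for gene, _ in occurrences})


def mine_patterns(tend, min_support=2, seed_length=2):
    """Level-wise extension: join two length-k patterns that overlap in k-1
    symbols and occur in the same gene at consecutive start positions."""
    level = {p: o for p, o in seed_patterns(tend, seed_length).items()
             if support(o) >= min_support}
    frequent = {}
    while level:
        frequent.update(level)
        candidates = defaultdict(set)
        for p, occ_p in level.items():
            for q, occ_q in level.items():
                if p[1:] != q[:-1]:
                    continue
                for gene, start in occ_p:
                    if (gene, start + 1) in occ_q:
                        candidates[p + q[-1]].add((gene, start))
        level = {p: o for p, o in candidates.items()
                 if support(o) >= min_support}
    return frequent


# Toy data (hypothetical): g1 and g2 rise then fall, g3 does the opposite.
expr = {
    "g1": [1.0, 2.0, 3.0, 2.0, 1.0],
    "g2": [0.5, 1.5, 2.5, 1.0, 0.2],
    "g3": [3.0, 2.0, 1.0, 2.0, 3.0],
}
tend = tendency_matrix(expr)
frequent = mine_patterns(tend, min_support=2)
for pattern in sorted(frequent, key=len):
    print(pattern, sorted(frequent[pattern]))
```

Keeping the `(gene, start)` pairs with every pattern is what lets a longer pattern be built by joining occurrences rather than rescanning the data, and the start positions are the kind of timing detail that a later time-lagged comparison between clusters would need.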

