  • Degree thesis

A Study of a Bayesian Gene Network Reconstruction Algorithm for Short Time Series

Development of Genetic Network Reconstruction Algorithm for Small Sample Size Based on Bayesian Networks

Advisor: 陳中明

Abstract


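As research on human genes continues, more and more genes have been found to be directly related to disease, and exploring gene function has become a major focus of biotechnology research. Genes do not act alone; they affect an organism through a series of complex interaction mechanisms. Because of the complexity of the human body, the relationships between most diseases and genes are not easy to clarify. The purpose of reconstructing gene networks is to analyze the mechanisms by which genes regulate one another and thereby understand in detail how genes affect the organism.

Because microarray chips are costly, biological experiments usually cannot provide large numbers of consecutive observations for reconstructing gene networks. To develop an algorithm that can reconstruct gene networks effectively from a small number of observations, this thesis adopts Bayesian networks as the basic framework and proposes the Divide and Conquer VB (DCVB) algorithm, built on Variational Bayesian (VB) inference. VB performs well when reconstructing gene networks from long time series, but its effectiveness drops markedly on short time series. During reconstruction, DCVB computes the associations among only a few genes at a time: it analyzes the interactions of part of the genes by building small subnetworks, and then reconstructs the complete gene regulatory network from the results of multiple subnetwork analyses. The complete network can be decomposed into small subnetworks without affecting the final result because VB is able to handle latent factors, so DCVB does not overestimate the interactions among the genes in a subnetwork. DCVB can be broadly divided into single-level DCVB and hierarchical DCVB; the former decomposes the gene network into small subnetworks for reconstruction, whereas the latter integrates multiple runs of single-level DCVB to obtain the final result. Because DCVB does not estimate the parameters of the complete network as a whole, but instead reduces the number of genes participating in the interactions and hence the number of model parameters, estimating the model parameters becomes easier. It can therefore be applied effectively to network reconstruction from short time series, and this property is even more pronounced when building large networks.

This thesis compares DCVB with VB on simulated data and on p53R2 experimental data. The simulated data are time series of different lengths generated from three virtual gene regulatory networks. The simulation results show that the proposed method does reconstruct gene networks better than VB on short time series, and that the improvement is even more pronounced for long time series and for networks with more genes. For the p53R2 experimental data, four groups of genes with significant expression after clustering were extracted and reconstructed with both DCVB and VB. The literature has not yet confirmed that DCVB can effectively find network links that VB fails to find; only the simulated data support the judgment that DCVB outperforms VB at every time-series length.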

Keywords

Gene network; Bayesian network

Parallel Abstract


With the continual progress of human genome research, more and more genes have been found to be closely related to human diseases. Accordingly, exploring gene function has become one of the major foci of biotechnology research. It is well known that a gene does not work alone; instead, a biological process may involve an enormous number of complicated interactions among genes. Because of the complexity of physiological and biochemical processes in the human body, the relations between genes and most diseases are currently unclear. The ultimate goal of gene network reconstruction is therefore to analyze the regulatory mechanisms among genes and to understand how genes are involved in biological processes. Limited by the high cost of microarrays, most biological experiments cannot offer a large number of observations for gene network reconstruction. To overcome this limitation, this study proposes a new gene network reconstruction algorithm, the Divide-and-conquer Variational Bayesian (DCVB) algorithm. Although the VB algorithm, on which DCVB is built, has been shown to be effective for long time-course data, its performance on short time-course data is far from satisfactory. The DCVB algorithm decomposes a large gene network into multiple small subnets. By treating the genes not included in a subnet as latent factors, it can estimate the gene-gene interactions of each subnet independently, thanks to the ability of the VB algorithm to incorporate latent factors. Two classes of DCVB algorithms are evaluated: single-level and hierarchical DCVB. The former decomposes the entire network into small subnets of a fixed size for reconstruction, whereas the latter integrates the results of multiple levels, each with a different network size, to form the final reconstructed network.
Because DCVB does not estimate all gene-gene interactions of the entire network at once, the number of parameters to be estimated is greatly reduced compared with the conventional VB algorithm. It therefore promises better performance than the VB algorithm when reconstructing a large network from short time-course data. The DCVB and VB algorithms are compared on simulated time-course data and on p53R2 experimental data. For the simulated data, three gene networks are simulated with time-course data of various lengths. According to the simulation results, the proposed DCVB outperforms VB for both short and long time-course data; in particular, DCVB is substantially superior to VB for large networks and long time-course data. For the p53R2 data, further experiments are required to validate the networks reconstructed by DCVB and by VB. In summary, DCVB is shown to be better than VB only on the simulated data, and further validation is required to compare the two algorithms on real data.
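
The divide-and-conquer idea described in the abstract can be illustrated with a rough sketch. This is not the thesis implementation. As an assumption, the VB estimator is replaced here by a plain lag-1 least-squares regression, which does not model latent factors, so only the decomposition and aggregation steps are illustrated. The function names, the `subnet_size` and `threshold` parameters, the exhaustive enumeration of subnets, and the averaging rule are all hypothetical choices made for the illustration.

```python
from itertools import combinations

import numpy as np


def estimate_subnet(data):
    """Stand-in for the per-subnet VB step (assumption: lag-1 least squares).

    data has shape (T, k): T time points for the k genes of one subnet.
    Returns W of shape (k, k), where W[i, j] is the estimated influence
    of gene j at time t on gene i at time t + 1.
    """
    past, future = data[:-1], data[1:]
    # Solve past @ X ~= future in the least-squares sense; X[j, i] maps j -> i.
    coeffs, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coeffs.T


def single_level_dc(data, subnet_size=3, threshold=0.3):
    """Estimate every subnet of a fixed size and average the edge scores."""
    n_genes = data.shape[1]
    score_sum = np.zeros((n_genes, n_genes))
    counts = np.zeros((n_genes, n_genes))
    for genes in combinations(range(n_genes), subnet_size):
        idx = np.array(genes)
        weights = np.abs(estimate_subnet(data[:, idx]))
        # Accumulate the subnet scores at the positions of the full network.
        score_sum[np.ix_(idx, idx)] += weights
        counts[np.ix_(idx, idx)] += 1
    mean_score = score_sum / np.maximum(counts, 1)
    return mean_score > threshold, mean_score


def hierarchical_dc(data, subnet_sizes=(2, 3), threshold=0.3):
    """Combine several single-level runs, one per subnet size, by averaging."""
    scores = [single_level_dc(data, k, threshold)[1] for k in subnet_sizes]
    mean_score = np.mean(scores, axis=0)
    return mean_score > threshold, mean_score
```

Both functions take a `(T, n_genes)` array of expression values and return a boolean adjacency matrix together with the averaged edge scores. Each regression involves only `subnet_size` genes, so it fits `subnet_size ** 2` coefficients instead of `n_genes ** 2`, which is the parameter reduction the abstract refers to.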


