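In the post-genomic era, analyzing the large volumes of data produced by high-throughput technologies has become one of the most closely watched issues across bioinformatics. Many computational methods have been proposed that exploit the processing power of modern computers to predict genetic regulatory networks or transcriptional regulatory networks, and researchers hope that such predicted networks can be used to develop drugs and new disease therapies. Consequently, many computational methods for analyzing regulatory networks have been published and widely applied in recent years. Building on the experience and results of previous studies, this thesis proposes three computational methods for predicting genetic or transcriptional regulatory networks, with the aim of producing results that are highly accurate and meaningful for biomedical applications. The first method, PARE, uses microarray data within a pattern-recognition framework to predict time-lagged regulatory relationships between genes. PARE includes a nonlinear index score that extracts three features from each pair of gene-expression profiles in the microarray data: a first-order relationship, a second-order relationship, and the enclosed area. Through training, PARE learns the expression patterns of gene pairs with known regulatory relationships and uses what it has learned to predict unknown regulatory relationships between other gene pairs. The second method is a nonlinear curve-fitting approach with two main components: a robust correlation-coefficient estimator and a nonlinear regression model. Through nonlinear curve fitting, it models and mines genetic regulatory relationships in an unsupervised manner. Using yeast gene-expression data, this thesis verifies the method's effectiveness in predicting a general regulatory network. In addition, human gene-expression data are used to predict disease regulatory pathways, revealing several disease-related regulatory relationships that merit further investigation. The third method, AdaFuzzy, integrates gene sequence data, microarray data, and chromatin immunoprecipitation (ChIP) data to predict transcriptional regulatory networks. AdaFuzzy introduces a robust position weight matrix for finding conserved, shared segments in the binding sequences of each transcription factor, and it can classify the predicted promoter segments into four promoter architectures. Using yeast data, this thesis demonstrates the applicability of AdaFuzzy to regulatory network prediction.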
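The three PARE features described above can be illustrated with a small sketch. The abstract does not give the exact form of PARE's nonlinear index score, so the following rests on labeled assumptions: the hypothetical function `pair_features` shifts the target profile by `lag` time points, z-scores both aligned profiles, and then uses the Pearson correlation of first differences as the first-order feature, the correlation of second differences as the second-order feature, and the trapezoidal area between the aligned curves as the enclosed-area feature. The training and classification stage that PARE builds on top of such features is omitted.

```python
import numpy as np


def _zscore(v: np.ndarray) -> np.ndarray:
    """Standardize a profile to zero mean and unit variance (assumed normalization)."""
    sd = v.std()
    return (v - v.mean()) / sd if sd > 0 else v - v.mean()


def _corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def pair_features(x, y, lag: int, dt: float = 1.0):
    """Hypothetical PARE-style features for regulator x and target y,
    where y is assumed to respond `lag` time points after x.

    Returns (first_order, second_order, enclosed_area)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag < 0 or len(x) != len(y) or len(x) - lag < 4:
        raise ValueError("need equal-length series with at least lag + 4 time points")
    # Align x(t) with y(t + lag); slicing with len(x) - lag also handles lag == 0.
    xs = _zscore(x[: len(x) - lag])
    ys = _zscore(y[lag:])
    first_order = _corr(np.diff(xs, n=1), np.diff(ys, n=1))   # agreement of slopes
    second_order = _corr(np.diff(xs, n=2), np.diff(ys, n=2))  # agreement of curvature
    gap = np.abs(xs - ys)
    enclosed_area = float(np.sum((gap[:-1] + gap[1:]) / 2.0) * dt)  # trapezoidal rule
    return first_order, second_order, enclosed_area
```

For a target that is an exact copy of the regulator delayed by two time points, `pair_features(x, y, lag=2)` returns correlations of 1 (up to floating-point rounding) and an enclosed area of 0; other lags give weaker correlations and a larger area, which is what would let a classifier separate gene pairs by time lag.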
In the post-genomic era, the analysis of high-throughput data has become a critical requirement in many laboratories. Many computational approaches have been developed to identify genetic or transcriptional interactions that might be used to prevent or disable unwanted cellular states, such as those associated with oncogenesis or other diseases. Inferring genetic and transcriptional interactions from high-throughput data is therefore an essential issue in post-genomic research. In this study, we developed three computational models that extract nonlinear relationships between genes and construct transcriptional and genetic regulatory networks with higher accuracy and greater biological significance. The first method, a pattern-recognition approach called PARE, infers time-lagged genetic interactions from time-course microarray data. A nonlinear score extracts three characteristics of each pair of gene-expression curves (the first derivative, the second derivative, and the enclosed area) to approximate the nonlinear association and dynamics between the curves. This score is then used to identify subclasses of gene pairs with different time lags. Finally, PARE integrates MGED and existing knowledge via machine learning and predicts the remaining genetic interactions in each subclass. The second method consists of two components, a robust correlation estimator and a nonlinear recurrent model, and simulates the underlying nonlinear regulatory mechanisms in biological organisms without any prior knowledge. We applied it to infer the regulatory mechanisms of a general network in Saccharomyces cerevisiae and pulmonary disease pathways in Homo sapiens, with interesting outcomes. The third method, AdaFuzzy, is a fuzzy-logic approach that integrates DNA sequence, microarray, and ChIP-chip data to infer transcriptional interactions (TIs).
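The two components of the second method (a robust correlation estimator and a nonlinear recurrent model) can be sketched as follows, under explicit assumptions. The abstract does not name the robust estimator, so Spearman rank correlation is swapped in as a familiar robust alternative to Pearson correlation. The recurrent model is assumed to take the common sigmoid form dx_i/dt = sigmoid(sum_j W_ij x_j + b_i) - lambda_i x_i, integrated with a simple Euler step; the names `spearman` and `simulate` are hypothetical. Fitting `W`, `bias`, and `decay` to expression data, which is what would turn the simulator into an inference method, is not shown.

```python
import numpy as np


def spearman(a, b) -> float:
    """Spearman rank correlation, used here as a stand-in robust estimator.
    Ranks come from a double argsort, so tied values are NOT averaged."""
    ra = np.argsort(np.argsort(np.asarray(a, dtype=float))).astype(float)
    rb = np.argsort(np.argsort(np.asarray(b, dtype=float))).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def simulate(W, bias, decay, x0, steps: int, dt: float = 0.1) -> np.ndarray:
    """Euler integration of an assumed sigmoid recurrent network model:

        dx_i/dt = sigmoid(sum_j W[i, j] * x_j + bias_i) - decay_i * x_i

    Returns the trajectory as an array of shape (steps + 1, n_genes)."""
    W = np.asarray(W, dtype=float)
    bias = np.asarray(bias, dtype=float)
    decay = np.asarray(decay, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for _ in range(steps):
        drive = 1.0 / (1.0 + np.exp(-(W @ x + bias)))  # regulatory input squashed into (0, 1)
        x = x + dt * (drive - decay * x)               # production minus first-order decay
        traj.append(x.copy())
    return np.array(traj)
```

With a single outlier, for example `a = [1, 2, 3, 4, 100]` against `b = [1, 2, 3, 4, 5]`, `spearman(a, b)` stays at 1 while the Pearson coefficient drops to about 0.72. For the simulator, choosing `dt * decay < 1` and starting values in `[0, 1 / decay]` keeps every simulated expression level inside that range.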
A robust position weight matrix and a feature vector are proposed in AdaFuzzy to search for consensus sequence motifs. AdaFuzzy can also classify every predicted TI into one or more of four promoter architectures. Validation of the prediction results suggests that AdaFuzzy can be applied to uncover TIs in yeast.
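As background for AdaFuzzy's robust position weight matrix, the sketch below shows a standard log-odds PWM built from aligned binding sites with pseudocounts and scanned across a promoter sequence. The "robust" weighting that AdaFuzzy adds is not described in the abstract and is not reproduced here; the functions `build_pwm` and `scan`, the uniform background, and the pseudocount of 0.5 are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix from equal-length aligned
    binding sites. Returns one {base: weight} dict per motif position."""
    if not sites:
        raise ValueError("need at least one binding site")
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width or set(s) - set(BASES) for s in sites):
        raise ValueError("sites must be equal-length strings over A, C, G, T")
    bg = background or {b: 0.25 for b in BASES}  # uniform background (assumption)
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        # Smoothed base frequency divided by background frequency, in bits.
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / bg[b])
                    for b in BASES})
    return pwm


def scan(seq, pwm):
    """Score every window of `seq`; return (best_score, best_start), keeping the
    earliest start on ties. Windows containing non-ACGT characters are skipped."""
    width = len(pwm)
    seq = seq.upper()
    best_score, best_start = -math.inf, -1
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set(BASES):
            continue
        score = sum(col[base] for col, base in zip(pwm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

For example, building the matrix from illustrative sites resembling the yeast Gcn4 consensus TGA(C/G)TCA, such as `["TGACTCA", "TGACTCA", "TGAGTCA"]`, and scanning `"AAAATGACTCAAAA"` places the best-scoring window at position 4 with a positive log-odds score.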