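Because the Human Genome Project has provided the complete human genome sequence, research that uses genomic sequence information to analyze gene function is now under way around the world. Sequence data make it possible to study gene function. A gene's structure can be divided into several parts, and of these, the most important for gene expression is the promoter. This thesis proposes a method for predicting and analyzing promoter regions in uncharacterized sequences. The method is based on position weight matrices of regulatory sequences: it tallies how the binding strength of regulatory factors correlates with their position in the sequence, adds the G+C content and the number of CpG dinucleotides, and uses linear regression to derive three linear analysis functions for predicting where human promoters lie within a gene sequence. The method can also analyze the region around a promoter, such as the location of the core promoter and of regulatory protein binding sites. Finally, the thesis compares its predictions with those of other methods and confirms that the proposed method is indeed better at predicting promoter locations and regions that contain the transcription start site.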
As a result of the Human Genome Project, the complete human genome sequence is now available. DNA and protein sequence information can be used to analyze gene function. In general, a gene's structure can be divided into several parts; we are interested in the promoter. This paper presents a method for predicting the location of promoters in a DNA sequence. Based on weight matrices of regulatory protein binding sites and a linear regression formula, we predict the position of promoters in human DNA sequences and analyze the region around the promoter, such as the position of the core promoter, the regulatory protein binding sites, and the transcription mechanism model. We hope this method can shorten the preparation time for biological experiments and give biologists more information about promoters when they prepare those experiments.
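
The features named in the abstract (a weight-matrix score for a binding site, G+C content, and CpG dinucleotide count) and their combination in a linear function can be sketched as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the matrix `TOY_PWM`, the function names, and any weights passed to `linear_score` are hypothetical placeholders, and in practice the weights would be fitted by linear regression on known promoter sequences.

```python
# Hypothetical 4-bp, TATA-like log-odds matrix used only for illustration;
# it is NOT one of the matrices used in the thesis.
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},
]


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides, i.e. a C immediately followed by a G."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def best_pwm_hit(seq, pwm):
    """Slide the matrix along seq; return (best score, 0-based start position).

    Bases missing from a column (for example N) contribute 0.0.
    Returns (-inf, -1) when seq is shorter than the matrix.
    """
    seq = seq.upper()
    width = len(pwm)
    best_score, best_pos = float("-inf"), -1
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = sum(col.get(base, 0.0) for col, base in zip(pwm, window))
        if score > best_score:
            best_score, best_pos = score, i
    return best_score, best_pos


def window_features(seq, pwm=TOY_PWM):
    """Feature vector for one sequence window: [best PWM score, G+C, CpG count]."""
    score, _pos = best_pwm_hit(seq, pwm)
    return [score, gc_content(seq), float(cpg_count(seq))]


def linear_score(features, weights, bias):
    """Linear function bias + sum(w_i * x_i); weights are assumed to be pre-fitted."""
    return bias + sum(w * x for w, x in zip(weights, features))
```

A window could then be scored with `linear_score(window_features(window), weights, bias)`, where `weights` and `bias` stand in for regression coefficients that this sketch does not estimate.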