Gene expression is an essential process in cell differentiation and cell function, and cellular diversity arises because different cells express different genes. The promoter of a gene transcribed by RNA polymerase II is a DNA region immediately 5' upstream of the transcription start site (TSS). It contains many regulatory sequences (regulatory elements) that are recognized and bound by various transcription factors, and this binding regulates expression of the gene. Identifying the TSS is therefore pivotal to defining the promoter sequence, and we regard it as the most reliable route to promoter finding. Most current methods locate promoter regions by prediction from known sequence patterns, but because promoter patterns are not yet fully characterized, the results are often unsatisfactory. Public promoter databases built from experimental data, such as EPD (Eukaryotic Promoter Database, http://www.epd.isb-sib.ch/), contain more authentic promoter sequences, but they hold far fewer entries than the estimated number of promoters and are updated slowly. In this study, we used NCBI BLAST and miBLAST, a tool developed as an extension of it, to align human ESTs (Expressed Sequence Tags) against human mRNA and thereby construct full-length cDNA (FL-cDNA). At the same time, we used the GenBank records of the matching ESTs to build a gene expression profile database. We then aligned each FL-cDNA to the human genomic sequence with BLAST to determine the TSS of the corresponding gene (mRNA), and collected the −1000 to +50 nt region relative to the TSS as the promoter candidate region. Finally, we built a relational database linking FL-cDNA sequences, promoter sequences, and EST-derived expression profiles, and designed a preliminary web search interface through which users can query the expression profile and promoter sequence of a given human mRNA. The web service is available at http://140.128.139.59:50124/vpbs/index.php.
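The extraction of the −1000 to +50 window around a TSS can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names, the 0-based coordinate convention, the strand argument, and the clipping at sequence ends are all assumptions made for illustration. It also assumes the usual numbering with no position 0, so that +1 is the TSS itself and the full window is 1050 nt long.

```python
# Illustrative sketch only; names and conventions here are assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_region(genome, tss, strand, upstream=1000, downstream=50):
    """Return the promoter candidate region, read 5' to 3' on the gene's strand.

    genome     -- forward-strand sequence of one chromosome or contig
    tss        -- 0-based forward-strand index of the first transcribed base (+1)
    strand     -- "+" or "-", the strand the gene is transcribed from
    upstream   -- number of bases taken before the TSS (positions -upstream..-1)
    downstream -- number of bases taken from the TSS onward (positions +1..+downstream)

    The window is clipped if it runs past either end of the sequence.
    """
    if strand == "+":
        start = max(0, tss - upstream)
        end = min(len(genome), tss + downstream)
        return genome[start:end]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        start = max(0, tss - downstream + 1)
        end = min(len(genome), tss + upstream + 1)
        return reverse_complement(genome[start:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"
    # Small window so the result is easy to check by eye.
    print(promoter_region(toy, tss=8, strand="+", upstream=4, downstream=2))
    print(promoter_region(toy, tss=7, strand="-", upstream=4, downstream=2))
```

In a pipeline like the one described, `tss` and `strand` would come from the 5'-most aligned genomic position of each FL-cDNA in the BLAST result; how that position is parsed depends on the BLAST output format and is left out here.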