Archaea mostly live in extreme environments and have highly distinctive biological characteristics. In this study, we divide archaea into two classes, thermophilic and non-thermophilic (mesophilic), according to the growth-temperature information provided by NCBI. Our objective is to mine archaeal genomes for features that occur in only one of the two classes, and to offer these features to biologists as a reference for further research on thermophilic or mesophilic archaea. Our approach has three steps: (1) constructing CDS groups, (2) searching for unique groups, and (3) investigating the biological characteristics of the unique groups. First, we collect the protein-coding sequences (CDS) from the archaeal genomes and use BLAST to analyze their sequence similarity, placing similar CDS in the same group. Second, we apply the set-difference operation from set theory to find the CDS groups that exist in only one class, and we treat such a group as representative if it also appears in the majority of the archaea in that class. Finally, we look up the biological characteristics of these unique groups on well-known biological web sites. In our experiment, we downloaded all CDS of 50 archaeal strains (29 thermophilic and 21 non-thermophilic) from NCBI, built a CDS sequence database (100,542 sequences in total), and found 12 unique CDS groups.
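The grouping and set-difference steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes that BLAST has already been run and reduced to a list of similar CDS pairs, that groups are formed by single linkage (any chain of similar pairs joins one group), and that "majority" means more than half of the strains in a class. The function names `group_cds` and `unique_groups`, the parameter `min_fraction`, and the input layout are all hypothetical.

```python
from collections import defaultdict


def group_cds(cds_ids, similar_pairs):
    """Cluster CDS ids so that CDS linked by a similarity hit share a group.

    Assumption: single-linkage grouping via union-find; the source does not
    state how BLAST hits were turned into groups.
    """
    parent = {c: c for c in cds_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for c in cds_ids:
        groups[find(c)].add(c)
    return list(groups.values())


def unique_groups(groups, cds_to_strain, strain_class, min_fraction=0.5):
    """Return (class_label, group) pairs for groups found in only one class
    and in more than min_fraction of that class's strains.

    Assumption: min_fraction=0.5 stands in for "the majority"; the actual
    threshold is not given in the source.
    """
    class_strains = defaultdict(set)
    for strain, label in strain_class.items():
        class_strains[label].add(strain)

    # For each class, record the indices of the groups that occur in it.
    groups_in_class = defaultdict(set)
    for i, group in enumerate(groups):
        for cds in group:
            groups_in_class[strain_class[cds_to_strain[cds]]].add(i)

    result = []
    for label, indices in groups_in_class.items():
        elsewhere = set().union(
            *(v for k, v in groups_in_class.items() if k != label)
        )
        # Set difference: groups seen in this class and in no other class.
        for i in sorted(indices - elsewhere):
            strains = {cds_to_strain[cds] for cds in groups[i]}
            if len(strains) / len(class_strains[label]) > min_fraction:
                result.append((label, groups[i]))
    return result
```

Here `cds_to_strain` maps each CDS identifier to its strain and `strain_class` maps each strain to a class label such as `"thermophilic"`. A group shared by both classes is removed by the set difference, and a group confined to one class is still discarded if it occurs in too few strains of that class.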