
Statistical Analysis of the Frequencies of Nucleotide Sequence Combinations on Human Chromosome 22

Statistical Analysis of the Genomic Sequence of Human Chromosome 22

Abstract


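Objective: Diseases known to be associated with abnormalities of chromosome 22 include DiGeorge syndrome, neurofibromatosis type 2, and chronic myeloid leukemia. This study performs a statistical analysis of the full sequence of chromosome 22, roughly 50 million bases, using a program we wrote ourselves to find the frequency of occurrence of every possible sequence combination. Materials and Methods: Using our own program, we analyzed approximately 34 million base pairs of chromosome 22 and determined the frequency of every possible sequence combination. The program can run the analysis for any number of bases; as a first example, this paper uses 8 bases and analyzes all sequence combinations by composition (A, G, T, C), giving statistical frequencies for 65,536 combinations in total. Results: A and T each account for about 26% of the total sequence, and G and C each account for about 24%. The 50 combinations that repeat most often and the 50 that repeat least often account for about 1.873% and 0.00066% of the total sequence, respectively. Conclusion: This approach not only quantifies base combinations but also rapidly computes the proportion of occurrences of every sequence combination, giving a precise statistical analysis of their frequencies. It helps biologists find meaning in very large base sequences and quickly search for possible SNPs, which is of considerable help in identifying the features of disease-causing genes.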
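The counting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program, which the abstract does not show. The function names `count_kmers` and `kmer_frequencies` are hypothetical, and two choices are assumptions made here: windows overlap (the window slides one base at a time), and any window containing a character other than A, G, T, or C is skipped.

```python
from collections import Counter
from itertools import product

BASES = "AGTC"  # the four bases, in the order the abstract lists them


def count_kmers(sequence: str, k: int = 8) -> Counter:
    """Count every overlapping window of length k made only of A, G, T, C."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with N or other ambiguity codes are ignored.
        if all(base in BASES for base in window):
            counts[window] += 1
    return counts


def kmer_frequencies(counts: Counter, k: int = 8) -> dict:
    """Relative frequency of all 4**k combinations, including unseen ones."""
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(BASES, repeat=k)
    }
```

With four bases and a length of 8 there are 4^8 = 65,536 possible combinations, which matches the figure given in the abstract. Changing `k` covers the "any number of bases" case, at the cost of a table that grows as 4^k.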

Keywords

Chromosome; sequence combination; statistical analysis

Parallel Abstract


Objective: Human chromosome 22 is approximately 50,000,000 base pairs in length and is made up of many combinations of the four bases A, T, G and C. It has been associated with a number of genetic diseases, including DiGeorge syndrome, neurofibromatosis and chronic myeloid leukemia (CML), and is thus of interest to biologists. The purpose of our research is to identify patterns of interest in this sequence, measure their frequency statistically, and use the results to help biologists generate hypotheses. Materials and Methods: A total of 65,536 different sequence patterns were identified in human chromosome 22 and listed by an efficient program that we developed. The program can identify patterns of any length in base pairs, as chosen by the user. In this study it was applied to eight-base-pair combinations and used to count their occurrences on human chromosome 22. Results: A and T each occurred at a rate of approximately 26% in the chromosome, and G and C each at approximately 24%. The fifty most frequent combinations together occurred at a rate of 1.873%, and the fifty least frequent at a rate of 0.00066%. Conclusion: All possible eight-base-pair sequences were quantified, and the computation was rapid. Biologists can therefore analyze every possible pattern in a chromosome and investigate the possible functions of these sequences as part of the analysis of the massive amounts of biological data now available from genome sequencing.
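The two summary statistics in the Results can be sketched as follows. This is an illustration only: the helper names `base_composition` and `extreme_share` are hypothetical, and it assumes that the reported percentages are the combined share of all counted occurrences held by the 50 most and 50 least frequent patterns, which the abstract does not spell out.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Fraction of A, G, T and C among the unambiguous bases of a sequence."""
    counts = Counter(b for b in sequence.upper() if b in "AGTC")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "AGTC"}


def extreme_share(pattern_counts: dict, n: int = 50) -> tuple:
    """Share of all occurrences held by the n most and n least frequent patterns.

    pattern_counts maps each pattern to its occurrence count; n must be >= 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(pattern_counts.values())
    if total == 0:
        return 0.0, 0.0
    ordered = sorted(pattern_counts.values(), reverse=True)
    top = sum(ordered[:n]) / total      # n most frequent patterns
    bottom = sum(ordered[-n:]) / total  # n least frequent patterns
    return top, bottom
```

For scale, if all 65,536 patterns were equally frequent, any 50 of them would cover 50 / 65,536, or about 0.076%, of the occurrences, so the reported 1.873% and 0.00066% both lie far from a uniform distribution.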
