A Distance-Constrained Co-occurrence Analysis System for Protein Motifs

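Protein motifs represent regions that are highly conserved within protein families and are generally regarded as critical regions required for protein stability and function. ProMotif provides association rules for protein motif co-occurrence, which it obtains with the Apriori data mining algorithm, but it generates too many rules. In addition, ProMotif does not let users select protein species according to their own needs in order to view the motif co-occurrence association rules for a particular species. To reduce the number of rules while retaining those that are useful and concise, we draw on a biological property: when two consecutive motifs lie close together, their functional relationship is more evident and they are more likely to fall in the same group when the protein folds. We therefore propose a method for mining rules under constraints and build a system for mining these rules. Users can visit our website to obtain the protein motif co-occurrence relationships they want, selecting protein species and entering a minimum support, maximum distance, and minimum positive rate according to their needs.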

Keywords

protein; association rules; distance constraint

Parallel Abstract

Protein motifs represent highly conserved regions within protein families and are generally accepted to describe critical regions required for protein stability and/or function. ProMotif provides association rules describing the co-occurrence of protein motifs. It generates these rules with a data mining technique, the Apriori algorithm, but produces too many rules to be practical. Moreover, ProMotif does not allow users to choose specific protein species according to their needs. To reduce the number of rules and keep only the useful and concise ones, we take into account a biological characteristic: the closer two sequential motifs are, the more evident their functional relationship is, and the more likely they are to belong to the same group when the protein folds. We therefore propose a method for mining rules under constraints and construct a system that mines the co-occurrence of protein motifs using constraint-based association rules. The system is available over the Internet, and users can obtain protein motif co-occurrence relationships from our web site. They can choose the protein species and set the minimum support, maximum distance, and minimum positive rate according to their needs.
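The distance constraint described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function name `mine_motif_pairs`, the input layout (motif name, start, end), the use of the gap between consecutive motifs as the distance, and the toy data are all assumptions made for illustration. In particular, the abstract does not define "positive rate"; the sketch assumes it is the fraction of proteins containing a consecutive motif pair in which the gap is within the maximum distance.

```python
from collections import defaultdict


def mine_motif_pairs(proteins, min_support, max_distance, min_positive_rate):
    """Return consecutive motif pairs that satisfy all three thresholds.

    proteins: dict mapping a protein id to a list of (motif, start, end) hits.
    All names and definitions here are illustrative assumptions.
    """
    close = defaultdict(set)     # pair -> proteins where the gap <= max_distance
    adjacent = defaultdict(set)  # pair -> proteins where the pair is consecutive
    for protein_id, hits in proteins.items():
        ordered = sorted(hits, key=lambda hit: hit[1])  # order by start position
        for (a, _, a_end), (b, b_start, _) in zip(ordered, ordered[1:]):
            pair = (a, b)
            adjacent[pair].add(protein_id)
            if b_start - a_end <= max_distance:
                close[pair].add(protein_id)

    rules = []
    total = len(proteins)
    for pair, members in close.items():
        support = len(members) / total
        # Assumed definition: share of consecutive occurrences that are close.
        positive_rate = len(members) / len(adjacent[pair])
        if support >= min_support and positive_rate >= min_positive_rate:
            rules.append((pair, support, positive_rate))
    return sorted(rules)


# Hypothetical toy data: motif names and coordinates are made up.
toy_proteins = {
    "P1": [("M1", 10, 20), ("M2", 25, 40)],
    "P2": [("M1", 5, 15), ("M2", 18, 30)],
    "P3": [("M1", 5, 15), ("M2", 200, 220)],
    "P4": [("M3", 1, 10)],
}
rules = mine_motif_pairs(toy_proteins, min_support=0.5,
                         max_distance=10, min_positive_rate=0.6)
for pair, support, positive_rate in rules:
    print(pair, round(support, 2), round(positive_rate, 2))
```

In this toy input the pair (M1, M2) is close in two of four proteins and distant in a third, so it passes a minimum support of 0.5 and a minimum positive rate of 0.6; tightening the maximum distance to 2 would remove it.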

Parallel Keywords

association rules; protein; motif; distance constraint

References

[1] Agrawal R., Imielinski T. and Swami A. “Mining association rules between sets of items in large databases,” in Proc. of the ACM SIGMOD Conference on Management of Data, 1993.

[2] Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J.A., Hofmann K. and Bairoch A. “The PROSITE database, its status in 2002,” Nucl. Acids Res. 2002, 30, pp. 235-238.

[3] Bairoch A. and Apweiler R. “The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000,” Nucl. Acids Res. 2000, 28, pp. 45-48.

[4] Conte L.L., Ailey B., Hubbard T.J.P., Brenner S.E., Murzin A.G. and Chothia C. “SCOP: a Structural Classification of Proteins database,” Nucl. Acids Res. 2000, 28, pp. 257-259.

[5] Bateman A., Birney E., Durbin R., Eddy S.R., Howe K.L. and Sonnhammer E.L. “The Pfam Protein Families Database,” Nucl. Acids Res. 2000, 28, pp. 263-266.
