With the advent of next-generation sequencing (NGS) technologies, genetic epidemiologists now search for rare variants (minor allele frequency (MAF) < 1%) and low-frequency variants (MAF < 5%) that contribute to susceptibility to complex diseases. Family-based study designs, such as recruiting affected sib pairs (ASPs), are promising because rare variants can be enriched in families with multiple affected members. Moreover, family-based designs are robust against population substructure. Epstein et al. recently developed rare-variant association tests for ASPs, in which each pair's total number of rare variants in a gene or region is regressed on the pair's identity-by-descent (IBD) score. Although their simulations showed appropriate type I error rates and high statistical power, we find that their methods are valid only when the IBD scores are unambiguous. When the estimated IBD scores differ from the true ones, the validity of the tests is compromised. In practice, however, true IBD scores are usually unknown and must be estimated from genotype data. We show that incorporating parental genotypes, which improves the estimation of the ASPs' IBD scores, is crucial to both the validity and the power of rare-variant association testing with ASPs. Caution is therefore needed when analyzing rare variants in ASPs whose parental genotypes are unavailable. In that case, genotypes of other family members (such as unaffected siblings) should be collected to improve the accuracy of IBD estimation.
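To make the two ingredients discussed above concrete, here is a minimal Python sketch under simplifying assumptions. It is not Epstein et al.'s test statistic, which is not reproduced here. The function `ibd_from_parents` (a hypothetical name) counts the alleles two full sibs share IBD at a single biallelic or multiallelic marker by enumerating every Mendelian transmission consistent with the parents' genotypes. It returns `None` when the genotypes cannot resolve IBD, for example when a parent is homozygous. The function `burden_on_ibd` (also hypothetical) fits an ordinary least-squares regression of per-pair rare-variant counts on IBD scores. Real analyses would typically use multipoint IBD estimation rather than a single marker.

```python
from itertools import product
from math import sqrt
from typing import Optional, Sequence, Tuple

Genotype = Tuple[int, int]  # unphased pair of allele labels at one marker


def ibd_from_parents(father: Genotype, mother: Genotype,
                     sib1: Genotype, sib2: Genotype) -> Optional[int]:
    """Hypothetical helper: alleles (0, 1 or 2) shared IBD by two full sibs.

    Tries every choice of paternal and maternal allele copy for each sib.
    Returns the IBD count if all transmissions consistent with the genotypes
    agree, or None if this single marker cannot resolve it.
    Raises ValueError if no transmission fits (a Mendelian error).
    """
    counts = set()
    for f1, m1, f2, m2 in product((0, 1), repeat=4):
        fits1 = sorted((father[f1], mother[m1])) == sorted(sib1)
        fits2 = sorted((father[f2], mother[m2])) == sorted(sib2)
        if fits1 and fits2:
            # One IBD allele per parent who passed the same copy to both sibs.
            counts.add(int(f1 == f2) + int(m1 == m2))
    if not counts:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    return counts.pop() if len(counts) == 1 else None


def burden_on_ibd(ibd: Sequence[float], burden: Sequence[float]):
    """Hypothetical helper: OLS of per-pair rare-variant counts on IBD scores.

    Returns (slope, standard error of slope, t statistic).
    """
    n = len(ibd)
    if n != len(burden) or n < 3:
        raise ValueError("need at least 3 pairs and equal-length inputs")
    mx, my = sum(ibd) / n, sum(burden) / n
    sxx = sum((x - mx) ** 2 for x in ibd)
    if sxx == 0:
        raise ValueError("IBD scores do not vary across pairs")
    sxy = sum((x - mx) * (y - my) for x, y in zip(ibd, burden))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ibd, burden))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: residual variance is zero")
    return slope, se, slope / se
```

Pairs for which `ibd_from_parents` returns `None` illustrate the problem raised above. Without parental genotypes, even more pairs would have to rely on probabilistic IBD estimates, and any error in those estimates feeds directly into the regression of rare-variant counts on IBD.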