Applying Data Mining Techniques to Microarray Data Analysis to Screen Candidate Genes for Alzheimer's Disease

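As the population ages, medical research on diseases of old age has received growing attention in recent years. Alzheimer's disease (AD) is the most serious form of dementia. Although many medical studies have reported genes associated with AD, the disease still cannot be effectively prevented or cured. This study therefore aims to identify additional genes associated with AD as a basis for future medical research. The study analyzes the GSE1297 microarray dataset on the Affymetrix HG-U133A platform, obtained from the database of the U.S. National Center for Biotechnology Information (NCBI). First, a differential expression analysis identified 1,681 genes whose expression differed significantly with AD status. These genes were organized into four gene groups: all significant genes, genes correlated with the Mini-Mental State Examination (MMSE) score, genes correlated with neurofibrillary tangle (NFT) counts, and genes correlated with either MMSE or NFT. For each group, the CART decision tree was run iteratively to select the most discriminative genes, leaving 64 genes. These genes were then validated against microarray data from the HG-U133 Plus 2.0 platform, which excluded one gene, and a significance rate was computed for each gene as an indicator for dimensionality reduction. Finally, cluster analysis and GO-term analysis were combined to explore gene function, in an attempt to characterize the expression and regulation of the AD candidate genes and provide a reference for AD research.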
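The abstract does not state which statistical test the differential analysis used. As a minimal sketch, assuming a one-way ANOVA across the four sample groups with a permutation p-value (chosen here so the example needs only the Python standard library), a per-gene screen might look like the following. The function names, gene IDs, group labels, and the `alpha` cutoff are hypothetical, and no multiple-testing correction is applied, which a genome-wide screen would need.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def anova_f(groups):
    """One-way ANOVA F statistic for one gene whose samples are split into groups."""
    values = [v for g in groups for v in g]
    grand = _mean(values)
    k, n = len(groups), len(values)
    means = [_mean(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: scatter of samples around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabelings whose F statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = anova_f(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if anova_f(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


def significant_genes(expression, labels, alpha=0.05, n_perm=2000):
    """expression: {gene_id: [value per sample]}; labels: group label per sample."""
    order = sorted(set(labels))
    selected = []
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == g] for g in order]
        if permutation_p_value(groups, n_perm) < alpha:
            selected.append(gene)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: four groups of four samples each
    labels = ["control"] * 4 + ["incipient"] * 4 + ["moderate"] * 4 + ["severe"] * 4
    expression = {
        "GENE_A": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0,
                   3.0, 3.1, 2.9, 3.0, 4.0, 4.1, 3.9, 4.0],  # rises with severity
        "GENE_B": [1.0, 2.0, 3.0, 4.0] * 4,  # same distribution in every group
    }
    print(significant_genes(expression, labels))  # expected: ['GENE_A']
```

A permutation test avoids assuming normally distributed expression values; with SciPy available, `scipy.stats.f_oneway` gives the parametric F-test p-value instead.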

Keywords

Alzheimer's disease; microarray; MMSE score; NFT (neurofibrillary tangles); CART decision tree; cluster analysis; GO-term analysis; data mining

English Abstract

As the population ages, medical research on diseases of old age has received much attention in recent years. Alzheimer's disease (AD) is the most serious form of senile dementia. Many studies have sought genes significantly associated with AD, yet the disease still cannot be effectively prevented or cured. This study therefore seeks additional genes that may induce or regulate the disease and provides a list of candidate genes for future experiments. The study uses the GSE1297 microarray dataset provided by the NCBI database. First, a differential analysis across four sample groups identified 1,681 significant genes. Second, the Mini-Mental State Examination (MMSE) score and the neurofibrillary tangle (NFT) index were used to sort the significant genes into four groups. Third, the CART decision tree, a data mining tool, was applied to select candidate genes, leaving 64 genes. These were then validated against the GSE5281 microarray dataset, which eliminated one gene, so that 63 genes were finally selected. Finally, cluster analysis and GO-term analysis of the candidate genes were integrated to characterize the mechanisms through which they may act in AD. This study may provide new insights into research on the progression of AD.
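CART grows a tree by choosing, at each node, the split that most reduces Gini impurity. The sketch below is a hypothetical illustration, not the study's procedure: it scores each gene by its best single-threshold (root) split and then repeatedly takes the top-scoring gene, removes it, and re-scores the rest. This loop is only one possible reading of running CART iteratively; a full CART implementation also grows deeper trees and prunes them. All function names, gene IDs, and the toy data are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) over the class proportions p_c."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Best binary split 'value <= t' on one gene, scored the way CART scores a node.

    Returns (impurity_decrease, threshold); threshold is None if no split helps."""
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (0.0, None)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted impurity of the two child nodes
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best[0]:
            best = (gain, (lo + hi) / 2)
    return best


def iterative_selection(expression, labels, max_genes=5, min_gain=1e-9):
    """Hypothetical loop: keep the gene with the best root split, drop it, repeat."""
    remaining = dict(expression)
    chosen = []
    while remaining and len(chosen) < max_genes:
        scores = {g: best_split(v, labels) for g, v in remaining.items()}
        gene = max(scores, key=lambda g: scores[g][0])
        gain, threshold = scores[gene]
        if gain < min_gain:
            break  # no remaining gene separates the classes at all
        chosen.append((gene, threshold, gain))
        del remaining[gene]
    return chosen


if __name__ == "__main__":
    labels = ["AD"] * 3 + ["control"] * 3
    expression = {
        "GENE_X": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # separates AD from control
        "GENE_Y": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],  # weakly informative
        "GENE_Z": [2.0] * 6,                        # constant, uninformative
    }
    print([g for g, _, _ in iterative_selection(expression, labels)])
    # expected: ['GENE_X', 'GENE_Y']
```

In practice, a library implementation such as scikit-learn's `DecisionTreeClassifier(criterion="gini")` would grow the full tree, and the genes used in its splits would form the candidate list.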

English Keywords

Alzheimer's Disease; CART; Cluster Analysis; GO-Term Analysis; Data Mining; Microarray; MMSE (Mini-Mental State Examination); NFT (Neurofibrillary Tangle)
