Advances in the collection and digitization of data have given rise to large clinical databases and to the field of biomedical informatics. Beyond traditional statistical analysis, current trends in biomedical informatics have moved toward machine learning, with an emphasis on automated data processing and intelligent decision-making. This thesis proposes a data analysis procedure based on density estimation algorithms to investigate co-morbidities between migraine and a range of other diseases. The primary objective was to develop a new analysis procedure capable of extracting insightful knowledge from large medical databases. The procedure has two stages. In the first stage, a kernel density estimation algorithm named RVKDE is used to identify "subjects of interest." In the second stage, a density estimation algorithm based on generalized Gaussian components, named G2DE, provides a summarized description of the clusters formed by these subjects. Migraine is a prevalent but frequently underestimated neurological disorder, so this study sought to uncover its co-morbid associations with various psychiatric and somatic illnesses. Data were obtained from the National Health Insurance Research Database (NHIRD) of Taiwan. Its main advantage is that it is a large, population-based medical insurance claims database with nationwide coverage. Applying the proposed two-stage procedure to migraine co-morbidity effectively identified subjects of interest with distinctive characteristics. In addition, the characteristics of the clusters formed by these subjects are consistent with findings from recent biomedical research. The proposed procedure can therefore provide valuable clues to disease pathogenesis and facilitate the development of appropriate treatment strategies.
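The following minimal sketch illustrates the two-stage idea. It assumes that each subject is encoded as a numeric feature vector. Stage one uses a simplified variable-bandwidth Gaussian KDE in which each subject's kernel width is β times the distance to its k-th nearest neighbour. This bandwidth rule, the quantile threshold that defines "subjects of interest", and the function names (`variable_bandwidth_kde`, `subjects_of_interest`, `gaussian_summary`) are hypothetical illustrations, not the RVKDE formulation used in the thesis. Stage two is replaced by a single ordinary Gaussian (sample mean and covariance) rather than G2DE's generalized Gaussian components. The toy data are synthetic, not NHIRD records.

```python
import numpy as np


def variable_bandwidth_kde(X, k=5, beta=1.0):
    """Estimated density at every sample under a Gaussian KDE whose per-sample
    bandwidth is beta times the distance to that sample's k-th nearest neighbour.
    Hypothetical, simplified stand-in for RVKDE; uses O(n^2) memory."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Column 0 of each sorted row is the sample itself (distance 0),
    # so column k holds the squared distance to the k-th nearest other sample.
    r_k = np.sqrt(np.sort(sq, axis=1)[:, k])
    sigma = np.maximum(beta * r_k, 1e-12)  # guard against duplicate points
    # K[j, i]: isotropic Gaussian kernel centred on sample i, evaluated at sample j.
    norm = (2.0 * np.pi * sigma**2) ** (-m / 2.0)
    K = norm[None, :] * np.exp(-sq / (2.0 * sigma[None, :] ** 2))
    # Average the n kernel contributions at each sample.
    return K.mean(axis=1)


def subjects_of_interest(density, quantile=0.9):
    """Indices of samples whose estimated density reaches the given quantile
    (hypothetical selection rule)."""
    threshold = np.quantile(density, quantile)
    return np.flatnonzero(density >= threshold)


def gaussian_summary(X, idx):
    """Summarize the selected samples by the mean and covariance of one plain
    Gaussian (not a generalized Gaussian component as in G2DE)."""
    sub = np.asarray(X, dtype=float)[idx]
    return sub.mean(axis=0), np.cov(sub, rowvar=False)


# Synthetic example: a tight cluster of 50 subjects plus 10 scattered subjects.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
sparse = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([dense, sparse])

density = variable_bandwidth_kde(X, k=5)
idx = subjects_of_interest(density, quantile=0.8)
mean, cov = gaussian_summary(X, idx)
```

The adaptive bandwidth is the key design choice. Subjects in dense regions get narrow kernels and therefore high estimated density, while isolated subjects get wide, flat kernels. Thresholding on density then separates tight groups from scattered ones before the selected group is summarized.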