
基於Hadoop MapReduce與HBase之醫療資訊快速分析平台

The Efficient Analysis Platform of Medical Informatics Based on Hadoop MapReduce and HBase

Advisor: 歐陽彥正
Co-advisor: 黃乾綱 (Chien-Kang Huang)

Abstract


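Research on large medical databases has become a popular topic in recent years, but in practice it often suffers from slow analysis: extracting the required data from a conventional relational database tends to take a long time, which limits both the topics and the scale of research. In this thesis, we redesign the database architecture and rebuild the data, using HBase to store the key data commonly needed when analyzing National Health Insurance (NHI) data, together with Hadoop MapReduce to analyze these data in a distributed and parallel manner, thereby speeding up analysis of the NHI database. Finally, we integrate the whole analysis workflow into an automated rapid-analysis platform that supports research on a variety of topics. Because the design is compatible with cloud computing environments, future expansion is relatively easy, the platform can be ported directly to commercial cloud environments, and the development of real-time analysis systems becomes feasible.

To achieve these goals, we first surveyed the related literature and popular research methods to identify the important information that needs to be stored, and then used HBase to rebuild the original NHI database into one suited to large-scale, rapid analysis. With the new database design, key information can be retrieved efficiently, reducing the computing resources and time spent on repeated queries. For the analysis workflow, we designed a fully automated process: given a standardized disease definition file, the system automatically selects the required experimental and control groups from the database and computes the odds ratio, providing results for analysis and discussion. Each complete analysis run takes only about 5 minutes in an environment of three computers, far faster than the traditional workflow. Thanks to this speedup, we were able to carry out large-scale all-against-all cross analysis of dozens of diseases, in an attempt to uncover comorbidity and causal relationships that are still unknown.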
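The abstract states that the platform computes an odds ratio for the selected experimental and control groups but does not give the computation. The following is a minimal sketch of the standard odds ratio from a 2x2 table, with a 95% confidence interval from the usual log-scale (Woolf) approximation. The function name, argument names, and the choice of confidence interval are assumptions made for illustration, not the thesis's implementation.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds_ratio, (ci_low, ci_high)) for a 2x2 table.

    Hypothetical helper: the thesis does not specify how its odds ratio
    or any confidence interval is computed.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        # The plain formula is undefined when any cell is empty.
        raise ValueError("all four cells must be positive")

    # Odds of exposure among cases divided by odds of exposure among controls.
    ratio = (a * d) / (b * c)

    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ratio = math.log(ratio)
    ci = (math.exp(log_ratio - 1.96 * se), math.exp(log_ratio + 1.96 * se))
    return ratio, ci
```

For example, with 20 exposed and 80 unexposed subjects in the experimental group, and 10 exposed and 90 unexposed subjects in the control group, the odds ratio is (20 × 90) / (80 × 10) = 2.25.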

Parallel Abstract


Analysis of large-scale medical databases has become a popular research topic in recent years. The increasing power of computers and the massive collections of medical records allow us to conduct population-based studies to identify relationships among diseases. In practice, this kind of study faces a serious efficiency issue due to the scale of the databases, which severely limits the productivity of scientists. In this thesis, the efficiency issue is addressed by using HBase, instead of conventional relational database software, as the data storage framework. Based on the distinct data storage structure of HBase, a new database schema designed to support the MapReduce programming model is proposed for carrying out distributed and parallelized analyses efficiently. Experimental results show that, with the proposed design, analyses that take hours or even days with the conventional database framework can be completed within minutes. Another major merit of the proposed design is that the framework works smoothly with cloud computing environments and therefore scales well.
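To make the MapReduce programming model mentioned above concrete, here is a small single-process sketch in plain Python. It does not use the Hadoop or HBase APIs, and the record layout (a patient identifier paired with a diagnosis code) and all function names are hypothetical; the thesis's actual schema and jobs are not described in the abstract. The sketch counts distinct patients per diagnosis code through explicit map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (diagnosis_code, patient_id) pair per input record."""
    for patient_id, diagnosis_code in records:
        yield diagnosis_code, patient_id


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: count distinct patients for each diagnosis code."""
    return {code: len(set(patient_ids)) for code, patient_ids in groups.items()}


def count_patients_per_code(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

In a real cluster the map and reduce functions run in parallel on different machines and the shuffle is handled by the framework; the per-code counts produced this way are the kind of input a 2x2 comparison between groups would need.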

