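Building a Big-Data Visualization Model to Explore the Relationship between Colorectal Cancer Prevalence and Drinking Water Quality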

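Data visualization is the central theme of this thesis and the most direct way for users to communicate with data. We use the D3.js library to build a complete disease-mapping system whose interactive events and animations let users perceive the characteristics of the data. The disease map shows the distribution and trend of a disease across Taiwan from a spatial perspective. The disease trend chart shows the temporal data of a selected county or city, including the prevalence, the number of reported cases, and the average patient age for each year; the system also provides an interface for clearly comparing different counties and cities. Environmental databases (the tap-water quality database and the reservoir water quality database) are loaded dynamically with asynchronous techniques. In the hybrid trend chart, bubbles move and change size as time advances, so users can follow the disease trend and also observe whether a given environmental attribute appears to be correlated with the disease. The disease data come from the one-million-person sampled patient-centered file of the National Health Insurance Research Database maintained by the National Health Research Institutes. To handle this volume of data, we chose the NoSQL database MongoDB for storage and use mapreduce to improve query performance on the distributed database and to run more complex computations: medical records that do not meet the query conditions are removed, and the originally patient-centered data are regrouped by area code. A cache system avoids frequent access to the database server so that system resources are used efficiently. The system is developed with an MVC architecture, which modularizes it and makes it extensible, so that a suitable prediction module or function module can be loaded according to the disease code (ICD-9) the user queries.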
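The regrouping step described above (drop non-matching medical records, then re-file patient-centered data by area code) can be sketched as a small map and reduce pair. This is only an illustration in plain Python, not the thesis's MongoDB job: the record layout, the field names (`area`, `visits`, `icd9`, `year`), and the sample patients are all hypothetical, and the ICD-9 prefixes 153 and 154 are example values, since the abstract does not state which codes were selected.

```python
from collections import defaultdict

# Hypothetical patient-centered records; field names are illustrative only.
PATIENTS = [
    {"id": "p1", "area": "A01",
     "visits": [{"year": 2005, "icd9": "153.9"}, {"year": 2005, "icd9": "401.9"}]},
    {"id": "p2", "area": "A01",
     "visits": [{"year": 2005, "icd9": "250.00"}]},
    {"id": "p3", "area": "B02",
     "visits": [{"year": 2006, "icd9": "154.1"}]},
]


def map_patient(patient, icd9_prefixes):
    """Map step: emit ((area, year), patient_id) once per matching year."""
    seen = set()
    for visit in patient["visits"]:
        # Visits whose code does not match the query are dropped here.
        if visit["icd9"].startswith(icd9_prefixes):
            key = (patient["area"], visit["year"])
            if key not in seen:
                seen.add(key)
                yield key, patient["id"]


def reduce_counts(pairs):
    """Reduce step: count distinct patients per (area, year) key."""
    patients_by_key = defaultdict(set)
    for key, patient_id in pairs:
        patients_by_key[key].add(patient_id)
    return {key: len(ids) for key, ids in patients_by_key.items()}


def cases_by_area(patients, icd9_prefixes):
    pairs = (pair for p in patients for pair in map_patient(p, icd9_prefixes))
    return reduce_counts(pairs)


# Example prefixes only; the thesis's actual code selection is not given here.
result = cases_by_area(PATIENTS, ("153", "154"))
print(result)
```

Dividing each count by the sampled population of the same area and year would give a prevalence figure; the population table is omitted because the abstract does not describe it.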

Keywords

National Health Insurance Research Database; NoSQL database; disease mapping; MongoDB; big data; data visualization

Parallel Abstract

Data visualization, the core of this thesis, is the most direct way for users to communicate with data. We use the D3.js library to build a disease-mapping system that lets users perceive changes in the data through interactive events and animation. With the Disease Map function, users can observe the spatial distribution of a disease. With the Disease Trend function, users can read the prevalence, case count, average age, and other measures for any city over time. These functions also provide an interface for comparing data between cities. Environmental databases are loaded dynamically and bound to the Hybrid Bubble Chart function; by observing how the position and radius of the bubbles change across time points, users can judge whether there is any related trend between environmental attributes and disease occurrence. The system is based on the National Health Insurance Research Database (NHIRD), which contains the medical records of one million patients. To deal with this large amount of patient data, we selected MongoDB, a distributed document-oriented NoSQL database. With the mapreduce technique we can run complicated operations: records that do not fit the query condition are eliminated, and the remaining data are restructured by geographical area. A cache system keeps the database from being accessed too frequently and so improves query efficiency. We also applied the MVC architecture to make the system more extensible and able to load a specific prediction module or function module depending on the ICD-9 code the user enters.
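The caching idea mentioned above can be illustrated with a minimal in-memory sketch: a result is computed once per query key and reused afterwards, so repeated requests do not reach the database server. The class name `QueryCache`, the `(icd9, year)` key, and the stand-in `fake_query` function are assumptions made for this example; the abstract does not describe how the thesis's cache is keyed or stored.

```python
class QueryCache:
    """Hypothetical in-memory cache in front of an expensive query function."""

    def __init__(self, run_query):
        self._run_query = run_query  # function that would hit the database
        self._store = {}
        self.misses = 0              # number of times the backend was called

    def get(self, icd9, year):
        key = (icd9, year)
        if key not in self._store:
            # Cache miss: run the real query once and remember the result.
            self.misses += 1
            self._store[key] = self._run_query(icd9, year)
        return self._store[key]


def fake_query(icd9, year):
    """Stand-in for a database query; returns a placeholder document."""
    return {"icd9": icd9, "year": year}


cache = QueryCache(fake_query)
first = cache.get("153", 2005)
second = cache.get("153", 2005)   # served from the cache, no second query
print(cache.misses)
```

A real deployment would also need an invalidation or expiry policy when the underlying data changes, which this sketch leaves out.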

Parallel Keywords

National Health Insurance Research Database; NoSQL Database; Disease Mapping; Big Data; Data Visualization

