A Study and Discussion of Disclosure Risk in the National Health Insurance Database

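Taiwan's National Health Insurance Database has accumulated twenty-five years of data and is both rich and very large. While this offers great convenience, the chief concern is the leakage of personal data. This study examines the disclosure risk of Taiwan's National Health Insurance Database, computes the disclosure risk of the regrouped database, and proposes statistical disclosure control (SDC) measures that may be effective for it, so that released data better protect individual privacy while remaining useful for analysis. In studying the database's disclosure risk, the researcher found that if a patient's sex, age, visit date, and clinical department are known, the probability that the visit record is disclosed is as high as 2.85%; the Regrouped Model is therefore proposed to reduce the disclosure risk. The Regrouped Model not only lowers the disclosure risk; the test statistic also decreases, meaning that the hypothesis under test is less likely to be rejected than with the original data.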

Keywords

National Health Insurance Database; disclosure risk; statistical disclosure control (SDC); Regrouped Model

Parallel Abstract

The National Health Insurance Database in Taiwan has accumulated 25 years of data and is both rich and very large. Convenient as this is, the most worrying aspect is the leakage of personal data. The purpose of this study is to explore the disclosure risk of the National Health Insurance Database in Taiwan, to calculate the disclosure risk of the regrouped database, and to propose statistical disclosure control (SDC) measures that may be effective for it, so that released data better protect individual privacy while remaining useful for analysis. In studying the disclosure risk of the database, the researcher found that if a patient’s gender, age, date of visit, and clinical department are known, the probability that the visit record is disclosed is as high as 2.85%; the Regrouped Model is therefore proposed to reduce the disclosure risk. The Regrouped Model not only reduces the disclosure risk; the test statistic also decreases, indicating that the hypothesis under test is less likely to be rejected than with the original data.
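
The abstract does not state how the disclosure risk is computed or how the Regrouped Model is defined. As an illustration only, the sketch below assumes a common SDC measure: the share of records whose combination of quasi-identifiers (sex, age, visit date, department) is unique in the file. It also assumes that regrouping means coarsening a variable, here age into ten-year bands, which is a stand-in and may differ from the study's method. The function names, field names, and toy records are all hypothetical.

```python
from collections import Counter


def uniqueness_risk(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once.

    Assumed risk measure for illustration; not necessarily the study's.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


def regroup_age(record, width=10):
    """Return a copy of the record with age replaced by a band such as '30-39'.

    Hypothetical stand-in for regrouping: coarsen one variable.
    """
    low = (record["age"] // width) * width
    out = dict(record)
    out["age"] = f"{low}-{low + width - 1}"
    return out


# Toy records, invented for this sketch (not taken from the database).
records = [
    {"sex": "F", "age": 34, "visit_date": "2015-03-02", "department": "ENT"},
    {"sex": "F", "age": 36, "visit_date": "2015-03-02", "department": "ENT"},
    {"sex": "M", "age": 61, "visit_date": "2015-03-02", "department": "Cardiology"},
    {"sex": "M", "age": 61, "visit_date": "2015-03-02", "department": "Cardiology"},
]
QUASI_IDENTIFIERS = ["sex", "age", "visit_date", "department"]

risk_before = uniqueness_risk(records, QUASI_IDENTIFIERS)
risk_after = uniqueness_risk([regroup_age(r) for r in records], QUASI_IDENTIFIERS)
print(risk_before, risk_after)
```

In this toy file the two female patients are unique on exact age but share the 30-39 band after regrouping, so the share of unique records falls. Coarser groups also discard detail, which is the trade-off between privacy and analytical usefulness that the abstract describes.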