Longitudinal data recorded in Taiwan's National Health Insurance claim database reflect real-world clinical practice. This has made the database an important tool for pharmacoepidemiological research on the effectiveness of therapies. However, claim data are secondary data: they were collected for a specific purpose in an uncontrolled setting, so they may contain potential confounding covariates that cannot be fully observed. The high-dimensional propensity score (HD-PS) is an extension of the propensity score (PS). When it is applied appropriately to a large claim database to adjust for unknown or potential confounders, its results are closer to those of randomized clinical trials and observational studies than results obtained with the conventional propensity score. Most current HD-PS studies, however, focus on demonstrating its advantages. Few examine how to select an appropriate number of patients for analysis from a massive claim database. As a result, there is no clear guidance on how many patients should be extracted, or on whether the number of patients affects how well HD-PS mines confounding covariates for bias adjustment. This study therefore combines an immune algorithm with HD-PS and grid computing to investigate how the number of patients influences the mining of confounding covariates.
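To make "mining confounding covariates" more concrete, the sketch below illustrates one step commonly described for HD-PS. Candidate binary covariates derived from claim codes are ranked by their potential for confounding using the Bross bias formula, Bias_M = (P_C1(RR_CD − 1) + 1) / (P_C0(RR_CD − 1) + 1). Here P_C1 and P_C0 are the covariate's prevalence among exposed and unexposed patients, and RR_CD is the covariate–outcome relative risk. The sketch makes several simplifying assumptions that do not come from this study. The `Patient` record, the code labels and the 0.5 smoothing constant are hypothetical. It also adopts the convention of inverting RR_CD when it is below 1, so that protective and harmful covariates share one scale. Both exposure groups are assumed non-empty. It is not the study's implementation, and it omits the immune-algorithm and grid-computing components.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Patient:
    exposed: bool                              # treatment-group indicator
    outcome: bool                              # outcome-event indicator
    codes: set = field(default_factory=set)    # hypothetical claim-code covariates


def bross_bias(p_c1: float, p_c0: float, rr_cd: float) -> float:
    """Bross multiplicative bias for a single binary covariate."""
    if rr_cd < 1:
        # Assumed convention: rank protective covariates on the same scale.
        rr_cd = 1.0 / rr_cd
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def prioritize_covariates(patients, k, smoothing=0.5):
    """Return the top-k (code, |log Bias_M|) pairs, largest potential bias first.

    Assumes at least one exposed and one unexposed patient.
    """
    exposed = [p for p in patients if p.exposed]
    unexposed = [p for p in patients if not p.exposed]
    candidates = set().union(*(p.codes for p in patients))
    scores = {}
    for code in candidates:
        # Covariate prevalence in each exposure group.
        p_c1 = sum(code in p.codes for p in exposed) / len(exposed)
        p_c0 = sum(code in p.codes for p in unexposed) / len(unexposed)
        # Outcome risk with and without the covariate; the smoothing constant
        # (an assumption) keeps the ratio finite when a cell is empty.
        with_c = [p.outcome for p in patients if code in p.codes]
        without_c = [p.outcome for p in patients if code not in p.codes]
        risk_with = (sum(with_c) + smoothing) / (len(with_c) + 2 * smoothing)
        risk_without = (sum(without_c) + smoothing) / (len(without_c) + 2 * smoothing)
        rr_cd = risk_with / risk_without
        scores[code] = abs(math.log(bross_bias(p_c1, p_c0, rr_cd)))
    # Sort by descending score, breaking ties by code for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Consider a toy cohort in which code "A" appears only among exposed patients and is associated with the outcome, while code "B" is equally prevalent in both groups. Here `prioritize_covariates(cohort, k=2)` ranks "A" first, and "B" receives a score of 0, because a covariate balanced across exposure groups cannot confound. Every quantity in the ranking is an estimated prevalence or risk, so the selected covariates can shift with the number of patients sampled. That sensitivity is the kind of effect the study sets out to examine.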