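Development and Application of Organizational Performance Composite Scores Built from Multiple Indicators: Hospital Rankings for Diabetes Pay-for-Performance Patients as an Example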

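Background: As more pay-for-performance (P4P) and quality report card programs are launched around the world, health policy experts have taken an interest in the ranking of health care facilities and in composite scores, and some controversy has emerged. Although several studies have discussed composite scores, few empirical studies have used composite scores and a nationwide database to examine hospital rankings under P4P. In short, it remains unclear whether the choice of composite score method materially affects P4P hospital rankings, particularly when the composite score includes risk-adjusted outcome indicators. The uncertainty (reliability) and validity of the individual composite score methods have also rarely been compared side by side in the literature. In addition, the small-sample hospital problem, risk adjustment, and accountability need to be considered together when constructing a composite score. Objectives: We used nationwide data to construct and empirically test latent scores (two item response theory models and the PRIDIT model) and non-latent scores (raw sum score, all-or-none score, and QALYs saved score) for diabetes P4P. Based on the results, we summarize the key properties of latent and non-latent scores (for example, their weighting mechanisms) and make recommendations on how to pair them with the structural design of P4P incentives. Methods and Materials: Diabetes patients in the add-on program had to have ICD code 250, at least four visits in 2007, and be older than 18; they were then attributed to a specific physician by a plurality algorithm. Claims data came from the Bureau of National Health Insurance for January 2005 through December 2007. Diabetes outcome data, such as glycated hemoglobin (A1C) values, came from the Virtual Private Network (VPN) that the Bureau set up for facilities to self-report patient outcomes. This is a cross-sectional study. We first adjusted A1C values with a GEE model, then computed composite scores using different algorithms, and then compared the P4P composite scores on three criteria: agreement of facility rankings, validity, and reliability. We also report sensitivity analyses to limit the influence of small facilities on the rankings. Results: Among the non-latent scores, the raw sum score outperformed the all-or-none score because it had higher reliability and validity and correlated highly with the latent scores. Latent scores in turn outperformed non-latent scores because of their higher reliability and validity, their weighting mechanisms, and their richer policy implications. Among the latent scores, the PRIDIT model had better validity than all of the item response theory models, but the reverse held for reliability. Conclusion: We integrated the elements needed to make composite score research more rigorous, such as risk adjustment, a plurality algorithm (to assign each patient to one physician), and sensitivity analysis. We also propose when each latent score method is appropriately applied and how to pair it with the P4P incentive structure.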
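The attribution step described in the Methods can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `assign_by_plurality`, the input layout (one `(patient_id, physician_id)` pair per visit), and the tie-breaking rule are assumptions. Only the four-visit minimum and the idea of assigning each patient to a single physician via a plurality rule (read here as "the physician seen most often") come from the abstract.

```python
from collections import Counter, defaultdict


def assign_by_plurality(visits, min_visits=4):
    """Attribute each eligible patient to the physician they visited most often.

    visits: iterable of (patient_id, physician_id) pairs, one pair per visit.
    min_visits: patients with fewer total visits are left unassigned.
    Ties go to the smallest physician_id (an illustrative assumption).
    """
    per_patient = defaultdict(Counter)
    for patient_id, physician_id in visits:
        per_patient[patient_id][physician_id] += 1

    assignment = {}
    for patient_id, counter in per_patient.items():
        if sum(counter.values()) < min_visits:
            continue  # too few visits to be eligible
        top = max(counter.values())
        # Among physicians with the highest visit count, pick one deterministically.
        assignment[patient_id] = min(p for p, n in counter.items() if n == top)
    return assignment


# Hypothetical visit records, made up for illustration only.
visits = [
    ("p1", "drA"), ("p1", "drA"), ("p1", "drB"), ("p1", "drA"),
    ("p2", "drB"), ("p2", "drB"),
    ("p3", "drA"), ("p3", "drB"), ("p3", "drB"), ("p3", "drA"),
]
print(assign_by_plurality(visits))
```

In this made-up example, `p1` goes to `drA`, `p2` is dropped for having only two visits, and `p3` is a tie that the assumed rule resolves to `drA`. A real attribution rule would need an explicit, documented tie-breaker, since it determines which physician is held accountable.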

Keywords

composite score; pay-for-performance; diabetes; hospital ranking; reliability and validity

Parallel Abstract

Background: As more P4P and public disclosure initiatives are established around the world, health care experts have shown interest in, and debated, facility rankings and composite scores. Although some research has discussed composite scores, there is still little empirical work that examines differences in hospital rankings using a P4P composite score and a nationwide database. In brief, it is still unclear whether the choice of methodology has a large impact on P4P hospital rankings based on a composite score, especially when the score includes risk-adjusted outcome measures. The degree of uncertainty (reliability) and the validity of each composite score method are also rarely compared in the literature. In constructing a composite score, the problem of small-volume facilities and the issues of risk adjustment and accountability also need to be addressed. Objectives: We constructed and empirically tested DM (diabetes mellitus) P4P latent scores (two IRT-based models and the PRIDIT model) and non-latent scores (raw sum score, all-or-none score, and QALYs saved score). Based on the results, we summarized the important characteristics of latent and non-latent composite scores (e.g., weighting mechanism) and made suggestions on how to pair them with the structure of P4P incentive design. Methods and Materials: DM patients in the P4P Add-On Program had to be older than 18, have an ICD code of 250, and have at least four visits in 2007. They were then assigned to one specific physician through a plurality algorithm (accountability). DM P4P data were collected from the regular claims data of the Bureau of National Health Insurance (NHI) for the period January 2005 to December 2007. DM patient outcome data, such as A1C values, were retrieved from the Virtual Private Network (VPN) sponsored by NHI for facilities and clinics to self-report patients' outcomes. This research is a cross-sectional study. We first adjusted A1C levels using a GEE model and then calculated composite scores using different algorithms. The P4P composite score methods were then compared on three criteria: agreement of hospital ranks, validity, and reliability. We also report sensitivity analyses to limit the influence of small-volume facilities on the ranks. Results: Among the non-latent methods, the raw sum score was better than the all-or-none score because of its higher validity and reliability and its higher correlation with the latent scores. Latent methods were superior to all of the non-latent methods because they had higher validity and reliability, specific weighting schemes, and richer P4P policy implications. Among the latent scores, the PRIDIT model was superior to both IRT-based models in validity, but the opposite held for reliability. Conclusion: To make the study more rigorous, we integrated several necessary elements into our composite score research, such as risk adjustment, a plurality algorithm (for assigning patients to one physician), and sensitivity analysis. Based on the characteristics of each latent score, we also proposed when each is appropriately implemented and how to pair it with the structure of P4P incentive design.
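To make the contrast between the two simplest non-latent scores concrete, the sketch below computes a raw sum score and an all-or-none score per hospital and ranks hospitals under each. The definitions used are the common ones (share of all indicator opportunities met, and share of patients meeting every indicator); the abstract does not give the study's exact formulas, so treat them as assumptions. The function names and the data are hypothetical, and no risk adjustment is applied.

```python
def raw_sum_score(patients):
    """Share of all indicator opportunities that were met.

    patients: list of per-patient lists of 0/1 indicator results.
    """
    met = sum(sum(p) for p in patients)
    total = sum(len(p) for p in patients)
    return met / total


def all_or_none_score(patients):
    """Share of patients who met every one of their indicators."""
    return sum(1 for p in patients if all(p)) / len(patients)


def rank_hospitals(hospitals, score_fn):
    """Return hospital ids ordered from best to worst score (ties by id)."""
    scores = {h: score_fn(pts) for h, pts in hospitals.items()}
    return sorted(scores, key=lambda h: (-scores[h], h))


# Hypothetical data: three hospitals, three patients each, three indicators.
hospitals = {
    "H1": [[1, 1, 1], [1, 1, 0], [1, 0, 1]],
    "H2": [[1, 1, 1], [1, 1, 1], [0, 0, 0]],
    "H3": [[1, 0, 0], [0, 1, 0], [1, 1, 0]],
}

print(rank_hospitals(hospitals, raw_sum_score))
print(rank_hospitals(hospitals, all_or_none_score))
```

With this made-up data the two methods disagree on the top hospital: H1 leads on the raw sum score (7 of 9 opportunities met), while H2 leads on the all-or-none score (2 of 3 patients met every indicator). This is the kind of ranking disagreement the agreement criterion is meant to detect.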

Parallel Keywords

composite score; pay-for-performance; diabetes mellitus; hospital ranking; validity and reliability

