  • Thesis

Analyzing Long-Term Trends in Lung Cancer Incidence and Mortality with Statistical Models

Statistical Models for Time Trend of Lung Cancer Incidence and Mortality for Three Decades

Advisor: 陳祈玲
Co-advisor: 陳秀熙 (Hsiu-Hsi Chen)

Abstract


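Background: The incidence, mortality, and histological types of lung cancer all change over time. The underlying factors are complex and include risk factors that themselves change with time. Prevention at multiple levels also shapes the epidemiology of lung cancer: primary prevention, such as smoking-cessation campaigns; secondary prevention, such as low-dose computed tomography screening; and tertiary prevention, such as surgery and adjuvant therapy. Modelling the time series of lung cancer with statistical models is therefore important. It allows the future national trend of lung cancer to be forecast and also addresses several key questions: whether changes over time in histological types (adenocarcinoma, squamous cell carcinoma, and small cell lung cancer) affect incidence and mortality; whether histological types differ between men and women; whether lung cancer incidence can be explained by age, period, and cohort; how changes in incidence and case-fatality affect mortality; and how overdetection of early lung cancer, a period effect of screening, is reflected in incidence.

Objectives: This study used two databases, the national cancer registry and the Keelung Community-based Integrated Screening (KCIS) database, and applied the following statistical methods: (1) a Box-Jenkins time series model to forecast sex-specific overall and histology-specific lung cancer incidence and mortality; (2) a decomposition of mortality into incidence and case-fatality, with Box-Jenkins time series models used to examine the contribution of each to mortality; (3) a Bayesian age-period-cohort model to analyze the effects of age, period, and cohort on incidence and mortality; (4) a zero-inflated Poisson regression model to estimate the rate of overdetected early lung cancer by histological type and to examine whether smoking affects that overdetection; (5) a Bayesian multistate Markov regression model to estimate lung cancer incidence, including progressive and non-progressive lung cancer, and mortality, and to assess the effect of overdetected early lung cancer on mortality; (6) a survival model to adjust the difference in death between men and women for their different incidence of histological types; (7) a matched case-control study based on the Keelung screening database to identify lung cancer risk factors other than smoking.

Methods: Data were taken from the interactive cancer registry system of the Health Promotion Administration and from the Keelung Community-based Integrated Screening (KCIS) data. The cancer registry provides annual sex- and age-specific counts of lung cancer cases (through 2012) and deaths for 1979 to 2014, and histology-specific case counts for 1995 to 2012. The KCIS database provides lung cancer occurrence and histological type for its cohort from 2000 to 2007, with survival status followed up to 2010. The following six analytic methods were used to address the objectives above: (1) The Box-Jenkins autoregressive integrated moving average (ARIMA) model was used to identify trends in lung cancer incidence and mortality. If the long-term trend was not constant, the series was differenced once or twice until it was stationary over time; plots of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) were then examined to determine the orders of the autoregressive (AR) and moving average (MA) terms; finally, model fit was checked with the Ljung-Box statistic. (2) The Bayesian age-period-cohort model was used to examine age, period, and cohort effects in the incidence and mortality data from the cancer registry. A Gaussian autoregressive prior model was used to smooth the age, period, and cohort effects, with non-informative prior distributions. (3) The zero-inflated Poisson model was used to estimate the proportion of overdetected early lung cancer among community lung cancer cases by histological type. Because surviving lung cancer cases comprise two groups, overdetected early lung cancer and progressive lung cancer that has not yet led to death, a logistic regression model was used to analyze the effects of histological type and smoking on overdetection, combined with a zero-inflated Poisson model for the effects of sex, age, and histological type on the risk of death from progressive lung cancer. (4) Multistate Markov models were used to quantify histology-specific incidence and mortality in scenarios with and without overdetected early lung cancer. A three-state model (Model 1, no overdetected early lung cancer) and a four-state model (Model 2, with overdetected early lung cancer) were developed as Markov regression models that treat histological type as a covariate and were applied to the cancer registry data. In addition, a six-state model (Model 3, no overdetected early lung cancer) and a ten-state model (Model 4, with overdetected early lung cancer) treated each histological type as a separate state and were applied to the community screening database. Parameters were estimated by Markov chain Monte Carlo simulation. (5) Survival curve adjustment was used to display graphically the difference between male and female survival functions after adjustment for histological type, yielding an attributable proportion model. (6) A matched case-control study was used to investigate risk factors for death from lung cancer among subjects in the community screening database, with analyses stratified by sex. Cases and controls were matched on birth year within 2 years. The factors examined were smoking, betel nut chewing, occupation, cooking, and incense burning. The McNemar test was used for univariate analysis and conditional logistic regression for multivariable analysis.

Results: 1. Box-Jenkins time series model (1) Applying the Box-Jenkins ARIMA(1,1,0) model to national lung cancer data for 1980-2012 showed an increasing trend in age-specific incidence and a year-by-year decline in case-fatality (mortality/incidence). Age-specific mortality therefore remained stable over this period and could be forecast with an ARIMA(0,1,0) model. (2) The Box-Jenkins models showed increasing trends for adenocarcinoma and small cell lung cancer, little change in the incidence of squamous cell carcinoma, and a decrease for other types. (3) The Box-Jenkins models predicted that the age-standardized incidence of lung cancer per 100,000 would go from 45.71 in 2013 to 48.82 in 2017 in men and from 26.57 in 2013 to 29.25 in 2017 in women. The age-standardized mortality per 100,000 was predicted to go from 34.34 in 2015 to 35.9 in 2019 in men and from 17.2 in 2015 to 18.08 in 2019 in women. (4) The Box-Jenkins model was also applied by histological type. For adenocarcinoma, the age-standardized incidence per 100,000 was predicted to go from 30.77 in 2013 to 35.75 in 2017 in men and from 32.35 in 2013 to 32.34 in 2017 in women. For squamous cell carcinoma, it was predicted to go from 14.8 in 2013 to 16.18 in 2017 in men and from 2.08 in 2013 to 2.12 in 2017 in women. For small cell lung cancer, it was predicted to go from 7.18 in 2013 to 7.99 in 2017 in men and from 0.94 in 2013 to 1.04 in 2017 in women. 2. Bayesian age-period-cohort model (1) The Bayesian age-period-cohort regression model gave results similar to those of the Box-Jenkins model; age-standardized mortality still showed a decline after age, period, and cohort were taken into account. (2) The Bayesian age-period-cohort regression model likewise showed a significant increase in adenocarcinoma in both men and women and an increase in small cell lung cancer in men. Squamous cell carcinoma and unspecified cancer showed decreasing trends. (3) The Bayesian age-period-cohort regression model showed significant period and cohort effects. (4) Women aged 50-64 years showed significant period and cohort effects, suggesting possible overdetection of early lung cancer in this age group. 3. Adjusted survival model for men and women (1) Women had higher survival and more early-stage lung cancer than men. (2) The difference in survival between men and women disappeared after adjustment for stage. (3) The different distribution of histological types in men and women accounted for only 33% of the difference in survival. 4. Zero-inflated Poisson regression model (1) In the zero-inflated Poisson regression analysis of lung cancer patients in the KCIS database, the rate of overdetected early lung cancer was 15% for adenocarcinoma, 12% for squamous cell carcinoma, 12% for unspecified lung cancer, and 5% for small cell lung cancer. (2) Non-smokers had a higher rate of overdetected early lung cancer than smokers. 5. Bayesian multistate regression model: applying the Bayesian multistate regression model to the national cancer registry and the KCIS database gave the following results. Community screening database (1) Modelling the incidence of progressive and non-progressive lung cancer together with mortality gave overdetection rates by histological type: 24% for adenocarcinoma, 10.1% for squamous cell carcinoma, 0.1% for small cell lung cancer, and 4.4% for unspecified cancer. (2) The annual death rate was 0.32 for adenocarcinoma, 0.54 for squamous cell carcinoma, 1.07 for small cell lung cancer, and 0.49 for unspecified cancer. After overdetected early lung cancer was taken into account, the death rate rose for every histological type, to 0.68 for adenocarcinoma, 0.83 for squamous cell carcinoma, 1.46 for small cell lung cancer, and 0.98 for unspecified cancer. National cancer registry database (1) Modelling the incidence of progressive and non-progressive lung cancer together with mortality gave an overdetection rate of 29% among adenocarcinoma, or 13% of all cancers. After overdetected early lung cancer was taken into account, the overall lung cancer death rate rose from 0.29 to 0.97.

Conclusion: This study applied a series of statistical methods to examine time trends in lung cancer incidence, case-fatality, and mortality in Taiwan over the past 30 years. Box-Jenkins time series forecasting and the Bayesian age-period-cohort regression model showed that the decline in age-standardized mortality is related to the decline in case-fatality. The rise in incidence reflects shifts among histological types: an increase in adenocarcinoma and small cell lung cancer and a decrease in squamous cell carcinoma. The rise in incidence was also influenced by a period effect, which is presumed to be overdetection of early lung cancer in particular age groups. This study proposes the zero-inflated Poisson regression model and the multistate Markov model as tools for assessing overdetected early lung cancer, and both indicate that overdetection is concentrated in adenocarcinoma. Overdetected early lung cancer may lead to overestimated survival. In particular, the better survival of female patients may be due not only to differences in histological type but also in part to overdetection.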
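The Box-Jenkins steps named in the abstract (differencing until the series is stationary, inspecting the sample ACF, and checking adequacy with the Ljung-Box statistic) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's code: the `rates` series is hypothetical and is not registry data, all function names are made up for this example, and the ARIMA(0,1,0) forecast is written as a random walk with drift, which is one common reading of that model (the abstract does not say whether a drift term was included).

```python
def difference(series, order=1):
    """Apply first differencing `order` times (the 'I' in ARIMA)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(series)
    acf = sample_acf(series, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def forecast_random_walk_with_drift(series, steps):
    """ARIMA(0,1,0) with drift: each step adds the mean first difference."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * h for h in range(1, steps + 1)]


# Hypothetical annual rates per 100,000 (illustration only, not thesis data).
rates = [30.0, 30.9, 31.5, 32.6, 33.1, 34.2, 34.8, 35.9, 36.5, 37.4]

diffed = difference(rates)  # first difference removes the linear trend
print("ACF of differenced series:", sample_acf(diffed, 3))
print("Ljung-Box Q (3 lags):", ljung_box_q(diffed, 3))
print("5-step forecast:", forecast_random_walk_with_drift(rates, 5))
```

In practice Q is compared with a chi-square quantile whose degrees of freedom are the number of lags minus the number of fitted AR and MA parameters; a small Q suggests that no autocorrelation is left in the series being checked.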

Parallel Abstract


ABSTRACT Background: Time trends in the incidence and mortality of lung cancer (LCa) are shaped by a complex interplay of causes: risk factors that change with time and lead to different histological types of LCa, and the advent of interventions at different levels, from primary prevention (e.g. smoking cessation) and secondary prevention (low-dose CT screening) to tertiary prevention with surgery and adjuvant therapy. Analyzing time trends of LCa with a series of statistical models is therefore useful not only for forecasting the future disease burden of LCa but also for examining whether the time trends of incidence and mortality vary with histological type (squamous cell carcinoma, adenocarcinoma, small cell carcinoma, and unspecified others); whether there is a gender difference in histology-specific LCa; how the incidence of LCa is explained by age, period, and cohort effects; how mortality of LCa is affected by incidence and by survival (case-fatality); and whether and how overdetection of early LCa resulting from screening, namely a period effect, can account for an increasing trend in the incidence of LCa. 
Objectives: Using nationwide aggregate data and primary data from a community-based integrated screening program, a series of statistical models for analyzing time trends of incidence and mortality was applied and developed: (1) a Box-Jenkins time series model for forecasting gender-specific overall and histology-specific incidence and mortality of LCa; (2) a decomposed incidence-fatality Box-Jenkins time series model for assessing how the time trend of mortality is affected by incidence and case-fatality of LCa; (3) a Bayesian age-period-cohort (APC) regression model to assess the relative impacts of age, period, and cohort on incidence and mortality of LCa; (4) a zero-inflated Poisson regression model to assess the extent of overdetection of early LCa by histological type, with and without smoking; (5) a Bayesian multistate Markov regression model for estimating the incidence rate of LCa, including progressive and non-progressive LCa, and the death rate of LCa, so as to assess the extent of overdetection of early LCa and to examine how that overdetection affects the death rate of LCa; (6) an adjusted survival model for assessing the gender difference in survival of LCa after adjustment for histological type distribution; and (7) a matched case-control study to identify possible risk factors. Methods: Two data sources were used: aggregate nationwide cancer registry data and primary community survey data (KCIS). Data on incident lung cancer cases during 1979-2012 were obtained from the Cancer Registry of the Health Promotion Administration, Ministry of Health and Welfare, Taiwan. Lung cancer mortality rates between 1979 and 2014 were also used. Histological type was recorded in the database from 1995 to 2012, and cancer stage for both sexes from 2004 to 2012. 
The KCIS database covers subjects who participated in the community-based surveys in Keelung, Taiwan, conducted from 2000 to 2007. Lung cancer patients were identified from the National Health Insurance Database according to ICD-9 and the International Classification of Diseases for Oncology, Second Edition (ICD-O-2). The statistical models indicated above include: (1) Box-Jenkins autoregressive integrated moving average (ARIMA) for modelling time trends: the decomposition method was used to determine whether there was a significant trend in the outcomes, and the orders (p and q) of the autoregressive (AR) and moving average (MA) terms were identified to specify the ARIMA model. Ljung-Box statistics were computed to assess model adequacy; (2) Bayesian age-period-cohort (APC) model for modelling incidence and mortality as a function of age, period, and cohort effects, with data aggregated by 5-year ranges: each cohort group was defined according to Bray’s method, cohort group (c) = total number of age groups (A) + period group (p) - age group (a), so that an individual cohort c can be followed diagonally. A Gaussian autoregressive prior model was used to smooth the age, period, and cohort effects and to extrapolate the period and cohort effects from their second autoregressive order. Gamma(0.001, 0.001) distributions were used as vague hyperpriors; (3) Zero-inflated Poisson for analyzing the fraction of possible overdetection of early lung cancer: this method was applied to the lung cancer cases from the KCIS data. The two model components were zero counts of death and lung cancer death. The zero-count part included overdetected early LCa and slow progression to death. Two logistic regression models were used to relate the proportion of extra zeros to histological type, with and without adjustment for smoking status. 
Both were combined with a zero-inflated Poisson regression model for the association between progression rate and the covariates age, gender, and histological type; (4) Two Markov regression models, without and with overdetection of early LCa, which treated histological type as a covariate, were developed to fit the nationwide archived data. The other two models were multi-state competing-risk Markov models, without and with overdetection of early LCa, which treated the different histological types as separate states and were applied to the KCIS data. To model the different incidence rates of lung cancer by histological type and the corresponding hazard rates from lung cancer to lung cancer death, we used the exponential regression form to relate histological types to the transition rates; (5) Survival adjustment, a graphic method yielding an attributable proportion model (APM), for calculating attributable survival over time according to histology; (6) Finally, a case-control study was designed with a 1:1 ratio. A total of 656 cases and 656 controls participated in this investigation. Males and females were analyzed separately. Cases and controls were matched for sex and birth year, with birth years within 2 years. Several exposures were analyzed, including alcohol use, betel nut use, cooking habits, and incense burning. Occupation was also analyzed. The McNemar test was used to evaluate possible risk factors with two categories. Conditional logistic regression was used to examine occupational differences in developing lung cancer. Results: 1. Time series Box-Jenkins model (1) Application of the Box-Jenkins ARIMA(1,1,0) model to nationwide aggregate data between 1980 and 2012 yielded an increasing trend in the age-standardized incidence rate and a decreasing trend in the case-fatality rate (the ratio of mortality to incidence), giving a stable age-standardized mortality rate modelled with ARIMA(0,1,0). 
(2) With the Box-Jenkins model, increasing trends were noted for adenocarcinoma and small cell carcinoma, plateau curves for squamous cell carcinoma, and a decreasing trend for the unspecified type, all of which imply a surge of adenocarcinoma. (3) The Box-Jenkins models predicted the age-standardized incidence rate to go from 45.71 per 100,000 in men and 26.57 per 100,000 in women in 2013 to 48.82 per 100,000 in men and 29.25 per 100,000 in women in 2017, and the age-standardized mortality rate to go from 34.34 per 100,000 in men and 17.2 per 100,000 in women in 2015 to 35.90 per 100,000 in men and 18.08 per 100,000 in women in 2019. (4) By histological type, the Box-Jenkins models predicted the age-standardized incidence rate to go from 30.77 per 100,000 in men and 32.35 per 100,000 in women in 2013 to 35.75 per 100,000 in men and 32.34 per 100,000 in women in 2017 for adenocarcinoma; from 14.8 per 100,000 in men and 2.08 per 100,000 in women in 2013 to 16.18 per 100,000 in men and 2.12 per 100,000 in women in 2017 for squamous cell carcinoma; and from 7.18 per 100,000 in men and 0.94 per 100,000 in women in 2013 to 7.99 per 100,000 in men and 1.04 per 100,000 in women in 2017 for small cell carcinoma. 2. Bayesian age-period-cohort regression model (1) Application of the Bayesian APC regression model gave similar findings but revealed a distinct decreasing trend in the age-standardized mortality rate after considering age, period, and cohort effects; (2) With the APC regression model, increasing trends were also seen for adenocarcinoma in both males and females and for small cell carcinoma in males, and decreasing trends for squamous cell carcinoma and for the unspecified type, all of which imply a surge of adenocarcinoma and a decline of squamous cell carcinoma. 
(3) The Bayesian age-period-cohort model indicated strong period and cohort effects; (4) Both period and cohort effects were clearly seen in females aged 50-64 years, indicating a possible period effect of overdetection of early LCa concentrated in this age band. 3. Adjusted survival model with histology type (1) Females had higher survival and more early-stage disease than males; (2) The gender difference in survival disappeared after adjustment for stage; (3) Histological type distribution accounted for only 33% of the disparity in survival between males and females. 4. Zero-inflated Poisson regression model (1) Applying the zero-inflated Poisson regression model to primary data from the KCIS program, the proportions of overdetected early LCa were 15% for adenocarcinoma, 12% for squamous cell carcinoma, 5% for small cell carcinoma, and 12% for the unspecified type. (2) Non-smokers were more likely than smokers to have overdetected early LCa. 5. Bayesian multistate Markov regression model Applying the Bayesian multistate Markov regression model to the two datasets gave the following results. (1) Primary data from the KCIS program A. Modelling the joint incidence of progressive and non-progressive LCa and the death rate gave the proportions of overdetected early LCa within each histological type, 24% for adenocarcinoma, 10.1% for squamous cell carcinoma, 0.8% for small cell carcinoma, and 14.8% for the unspecified type, and the proportions of overdetected early LCa among all LCa, 10.5% for adenocarcinoma, 0.1% for small cell carcinoma, and 4.4% for the unspecified type; B. 
The death rates (per year) were 0.54 for squamous cell carcinoma, 0.32 for adenocarcinoma, 1.07 for small cell carcinoma, and 0.49 for the unspecified type before adjusting for overdetection of early LCa, and the corresponding death rates rose to 0.83 for squamous cell carcinoma, 0.68 for adenocarcinoma, 1.46 for small cell carcinoma, and 0.98 for the unspecified type after adjusting for overdetection of early LCa; (2) Aggregate archival data from the nationwide cancer registry A. Modelling the joint incidence of progressive and non-progressive LCa and the death rate gave 29% overdetection of early LCa among adenocarcinoma and 13% among all LCa; B. The death rate was 0.29 before adjusting for overdetection of early LCa and rose to 0.97 after adjusting. Conclusion: This thesis demonstrates how to apply and develop a series of statistical models to analyze time trends of incidence, case-fatality, and mortality over the past three decades, from 1980 until 2012, using nationwide cancer registry data and community data. The Box-Jenkins model can be used to predict gender-specific overall and histology-specific incidence and mortality of LCa. The Bayesian APC model showed a decreasing trend in age-standardized mortality, attributed to a decreasing time trend in case-fatality, although an increasing time trend in incidence (the net result of a tremendous increase in adenocarcinoma and small cell carcinoma and a slight decrease in squamous cell carcinoma) was noted. The increasing time trend in incidence has been influenced by a period effect resulting from overdetection of early LCa in the age group eligible for screening. The zero-inflated Poisson regression model and the multistate Markov regression models were proposed for modelling the extent of overdetection of early LCa; the highest proportion was observed in adenocarcinoma, which may make the survival of LCa appear spuriously better unless overdetection of early LCa is adjusted for. 
Such overdetection of early LCa in adenocarcinoma may also account for the recently observed better survival in females compared with males that remains after the histological distribution has been taken into account.
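The zero-inflated Poisson idea used for overdetection can be made concrete with its probability mass function: a zero count arises either from a "structural zero" class (here read as overdetected early LCa that never progresses to death) with probability pi, or from an ordinary Poisson process with rate lam. The sketch below is a generic textbook form rather than the thesis's fitted model; the function names and the parameter values pi = 0.15 and lam = 0.5 are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson with mixing weight pi and rate lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the structural-zero class or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def prob_structural_zero(pi, lam):
    """P(structural zero | Y = 0): share of observed zeros due to inflation."""
    return pi / zip_pmf(0, pi, lam)


# Hypothetical parameters (illustration only, not estimates from KCIS).
pi, lam = 0.15, 0.5
print("P(Y=0):", zip_pmf(0, pi, lam))
print("P(Y=1):", zip_pmf(1, pi, lam))
print("P(structural | Y=0):", prob_structural_zero(pi, lam))
```

In a regression setting, pi is typically linked to covariates through a logistic model and lam through a log-linear model, which matches the pairing of logistic and Poisson components described in the Methods.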

References


Aberle, D. R., Adams, A. M., Berg, C. D., Black, W. C., Clapp, J. D., Fagerstrom, R. M., . . . Sicks, J. D. (2011). Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med, 365(5), 395-409. doi:10.1056/NEJMoa1102873
Bach, P. B., Mirkin, J. N., Oliver, T. K., Azzoli, C. G., Berry, D. A., Brawley, O. W., . . . Detterbeck, F. C. (2012). Benefits and harms of CT screening for lung cancer: a systematic review. JAMA, 307(22), 2418-2429. doi:10.1001/jama.2012.5521
Bray, I. (2002). Application of Markov chain Monte Carlo methods to projecting cancer incidence and mortality. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(2), 151-164. doi:10.1111/1467-9876.00260
Chang, J. S., Chen, L. T., Shan, Y. S., Lin, S. F., Hsiao, S. Y., Tsai, C. R., . . . Tsai, H. J. (2015). Comprehensive Analysis of the Incidence and Survival Patterns of Lung Cancer by Histologies, Including Rare Subtypes, in the Era of Molecular Medicine and Targeted Therapy: A Nation-Wide Cancer Registry-Based Study From Taiwan. Medicine (Baltimore), 94(24), e969. doi:10.1097/md.0000000000000969
Chen, W.-Q., Zheng, R.-S., & Zeng, H.-M. (2011). Bayesian age-period-cohort prediction of lung cancer incidence in China. Thoracic Cancer, 2(4), 149-155. doi:10.1111/j.1759-7714.2011.00062.x
