
Deep Learning-Based Hepatocellular Carcinoma Histopathology Image Classification: Accuracy versus Training Dataset Size

Advisor: 陳永耀

Abstract


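Worldwide, liver cancer causes more than 700,000 deaths each year and is the second leading cause of cancer death. Hepatocellular carcinoma (HCC) is the most common type of liver cancer in adults and accounts for the majority of deaths among patients with cirrhosis. Patients with early-stage liver cancer who can be treated by surgical intervention usually have a relatively good prognosis, so early diagnosis from pathology images is a necessary step in fighting liver cancer. In conventional manual pathology diagnosis, however, pathologists must spend considerable time and effort examining in detail the exact locations of cancer cells in a pathology image, and findings may differ between pathologists with different levels of experience. Both factors pose a major challenge to diagnostic accuracy and efficiency. In recent years, deep learning-based pathology image classifiers have been shown, in classification studies on other types of pathology images, to help pathologists diagnose faster and more accurately.

Previous studies indicate that the classification accuracy of deep learning is positively correlated with the amount of annotated training data. It is usually difficult, however, to determine how many pathology images must be annotated as training data to perform well in clinical diagnosis. Notably, a single pathology image often exceeds several billion pixels, which makes manual annotation far more expensive than for other medical images. Estimating the required amount of training data from the diagnostic requirements before annotation begins should therefore reduce annotation cost and workload. Yet there is still a lack of relevant studies to refer to when estimating the amount of training data needed for pathology images. This study therefore not only applies deep learning to the binary classification of HCC pathology images but also investigates the relationship between classification accuracy and the size of the annotated training dataset. The main contributions are as follows.

First, this study applies the GoogLeNet (Inception-V1) deep learning model to the binary classification of HCC pathology images. A model trained on 25 case images was tested on 4 case images not used in training and reached a classification accuracy of 91.37% (±2.49%), a sensitivity of 92.16% (±4.93%), and a specificity of 90.57% (±2.54%). In addition, using models trained on a single image, this study established that the diversity of HCC pathology images greatly affects classification accuracy on case images not used in training.

Second, this study examines the relationship between training image datasets of different sizes and the corresponding classification accuracy on case images not used in training. Based on this relationship, it applies an inverse power law function-based estimation model to estimate the minimum number of HCC pathology images that must be annotated to reach the classification accuracy required for clinical diagnosis. This estimate can serve as an important reference for the future collection and annotation of HCC pathology training data.

Third, this study examines the relationship between the number of training patches taken from an HCC pathology image and the corresponding classification accuracy for that image. On this basis, it proposes a Low Confidence Rate-based estimation method, which estimates, for a required classification accuracy, the minimum number of training patches that must be annotated in each HCC pathology image. This estimate gives pathologists a reference for the minimum number of patches to annotate.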
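The inverse power law function-based estimation model mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the functional form, so the code below assumes a commonly used learning-curve shape, acc(n) = a - b * n^(-c), where n is the number of annotated training images. The function names, the grid-search fitting procedure, and the data points are all hypothetical; the points are synthetic values generated from a known curve, not results from the thesis.

```python
import math


def fit_inverse_power_law(sizes, accuracies, exponents=None):
    """Fit acc(n) = a - b * n**(-c) (an assumed form, not taken from the thesis).

    For each candidate exponent c, the model is linear in x = n**(-c),
    so a and b follow from ordinary least squares. The c with the
    smallest squared error is kept.
    """
    if exponents is None:
        exponents = [k / 100 for k in range(5, 301)]  # c from 0.05 to 3.00
    mean_y = sum(accuracies) / len(accuracies)
    best = None
    for c in exponents:
        xs = [n ** (-c) for n in sizes]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
        slope = sxy / sxx
        a = mean_y - slope * mean_x  # asymptotic accuracy as n grows
        b = -slope                   # size of the gap at n = 1
        sse = sum((a - b * x - y) ** 2 for x, y in zip(xs, accuracies))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def required_training_size(a, b, c, target):
    """Invert the fitted curve: smallest n with a - b * n**(-c) >= target."""
    if target >= a:
        raise ValueError("target accuracy is not below the fitted asymptote")
    if b <= 0:
        raise ValueError("fitted curve does not increase with n")
    n = (b / (a - target)) ** (1 / c)
    return math.ceil(n - 1e-9)  # small tolerance against floating-point noise


# Synthetic learning curve generated from a = 0.95, b = 0.40, c = 0.50.
sizes = [1, 2, 4, 8, 16, 25]
accs = [0.95 - 0.40 * n ** -0.5 for n in sizes]

a, b, c = fit_inverse_power_law(sizes, accs)
needed = required_training_size(a, b, c, target=0.90)
print(f"a={a:.3f} b={b:.3f} c={c:.2f} images needed for 90%: {needed}")
```

With real data, measured accuracies are noisy, so the fitted asymptote and the extrapolated image count should be read as rough estimates that become more reliable as more points on the curve are observed.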

Parallel Abstract


Globally, liver cancer causes more than 700,000 deaths each year and is the second leading cause of cancer death. Hepatocellular carcinoma (HCC) is the most common type of liver cancer in adults and accounts for most deaths in patients with cirrhosis. Patients with early-stage liver cancer can be treated by surgical intervention with a good prognosis; thus, early diagnosis, confirmed by liver pathology examination, is necessary to fight HCC. Conventional manual pathology examination requires considerable time and labor, even with established expertise. It is widely accepted that deep learning-based classifiers may prove useful in the diagnostic process. Although the classification accuracy of deep learning is positively correlated with the amount of training data, it is often uncertain how much training data is required for deep learning to achieve satisfactory clinical diagnostic performance. Notably, annotating gigapixel histopathology images is costly and burdensome, which makes annotated histopathology training data relatively difficult and expensive to obtain. Hence, estimating the required training dataset size before annotation begins should reduce the workload and improve annotation efficiency. However, there is no effective method for estimating the required training dataset size for HCC histopathology images.

The main contributions of our study are as follows. First, we apply GoogLeNet (Inception-V1) to classify HCC histopathology images. On new images, the 25-Image Training Model reached 91.37% (±2.49%) accuracy, 92.16% (±4.93%) sensitivity, and 90.57% (±2.54%) specificity. Moreover, using single-image training, we determined that the diversity of HCC histopathology images greatly affects the testing accuracy for new images. Next, we investigated the relationship between the testing accuracy for new images and the number of training images. We then applied an inverse power law function-based fitting curve to estimate the minimum number of annotated HCC training images required to achieve the desired diagnostic accuracy. This number can serve as a reference for training image collection and annotation. Finally, we explored the relationship between the testing accuracy and the number of training patches for a given whole slide image, and we propose a Low Confidence Rate-based estimation method. This method determines the number of training patches required for a given whole slide image to achieve the desired testing accuracy, and thus the minimum number of patches to annotate for that image.
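For readers less familiar with the three figures reported in the abstract, the sketch below shows the standard definitions of accuracy, sensitivity, and specificity for a binary classifier, computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not data from the thesis, and the abstract does not state how the ± values were obtained.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp / fn: tumor patches classified correctly / missed
    tn / fp: non-tumor patches classified correctly / flagged as tumor
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true positive rate
        "specificity": tn / negatives,   # true negative rate
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity alongside accuracy matters here because a classifier can reach high accuracy on an imbalanced set of patches while still missing many tumor regions.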

