
Semantic Segmentation in Endoscopy Surgery: Using Data Augmentation and Intermediate Layer Supervision to Train Deep Neural Net with Few Data

Advisor: 施吉昇

Abstract


With the rise of endoscope-based minimally invasive surgery, many studies have tried to provide surgical staff with real-time assistance by analyzing and processing endoscopic images. Among these problems, we address semantic segmentation of endoscopic images, because segmentation can supply important information to many applications, such as virtual reality (VR), augmented reality (AR), and endoscopic simultaneous localization and mapping (SLAM). Semantic segmentation of urban and natural scenes has accumulated a large body of research, but few studies have touched on endoscopic surgery because sufficient training data is lacking. In this work, we propose a data augmentation method and a supervision scheme that acts on intermediate layers of the network, in order to address the problem of training a deep network with little data. Experimental results show that the proposed methods improve network accuracy more effectively than commonly used data augmentation techniques.
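The abstract does not specify which transformations the proposed augmentation uses, so the following is only a generic sketch of label-preserving segmentation augmentation: geometric transforms (flips, 90-degree rotations) applied identically to image and mask, plus a photometric jitter applied to the image only. The function name `augment` and the jitter range are illustrative assumptions, not the thesis's actual scheme.

```python
import numpy as np

def augment(image, mask, rng):
    """Apply one random, label-preserving augmentation to an (image, mask) pair.

    image: float array (H, W, C) with values in [0, 1]
    mask:  int array (H, W) of per-pixel class labels
    rng:   numpy Generator, for reproducible randomness
    """
    # Geometric transforms must hit image and mask identically,
    # otherwise pixels and labels fall out of alignment.
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    k = int(rng.integers(0, 4))                  # 0-3 quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Photometric jitter changes appearance only, so labels stay untouched.
    image = np.clip(image * rng.uniform(0.8, 1.2), 0.0, 1.0)
    return image.copy(), mask.copy()
```

Because every transform here is label-preserving, the augmented pairs can be fed to training exactly like the originals, multiplying the effective size of a small dataset.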

Parallel Abstract (English)


With the increasing popularity of endoscope-based minimally invasive surgery, many have tried to provide surgeons with real-time assistance by processing video frames from the endoscope. We aim at a particular problem, endoscopic semantic segmentation, which can provide important information to other applications such as VR or endoscope-SLAM. While semantic segmentation in other scenarios, e.g. urban or natural scenes, has been studied intensively, little work has reached the area of endoscopic surgery due to the lack of a large-scale, finely annotated dataset. In this work, we address the problem of training a deep neural network with little training data in the setting of endoscopic surgery by introducing an aggressive data augmentation technique and an additional loss term applied to intermediate layers of the network. Experimental results show that our proposed methods improve network performance more effectively than commonly used data augmentation on an endoscopic surgery dataset, and improve the performance of a state-of-the-art network by 4.67 mIoU (%).
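The "additional loss term applied to intermediate layers" can be sketched as an auxiliary segmentation loss: an intermediate feature map produces its own low-resolution prediction, which is compared against a downsampled label map and added to the main loss with a small weight. This is only a minimal numpy illustration of that idea; the function names, the nearest-neighbour label downsampling, and the 0.4 auxiliary weight are assumptions, not values taken from the thesis.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pixel_ce(logits, labels):
    # Mean per-pixel cross-entropy. logits: (H, W, C); labels: (H, W) ints.
    probs = softmax(logits)
    h, w = labels.shape
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(picked + 1e-12).mean())

def supervised_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Total loss = main segmentation loss + weighted intermediate-layer loss.

    main_logits: (H, W, C) logits from the network's final output
    aux_logits:  (h, w, C) logits from an auxiliary head on an intermediate
                 layer, typically at lower spatial resolution
    labels:      (H, W) ground-truth class map
    """
    ah, aw, _ = aux_logits.shape
    h, w = labels.shape
    # Nearest-neighbour downsample the labels to the auxiliary resolution.
    ys = np.arange(ah) * h // ah
    xs = np.arange(aw) * w // aw
    aux_labels = labels[ys[:, None], xs[None, :]]
    return pixel_ce(main_logits, labels) + aux_weight * pixel_ce(aux_logits, aux_labels)
```

The extra term gives earlier layers a direct gradient signal instead of one that has been attenuated through the whole decoder, which is precisely the kind of regularization that helps when training data is scarce.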

