
深度學習模型視覺化的統計觀點

Statistical Viewpoints for Deep Learning Model Visualization

Advisors: 陳素雲, 謝文萍
The full text will become available for download on 2026/10/21.

Abstract


In recent years, machine learning and deep learning have benefited from advances in hardware and algorithms and have made major progress on problems that traditional statistical models struggled with, such as speech and image recognition. However, the highly complex model architectures also make the resulting decisions hard to explain. In some application settings, such as medical ones, the inability to provide a clear explanation creates problems of accountability and credibility. Researchers, for their part, hope to use model explanations to correct and improve their models.

Against this background, a number of interpretation methods for visualizing image recognition models already exist. However, these methods explain models in different ways, and there is no standard interpretation criterion or rigorous justification; each proposes the scheme its authors consider workable.

In this thesis, we select several of the currently popular explanation methods and, taking statistical models that are easier to analyze as a starting point, attempt to derive the statistical meaning of each method and examine its reasonableness. In addition, building on these frameworks, we use the MNIST and Chest X-ray datasets to compare the methods and verify the derived results.

The results show that although the methods are explanatory to some extent, most of them still frequently produce uninterpretable results with more complex models and more heterogeneous data. On real data, the effect of an explanation depends heavily on the choice of model, the data being explained, and the presentation of the images.
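To make the flavor of such a derivation concrete, here is a minimal illustration of my own (it is not taken from the thesis): for a logistic regression classifier, the most basic gradient-based saliency map reduces to the model's coefficient vector, so each input feature's attributed importance coincides with its regression weight up to a positive scalar.

% Illustrative derivation (not from the thesis): gradient saliency for
% logistic regression reduces to the regression coefficients.
\[
  f(x) = \sigma(w^{\top}x + b), \qquad \sigma(t) = \frac{1}{1 + e^{-t}},
\]
\[
  \nabla_{x} f(x) = \sigma'(w^{\top}x + b)\,w = f(x)\bigl(1 - f(x)\bigr)\,w ,
\]
so the saliency map is proportional to the estimated coefficients $w$, with a scaling factor that depends only on the predicted probability.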

Abstract (in English)


In recent years, machine learning and deep learning have benefited from the development of hardware and algorithms and have made significant progress on problems that were difficult to handle with statistical models in the past, such as speech and image recognition. However, the overly complex model architectures also make the resulting decisions difficult to interpret. In some application scenarios, the inability to give a clear explanation leads to credibility issues. Researchers, on the other hand, hope to correct and enhance their models through model interpretation.

Based on this background, several interpretation methods for visualizing image recognition models already exist. However, these methods interpret models differently, there is no standard interpretation criterion or rigorous justification, and each proposes the solution its authors consider feasible.

In this paper, we select several explanation methods that are popular today, try to derive their statistical meaning on the basis of statistical models that are easier to analyze, and examine their reasonableness. In addition, based on these frameworks, the MNIST and Chest X-ray datasets are used to compare the methods and verify the derived results.

The results show that although there is an explanatory effect to a certain extent, most methods still often produce unexplainable results with more complex models and more heterogeneous data. The real-data examples show that the effect of an interpretation largely depends on the choice of model, the dataset used for interpretation, and the particular images picked for presentation.
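For readers who want to see what one of these explanation methods looks like in practice, the following is a minimal sketch of a vanilla gradient saliency map in the spirit of Simonyan et al. (2013), one of the methods listed in the references. It is my own illustration rather than code from the thesis; the PyTorch model and input tensor are placeholders, and the MNIST-sized shapes in the usage example are assumptions.

import torch

def gradient_saliency(model, x, target_class):
    """Return |d score_target / d x| as a per-pixel importance map."""
    model.eval()
    x = x.clone().requires_grad_(True)       # track gradients w.r.t. the input
    score = model(x)[0, target_class]        # scalar class score for one image
    score.backward()                         # fills x.grad with d score / d x
    return x.grad.abs().max(dim=1)[0]        # collapse channels -> (1, H, W)

# Example usage with stand-ins shaped like an MNIST image:
# model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
# x = torch.randn(1, 1, 28, 28)
# heatmap = gradient_saliency(model, x, target_class=3)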

References


Mahendran, A. and Vedaldi, A. (2016). Visualizing deep convolutional neural networks using natural pre-images. International Journal of Computer Vision, 120(3):233–255.

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626.

Simonyan, K., Vedaldi, A., and Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806.

Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR.

Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., and Lipson, H. (2015). Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579.

Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer.

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929.
