
Analyzing and Enhancing the Interpretability of Deep Neural Network Models with Bayesian Networks

Bayesian Network-based Interpretability on the Deep Neural Nets

Advisor: 藍俊宏

Abstract


With the breakthroughs in deep neural network algorithms and the steady growth of hardware and software computing power, the bottlenecks that once limited the training of artificial neural networks have been overcome, giving rise to large-scale research and industrial applications with remarkable results. However, given a well-trained deep neural network that outputs accurate predictions, the computation across the weights connecting its hidden layers involves both linear and nonlinear transformations, so the meaning of the model parameters and the basis of its decisions usually cannot be known. The reasoning process is therefore often criticized as a black box, which greatly lowers the model's credibility and its adoption. Moreover, in many application scenarios the variables at the input layer carry precise physical meaning and interpretability for the user, so tracing back to the input layer and identifying the important factors is one of the main goals pursued by researchers.

This thesis first reviews the current development of deep neural network interpretability and then uses Bayesian network models to analyze an already trained deep learning model. By peeling off and reasoning over the network layer by layer, an analytical framework is developed that proceeds from the output layer, through the hidden layers, and finally back to the input layer. Neural network models are trained on the MNIST and Fashion MNIST datasets, and the important features inferred by the Bayesian network algorithm, i.e., the pixels of the images, are presented visually. In this way the decision logic inside the black box is explained and the input variables that significantly influence the model are identified, increasing the credibility and interpretability of deep learning and strengthening industrial confidence in adopting deep neural network technology.
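The framework described above starts from a trained network whose layer activations can be read out and discretized before any layer-to-layer Bayesian analysis is attempted. The following is a minimal sketch of that preparatory step, assuming a small Keras dense network on MNIST; the layer sizes, the names hidden1 and hidden2, and the simple 0/1 activation thresholding are illustrative assumptions, not the thesis's original implementation.

# A minimal sketch (not the thesis's original code): train a small dense
# network on MNIST, then read out and binarize the activations of every
# hidden layer so that they can later serve as discrete nodes in
# layer-to-layer probabilistic models.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu", name="hidden1"),
    tf.keras.layers.Dense(64, activation="relu", name="hidden2"),
    tf.keras.layers.Dense(10, activation="softmax", name="output"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=2)

# Build a second model that exposes every intermediate layer's output.
extractor = tf.keras.Model(inputs=model.inputs,
                           outputs=[layer.output for layer in model.layers])
activations = extractor.predict(x_test, verbose=0)

# Discretize: a ReLU unit is treated as "on" (1) when its activation is
# positive, "off" (0) otherwise; input pixels are thresholded at 0.5.
binary_layers = {
    "input": (x_test > 0.5).astype(np.int8),
    "hidden1": (activations[0] > 0).astype(np.int8),
    "hidden2": (activations[1] > 0).astype(np.int8),
    "output": np.argmax(activations[2], axis=1),   # predicted class labels
}
print({k: v.shape for k, v in binary_layers.items()})

Binarizing ReLU activations is only one possible discretization; any scheme that turns continuous activations into a small number of states would serve the same purpose of making the layers usable as discrete random variables.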

Parallel Abstract


The breakthroughs in artificial neural network algorithms, coupled with the increasing computing power of hardware and software, have made it possible to train Deep Neural Nets (DNNs). The conventional bottleneck, i.e., vanishing gradients, has been surmounted, leading to large-scale, active research and industrial applications. However, given a well-trained DNN that can output accurate predictions, it is not easy to reason about how information propagates between the hidden layers. The nonlinear transformations embedded in the crisscrossing weights make it almost impossible to know the basis of the model's judgments. The reasoning process is therefore often criticized as a black box, which severely reduces the credibility of the model and its adoption in many application scenarios, even though the variables in the input layer usually carry precise physical meaning and interpretation for the users. In this thesis, the Bayesian network model is used to parse the information propagated within a trained DNN. Layer-to-layer Bayesian models are built and integrated into an analytical framework that proceeds from the output layer, through the hidden layers, and finally to the input layer. The MNIST and Fashion MNIST datasets are employed to validate the proposed framework, and the black box is explained by presenting the essential features, i.e., the pixels of the image, in a visualized way. In summary, an explainable model can increase interpretability and enhance confidence in industrial DNN applications.
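To make the backward, layer-to-layer reasoning concrete, the sketch below continues from the binarized activations produced in the earlier sketch. It is a deliberately simplified, numpy-only stand-in for the Bayesian analysis: each parent-layer node is scored by how much its being active shifts the conditional probability of the already-selected child-layer nodes, the top-k nodes are kept, and the procedure repeats toward the input layer, ending in a pixel heatmap. The class index, the k values, and the scoring rule are illustrative assumptions, not the algorithm developed in the thesis.

# A simplified backward trace over the binary_layers dictionary built above.
# Parent nodes are scored by |P(child on | parent on) - P(child on)|, averaged
# over the currently selected child nodes; this is a crude surrogate for full
# Bayesian network structure learning and inference.
import numpy as np
import matplotlib.pyplot as plt

def relevance_scores(parent, child_subset):
    """Score each parent node by how strongly it shifts the activation
    probability of the selected child nodes."""
    p_child = child_subset.mean(axis=0)                   # P(child_j = 1)
    scores = np.zeros(parent.shape[1])
    for i in range(parent.shape[1]):
        on = parent[:, i] == 1
        if on.sum() == 0:
            continue
        p_child_given_on = child_subset[on].mean(axis=0)  # P(child_j = 1 | parent_i = 1)
        scores[i] = np.abs(p_child_given_on - p_child).mean()
    return scores

target_class = 3
mask = binary_layers["output"] == target_class            # samples predicted as "3"
class_indicator = mask.astype(np.int8)[:, None]

# Step 1: which hidden2 units matter most for the target class?
s2 = relevance_scores(binary_layers["hidden2"], class_indicator)
top2 = np.argsort(s2)[-10:]

# Step 2: which hidden1 units matter most for those hidden2 units?
s1 = relevance_scores(binary_layers["hidden1"], binary_layers["hidden2"][:, top2])
top1 = np.argsort(s1)[-20:]

# Step 3: which input pixels matter most for those hidden1 units?
s0 = relevance_scores(binary_layers["input"], binary_layers["hidden1"][:, top1])

plt.imshow(s0.reshape(28, 28), cmap="hot")
plt.title(f"Pixels traced back for class {target_class} (illustrative)")
plt.colorbar()
plt.show()

The greedy top-k selection keeps the example short; replacing it with per-layer Bayesian network structure learning and exact inference, as the framework in the abstract describes, would follow the same output-to-input direction.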

