Convolutional neural networks (CNNs) are known to be vulnerable to adversarial attacks: well-crafted perturbations to the input can mislead even a state-of-the-art CNN into making wrong decisions. There is therefore a pressing need for methods that can test or detect the vulnerabilities of CNNs. In this study, we present an adversarial attack method, called Dual Iterative Fusion (DIF) with Critical Pixels, to detect such vulnerabilities. DIF modifies as few as five pixels in a 32x32 image and achieves fast and effective targeted attacks for testing CNNs. Using DIF, we observe that in many classical image-classification CNNs, some classes are more vulnerable than others, i.e., some classes are more susceptible to misclassification under adversarial attack. For example, in the VGG19 trained for this study, the vulnerable class is Cat, with a successfully-targeted attack rate of 57.01%, while all other classes are below 25%; in the ResNet18, the vulnerable class is Plane, with a successfully-targeted attack rate of 37.08%, while all other classes are below 12%. These classes should be considered vulnerabilities of the CNNs, and they can be pinpointed by the test images DIF generates. The issue can be mitigated by retraining the CNNs with the adversarial images generated by DIF; after retraining, the misclassification rate of a vulnerable class drops from 61.67% to 6.37% in the most pronounced case.
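The abstract does not spell out DIF's procedure, but the core idea of a few-pixel targeted attack can be illustrated with a minimal sketch: search for a perturbation of at most five pixels that raises the model's confidence in an attacker-chosen target class. The random-search loop and the toy linear "classifier" below are illustrative stand-ins, not the authors' method, which attacks real CNNs such as VGG19 and ResNet18.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in classifier: a fixed random linear map plus softmax over 10
# classes. (Hypothetical; only used so the sketch is self-contained.)
W = rng.normal(size=(10, 32 * 32 * 3)) * 0.01

def predict(img):
    """Return class probabilities for a 32x32x3 image in [0, 1]."""
    logits = W @ img.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

def few_pixel_targeted_attack(img, target, n_pixels=5, iters=200):
    """Random-search targeted attack perturbing at most n_pixels pixels.

    Keeps the candidate that most increases the target-class probability;
    each candidate overwrites n_pixels pixel locations of the clean image.
    """
    best = img.copy()
    best_p = predict(best)[target]
    for _ in range(iters):
        cand = img.copy()
        ys = rng.integers(0, 32, n_pixels)
        xs = rng.integers(0, 32, n_pixels)
        cand[ys, xs] = rng.uniform(0.0, 1.0, size=(n_pixels, 3))
        p = predict(cand)[target]
        if p > best_p:
            best, best_p = cand, p
    return best, best_p

img = rng.uniform(0.0, 1.0, size=(32, 32, 3))
target = 3
adv, p_adv = few_pixel_targeted_attack(img, target)
```

On a real CNN, `predict` would wrap a forward pass, and the attack is counted as successfully targeted only if the perturbed image is actually classified as the target class; the per-class rate of such successes is what the 57.01% and 37.08% figures above measure.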