An Android Malware Detection Framework Using Machine Learning

Effectively Detecting Android Malware with Machine Learning

Advisor: 洪士灝

Abstract


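Mobile applications are increasingly widespread today, enriching our lives and making them more convenient. However, as the number of applications grows, the number of malicious applications grows as well. According to statistics from G DATA, more than half of malicious applications target financial applications, and installing them causes users to lose money. Machine learning has been widely applied to malware detection. Most papers use trigrams to extract application behavior as the input to machine learning, but only a few achieve good accuracy on very large datasets. This thesis examines the shortcomings of trigrams and proposes a novel input format, paired with a convolutional neural network (CNN) as the machine learning algorithm, to address the problem of low accuracy. To our knowledge, this is the first thesis to apply the CNN algorithm to malware detection and to discuss the entire network architecture in full. With the input format we designed, the CNN achieves an effect similar to k-skip-n-gram and learns more complex behaviors for detecting malware. Our experimental results show that the new input format achieves good and stable accuracy across CNN architectures with different configurations. Using 32,000 applications, the proposed architecture achieves 93.012% prediction accuracy and a 12.9% misclassification rate. By integrating CNN and SVM, two machine learning algorithms with different characteristics, we lower the misclassification rate to 3%. Finally, we combine the architecture with NVIDIA's low-power development board, the Jetson-TK1, for malware training and prediction. Although training time increases, the overall power consumption of the architecture is effectively reduced.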
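
The abstract contrasts trigram features with k-skip-n-gram features. As a rough illustration of the difference, the sketch below extracts both from a short token sequence. The token names and the helper functions `ngrams` and `skip_ngrams` are hypothetical and are not taken from the thesis. The definition used here (at most `k` tokens skipped in total inside one gram) is one common convention, and the thesis may define it differently.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_ngrams(tokens, n, k):
    """Return all k-skip-n-grams: ordered n-token subsequences in which
    at most k tokens in total are skipped between the first and last token.
    (Assumed definition; k = 0 reduces to ordinary n-grams.)"""
    grams = []
    for start in range(len(tokens)):
        # A gram starting at `start` may span at most n + k positions.
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


# Hypothetical opcode-like tokens, for illustration only.
ops = ["invoke", "move", "const", "invoke", "return"]

trigrams = ngrams(ops, 3)
skip_trigrams = skip_ngrams(ops, 3, 1)

print(len(trigrams), "trigrams;", len(skip_trigrams), "1-skip-trigrams")
```

Every trigram is also a 1-skip-trigram, but the skip variant additionally captures patterns such as `("invoke", "const", "invoke")` that are separated by an intervening token, which is why skip-grams can describe behavior more flexibly than plain trigrams.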

Parallel Abstract


Mobile applications are increasingly popular and make our lives more convenient. Unfortunately, as the number of applications grows, so does the number of malicious applications, also known as malware. Moreover, according to G DATA statistics from 2015, more than half of all malware is financially motivated and causes substantial monetary losses. Machine learning techniques have been adopted to identify malware, and most prior work uses trigrams as the input format to extract behavior patterns from mobile applications, but only a few of these works achieve good performance on a large dataset. In this thesis, we discuss the weaknesses of trigram-based machine learning methods and improve the accuracy of malware detection by adding a new machine learning method based on the convolutional neural network (CNN) with a novel flattened input format. To our knowledge, this is the first work to discuss the usability of CNNs for malware detection. With the proposed flattened input format, our CNN scheme can perform a k-skip-n-gram dimensionality reduction, which learns more flexible and complex patterns than the trigram-based methods do, allowing it to detect different types of malware. Our experimental results show that the flattened input format yields good and stable accuracy with a simple CNN topology under different configurations. With 32,000 applications in our training set, the CNN achieves 93.01% prediction accuracy and a 12.9% false negative rate (FNR). After examining the CNN results, we reduce the FNR to 3% by aggregating the CNN with an SVM while retaining similar accuracy. We also demonstrate that running the CNN on an NVIDIA Jetson-TK1 saves half of the power consumption compared with modern graphics cards, which opens up a new application scenario: low-cost, pervasive malware detection, even on mobile platforms.
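
The abstract reports accuracy and FNR, and states that aggregating the CNN with an SVM lowers the FNR. The sketch below shows how these two metrics are computed and how one possible aggregation rule behaves. The abstract does not say which rule the thesis uses, so the OR rule here (flag an app if either classifier flags it) is an assumption, and the labels and predictions are made-up toy data, not results from the thesis.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), where label 1 means malware."""
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return fn / (fn + tp)


def aggregate_or(pred_a, pred_b):
    """Assumed aggregation rule: predict malware if either model does."""
    return [int(a or b) for a, b in zip(pred_a, pred_b)]


# Toy data for illustration only: 1 = malware, 0 = benign.
y_true   = [1, 1, 1, 1, 0, 0, 0, 0]
cnn_pred = [1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical CNN output
svm_pred = [0, 1, 1, 0, 0, 0, 0, 1]  # hypothetical SVM output

combined = aggregate_or(cnn_pred, svm_pred)

print("CNN      acc/FNR:", accuracy(y_true, cnn_pred), false_negative_rate(y_true, cnn_pred))
print("Combined acc/FNR:", accuracy(y_true, combined), false_negative_rate(y_true, combined))
```

An OR rule can only remove false negatives, at the cost of possibly adding false positives, so it lowers the FNR while accuracy may stay roughly the same, as it does in this toy data.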

