The computing power and connectivity of personal computers have increased dramatically in recent years. As a result, attackers on the Internet no longer treat servers as their primary targets. They instead build Botnets by infecting personal computers with worms and Trojan horses, then control the victim machines through a C&C channel to carry out malicious activities for profit. The resulting harm to the Internet is a global problem. Botnets have recently adopted encrypted traffic and domain fluxing to hide their communications. Against Botnets that use these techniques, blocking C&C connections with a DNS domain name blacklist is one of the most effective defensive strategies, so distinguishing malicious Botnet domain names from legitimate ones is a critical issue in countering the Botnet threat. This paper applies lexical analysis to DNS domain names and trains and evaluates decision tree models on different feature combinations to find the combination of lexical features best suited to identifying malicious Botnet domain names. We propose five main features: domain name length, syllable count, vowel count, vowel ratio, and character repetition count. Experimental results show that, when features are compared in pairs, vowel ratio and vowel count best separate blacklist samples from whitelist samples. The same combination also gives the best false positive and false negative rates, 0.01% and 4.3% respectively, which shows that the proposed approach can effectively identify malicious domain names.
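The five lexical features and the two error rates named above can be illustrated with a minimal Python sketch. The abstract does not define how each feature is computed, so everything here is an assumption made for illustration: the function names `lexical_features` and `error_rates` are hypothetical, only the leftmost label of the domain is analyzed, syllables are approximated by counting runs of consecutive vowels, and character repetition is taken as the number of characters beyond the first occurrence of each distinct character.

```python
import re

VOWELS = set("aeiou")


def lexical_features(domain: str) -> dict:
    """Compute illustrative lexical features for one domain name."""
    # Assumption: only the leftmost label is analyzed, lowercased
    # (e.g. "Example.com" -> "example").
    label = domain.lower().split(".")[0]
    length = len(label)
    vowel_count = sum(1 for ch in label if ch in VOWELS)
    # Assumption: a syllable is approximated by one maximal run of vowels.
    syllable_count = len(re.findall(r"[aeiou]+", label))
    vowel_ratio = vowel_count / length if length else 0.0
    # Assumption: repetition = occurrences beyond the first of each character.
    repeat_count = length - len(set(label))
    return {
        "length": length,
        "syllable_count": syllable_count,
        "vowel_count": vowel_count,
        "vowel_ratio": vowel_ratio,
        "repeat_count": repeat_count,
    }


def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate).

    Labels: 1 = malicious (blacklist), 0 = benign (whitelist).
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    negatives = sum(1 for t, _ in pairs if t == 0)
    positives = sum(1 for t, _ in pairs if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

Under these assumptions, a feature vector such as `lexical_features("google.com")` could be fed to any decision tree implementation, and `error_rates` applied to its predictions on held-out blacklist and whitelist samples. The definitions actually used in the paper may differ.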