Traffic Classification Based on the Statistical Behavior of Applications

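Network traffic classification is a fundamental and important task in network management. In the past, administrators mostly identified applications by checking the network port numbers they used or by matching application-level signatures. However, as the number of applications grows and more of them use dynamic ports or packet encryption, these two methods can no longer guarantee correct classification. In this dissertation, we therefore propose four traffic classification methods based on the externally observable statistical behavior of network applications: (1) a flow-based method that classifies by differences in packet size distribution, (2) a session-based method that classifies by differences in packet size distribution, (3) a flow-based method that classifies by the sequence in which packets appear, and (4) a method that combines packet size distribution with packet sequence. Because network traffic can also be used for (1) field trials and replay testing of networking products and (2) code testing of network services, we address these two applications through (1) "problem traffic that readily triggers product failures" and "downsizing of problem traffic", and (2) "code test coverage" and "traffic diversity evaluation".

Regarding (1), reducing the traffic that triggers failures in networking products is necessary to shorten replay time and to reproduce failures in the device under test efficiently. We propose linear and binary downsizing algorithms that reduce the size of trace files, and we use the downsizing ratio as the metric for evaluating how effective they are. Regarding (2), we define "traffic diversity" and propose a method for calculating traffic diversity and analyzing code test coverage. This method measures the test coverage of network service code using traffic that contains different numbers of packets and was collected from network segments of different sizes.

Experimental results show that the session-based packet size distribution method can make a decision within 300 packets, with an average accuracy of 99.98% and a throughput of about 400 Mbps. The packet sequence method needs at most the first 15 packets of each flow to decide, with an average accuracy of 94.99% and an equivalent throughput of about 800 Mbps. The combined method achieves 94.12% accuracy and a throughput of about 723 Mbps. On the other topics, the distribution of failures triggered in networking products shows that ICMP and HTTP account for 60% of all networking device failures. Our linear and binary downsizing achieve downsizing ratios of 77% and 80% on the traffic, respectively. Testing time decreases in the same proportion, which shows that our downsizing methods substantially improve the efficiency of networking product testing. For code test coverage, we find that traffic richness increases with the number of packets and the size of the network segment, and test coverage increases in the same way. Building on these results, we used hardware design techniques to simulate a hardware implementation of the proposed classification methods. The goal was to accelerate statistical-behavior-based traffic classification and to locate its bottlenecks. The simulation shows that the packet sequence method makes about 77.4% of the final decisions and the packet size distribution method about 22.6%, with a throughput exceeding 80 Gbps.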
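The abstract describes the classifiers only at a high level. The Python below is a minimal sketch of how a size-distribution step and a size-sequence step could be combined. It assumes a fixed set of size bins and an L1 distance between normalized histograms for the distribution step, and an exact lookup on the first n message sizes for the sequence step. All names (`BIN_EDGES`, `msdc`, `mssc`, `hybrid`, `max_dist`) and the sequence-first, distribution-fallback order are hypothetical and are not taken from the dissertation. A session-based variant would presumably pool the sizes of all flows in a session before building the histogram.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical histogram bin upper edges in bytes; larger sizes fall into an overflow bin.
BIN_EDGES = (64, 128, 256, 512, 1024, 1500)


def size_histogram(sizes: Sequence[int]) -> List[float]:
    """Return the normalized message-size histogram of one flow (or one pooled session)."""
    counts = [0] * (len(BIN_EDGES) + 1)
    for size in sizes:
        # Index of the first edge the size fits under, or the overflow bin.
        idx = next((i for i, edge in enumerate(BIN_EDGES) if size <= edge), len(BIN_EDGES))
        counts[idx] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty flow
    return [c / total for c in counts]


def msdc(sizes: Sequence[int], profiles: Dict[str, List[float]],
         max_dist: float = 0.5) -> Optional[str]:
    """Distribution step (assumed): nearest reference histogram by L1 distance."""
    hist = size_histogram(sizes)
    best_app, best_dist = None, float("inf")
    for app, ref in profiles.items():
        dist = sum(abs(a - b) for a, b in zip(hist, ref))
        if dist < best_dist:
            best_app, best_dist = app, dist
    # Reject the match if even the nearest profile is too far away.
    return best_app if best_dist <= max_dist else None


def mssc(sizes: Sequence[int], signatures: Dict[Tuple[int, ...], str],
         n: int = 15) -> Optional[str]:
    """Sequence step (assumed): exact lookup of the first n message sizes."""
    return signatures.get(tuple(sizes[:n]))


def hybrid(sizes: Sequence[int], signatures: Dict[Tuple[int, ...], str],
           profiles: Dict[str, List[float]], n: int = 15) -> Optional[str]:
    """Try the early sequence match first, then fall back to the distribution match."""
    app = mssc(sizes, signatures, n)
    return app if app is not None else msdc(sizes, profiles)


if __name__ == "__main__":
    signatures = {(100, 200, 300): "appA"}  # hypothetical sequence signature
    profiles = {"bulk": [0, 0, 0, 0, 0, 1.0, 0], "chat": [1.0, 0, 0, 0, 0, 0, 0]}
    print(hybrid([100, 200, 300, 50], signatures, profiles, n=3))  # sequence match
    print(hybrid([1400] * 10, signatures, profiles, n=3))          # distribution match
```

Trying the sequence lookup first lets a flow be labeled after only n packets. The histogram, by contrast, needs enough packets to be meaningful. A flow that matches neither step returns `None` rather than a forced guess.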

Keywords

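traffic classification; packet size distribution; packet transmission sequence; traffic diversity; trace file downsizing; hardware design simulation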

Parallel Abstract (English)

Classifying network flows into applications is a fundamental requirement for network administrators. Administrators used to classify network applications by examining transport-layer port numbers or application-level signatures. However, most emerging network applications can encrypt their traffic or send it with randomized port numbers, which makes these applications difficult to detect and manage. In this dissertation, we propose four statistics-based solutions: (1) flow-based message size distribution classification (MSDC(f)), (2) session-based message size distribution classification (MSDC(s)), (3) flow-based message size sequence classification (MSSC(f)), and (4) Hybrid, which combines the advantages of MSDC and MSSC. Traffic traces can also serve other purposes, such as (1) networking device testing and (2) code coverage evaluation. For (1), to reproduce the failures of networking devices efficiently and reduce replay time, we propose binary and linear downsizing algorithms that reduce the size of the traces that trigger those failures. We also define a metric called the downsizing ratio to evaluate how efficiently the traces are downsized. For (2), we define a traffic diversity index and propose a methodology for calculating traffic diversity and analyzing code coverage. Our numerical results show that MSDC(s) can make a decision within 300 packets with a detection accuracy of 99.98%. MSSC(f) can respond after examining only the first 15 packets of a flow, with a slightly lower accuracy of 94.99%. Our implementations on a commodity personal computer show that running MSDC(s), MSSC(f), and the hybrid classifier in-line achieves throughputs of 400 Mbps, 800 Mbps, and 723 Mbps, respectively.
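One way to read the two downsizing strategies is as search procedures over the replayable units of a failure-triggering trace. The sketch below makes three assumptions, none of them taken from the dissertation:

- A hypothetical oracle `triggers(units)` replays a candidate trace against the device under test and reports whether the failure recurs.
- A trace is a list of units, such as connections.
- The downsizing ratio is the fraction of the original size that was removed.

The actual algorithms may differ, for example in the granularity of a unit.

```python
from typing import Callable, List, Sequence, TypeVar

Unit = TypeVar("Unit")  # one replayable unit of a trace, e.g. a connection (assumption)


def linear_downsize(trace: Sequence[Unit],
                    triggers: Callable[[List[Unit]], bool]) -> List[Unit]:
    """Try removing units one at a time; keep a removal only if the failure still reproduces."""
    kept = list(trace)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if triggers(candidate):
            kept = candidate  # unit i was not needed; the next unit now sits at index i
        else:
            i += 1  # unit i is needed to reproduce the failure
    return kept


def binary_downsize(trace: Sequence[Unit],
                    triggers: Callable[[List[Unit]], bool]) -> List[Unit]:
    """Keep whichever half still reproduces the failure; stop when neither half does alone."""
    kept = list(trace)
    while len(kept) > 1:
        mid = len(kept) // 2
        first, second = kept[:mid], kept[mid:]
        if triggers(first):
            kept = first
        elif triggers(second):
            kept = second
        else:
            break  # the failure needs units from both halves
    return kept


def downsizing_ratio(original_size: float, reduced_size: float) -> float:
    """Assumed definition: fraction of the original trace that was removed."""
    return (original_size - reduced_size) / original_size


if __name__ == "__main__":
    def fails_with_7(units: List[int]) -> bool:
        # Hypothetical oracle: the device fails only if connection 7 is replayed.
        return 7 in units

    trace = list(range(10))
    print(linear_downsize(trace, fails_with_7))
    print(binary_downsize(trace, fails_with_7))
    print(downsizing_ratio(len(trace), 1))
```

The two variants trade replay count against precision:

- **Linear:** replays once per unit of the original trace and can isolate failures that need several units.
- **Binary:** needs only O(log n) replays, but stops early when the failure depends on units in both halves. For example, a failure that needs connections 2 and 8 out of 0 to 9 leaves the trace unreduced.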
Simulations based on a hardware design show that MSSC made the decision in 77.4% of decision rounds and MSDC in 22.6%. In addition, the hybrid method achieves accuracy above 94% at a throughput of 80 Gbps. For networking device testing, the failure distribution shows that ICMP and HTTP failures account for around 60% of all failures experienced by networking devices. The binary and linear downsizing algorithms achieve downsizing ratios of 77% and 80%, respectively, and the time needed to perform testing is reduced in the same proportion. For traffic diversity, the results show that code coverage increases with more packets or with larger network segments.
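The abstract does not give the formula for the traffic diversity index. Purely as an illustration, the sketch below assumes an entropy-style index: the Shannon entropy over hypothetical per-flow labels such as application protocols. It also assumes coverage is plain line coverage. The dissertation's actual definitions may differ, and the function names `diversity_index` and `line_coverage` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Set


def diversity_index(labels: Iterable[Hashable]) -> float:
    """Assumed index: Shannon entropy (bits) of the label distribution, 0 for one-label traffic."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over all labels that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def line_coverage(executed: Set[int], all_lines: Set[int]) -> float:
    """Fraction of the service's source lines executed while replaying a trace."""
    return len(executed & all_lines) / len(all_lines)


if __name__ == "__main__":
    # Hypothetical per-flow protocol labels from a small and a larger network segment.
    small = ["http"] * 8 + ["dns"] * 2
    large = ["http"] * 4 + ["dns"] * 2 + ["icmp", "ssh"] * 2
    print(round(diversity_index(small), 3), round(diversity_index(large), 3))
```

Under this assumed index, a trace dominated by one protocol scores near 0, and a trace spread evenly over k labels scores log2(k). Adding packets from a larger segment therefore raises the index only if they bring new or more evenly spread labels.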

Parallel Keywords (English)

traffic classification; packet size distribution; message size sequence; traffic diversity; traffic reduction; hardware design
