Traffic Characterization and Classification for Web Applications

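With the rapid development of web applications in recent years, traffic characterization and classification have been studied extensively because of their applications in network management and information security. Traditional network traffic can be classified by port number or packet payload, but these features do not suit web traffic, which is all carried over port 80 or 443 and is usually encrypted. We therefore use application-level statistical features to analyze document editing (Google document), maps (Google map), games (Tetris Battle and Dungeon Rampage on Facebook), video streaming (Youtube, Dailymotion and Tudou) and file transfer (Google drive and Dropbox). We manually collected the traffic generated by nine web applications and two browsers (Chrome and Firefox). In this thesis, the only features are the five most frequent message sizes in the request direction of the main connection, which is the connection that best represents the user's interaction. After feature extraction, we selected four algorithms in Weka (NBtree, Random Forest, J48graft and Naive Bayes) and validated them with 10-fold cross-validation. The experiments show that accuracy reaches 93.89% with Random Forest. We also support early classification, for which accuracy likewise reaches 93.89%. To verify the accuracy, we additionally asked multiple users to generate traffic for analysis, and the results show that our features classify the traffic of different applications effectively.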

Keywords

None provided

Parallel Abstract

As web applications have evolved rapidly, traffic characterization and classification have been studied extensively in recent years because they are widely applied in network management, security and related areas. Traditional network traffic can be classified by port number or packet payload, but these features are unsuitable for web application traffic, which all runs on port 80 or port 443 and is usually encrypted. In this thesis, we use application-level statistical features to characterize the traffic of several kinds of web applications: office applications (Google document), map services (Google map), game applications (Tetris Battle and Dungeon Rampage on Facebook), video streaming applications (Youtube, Dailymotion and Tudou) and file sharing applications (Google drive and Dropbox). We manually collected packet traces from nine web applications and two browsers (Chrome and Firefox). The distinguishing aspect of this work is that it classifies web applications based solely on the top-5 most frequent message sizes in the requests of the main connection, which is the connection most representative of user interactions. After extracting the features, we use Weka to evaluate the accuracy of four algorithms: NBtree, Random Forest, J48graft and Naive Bayes. The experimental results show that the accuracy reaches up to 93.89% with Random Forest. Furthermore, the mechanism allows early classification using only the first 30 messages in the main connection, and the accuracy again reaches up to 93.89% with Random Forest. In addition, we collected traffic from multiple users to evaluate the classification, and the results show that the features also classify such traffic effectively.
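The feature described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `top_k_message_sizes`, the tie-breaking rule (smaller size first), the zero padding for connections with fewer than five distinct sizes, and the sample sizes are all assumptions made for illustration. The abstract specifies only that the features are the top-5 most frequent request message sizes in the main connection and that early classification uses the first 30 messages.

```python
from collections import Counter
from typing import List, Optional, Sequence


def top_k_message_sizes(
    sizes: Sequence[int],
    k: int = 5,
    first_n: Optional[int] = 30,
    pad: int = 0,
) -> List[int]:
    """Return the k most frequent request message sizes of one connection.

    sizes   -- request message sizes (bytes) in arrival order
    first_n -- only look at the first first_n messages (early
               classification); None uses the whole connection
    pad     -- filler value when fewer than k distinct sizes occur
               (an assumption; the abstract does not cover this case)
    """
    window = list(sizes) if first_n is None else list(sizes[:first_n])
    counts = Counter(window)
    # Rank by descending frequency; ties go to the smaller size so the
    # feature vector is deterministic (also an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    features = [size for size, _ in ranked[:k]]
    features.extend([pad] * (k - len(features)))
    return features


# Hypothetical request sizes observed on one main connection.
example_sizes = [512, 512, 87, 512, 87, 1300, 87, 512, 64, 1300, 40]
example_features = top_k_message_sizes(example_sizes)
print(example_features)
```

Each connection then yields one fixed-length vector of five sizes, which could be fed, together with the application label, to a classifier such as the Random Forest evaluated in the thesis.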

Parallel Keywords

Traffic characterization; Traffic classification; Web application
