Traffic classification plays an important role in network management. Traditional deep packet inspection (DPI) cannot analyze encrypted traffic unless the keys were captured early in the communication flow, so it fails to identify such connections. Statistical traffic classification was developed to analyze traffic without relying on packet contents; however, the locational issue makes it hard to apply in the real world. Moreover, most machine learning studies generate the ground truth for their training data via DPI, which contradicts the very problem of encrypted traffic. Furthermore, as traffic from smart devices grows rapidly year after year, the classification of apps can no longer be ignored. Unfortunately, because of these unresolved issues, statistical traffic classification has so far remained confined to academic research.

To solve these problems and improve accuracy, this thesis proposes an application identification system based on statistical signatures. The system uses the application round technique together with statistical methods to analyze flow behavior, including that of encrypted traffic. All statistical information, together with the corresponding application names, is sent to a cloud platform, where multiple machine learning algorithms build classification models. For the locational issue, this thesis demonstrates the severity of the problem through several experiments and designs a multi-stage architecture that partitions the training data and builds multiple models; at classification time, the system selects the best model based on the network environment. The system also incorporates a training process for smart devices, which enables app classification. Finally, the system is deployed in the cloud, where its scalable architecture allows it to handle large volumes of classification requests and makes real-world deployment more feasible.
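
The idea of building several models and choosing one according to the network environment can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the environment labels (`"campus"`, `"mobile"`), the two flow features (mean packet size, mean inter-arrival time), the sample values, the helper names, and the use of a simple nearest-centroid classifier in place of the machine learning algorithms the system actually uses.

```python
from collections import defaultdict
from math import dist


class CentroidModel:
    """Nearest-centroid classifier; a stand-in for any trained model."""

    def __init__(self, samples):
        # samples: list of (feature_vector, app_label)
        grouped = defaultdict(list)
        for features, label in samples:
            grouped[label].append(features)
        # One centroid per application: the per-dimension mean of its flows.
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in grouped.items()
        }

    def predict(self, features):
        # Return the application whose centroid is closest to the flow.
        return min(self.centroids,
                   key=lambda label: dist(self.centroids[label], features))


def train_per_environment(records):
    """Partition training data by environment and build one model per group.

    records: list of (environment, feature_vector, app_label).
    """
    by_env = defaultdict(list)
    for env, features, label in records:
        by_env[env].append((features, label))
    models = {env: CentroidModel(samples) for env, samples in by_env.items()}
    # Hypothetical fallback: a model over all data, used for unseen environments.
    models[None] = CentroidModel([(f, lab) for _, f, lab in records])
    return models


def classify(models, env, features):
    """Select the model matching the environment, then classify the flow."""
    model = models.get(env, models[None])
    return model.predict(features)


# Made-up flows: (mean packet size in bytes, mean inter-arrival time in ms).
records = [
    ("campus", (1200.0, 5.0), "video"),
    ("campus", (1100.0, 7.0), "video"),
    ("campus", (200.0, 300.0), "chat"),
    ("campus", (220.0, 280.0), "chat"),
    ("mobile", (600.0, 40.0), "video"),
    ("mobile", (640.0, 50.0), "video"),
    ("mobile", (150.0, 800.0), "chat"),
    ("mobile", (170.0, 760.0), "chat"),
]

models = train_per_environment(records)
print(classify(models, "mobile", (650.0, 45.0)))
print(classify(models, "campus", (210.0, 290.0)))
```

In this sketch the environment is a label supplied by the caller. A real system would have to infer it from the traffic or from deployment metadata, and it could also partition the training data by clustering rather than by a known label.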