Applying Unsupervised Machine Learning to the Mining of Multi-Dimensional Road Network Data

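In recent years, advances in intelligent transportation systems, Internet of Things technology, and wireless networking, together with government support for open data, have made large volumes of traffic data available. These data carry detailed spatial and temporal information, and sometimes even more complex data dimensions. Extracting the important information hidden in such data requires multi-dimensional analysis methods.

This study proposes an unsupervised machine learning method for analyzing multi-dimensional road network data. The algorithm uses a multi-dimensional network weight matrix to compute distances in the network across multiple dimensions and, building on the suitability of the K-Medoids algorithm for discrete data, develops a clustering algorithm. To address the sensitivity of K-Medoids clustering to the initial cluster seeds and to the value of K, the algorithm adopts two measures. First, it generates the initial seeds by systematic interval sampling, which reduces the algorithm's randomness. Second, it introduces cluster splitting and cluster merging to offset the effect of poorly chosen initial seeds on the results.

The cluster analysis of highway traffic volume reveals the following advantages. First, the algorithm is consistent and reliable: because systematic interval sampling reduces its random elements, the same input data and parameters can be expected to yield the same clustering result. Different values of K have relatively little effect on the result, although a suitable choice of K benefits the algorithm's performance. The clustering results show that the algorithm respects the topology of the road network: observations that are close in space but far apart in network distance are not assigned to the same cluster. The algorithm also successfully identifies traffic patterns that span road networks. The results further show that it distinguishes differences in features along the time and traffic-volume dimensions, grouping data with distinctive temporal or traffic-volume patterns into the same class.

The results offer fields such as transportation management, logistics, and transportation geography a systematic approach to analyzing spatio-temporal or multi-dimensional road network data. The cluster centers reveal the regularities in the data patterns, and the clusters can serve as operational units for further decision making.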
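As a rough illustration of how a distance across several dimensions of a road network might be computed, the Python sketch below combines shortest-path distance on the network with gaps in time and traffic volume. The function names, the `(node, hour, volume)` observation layout, the weights, and the weighted-Euclidean combination are all assumptions made for illustration; the abstract does not specify the study's actual weight-matrix formulation.

```python
from itertools import product

INF = float("inf")


def network_distances(n_nodes, edges):
    """All-pairs shortest-path lengths (Floyd-Warshall) on an undirected graph."""
    dist = [[0.0 if i == j else INF for j in range(n_nodes)] for i in range(n_nodes)]
    for u, v, length in edges:
        dist[u][v] = min(dist[u][v], length)
        dist[v][u] = min(dist[v][u], length)
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(range(n_nodes), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def combined_distance(a, b, net_dist, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted combination of network, time, and volume gaps.

    a, b: observations given as (node, hour, volume) tuples.
    """
    w_net, w_time, w_vol = weights
    d_net = net_dist[a[0]][b[0]]
    d_time = abs(a[1] - b[1])
    d_vol = abs(a[2] - b[2])
    return (w_net * d_net**2 + w_time * d_time**2 + w_vol * d_vol**2) ** 0.5


# Toy network (hypothetical): nodes 0, 1, 2 lie on one route;
# node 3 is reachable only through node 0.
edges = [(0, 1, 2.0), (1, 2, 3.0), (0, 3, 10.0)]
net = network_distances(4, edges)

same_route = combined_distance((0, 8, 100), (2, 8, 100), net)
other_route = combined_distance((2, 8, 100), (3, 8, 100), net)
print(same_route, other_route)
```

Because the spatial term is a shortest-path length rather than a straight-line distance, two stations that sit close together on a map but on unconnected routes still end up far apart. In practice each dimension would likely need rescaling before the terms are combined, since kilometres, hours, and vehicle counts are not directly comparable.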

Keywords

road network analysis; multi-dimensional; unsupervised machine learning; K-Medoids

Parallel Abstract

In recent years, advances in ITS, IoT, and wireless communication technology, together with the government's positive attitude toward open data, have made large amounts of traffic data available. These data contain detailed spatial and temporal information, and sometimes features with even more complicated data dimensions. A multi-dimensional data analysis technique is therefore required to extract the useful information hidden in the data. This study designs an unsupervised machine learning approach for multi-dimensional network data. The algorithm adopts the concepts of a network weight matrix and a space-time matrix to calculate multi-dimensional distances in network space. Combined with the K-Medoids algorithm, which is capable of handling discrete data, this yields a clustering algorithm. To address the sensitivity of K-Medoids to the initial seeds and to the value of K, two methods are adopted. First, a systematic sampling approach to seed generation reduces the randomness of the algorithm. Second, a cluster splitting and merging method compensates for poor seed selection in the initial phase. In the case study of highway traffic clustering, the algorithm demonstrates several advantages. First, it is consistent and robust: because systematic sampling for seed generation reduces the randomness of the algorithm, the same results can be expected across repeated experiments given the same inputs and parameters. The algorithm also respects the topology of the highway network: features that are proximate in space but distant in network space are not assigned to the same cluster. The algorithm can also recognize cross-system traffic patterns. The clustering results further show that the algorithm can identify differences in the temporal dimension and in the traffic data dimension, so that features with unique temporal and traffic patterns are grouped together. This study provides an approach for systematically analysing space-time or multi-dimensional network data, which can be used in research fields such as transportation management, logistics, and transportation geography. The medoids of the clusters can serve as rules for traffic patterns, and the clusters can be used as operational units for further decision making.
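The seeding idea can be sketched in Python as follows: a minimal K-Medoids loop over a precomputed distance matrix, with seeds taken at a fixed interval instead of at random. The interval formula, the function names, and the toy one-dimensional data are assumptions for illustration only, and the cluster splitting and merging steps described in the abstract are not included.

```python
def systematic_seeds(n_items, k):
    """Pick k indices at a fixed interval, so seeding involves no randomness."""
    step = n_items / k
    return [int(step / 2 + i * step) for i in range(k)]


def assign(dist, medoids):
    """Label each item with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids on a precomputed, symmetric distance matrix."""
    medoids = systematic_seeds(len(dist), k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if ties left this cluster empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the least total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy data (hypothetical): positions along a single route.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in positions] for p in positions]

medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Since nothing in the loop is random, repeated runs on the same matrix with the same `k` return identical medoids and labels, which is the consistency property the abstract attributes to systematic seeding. Any distance that respects network topology can be supplied as `dist`, because K-Medoids only needs pairwise distances and always picks an actual observation as the cluster center.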

Parallel Keywords

network analysis; multi-dimension; unsupervised machine learning; K-Medoids
