
Design and Implementation of a Reinforcement Learning Framework for Vehicular Networks: A Case Study on Sensing-Based Semi-Persistent Scheduling

A Reinforcement Learning Framework for V2X Communications: Using Sensing-Based Semi-Persistent Scheduling as an Example

Advisor: 楊舜仁

Abstract


In recent years, thanks to rapid advances in vehicle-to-everything (V2X) technology and the deployment of vehicular sensors, the collected data can be exchanged with nearby computing devices over V2X communications, enabling intelligent transportation systems. However, the highly dynamic vehicular network environment and its stringent QoS requirements have motivated researchers to apply new methods to the design of vehicular network algorithms. Leveraging the large volumes of data gathered by vehicular sensors, an artificial-intelligence approach, reinforcement learning (RL), has recently been widely applied in the V2X domain to analyze the collected data and improve network performance. RL operates by having an algorithm repeatedly interact with a simulated environment to acquire information and then optimize its decisions based on what it has learned. To develop and study such algorithms, a platform that allows an external program to continuously interact with and influence the simulation environment is therefore indispensable. OpenAI-Gym, a widely used reinforcement learning API, provides many simulation environments that support this RL interaction model, allowing developers to conduct such studies. However, OpenAI-Gym does not support vehicular network simulation environments, and none of the existing vehicular network simulators support the corresponding RL APIs. Therefore, in this thesis we design and implement an RL framework, V2X-Gym. The framework provides a vehicular network simulator that integrates a network simulator (ns-3) with a road traffic simulator (SUMO), and the simulator also supports the OpenAI-Gym API. In addition, we propose using an RL algorithm, Q-learning, to improve the communication reliability of the 3GPP vehicle-to-vehicle sensing-based semi-persistent scheduling algorithm. We simulate and analyze this algorithm with the V2X-Gym framework to validate the feasibility and applicability of V2X-Gym.
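The Q-learning approach mentioned above boils down to a single tabular update rule, Q(s,a) ← Q(s,a) + α(r + γ·max_a′ Q(s′,a′) − Q(s,a)). The toy Q-table and the specific state, action, and reward values below are illustrative assumptions, not the thesis's actual SPS formulation:

```python
# Toy Q-table: state -> list of per-action values (2 states, 2 actions)
Q = {0: [0.0, 0.0], 1: [1.0, 0.0]}

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Standard tabular Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]

# One update after observing reward 0.5 for action 0 in state 0,
# transitioning to state 1 (whose best action value is 1.0).
new_value = q_learning_update(Q, s=0, a=0, r=0.5, s_next=1)
```

Here the learning rate α controls how far each sample moves the estimate, and the discount γ weights the value of the best follow-up action in the next state.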

Abstract (English)


In recent years, vehicles have been equipped with sensors to collect information and communicate with their surroundings, enabling the operation of intelligent transportation systems (ITS). The key enabler of this connectivity is vehicle-to-everything (V2X) communication. However, the highly dynamic vehicular network environment raises a variety of new challenges, motivating new methodologies for designing vehicular network algorithms. Thanks to the large volumes of information collected by vehicular sensors, reinforcement learning (RL) has been introduced to exploit such data to enhance vehicular network performance. To evaluate RL algorithms, a simulation environment that allows agents to interact with it while learning an optimal policy is required. We note that there is a de-facto RL framework, OpenAI-Gym, which provides many simulation environments that support agent interaction through standardized methods. However, OpenAI-Gym does not provide a vehicular network simulation environment, and no existing vehicular network simulator provides the RL APIs needed to evaluate and compare proposed RL-enhanced vehicular network algorithms. In this thesis, we design and implement an RL framework, V2X-Gym, which contains a vehicular network simulator built by integrating a network simulator (ns-3) with a traffic simulator (SUMO), encapsulated behind the OpenAI-Gym API to provide a standardized environment for evaluating RL-enhanced vehicular network algorithms. Furthermore, we adopt an RL algorithm, Q-learning, to improve the reliability of the 3GPP V2V sensing-based semi-persistent scheduling (SPS) algorithm. We evaluate the proposed algorithm via V2X-Gym to demonstrate the feasibility and applicability of our platform.
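The "standardized methods" the abstract refers to are the classic Gym reset/step interaction loop. The sketch below mimics that interface without depending on the gym package; `ToyV2XEnv` and its observation and reward definitions are hypothetical placeholders invented for illustration, not code from V2X-Gym:

```python
class ToyV2XEnv:
    """Hypothetical Gym-style environment skeleton (illustrative only)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.t = 0
        return 0  # e.g., a bucketed channel-busy ratio

    def step(self, action):
        # Advance the simulation one step; return the Gym 4-tuple.
        self.t += 1
        observation = self.t % 4
        reward = 1.0 if action == observation else 0.0
        done = self.t >= self.horizon
        return observation, reward, done, {}

# Standard agent-environment interaction loop.
env = ToyV2XEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    action = (obs + 1) % 4  # trivial hand-written policy for illustration
    obs, reward, done, info = env.step(action)
    total += reward
```

In V2X-Gym's setting, the `step` call would drive the ns-3/SUMO co-simulation forward and return network measurements as the observation; here those details are replaced by a deterministic counter so the loop structure stays visible.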

