
有限回授之多天線非正交多工接取系統下使用強化學習選擇調變編碼模式

On Using Reinforcement Learning to Select Modulation/Coding Schemes for Non-Orthogonal Multiple Access in Multi-User Multiple-Input Multiple-Output Systems with Limited Feedback

Advisor: 謝宏昀 (Hung-Yun Hsieh)

Abstract


Much research has focused on combining multi-user MIMO (MU-MIMO) and non-orthogonal multiple access (NOMA) to improve throughput, but most of it does not consider that channel state information (CSI) feedback is limited in practical LTE/LTE-A environments. According to our analysis, the quantization error caused by limited CSI feedback leads to improper resource allocation; therefore, a method for obtaining an accurate signal-to-interference-plus-noise ratio (SINR) must be considered when combining these two techniques. To avoid changing the current LTE-A channel feedback specification, this thesis adopts outer loop link adaptation (OLLA), which uses hybrid automatic repeat request (HARQ) feedback to dynamically adjust the estimated SINR. For the short-lived connections typical of LTE-A, the convergence speed of OLLA is an important issue. Regarding this issue, most existing OLLA schemes consider only the channel quality indicator (CQI), whereas in multi-antenna systems the precoding matrix indicator (PMI) and the interference from different scheduling combinations must also be taken into account. In this thesis, we use reinforcement learning to enhance OLLA. Reinforcement learning can automatically interact with the environment and discover strategies that are not constrained by existing knowledge. However, when reinforcement learning is applied to a new field, understanding the problem is the key to effective training; we therefore analyze which factors affect the OLLA strategy. Based on this analysis, we consider the information of previously scheduled users, the channel feedback, the desired modulation and coding scheme (MCS), and the co-scheduled users to design suitable feature extraction, reward shaping, and exploration-and-exploitation mechanisms, and we experiment with the training-related parameters to provide a more efficient training architecture. Compared with the OLLA baseline, the proposed method achieves a 7% throughput gain in NOMA+MU-MIMO and a 14% gain in MU-MIMO. In addition, the OLLA convergence speed is increased by 38%. In summary, we propose an architecture that automatically enhances OLLA; it effectively handles interference under different types of feedback and improves both throughput and convergence speed.
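
The outer loop referred to above is the conventional ACK/NACK-driven adjustment of the SINR estimate used for MCS selection. As a point of reference for the baseline, below is a minimal sketch of such an OLLA update; the step size and target BLER are illustrative assumptions, not values taken from the thesis.

```python
# Minimal sketch of a conventional OLLA loop (illustrative only; the step
# size and target BLER are assumptions, not values from the thesis).
class OuterLoopLinkAdaptation:
    def __init__(self, target_bler=0.1, step_up_db=0.5):
        # On NACK the offset grows by step_up; on ACK it shrinks by step_down,
        # chosen so that the loop settles at the target BLER.
        self.offset_db = 0.0
        self.step_up = step_up_db
        self.step_down = step_up_db * target_bler / (1.0 - target_bler)

    def effective_sinr(self, reported_sinr_db):
        # SINR used for MCS selection = CQI-derived SINR minus the learned offset.
        return reported_sinr_db - self.offset_db

    def update(self, harq_ack):
        # HARQ feedback drives the outer loop: a NACK makes link adaptation
        # more conservative, an ACK makes it more aggressive.
        if harq_ack:
            self.offset_db -= self.step_down
        else:
            self.offset_db += self.step_up
```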

Parallel Abstract


Much work has been done to improve overall throughput by jointly considering MU-MIMO and NOMA, but little of it considers the combination of these techniques in practical LTE/LTE-A environments, where CSI feedback is limited. Based on our analysis, a method capable of obtaining an accurate SINR is important to reduce the improper resource allocation caused by such limited CSI feedback. In this work, to avoid changing the current feedback architecture in LTE-A, we adopt outer loop link adaptation (OLLA) to dynamically adjust the MCS according to HARQ feedback. Convergence plays a crucial role when applying OLLA in LTE-A because of its typically short connections. Regarding this convergence issue, most existing OLLA schemes consider only the CQI, whereas the PMI and user pairing should also be taken into account in MU-MIMO. In this work, we adopt reinforcement learning to enhance OLLA. Reinforcement learning is a technique that can explore unknown strategies by interacting with the environment. When applying reinforcement learning to a new field, domain knowledge is important for effective training. Therefore, the factors affecting the OLLA strategy, including the MCSs previously assigned to the scheduled users, the feedback, the desired MCS, and the paired users, are analyzed to guide the state design, reward shaping, and exploration-and-exploitation mechanism. Our proposed method improves the throughput by 7% in NOMA+MU-MIMO and by 14% in MU-MIMO. Moreover, the convergence speed is increased by 38%. To conclude, we propose an architecture that automatically enhances OLLA, handles interference, improves throughput, and accelerates convergence under different types of feedback.
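
The abstract does not spell out the learning algorithm itself, so the following is only a generic illustration of how a reinforcement-learning agent could drive the OLLA adjustment: a tabular Q-learning agent with an epsilon-greedy policy over a state built from the limited feedback (CQI, PMI) and pairing information. The action set, state features, and hyper-parameters are assumptions made for the sketch, not the design used in the thesis.

```python
import random
from collections import defaultdict

# Candidate adjustments to the SINR offset, in dB (an assumed action set).
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def make_state(cqi, pmi, paired):
    # Discretized state built from the limited feedback and pairing information.
    return (cqi, pmi, paired)

class QLearningOlla:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # Q[(state, action_index)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        # Epsilon-greedy exploration over the candidate offset adjustments.
        if random.random() < self.epsilon:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in range(len(ACTIONS)))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In such a setup the reward could, for instance, be the throughput delivered in a TTI with a penalty on NACKs, mirroring the reward shaping mentioned in the abstract; the actual reward design in the thesis may differ.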

