
A 64-QAM 8x8 MIMO Detector with Constant-Throughput Lattice Reduction and QR Preprocessing

(Original title: 具固定運算量晶格簡化與QR分解前處理之64-QAM 8x8多輸入多輸出偵測器)

Advisor: 黃元豪

Abstract


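As wireless communication technology evolves, the demands on the speed and quality of wireless data transmission keep increasing. Time-domain and frequency-domain resources alone can no longer satisfy high-specification wireless transmission, so multiple-input multiple-output (MIMO) wireless systems were proposed and are now widely used to exploit spatial diversity. The design and implementation of multi-antenna detectors has consequently become a major challenge. Among conventional multi-antenna detectors, only the optimal maximum likelihood (ML) detector achieves full diversity; the others do not. Lattice reduction algorithms were therefore introduced: they allow non-optimal MIMO detectors to achieve full diversity and to approach the error rate of the ML detector. In a lattice-reduction-aided multi-antenna detector, however, the complexity of the lattice reduction algorithm, of the QR decomposition that precedes it, and of the detector that follows it all grow with the number of antennas, and the hardware area grows substantially as well. Designing suitable hardware for the eight-antenna case is therefore the main goal of this thesis. Eight antennas is also one of the configurations specified by the latest 4G and 802.11ac standards.

To reach this goal, three chips were built. For the first chip, the thesis modifies the original lattice reduction algorithm (the LLL algorithm) to obtain a constant-throughput lattice reduction algorithm, uses prediction circuits to lower the overall circuit power, implements a four-antenna version in the UMC 90 process, and verifies the energy-saving goal. The second chip combines the lattice reduction algorithm with the QR decomposition algorithm. Considering the two algorithms jointly and exploiting their relationship makes it possible to skip redundant operations, which reduces computation time and hardware complexity. Because the two algorithms share most of their operations, processing elements can be shared to reduce hardware area. This design is implemented for eight antennas in the TSMC 90 process. The third chip integrates the preprocessing algorithm of the second chip with a proposed low-latency K-best detector to complete the whole lattice-reduction-aided multi-antenna detection system. The proposed detector applies several techniques to improve detection performance or reduce hardware requirements. The complete design is implemented in the TSMC 90 process. When 72 signals are detected under the same channel, the throughput reaches 585 Mbps, and if the number of preprocessors were increased, the throughput could reach the 3 Gbps level.

This thesis completes a full lattice-reduction-aided detector comprising a preprocessor and a detector. The preprocessor is a multi-function QR decomposition and constant-throughput lattice reduction processor, and the detector is a low-latency K-best detector. The power consumption of the whole system is quite low, so the energy spent per detected bit is also quite low compared with the existing literature. The thesis thus makes a considerable contribution to the development of the wireless communication field.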
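The abstract does not spell out the constant-throughput variant proposed in the thesis. As background only, the following is a minimal sketch of the textbook LLL algorithm that the thesis starts from, applied to a small real-valued integer basis. The function names, the row-vector convention, and the choice delta = 0.75 are illustrative assumptions, and the Gram-Schmidt data are recomputed at every step for clarity rather than efficiency. A MIMO detector would apply the same idea to a (complex) channel matrix.

```python
def dot(u, v):
    """Inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return (orthogonalized rows, mu coefficients) for a row basis."""
    n = len(basis)
    ortho = []
    mu = [[0.0] * n for _ in range(n)]
    for i in range(n):
        v = list(basis[i])
        for j in range(i):
            mu[i][j] = dot(basis[i], ortho[j]) / dot(ortho[j], ortho[j])
            v = [a - mu[i][j] * b for a, b in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll_reduce(basis, delta=0.75):
    """Textbook LLL reduction (not the thesis's constant-throughput variant)."""
    basis = [[float(x) for x in row] for row in basis]
    n = len(basis)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for every j < k.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        # Lovasz condition decides whether to advance or swap and step back.
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis


# Illustrative input basis (rows are basis vectors).
reduced = lll_reduce([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

The number of swap iterations in this loop depends on the input basis, which is the kind of data-dependent run time that a constant-throughput hardware design has to avoid.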

Parallel Abstract


Wireless communication is advancing rapidly, and the dimensions of multiple-input multiple-output (MIMO) systems are growing quickly to meet the demands of high-throughput applications. A high-performance, low-complexity MIMO detector has therefore become an important need. The maximum likelihood (ML) detector is known to be optimal, but its computational complexity makes it impractical to realize. To address this problem, researchers have proposed tree-based search algorithms, such as sphere decoding and K-best decoding, which reduce complexity while retaining near-optimal performance. Separately, channel matrix preprocessing techniques such as lattice-reduction-aided (LRA) detection have been proposed to improve MIMO detection performance with full diversity gain. Although many researchers have described the merits of LRA systems, VLSI implementations of LRA MIMO detection are still lacking. This thesis focuses on implementing a complete LRA MIMO detection system. Three chips were implemented to accomplish this goal, and each is presented in its own chapter.

The goal of the first chip is to implement the first constant-throughput LLL lattice reduction processor. A variant of the LLL lattice reduction algorithm is proposed and implemented for 4 × 4 MIMO systems. Power is saved by predicting redundant operations, a technique that is effective at both the algorithm and hardware levels. The chip is implemented in UMC 90 nm 1P9M technology and occupies 4.29 mm², including a 0.8 mm² core, with a power consumption of 24.8 mW at its maximum frequency of 37 MHz. The average power reduction over Rayleigh-fading MIMO channels is 22.42% of the original power. The throughput of this processor is determined by the chosen number of stages, which can also be selected to meet different performance requirements.

The goal of the second and third chips is to implement a complete LRA MIMO detection system. Although some implementations of the LLL lattice reduction algorithm exist in the literature, they often neglect the QR decomposition that precedes it. The second chip therefore implements a joint QR decomposition and efficient constant-throughput LLL lattice reduction algorithm. This chip uses several functional blocks to support both QR and lattice reduction operations. More than 80% of the hardware is shared between the two algorithms, which greatly lowers the hardware cost of the whole preprocessing operation, and the utilization of the processing elements is likewise close to 80%, so few circuits sit idle. The joint design of the two algorithms also reduces the word length of the circuit. The processor was designed and fabricated in TSMC 90 nm 1P9M CMOS technology. The chip occupies 5.211 mm², including a 2.505 mm² core, and consumes 31.2 mW at its maximum frequency of 55 MHz. It is the first 8 × 8 realization of a lattice reduction processor.

The third chip deals mainly with the detector part of the LRA MIMO detection system and reuses the preprocessor of the second chip. A simple linear detector cannot deliver satisfactory performance in an 8 × 8 MIMO environment. The LRA K-best detector, however, has a much larger data range, which leads to a large hardware cost, and its sorting operation adds both latency and hardware cost. The third chip therefore proposes a sorting-reduced (SR) K-best detector that greatly reduces the sorting operations with only a small performance degradation. A differential value representation is also proposed to reduce the hardware cost of the LRA K-best detector. The bridge between preprocessing and detection is implemented on this chip as well. The design, which includes QR decomposition with full size reduction, the E-CTLLL LR algorithm, shifting and scaling circuits, projection circuits, and the SR K-best detector, was fabricated in the TSMC 90 nm 1P9M CMOS process. The chip occupies 13.82 mm², including a 7.94 mm² core, and consumes 37.1 mW at 65 MHz. The SR K-best detector alone achieves a throughput of 3.1 Gbps with 64-QAM, outperforming state-of-the-art designs. To estimate the throughput of the whole system, one channel is assumed to serve the detection of 72 symbols. The estimated throughput of this chip is then 585 Mbps, and the bottleneck is the three-cycle projection operation in the preprocessing part. The energy per bit is 63 pJ/bit, which is the lowest reported in the literature. This work therefore contributes to the VLSI implementation of LRA MIMO detection.
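The headline figures quoted for the third chip can be cross-checked with simple arithmetic. The sketch below assumes that each of the 72 "symbols" is a full 8-antenna 64-QAM symbol vector and that the detector completes one such vector per clock cycle. Neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Figures quoted in the abstract for the third chip.
N_TX = 8                    # transmit antennas (8 x 8 MIMO)
BITS_PER_SYMBOL = 6         # 64-QAM carries log2(64) = 6 bits per symbol
CLOCK_HZ = 65e6             # clock frequency
POWER_W = 37.1e-3           # power consumption
SYSTEM_BPS = 585e6          # estimated whole-system throughput
VECTORS_PER_CHANNEL = 72    # symbols detected per channel (assumed to be vectors)

# Bits carried by one 8-antenna symbol vector.
bits_per_vector = N_TX * BITS_PER_SYMBOL

# Assumption: the detector alone finishes one symbol vector per clock cycle.
detector_bps = bits_per_vector * CLOCK_HZ

# Energy per detected bit at the whole-system throughput, in picojoules.
energy_per_bit_pj = POWER_W / SYSTEM_BPS * 1e12

# Clock cycles per channel realization implied by the 585 Mbps estimate.
bits_per_channel = VECTORS_PER_CHANNEL * bits_per_vector
cycles_per_channel = bits_per_channel / SYSTEM_BPS * CLOCK_HZ

print(f"detector alone:     {detector_bps / 1e9:.2f} Gbps")
print(f"energy per bit:     {energy_per_bit_pj:.1f} pJ/bit")
print(f"cycles per channel: {cycles_per_channel:.0f}")
```

Under these assumptions the detector rate comes to 3.12 Gbps and the energy to about 63.4 pJ/bit, in line with the quoted 3.1 Gbps and 63 pJ/bit, and the 585 Mbps estimate corresponds to roughly 384 clock cycles per channel realization.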

