透過您的圖書館登入
IP:3.17.68.14
  • 學位論文

記憶體式快速傅立葉轉換運算架構的普遍化不定資料分布存取方案

A Generalized Inconstant-distribution Access Scheme for Memory-based FFT Architecture

指導教授 : 薛木添
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


實現 Cooley-Tukey 演算法的硬體運算架構主要分為管線式 (Pipelined) 與記憶體式 (Memory-based) 兩個方向,在吞吐量 (Throughput) 許可的前提下,低硬體成本的記憶體式運算架構會是優先的選擇,而記憶體存取方案 (Memory access scheme) 則為此運算架構的核心,並在多層面影響硬體行為的特徵,故成為本論文的研究對象。過去的研究從固定資料分布 (Constant data distribution) 開始發展,對有序輸入的資料建立資料指標 (Data index) 作為分布資料的依據,同時對所有記憶體模組進行交換排序就成為此類研究成果的特徵;而本研究則嘗試開發不定資料分布 (Inconstant data distribution) 的方案,並從研究方法開始探討。   相較於過去研究,本研究嘗試從 SFG (Signal flow graph) 取得分布資料的資訊,並提出一個可適用於任意 Radix-(2^k) FFT 演算法的存取方案,而所配合提出的列指標 (Row index) 產生器,其硬體面積隨字組長度 (Word length) 呈線性增長 (Linear growth),適用於長點數 FFT 運算,又其傳遞延遲不隨演算法所採用的基數而變化,則將更適用於高基數 FFT 演算法。此外,本研究進一步將所提出的存取方案,應用於低記憶體面積的 Ping-pong Cache-memory 運算架構中,以驗證其適用範圍,並對記憶單元所遭遇的非理想效應進行抑制,首先抑制讀取與寫入對彼此時序的相依性,其次抑制連續 FFT 運算對彼此資料分布的相依性。   所提出的存取方案並非基於有序的資料指標,故運作過程中無須對記憶體模組進行交換排序,而是針對每個記憶體模組直接產生列指標,乃是對控制邏輯的進一步簡化;又基於所提出存取方案的資料分布特性,於實際運算架構中抑制連續 FFT 運算的資料分布相依性,而得以再次進一步簡化控制邏輯。總結本研究的核心成果,首先在於針對不定資料分布的研究提出方法,其次又基於所提出的存取方案得以在兩個層面簡化控制邏輯,說明所能帶來的新發展。

並列摘要


Under the condition of guaranteeing sufficient throughput rate, memory-based architecture with low area overhead would be preferable for executing Cooley-Tukey algorithm, thus memory access scheme is the object of this study while it is an essential issue of memory-based architecture. Data distribution is a primary consideration for access scheme, and a criterion of constant distribution is established in previous work, which means the mapping between data sequence and memory address is fixed throughout entire compute procedure. In contrast, this work attempts to develop an access scheme under a criterion of inconstant distribution and begins with developing a novel modeling method. This work proposes an access scheme for arbitrary power-of-two radix FFT algorithm and a corresponding row index generator which is highly hardware efficient. The area overhead of generator grows linearly with word length of row index, thus it is suitable for long FFT length. On the other hand, the propagation delay of generator remains constant for arbitrary power-of-two radix, thus it is suitable for high radix FFT algorithm. Furthermore, the proposed access scheme is applied to Ping-pong Cache-memory architecture for evaluating its feasibility. At the same time, issues of eliminating non-ideal effects encountered by memorial unit are also discussed. Compared to previous work, this work extracts sufficient information of distributing data from signal flow graph, SFG, rather than ordered data indexes, hence permutation of memory modules for parallel accessing is not required; besides, data distribution dependency between FFT computations is eliminated. Thus complexity of control logic is reduced in two aspects. In conclusion, the primary achievement of this work is proposing a novel modeling method for inconstant distribution and reducing complexity of control logic to demonstrate possible evolution which could be brought by inconstant distribution.

參考文獻


[2] J. W. Cooley and J. Tukey, “An algorithm for machine calculation of complex fourier series,” in Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[4] S. He and M. Torkelson, ”A New approach to Pipeline FFT processor,” in Proc. Int. Parallel Processing Symp., 1996, pp. 766–770.
[5] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. URSI Int. Symp. Signals Syst. Elect., 1998, pp. 257–262.
[6] B. G. Jo and M. H. Sunwoo, ”New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Tran. on Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911–919, May 2005.
[7] S. He and M. Torkelson, ”Design and implementation of a 1024-point pipeline FFT processor,”in Proc. IEEE Custom Integrated Circuits Conf., May 1998, pp. 131–134.

延伸閱讀