
應用於非揮發性記憶體內運算架構之高速雙位元全電壓值域感測放大器

A High Speed Two-bit Full Voltage Range Sense Amplifier for Non-volatile Computing-In-Memory

Advisor: Meng-Fan Chang (張孟凡)
The full text will be available for download on 2025/10/13.

Abstract


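In recent years, as mobile devices and the Internet of Things (IoT) have become more widespread than ever, the demand for non-volatile memory has grown steadily. The mainstream non-volatile memory today is flash memory (FLASH), which is widely used because of its low cost and large capacity. However, flash memory requires a high write voltage and has run into many problems in process scaling, so it has reached a bottleneck. This has driven the development of next-generation non-volatile memories (ReRAM, STT-MRAM, etc.). Compared with conventional flash memory, next-generation non-volatile memories offer a lower write voltage, faster read speed, smaller area, and compatibility with logic processes. These advantages make them better suited to embedded devices than flash memory.

With the development of deep learning and the IoT, the amount of data to be computed rises with the complexity of neural networks. In the conventional Von Neumann architecture, however, most of the time is spent moving data between the processor and the memory, and the bandwidth limit between the two becomes the bottleneck for computing speed. Computing-in-memory (CIM) has therefore been proposed in recent years to address this problem. With CIM, the data that must be transferred has already been computed, which reduces data movement and improves processing efficiency. Combined with the characteristics of non-volatile memory, this makes non-volatile CIM well suited to mobile devices and the IoT.

This master's thesis examines the challenges facing non-volatile CIM and proposes a voltage sense amplifier to address them. The two main challenges are:

1. As network complexity increases, multi-bit inputs and weights are necessary to maintain accuracy. However, as the number of output bits rises, the non-volatile CIM architecture needs more time to complete an operation, so the operating speed drops.
2. Under a limited supply voltage, a conventional voltage sense amplifier cannot operate properly in the region below the threshold voltage. The sensing margin between different accumulated values therefore shrinks, and the read yield falls with it.

This thesis therefore proposes a voltage sense amplifier that produces two consecutive output bits, with the values 00, 01, 10, or 11, within a single operating period. The sensing time of the proposed voltage sense amplifier is 48% to 52% shorter than that of a conventional voltage sense amplifier, and the CIM macro is 27% to 39% faster than one using a conventional voltage sense amplifier. The proposed voltage sense amplifier also supports full-range voltage sensing, which enlarges the sensing margin during non-volatile CIM operation. In addition, it includes mechanisms for cancelling process variation and amplifying the sensing margin, so that a high yield is achieved even with a small sensing margin. The offset voltage that a conventional voltage sense amplifier can tolerate varies with the common-mode voltage, whereas the proposed voltage sense amplifier tolerates 1.76x to 2.91x as much offset voltage, and its tolerance is stable across common-mode voltages.

We implement CIM on a 4Mb resistive memory (ReRAM) using the TSMC 22nm process. At the nominal operating voltage of 0.8V, the measured sensing time of the proposed two-bit-output voltage sense amplifier is 1.36ns, compared with 1.24ns for the conventional voltage sense amplifier. For a CIM operation with 8-bit inputs and 8-bit weights, an 8-bit output is produced in 14.8ns.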

Parallel Abstract


In recent years, as mobile devices and the Internet of Things (IoT) have become more prevalent than ever, the demand for non-volatile memory has grown steadily. The current mainstream non-volatile memory is flash memory, which is widely used because of its low cost and large capacity. However, flash memory requires a high write voltage and has encountered many problems in process scaling, so it has reached a bottleneck. This has driven the development of next-generation non-volatile memories (ReRAM, STT-MRAM, etc.). Compared with flash memory, next-generation non-volatile memories offer a lower write voltage, faster read speed, smaller area, and logic process compatibility. These advantages make them more suitable for embedded systems than flash memory.

With the development of deep learning and the IoT, the amount of data to be computed increases with the complexity of the neural network. However, the traditional Von Neumann architecture spends most of its time transferring information between the processor and the memory, and the bandwidth limitation between the two creates a bottleneck in computing speed. Computing-In-Memory (CIM) has therefore been proposed in recent years to solve this problem. With CIM, the information that needs to be transmitted has already been computed, which reduces data movement and improves processing efficiency. Combined with the characteristics of non-volatile memory, this makes non-volatile CIM more suitable for mobile devices and the IoT.

This master's thesis discusses the challenges faced by non-volatile CIM and proposes a voltage sense amplifier (VSA) to address them. The main challenges are as follows:

1. As the complexity of the network increases, multi-bit inputs and weights are necessary to improve accuracy. However, as the number of output bits increases, the non-volatile CIM architecture takes longer to complete an operation, and the operating speed decreases.
2. Under a limited supply voltage, the traditional VSA cannot operate normally in the region below the threshold voltage. The sensing margin between different accumulated values is therefore reduced, and the read yield is reduced as well.

This thesis therefore proposes a VSA that produces two sequential output bits, with the values 00, 01, 10, or 11, within a single sensing period. The sensing time of the proposed VSA is 48% to 52% shorter than that of the traditional VSA, and the CIM macro is 27% to 39% faster than one using the traditional VSA. The proposed VSA also supports full-range voltage sensing, so the sensing margin can be enlarged when performing CIM operations. In addition, it cancels process variation and amplifies the sensing margin to achieve a higher yield even with a small sensing margin. The input offset voltage that the traditional VSA can tolerate varies with the common-mode voltage, whereas the proposed VSA tolerates 1.76x to 2.91x as much input offset voltage, and its tolerance is stable across common-mode voltages.

We implement a 4Mb ReRAM CIM macro using the TSMC 22nm process. At the nominal operating voltage of 0.8V, the measured two-bit sensing time of the proposed two-bit full-voltage-range sense amplifier (2b-FVRSA) is 1.36ns, and the measured one-bit sensing time of the traditional VSA is 1.24ns. For 8-bit inputs and 8-bit weights, the CIM macro produces an 8-bit output in 14.8ns.
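To make the two-bit output concrete, the following is a minimal behavioral sketch in Python of sequential two-bit sensing over the full range from 0 V to the supply voltage. It is not the thesis circuit: the abstract does not say how the reference levels are generated, so the evenly spaced thresholds (one quarter, one half, and three quarters of the supply) and the function name `sense_two_bits` are assumptions made purely for illustration.

```python
def sense_two_bits(v_in: float, vdd: float = 0.8) -> str:
    """Behavioral model: map an input voltage in [0, vdd] to a 2-bit code.

    Assumption (not from the thesis): reference levels sit at
    vdd/4, vdd/2 and 3*vdd/4, and the two bits are resolved one
    after the other, most significant bit first.
    """
    if not 0.0 <= v_in <= vdd:
        raise ValueError("v_in must lie within [0, vdd]")

    # Step 1: resolve the MSB against the mid-rail reference.
    msb = 1 if v_in >= vdd / 2 else 0

    # Step 2: resolve the LSB against a reference chosen by the MSB,
    # so the second decision refines the half selected in step 1.
    ref = vdd * (0.75 if msb else 0.25)
    lsb = 1 if v_in >= ref else 0

    return f"{msb}{lsb}"


if __name__ == "__main__":
    # Sample inputs at the 0.8 V operating voltage quoted in the abstract.
    for v in (0.1, 0.3, 0.5, 0.7):
        print(f"{v:.1f} V -> {sense_two_bits(v)}")
```

The model only shows the input-to-code mapping; it says nothing about timing, offset cancellation, or margin amplification, which are the actual contributions claimed in the abstract.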

