
基於記憶體安全相關應用之低峰值低能耗多位元電流感測放大器及內嵌式高面積效率近記憶體運算功能電路

A Low Peak Current Low Energy Multi-bit Current Sense Amplifier with Embedded Area-Efficient Near-Memory-Computing Function for Memory Security Applications

Advisor: 張孟凡 (Meng-Fan Chang)
The full text will be available for download on 2025/10/13.

Abstract


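Non-volatile memory (NVM) has great potential in the memory market, and flash memory currently accounts for the largest share. However, flash memory requires high voltages to write and erase data, operates slowly, and is difficult to keep scaling with advanced process nodes. Next-generation NVMs such as STT-MRAM and ReRAM, which operate at low voltage and offer operating speeds more than one hundred times higher, have therefore become candidates to replace flash memory and are used in a wide range of edge devices that require high-speed computing.

Awareness of and demand for data security keep rising in many edge devices and machines. Most of these devices use the Secure Hash Algorithm (SHA) or the Advanced Encryption Standard (AES) to encrypt internal data and plaintext. These operations require fast reads and an NVM that can work with wide-IO to achieve high read bandwidth. In addition, to reduce the large amount of data movement in the conventional von Neumann architecture, near-memory computing, which places computing units inside the memory, can effectively reduce the computation time and power consumption of security-related algorithms.

Spin-transfer-torque magnetoresistive random-access memory (STT-MRAM) is the main on-chip NVM for advanced process nodes and has the fastest read speed among current NVMs. However, it needs small-offset sense amplifiers to tolerate a small tunnel magnetoresistance ratio (TMR ratio) for stable reads, which costs considerable area and read energy (ERD). Designing a high-read-bandwidth STT-MRAM computing macro for security-related applications therefore faces the following main challenges:

1. Reading in parallel with a large number of sense amplifiers achieves a short read time, but raises the peak current (IPEAK) and consumes considerable area and energy. Reading multiple bits sequentially with fewer sense amplifiers reduces peak current, area, and energy, but lengthens the read time and so lowers the read bandwidth.
2. An STT-MRAM macro with high peak current degrades the supply integrity of the chip and may cause noise-sensitive blocks on the same chip to fail.
3. The conventional architecture that separates memory and logic imposes a long latency on NVM-based security logic operations (a wide-IO read followed by a flip-flop shift/rotate takes two cycles) and consumes extra area and energy.

This thesis discusses the problems that arise in high-bandwidth reads of STT-MRAM and the performance bottleneck of the conventional von Neumann architecture, and proposes a circuit that combines a low-energy multi-bit current sense amplifier (LEMB-CSA) with area-efficient near-memory computing. The amplifier features continuous current-margin enhancement, tolerance of process variation, small area, low peak current, and low energy. The near-memory-computing circuit embedded beneath the sense amplifier offers high area efficiency and low power, and effectively addresses the design challenges listed above.

In analysis on the TSMC 22 nm process, the proposed read scheme improves yield by 35.2% over a conventional current sense amplifier and offers 80% more tolerance on the TMR ratio. In addition, the reduced number of reference currents and the pipelined current sampling lower the energy of the proposed sense amplifier by 36.4% and the peak current by 40% relative to the multi-bit current sense amplifier published at ISSCC 2020, and let it tolerate 1.3 times the offset, at a cost of only 18.2% in read speed relative to a conventional current sense amplifier (parallel sensing). The proposed near-memory-computing circuit reduces area by 33.3% and power by 48.8%, and can be combined with the read operation of the current sense amplifier to complete a shift/rotate logic operation within one cycle.

Finally, in collaboration with TSMC, we implemented and verified the proposed architecture in 22 nm and 28 nm CMOS processes. The measurements in this thesis are based mainly on the 28 nm memory test macro. At VDD = 0.9 V, the 8-bit read time is 3.12 ns; in the near-memory-computing mode, which senses 8 bits and completes a 1-bit shift/rotate, it is 3.29 ns, only 0.17 ns longer.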
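The measured access times quoted above imply a small relative cost for the near-memory-computing mode. As a rough check, the arithmetic can be sketched in Python. The constant and function names are illustrative and do not come from the thesis, and the percentage is derived here rather than reported by the author.

```python
# Measured access times at VDD = 0.9 V on the 28 nm test macro (from the abstract).
T_READ_8B_NS = 3.12   # 8-bit sensing only
T_READ_NMC_NS = 3.29  # 8-bit sensing + 1-bit shift/rotate (near-memory-computing mode)


def nmc_overhead(t_read_ns, t_nmc_ns):
    """Return (extra time in ns, extra time as a fraction of the plain read)."""
    extra_ns = t_nmc_ns - t_read_ns
    return extra_ns, extra_ns / t_read_ns


extra_ns, relative = nmc_overhead(T_READ_8B_NS, T_READ_NMC_NS)
print(f"extra time: {extra_ns:.2f} ns ({relative:.1%} of the 8-bit read)")
# prints: extra time: 0.17 ns (5.4% of the 8-bit read)
```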

Parallel Abstract


Non-volatile memory (NVM) has great potential in the storage-memory market, where flash memory currently dominates. However, flash memory requires high voltages to program and erase data, operates slowly, and is hard to scale down in advanced technology nodes. Emerging NVMs such as STT-MRAM and ReRAM, which operate at low supply voltage and achieve a hundred times the operating speed, have therefore become candidates to replace flash memory and are used in many kinds of edge devices that require high-speed computing.

Awareness of and demand for data protection are growing in many edge devices and machines. Most of these applications use Secure Hash Algorithm (SHA) or Advanced Encryption Standard (AES) functions for data/plaintext encryption, and they require high read speed and an NVM that can be used with wide-IO to achieve high read bandwidth. In addition, to reduce the large amount of data movement in the typical von Neumann architecture, near-memory computing can efficiently decrease the latency and power consumption of security-related algorithms.

Spin-transfer-torque magnetoresistive random-access memory (STT-MRAM) is the major on-chip NVM for advanced process nodes and has the fastest read speed among recent NVMs. However, it requires small-offset sense amplifiers (SAs) for robust reads against a small TMR ratio, at the expense of large area overhead and read energy (ERD). Designing a high-bandwidth STT-MRAM macro for security-related applications therefore imposes the following challenges:

1. Using a large number of SAs for parallel readout achieves a short access time (TAC), but results in high peak current (IPEAK) and large area overhead. Using fewer SAs for sequential readout reduces IPEAK and area overhead, but imposes a long TAC and decreases read bandwidth (BWR).
2. MRAM macros with high IPEAK degrade the supply (VDD) integrity of the chip, often leading to failure in noise-sensitive blocks on the same chip.
3. A conventional memory-logic-separated scheme imposes a long latency (2 cycles: wide-IO memory read + flip-flop (FF) shift/rotate), extra area overhead, and extra power consumption for NVM-based security logic operations.

In this thesis, we discuss the issues in high-bandwidth reads of STT-MRAM and the performance bottleneck of the typical von Neumann architecture. We then propose a low-energy multi-bit current sense amplifier (LEMB-CSA) that features continuous margin enhancement, offset suppression, small area, low peak current, and low energy consumption. We also propose an area-efficient, low-power near-memory-computing circuit embedded in the sense amplifier to address the design challenges listed above.

In TSMC 22 nm technology analysis, the proposed sense amplifier achieves a 35.2% yield improvement and tolerates a TMR ratio more than 80% lower than the conventional sensing scheme does. Moreover, the reduced number of reference currents (IREF) and the pipelined current sampling method give LEMB-CSA a 36.4% reduction in energy consumption and a 40% reduction in peak current, and let it tolerate 1.3 times the offset current, compared with the MB-CSA published at ISSCC 2020. The speed overhead is only 18.2% compared with conventional read schemes (parallel sensing). The proposed near-memory-computing circuit reduces area overhead by 33.3% and power consumption by 48.8%, and it completes the shift/rotate logic operation in one cycle, combined with the read operation of the current sense amplifier.

Finally, the proposed scheme is verified in TSMC 22 nm and 28 nm CMOS processes; the measurement results are based on the 28 nm test-mode memory macro. The access time for 8b sensing is 3.12 ns at VDD = 0.9 V, and the 8b readout + 1b shift/rotate NMC operation mode requires 3.29 ns, only 170 ps longer than a typical memory access.
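The shift/rotate step that the near-memory-computing circuit folds into the read cycle is a plain bit operation; rotations and shifts of this kind appear, for example, in SHA-2 round functions. The sketch below shows only the logical function of a 1-bit shift and rotate on an 8-bit word in Python. It does not model the thesis's circuit, and the function names are hypothetical.

```python
WIDTH = 8                 # word width, matching the 8-bit readout in the abstract
MASK = (1 << WIDTH) - 1   # 0xFF: keeps results inside one 8-bit word


def shl1(x):
    """Logical shift left by 1 bit; the top bit is dropped."""
    return (x << 1) & MASK


def shr1(x):
    """Logical shift right by 1 bit; the bottom bit is dropped."""
    return (x & MASK) >> 1


def rotl1(x):
    """Rotate left by 1 bit; the top bit wraps around to bit 0."""
    x &= MASK
    return ((x << 1) & MASK) | (x >> (WIDTH - 1))


def rotr1(x):
    """Rotate right by 1 bit; bit 0 wraps around to the top bit."""
    x &= MASK
    return (x >> 1) | ((x << (WIDTH - 1)) & MASK)


word = 0b10010110
for name, op in [("shl1", shl1), ("shr1", shr1), ("rotl1", rotl1), ("rotr1", rotr1)]:
    print(f"{name}: {op(word):08b}")
```

In the conventional scheme described in challenge 3, an operation like `rotl1` would run on flip-flops in a second cycle after the wide-IO read; the thesis reports completing it within the read cycle itself.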

