Scaling Up Quantum Circuit Simulation and Analysis with Non-Volatile Memory

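With the rapid development of quantum computing, high-performance simulation of quantum computation has become an active research topic in the development of quantum computing systems and applications. However, because the runtime and memory capacity required to simulate a large-scale quantum circuit grow exponentially with the number of qubits, simulating large-scale quantum circuits on classical computers is a very challenging task. Today, large-scale quantum circuit simulation commonly relies on clusters to enlarge the available memory capacity. This approach, however, imposes a high monetary cost on users and also incurs substantial communication overhead from the large volume of data exchanged. This paper proposes a cost-effective, optimized method that uses non-volatile memory (NVM) to perform large-scale quantum circuit simulation on a single machine. Today's high-end servers can host only terabytes of DRAM, whereas terabyte-scale NVMe drives are quite common, and an ordinary computer can easily attach NVMe disk arrays totaling hundreds of terabytes. In addition, the price of NVM disks is roughly one hundredth that of DRAM. Using NVM therefore makes it much easier to obtain a large, cost-effective memory space on a single computer. The proposed method not only exploits the large capacity of NVM but also, based on the way NVM stores data, optimizes for contiguous data access and extensive data reuse. Furthermore, for specific commonly used quantum gates, including CNOT, CZ, CS, CP(theta), SWAP, and Toffoli, we skip unnecessary data accesses to speed up the simulator. Moreover, because accessing data in NVM is much slower than accessing DRAM, we propose a quantum circuit scheduler that aggregates quantum gates into large N-qubit gates, which reduces both the circuit depth and the number of data fetches from NVM, thereby shortening simulation time. The proposed method is not intended for simulating small quantum circuits that fit in a computer's DRAM. Nevertheless, results from small circuits can be used to estimate the performance gap between large-scale quantum circuit simulators built on NVM and those built on DRAM. Therefore, to evaluate and validate the performance of our method, we run a series of quantum circuits and compare the runtimes against QuEST, a quantum simulator developed at the University of Oxford. The experimental results show that our method enables users to simulate large quantum circuits that exceed the DRAM capacity at lower cost and within a reasonable runtime. Without the proposed scheduler, our simulator is limited by the transfer speed of the PCIe lanes; in the worst case, its runtime is 2 times that of QuEST for small circuits and about 10.9 times that of QuEST for large circuits. With the proposed scheduler, however, our NVM-based large-scale simulator can outperform QuEST by 1.2x, which directly demonstrates the speed benefit of the proposed scheduler.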
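To make the exponential memory growth concrete, the following minimal sketch computes the size of a full state vector. It assumes each of the 2^n amplitudes is stored as a double-precision complex number (16 bytes); that storage format and the function names are illustrative assumptions, not details taken from the paper.

```python
# Assumption: one complex amplitude = two 64-bit floats = 16 bytes.
BYTES_PER_AMPLITUDE = 16


def state_vector_bytes(num_qubits: int) -> int:
    """Bytes needed to hold all 2**num_qubits amplitudes of a state vector."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:g} {unit}"
        size /= 1024


if __name__ == "__main__":
    for n in (30, 36, 40, 44):
        print(f"{n} qubits -> {format_size(state_vector_bytes(n))}")
```

Under this assumption each additional qubit doubles the footprint: 30 qubits need 16 GiB, 36 qubits need 1 TiB, and 44 qubits need 256 TiB.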

Keywords

Quantum Computing; Quantum Circuit Simulation; Quantum Circuit Optimization; Parallel Computing; Performance Analysis

Parallel Abstract

With recent advances in quantum computing, high-efficiency quantum computing simulation has become an active research topic for developing quantum computing systems and applications. However, it is challenging to simulate large-scale quantum circuits on traditional computers, as the runtime and memory capacity required for the simulation grow exponentially with the number of quantum bits (qubits). While a computer cluster may be used to enlarge the scale of simulation by extending the computing and memory resources, it incurs very high costs for the users. This paper proposes a cost-effective method for performing quantum circuit simulation on a single computer with non-volatile memory (NVM) and optimization schemes. The proposed method not only takes advantage of the large capacity offered by NVM, but also optimizes the data access patterns for NVM to favor contiguous accesses and data reuse. For specific quantum gates that are widely used, including CNOT, CZ, CS, CP(theta), SWAP, and Toffoli, we apply specialized handling to gain extra speed. In addition, as NVM is accessed via I/O and is much slower than regular memory such as DRAM, we propose a quantum circuit scheduler that aggregates quantum gates into k-qubit unitary gates to reduce the circuit depth and decrease the number of data fetches from NVM. To evaluate the performance of the proposed method, we run a series of benchmark circuits and compare the results against QuEST, one of the most popular quantum circuit simulators. The experimental results show that our work successfully enables the user to simulate quantum circuits beyond the capacity of regular memory with NVM at an affordable cost and a reasonable speed.
In comparison, one NVMe disk already offers terabytes of memory capacity at approximately 1/100 of the price of DRAM, and a typical computer can easily attach arrays of NVMe disks to provide hundreds of terabytes of memory capacity, while today's high-end servers can host only several terabytes of DRAM. While our work is not intended to simulate a small quantum circuit that fits in the DRAM of a computer, the results from small circuits serve as references for estimating the performance gap between DRAM-based simulation and NVM-based simulation for large circuits, should the user be able to afford the high cost of DRAM. As the data fetched from NVM can be cached in the system memory for data reuse, the speed of our work varies with the size of the circuits. Without the proposed scheduler, for regular unitary gates and specialized gates, QuEST outperforms our work in the worst case by 2.0x for smaller circuits and 10.9x for larger circuits in terms of speed, mainly due to the performance bottleneck caused by I/O operations to access NVM across the PCIe bus. With the proposed scheduler, our NVM-based simulator can outperform QuEST by 1.2x for large random circuits, which demonstrates the effectiveness of the proposed scheduling technique.
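The idea behind gate aggregation can be illustrated with its simplest case: consecutive single-qubit gates acting on the same qubit are multiplied into one matrix, so the state vector is swept once instead of once per gate. The sketch below is a simplified illustration only, not the paper's scheduler (which aggregates gates into k-qubit unitaries); the function names `apply_gate`, `fuse`, and `simulate` are hypothetical.

```python
import cmath
import math


def apply_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target`, sweeping the state vector once."""
    stride = 1 << target
    out = list(state)
    for base in range(len(state)):
        if base & stride:  # each amplitude pair is handled from its lower index
            continue
        a0, a1 = state[base], state[base | stride]
        out[base] = gate[0][0] * a0 + gate[0][1] * a1
        out[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return out


def fuse(first, second):
    """Return the 2x2 matrix equal to applying `first`, then `second`."""
    return [[sum(second[i][k] * first[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]


def simulate(state, gates, target):
    """Apply gates in order; return the final state and the number of sweeps."""
    sweeps = 0
    for gate in gates:
        state = apply_gate(state, gate, target)
        sweeps += 1
    return state, sweeps


R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]                          # Hadamard gate
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]  # T (pi/8) gate

if __name__ == "__main__":
    initial = [1 + 0j] + [0j] * 7              # 3-qubit state |000>
    unfused, sweeps_unfused = simulate(initial, [H, T], target=1)
    fused, sweeps_fused = simulate(initial, [fuse(H, T)], target=1)
    same = all(abs(x - y) < 1e-12 for x, y in zip(unfused, fused))
    print(f"sweeps without fusion: {sweeps_unfused}, with fusion: {sweeps_fused}")
    print(f"states match: {same}")
```

When the state vector resides on NVM, each sweep corresponds to reading and writing the whole vector over the I/O path, so halving the number of sweeps in this toy case mirrors the reduction in data fetches that the scheduler targets.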

Parallel Keywords

Quantum Computing; Quantum Circuit Simulation; Quantum Circuit Optimization; Parallel Computing; Performance Analysis
