組合最佳化是指從所有可行解答的集合中尋找最佳解的過程,它在各種領域有許多重要的應用,如交通狀況模擬、蛋白質折疊模型、金融分析等。模擬量子退火(Simulated Quantum Annealing, SQA)是一種解決組合最佳化問題的啟發式演算法。 SQA在傳統電腦上模擬量子退火機器之量子穿隧效應,使其能有更高的機率找到總體最佳解。目前已有許多研究透過不同的硬體加速器來加速SQA,例如FPGA和GPU。大多數的研究是實作在單一加速器上,因此,SQA的規模會受限於單一加速器的計算能力和記憶體容量。在這篇論文中,我們提出了使用多GPU平行處理的做法來進一步加速SQA,並探索了幾種系統方面的設計來擴大SQA的規模。我們的成果與之前的作法相比,在使用8張A100 GPU 的情況下,達到7.2倍的速度提升,並且由於合計的記憶體提高至320GB,我們也將可解決的問題大小增加了8倍。此外,雖然CPU可使用較大的記憶體容量,但在解決同樣大的問題時,我們提出的多GPU的作法在執行時間上比CPU的作法快了約1000倍,並且在能耗效率上提高約100倍。
Combinatorial optimization is the process of finding an optimal solution from a set of all feasible solutions, which has many important applications in various fields, such as traffic flow simulation, protein folding models, financial analysis, etc. Simulated quantum annealing (SQA) is a heuristic algorithm for solving combinatorial optimization problems, which simulates the quantum tunneling phenomena of quantum annealing machines on traditional computers to increase the chance of finding globally optimal solutions. There have been several works of speeding up SQA with hardware accelerators, such as FPGA and GPU. Most of the works are implemented on a single accelerator and hence, the scale of SQA is limited by the computing power and memory capacity of the accelerator. In this thesis, we propose a multi-GPU approach to further accelerate SQA and explore several system-level designs to increase the scale of SQA. As a result, we achieve 7.2x speedup and enlarge the solvable problem size by 8 times with 8 A100 GPUs and a total of 320GB of GPU memory. While a CPU-based implementation can also solve a problem of the same size, our work offers ~1000x speedup for execution time and ~100x better power efficiency over the CPU-based implementation.