
Exploring Data Flow Optimization Design by Implementing Ethash on FPGAs

Advisor: 李致毅

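Abstract


Ethereum is one of today's most popular blockchains; its inventor hopes it can serve as a world computer driven jointly by people everywhere. As the price of Ethereum's token rises, computing Ethash, the proof-of-work function it uses, efficiently becomes increasingly important.

By design, the Ethash computing capacity of a circuit is proportional to the bandwidth of the large-capacity memory attached to it. The most common high-bandwidth device is the graphics processing unit (GPU). Some field-programmable gate arrays (FPGAs) are also equipped with high-bandwidth memory (HBM), which makes them suitable for computing Ethash as well. This thesis describes how to implement a hardware circuit on an FPGA that exploits the HBM bandwidth and converts it fully into Ethash computing capacity. The first step is to raise the clock rate through pipelining. The flip-flops that pipelining adds may keep the circuit from fitting on the chip, or may push resource utilization so high that routing becomes difficult and the clock rate drops instead. The data flow is therefore adjusted to improve hardware efficiency and trim resource usage. The divider is also reworked and implemented with a multiplier. In the end, computing capacity doubles to 81.25 MH/s while the area shrinks by more than half, so the circuit can be ported to a smaller chip.

This thesis also summarizes the research method used along the way. Analyzing the data flow of an algorithm helps developers locate the bottlenecks in an implementation architecture, and the method is expected to be useful in similar development work in the future.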
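The proportionality between memory bandwidth and hash rate can be made concrete with a short calculation. The sketch below is not taken from the thesis. It assumes the standard Ethash parameters (64 dataset accesses of 128 bytes per hash), and the function name and the 100 GB/s input are illustrative assumptions only.

```python
# Minimal sketch: upper bound on Ethash hash rate imposed by memory bandwidth.
# ACCESSES and MIX_BYTES follow the public Ethash specification; the function
# name and the example bandwidth below are hypothetical, not thesis values.

ACCESSES = 64    # dataset (DAG) reads performed for each hash
MIX_BYTES = 128  # bytes fetched by each dataset read


def bandwidth_bound_hashrate(bandwidth_bytes_per_s: float) -> float:
    """Return the hash rate (hashes/s) reachable if every byte of memory
    bandwidth is spent on DAG reads and nothing else limits the circuit."""
    bytes_per_hash = ACCESSES * MIX_BYTES  # 8192 bytes of DAG traffic per hash
    return bandwidth_bytes_per_s / bytes_per_hash


if __name__ == "__main__":
    # Hypothetical memory system sustaining 100 GB/s of random 128-byte reads.
    rate = bandwidth_bound_hashrate(100e9)
    print(f"{rate / 1e6:.2f} MH/s")  # prints "12.21 MH/s"
```

Real designs fall below this bound whenever the compute pipeline or the memory controller cannot keep every memory channel busy, which is the gap the pipelining and data flow work described above aims to close.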

Parallel Abstract


Ethereum is a popular blockchain today. Its inventor has the ambitious vision of making it a computer driven by people from all over the world. As the price of the token used in Ethereum has risen, so has the importance of efficiently computing Ethash, Ethereum's proof-of-work (PoW) function.

According to Ethash's design, computation capacity is proportional to the bandwidth of the large-capacity memory. Besides GPUs, some field-programmable gate arrays (FPGAs) are equipped with high-bandwidth memory (HBM) to handle tasks with high bandwidth requirements. This thesis describes a hardware circuit design on FPGAs that fully utilizes the HBM bandwidth to obtain the most computation capacity. The first step is to raise the clock rate by pipelining. A pipelined design that uses too many flip-flops may make the circuit too large to fit on the chip. An oversized circuit may also lower the achievable clock rate, because high hardware resource usage makes it hard for electronic design automation (EDA) tools to route the circuit. Therefore, I further improve the efficiency of hardware usage and reduce the use of hardware resources by adjusting the data flow. In addition, I use a multiplier to implement the divider. Eventually, I raise the computation capacity to 81.25 MH/s while reducing the circuit area by more than half, which allows the circuit to be ported to a smaller chip.

This thesis also compiles the research methods used in the process. Analyzing the algorithmic data flow in this way helps developers find the bottlenecks in an implementation architecture, and the approach can be applied to similar development work in the future.
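The abstract does not say how the multiplier replaces the divider. One common way to do it, shown here only as an assumed illustration, is reciprocal multiplication: when the divisor d changes rarely, a constant m = ceil(2^s / d) is precomputed and each division becomes a multiplication followed by a right shift. All function names below are hypothetical, and the 32-bit operand width is an assumption.

```python
# Minimal sketch of division by a rarely changing divisor using a multiplier.
# This is the textbook reciprocal-multiplication technique, NOT necessarily
# the method used in the thesis. All names here are hypothetical.
import random


def magic_constants(d: int, n_bits: int = 32) -> tuple[int, int]:
    """Precompute (m, shift) so that (x * m) >> shift == x // d
    for every unsigned x that fits in n_bits."""
    if not 1 <= d < (1 << n_bits):
        raise ValueError("divisor must satisfy 1 <= d < 2**n_bits")
    log2_ceil = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
    shift = n_bits + log2_ceil
    m = -(-(1 << shift) // d)          # ceil(2**shift / d)
    return m, shift


def divide_by_multiply(x: int, m: int, shift: int) -> int:
    """Quotient x // d computed with one multiplication and one shift."""
    return (x * m) >> shift


def modulo_by_multiply(x: int, d: int, m: int, shift: int) -> int:
    """Remainder x % d: recover it from the multiplied quotient."""
    return x - divide_by_multiply(x, m, shift) * d


if __name__ == "__main__":
    d = 1_000_003                      # arbitrary example divisor
    m, shift = magic_constants(d)
    for _ in range(10_000):
        x = random.getrandbits(32)
        assert divide_by_multiply(x, m, shift) == x // d
        assert modulo_by_multiply(x, d, m, shift) == x % d
    print("reciprocal multiplication matches // and % on all samples")
```

In hardware, the appeal of this trade is that a pipelined multiplier typically maps onto dedicated DSP blocks, whereas a general divider consumes many cycles or a large amount of logic. The constant m needs recomputing only when the divisor changes.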

