This work is motivated by advances in heterogeneous computing and the strong practical demand for workload acceleration. Targeting pipelined workloads on FPGAs, this thesis develops a systematic methodology for configuring the hardware instances of each pipeline stage so that the maximum execution time over all stages is minimized, subject to FPGA resource allocation under a memory bandwidth constraint. An algorithm is proposed for the target problem and proved to be optimal, and it is implemented on a real platform. In the experimental study, an image filter implemented on an FPGA with this method outperforms the CPU, GPU, and baseline FPGA solutions by 460%, 73%, and 1030%, respectively. Extensive simulations with larger FPGA devices, which offer more resources, further demonstrate the scalability of this work.
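The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of a simplified version of the min-max problem, not the thesis's algorithm. It assumes a hypothetical model in which stage i with n_i instances takes `work / n_i` time, each instance consumes a fixed amount of FPGA logic (`logic`) and memory bandwidth (`bandwidth`), and both totals must stay within budgets (`logic_budget`, `bw_budget`). All of these names and the linear cost model are illustrative assumptions. Under this model, repeatedly granting one more instance to the current bottleneck stage is optimal: a stage's time depends only on its own instance count, and both budgets grow monotonically with instance counts, so once the bottleneck cannot grow, no feasible allocation can lower the maximum.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    work: float       # hypothetical: total work of the stage (e.g. cycles)
    logic: int        # hypothetical: FPGA logic units used per instance (> 0)
    bandwidth: float  # hypothetical: memory bandwidth demanded per instance


def allocate(stages, logic_budget, bw_budget):
    """Greedy min-max allocation under the simplified linear model above.

    Returns (instance counts per stage, maximum stage time).
    """
    if any(s.logic <= 0 for s in stages):
        raise ValueError("each instance must consume positive logic")
    counts = [1] * len(stages)
    logic_used = sum(s.logic for s in stages)
    bw_used = sum(s.bandwidth for s in stages)
    if logic_used > logic_budget or bw_used > bw_budget:
        raise ValueError("budgets cannot host one instance per stage")

    # Max-heap on current stage time (negated, because heapq is a min-heap).
    heap = [(-s.work, i) for i, s in enumerate(stages)]
    heapq.heapify(heap)
    while True:
        _, i = heap[0]  # current bottleneck stage
        s = stages[i]
        if logic_used + s.logic > logic_budget or bw_used + s.bandwidth > bw_budget:
            break  # bottleneck cannot grow, so the maximum cannot shrink
        counts[i] += 1
        logic_used += s.logic
        bw_used += s.bandwidth
        # Replace the bottleneck's entry with its new, smaller stage time.
        heapq.heapreplace(heap, (-s.work / counts[i], i))

    bottleneck = max(s.work / n for s, n in zip(stages, counts))
    return counts, bottleneck
```

For example, `allocate([Stage(100, 2, 1.0), Stage(300, 3, 2.0), Stage(50, 1, 0.5)], logic_budget=20, bw_budget=8.0)` returns `([1, 3, 1], 100.0)`: here the bandwidth budget, not the logic budget, is what stops the middle and first stages from receiving more instances. A real FPGA model would also account for placement, clock frequency, and non-linear bandwidth contention, which this sketch ignores.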