The main task of this thesis is to solve the unconstrained minimization problem in the Support Vector Machine (SVM) model, which is used to classify a vector into the class it belongs to. In the big data era, the computational efficiency of this problem has become critical. However, previously proposed SVM hardware architectures did not address large-scale input data: although their acceleration results may be impressive on small data sets, they are difficult to apply to large-scale inputs. In this thesis, we use a training algorithm improved from the L-BFGS algorithm to reduce memory usage, adopt the concept of the MapReduce L-BFGS algorithm to reduce the number of vector computations, and apply hardware implementation techniques such as pipelining and parallel processing, memory arrangement, and streaming data input, finally producing a resource-aware accelerator suitable for large-scale inputs. The hardware is implemented in TSMC 40 nm technology; each subunit occupies 5.592 mm^2 and operates at 500 MHz. Its training function supports feature vectors of dimension up to 4096, with no limit on the number of feature vectors. When training a 784x300 feature vector set, the hardware achieves a 17.62x speedup over the software version. Subunits can share part of the I/O interface and some modules, combining into a large group of N parallel subunits to further speed up the algorithm.
The main task of this work is to solve an unconstrained minimization problem for the Support Vector Machine (SVM) model, which is used to classify a vector into the class it belongs to. In the big data era, computational efficiency becomes a major issue. However, previous SVM hardware designs did not focus on large-scale inputs: although their acceleration performance may be impressive on small cases, they may not be practical for large-scale data applications. In this thesis, we use a training algorithm based on the Limited-memory BFGS (L-BFGS) algorithm to reduce memory usage. In addition, we use the concept of the MapReduce L-BFGS algorithm to reduce the number of vector computations. By applying hardware techniques, including pipelining and parallel processing, memory arrangement, and streaming input data, we accelerate large-scale SVM training with reasonable resource usage. The hardware is implemented in TSMC 40 nm technology. Each subunit occupies about 5.592 mm^2 of area and consumes 530.5 mW of power when operating at 500 MHz. Its training function supports a feature vector dimension of up to 4096 and an unlimited number of feature instances. When training a 784x300 feature vector set, the hardware achieves a 17.62x speedup compared to the software version. Subunits can be grouped into a larger super module of N parallel subunits, sharing parts of the I/O ports and modules.
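To illustrate why L-BFGS reduces memory usage, the sketch below shows the standard L-BFGS two-loop recursion in NumPy: instead of storing a dense d x d inverse Hessian, it approximates the product of the inverse Hessian with the gradient using only the last few (s, y) curvature pairs. This is a minimal sketch of the textbook algorithm, not the thesis's modified training algorithm or its hardware mapping; the function name and the gradient-descent warm-up used in the example are illustrative.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: approximate (inverse Hessian) @ grad
    from the stored pairs s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k.
    Memory cost is O(m*d) for m stored pairs, not O(d^2)."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    # Scale by gamma = s'y / y'y from the most recent pair
    # (a common choice for the initial Hessian approximation).
    gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest.
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * np.dot(y, r)
        r = r + (a - b) * s
    return r  # search direction is -r

# Usage on a small quadratic f(x) = 0.5 x'Ax - b'x (illustrative):
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b
x = np.zeros(2)
s_list, y_list = [], []
for _ in range(3):               # build curvature pairs with plain GD steps
    g = grad_f(x)
    x_new = x - 0.1 * g
    s_list.append(x_new - x)
    y_list.append(grad_f(x_new) - g)
    x = x_new
d = lbfgs_direction(grad_f(x), s_list, y_list)
```

Because each iteration touches only dot products and vector updates over the stored pairs, the recursion maps naturally onto pipelined vector units, which is the property the thesis exploits in hardware.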