
Job Scheduling Based on Weight Pruning in a Deep Learning Accelerator

Advisor: 黃婷婷

Abstract


Deep learning (DL) has achieved breakthroughs in many fields, and high-performance computation is key to realizing artificial intelligence applications. Prior work has found that deep neural networks (DNNs) contain many weights that are zero or very close to zero. When designing a deep learning accelerator, removing these weights, a technique known as weight pruning, can substantially improve computational efficiency. However, even the same neural network model has different parameters under different applications; these differences lead to different hardware designs and hence different job scheduling requirements. To reduce hardware design time and cost, it is important to analyze and derive an appropriate job schedule automatically. Building on weight pruning, we formulate an optimization problem over hardware resources, examine the performance metrics of the hardware architecture, and propose a solution to the resulting job scheduling problem.
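As a hedged illustration of the pruning idea described above, the following minimal Python sketch zeroes out weights whose magnitude falls below a threshold. The NumPy usage and the threshold value are assumptions for illustration only, not the criterion used in this thesis.

    import numpy as np

    def prune_weights(weights, threshold=1e-3):
        # Magnitude-based pruning sketch: keep weights whose absolute value
        # is at least the (hypothetical) threshold and zero out the rest.
        mask = np.abs(weights) >= threshold
        return weights * mask

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.001, size=(4, 4))   # toy weight matrix
    pruned = prune_weights(w)
    sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
    print(f"sparsity after pruning: {sparsity:.2%}")

A sparse matrix like `pruned` is what lets an accelerator skip zero-valued multiply-accumulate operations.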

Abstract (English)


Deep learning (DL) has achieved major breakthroughs in many fields, and many innovative DL applications require efficient computation. Previous work has found that deep neural networks (DNNs) contain many weights that are zero or near zero. These weights can be removed, i.e., weight pruning, to improve the computational efficiency of DNNs. Moreover, the parameters of a neural network model vary from one application to another, which complicates both hardware design and job scheduling. Automated techniques that analyze and support the hardware accelerator design flow are therefore valuable. In this work, we study an optimization problem based on weight pruning, discuss the performance of the hardware design, and propose a solution to a job scheduling problem.
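To make the job scheduling problem concrete, the sketch below assigns per-layer workloads (whose costs could, for instance, be the non-zero operation counts left after pruning) to parallel processing elements (PEs) using the classic longest-processing-time (LPT) heuristic. The job names, costs, and PE count are hypothetical, and the thesis's actual formulation and solution method may differ.

    import heapq

    def lpt_schedule(job_costs, num_pes):
        # Greedy LPT list scheduling: process jobs in decreasing cost order,
        # always assigning the next job to the currently least-loaded PE.
        loads = [(0, pe) for pe in range(num_pes)]   # min-heap of (load, pe)
        heapq.heapify(loads)
        assignment = {}
        for job, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
            load, pe = heapq.heappop(loads)
            assignment[job] = pe
            heapq.heappush(loads, (load + cost, pe))
        makespan = max(load for load, _ in loads)    # overall finish time
        return assignment, makespan

    # Hypothetical per-layer costs, e.g. non-zero MACs remaining after pruning.
    jobs = {"conv1": 120, "conv2": 90, "conv3": 90, "fc1": 40, "fc2": 20}
    assignment, makespan = lpt_schedule(jobs, num_pes=2)
    print(assignment, "makespan =", makespan)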

