
針對智慧型視覺辨識應用之大腦啟發演算法及架構設計

Brain-inspired Algorithm and Architecture Design for Intelligent Visual Recognition Applications

Advisor: 陳良基

Abstract


Making computers handle tasks as intelligently as the human brain will be a major computing requirement in the future. Moore's Law tells us that each chip will carry more and more computing resources, but has machine intelligence grown in step with computing power? From the "hundred-step rule" we learn that when we recognize a cat in a photo, the brain does not compute the result; it retrieves the relevant information from memory, which is quite different from how today's computers operate. To build an intelligent machine, we can therefore learn from the brain and construct an efficient memory system rather than a computing system. In this thesis, we develop an Intelligence Processing Unit (IPU) to support future intelligent applications. First, we mimic the brain's memory-prediction structure to build a software system that provides multiple intelligent functions. Next, we mimic the brain's memory-association mechanism to build an efficient hardware platform. We also propose a Cortex Compiler that maps the memory-prediction structure onto the hardware platform. Together, these parts form a complete design flow for future intelligent applications.

In the first part of the thesis, we build a brain-mimicking model for vision analysis, focusing on the processing that takes place after information enters the visual cortex. The neocortex is the core of the brain's intelligence, and memory-prediction may be its unified operating mechanism; the hippocampus and thalamus are also key players in memory-prediction processing. The hippocampus helps form the memory-prediction structure, while the thalamus modulates the weighting of individual signals during information fusion. Combining the mechanisms of the neocortex, hippocampus, and thalamus, we propose a memory-prediction framework that supports recognition, mining, and synthesis simultaneously. The framework provides hierarchical, spatial, and temporal memory-prediction structures that can synthesize predicted image information to refine vision-analysis results. For noisy images, the proposed model not only improves recognition accuracy but also removes the noise and recovers the original pattern. The framework can further reconstruct occluded patterns, or imagine a learned pattern when no image input is given. The memory-prediction mechanism also extends to other intelligent functions: for example, when we look for a car in an image, the concept of a car predicts the patterns that may appear, so we can quickly attend to the region containing the car instead of blindly searching the whole image. This shows that such a memory-prediction framework could indeed be the unified, basic mechanism by which the neocortex produces human intelligence.

In the second part of the thesis, we build a Silicon Brain hardware platform consisting of System Control, the Neocortex Processor, and the Cortex Compiler, which correspond to the thalamus, neocortex, and hippocampus of our brain model, respectively. System Control handles the input and output of the memory-prediction network. The Neocortex Processor is a distributed memory-association system: to achieve localized data access, we follow the brain and let information propagate forward automatically (push-based processing) rather than fetching data through main memory as in conventional processors (pull-based processing). We then propose a dataflow-in-memory mechanism that distributes the memory-association operations across many small memory units, making data access more efficient. We also adopt an on-chip network for data communication to provide better scalability. In addition, we adopt a brain-like information-gating mechanism: only memory patterns with high confidence scores send signals to activate their associated patterns, so only part of the recognition network is activated. The proposed Neocortex Processor thus achieves brain-like fast response, energy efficiency, and good scalability. Finally, the Cortex Compiler partitions the memory-prediction network and places it onto the Neocortex Processor, with the goal of improving hardware utilization.

We implemented a 14-core IPU in UMC 90nm technology with a core area of 10.6 mm². The system recognizes a 64x64 image within 0.15 seconds while consuming 137 mW of power. Compared with a 1.86-GHz CPU, it responds 8x faster while using less than 1/20 of the hardware resources and less than 1/300 of the power. Compared with other neural-network simulators, the proposed IPU system achieves 85.2x better power efficiency (defined as the number of simulated neurons supported per unit of power). Compared with other intelligent recognition systems, it offers a faster and more complete design flow, better scalability, and more intelligent functions. In conclusion, the proposed IPU system has the potential to approach human-like intelligence and is well suited to future intelligent applications.
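The paragraphs above describe recovering a noisy or occluded pattern by memory association rather than by computation. As a rough, hypothetical illustration of that general idea only (not the thesis's actual memory-prediction model), the short Python sketch below stores a few patterns in a classic Hopfield-style associative memory and recovers one of them from a corrupted probe; the pattern sizes, noise level, and update rule are assumptions made for the example.

import numpy as np

def train(patterns):
    """Store bipolar (+1/-1) patterns with a Hebbian outer-product rule."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)                     # no self-connections
    return w / len(patterns)

def recall(w, probe, steps=20):
    """Iteratively settle a noisy probe toward the nearest stored pattern."""
    s = probe.copy().astype(float)
    for _ in range(steps):
        s = np.sign(w @ s)
        s[s == 0] = 1
    return s

rng = np.random.default_rng(0)
stored = rng.choice([-1, 1], size=(3, 64))     # three 64-element memory patterns
w = train(stored)

noisy = stored[0].copy()
flipped = rng.choice(64, size=10, replace=False)
noisy[flipped] *= -1                           # corrupt 10 of the 64 elements

recovered = recall(w, noisy)
print("elements recovered:", int((recovered == stored[0]).sum()), "out of 64")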

Parallel Abstract


Intel's Platform 2015 workload model forecasts that intelligent processing for Recognition, Mining, and Synthesis (RMS) will become the main computation requirement. According to Moore's Law, there will be more and more computing resources on a single chip. However, does machine intelligence increase with the available computing power? From the "hundred-step rule," we learn that when we recognize a kitten in a photo, the human brain does not compute the result but retrieves the associated information from past experience. If we want to build an intelligent machine, we could mimic the brain and build a memory system rather than a computing system like today's computers. In this thesis, we develop an Intelligence Processing Unit (IPU) to support future intelligent applications by combining silicon technology with a brain-mimicking model. First, we mimic the memory-prediction structure of the brain to build a software model that provides the required intelligent functions. Then, we mimic the memory-association mechanism of the brain to build an efficient hardware platform for intelligent processing. We also propose the Cortex Compiler, which maps the memory-prediction structure onto the hardware platform. Finally, we provide a complete design flow, from the software brain model through the Cortex Compiler to the hardware platform, for future intelligent applications.

In the first part of the thesis, we build a brain-mimicking model for vision analysis, focusing on information processing in the visual cortex. The neocortex is the core of human intelligence, and memory-prediction has recently been proposed as its unified processing mechanism. The hippocampus and thalamus are also key components of memory-prediction processing in the brain. The hippocampus replays the day's experience during dreaming and builds connections between associated patterns as long-term memory; that is, it builds the memory-prediction structure. The thalamus is the information gateway that modulates information processing by adjusting the weighting values for information fusion. We propose a memory-prediction framework that combines the functions of the hippocampus, thalamus, and neocortex and achieves the capabilities of recognition, mining, and synthesis. The provided hierarchical, spatial, and temporal memory-prediction structures help synthesize predicted image patterns to refine vision-analysis results. For a noisy pattern-recognition problem, the proposed model not only improves recognition accuracy but also recovers the original pattern. The model also provides image completion for occluded images and imagination when no input is given. Finally, the memory-prediction framework can be extended to other intelligent functions such as attention: if we want to find a car in an image, the concept of a car predicts the possible patterns, so we can attend to the car quickly from a rough view rather than searching the full image. This shows that the proposed memory-prediction model can serve as a unified, basic building block of human intelligence.

In the second part, we mimic brain circuits to build a Silicon Brain hardware platform. Silicon Brain contains System Control (thalamus), the Neocortex Processor (neocortex), and the Cortex Compiler (hippocampus). System Control is the input/output interface that reads the input data of the memory-prediction network and outputs the analysis results. The Neocortex Processor is a distributed memory-association system for the memory-prediction network. Push-based processing, rather than the pull-based processing of conventional processors, is first adopted to achieve localized data processing and reduce data-access latency. We then propose a dataflow-in-memory technique: memory-association operations for the memory-prediction network are distributed across many small memory units, making memory access efficient. An on-chip network is adopted for data communication through virtual channels with good routability and scalability. In addition, an information-gating mechanism is proposed: only information with high confidence is forwarded to activate the fan-outs in the memory-prediction network, so only parts of the recognition network are activated. As a result, the proposed Neocortex Processor achieves the brain-like features of fast response, energy efficiency, and good scalability. The Cortex Compiler is responsible for partitioning the memory-prediction network and allocating it onto the Neocortex Processor. Relay-cell insertion and cortex-placement techniques are proposed to reduce sequential data processing and improve hardware utilization.

We have implemented a 200-MHz 14-core IPU system in UMC 90nm technology with a core area of 10.6 mm². The proposed system recognizes one 64x64 image in 0.15 seconds with 137 mW of power consumption. The throughput is 8x that of a 1.86-GHz CPU, with less than 1/20 of the transistors and less than 1/300 of the power. Compared with other neural-network simulators and emulators, the proposed 14-core IPU system achieves at least 85.2x better power efficiency (number of neurons per watt). We also compare our design with other intelligent recognition processors: the IPU platform provides a rapid design flow, better scalability, and more intelligent functions. In conclusion, the proposed IPU system has the potential to approach human-like intelligence and is suitable for future intelligent applications.
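The information-gating and push-based mechanisms described above can be illustrated with a small, self-contained Python sketch. The network, confidence scores, and threshold below are hypothetical values chosen only for the example and are not taken from the thesis; the point is simply that each node pushes its result to its fan-outs only when its confidence clears the gate, so most of the recognition network stays inactive.

from collections import deque

# A toy memory-prediction network as an adjacency (fan-out) list.
fanout = {
    "edge":   ["wheel", "window"],
    "wheel":  ["car"],
    "window": ["car", "house"],
    "car":    [],
    "house":  [],
}
confidence = {"edge": 0.9, "wheel": 0.8, "window": 0.3, "car": 0.7, "house": 0.6}
GATE = 0.5                            # information gate: low-confidence nodes stay silent

def propagate(start):
    """Push-based traversal: activated nodes forward results to their fan-outs."""
    activated, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        if node in activated or confidence[node] < GATE:
            continue                  # gated out or already fired: push nothing further
        activated.add(node)
        queue.extend(fanout[node])    # push results forward; no pull from a main memory
    return activated

print(propagate("edge"))              # only {'edge', 'wheel', 'car'} ever fire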

