

Efficient Neural Architecture and Mixed-Precision Search for Quantized Neural Networks using Model Path Counts

Advisor: 闕志達

Abstract


With the rapid development of artificial intelligence and its applications, neural network models have grown far more complex; the computation and parameter counts of modern architectures are thousands of times larger than before, making it extremely difficult to design or search for architectures by hand. Automated Neural Architecture Search (NAS) and automated Mixed-Precision Search (MPS) have therefore become essential. However, most current NAS and MPS algorithms consume enormous time and computing resources during the search, and many apply only to image classification or a single task. This thesis therefore proposes the Total Path Count score (TPC score) for NAS and the Bitwise Total Path Count score (BTPC score) for MPS. Both scores require only simple calculations on the architecture's structural information, yet predict an architecture's accuracy well. We further verify, both theoretically and experimentally, that the TPC and BTPC scores are not only cheap to compute but also highly effective.

For NAS, we randomly sampled 20 architectures from the search space and computed each architecture's TPC score and its CIFAR-100 accuracy; the Kendall rank correlation coefficient between them reaches 0.87. This thesis also proposes TPC-NAS, a zero-shot NAS algorithm built on the TPC score. Because TPC-NAS needs no training or inference data, it completes a NAS task in under five minutes on an Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz CPU. We then apply TPC-NAS to image classification, object detection, super-resolution, and natural language processing (NLP) to further validate its generality. In image classification, TPC-NAS finds a 399M-FLOPs architecture that reaches 78.3% top-1 accuracy on ImageNet, outperforming all other NAS results. In object detection, starting from YOLOv4-P5, TPC-NAS finds a high-performance architecture with at least 2% higher mAP than architectures of similar complexity found by other NAS methods. In super-resolution, TPC-NAS discovers an architecture with fewer than 300K parameters that achieves 32.09dB PSNR on the Urban100 dataset. Finally, in NLP, we find a Transformer encoder with computational complexity similar to TinyBERT's; after pre-training, it reaches an average accuracy of 73% on the GLUE benchmark, almost 10% higher than the original TinyBERT's 64%. These four experiments confirm that TPC-NAS can quickly discover high-quality architectures across a wide range of feedforward-neural-network applications.

For MPS, we find that the Kendall correlation coefficient between the BTPC score and the post-training accuracy under different quantization precisions reaches 0.956, meaning the BTPC score predicts a quantized architecture's accuracy very effectively. Based on the BTPC score, we propose BTPC-MPS, which uses a genetic algorithm to quickly find the bit-width assignment with the highest BTPC score among many candidate precision distributions. We apply BTPC-MPS to image classification and test several architectures, including ResNet20, ResNet18, and MobileNetV2, finding suitable mixed-precision assignments in very little time. On CIFAR-10, ResNet20 achieves 92.71% accuracy under a mixed precision averaging 3 bits; on ImageNet, ResNet18 and MobileNetV2 achieve 70.7% and 68.74% accuracy, respectively, under an average 4-bit constraint. The mixed precisions we find match or surpass the accuracies of previous MPS algorithms. In object detection, under an average 4-bit constraint we achieve 43.2% mAP, more than 2% higher than uniform 4-bit quantization. In super-resolution, BTPC-MPS under the same average 4-bit constraint achieves 31.45dB PSNR on the Urban100 test set. In NLP, under an average activation-bit times weight-bit (abit*wbit) budget of 24, the mixed-precision BERT architecture reaches an average accuracy of 81.7%, even about 1% higher than the original FP32 BERT.

Finally, combining TPC-NAS and BTPC-MPS, a two-stage automated search finds a low-computation architecture with a sensible bit-width assignment. Since each algorithm finishes on a CPU within five minutes, the combined search completes within ten minutes, far faster than known NAS and MPS algorithms. The combined method reaches 74.08% accuracy on ImageNet with only 5.7G bFLOPs; compared with the traditional MobileNetV2 architecture, it improves accuracy by more than 2% with over 50x fewer bFLOPs. In object detection, combining TPC-NAS and BTPC-MPS achieves 41.5% mAP with only 322G bFLOPs, an advance in both computation and accuracy over other algorithms. In super-resolution, it achieves 31.80dB PSNR on the Urban100 test set with 3648G bFLOPs. In NLP, the combination of NAS and MPS reaches an average accuracy of 69% with 0.92G bFLOPs, about 5% higher than the original FP32 TinyBERT.

Parallel Abstract (English)


Neural network models have become increasingly sophisticated with the explosive development of AI and its applications. Automating the model search and quantization-precision search process is essential to explore the full range of quantized neural architectures for satisfactory performance on hardware devices. However, most current Neural Architecture Search (NAS) and Mixed-Precision Search (MPS) algorithms consume significant time and computing resources, and many cater only to image classification. Hence, this thesis proposes the Total Path Count (TPC) score and the Bitwise Total Path Count (BTPC) score, which require only simple calculations on the architecture's structural information, as efficient accuracy predictors. Both scores are not only cheap to compute but also very effective. For NAS, the Kendall rank correlation coefficient between the TPC scores and the CIFAR-100 accuracies of 20 sampled architectures is as high as 0.87. This thesis also proposes TPC-NAS, a zero-shot NAS method built on the TPC score. TPC-NAS requires no training or inference and completes a NAS task for ImageNet and other applications in under five minutes on an Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz CPU. We then apply TPC-NAS to image classification, object detection, super-resolution, and natural language processing for further validation. In image classification, TPC-NAS finds a 399M-FLOPs architecture that achieves 78.3% top-1 accuracy on ImageNet, outperforming all other NAS solutions. In object detection, starting from YOLOv4-P5, TPC-NAS produces a high-performance architecture with at least 2% higher mAP than other NAS algorithms' results. In super-resolution, TPC-NAS discovers an architecture with fewer than 300K parameters that generates images with 32.09dB PSNR on the Urban100 dataset.
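The abstract does not spell out how the total path count is computed, but a natural reading of the name is that the TPC score counts distinct input-to-output paths in the architecture's computation graph, so that skip connections and parallel branches raise the score. A minimal sketch under that assumption, using dynamic programming over a topological order (the function name and edge-list encoding are illustrative, not the thesis's actual implementation):

```python
from collections import defaultdict

def total_path_count(edges, source, sink):
    """Count distinct source-to-sink paths in a DAG by dynamic
    programming over a topological order (illustrative TPC proxy)."""
    # Build adjacency lists and in-degrees for Kahn's topological sort.
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm: repeatedly emit nodes whose predecessors are done.
    order = [n for n in nodes if indeg[n] == 0]
    i = 0
    while i < len(order):
        u = order[i]
        i += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    # DP: the path count of a node is the sum over its predecessors.
    paths = defaultdict(int)
    paths[source] = 1
    for u in order:
        for v in adj[u]:
            paths[v] += paths[u]
    return paths[sink]
```

On a toy chain of three residual blocks, each identity skip doubles the count, so the DP returns 2^3 = 8, matching the intuition that deeper, better-connected architectures score higher.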
Finally, in natural language processing, we find a Transformer encoder with computational complexity similar to TinyBERT's; after pre-training, it achieves an average accuracy of 73% on the GLUE benchmark, almost 10% higher than the original TinyBERT's 64%. These four experiments convince us that TPC-NAS can swiftly deliver high-quality architectures for diverse feedforward-neural-network applications. For MPS, the Kendall correlation coefficient between the BTPC score and the post-training accuracy under different quantization precisions reaches 0.956, meaning the BTPC score predicts quantized models' accuracy very effectively. Based on the BTPC score, we propose BTPC-MPS, which uses a genetic algorithm to quickly find the mixed-precision assignment with the highest BTPC score. We apply BTPC-MPS to image classification and test several architectures, including ResNet20, ResNet18, and MobileNetV2; suitable mixed precisions are found in very little time. On CIFAR-10, ResNet20 achieves 92.71% accuracy with an average bit-width of 3. On ImageNet, ResNet18 and MobileNetV2 achieve 70.7% and 68.74% accuracy, respectively, with an average of 4 bits. The mixed precisions obtained by our search match or surpass the accuracies of previous MPS algorithms. In object detection, we achieve 43.2% mAP under an average 4-bit constraint, an improvement of over 2% compared with uniform 4-bit quantization across all layers. For super-resolution, BTPC-MPS under the same average 4-bit constraint achieves a PSNR of 31.45dB on the Urban100 test set.
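BTPC-MPS is described as a genetic algorithm that searches per-layer bit-widths for the highest score under an average-bit budget. A toy sketch of such a loop is below; the `score_fn` argument, population size, bit-width menu, and operators are illustrative assumptions, and the real search would score candidates with the BTPC formula rather than a stand-in:

```python
import random

def mixed_precision_search(num_layers, score_fn, avg_bits=4,
                           bit_choices=(2, 3, 4, 6, 8),
                           pop_size=32, generations=50, seed=0):
    """Toy genetic algorithm: evolve per-layer bit-widths that
    maximize score_fn under an average-bit-width budget."""
    rng = random.Random(seed)

    def feasible(cfg):
        # Budget constraint: mean bit-width must not exceed avg_bits.
        return sum(cfg) / num_layers <= avg_bits

    def random_cfg():
        while True:
            cfg = [rng.choice(bit_choices) for _ in range(num_layers)]
            if feasible(cfg):
                return cfg

    pop = [random_cfg() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score_fn, reverse=True)
        parents = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_layers)    # one-point crossover
            child = a[:cut] + b[cut:]
            k = rng.randrange(num_layers)         # point mutation
            child[k] = rng.choice(bit_choices)
            if feasible(child):
                children.append(child)
        pop = parents + children
    return max(pop, key=score_fn)
```

Because candidates are scored by a cheap proxy rather than by training, each generation costs almost nothing, which is what lets the whole search finish in minutes on a CPU.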
In natural language processing, with an average abit*wbit budget of 24, the mixed-precision BERT architecture achieves an average accuracy of 81.7%, even surpassing the original FP32 BERT by about 1%. Finally, combining TPC-NAS and BTPC-MPS, a two-stage automated search finds a low-computation architecture with a reasonable mixed-precision assignment. The overall search finishes within 10 minutes, far faster than known NAS and MPS algorithms. In addition, the combined method achieves 74.08% accuracy on ImageNet with only 5.7G bFLOPs; compared with the traditional MobileNetV2 architecture, accuracy improves by more than 2% with over 50x fewer bFLOPs. In object detection, combining TPC-NAS and BTPC-MPS achieves 41.5% mAP with a mere 322G bFLOPs, an advance in computational efficiency and accuracy over other algorithms. For super-resolution, we achieve a PSNR of 31.80dB on the Urban100 test set with 3648G bFLOPs. In natural language processing, the combination of NAS and MPS achieves an average accuracy of 69% with only 0.92G bFLOPs, about 5% higher than the original FP32 TinyBERT.
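Both proxy scores are validated through Kendall's rank correlation with trained accuracy (0.87 for TPC over 20 CIFAR-100 architectures, 0.956 for BTPC). The coefficient itself is simple: the normalized difference between concordant and discordant pairs. A minimal pure-Python version for rankings without ties:

```python
def kendall_tau(x, y):
    """Kendall rank correlation for untied data:
    (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:          # pair ordered the same way in x and y
                concordant += 1
            elif s < 0:        # pair ordered oppositely
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A coefficient of 1.0 means the proxy ranks architectures exactly as trained accuracy does, so values near 0.9 justify selecting architectures by score alone without training them.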

