
Improvement of Deep Learning Methods for Analyzing Video Streams on Embedded Systems

Advisor: 朱守禮

Abstract


With the rise of artificial intelligence, the era of widespread neural-network applications has inevitably arrived. More and more applications, such as autonomous driving, driving recorders, road surveillance, and access control, use neural networks to analyze video streams on embedded systems and mobile devices. For embedded systems with limited computing performance, real-time analysis is a problem that must be overcome, especially because applying artificial intelligence to image analysis demands substantial computing power. To address these problems, this thesis proposes three application scenarios and corresponding solutions implemented on embedded systems.

First, for detecting shot boundaries in a video stream, the Inter Sub-block Difference (ISD) method is proposed as a preprocessing step for video-stream analysis: it facilitates video segmentation and measures the degree of difference between adjacent frames, serving as a reference for subsequent neural-network processing. The method operates in the YUV color space and partitions each frame into blocks as its computational basis. It is highly efficient to execute and effectively filters frames for the next-stage neural-network algorithm, which makes it especially suitable for embedded systems with real-time requirements.

Second, some application scenarios, such as analyzing images from cameras mounted inside and outside vehicles, personnel access control, and identity verification, frequently require face recognition. The Continuous Frames Skipping Mechanism (CFSM) is therefore proposed; it combines an improved ISD with a state-machine design and reduces the execution time of face recognition on an embedded system by 90% compared with the "Basic Face Recognition system".

Third, another broad class of video-stream analysis applications includes road surveillance and the detection of abnormal in-vehicle behavior or criminal and illegal activity. Such methods typically detect that a frame is abnormal and then require additional algorithms to determine the cause of the abnormality. The Video Abnormal Region Detection System (VARDS) is therefore proposed: besides identifying abnormal frames, it marks the abnormal region and reports the objects in the frame, including their types, quantities, and sizes, which benefits practical alerting mechanisms. Compared with the conventional neural-network-based "basic system of Abnormal Frame with Object Detection" on an embedded system, it is 2.6 times faster; the experimental results show that the proposed VARDS method has a substantial speed advantage.

Beyond providing advanced methods and architectures for understanding video content, the above mechanisms aim to reduce the computational complexity and computing power required by deep learning methods, which makes them especially suitable for embedded systems.
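As an illustration of the block-based difference idea described above, the following minimal sketch computes a per-block mean absolute difference on the luma (Y) plane of two adjacent frames and reports the fraction of blocks that changed. The block size, thresholds, and function names here are assumptions for illustration, not the thesis's actual parameters.

```python
def isd_score(prev_y, curr_y, block=8, diff_thresh=12.0):
    """Inter sub-block difference, illustrative sketch.

    prev_y / curr_y: 2-D lists holding the luma (Y) plane of two
    adjacent frames. Each frame is split into block x block tiles;
    a tile counts as "changed" when its mean absolute luma
    difference exceeds diff_thresh. Returns the changed fraction
    (0.0 = identical frames, 1.0 = every tile changed).
    """
    h, w = len(prev_y), len(prev_y[0])
    changed = total = 0
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            diff = sum(
                abs(prev_y[r + i][c + j] - curr_y[r + i][c + j])
                for i in range(block) for j in range(block)
            )
            total += 1
            if diff / (block * block) > diff_thresh:
                changed += 1
    return changed / total


def is_shot_boundary(score, cut_thresh=0.6):
    """Flag a scene change when most sub-blocks differ."""
    return score >= cut_thresh
```

Working on the Y plane alone avoids a YUV-to-RGB conversion, and the block scan touches each pixel only once, which is why a scheme of this shape can run in real time on a low-power device.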

Abstract (English)


With the advancement of artificial intelligence, the era of ubiquitous neural-network applications has inevitably arrived. Many such applications run on embedded systems and mobile devices to analyze video streams on edge devices for autonomous driving, vehicle surveillance, road surveillance, access management, and so on. Real-time analysis and computation are a challenge for embedded systems with limited computing performance, especially when artificial intelligence, which demands significant computational power, is employed to analyze video streams. To overcome these issues, three application scenarios and corresponding solutions are proposed and implemented on embedded systems. First, the Inter Sub-block Difference (ISD) method preprocesses the video stream for shot boundary detection; it not only retrieves video segments but also quantifies the difference between adjacent frames and provides a reference for subsequent computation. Since the proposed mechanism works in the YUV color domain and adopts a sub-block design, it achieves high execution efficiency and provides effective filtering for higher-level algorithms. Second, many applications require face recognition technology, such as analyzing images from dashboard cameras, personnel management, and access control identification. This research proposes the Continuous Frames Skipping Mechanism (CFSM), which combines an improved ISD with a state machine and reduces the execution time of face recognition on an embedded system by up to 90% compared with "Basic Face Recognition". Last, abnormality detection algorithms are used in many security systems; such an algorithm often simply determines whether a frame is abnormal and requires another algorithm to identify the underlying cause.
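A frame-skipping state machine of the kind CFSM describes might be sketched as follows. This is a hedged illustration under assumed parameters: the two states, the counters, the thresholds, and the class name `FrameSkipper` are all hypothetical, not the thesis's exact design, which couples an improved ISD with its own state transitions.

```python
class FrameSkipper:
    """Two-state frame-skipping sketch for face recognition.

    In RECOGNIZE, every frame is sent to the recognizer; once
    stable_needed consecutive frames show a low inter-frame
    difference, the machine switches to SKIP and samples only every
    skip_n-th frame. A large difference (e.g. a scene change) drops
    it back to RECOGNIZE immediately.
    """
    RECOGNIZE, SKIP = "RECOGNIZE", "SKIP"

    def __init__(self, low=0.05, high=0.4, stable_needed=3, skip_n=5):
        self.state = self.RECOGNIZE
        self.low, self.high = low, high
        self.stable_needed, self.skip_n = stable_needed, skip_n
        self.stable = 0          # consecutive "quiet" frames seen
        self.frame_idx = 0       # total frames processed

    def step(self, isd):
        """Given this frame's ISD score, return True to run recognition."""
        self.frame_idx += 1
        if self.state == self.RECOGNIZE:
            self.stable = self.stable + 1 if isd < self.low else 0
            if self.stable >= self.stable_needed:
                self.state = self.SKIP
            return True
        # SKIP state: a big difference means the scene changed.
        if isd > self.high:
            self.state = self.RECOGNIZE
            self.stable = 0
            return True
        return self.frame_idx % self.skip_n == 0
```

On a static scene this sketch quickly settles into SKIP and runs the expensive recognizer on only one frame in `skip_n`, which is the intuition behind the reported reduction in execution time.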
Because such detectors do not localize the cause, the Video Abnormal Region Detection System (VARDS) is proposed; it not only detects abnormal frames in the video stream but also identifies the abnormal region and provides information about detected objects, such as object types, quantities, and region sizes, which is useful for practical alerting mechanisms. The experimental results show that, implemented on an embedded system, VARDS runs 2.6 times faster than the "basic system of Abnormal Frame with Object Detection". The objectives of the above mechanisms include not only developing improved methods and architectures for understanding the contents of video but also exploring solutions that reduce the computational complexity and computing power required by deep learning methods.
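To make the idea of localizing an abnormal region concrete, the sketch below reuses a sub-block luma difference to flag changed tiles and returns the tight bounding box around them; a pipeline in the spirit of VARDS would then run an object detector only inside this region rather than over the whole frame. The block size, threshold, and function name are illustrative assumptions, not the system's actual design.

```python
def abnormal_region(prev_y, curr_y, block=8, diff_thresh=12.0):
    """Bounding box of the changed area between two luma planes.

    Splits the frame into block x block tiles, flags tiles whose
    mean absolute difference exceeds diff_thresh, and returns the
    tight bounding box (x, y, width, height) around all flagged
    tiles, or None when no tile is abnormal.
    """
    h, w = len(prev_y), len(prev_y[0])
    min_r = min_c = max_r = max_c = None
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            diff = sum(
                abs(prev_y[r + i][c + j] - curr_y[r + i][c + j])
                for i in range(block) for j in range(block)
            )
            if diff / (block * block) > diff_thresh:
                if min_r is None:
                    min_r, min_c, max_r, max_c = r, c, r, c
                else:
                    min_r, min_c = min(min_r, r), min(min_c, c)
                    max_r, max_c = max(max_r, r), max(max_c, c)
    if min_r is None:
        return None
    return (min_c, min_r, max_c + block - min_c, max_r + block - min_r)
```

Restricting detection to the returned region shrinks the input handed to the neural network, which is one plausible source of the kind of speedup the abstract reports.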

