
Mask Wearing Identification Based on Deep Learning Object Detection Networks

Advisor: 繆紹綱

Abstract


Research on the spread of the novel coronavirus (COVID-19) shows that wearing a mask reduces the risk of transmission via respiratory droplets, protecting the health of the wearer and of others. To help contain the pandemic, a mask wearing identification system supports automated epidemic prevention, greatly reduces labor costs, improves screening efficiency, and limits droplet-borne transmission of the virus from infected persons to healthy individuals.

Traditional mask identification methods are reliable only under good detection conditions and with stationary subjects, while the deep learning approaches proposed for this task so far spend most of their time on preprocessing or on head pose classification. This study treats the mask worn on a face as an object and applies general-purpose deep learning object detection networks to these problems. The YOLOv3 network has been tried on this task before; this study instead proposes the more advanced YOLOv5s, which offers low computational complexity and high detection accuracy, and thus better fits practical needs.

To achieve real-time detection, we train YOLOv3 and YOLOv5s directly on face mask data labeled with wearing status, using three classes: good (worn correctly), improper (worn but with nostrils exposed), and bad (not worn). The trained networks are then evaluated on videos recorded in the field.

Experimental results show that on five videos recorded at an exit of Zhongli Railway Station, the YOLOv5s object detection network achieves an average precision of 99.6% and an average recall of 99.6%; on an RTX 2080 8GB GPU it runs at about 66.3 FPS on average, delivering both speed and accuracy. Compared with YOLOv3, the largest performance difference lies in the precision of the bad (not wearing a mask) class: 47.4% on average for YOLOv3 versus 82.6% for YOLOv5s, a gap of 35.2 percentage points.

In addition, because YOLOv3 does not use Mosaic data augmentation, it tends to misclassify people wearing colored masks as not wearing one; YOLOv5s, which uses Mosaic data augmentation, resolves this problem. However, when a mask has an unusual pattern (e.g., leopard print), both networks treat the pattern as a facial feature and misclassify the wearer as not wearing a mask, which remains to be improved in future work.
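For concreteness, the three wearing-status labels map directly onto a YOLOv5-style dataset configuration. The sketch below is not taken from the thesis; it assumes the public ultralytics/yolov5 repository, and the directory layout and the file name mask.yaml are hypothetical placeholders.

```python
# Minimal sketch: declare the three wearing-status classes in a
# YOLOv5-format dataset config. The paths and the name "mask.yaml"
# are hypothetical, not taken from the thesis.
from pathlib import Path

Path("mask.yaml").write_text(
    "train: masks/images/train\n"            # training images (placeholder path)
    "val: masks/images/val\n"                # validation images (placeholder path)
    "nc: 3\n"                                # number of classes
    "names: ['good', 'improper', 'bad']\n"   # labels used in this study
)

# Fine-tuning YOLOv5s would then use the repository's own script, e.g.:
#   python train.py --img 640 --batch 16 --epochs 100 \
#                   --data mask.yaml --weights yolov5s.pt
```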
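Precision and recall here follow their standard definitions. The counts in the example below are illustrative only, chosen so that both come out to 99.6%; they are not the thesis's actual detection tallies.

```python
# Standard definitions behind the reported scores. The counts are
# illustrative, not the thesis's actual detection tallies.
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted detections that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth objects that are detected."""
    return tp / (tp + fn)

# 996 true positives with 4 false positives and 4 false negatives
# give precision = recall = 0.996, i.e. 99.6%.
print(precision(996, 4), recall(996, 4))
```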
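The reported 66.3 FPS corresponds to per-frame inference speed. A rough way to reproduce such a measurement is sketched below, assuming trained weights saved as best.pt and an input video test.mp4 (both hypothetical names), and using the torch.hub entry point that ultralytics/yolov5 publishes.

```python
# Rough FPS measurement over a video, assuming hypothetical file names
# best.pt (trained weights) and test.mp4 (input video).
import time
import cv2
import torch

# Load custom-trained YOLOv5 weights via the ultralytics/yolov5 hub entry.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

cap = cv2.VideoCapture("test.mp4")
frames, start = 0, time.perf_counter()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV frames are BGR
    results = model(rgb)  # one forward pass; boxes are in results.xyxy[0]
    frames += 1
cap.release()

elapsed = time.perf_counter() - start
print(f"average speed: {frames / elapsed:.1f} FPS")
```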
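Mosaic data augmentation, credited above with fixing the colored-mask failure, stitches four training images into one composite so that each batch mixes object scales and background contexts. The sketch below shows only the core tiling idea; YOLOv5's real implementation also randomizes the stitch point and remaps the bounding-box labels, both of which are omitted here.

```python
# Simplified illustration of Mosaic augmentation: tile four images
# into one canvas. YOLOv5's actual version randomizes the center
# point and transforms the box labels accordingly; both are omitted.
import cv2
import numpy as np

def simple_mosaic(imgs: list, size: int = 640) -> np.ndarray:
    """Tile four HxWx3 images into one size-by-size mosaic."""
    assert len(imgs) == 4, "Mosaic combines exactly four images"
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # (y, x) offsets
    for img, (y, x) in zip(imgs, corners):
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas
```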


