
Hybrid Task Cascade with Attention: A New Framework for Instance Segmentation

Advisor: 謝宏昀

Abstract


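Hybrid Task Cascade (HTC) has achieved milestone progress and found many successful applications in machine vision tasks that require real-time, effective detection and object segmentation, such as self-driving cars, pedestrian tracking, and contact tracing. Recent instance segmentation research has concentrated more on backbone modules, data augmentation techniques, and transformer-based architectures. These models have become increasingly complex and therefore increasingly slow. In this thesis, our goal is to significantly reduce the size and complexity of the proposed model while maintaining the state-of-the-art performance of HTC. By comparing HTC with current leading instance segmentation techniques, we identified inefficient designs in HTC-based architectures. We therefore propose the Hybrid Task Cascade with Attention (HTCA) framework and test three different designs within it. Among the three experimental designs, the variant that embeds the attention mechanism in the deconvolution layer (HTCA-D) performs best. HTCA-D integrates the state-of-the-art detector EfficientDet as the backbone, followed by a branch based on the conventional HTC mask segmenter. Incorporating a new module also lets the segmentation task focus more on the object. Validation on benchmark datasets shows that the lightweight HTCA uses fewer parameters and also improves object detection quality. With HTCA, we gain 1.3 mask AP on the COCO dataset.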

Parallel Abstract


Hybrid Task Cascade (HTC) for instance segmentation has recently attracted considerable interest in the computer vision community, particularly in domains that require effective real-time detection and segmentation, such as self-driving cars, pedestrian tracking, and contact tracing. Recent instance segmentation research has focused mainly on backbones, data augmentation, and transformer-based architectures. These models have become increasingly complex and, consequently, slower. In this thesis, we aim to significantly reduce the size and complexity of the proposed model while maintaining the state-of-the-art performance of HTC. We identified inefficient designs in previous HTC-based architectures relative to leading-edge developments. We therefore tested three different designs within the proposed Hybrid Task Cascade with Attention (HTCA) framework. Among them, Hybrid Task Cascade with Attention in the deconvolutional layer (HTCA-D) performs best. HTCA-D is a novel network that integrates the state-of-the-art detector EfficientDet as the backbone, followed by a segmentation branch based on the conventional HTC mask head. The segmentation branch also incorporates a new module that helps it focus more on the object. Our method reduces the number of FLOPs by more than 30% and uses almost 75% less memory than the original HTC. This saves time and energy during both training and inference. Validation on benchmark datasets shows that the lightweight HTCA uses fewer parameters and also improves object detection quality. On the COCO dataset, HTCA surpasses our baseline by +1.3 points in both mask AP and bounding-box AP.
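The abstract does not say which attention module HTCA-D places in the deconvolutional layer of the mask head. The code below is only a rough illustration of the general idea: upsample with a deconvolution, then re-weight the result with an attention gate. It assumes the upsampling is a 2x2, stride-2 transposed convolution, as used in Mask R-CNN-style mask heads. The gate is assumed to be a simple spatial one that multiplies each pixel by the sigmoid of its channel mean. The function names `deconv2x2_stride2`, `spatial_attention`, and `attentive_deconv` are hypothetical, and the gate is a stand-in, not the module proposed in the thesis. Plain Python nested lists are used so the sketch runs without a deep-learning framework.

```python
import math
from typing import List

# Feature maps are nested lists indexed [channel][row][col].
Tensor3 = List[List[List[float]]]
# Deconvolution weights indexed [out_channel][in_channel][ky][kx].
Kernel4 = List[List[List[List[float]]]]


def deconv2x2_stride2(x: Tensor3, w: Kernel4) -> Tensor3:
    """Transposed convolution with a 2x2 kernel, stride 2, no padding.

    Because kernel size equals stride, output windows do not overlap:
    input pixel (i, j) writes a 2x2 patch at (2i + ky, 2j + kx).
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * (2 * wd) for _ in range(2 * h)] for _ in range(len(w))]
    for o, w_o in enumerate(w):
        for c in range(c_in):
            for i in range(h):
                for j in range(wd):
                    v = x[c][i][j]
                    for ky in range(2):
                        for kx in range(2):
                            out[o][2 * i + ky][2 * j + kx] += v * w_o[c][ky][kx]
    return out


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(feat: Tensor3) -> Tensor3:
    """Hypothetical spatial gate: scale each pixel by sigmoid(channel mean).

    The gate lies in (0, 1), so it attenuates weakly activated locations
    more than strongly activated ones.
    """
    c, h, wd = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * wd for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(wd):
            gate = sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
            for k in range(c):
                out[k][i][j] = feat[k][i][j] * gate
    return out


def attentive_deconv(x: Tensor3, w: Kernel4) -> Tensor3:
    """Upsample with the deconvolution, then re-weight spatially."""
    return spatial_attention(deconv2x2_stride2(x, w))


if __name__ == "__main__":
    x = [[[1.0]]]                     # 1 channel, 1x1 RoI feature
    w = [[[[1.0, 2.0], [3.0, 4.0]]]]  # 1 output channel, 1 input channel
    print(attentive_deconv(x, w))     # 1 channel, 2x2 gated output
```

In a trained network, a learned gate would compute the attention map with trainable layers instead of a fixed channel mean. CBAM's spatial branch, for example, applies a convolution to pooled features. Whatever form the gate takes, it is placed right after upsampling, so the mask logits are refined at the higher resolution.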
