
Applying Variational Auto-Encoders to Enhance Data Selection Performance for Object Detection

Enhance data selection efficiency with variational auto-encoder for object detection’s active learning

Advisor: Min Sun

Abstract


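For batch-mode active learning of object detection in surveillance video settings, our method applies a variational auto-encoder (VAE) to strengthen the diversity of data selection. Compared with diversity-based selection and uncertainty-based selection, our hybrid selection strategy performs stably in every environment. We rely on the uncertainty selection strategy to score images, but we dynamically adjust the score weights to avoid the unnecessary cost of selecting similar data. First, we apply K-means clustering to the image distribution described by the VAE to obtain pseudo-labels for similar images. Then, using the pseudo-labels and the images already selected, we lower the selection weight of images that share a class with already selected data. We repeat these steps until a fixed number of images has been selected for annotation, and then add them to the training set to train our object detection model. We run experiments in four different environments to verify that our hybrid strategy is effective and robust, and we compare how well each method suits each environment and give usage recommendations. Our method can speed up the building of object detection systems and the collection of data for surveillance-camera applications. In most environments, a model trained on only 30% of the data reaches 90% of the performance of a model trained on the full dataset.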

Parallel Abstract


We apply pool-based active learning to object detection in surveillance video. In the pool-based setting, each selection iteration picks one batch of images under a budget limit. Our method uses a variational auto-encoder (VAE) to enhance the diversity of the selection strategy. Compared with pure uncertainty-based and diversity-based selection, our hybrid strategy performs robustly across different environments. It relies on an uncertainty selection strategy to score images, so that higher-scoring images are more valuable to label. In addition, we dynamically re-weight the uncertainty score of each image to avoid selecting similar data, which would give the object detection model redundant information. First, we cluster the VAE latent space with k-means to assign pseudo-labels that group similar images. Second, we re-weight the uncertainty scores of similar images according to the number of already selected images with the same pseudo-label. Third, we select for annotation the most informative image, namely the one with the highest re-weighted uncertainty score. We repeat these steps until the batch reaches its budget limit, and then add the batch to the object detector's training data. We run experiments in four environments to validate that our data selection is efficient and robust, and we give recommendations on which method suits which environment. Our method can shorten the time needed to build a surveillance system and collect its data. In most environments, training on only 30% of the data reaches 90% of the performance of a model trained on the entire dataset.
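
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `select_batch`, the multiplicative `decay` factor, and the tie-breaking rule are assumptions made for illustration, since the abstract does not state the exact re-weighting formula. The pseudo-labels are assumed to have been computed beforehand, for example by k-means on VAE latent vectors.

```python
from collections import Counter


def select_batch(uncertainty, pseudo_labels, budget, decay=0.5):
    """Greedily pick up to `budget` image indices.

    uncertainty   -- per-image uncertainty scores from the detector
    pseudo_labels -- per-image cluster ids (assumed precomputed, e.g. by
                     k-means on VAE latent vectors)
    decay         -- hypothetical down-weighting factor in (0, 1]; the
                     abstract does not specify the re-weighting formula
    """
    if len(uncertainty) != len(pseudo_labels):
        raise ValueError("uncertainty and pseudo_labels must have equal length")

    picked_per_cluster = Counter()  # how many images were taken per pseudo-label
    remaining = set(range(len(uncertainty)))
    selected = []

    while remaining and len(selected) < budget:
        # Each earlier pick from the same cluster multiplies the score by
        # `decay`; ties are broken in favour of the lowest index.
        best = max(
            remaining,
            key=lambda i: (
                uncertainty[i] * decay ** picked_per_cluster[pseudo_labels[i]],
                -i,
            ),
        )
        selected.append(best)
        remaining.remove(best)
        picked_per_cluster[pseudo_labels[best]] += 1

    return selected


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.5, 0.4]
    clusters = [0, 0, 1, 1]
    print(select_batch(scores, clusters, budget=2))             # [0, 2]
    print(select_batch(scores, clusters, budget=2, decay=1.0))  # [0, 1]
```

With `decay=1.0` the sketch reduces to plain uncertainty selection and takes both images from cluster 0. With `decay=0.5` the second pick moves to cluster 1, which is the diversity effect the hybrid strategy aims for.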

