
Cross City Adaptation of Road Scene Segmenters via Adversarial Learning

Advisors: 孫民 (Min Sun), 邱瀞德 (Ching-Te Chiu)

Abstract


Although recent deep-learning-based methods have been successfully applied to semantic segmentation, deploying a pre-trained road-scene segmenter to a city whose images never appeared in the segmenter's training set yields unsatisfactory results because of dataset bias. Rather than collecting a large number of annotated images in every city of interest to train or refine the segmenter, we propose an unsupervised learning approach that adapts road-scene segmenters across cities. We observe that Google Street View and its time-machine feature allow us to collect unannotated images of each road scene at different points in time, from which static-object priors can be extracted. We then combine a global and a class-specific domain adversarial learning framework to adapt the pre-trained segmenter to the target city without any user annotation or intervention. The results show that our proposed method improves semantic-segmentation accuracy in multiple cities across continents, and that it performs favorably against state-of-the-art methods that require annotated training data.

Parallel Abstract (English)


Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to that city can be achieved without the need of any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.
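The joint global and class-specific adversarial objective described in the abstract can be sketched numerically. Everything below is illustrative and assumed, not the thesis's implementation: the function names, the toy per-pixel domain-discriminator output, and the weight `lam` are all hypothetical. The sketch shows the two-sided objective: a discriminator is trained to tell source-city pixels from target-city pixels (globally and per predicted class), while the segmenter receives the negated loss, as a gradient reversal layer would provide.

```python
import numpy as np

def binary_ce(p, label):
    """Binary cross-entropy of a domain-classifier output p in (0, 1)."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def adversarial_losses(domain_probs, class_masks, is_source, lam=0.1):
    """Sketch of a joint global + class-specific domain adversarial loss.

    domain_probs : (H, W) map of P(pixel comes from the source city),
                   produced by a hypothetical pixel-wise discriminator.
    class_masks  : dict mapping class name -> (H, W) boolean mask of
                   pixels the segmenter predicted as that class.
    is_source    : 1 if the image is from the source city, else 0.

    Returns (disc_loss, seg_adv_loss): the discriminator minimizes the
    first; the segmenter minimizes the second, i.e. it is pushed to make
    the two cities indistinguishable (the sign flip is what a gradient
    reversal layer implements).
    """
    # Global term: discriminate source vs. target over all pixels.
    global_loss = binary_ce(domain_probs, is_source).mean()
    # Class-specific terms: the same discrimination restricted to the
    # pixels of each predicted class (empty classes are skipped).
    class_losses = [binary_ce(domain_probs[m], is_source).mean()
                    for m in class_masks.values() if m.any()]
    disc_loss = global_loss + (np.mean(class_losses) if class_losses else 0.0)
    # Gradient reversal: the segmenter sees the negated, weighted loss.
    return disc_loss, -lam * disc_loss
```

At the optimum of this game the discriminator outputs 0.5 everywhere, meaning the segmenter's features no longer reveal which city an image came from, which is the adaptation effect the abstract describes.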

