Identifying Buildings in Drone-View Images Using Cross-View Learning and Spatial Estimation

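In recent years, quadcopter drones have become increasingly popular. Unlike traditional cameras, they carry several additional sensors, so they can capture both images and the drone's geo-location. To enable drone-based applications, it is essential to provide information about the environment, such as information about surrounding buildings. We therefore define the drone-view building identification problem: given a landmark with several of its images and its geo-location, together with the drone's geo-location, find the most likely building region in a drone-view image. Although few annotated drone-view images exist to date, many images from other viewpoints are available, such as ground-level, street-view, and aerial images. We therefore propose a cross-view triplet neural network that learns the visual similarity between the drone view and other views. We further consider spatial estimation, namely the drone-angle and drone-distance of each building candidate, using the geo-locations of the drone and the landmark to improve this difficult cross-view visual search. In addition, because annotated drone-view data are lacking, we collected a new drone-view dataset (Drone-BR). We then evaluate different neural networks and investigate how to obtain the best performance under different conditions. Finally, our method outperforms the latest convolutional network features by 0.29 mAP, helping drones understand their surroundings more deeply.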
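To make the idea of a triplet objective concrete, the following is a minimal sketch of one common hinge-style triplet loss on embedding vectors. It is an assumption for illustration only: the abstract does not state the exact loss, distance, or margin used, and the function names and the default margin here are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the thesis).

    anchor:   embedding of a drone-view building candidate
    positive: embedding of the same building seen from another view
    negative: embedding of a different building
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)
```

In this sketch, training would push the loss toward zero, so that matching cross-view pairs end up closer in embedding space than non-matching pairs.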

Keywords

Drone; Drone-View; Building Identification

Parallel Abstract

Recently, drones have become more popular and are equipped with several types of sensors (for images and geo-location). To enable drone-based applications, it is essential to provide related information (e.g., building information) for understanding the environment around the drone. We frame drone-view building identification as a building retrieval problem: given a building (a multimodal query) with its images and geo-location, together with the drone's current location, retrieve the most likely proposal (building candidate) in a drone-view image. Although few annotated drone-view images exist to date, many images from other viewpoints are available, such as ground-level, street-view, and aerial images. Hence, we propose a cross-view triplet neural network to learn the visual similarity between the drone view and other views. In addition, we consider spatial estimation (drone-angle and drone-distance) for each building proposal, using the drone's geo-location on a geographic map to help solve this challenging cross-view image retrieval problem. Moreover, owing to the lack of annotated drone-view datasets, we collect a new drone-view dataset (Drone-BR). We evaluate different neural networks and investigate how to achieve the best performance under various conditions. Finally, our method outperforms state-of-the-art approaches (CNN features) by 0.29 mAP, which helps drones understand their surroundings more deeply.
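As a rough illustration of how spatial cues could be derived from two geo-locations, the sketch below computes a great-circle distance and a compass bearing from the drone to a building. This is an assumption, not the thesis's definition: the abstract does not specify how drone-angle and drone-distance are computed, and the function names and the spherical Earth model are hypothetical choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (spherical model)


def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def initial_bearing(lat1, lon1, lat2, lon2):
    """Compass bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x)) % 360.0
```

Under these assumptions, a retrieval system could compare the bearing and distance expected from the map with those estimated for each building candidate in the image, and use the agreement to re-rank visually similar candidates.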

Parallel Keywords

Drone; Drone-View Image; Building Identification

