
Efficient Probabilistic Occluded-object Pose Estimation

Advisor: 林奕成

Abstract


This thesis aims to estimate, in real time, the 6D poses of occluded objects in a scene from a single RGB image. Most recent methods rely on deep neural networks either to regress the 6D pose directly, or to first predict the 2D projections of predefined keypoints in the image and then recover the 6D pose with the Perspective-n-Point (PnP) algorithm. However, the vast majority of these methods do not account for the uncertainty of the predicted projections, which degrades their performance when objects are occluded. To address this, we propose a probability-distribution-based 6D object pose estimation network that treats each projected keypoint location as a probability distribution: when predicting a projection, the network estimates both the mean and the variance of that distribution. Because we model the projections as distributions, we can derive a more effective loss function, which not only gives the network an intuitive training signal but, more importantly, also makes the subsequent computation of the 6D pose more accurate. Finally, we validate our method on the Occlusion LINEMOD dataset and surpass existing methods in accuracy. Furthermore, by adopting YOLOv3 as our backbone network, our method can estimate the poses of multiple objects simultaneously in real time while remaining accurate.
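The abstract does not reproduce the loss function it derives. As a minimal sketch only, assuming the standard heteroscedastic Gaussian negative log-likelihood that mean-and-variance keypoint regression typically yields, such a loss could look as follows in PyTorch; the function name keypoint_nll_loss and the tensor shapes are illustrative assumptions, not the author's implementation.

import torch

def keypoint_nll_loss(mu, log_var, target):
    # Hypothetical sketch: Gaussian negative log-likelihood for 2D keypoint
    # regression with a predicted per-keypoint variance. mu and target are
    # (N, K, 2) projected-keypoint locations; log_var is the predicted
    # log-variance with the same (broadcastable) shape.
    sq_err = (mu - target) ** 2
    # A large predicted variance down-weights the squared error but is
    # penalized by the log-variance term, so high uncertainty pays off only
    # where regression is genuinely hard (e.g., occluded keypoints).
    nll = 0.5 * (sq_err * torch.exp(-log_var) + log_var)
    return nll.mean()

Under such a formulation, the predicted variance doubles as a per-keypoint confidence that can later weight the input to the PnP step, which is consistent with the claim that the loss makes the subsequent pose computation more accurate.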

Parallel Abstract


This paper aims to solve the problem of 6-DoF pose estimation from a single RGB image in which the target objects are partially occluded. Recent work tends to train a deep neural network that either directly estimates the pose from the input image or predicts the 2D locations of 3D keypoints, which are then used to compute the pose with the PnP algorithm. However, most of these methods treat the target object as a holistic entity and predict the pose from global information; they are therefore sensitive to occlusion and their results degrade easily. To overcome this, we propose a segmentation-driven, distribution-based 6D pose estimation network that predicts 2D keypoint locations together with their confidence. We regard the 2D locations as distributions and derive an unsupervised loss formulation that combines a regression loss and a confidence loss. Finally, we select the most reliable local predictions according to the predicted confidence and fuse them to generate a robust estimate. We outperform state-of-the-art methods on the LINEMOD and Occlusion LINEMOD datasets, which demonstrates that our approach can estimate precise poses even when the target objects are partially occluded. Moreover, by employing YOLOv3 as our backbone network, our method is efficient enough for real-time applications.
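The abstract says the most reliable local predictions are selected by confidence and fused before the PnP step, without giving the exact rule. The following is a minimal Python sketch under assumed details: a confidence-weighted average of the top-k grid-cell votes per keypoint, followed by OpenCV's EPnP solver. fuse_and_solve_pnp, votes_2d, and top_k are hypothetical names and parameters, not the author's code.

import numpy as np
import cv2

def fuse_and_solve_pnp(votes_2d, conf, pts_3d, cam_K, top_k=50):
    # Hypothetical sketch, not the thesis implementation.
    # votes_2d: (C, K, 2) 2D keypoint votes from C local grid cells,
    # conf:     (C, K)   predicted confidence per vote,
    # pts_3d:   (K, 3)   predefined 3D keypoints on the object model,
    # cam_K:    (3, 3)   camera intrinsic matrix.
    fused = np.empty((pts_3d.shape[0], 2), dtype=np.float32)
    for k in range(pts_3d.shape[0]):
        idx = np.argsort(conf[:, k])[-top_k:]      # keep most confident votes
        w = conf[idx, k] / conf[idx, k].sum()      # normalize to weights
        fused[k] = (votes_2d[idx, k] * w[:, None]).sum(axis=0)
    # Recover rotation (rvec, Rodrigues form) and translation with EPnP.
    ok, rvec, tvec = cv2.solvePnP(
        pts_3d.astype(np.float32), fused, cam_K.astype(np.float32),
        None, flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec

In practice this would run once per detected object instance, which is what keeps the pipeline real-time when paired with a single-shot backbone such as YOLOv3.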

