We introduce a method for accurately estimating the 6DoF pose of novel objects from a single RGB image. Our approach combines 2D-3D keypoint correspondences with render-and-compare pose refinement. Specifically, we first detect the target object in the input image using an off-the-shelf object detector, then estimate an initial 6DoF pose via 2D-3D keypoint matching against a point-cloud model of the object. Finally, we refine the pose with an efficient differentiable 3D Gaussian renderer, comparing rendered images against the input image. Experimental results demonstrate the effectiveness of our approach on the LINEMOD, YCB-V, and OnePose-LowTexture datasets, particularly in real-world and indoor settings.
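The initialize-then-refine idea can be sketched in miniature. This is not the authors' implementation: the actual method matches learned keypoints and back-propagates through a differentiable 3D Gaussian renderer, whereas this toy assumes known 2D-3D correspondences and refines a 6DoF pose (axis-angle rotation plus translation) by minimizing mean squared reprojection error with numerical gradients and backtracking line search. All function and variable names here are illustrative.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def project(pts3d, rvec, tvec, K):
    """Pinhole projection of Nx3 model points into pixel coordinates."""
    Xc = pts3d @ rodrigues(rvec).T + tvec      # camera-frame points
    uv = Xc @ K.T
    return uv[:, :2] / uv[:, 2:3]              # perspective divide

def reproj_error(params, pts3d, pts2d, K):
    """Mean squared reprojection error for a 6-vector [rvec | tvec]."""
    rvec, tvec = params[:3], params[3:]
    diff = project(pts3d, rvec, tvec, K) - pts2d
    return np.mean(np.sum(diff ** 2, axis=1))

def refine_pose(pts3d, pts2d, K, init, iters=200, eps=1e-6):
    """Refine an initial pose by gradient descent on reprojection error.

    Uses forward-difference gradients and backtracking line search, so the
    error is non-increasing; a real system would use analytic Jacobians
    (e.g. Gauss-Newton) or autodiff through the renderer instead.
    """
    p = init.copy()
    for _ in range(iters):
        e0 = reproj_error(p, pts3d, pts2d, K)
        g = np.zeros(6)
        for i in range(6):
            q = p.copy()
            q[i] += eps
            g[i] = (reproj_error(q, pts3d, pts2d, K) - e0) / eps
        step = 1e-2
        while step > 1e-12:                    # backtrack until error drops
            cand = p - step * g
            if reproj_error(cand, pts3d, pts2d, K) < e0:
                p = cand
                break
            step *= 0.5
    return p
```

In the paper's pipeline, the initial pose (here `init`) would come from 2D-3D keypoint matching, and the refinement objective would be a photometric comparison of the Gaussian-splatted rendering against the input image rather than keypoint reprojection.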