Direct 3D Pose Estimation of a Single Planar Target: Algorithm and System Implementation

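With recent advances in augmented reality and robotics, estimating the 3D pose of a known planar target accurately and in real time has become an important problem. Although many efficient systems have been proposed over the past decade or so, they work only with specific kinds of planar targets, such as fiducial markers or targets containing simple closed contours, so the problem still lacks a solution that applies to arbitrary planar targets. The best current solutions are feature-based methods, but they work only when feature points on the target can be matched to those in the camera image. To address this, in this master's thesis we propose a robust direct pose estimation algorithm that handles arbitrary planar targets. First, we obtain an approximate 3D pose using the idea of template matching. We then refine this approximate pose with a gradient descent search algorithm to obtain a more accurate 3D pose. Furthermore, based on the proposed algorithm, we implement on a graphics processing unit (GPU) a system that estimates and tracks the 3D pose of planar targets. The system consists of an estimation unit and a tracking unit. The estimation unit, designed around the proposed algorithm, computes the initial 3D pose; the tracking unit tracks the 3D pose using a three-level search method that we propose. Both units take full advantage of the GPU's parallel computation, which lets the system run very efficiently. Extensive experiments show that the proposed algorithm and system are more accurate and more stable than current feature-based methods. In practical use, our system reaches a processing speed of 11 frames per second.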

Keywords

3D pose estimation; 3D pose tracking; direct method; parallel computing

Parallel Abstract

Estimating and tracking the accurate 3D pose of a known planar target from a calibrated camera in real time is essential for augmented reality and robotics. Although numerous efficient systems have been proposed in the past few decades, the task remains challenging because these systems are limited to fiducial markers and targets with simple contours. Feature-based schemes are the state-of-the-art solutions for obtaining poses of arbitrary planar targets. However, their success hinges on whether feature points can be extracted and matched correctly, which requires targets with rich texture. In this thesis, we propose a robust and accurate direct method for 3D pose estimation that performs well on both textured and textureless planar targets. First, the pose of a planar target with respect to a calibrated camera is approximately estimated by posing the task as a template matching problem. Next, the pose is refined and disambiguated with a gradient descent search scheme. To make the proposed algorithm practical, we also develop D-PET, a direct 3D pose estimation and tracking system implemented on a graphics processing unit (GPU), which obtains poses in real time. The system consists of a pose estimation unit and a pose tracker. The pose estimation unit is built on the approximate pose estimation scheme of the proposed algorithm and finds the initial pose. The pose tracker uses a proposed 3-scale search scheme to track the pose precisely. Both units exploit the parallelism of the GPU to do their work efficiently. Extensive experiments on both synthetic and real datasets demonstrate that the proposed algorithm and system perform favorably against state-of-the-art feature-based approaches in terms of accuracy and robustness. The proposed system achieves a processing speed of 11 fps on an embedded GPU.
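
As an illustration of how a candidate pose relates to the image of a planar target, the following minimal Python sketch maps points on the plane Z = 0 to pixels through the homography K [r1 r2 t] and then picks the best candidate by exhaustive search. It is not the thesis implementation: the function names, the intrinsic matrix, the candidate set, and the corner-reprojection score (used here in place of an appearance-based template matching error) are all assumptions made for this toy example.

```python
import numpy as np

def pose_to_homography(K, R, t):
    # For target points on the plane Z = 0: x ~ K [r1 r2 t] (X, Y, 1)^T.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def project(H, pts):
    # Apply H to an (N, 2) array of plane points; returns (N, 2) pixel coordinates.
    homog = np.column_stack((pts, np.ones(len(pts))))
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical camera intrinsics and a unit-square target centred at the origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]])

# Synthetic "observation": target facing the camera at a distance of 2 units.
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 2.0])
observed = project(pose_to_homography(K, R_true, t_true), corners)

def score(tz):
    # Toy stand-in for an appearance error: squared corner reprojection error.
    H = pose_to_homography(K, R_true, np.array([0.0, 0.0, tz]))
    return np.sum((project(H, corners) - observed) ** 2)

# Coarse stage: exhaustive search over a small, assumed set of candidate distances.
candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
best_tz = min(candidates, key=score)
```

In this toy setting only the distance along the optical axis is searched; a full pose has six degrees of freedom, and the coarse estimate would then be passed to a refinement step such as the gradient descent search described above.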