This thesis presents a two-hand tracking system that uses a single camera for human-machine interaction (HMI). To distinguish the user's hands from the face, the face is tracked as well. When the targets are far apart, they are tracked independently; when they are likely to interfere with one another, their state vectors are considered jointly with dependent likelihood measurements in a higher-dimensional state space. While a target is being tracked independently, the regions around the other trackers' most recent results are masked out to suppress skin-color disturbances between trackers. Multiple cues, including the combination of a locally discriminative color-weighted image and the back-projection of the reference color model, the motion history image, and the gradient orientation feature, are employed to verify the hypotheses generated by the particle filter. On the other hand, when the targets approach or even overlap each other, a multiple importance sampling (MIS) particle filter, guided by skin-blob reasoning, generates joint hypotheses for the merged targets together with an estimate of their depth order. These joint hypotheses are then evaluated with visual cues including the occluded face template, the gradient orientation of the hand shape, motion continuity, and the line equation of the forearm. Experimental results demonstrate the real-time efficiency and robustness of the system; we also compare its tracking accuracy against the OpenNI tracker, recently released for the Kinect depth sensor, and its correctness against a state-of-the-art human pose estimation method.
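The predict-weight-resample cycle underlying the tracker can be illustrated with a minimal bootstrap particle filter. This is only a hedged sketch, not the thesis's actual multi-cue system: the random-walk motion model, the single Gaussian "skin-color" likelihood, and all names (`track_step`, `resample`, `motion_std`) are illustrative assumptions standing in for the combined color, motion, and gradient cues described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(particles, weights):
    """Systematic resampling: draw N particles proportional to their weights."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    idx = np.minimum(np.searchsorted(cumsum, positions), n - 1)
    return particles[idx]

def track_step(particles, observe, motion_std=2.0):
    """One predict-weight-resample cycle of a bootstrap particle filter.

    `observe(p)` returns a (possibly multi-cue) likelihood for particle p;
    in the real system this would fuse color, motion, and gradient cues.
    """
    # Predict: random-walk motion model (an assumed stand-in).
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: evaluate the observation likelihood for every hypothesis.
    weights = np.array([observe(p) for p in particles])
    weights /= weights.sum()
    # Resample to concentrate particles on likely target states.
    return resample(particles, weights)

# Synthetic target at (50, 30); Gaussian likelihood plays the role of a
# skin-color back-projection score around the true hand position.
target = np.array([50.0, 30.0])
likelihood = lambda p: np.exp(-np.sum((p - target) ** 2) / (2 * 8.0 ** 2))

particles = rng.uniform(0.0, 100.0, size=(500, 2))
for _ in range(20):
    particles = track_step(particles, likelihood)

estimate = particles.mean(axis=0)  # particle cloud converges near the target
```

After a few iterations the particle cloud collapses onto the synthetic target; the thesis's method replaces the toy likelihood with the fused visual cues and, in the merged-target case, draws the joint hypotheses via multiple importance sampling instead of a single proposal.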