

Pedestrian Detection via Combination and Configuration of Heterogeneous Detectors

Advisor: 莊永裕


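Pedestrian detection in still images is an important problem in computer vision, with applications including robotics and surveillance. Traditional pedestrian detection methods mainly extract salient features to represent the pedestrian as a whole, and they successfully locate pedestrians who appear clearly in the image. Unfortunately, such methods cannot handle occluded bodies or limb deformation, and their accuracy drops substantially as a result. Recent research has therefore leaned toward methods based on body parts. Although these methods address the problems above, they have drawbacks of their own: because they are easily disturbed by environmental factors, part detectors often produce spurious detections, which lowers accuracy. How to select suitable body-part detectors and combine these different detectors effectively is therefore an important issue. This thesis proposes a framework for combining different detectors: it uses the original holistic pedestrian detector together with an efficient face detector and a part-template detector that can handle limb deformation, in order to handle cases in which the pedestrian is occluded or deformed. In the integration process, we design a probabilistic model that takes into account constraints on the relative positions of the detectors and casts votes to find locations likely to contain a pedestrian. In addition, to avoid erroneous votes, we use cell models to measure how strongly each location supports a detection, and detections with too little support are removed. We conduct various experiments on the well-known INRIA dataset, and the results show that the proposed framework effectively improves the recognition rate and accuracy of pedestrian detection and is on par with the best current pedestrian detection algorithms.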


Pedestrian detection in still images is a key problem in computer vision, with applications such as surveillance and robotics. Traditional approaches design features that represent the holistic human body and successfully detect humans with high visibility. Unfortunately, body occlusion and pose articulation pose a challenge for these approaches and noticeably degrade their performance. The focus has recently shifted towards part-based representations because of their ability to address these problems. However, part-based approaches are known to be sensitive and noisy, so they produce many more false alarms. Therefore, selecting proper part-based detectors and combining them adequately becomes essential. In this thesis we propose a framework that combines heterogeneous detectors, including holistic-based, part-template-based and face detectors. Face information is discriminative when detecting pedestrians, and part-template-based approaches have the merit of pose invariance. Instead of directly fusing the responses from these distinct detectors, we design further steps for combination and configuration. First, the responses from the heterogeneous detectors cast probabilistic votes subject to the geometric constraints between the different detectors. Peaks in the voting space localize the pedestrians. To avoid inadequate votes, cell models learned in advance are applied to each detection obtained from voting to measure its level of local alignment and to reject wrong detections. Experiments are conducted on the well-known INRIA dataset, and the quantitative results show that our framework significantly improves pedestrian detection. The proposed framework also outperforms the baseline holistic-based approaches and achieves results comparable with state-of-the-art approaches.
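The voting step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the detector names, the offsets and spreads in `OFFSETS`, the Gaussian vote shape, and the plain arg-max peak search are all assumptions chosen for illustration, and the learned cell-model verification is not modelled at all.

```python
import math

# Hypothetical geometry (assumed values, not taken from the thesis): for each
# detector type, the expected offset (dx, dy) in pixels from a response to the
# pedestrian centre, and the spatial uncertainty sigma of that vote.
OFFSETS = {
    "holistic": (0.0, 0.0, 8.0),
    "face": (0.0, 40.0, 6.0),
    "part": (0.0, -20.0, 10.0),
}


def cast_votes(detections, width, height):
    """Accumulate score-weighted Gaussian votes for pedestrian centres.

    `detections` is a list of (kind, x, y, score) tuples.
    Returns a height x width grid of accumulated vote mass.
    """
    votes = [[0.0] * width for _ in range(height)]
    for kind, x, y, score in detections:
        dx, dy, sigma = OFFSETS[kind]
        cx, cy = x + dx, y + dy  # centre predicted by this response
        for row in range(height):
            for col in range(width):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                votes[row][col] += score * math.exp(-d2 / (2.0 * sigma ** 2))
    return votes


def find_peak(votes):
    """Return (x, y, value) of the strongest vote; a stand-in for mode finding."""
    value, col, row = max(
        (v, col, row)
        for row, line in enumerate(votes)
        for col, v in enumerate(line)
    )
    return col, row, value


# Three responses that agree on a pedestrian centred at (50, 60).
detections = [
    ("holistic", 50, 60, 0.9),
    ("face", 50, 20, 0.8),
    ("part", 50, 80, 0.6),
]
votes = cast_votes(detections, width=100, height=120)
peak_x, peak_y, peak_value = find_peak(votes)
```

In this toy setup the three responses reinforce one another at a single location, which is the intuition behind combining heterogeneous detectors: a weak part response alone would not stand out, but consistent geometry across detectors does. A fuller version would find several peaks and then pass each candidate to a verification stage.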


