
改善物件定位中之子視窗搜尋法

Improving Efficient Subwindow Search in Object Localization

Advisor: 顏嗣鈞

Abstract


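Object detection and object localization are very important research topics in computer vision. Such research can be used to detect natural objects in a large number of consumer electronic systems, for example by searching photo or video archive databases for a certain kind of object of interest. Object detection addresses whether an object of interest is present in an image, while object localization focuses on the more difficult problem of finding where the requested object lies within the image.

A common approach to object localization represents the image as a two-dimensional matrix whose values are the contributions of each image position toward the object. This turns object localization into a maximum sub-array search, whose goal is to find the maximum sub-array that represents the object's location. To this end, an exhaustive method called sliding window search was proposed in the past, but recently a more efficient algorithm based on branch and bound, efficient subwindow search, has been gaining attention.

The bottleneck in localization accuracy usually lies in the accuracy of the visual words that make up the object contribution score matrix. Our proposed multi-box intersection method can find the object's location even when the image feature matrix contains a large amount of noise. Multi-box intersection first finds composite bounding boxes at different sampling frequencies; these usually give a good prediction of the object's position, but they still contain a fair amount of noise. We then take their intersection to reduce the noise in the composite bounding boxes obtained earlier. This yields better localization results than the single bounding box used by previous object localization methods.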
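To make the maximum sub-array formulation concrete, here is a minimal sketch of an exact search over a small score matrix. It is not the branch-and-bound efficient subwindow search named in the abstract, and it is not taken from the thesis; it swaps in a simpler technique (Kadane's algorithm applied to every pair of top and bottom rows). The function name `max_subwindow` and the inclusive `(top, left, bottom, right)` box convention are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive (assumed convention)


def max_subwindow(scores: List[List[float]]) -> Tuple[float, Box]:
    """Return the highest-scoring non-empty rectangle of a 2-D score matrix.

    Illustrative only: for every pair of rows (top, bottom) the columns are
    collapsed into per-column sums, and a 1-D Kadane scan finds the best
    column range. Total cost is O(rows^2 * cols).
    """
    rows, cols = len(scores), len(scores[0])
    best_score = float("-inf")
    best_box: Box = (0, 0, 0, 0)

    for top in range(rows):
        col_sums = [0.0] * cols  # column sums for rows top..bottom
        for bottom in range(top, rows):
            for c in range(cols):
                col_sums[c] += scores[bottom][c]

            # 1-D Kadane scan over the collapsed columns.
            run_sum = 0.0
            run_start = 0
            for c in range(cols):
                if run_sum <= 0:
                    # A non-positive prefix never helps; restart the run here.
                    run_sum = col_sums[c]
                    run_start = c
                else:
                    run_sum += col_sums[c]
                if run_sum > best_score:
                    best_score = run_sum
                    best_box = (top, run_start, bottom, c)

    return best_score, best_box


# Hypothetical score matrix: positive entries mark positions that support the object.
example_scores = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1,  4, -1],
    [-1, -1, -1, -1],
]
best_score, best_box = max_subwindow(example_scores)
```

On the example matrix the positive entries form a 2x2 block, so the search returns that block as the bounding box. A sliding window search would instead score every candidate rectangle independently, which is what makes it expensive on full-size images.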

Parallel Abstract


Object detection and localization are among the most important research topics in computer vision, allowing for the detection of natural objects in a myriad of consumer electronic systems such as photograph archives or video databases. Object detection answers the question of whether a certain object of interest is present in an image, while object localization deals with the more difficult problem of where the object lies within the image.

A common approach to object localization represents the image as a 2-dimensional array of object contribution values. This transforms the localization problem into a maximal sub-array search, where the objective is to find the highest-scoring sub-array, which represents the location of the object in question. To this end, an exhaustive search method called sliding window search has been proposed, and recently a more efficient method based on branch and bound, called efficient subwindow search, has gained popularity.

Often the bottleneck for localization performance is the accuracy of the visual words that form the basis of the object contribution score array. With our multi-box intersection method, we can locate an object within an image even if there is a considerable amount of noise in the image feature array. Multi-box intersection first finds composite bounding boxes over different sampling frequencies, which tend to give a good estimate of where the searched object lies. We then reduce the amount of signal noise by intersecting the obtained composite boxes. In doing so we obtain better localization results than previous single bounding box approaches, as we demonstrate in our experiments.
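The intersection step can be sketched as follows. The abstract does not say how the composite bounding boxes are built or exactly how they are intersected, so this is only an assumption: each box is treated as an axis-aligned rectangle with inclusive `(top, left, bottom, right)` coordinates, and the result is the region common to all of them. The name `intersect_boxes` and the sample coordinates are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive (assumed convention)


def intersect_boxes(boxes: Iterable[Box]) -> Optional[Box]:
    """Return the rectangle shared by all boxes, or None if there is none.

    Illustrative only: the shared region starts at the largest top/left
    coordinate and ends at the smallest bottom/right coordinate.
    """
    box_list = list(boxes)
    if not box_list:
        return None

    top = max(b[0] for b in box_list)
    left = max(b[1] for b in box_list)
    bottom = min(b[2] for b in box_list)
    right = min(b[3] for b in box_list)

    # The boxes share no pixel if the bounds cross.
    if top > bottom or left > right:
        return None
    return (top, left, bottom, right)


# Hypothetical boxes, e.g. one per sampling frequency.
candidate_boxes = [(10, 12, 80, 90), (15, 8, 85, 88), (12, 14, 78, 95)]
common_box = intersect_boxes(candidate_boxes)
```

Each candidate box may extend past the object on a different side; keeping only the shared region trims those extensions, which is one plausible reading of how intersection reduces noise.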

