
Algorithm and Architecture Design of 3D Interactive User Interface by Stereo Camera

Advisor: 陳良基 (Liang-Gee Chen)

Abstract


Digital video technology plays an important role in modern life. As display technology evolves, displays offer ever-better viewing quality, and stereoscopic displays provide a better viewing experience than conventional 2D displays. 3D imaging enriches the content of many applications, such as TV broadcasting, movies, gaming, photography, and education. Now that 3D images have become so realistic, people are no longer satisfied with merely watching 3D video: users want to touch and interact with these lifelike 3D virtual images, for example by throwing, touching, or pushing them.

In this thesis, we propose the concept of "virtual touch" interaction using a stereo camera. In the common interactive approach today, the user makes specific hand or body gestures in front of a TV or other device; the system then recognizes the gesture and produces the corresponding response. A large body of such research already exists, and we consider its function closer to a replacement for the remote control. We instead propose a stereo-camera-based 3D interactive user interface that detects both the user's distance and the hand's distance. When the user's hand coincides with a 3D virtual object in spatial coordinates, the system judges that the virtual-touch condition is met; it then recognizes the user's operation and produces the corresponding virtual-touch response. The 3D interactive user interface is discussed in two parts: calibration-free user-distance estimation and 3D hand localization using belief propagation.

Calibration-free user-distance estimation is the first step of the 3D interactive user interface. The main idea is to treat the user as a single object: from the left and right images captured by the stereo camera, we compute the disparity that represents the user, and the user's distance can then be calculated from this disparity.

3D hand localization using belief propagation is the other part of the 3D interactive user interface. With only the user's distance, the system supports only very simple interaction. Because the hand is the most intuitive and effective way for humans to interact with machines, the system must obtain the hand's 3D location so that the user can perform more complex or precise interaction. We use depth and color information to achieve 3D hand localization and to recognize some simple gestures.

We also propose a three-stage pipelined hardware architecture. Implementation results show that it achieves real-time performance of 30 fps on Full-HD 1080p left and right input images at an operating frequency of 200 MHz.
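The virtual-touch condition described above can be stated very simply: the hand's estimated 3D position and the virtual object's rendered 3D position must coincide (within some tolerance) in a shared coordinate frame. The following is a minimal sketch of that check; the function name, coordinate units, and the 5 cm tolerance are illustrative assumptions, not the thesis's actual parameters.

```python
import math

def virtual_touch(hand_xyz, obj_xyz, threshold_cm=5.0):
    """Return True when the user's hand coincides with a virtual object
    in 3D space (the "virtual touch" condition).

    hand_xyz / obj_xyz: (x, y, z) positions in centimetres, expressed in
    the same camera-centred coordinate frame.
    threshold_cm: assumed tolerance for "coinciding" positions.
    """
    dx, dy, dz = (h - o for h, o in zip(hand_xyz, obj_xyz))
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= threshold_cm

# Hand 3 cm behind the virtual object: within tolerance, touch triggers.
print(virtual_touch((10.0, 5.0, 80.0), (10.0, 5.0, 83.0)))  # True
```

Once the touch fires, the system would branch on the recognized gesture (push, throw, slide) to select the object's response.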

Parallel Abstract


Digital video technology plays an important role in our daily life. With the evolution of display technologies, display systems can provide ever-higher visual quality to enrich human life. Immersive 3D displays provide a better visual experience than conventional 2D displays, and 3D technology enriches the content of many applications, such as broadcasting, movies, gaming, photography, camcorders, and education. Now that stereoscopic displays are mature and their images are highly realistic, users want to interact with three-dimensional virtual objects, for example by slapping, sliding, or throwing them.

In this thesis, we propose "virtual touch" interaction using a stereo camera. In the common interactive approach, the user performs hand or body gestures in front of a TV or other device; the system recognizes the gesture and displays the corresponding response. This kind of research is already quite mature, and its function is closer to that of a remote control. We instead propose a 3D interactive user interface based on a stereo camera that detects the locations of the user's body and hand. When the position of the user's hand is consistent with the position of a virtual object, the system considers the "virtual touch" achieved; it then recognizes the user's operation and gives the corresponding virtual-touch response. The 3D interactive user interface is discussed in two parts: distance estimation from calibration-free captures and 3D hand localization using belief propagation.

Distance estimation from calibration-free captures is the first step of the 3D interactive user interface. The main concept is to treat the user as an object: from the left and right captures of the stereo camera, the system calculates the disparity of the user, and the user's distance is then estimated from this disparity.

3D hand localization using belief propagation is the other part of the interface. With only the user's distance, the system supports only simple interaction. Because hand gestures are one of the most intuitive and natural ways for people to communicate with machines, the system must obtain the 3D location of the user's hand, so that the user can perform more complex or precise control. We use only depth and color information to localize the hand in 3D and to recognize some simple gestures.

We also propose a three-stage pipelined hardware architecture. Implementation results show that it achieves real-time interaction on Full-HD 1080p stereo input at 30 fps when operating at 200 MHz.
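The distance-estimation step above reduces to the standard rectified-stereo relation: once a single disparity value represents the user, the distance follows from Z = f·B/d, where f is the focal length in pixels, B is the camera baseline, and d is the disparity in pixels. The sketch below illustrates this conversion; the focal length and baseline values are assumed for illustration, not taken from the thesis.

```python
def distance_from_disparity(disparity_px, focal_px=1000.0, baseline_cm=6.0):
    """Rectified-stereo depth relation: Z = f * B / d.

    disparity_px: horizontal shift (in pixels) of the user between the
                  left and right captures.
    focal_px:     focal length in pixels (assumed value).
    baseline_cm:  separation of the two cameras (assumed value).
    Returns the user's distance in centimetres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_cm / disparity_px

# A 30-pixel disparity maps to 1000 * 6 / 30 = 200 cm, i.e. 2 m.
print(distance_from_disparity(30.0))  # 200.0
```

Note the inverse relationship: nearer users produce larger disparities, so distance resolution degrades for far users, which is why a per-user disparity estimate (rather than a full dense depth map) suffices for this first coarse-interaction step.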

Keywords

3DUI; distance estimation; user interface

