
Head Pose Estimation Using Headphone Emitted Acoustic Wave

Advisor: Hsin-Mu Tsai (蔡欣穆)

Abstract


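In recent years, a growing number of teaching institutions have adopted online learning because of its many benefits. In 2020 in particular it became one of the mainstream modes of teaching, and it has been used even more widely since the COVID-19 pandemic. Obtaining students' head poses to assess their engagement in a course, without invading their privacy, has therefore become a practical problem. In this paper, we propose AccuSense, a head pose estimation system that addresses the privacy problem by using off-the-shelf audio devices as tracking devices to collect data on the user's head pose. To achieve this, the system transmits an audio signal that is hard for the human ear to hear. We mix the transmitted signal with the original audio of the video, play the mix through the headphones, and receive it with microphones. AccuSense then tracks the position of the headphones and estimates the head rotation angle by estimating the distances between the headphones and the microphones. The distance estimation algorithm is based on FMCW, a signal modulation scheme commonly used in ranging systems. We further extract head pose features for head pose classification and use a support vector machine (SVM) to further classify some exceptional cases. We implement the system with three microphones and with two microphones. According to our evaluation, the three-microphone and two-microphone implementations classify head poses with accuracy above 82% and 75%, with headphone trajectory errors of 2.04 cm and 5.82 cm and rotation angle errors below 4.5 degrees and 7 degrees, respectively.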

Parallel Abstract


In recent years, more and more teaching institutions have adopted online learning because of its benefits. In 2020 in particular it became one of the mainstream teaching and learning modes, and it has been used extensively since the COVID-19 pandemic. Obtaining students' head poses to evaluate their engagement level without invading their privacy has therefore become a practical issue. In this paper, we propose AccuSense, a head pose estimation system that uses off-the-shelf, readily accessible audio devices as tracking devices to collect a user's head pose data without raising privacy concerns. To achieve this, we use an inaudible audio signal as the probing signal. We mix the probing signal with the original audio of the video, play the mix through the headphones, and receive it with microphones. AccuSense then tracks the headphone position and estimates the head rotation angle by estimating the distances between the headphones and the microphones. The distance estimation algorithm is based on frequency-modulated continuous wave (FMCW), a modulation commonly used in ranging systems. We further extract signal features for head pose classification and use a support vector machine (SVM) to handle special cases. We implement the system in two configurations, one with three microphones and one with two microphones, for positioning in 3D space. Our evaluation shows that the three-microphone and two-microphone configurations achieve head pose classification accuracy of over 82% and 75%, trajectory errors of 2.04 cm and 5.82 cm, and rotation angle errors of less than 4.5 degrees and 7 degrees, respectively.
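
To make the FMCW ranging step concrete, the following is a minimal sketch of one-way distance estimation from a single linear chirp. It is not the AccuSense implementation: the sampling rate (48 kHz), chirp band (18 to 22 kHz), chirp duration (40 ms), perfect transmitter/receiver synchronization, a single noise-free direct path, and all function names are assumptions made for illustration only. The idea is that multiplying the transmitted chirp by the delayed received chirp produces a low-frequency beat tone whose frequency equals the chirp slope times the propagation delay, so the distance is d = c · f_beat · T / B.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def chirp(t, f0, bandwidth, duration):
    """Linear up-chirp evaluated at times t (seconds); zero outside [0, duration)."""
    inside = (t >= 0) & (t < duration)
    phase = 2 * np.pi * (f0 * t + 0.5 * (bandwidth / duration) * t ** 2)
    return np.where(inside, np.cos(phase), 0.0)


def estimate_distance(tx, rx, fs, bandwidth, duration,
                      max_distance=2.0, nfft=1 << 18):
    """Estimate the one-way distance from the beat frequency of tx * rx."""
    slope = bandwidth / duration                 # chirp slope in Hz per second
    mixed = tx * rx * np.hanning(len(tx))        # mixing yields the beat tone
    spectrum = np.abs(np.fft.rfft(mixed, n=nfft))  # zero-padded for a finer grid
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    # Search only beat frequencies that correspond to plausible distances;
    # this also excludes the high-frequency sum term produced by the mixing.
    search = (freqs > 0) & (freqs <= slope * max_distance / SPEED_OF_SOUND)
    beat = freqs[search][np.argmax(spectrum[search])]
    return SPEED_OF_SOUND * beat / slope         # d = c * f_beat * T / B


def simulate(distance, fs=48_000, f0=18_000.0, bandwidth=4_000.0, duration=0.04):
    """Simulate an ideal delayed chirp (hypothetical setup) and estimate its range."""
    t = np.arange(int(round(fs * duration))) / fs
    tx = chirp(t, f0, bandwidth, duration)
    rx = chirp(t - distance / SPEED_OF_SOUND, f0, bandwidth, duration)
    return estimate_distance(tx, rx, fs, bandwidth, duration)


if __name__ == "__main__":
    for d in (0.3, 0.5, 0.8):
        print(f"true {d:.2f} m, estimated {simulate(d):.3f} m")
```

In a setup like the one described, one such distance estimate per microphone could then feed a position solver; real recordings would additionally need synchronization, filtering of the audible program audio, and multipath handling, none of which this sketch attempts.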
