Robot Facial Expressions with Lip Synchronization and Visual Attention for Human-Robot Interactions

Advisor: 羅仁權
Co-advisor: 陽毅平

Abstract


As populations age in advanced countries, demand grows for social welfare, medical care, education, and many other services, and assistance from intelligent robots has become a subject that governments worldwide take seriously. In the twenty-first century, the intelligent robotics industry is likewise listed by countries around the world as a forward-looking, high-priority technology industry. The main research topic of this thesis is human-robot interaction for a humanoid robot, combining robot facial expressions with lip synchronization and visual attention. The most important aim in humanoid robotics is to imitate human behavior and appearance. In appearance, the robot used in this thesis is modeled on the 27-year-old Albert Einstein, with artificial skin made of a patented silicone elastomer; the material's softness and high elasticity mean that small, low-power motors suffice to drive the skin into vivid expressions. Mechanically, the head has a total of 30 degrees of freedom for driving the skin, configured according to the muscle distribution of human facial anatomy, so the robot can imitate the various human Action Units and produce plausible, lifelike facial movements: the basic expressions of happiness, surprise, fear, disgust, anger, and sadness, as well as special expressions such as pain.

For speech, the robot has 12 degrees of freedom around the mouth together with an opening-and-closing jaw. We designed 16 lip shapes in total; the robot's speech system outputs synthesized speech and the corresponding lip shapes simultaneously, and synchronized motor control achieves real-time lip-synchronized speech synthesis. Speech recognition, in turn, lets the robot understand what an interactor says, so that people can interact with it in the most natural and simplest way: conversation. The speech-based human-machine interface is built on the Microsoft Windows operating system and SAPI (Speech Application Programming Interface).

For vision, the robot has two miniature cameras mounted in its eyes, with three degrees of freedom controlling gaze direction (coupled vertical motion and independent left-right rotation of each eye). The vision system provides face detection and gesture recognition. Face detection uses the AdaBoost algorithm, which offers high efficiency and high accuracy, and is combined with the robot's neck control to achieve face tracking. Human hand gestures carry rich intentional and abstract information, so gesture recognition gives the robot additional cues from which to make appropriate responses.

All algorithms and programs proposed in this thesis are written in standard C++ and C#, and all software and hardware are integrated and implemented on "Einstein", the humanoid robot developed in our laboratory.
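As a concrete illustration of the lip-synchronization loop described above, the sketch below drives mouth motors from Microsoft SAPI's viseme events via .NET's System.Speech library. It is a minimal sketch, not the thesis's implementation: the VisemeToLipShape table and the RobotMouth.SetShape call are hypothetical assumptions standing in for the actual viseme mapping and motor interface.

using System;
using System.Speech.Synthesis;

class LipSyncSketch
{
    // SAPI reports viseme IDs 0-21; the robot has only 16 lip shapes, so
    // several visemes must collapse onto the same shape. This table is a
    // hypothetical many-to-one mapping, not the one used in the thesis.
    static readonly int[] VisemeToLipShape =
    {
        0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
        11, 12, 13, 14, 15, 15, 14, 13, 12, 11
    };

    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // VisemeReached fires in real time as each viseme is reached in
            // the audio stream, keeping the mouth motors in step with speech.
            synth.VisemeReached += (sender, e) =>
                RobotMouth.SetShape(VisemeToLipShape[e.Viseme]);

            synth.Speak("Hello, I am Einstein.");
            RobotMouth.SetShape(0); // back to the neutral, closed-lip shape
        }
    }
}

// Hypothetical stand-in for the 12-DOF mouth and jaw controller.
static class RobotMouth
{
    public static void SetShape(int shapeId) =>
        Console.WriteLine("lip shape -> " + shapeId);
}

Because the event is raised by the synthesizer itself, the motor commands need no separate timing model; the audio output is the clock.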

English Abstract


The aging of populations in advanced countries creates growing demand for social welfare, medical care, education, and various other services, and assistance from intelligent robots has become an important subject worldwide. In the 21st century, intelligent robotics is a high-priority development industry. The main topic of this thesis is human-robot interaction through robot facial expressions with lip synchronization and visual attention. In the field of humanoid robotics, the most important goal is to simulate human behavior and appearance. The robot face we designed is based on the 27-year-old Albert Einstein. The excellent physical characteristics of the artificial skin "Frubber" make it possible to mimic facial expressions with small, low-power motors. There are 30 degrees of freedom in the head in total, and their configuration is based on human facial anatomy. The robot face can therefore reproduce many human action units and perform facial expressions such as happiness, surprise, fear, disgust, anger, and sadness, as well as special expressions such as pain.

In the robot speech system, 16 lip shapes are proposed to match the visemes, and real-time motor control synchronizes the lip shapes with the synthesized speech. Speech recognition, in turn, lets the robot understand what the interactor says, enabling interaction in a natural and simple way. We use Microsoft SAPI (Speech Application Programming Interface) to build the speech system.

There are two cameras in the robot's eyes, with three degrees of freedom to control gaze. The vision system combines face detection with pointing-gesture recognition. We use AdaBoost for face detection and combine it with robot neck control to achieve face tracking. The motion of human hands conveys abundant information about human intention and implicit meaning, so through the pointing-gesture recognition system the robot can receive more information and respond appropriately.

All the systems, user interfaces, software, and applications proposed in this thesis are implemented in the C++ and C# programming languages with Microsoft SAPI. The whole system and all experiments are realized on "Einstein", a humanoid robot developed by the Intelligent Robotics and Automation (IRA) Laboratory at National Taiwan University.
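As a rough sketch of the face-tracking loop described above, the following uses OpenCV's Haar-cascade detector (the AdaBoost-trained classifier family the abstract refers to) through the OpenCvSharp bindings, which are an assumption here; the RobotNeck interface and the gain Kp are likewise hypothetical stand-ins for the robot's actual neck controller.

using System;
using OpenCvSharp;

class FaceTrackingSketch
{
    static void Main()
    {
        // Pre-trained AdaBoost cascade shipped with OpenCV.
        var cascade = new CascadeClassifier("haarcascade_frontalface_default.xml");

        using (var camera = new VideoCapture(0))   // one of the two eye cameras
        using (var frame = new Mat())
        using (var gray = new Mat())
        {
            while (camera.Read(frame) && !frame.Empty())
            {
                Cv2.CvtColor(frame, gray, ColorConversionCodes.BGR2GRAY);
                Rect[] faces = cascade.DetectMultiScale(gray, 1.1, 3);
                if (faces.Length == 0) continue;

                // Track the largest detected face.
                Rect face = faces[0];
                foreach (Rect f in faces)
                    if (f.Width * f.Height > face.Width * face.Height)
                        face = f;

                // Pixel error between the face center and the image center.
                double errX = face.X + face.Width / 2.0 - frame.Width / 2.0;
                double errY = face.Y + face.Height / 2.0 - frame.Height / 2.0;

                // Simple proportional control re-centers the face in view.
                const double Kp = 0.05;            // hypothetical gain, deg/pixel
                RobotNeck.Pan(-Kp * errX);         // hypothetical neck interface
                RobotNeck.Tilt(-Kp * errY);
            }
        }
    }
}

// Hypothetical stand-in for the robot's pan-tilt neck motors.
static class RobotNeck
{
    public static void Pan(double deg)  => Console.WriteLine("pan  " + deg.ToString("F2"));
    public static void Tilt(double deg) => Console.WriteLine("tilt " + deg.ToString("F2"));
}

A proportional correction on the pixel error is the simplest closed loop that keeps the detected face centered; a real controller would add smoothing and joint limits.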

Cited By


Lin, P. H. (2012). 應用於人機互動之多人表情辨識與環境氛圍辨識系統 [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2012.03340
