For most applications, facial animation has traditionally been regarded as important but tedious work, mainly because the facial muscles are complex and interdependent. Although many methods have been proposed to ease the burden on artists, no approach to producing facial animation that is both computationally fast and storage-efficient has yet been presented. This thesis proposes a framework that, given any speech segment, synthesizes the corresponding lip-synced facial animation. The method begins by tracking facial features in training videos; from these features it first identifies key facial shapes, which can guide artists in building the corresponding key 3D models. The training videos are then parameterized into a weight space, and after facial motion transfer, a per-phoneme weight probability model is learned from the transferred training data. The framework can generate a lip-synced speech animation in a very short time and requires very little storage. The generated animation preserves the dynamic characteristics of the training videos, so a virtual character can retain human-like speaking dynamics.
Facial animation is traditionally considered important but tedious work for most applications, because the muscles of the face are complex and interact dynamically. Although several methods have been proposed to ease the burden on artists who create animated faces, none of them is both fast and storage-efficient. This paper introduces a framework for synthesizing lip-synced facial animation from a given speech sequence. Starting from tracking features in training videos, the method first finds representative key-shapes that are important both for image reconstruction and for guiding artists in creating the corresponding 3D models. The training video is then parameterized into a weight space via cross-mapping, and the dynamics of the facial features are learned for each phoneme. The proposed system can synthesize lip-synced 3D facial animation in a very short time, and requires only a small amount of storage to keep the key-shape models and phoneme dynamics.
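The abstract's weight-space parameterization can be illustrated with a minimal sketch: a tracked frame (a vector of facial feature coordinates) is approximated as a weighted combination of key-shapes, and the weights become the frame's low-dimensional representation. The key-shape matrix, feature dimensionality, and least-squares projection below are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def to_weight_space(frame, key_shapes):
    """Project one tracked frame onto the span of the key-shapes.

    frame:      (d,) flattened feature-point coordinates for one frame
    key_shapes: (k, d) matrix, one key-shape per row (hypothetical layout)
    Returns the least-squares weights w with w @ key_shapes ~ frame.
    """
    w, *_ = np.linalg.lstsq(key_shapes.T, frame, rcond=None)
    return w

# Toy example: 2 key-shapes in a 4-D feature space.
K = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
frame = np.array([0.3, 0.7, 0.0, 0.0])  # an exact blend of the two key-shapes
w = to_weight_space(frame, K)
print(np.round(w, 3))  # weights recover the blend: [0.3, 0.7]
```

Storing only the key-shape models plus per-phoneme statistics over such weight vectors is what keeps the representation compact, compared with storing full meshes per frame.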