Development of an Interactive Platform for Digital Archive Applications

Advisor: 林維暘

Abstract


With advances in technology, multimedia presentation has become an indispensable feature of museum exhibits, and progress in digital technology has spurred growing interest in interactive installations. Intuitive interactive devices such as multi-touch screens and cameras have become nearly essential, markedly increasing the entertainment value of exhibits and their appeal to visitors. This study focuses on using cameras to support natural interaction between visitors and multimedia presentations. We first present U-Garden, a platform comprising a set of tools that help multimedia designers develop movement-based interactive projects using camera sensing. Building on this platform, we establish a multimedia presentation tool that derives its interactive capability from depth image streams and supplies human motion-tracking data, from which designers can produce a wide range of engaging, intuitive presentations.

The second part of this dissertation addresses human action recognition, focusing on two research issues: motion representation and subspace learning. To analyze human actions effectively, we combine the distance signal feature and the width feature, which provide complementary information, into a single, more discriminative descriptor. The combined features are then quantized into mid-level features using k-means clustering. In the mid-level feature space, we apply the Nonparametric Weighted Feature Extraction (NWFE) algorithm to construct a compact yet discriminative subspace model, and a Bayes classifier is trained in this subspace to recognize human actions. A series of experiments on two publicly available datasets demonstrates the effectiveness of the proposed system: compared with existing approaches, it significantly reduces the computational complexity of the classification stage while maintaining high recognition accuracy.

Finally, we extend the system with a facial expression tracker to recognize visitors' affective responses during an exhibition. Video-based methods have recently become a popular choice for modeling and classifying facial expressions, but they require correctly segmenting the progression from a neutral face to a fully developed expression before recognition, which is a challenging task in real-world settings. We therefore propose a facial expression extraction and recognition system based on a single still image. Our method first combines holistic and local distance-based features so that facial expressions are characterized in greater detail; these distance features are then quantized into mid-level features following the bag-of-words model. These steps substantially improve class separability, so a standard classifier such as the Support Vector Machine (SVM) suffices. Experiments on the Extended Cohn-Kanade (CK+) dataset show that the proposed method recognizes facial expressions accurately and efficiently.
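To make the action-representation stage concrete, the sketch below computes the two silhouette descriptors named above, a distance signal (centroid-to-contour distances) and a width feature (per-row silhouette widths), concatenates them per frame, and quantizes the descriptors into mid-level words with k-means. This is a minimal sketch of the general technique, not the thesis implementation: the binary-silhouette input, sampling lengths, normalization, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from sklearn.cluster import KMeans

def distance_signal(mask, n_points=64):
    """Centroid-to-contour distances of a binary silhouette, resampled
    to a fixed length and scale-normalized (one common formulation)."""
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                 # silhouette centroid
    contour = mask & ~binary_erosion(mask)        # 1-pixel-wide boundary
    by, bx = np.nonzero(contour)
    order = np.argsort(np.arctan2(by - cy, bx - cx))  # order by angle
    d = np.hypot(by - cy, bx - cx)[order]
    d = d[np.linspace(0, len(d) - 1, n_points).astype(int)]
    return d / (d.max() + 1e-8)                   # scale invariance

def width_feature(mask, n_rows=32):
    """Per-row silhouette width, resampled to a fixed length."""
    mask = mask.astype(bool)
    rows = np.nonzero(mask.any(axis=1))[0]
    w = mask[rows].sum(axis=1).astype(float)
    w = w[np.linspace(0, len(w) - 1, n_rows).astype(int)]
    return w / (w.max() + 1e-8)

def frame_descriptor(mask):
    """Combined per-frame descriptor: distance signal + width feature."""
    return np.concatenate([distance_signal(mask), width_feature(mask)])

def learn_codebook(descriptors, n_words=64, seed=0):
    """Quantize per-frame descriptors into mid-level 'words' (k-means)."""
    return KMeans(n_clusters=n_words, n_init=10,
                  random_state=seed).fit(np.vstack(descriptors))
```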

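The subspace-learning and classification stage could then be sketched as follows. NWFE itself is not available in common Python libraries, so Fisher's linear discriminant analysis stands in for the supervised projection, and a Gaussian naive Bayes model plays the role of the Bayes classifier; both substitutions are assumptions made for illustration, not the thesis implementation. Classifying in a compact subspace is what keeps the classification stage computationally cheap.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

def action_histogram(word_ids, n_words=64):
    """Represent one action clip as a normalized histogram of the
    mid-level words assigned to its frames."""
    h = np.bincount(word_ids, minlength=n_words).astype(float)
    return h / (h.sum() + 1e-8)

# LDA stands in for NWFE (supervised projection to a compact subspace);
# GaussianNB plays the role of the Bayes classifier in that subspace.
clf = make_pipeline(LinearDiscriminantAnalysis(), GaussianNB())

# Hypothetical usage, with X holding one histogram per clip and y the
# action labels:
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```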
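For the facial expression part, the following is a sketch of the still-image pipeline under stated assumptions: facial landmark points are assumed to be given, all pairwise landmark distances serve as a stand-in for the thesis's exact holistic and local distance-based features, the bag-of-words quantization uses k-means, and the classifier is an RBF-kernel SVM. Names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def pairwise_distance_features(landmarks):
    """Distances between facial landmark points; all pairwise distances
    approximate the combined holistic + local distance features."""
    diff = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(landmarks), k=1)     # upper triangle only
    return dists[iu]

def bow_histogram(local_vectors, codebook):
    """Quantize one image's local feature vectors against a learned
    codebook and return a normalized bag-of-words histogram."""
    words = codebook.predict(local_vectors)
    h = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return h / (h.sum() + 1e-8)

# Hypothetical training flow: learn the codebook on local features
# pooled over all training images, encode each image as a histogram,
# then train the SVM.
# codebook = KMeans(n_clusters=128, n_init=10).fit(pooled_local_features)
# X = np.vstack([bow_histogram(f, codebook) for f in per_image_features])
# svm = SVC(kernel="rbf").fit(X, labels)
```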
