
Content-Based Lecture Videos Analysis and Classification Based on Audio and Visual Cues

Advisor: 李忠謀

Abstract


Most classrooms today still use blackboards, and lecture videos of blackboard-based teaching are equally widespread; yet such videos pose considerable challenges for multimedia semantic analysis and have rarely been studied. This thesis proposes a method based on visual and auditory cues for blackboard lecture videos, examining the lecturer's body movements and speech in order to indicate how much attention students should devote to different periods of a lecture video. In the visual analysis, the various postures the lecturer assumes while teaching are analyzed to identify the meaning each posture conveys. In the auditory analysis, this study proposes a speech emotion recognition model that classifies the lecturer's speech into five emotion categories: happy, angry, bored, sad, and normal; changes in the lecturer's speech emotion are then used to analyze the lecturer's teaching state. Combining the visual and auditory results, we can estimate the importance of each period of the lecture, which also reflects its semantic intensity. Learners can then allocate an appropriate amount of attention to each period according to its importance, making learning from lecture videos more efficient.

Parallel Abstract


Most classrooms are equipped with blackboards, and the blackboard is widely used as a teaching prop in lecture video recordings. However, lecture videos that use the blackboard as a teaching prop are rarely discussed in the field of multimedia semantic analysis. This thesis uses a research method based on visual and auditory cues to explore the speaker's body language and tone of speech in blackboard lecture recordings, and to determine how much attention students should pay to different segments of a lecture recording in order to enhance learning. The visual analysis focuses on the semantics implied by the speaker's postures. The auditory analysis focuses on variations in the speaker's speech emotion over the course of teaching: the thesis proposes a speech emotion recognition model that classifies speech emotion into five categories of happy, angry, bored, sad, and normal. The results of the analysis show the semantic intensity of the speaker and the importance of the speaker's teaching in different segments, and how students can learn more effectively by varying the amount of attention they pay according to the importance of the speaker's teaching throughout a lecture video recording.
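The abstract describes the pipeline only at a high level, so the sketch below is purely illustrative: the MFCC features, SVM classifier, per-emotion arousal weights, equal audio/visual weighting, and every function name are assumptions made for this example, not the thesis's actual method. It shows one plausible way to classify a speech segment into the five emotion categories and fuse the result with a visual posture cue into a per-segment importance score, in Python:

import numpy as np
import librosa
from sklearn.svm import SVC

EMOTIONS = ["happy", "angry", "bored", "sad", "normal"]

def speech_features(wav_path):
    # Mean MFCC vector over the segment: a common baseline speech feature.
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def train_emotion_model(wav_paths, emotion_labels):
    # Five-class speech emotion classifier; labels are strings from EMOTIONS.
    X = np.stack([speech_features(p) for p in wav_paths])
    model = SVC(probability=True)
    model.fit(X, emotion_labels)
    return model

# Hypothetical "arousal" weight per emotion: how strongly that emotion
# suggests an important teaching moment. These numbers are made up.
AROUSAL = {"happy": 0.8, "angry": 1.0, "bored": 0.1, "sad": 0.3, "normal": 0.5}

def segment_importance(model, wav_path, posture_score):
    # posture_score in [0, 1] stands in for the thesis's visual posture
    # analysis; the 50/50 audio/visual weighting is an arbitrary choice.
    probs = model.predict_proba(speech_features(wav_path)[None, :])[0]
    audio_cue = sum(p * AROUSAL[e] for e, p in zip(model.classes_, probs))
    return 0.5 * audio_cue + 0.5 * posture_score

Under these assumptions, a learner-facing video player could highlight segments whose importance score exceeds a threshold, matching the abstract's goal of telling students where to invest their attention.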


Cited By


陳南羽 (2017). A study of the cognitive theory of multimedia learning in instructional video design and presentation [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2017.00246
