
A Feature Selection Technique for Semantic Video Indexing System

Advisor: 陳文進
Co-advisor: 莊永裕 (Yung-Yu Chuang)

Abstract


As videos of all kinds become ever easier to obtain, people have realized that finding a specific video within such huge collections is increasingly difficult. To meet this need, search based on high-level semantics has become the mainstream of video retrieval. High-level semantics refers to terms from daily life, which are far more familiar to users than low-level features. To advance high-level semantic search, TRECVID provides hundreds of hours of video and a fair evaluation method every year, and has therefore become the benchmark for video search; many TRECVID participants build their video search systems on top of low-level features. The rapid progress of computer vision keeps producing more and more discriminative low-level features, but training high-level concept classifiers with a large number of low-level features takes an enormous amount of time, so using these features efficiently is becoming ever more important. The feature selection method proposed in this thesis greatly reduces training time at the cost of only a small drop in accuracy; even when we use only half of the low-level features, it retains 98.88% of the accuracy while cutting training time by 36.07%.

English Abstract


With the growing volume of easily accessible video, users want an automatic video search system driven by semantic queries, such as objects, scenes, and events from daily life. To this end, TRECVID annually supplies sufficient video data and a fair evaluation method to advance video search techniques. Many participants build their classifiers by fusing the results of models trained on low-level features (LLFs), such as color and edge. With the development of computer vision, more and more useful LLFs are being designed; however, modeling all obtainable LLFs requires a tremendous amount of time. Hence, using these LLFs efficiently has become an important issue. In this thesis, we propose an evaluation technique for LLFs, from which the most appropriate concept-dependent LLF combination can be chosen to reduce modeling time while keeping reasonable video search precision. In our experiments, modeling only 5 chosen LLFs out of 16 reduces modeling time by 3.51% with only a 6.78% performance drop; if half of the LLFs are used, we retain 98.88% of the precision with a 36.07% time saving.
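The thesis itself publishes no code here, but the concept-dependent selection step can be illustrated with a minimal sketch: score each LLF by the validation average precision of a cheap classifier trained on that LLF alone for one semantic concept, then keep only the top-k LLFs for full modeling and fusion. Everything below is an illustrative assumption, not the thesis's implementation; a logistic-regression probe stands in for the actual per-LLF concept classifiers, and all names and data are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

def select_llfs_per_concept(llf_train, llf_val, y_train, y_val, k=5):
    """Score each low-level feature (LLF) by the validation average
    precision of a cheap linear probe, then keep the top-k LLFs.

    llf_train / llf_val: dict mapping LLF name -> feature matrix
    y_train / y_val: binary labels for one semantic concept
    """
    scores = {}
    for name, x_train in llf_train.items():
        probe = LogisticRegression(max_iter=1000)
        probe.fit(x_train, y_train)
        # Rank validation shots by the probe's confidence for the concept.
        conf = probe.decision_function(llf_val[name])
        scores[name] = average_precision_score(y_val, conf)
    # Keep the k most discriminative LLFs for this concept; only these
    # would then be fully modeled and fused, saving training time.
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical usage with random stand-in data for two LLFs.
rng = np.random.default_rng(0)
llf_train = {"color_hist": rng.normal(size=(200, 64)),
             "edge_hist": rng.normal(size=(200, 32))}
llf_val = {"color_hist": rng.normal(size=(80, 64)),
           "edge_hist": rng.normal(size=(80, 32))}
y_train = rng.integers(0, 2, size=200)
y_val = rng.integers(0, 2, size=80)
print(select_llfs_per_concept(llf_train, llf_val, y_train, y_val, k=1))

Because the scores are computed per concept, different semantic concepts can end up with different LLF combinations, which is what makes the selection concept-dependent.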

