  • Degree thesis

電腦視覺特徵值萃取於字幕視訊處理及視訊防手震系統設計之研究

Design of Feature Extraction for Text-Video Processing and Video Stabilization System with Computer Vision-Based Techniques

Advisor: 蔡宗漢 (Tsung-Han Tsai)

Abstract


In recent years, computer vision has become an important research field. Many computer vision algorithms have been proposed, and they can be used to design various application systems, which can in turn be implemented on hardware platforms. Against this background, this thesis designs novel systems based on computer vision. Feature extraction is a key technique in computer vision that underlies many different systems. Among the many possible features, and considering the semantic value of text features in video content as well as the importance of global-motion features to video stabilization and video coding, this thesis studies the extraction of text and global-motion features and their application to text-video processing and video stabilization at the system level.

For text feature extraction: much of today's video carries superimposed text, some of which, such as advertisements, serves no purpose, so the text must be removed and the video content inpainted. However, because of large fonts, structure regions, and the wide variety of video types, few conventional methods can inpaint the whole video content well. In response, this work proposes a text-video inpainting technique based on structure repair and texture propagation. To repair structure regions, structure interpolation uses a newly proposed rotated block matching method to estimate the initial positions of the inpainted regions and then refines those positions; information from surrounding frames fills the structure regions. Structure extension uses spline-curve estimation to repair structure regions without manual intervention, and derivative propagation then inpaints the texture regions. Experiments on real television video show that all text regions can be completely inpainted with spatio-temporal consistency, and comparisons show that the proposed method outperforms conventional approaches. Its advantages include reduced design complexity, achieved by integrating structure content across multiple frames, and consistent structure inpainting for real video content.

In addition, this thesis exploits embedded text features to realize an intelligent multimedia display mode. We design a text-in-picture (TiP) display system that extracts the text of a subchannel and combines it with the video of the main channel. The system is built on a dual-core platform to achieve real-time text extraction and display. This work proposes a scheduling scheme that divides the extraction and display work, a data-transfer mechanism for efficient transfers in which some data can be reused, and SIMD mechanisms that accelerate the numerous convolution and accumulation operations in text extraction. Quadruple buffering, multi-banking, and multi-tasking techniques are developed to optimize the labeling and filling tasks. Evaluation results verify that the proposed techniques speed up TiP extraction and display, and comparisons show that this scheme realizes text extraction more efficiently than other methods.

For global-motion feature extraction: to remove unwanted shake from video, this thesis proposes a video stabilization algorithm based on global-motion feature extraction, realized on a dual-core hardware platform for real-time operation. The method computes the global motion from local motion, which is obtained by feature-centered block matching to reduce computation. Based on the assumption that the static background represents the global motion, a background motion model is proposed. A histogram-based statistic first estimates the initial global motion; a correction procedure then refines it using the background motion model; finally, the video is stabilized according to the estimated global motion. To improve stabilization performance, several optimizations are proposed: the stabilization tasks are partitioned and scheduled on the dual-core platform; a function-simplification method optimizes the response function in feature-point selection; region-based memory access and an optimized sum of absolute differences speed up feature-centered block matching; and global-motion estimation is optimized as well. Experimental results show that the proposed algorithm accurately estimates the global motion and produces correctly stabilized video, comparisons confirm higher stabilization performance than other methods, and evaluations show that the proposed optimizations raise performance enough for real-time processing.

Parallel Abstract (English)


Computer vision has become an important research field in recent years. Many computer vision-based algorithms have been proposed, and these algorithms can be used to design various novel systems, which can be further implemented on embedded platforms. Based on this background, this thesis designs novel computer vision-based systems. Feature extraction is a fundamental technique in computer vision that underlies many such systems. Among the available features, text features carry high-level semantics of video content, and global motion is important for video stabilization and video coding. This thesis therefore addresses the extraction of text and global-motion features and their applications to text-video inpainting and video stabilization. For text feature extraction: today much superimposed text is embedded in videos, and some of it is unnecessary, so an approach is needed to remove the text and inpaint the video. However, few conventional approaches inpaint the video well in the presence of large-sized text, structure regions, and varied video types. In response, this study designs a text-video inpainting algorithm that poses text-video inpainting as structure repair and texture propagation. To repair the structure regions, structure interpolation uses the newly proposed rotated block matching to estimate the initial locations of the inpainted regions and later refines their coordinates; information from neighboring frames then fills the structure regions. To inpaint structure regions without tedious manual interaction, structure extension uses spline-curve estimation. Derivative propagation then realizes texture-region inpainting. Experiments on several real text-videos show that all text regions were inpainted with spatio-temporal consistency, and comparisons show that the proposed algorithm outperforms conventional approaches.
Its advantages include reduced design complexity, achieved by integrating only the structure information across multiple frames, and demonstrated structure consistency on realistic videos. In addition, because embedded text can carry important information, this research uses it to achieve an intelligent multimedia display. We design a text-in-picture (TiP) display system that extracts the text of a subchannel and combines it with the main channel. The system was built on a dual-core platform to reach real-time text extraction and display. A schedulable design framework partitions the TiP display and text extraction into a pipeline; a data-aware transfer scheme allows some data to be reused; and single-instruction multiple-data (SIMD) mechanisms accelerate the numerous convolutions and accumulations in text extraction. Quadruple buffering processes the input/output of text extraction concurrently, and multi-banking and multi-tasking optimize the labeling and filling tasks. Evaluation results indicate that the proposed techniques speed up TiP display with text extraction, and an equivalent comparison shows that they realize text extraction more efficiently. For global-motion feature extraction: to remove unwanted vibration from video, a robust video stabilization algorithm based on global-motion feature extraction is proposed and realized on a dual-core embedded platform for real-time operation. In our approach, the global motion is calculated from the local motion, which is derived by feature-centered block matching at low computational cost. Based on the assumption that the motion of the static background represents the global motion, a background motion model is proposed.
A histogram-based computation operates on the local motion vectors to estimate the initial global motion. The global motion is then refined by an updating procedure based on the background motion model. Finally, the video is smoothed and stabilized according to the computed global motion. In addition, to enhance stabilization performance on an embedded platform, several optimization approaches are proposed: the stabilization tasks are partitioned and scheduled across the dual cores; a function-simplification approach optimizes the response function in the feature-point selection task; region-based memory access and sum of absolute differences (SAD) optimization speed up feature-centered block matching; and the global motion estimation is optimized as well. Experimental results show that the proposed approach accurately estimates the global motion and produces well-stabilized videos, and comparisons demonstrate its superior stabilization performance. Based on the evaluation results, the proposed optimizations significantly increase stabilization performance, enabling real-time processing.
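The idea of filling removed text regions with information from neighboring frames can be sketched as below. This is a minimal illustration, not the thesis's structure-interpolation algorithm: it assumes a locally static background and uses a plain per-pixel temporal median; all names are hypothetical.

```python
# Hypothetical sketch: fill masked (text) pixels in the current frame
# with the median of co-located pixels from neighboring frames,
# assuming the background under the text is static across those frames.
def inpaint_from_neighbors(frame, mask, neighbors):
    """frame: 2-D list of grayscale values; mask: 2-D list of bools
    marking text pixels; neighbors: list of same-sized frames.
    Returns a copy of frame with masked pixels temporally filled."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Median over neighboring frames is robust to outliers
                # such as moving objects briefly crossing the region.
                candidates = sorted(nb[y][x] for nb in neighbors)
                out[y][x] = candidates[len(candidates) // 2]
    return out
```

The real algorithm additionally repairs structure regions (edges crossing the text box) before propagating texture, which a pure temporal median cannot do.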
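Feature-centered block matching with a SAD cost, as used for local motion estimation above, can be sketched as follows. This is an illustrative baseline under assumed parameters (4x4 blocks, +/-2 search radius); the thesis's optimized region-based memory access and SAD shortcuts are not reproduced.

```python
# Hypothetical sketch of feature-centered block matching: for a block
# centered near a feature point, search a small window in the next
# frame for the displacement minimizing the sum of absolute
# differences (SAD). Block size and radius are illustrative.
def sad(a, b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(a[ay + j][ax + i] - b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )

def match_block(prev, curr, fx, fy, size=4, radius=2):
    """Return the (dx, dy) local motion of the block at (fx, fy)."""
    best, best_dx, best_dy = float("inf"), 0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = sad(prev, curr, fx, fy, fx + dx, fy + dy, size)
            if cost < best:
                best, best_dx, best_dy = cost, dx, dy
    return best_dx, best_dy
```

Matching only around selected feature points, rather than over every block, is what keeps the computation low enough for real-time use.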
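The histogram-based global motion estimate and the smoothing step can be sketched as below. This is a simplified stand-in, assuming the modal local motion vector represents the static background; the smoothing filter and its `alpha` parameter are illustrative, not the thesis's background motion model.

```python
# Hypothetical sketch: pick the most frequent local motion vector as
# the global motion (static background blocks are assumed to dominate),
# then low-pass filter the accumulated motion path; the gap between
# raw and smoothed paths is the per-frame stabilizing correction.
from collections import Counter

def global_motion(local_motions):
    """Modal (dx, dy) among the per-block local motion vectors."""
    return Counter(local_motions).most_common(1)[0][0]

def smooth_path(global_motions, alpha=0.8):
    """Exponentially smooth the accumulated x-motion path and return
    the correction (raw path minus smoothed path) per frame."""
    corrections, acc, s = [], 0.0, 0.0
    for dx, _ in global_motions:  # x component only, for brevity
        acc += dx                 # raw accumulated camera path
        s = alpha * s + (1 - alpha) * acc  # smoothed path
        corrections.append(acc - s)
    return corrections
```

Shifting each frame by the negative of its correction removes the high-frequency jitter while preserving intentional camera motion, which is the goal the stabilization step above describes.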


Cited by


傅泓翊 (2012). 影片字幕檢索系統以臺大文學講座系列影片為例 [A video subtitle retrieval system: a case study of the NTU literature lecture video series] (Master's thesis, National Taiwan University). Airiti Library. https://doi.org/10.6342/NTU.2012.00918
