
Advanced Motion-Compensated Prediction (MCP) for High-Efficiency Video Coding

Advisors: 彭文孝 (Wen-Hsiao Peng), 李素瑛 (Suh-Yin Lee)

Abstract


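Motion-compensated prediction (MCP) removes the temporal redundancy of video signals and is therefore a core technique in many video coding standards. This thesis revisits MCP from the perspectives of theory, application, and implementation. We first view MCP as a two-step process: the first step samples motion vectors, and the second uses the sampled motion vectors to compute the prediction. From this viewpoint, we propose Parametric Overlapped Block Motion Compensation (POBMC) to improve the efficiency of MCP. Building on the POBMC framework, we further develop a bi-prediction method that combines a motion vector obtained by template matching with a conventional block motion vector to improve prediction efficiency. The proposed methods are summarized below.

First, we reinterpret the operation of MCP from a new viewpoint, treating it as two parts: motion vector sampling and intensity field reconstruction. Under this viewpoint, we also find that a block motion vector can be approximated as the true motion vector at the block centroid. We provide a theoretical analysis that supports this viewpoint and use it to examine common existing MCP methods, such as block motion compensation, SKIP prediction, and template matching prediction. Experimental results confirm that the proposed framework analyzes MCP methods accurately.

Under this viewpoint, we propose POBMC to improve MCP efficiency. Conventional overlapped block motion compensation (OBMC) addresses the motion uncertainty of block motion compensation (BMC) by taking into account the block motion estimation (BME) results of neighboring blocks to form an LMMSE estimate of the intensity. OBMC has been shown to provide better coding efficiency than BMC. However, since H.264/AVC adopted variable block-size motion compensation (VBSMC), combining OBMC with VBSMC has become a major challenge. Using theoretical models of the intensity and motion autocorrelation, together with the assumption that the motion vector produced by BME approximates the motion vector at the block center, we propose POBMC. For each pixel, the technique assigns optimal weights according to the neighboring motion vectors available to that pixel and the distances from the pixel to the centers of the blocks associated with those motion vectors, so as to achieve the best MCP performance.

Finally, we use the POBMC framework to combine template matching prediction with block motion-compensated prediction. Because the motion vector produced by template matching requires no bits to transmit, the proposed bi-prediction mode needs to send only one block motion vector to obtain the effect of bi-prediction with two motion vectors. Because template matching prediction is computationally complex, the proposed bi-prediction framework can also flexibly replace the template matching motion vector with any motion vector that can be derived at the decoder, thereby reducing complexity. Experimental results show that the proposed bi-prediction mode effectively improves the coding efficiency of current video compression.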
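The per-pixel weighting idea behind POBMC can be illustrated with a minimal sketch. The abstract does not state the weight formula, so the code below assumes, purely for illustration, that each motion vector's weight is inversely proportional to the squared distance from the pixel to the corresponding block center. The names `pobmc_weights` and `predict_pixel`, the integer-pel motion vectors, and the border clamping are all hypothetical choices, not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pobmc_weights(pixel: Point, centers: Sequence[Point],
                  eps: float = 1e-6) -> List[float]:
    """Return one normalized weight per neighboring motion vector.

    ASSUMPTION (not from the source): weight ~ 1 / squared distance
    between the pixel and the block center tied to each motion vector.
    """
    inverse = []
    for cx, cy in centers:
        d2 = (pixel[0] - cx) ** 2 + (pixel[1] - cy) ** 2
        inverse.append(1.0 / (d2 + eps))  # eps avoids division by zero
    total = sum(inverse)
    return [w / total for w in inverse]


def predict_pixel(pixel: Tuple[int, int],
                  centers: Sequence[Point],
                  motion_vectors: Sequence[Tuple[int, int]],
                  reference: Sequence[Sequence[float]]) -> float:
    """Blend motion-compensated reference samples using the weights."""
    x, y = pixel
    height, width = len(reference), len(reference[0])
    weights = pobmc_weights(pixel, centers)
    value = 0.0
    for w, (dx, dy) in zip(weights, motion_vectors):
        # Clamp to the frame so the sketch never indexes out of bounds.
        rx = min(max(x + dx, 0), width - 1)
        ry = min(max(y + dy, 0), height - 1)
        value += w * reference[ry][rx]
    return value
```

Under these assumptions, the bi-prediction mode described in the last paragraph corresponds to calling `predict_pixel` with two entries: one center and motion vector for the transmitted block motion vector, and one for the decoder-derived (for example, template matching) motion vector. Because the weights depend only on distances to block centers, the same routine applies regardless of how the neighboring blocks are partitioned.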

Parallel Abstract


The explosive proliferation of multimedia data in education, entertainment, sports, and other applications necessitates the development of multimedia application systems and tools. As important multimedia content, sports video has attracted considerable research effort because of its commercial benefits, entertainment value, and large audience base. The majority of existing work on sports video analysis focuses on shot classification and highlight extraction. However, more keenly than ever, a growing number of sports fans and professionals desire computer-assisted sports information retrieval. Moreover, umpires demand assistance in judgment from computer technologies. In this thesis, we concentrate on feature integration and semantic analysis for sports video content understanding, indexing, annotation, and retrieval from single-camera video. In sports games, important events are mainly caused by ball-player interaction, and the ball trajectory contains significant information and semantics. To infer the semantic and tactical content, we first propose an efficient and effective scheme to track the ball and compute the ball positions over frames. Ball tracking is an arduous task because of the ball's fast speed and small size. It is almost impossible to distinguish the ball within a single frame. Hence, we utilize the ball motion characteristics over frames to identify the true ball trajectory, instead of recognizing which object is the ball in each frame. To retrieve more information about the games and gain further insight, we design an innovative approach to 3D ball trajectory reconstruction in single-camera video for court sports, where the court lines and feature objects captured in the frames can be used for camera calibration to compute the transformation between the 3D real world and the 2D frame. The problem of 2D-to-3D inference is intrinsically challenging because depth information is lost in picture capturing.
Incorporating the 3D-2D transformation and the physical characteristics of ball motion, we are able to approximate the depth information and accomplish the 2D-to-3D trajectory reconstruction. Manifold applications of sports video understanding and sports information retrieval can be achieved on the basis of the obtained 2D trajectory and the reconstructed 3D trajectory, such as shooting location estimation in basketball, event detection in volleyball, and pitch analysis in baseball. The 3D virtual replay generated from the 3D trajectory makes game watching a whole new experience in which the audience can switch between different viewpoints to watch the ball motion. In baseball, the pitch location (the relative location of the ball in or around the strike zone when the ball passes the batter) is an important factor affecting the motion of the ball hit into the field. The strike zone provides the reference for determining the pitch location. Hence, we design a contour-based strike zone shaping and visualization method. Whether the batter is right- or left-handed, we are able to shape the strike zone adaptively to the batter's stance. Computer-assisted strike/ball judgment can also be achieved via the shaped strike zone. In addition to the pitcher/batter confrontation, the defense process after the ball is batted also attracts much attention. Therefore, we design algorithms to recognize spatial patterns in frames for classifying the active regions of event occurrence in the field. The ball routing patterns and defense process can be inferred from the transitions of the active regions captured in the video. Furthermore, sequences with similar ball routing and defense patterns can be retrieved for defense strategy analysis. Comprehensive experiments on basketball, volleyball, and baseball videos have been conducted to evaluate the performance of the proposed methods.
The experimental results show that the proposed methods perform well in retrieving game information and even in reconstructing 3D information from single-camera video for different kinds of sports. We believe that the preliminary work in this thesis will lead to satisfactory solutions for sports information retrieval, content understanding, tactics analysis, and computer-assisted game study in more kinds of sports videos.
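The idea of identifying the ball from its motion over frames, rather than from its appearance in a single frame, can be sketched as follows. The abstract does not specify the motion model, so this example assumes that a ball in free flight traces an approximately parabolic vertical path in the image, which means the second differences of its y-coordinate over consecutive frames are roughly constant. The function names and the scoring rule are hypothetical.

```python
from typing import List, Sequence


def second_differences(values: Sequence[float]) -> List[float]:
    """Discrete second derivative over uniformly spaced frames."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


def ballistic_score(ys: Sequence[float]) -> float:
    """Variance of the second differences; lower means more parabolic.

    ASSUMPTION (not from the source): a true ball trajectory has a
    near-constant second difference in its image y-coordinate.
    """
    d2 = second_differences(ys)
    if not d2:
        raise ValueError("need at least three positions")
    mean = sum(d2) / len(d2)
    return sum((d - mean) ** 2 for d in d2) / len(d2)


def pick_ball_trajectory(candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate track that best fits the model."""
    return min(range(len(candidates)),
               key=lambda i: ballistic_score(candidates[i]))
```

This score alone cannot separate a ball from a stationary object, whose second differences are also constant (zero), so a practical system would need additional cues such as a minimum displacement or track length.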


