
3D 視訊深度影像壓縮與傳輸之研究

3D Video Depth Image Compression and Transmission

Advisor: 林易泉

Abstract


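In this thesis, we study the application of neural networks to depth-map compression for stereoscopic video. To improve compression efficiency and reduce the transmission bandwidth required by depth-map sequences, we propose a data compression method for depth images based on back-propagation neural networks. The proposed method relies on the learning mechanism of neural networks to reconstruct the depth image. The depth map to be compressed is divided into three classes according to image content. Each class has its own training data, chosen according to its purpose, and several neural networks are built to process and correct the image data; this classification determines which data each network handles.

So that compressed stereoscopic video can adapt to the display resolutions of different terminal devices, we develop a depth-image interpolation method based on a neural network bank that resizes the image at the decoder. The neural network bank consists of several neural networks. We build one network for each H.264/AVC intra mode, so that each network corresponds to the prediction direction of its mode. The idea draws on the coding information that H.264/AVC produces for each macroblock (MB) when it encodes a subsampled video: the best prediction direction chosen by the encoder for each MB is used to guide the neural-network interpolation.

We also simulate an unstable network in which lost packets leave the depth image incomplete. If damaged blocks are left unrepaired, the errors keep propagating, because coded blocks in H.264/AVC reference one another. We therefore propose two error concealment methods to suppress this propagation. The first method operates in the H.264/AVC decoder and uses the motion vectors of the texture image for error concealment. Because a depth image records only distance information, motion vectors that may arise from lighting and shadow changes in the texture image must be filtered out when the depth image is reconstructed; we therefore use the chrominance information of the texture image to help judge whether these motion vectors are genuine. To make this judgment more accurate, we also exploit the fact that depth images have pronounced edges, and use the edge information of the texture image to assist in validating the vectors. The second method addresses lost neural-network weights. Exploiting the similarity of neighboring weights, it averages the weights of adjacent neurons and substitutes the result for the lost weights.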

Parallel Abstract


In this thesis, two novel depth image compression schemes are presented. To increase the compression performance for depth-map sequence data, this study uses back-propagation neural networks (BPNNs) to learn the relationship between the texture and depth data in a 3D video. To relieve the burden on a single neural network, multiple NNs are used, and the content of the depth image is classified into three content classes. Each NN accepts different kinds of input vectors to represent the distinct target values of pixels in a depth-map video sequence. To give stereoscopic video terminals with varying display resolutions some flexibility, a Neural Networks Bank (NNB) based interpolation method is developed, which makes it possible to adjust the video size at the decoder. In the proposed scheme, the NNB consists of several neural networks, with the number of networks matching the number of H.264/AVC intra modes. In this way, each neural network can take advantage of one specific prediction direction to offer better interpolation accuracy. The concept behind the NNB design comes from the easy availability of H.264/AVC coding information at the decoder side. When H.264/AVC encodes a subsampled video frame, the encoder calculates the prediction direction of each macroblock, or of each 4x4 block in the macroblock, according to the macroblock content, and chooses the best mode to encode the macroblock in the subsampled frame; the NNB can exploit this information to choose a better interpolation direction for the up-sampled macroblock. When a coded depth-map sequence is transmitted through an unreliable channel, packet losses can degrade the quality of the reconstructed stereoscopic views.
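
The mode-driven selection inside the NNB can be illustrated with a minimal sketch. It assumes the nine H.264/AVC intra 4x4 modes (numbered 0 to 8) and a fixed 2x up-sampling factor, and it substitutes simple hand-written interpolators for the trained neural networks, which the abstract does not specify. The names `interp_rows`, `interp_cols`, `upsample_nearest`, and `nnb_upsample` are hypothetical.

```python
import numpy as np

NUM_INTRA_4X4_MODES = 9  # H.264/AVC intra 4x4 modes are numbered 0..8


def upsample_nearest(block):
    """Placeholder 'network': 2x nearest-neighbour up-sampling."""
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)


def interp_rows(block):
    """Placeholder 'network': 2x up-sampling that interpolates vertically."""
    b = np.asarray(block, dtype=float)
    below = np.vstack([b[1:], b[-1:]])      # next row; last row replicated
    mid = (b + below) / 2.0                 # rows halfway between neighbours
    # Interleave original and interpolated rows: b[0], mid[0], b[1], ...
    tall = np.stack([b, mid], axis=1).reshape(2 * b.shape[0], b.shape[1])
    return np.repeat(tall, 2, axis=1)       # plain repetition across columns


def interp_cols(block):
    """Placeholder 'network': 2x up-sampling that interpolates horizontally."""
    return interp_rows(np.asarray(block).T).T


# One entry per intra mode. Assumption: only mode 0 (vertical) and mode 1
# (horizontal) get a directional stand-in; all other modes fall back to
# nearest-neighbour. In the thesis each entry would be a trained network.
BANK = {mode: upsample_nearest for mode in range(NUM_INTRA_4X4_MODES)}
BANK[0] = interp_rows
BANK[1] = interp_cols


def nnb_upsample(block, intra_mode):
    """Up-sample one decoded block with the bank entry for its intra mode."""
    if intra_mode not in BANK:
        raise ValueError(f"unknown intra 4x4 mode: {intra_mode}")
    return BANK[intra_mode](block)
```

For a decoded 4x4 block, `nnb_upsample(block, 0)` returns an 8x8 block interpolated along the vertical direction; in a real decoder the mode would be read from the H.264/AVC bitstream for that block.
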
To suppress error propagation, packet losses in both the H.264/AVC bitstream and the proposed NN-coded bitstream of the depth-map sequence are also addressed. To this end, two decoder-side error concealment schemes for the decoded depth-map sequence are proposed. The first technique conceals errors in the received H.264/AVC bitstream. Because depth-map data record only the distance from the camera to the surfaces of objects in the scene, motion vectors for depth-map frames reflect the actual movement of objects more closely than those obtained for the corresponding texture frames. The proposed technique therefore uses the motion, chrominance, and edge information of the texture frame to select the more reliable motion vectors, which are then used to restore or conceal the lost depth information and to prevent greater degradation of the 3D video reconstruction. The second technique concerns errors that occur in the transmission of neuron weights. There is a high correlation between the connection weights of neighboring neurons on the same layer of an NN, so lost connection weights can be approximated by the average of the correctly received connection weights of the neighboring neurons.
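
The weight concealment idea can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: one layer's weights are stored as a 2-D array with one row per neuron, "neighboring neurons" means the rows directly before and after, and a lost weight with no surviving neighbor falls back to 0.0. The function name `conceal_lost_weights` is hypothetical.

```python
import numpy as np


def conceal_lost_weights(weights, received):
    """Replace lost weights with the mean of adjacent neurons' received weights.

    weights  : (n_neurons, n_inputs) array for one layer; lost entries may
               hold any value.
    received : boolean array of the same shape, True where the weight arrived.
    """
    w = np.asarray(weights, dtype=float)
    ok = np.asarray(received, dtype=bool)
    out = w.copy()
    n_neurons = w.shape[0]

    for i, j in zip(*np.nonzero(~ok)):
        # Same input connection j, taken from the neurons just before and
        # after neuron i, and only if that weight was correctly received.
        neighbours = [
            w[k, j]
            for k in (i - 1, i + 1)
            if 0 <= k < n_neurons and ok[k, j]
        ]
        # Assumption: use 0.0 when no neighbouring weight survived.
        out[i, j] = float(np.mean(neighbours)) if neighbours else 0.0
    return out
```

With weights `[[1, 2], [9, 9], [3, 4]]` and the first weight of the middle neuron marked as lost, the concealed value is the average of 1 and 3; all received weights are left unchanged.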

