In this paper, we propose a novel algorithm that generates multiple virtual views from a video-plus-depth sequence for modern autostereoscopic displays. Synthesizing realistic content in the disocclusion regions of the virtual views is the main challenge in this task. To produce perceptually satisfactory images, the proposed algorithm exploits spatial coherence and temporal consistency to handle the uncertain pixels in the disocclusion regions. Regarding spatial coherence, we combine the intensity-gradient strength with the depth information to determine the filling priority for inpainting the disocclusion regions, so that the continuity of image structures is preserved. Regarding temporal consistency, we consider the intensities in the disocclusion regions across adjacent frames through an optimization process: an iterative re-weighted framework jointly enforces intensity and depth consistency across adjacent frames, which not only imposes temporal consistency but also suppresses noise. Finally, to accelerate multi-view synthesis, we apply the proposed view-synthesis algorithm to generate the image-plus-depth pairs at only the leftmost and rightmost viewpoints, so that the intermediate views can be efficiently interpolated by image warping according to the depth maps of these two views.
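The depth-guided filling priority described above can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the patch size, and the specific product of a gradient-strength term and a depth term (favoring farther, background pixels) are illustrative assumptions about how such a priority could be scored.

```python
import numpy as np

def filling_priority(gray, depth, mask, y, x, half=2):
    """Toy filling priority at boundary pixel (y, x) of the disocclusion
    mask (True = unknown): the product of the local intensity-gradient
    strength and a depth term that favors background (assumed: larger
    depth value = farther), so image structures are propagated from the
    background side first. Both terms are illustrative assumptions."""
    y0, y1 = max(0, y - half), y + half + 1
    x0, x1 = max(0, x - half), x + half + 1
    known = ~mask[y0:y1, x0:x1]               # pixels with valid data
    if not known.any():
        return 0.0
    gy, gx = np.gradient(gray.astype(float))
    # Gradient strength averaged over the known part of the patch.
    g = np.hypot(gy[y0:y1, x0:x1], gx[y0:y1, x0:x1])[known].mean()
    # Depth term, normalized to [0, 1]; background patches score higher.
    d = depth[y0:y1, x0:x1][known].mean() / (depth.max() + 1e-6)
    return g * d
```

In a full inpainting loop, the boundary pixel with the highest priority would be filled first, and the mask and priorities updated after each step.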
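The final acceleration step, interpolating intermediate views by depth-guided image warping, can be sketched as follows. This is a simplified illustration only: it forward-warps the leftmost view with a disparity assumed proportional to depth (the scale `max_disp` and the linear depth-to-disparity mapping are assumptions), and it omits the z-buffering and the symmetric warp from the rightmost view that would fill the remaining holes.

```python
import numpy as np

def interpolate_view(left, depth_left, alpha, max_disp=8):
    """Warp the leftmost view toward an intermediate viewpoint.
    alpha in [0, 1]: 0 = leftmost viewpoint, 1 = rightmost.
    Disparity is assumed proportional to depth (illustrative); pixels
    with no source sample remain holes, to be filled from the other
    (rightmost) warped view in the full scheme."""
    h, w = left.shape[:2]
    out = np.zeros_like(left)
    filled = np.zeros((h, w), bool)
    # Per-pixel integer disparity, scaled by the viewpoint position.
    disp = (alpha * max_disp * depth_left
            / (depth_left.max() + 1e-6)).astype(int)
    xs = np.arange(w)
    for y in range(h):
        xt = xs - disp[y]                 # target columns after the shift
        ok = (xt >= 0) & (xt < w)         # keep warps inside the frame
        out[y, xt[ok]] = left[y, xs[ok]]
        filled[y, xt[ok]] = True
    return out, filled
```

With alpha = 0 the warp is the identity; as alpha grows, foreground pixels (larger depth-to-disparity shift in this toy mapping) move farther, exposing the holes that the inpainting stage must fill.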