Video stabilization removes the shaky motion from an unsteady video while preserving its original primary motion, and is an essential technique for improving the visual quality of videos. Most prior stabilization methods operate on 2D image-plane transformations, so they struggle with videos containing large scene-depth variation and produce distortion in their results. We propose a deep learning method that performs the stabilizing transformation using 3D spatial information. We first estimate the scene depth and the 3D camera trajectory of an input video with an optimization framework composed of two convolutional neural networks; this framework requires no pre-training or training data, and instead learns directly on the input video at test time to optimize its estimates. We then smooth the estimated camera trajectory and reconstruct the stabilized video from the estimated 3D scene. Within the smoothing algorithm, we let the user adjust the stability of the same input video in real time, a capability that most current deep learning methods do not provide. To the best of our knowledge, ours is the first deep learning method based on 3D spatial transformation. We compare against other state-of-the-art video stabilization methods both qualitatively and quantitatively to demonstrate our advantages.
Video stabilization removes the noisy motion of an unsteady video while preserving its primary motion, and is an essential technique for enhancing the visual quality of videos. Most prior works are based on 2D transformation models and therefore suffer in scenarios with complex scene depth. We present a novel 3D-based learning method for video stabilization. The proposed method estimates the scene depth and 3D camera motion with a CNN optimization framework that requires no pre-training or training data. Given the estimated depth and camera motion, the stabilization stage smooths the camera trajectory and synthesizes the stabilized video using the 3D scene depth. Furthermore, the smoothing algorithm lets the user adjust the stability of the same video in real time (34.5 fps), a fundamental capability that most prior learning-based methods do not provide. To the best of our knowledge, our work is the first learning-based method built on a 3D motion model. We demonstrate the advantages of our 3D-based method quantitatively and qualitatively against state-of-the-art methods.
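The trajectory-smoothing step described above can be sketched as a Gaussian low-pass filter over the estimated per-frame camera positions. This is an illustrative sketch only, not the method's actual algorithm: the function name and the `sigma` parameter are assumptions, with `sigma` playing the role of the user-adjustable stability control.

```python
import numpy as np

def smooth_trajectory(trajectory, sigma=5.0):
    """Gaussian-smooth a per-frame camera trajectory of shape (T, 3).

    Illustrative sketch: larger ``sigma`` suppresses more high-frequency
    shake, acting as a stability knob the user could tune per video.
    """
    T = len(trajectory)
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    # Pad with edge values so the first/last frames are not pulled to zero.
    padded = np.pad(trajectory, ((radius, radius), (0, 0)), mode="edge")
    smoothed = np.empty((T, trajectory.shape[1]), dtype=float)
    for t in range(T):
        window = padded[t : t + 2 * radius + 1]
        smoothed[t] = kernel @ window  # weighted average over the window
    return smoothed
```

The difference between the original and smoothed trajectories would then drive the view synthesis that warps each frame into its stabilized pose.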