
3D Novel View Style Transfer via Learning-based Video Depth Estimation

Advisor: 陳祝嵩
Co-advisor: 洪一平 (Yi-Ping Hung)

Abstract


3D style transfer has attracted considerable attention over the past year or two. Existing 3D style transfer methods mainly estimate where the scene in an image lies in 3D space, stylize the 3D scene, and then combine multiple views to synthesize the target view. However, current 3D style transfer methods rely on time-consuming global optimization to estimate depth and camera poses. Our goal is therefore to use the depth and camera poses produced by a more efficient deep-learning model and combine them with an existing 3D style transfer model to accomplish novel view synthesis. The main problem this change introduces is that the 3D coordinate systems produced at different time steps are inconsistent, so the existing 3D style transfer method cannot produce correct stylization results. We construct a local 3D point cloud whose points are added and removed over time, and improve the existing 3D style transfer method, so that the stylized results neither break the 3D scene structure nor keep flickering as the point cloud changes. The biggest advantage of this approach is that our framework does not depend on any specific 3D reconstruction method, yet still produces results comparable to existing methods, saving a large amount of time. We also provide a VR experience so that users can watch 3D style-transferred videos.
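
As a rough sketch of the pipeline described above, the snippet below illustrates how a depth map and camera pose predicted by a learning-based model can be unprojected into a local 3D point cloud. It is only an illustration of the general technique, not the thesis's implementation; the function and variable names (unproject_to_points, cam_to_world, and so on) are hypothetical.

    import numpy as np

    def unproject_to_points(depth, K, cam_to_world):
        """Lift a predicted depth map into world-space 3D points.

        depth:        (H, W) per-pixel depth from a learning-based model
        K:            (3, 3) camera intrinsics
        cam_to_world: (4, 4) camera pose from the same model
        """
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)

        # Back-project pixels into camera space, scale by depth,
        # then transform into world coordinates.
        rays = pixels @ np.linalg.inv(K).T
        points_cam = rays * depth.reshape(-1, 1)
        points_hom = np.concatenate(
            [points_cam, np.ones((len(points_cam), 1))], axis=1)
        return (points_hom @ cam_to_world.T)[:, :3]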

Abstract (English)


3D style transfer has been investigated for about two years. Recently developed methods use either neural radiance fields or point-cloud generation for 3D scene style transfer and combine novel view synthesis to predict target views. However, these methods are time-consuming because they require globally consistent optimization. We aim to speed them up by applying a learning-based structure-from-motion (SfM) module to the state-of-the-art pipeline. A naive combination causes problems due to the inconsistency of 3D coordinates between different views generated by locally optimized SfM-learners. To overcome this issue, we use a sliding-window point cloud set that adds points close to the current view and removes points far away from it, so that the results in 3D space are not affected by the coordinate differences. Stylizing different point cloud sets may generate flickering results; therefore, we modify the style transfer module we use to deal with the flickering problem. The experiments show that our reformed method achieves visual results comparable to the original style transfer module, while using a much more efficient SfM constructor than their method. Besides, we implement novel view synthesis applications, such as stereo videos, in a Virtual Reality system for the visual experience.
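
A minimal sketch of the sliding-window point cloud set described above follows, assuming points and their attached colour or style features are kept in NumPy arrays; the class name, the distance-to-camera criterion, and the radius threshold are illustrative assumptions rather than the actual implementation.

    import numpy as np

    class SlidingWindowPointCloud:
        """Keep only points near the current view so that the locally
        inconsistent 3D coordinates produced by an SfM-learner do not
        accumulate across the whole sequence (illustrative sketch)."""

        def __init__(self, radius):
            self.radius = radius
            self.points = np.empty((0, 3))
            self.features = np.empty((0, 3))  # e.g. RGB or style features

        def update(self, new_points, new_features, cam_center):
            # Add points reconstructed for the current frame.
            self.points = np.vstack([self.points, new_points])
            self.features = np.vstack([self.features, new_features])

            # Drop points that lie too far from the current camera.
            keep = np.linalg.norm(self.points - cam_center, axis=1) < self.radius
            self.points, self.features = self.points[keep], self.features[keep]
            return self.points, self.features

Stylization would then operate on the point set returned by update(); carrying the per-point features across consecutive windows is one plausible way to keep adjacent stylized frames consistent and reduce the flickering mentioned above.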

