
Real-Time Dynamic Scene Rendering with Voxelized Gaussian Splatting

Advisors: 楊奕軒, 陳駿丞

Abstract


Novel view synthesis plays a pivotal role in 3D vision applications such as virtual reality (VR), augmented reality (AR), and movie production, enabling the generation of images from arbitrary viewpoints within a scene. The task is essential for dynamic scenes, yet it faces challenges from complex motion and sparse data. Neural Radiance Fields (NeRF) tackled the task and advanced high-quality rendering by representing scenes with neural implicit functions. Its successor, 3D Gaussian Splatting (3D-GS), further accelerated rendering to real-time speeds by employing efficient 3D Gaussian projections. However, because 3D-GS is an explicit representation, training and rendering on 4D data remain expensive for dynamic scenes, even after 3D-GS is extended to 4D Gaussian Splatting (4D-GS), which models temporal dynamics with Gaussian deformation fields for accurate and efficient rendering. To address these issues, this thesis introduces Voxels to 4D-GS, which augments 4D-GS with voxel-based representations for even greater efficiency in dynamic scene rendering. By integrating voxels, our framework reduces computational complexity, accelerates processing, and significantly lowers memory usage while maintaining high rendering quality. These advances in real-time rendering pave the way for future applications in dynamic scene manipulation, reconstruction, and downstream tasks.
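
For context on the formulation the abstract builds on, the standard 3D-GS renderer composites depth-sorted Gaussians front to back with alpha blending; the equation below follows the common notation of the 3D-GS literature rather than symbols defined in this thesis:

C(\mathbf{p}) = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j),
\qquad
\alpha_i = o_i \exp\!\left( -\tfrac{1}{2} (\mathbf{p} - \boldsymbol{\mu}'_i)^{\top} (\boldsymbol{\Sigma}'_i)^{-1} (\mathbf{p} - \boldsymbol{\mu}'_i) \right)

where \boldsymbol{\mu}'_i and \boldsymbol{\Sigma}'_i are the i-th Gaussian's mean and covariance projected into image space, o_i is its opacity, and c_i its view-dependent color at pixel \mathbf{p}. 4D-GS keeps this renderer but first applies a learned deformation field that offsets each Gaussian's position, rotation, and scale for the queried timestamp.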
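
As a minimal sketch of how a voxel-based representation can shrink a Gaussian cloud, the Python snippet below snaps Gaussian centers to a uniform grid and merges the Gaussians that share a voxel. The function name, the voxel size, and the opacity-weighted merge rule are illustrative assumptions, not the thesis's actual algorithm:

import numpy as np

def voxelize_gaussians(means, opacities, voxel_size=0.05):
    """Group Gaussian centers into voxels and merge each voxel's
    Gaussians into one (illustrative merge: opacity-weighted mean).

    means:     (N, 3) Gaussian centers
    opacities: (N,)   per-Gaussian opacity in [0, 1]
    """
    # Integer voxel index for every Gaussian center.
    idx = np.floor(means / voxel_size).astype(np.int64)

    # One key per occupied voxel; `inverse` maps each Gaussian to its voxel.
    _, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = inverse.max() + 1

    # Opacity-weighted average of the centers within each voxel.
    w = opacities[:, None]
    merged = np.zeros((n_voxels, 3))
    weight = np.zeros((n_voxels, 1))
    np.add.at(merged, inverse, means * w)
    np.add.at(weight, inverse, w)
    merged /= np.maximum(weight, 1e-8)

    # Keep the maximum opacity per voxel as the merged opacity.
    merged_op = np.zeros(n_voxels)
    np.maximum.at(merged_op, inverse, opacities)
    return merged, merged_op

# Example: 100k random Gaussians collapse to far fewer voxel Gaussians.
rng = np.random.default_rng(0)
mu, op = voxelize_gaussians(rng.normal(size=(100_000, 3)),
                            rng.uniform(size=100_000))
print(mu.shape, op.shape)

Opacity-weighted averaging is only one possible reduction rule; the point of the sketch is that quantizing positions to a grid bounds the number of primitives by the number of occupied voxels, which is what drives the memory and speed savings the abstract claims.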

