CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis

Sungkyunkwan University
*Equal contribution
AAAI 2025

Performance Comparison on Objaverse Dataset

Performance Comparison on GSO Dataset

Abstract

Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, to establish a ubiquitous presence in everyday media formats, such as images and videos, we need to fulfill three key objectives: 1. fast encoding and decoding time, 2. compact model sizes, and 3. high-quality renderings. Despite recent advancements, a comprehensive algorithm that adequately addresses all objectives has yet to be fully realized. In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we propose a finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 100x and remarkable reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets.

CodecNeRF Pipeline

Given N input images from different viewpoints, the goal is to produce a NeRF representation (multi-resolution triplanes). First, a 2D image feature extractor module processes all input images and generates 2D feature maps for each input image. Then, the unproject and aggregation module lifts the 2D features to 3D features and aggregates the unprojected 3D features into a single 3D feature. The 3D feature is further processed by axis-aligned average pooling along each axis, resulting in three 2D features. These 2D features are used to generate multi-resolution triplanes and finally, we perform the volumetric rendering to render an image using MLP. Furthermore, 2D features are compressed by the compression module, producing the minimal sizes of the codes to be transmitted. The entire pipeline is differentiable, and we train end-to-end to optimize all learnable parameters.

Similar to other NeRF generalization models, our approach can also leverage fine-tuning of NeRF representations to enhance visual quality for new scenes during testing time. We propose to adapt parameter-efficient finetuning (PEFT) in our test time optimization. We first generate multi-resolution triplanes using multi-view test images, and train only the triplanes and decoder in a efficient way. Also, we leverage neural compression methods that have demonstrated efficacy in image and video domains to seek to achieve the optimal compression rate. We adopt a fully factorized density model to our proposed parameter-efficient finetuning of the triplanes and MLP.

Entropy Coding visualization

w/ Entropy Coding (top)   vs.   w/o Entropy Coding (bottom)

We visualize the delta feature maps across finetuning iterations using our entropy coding method. The feature maps following the entropy coding, eliminate unnecessary components across the different resolutions, and sequentially get a high compression ratio resulting from quantization. The histogram shows the progress of compression during finetuning stage.

Encoding time and compression performance

For the quantitative metrics, we report the standard image quality measurements, PSNR, SSIM, and MS-SSIM. We also measure the storage requirements of the representation to show the compression performance of our method. Due to the improved generalization capability of our method, we outperform the per-scene optimization-based baseline, Triplanes, in novel view synthesis. Remarkably, the PEFT method showed comparable or better performance across the data with fewer trainable parameters. Left two figures depict the results on the Objaverse dataset, and right two figures represent the results on the GSO dataset.

BibTeX


      @misc{kang2024codecnerf,
      title={CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis}, 
      author={Gyeongjin Kang and Younggeun Lee and Eunbyung Park},
      year={2024},
      eprint={2404.04913},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
      }