Enhancing Novel View Synthesis via Geometry Grounded Set Diffusion

Farhad G. Zanjani, Hong Cai, Amirhossein Habibian

arXiv:2601.07540 · cs.CV · Published 2026-01-12 · Updated 2026-03-13

We present SetDiff, a geometry-grounded multi-view diffusion framework that enhances novel-view renderings produced by 3D Gaussian Splatting. Our method integrates explicit 3D priors (pixel-aligned coordinate maps and pose-aware Plücker ray embeddings) into a set-based diffusion model that jointly processes a variable number of reference and target views. This formulation enables robust occlusion handling, reduces hallucinations under low-signal conditions, and improves photometric fidelity in visual content restoration. A unified set mixer performs global token-level attention across all input views, supporting scalable multi-camera enhancement while maintaining computational efficiency through latent-space supervision and selective decoding. Extensive experiments on EUVS, Para-Lane, nuScenes, and DL3DV demonstrate significant gains in perceptual fidelity, structural similarity, and robustness under severe extrapolation. SetDiff establishes a state-of-the-art diffusion-based solution for realistic and reliable novel-view synthesis in autonomous driving scenarios.
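The abstract does not spell out how the Plücker ray embeddings are computed, so the following is only a minimal sketch of the common convention: each pixel's ray is encoded as its unit direction d together with the moment o × d, where o is the camera center. The function name `plucker_ray_map`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the camera-to-world pose (`R`, `t`), and the half-pixel offset are illustrative assumptions, not details taken from SetDiff.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_ray_map(fx, fy, cx, cy, R, t, height, width):
    """Return a height x width grid of 6-tuples (d, o x d).

    Assumptions (hypothetical, for illustration only):
      - pinhole camera with focal lengths fx, fy and principal point cx, cy
      - R (3x3 nested lists) and t (3-tuple) map camera to world coordinates,
        so the camera center in world coordinates is o = t
      - rays pass through pixel centers (u + 0.5, v + 0.5)
    """
    grid = []
    for v in range(height):
        row = []
        for u in range(width):
            # Back-project the pixel center to a direction in the camera frame.
            d_cam = ((u + 0.5 - cx) / fx, (v + 0.5 - cy) / fy, 1.0)
            # Rotate the direction into the world frame and normalize it.
            d = tuple(sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
            norm = math.sqrt(sum(c * c for c in d))
            d = tuple(c / norm for c in d)
            # Moment of the ray about the world origin.
            m = cross(t, d)
            row.append(d + m)
        grid.append(row)
    return grid


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rays = plucker_ray_map(100.0, 100.0, 2.0, 1.5, identity, (1.0, 0.0, 0.0), 3, 4)
    print(len(rays), len(rays[0]), len(rays[0][0]))
```

Because the moment depends on the camera center, two cameras looking in the same direction from different positions get different embeddings, which is what makes this encoding pose-aware. In a diffusion model, such a per-pixel 6-channel map would typically be resized to the latent resolution and concatenated with or added to the image tokens; how SetDiff injects it is not stated in the abstract.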

Topics: Generative Design & Molecule Optimization

Tags: diffusion-model

arXiv categories: cs.CV
