FUSER: Feed-Forward Multiview 3D Registration Transformer and SE(3)^N Diffusion Refinement

Abstract

Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints.
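To give a rough sense of why exhaustive pairwise matching becomes expensive, the short Python sketch below counts the unordered scan pairs for a given number of scans. The function name is hypothetical, and practical pipelines often prune pairs using overlap estimates, so the count is an upper bound rather than a description of any particular method.

```python
def num_pairwise_registrations(num_scans: int) -> int:
    """Number of unordered scan pairs in an exhaustive pairwise scheme."""
    return num_scans * (num_scans - 1) // 2


# The pair count grows quadratically with the number of scans.
for n in (5, 30, 60):
    print(n, num_pairwise_registrations(n))
```

A feed-forward model that processes all scans jointly avoids this per-pair estimation step, although its own attention cost still depends on the total number of tokens.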

This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency.
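The abstract does not specify how the Geometric Alternating Attention module is built. As a minimal sketch of the general idea of alternating intra- and inter-scan attention, the Python below builds the two attention masks for a flattened token sequence. All names are hypothetical, and treating the inter-scan step as global attention over every token is an assumption, not the authors' definition.

```python
from typing import List


def scan_ids(tokens_per_scan: List[int]) -> List[int]:
    """Flatten per-scan token counts into one scan index per token."""
    return [s for s, n in enumerate(tokens_per_scan) for _ in range(n)]


def attention_mask(ids: List[int], mode: str) -> List[List[bool]]:
    """Return mask[i][j], True when token i may attend to token j."""
    if mode == "intra":
        # Block-diagonal mask: tokens attend only within their own scan.
        return [[a == b for b in ids] for a in ids]
    if mode == "inter":
        # Assumed here: every token attends to all tokens of all scans.
        return [[True] * len(ids) for _ in ids]
    raise ValueError(f"unknown mode: {mode}")


# Two scans with 2 and 3 superpoint tokens respectively.
ids = scan_ids([2, 3])
intra = attention_mask(ids, "intra")
inter = attention_mask(ids, "inter")
```

In such a scheme, a transformer would alternate layers that use the `intra` mask with layers that use the `inter` mask, so that per-scan structure and cross-scan consistency are refined in turn.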

Building upon FUSER, we further introduce FUSER-DF, an SE(3)^N diffusion refinement framework to correct FUSER's estimates via denoising in the joint SE(3)^N space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3)^N variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet, and ARKitScenes demonstrate that our approach achieves superior registration accuracy and outstanding computational efficiency.
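To make the joint SE(3)^N space concrete, the sketch below represents a multiview state as a list of N rigid transforms, perturbs each one independently, and then undoes the perturbation. It is only an illustration of the product-group structure: the helper names are hypothetical, rotations are restricted to the z-axis for brevity, and the correction uses the known noise where a learned denoiser would have to predict it. It is not the FUSER-DF diffusion process.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]
Pose = Tuple[Mat, List[float]]  # (3x3 rotation, 3-vector translation)


def rot_z(theta: float) -> Mat:
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(a: Pose, b: Pose) -> Pose:
    """Return a * b, i.e. apply b first, then a."""
    (ra, ta), (rb, tb) = a, b
    r = [[sum(ra[i][k] * rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [sum(ra[i][k] * tb[k] for k in range(3)) + ta[i] for i in range(3)]
    return r, t


def inverse(p: Pose) -> Pose:
    """Inverse rigid transform: (R^T, -R^T t)."""
    r, t = p
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    return rt, [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]


def perturb(poses: List[Pose], noises: List[Pose]) -> List[Pose]:
    """Left-multiply each pose by its own perturbation (element-wise on SE(3)^N)."""
    return [compose(n, p) for n, p in zip(noises, poses)]


# A clean joint state of N = 3 poses and an independent perturbation per pose.
clean = [(rot_z(0.0), [0.0, 0.0, 0.0]),
         (rot_z(0.5), [1.0, 0.0, 0.0]),
         (rot_z(-1.0), [0.0, 2.0, 0.5])]
noises = [(rot_z(0.1), [0.05, 0.0, 0.0]),
          (rot_z(-0.2), [0.0, 0.1, 0.0]),
          (rot_z(0.3), [0.0, 0.0, -0.1])]

noisy = perturb(clean, noises)
# Oracle correction: a denoiser would need to estimate these inverse perturbations.
recovered = perturb(noisy, [inverse(n) for n in noises])

max_err = max(
    max(abs(x - y) for row_a, row_b in zip(ra, rb) for x, y in zip(row_a, row_b))
    + max(abs(x - y) for x, y in zip(ta, tb))
    for (ra, ta), (rb, tb) in zip(recovered, clean)
)
```

Because the group operation acts on each of the N poses separately, noise and correction can be defined per scan, while the model that predicts the correction can still condition on all scans jointly.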

BibTeX

@InProceedings{Jiang_2026_CVPR,
    author    = {Jiang, Haobo and Xie, Jin and Yang, Jian and Yu, Liang and Zheng, Jianmin},
    title     = {FUSER: Feed-Forward Multiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}


CVPR 2026 (Oral & Best Paper Award Candidate)

FUSER performs feed-forward multiview 3D registration in one pass, achieving fast inference speed and low GPU memory consumption.
