Ablations

Synthetic vs. Real Training Data

We compare variants of our prior model trained on a synthetic dataset, a real dataset, and a mixed dataset containing real and synthetic images in a 50/50 split. All models are trained with the same total number of multiview frames (N = 19,500).
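The equal-budget setup above can be sketched as follows. This is a minimal illustration, not the authors' data pipeline: the function name `build_training_set`, the frame-level sampling, and the fixed seed are all assumptions made here. Only the total of 19,500 frames and the 50/50 split come from the text.

```python
import random


def build_training_set(synthetic, real, total=19_500, synthetic_fraction=0.5, seed=0):
    """Sample a fixed-size training set from two frame pools.

    Hypothetical helper: `synthetic` and `real` are lists of frame
    identifiers, and sampling is done per frame without replacement.
    """
    rng = random.Random(seed)
    n_syn = round(total * synthetic_fraction)
    n_real = total - n_syn  # guarantees the two parts sum to `total`
    if n_syn > len(synthetic) or n_real > len(real):
        raise ValueError("a pool is too small for the requested budget")
    chosen = rng.sample(synthetic, n_syn) + rng.sample(real, n_real)
    rng.shuffle(chosen)  # interleave the two sources
    return chosen


# Placeholder pools standing in for actual multiview frames.
syn_pool = [("syn", i) for i in range(20_000)]
real_pool = [("real", i) for i in range(20_000)]

synthetic_only = build_training_set(syn_pool, real_pool, synthetic_fraction=1.0)
real_only = build_training_set(syn_pool, real_pool, synthetic_fraction=0.0)
mixed = build_training_set(syn_pool, real_pool, synthetic_fraction=0.5)
```

Holding `total` fixed across the three variants means that differences between them reflect the data source rather than the amount of training data.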

The first row shows examples of the synthetic images used to train the prior model. The second row shows the initialization after warm-up, before finetuning (left), and the finetuned result (right). All finetuned results are generated from three inputs.

[Figure: three panel groups, Synthetic, Real, and Mixed (50/50). Each group shows Pre-training Samples, Initialization, and Result.]