We compare variants of our prior model trained on a synthetic dataset, a real dataset, and a mixed dataset of real and synthetic images (50/50 split). All models are trained with the same total number of multiview frames (N=19,500).
The first row shows examples of the synthetic images used to train the prior model. The second row shows the initialization after warm-up, before finetuning (left), and the finetuned result (right). All finetuned results are generated from three inputs.