Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Face Captures

Limitations

Our synthetic prior is trained with glasses, but only 20% of the synthetic identities wear them. In some cases, the fine-tuned results incorrectly paint the frame of the glasses onto the skin (see the example in the first row). This happens when the frame is visible in only a single input view, seen from a side angle. Furthermore, our method tends to produce poor results when the shoulders are not observed in the input (bottom left), and it fails for heavily out-of-distribution faces (bottom right). Note how the prior tries to reconstruct Gollum by adding glasses. Please see the paper for a more detailed discussion of limitations.

Inconsistent Frame for Glasses

Future Work: Animation

While our focus is on novel view synthesis of static faces, future work could leverage similar prior models for facial animation. To demonstrate this, we fine-tune our model to 12 smartphone images and interpolate between random expressions from the synthetic pre-training set. Fine-tuning to multiple expressions helps preserve the expression space of the prior model and enables rendering expressions that do not appear in the input images.
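The interpolation step can be illustrated with a minimal sketch. It rests on assumptions the text does not state: each expression from the synthetic pre-training set is taken to be a fixed-length latent code, and novel expressions are produced by piecewise-linear interpolation between randomly sampled codes. The names `lerp`, `sample_keyframes`, and `interpolation_path`, the toy pool, and the code dimension are all hypothetical. Cafca's actual expression conditioning and interpolation scheme may differ.

```python
import random
from typing import List, Sequence

# Hypothetical representation: one expression = one fixed-length latent code.
Code = List[float]

def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Code:
    """Linearly interpolate two equal-length codes; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("expression codes must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def sample_keyframes(pool: Sequence[Sequence[float]], k: int, seed: int = 0) -> List[Code]:
    """Draw k distinct expression codes at random from a (hypothetical) pre-training pool."""
    rng = random.Random(seed)
    return [list(code) for code in rng.sample(list(pool), k)]

def interpolation_path(keyframes: Sequence[Sequence[float]], steps_per_segment: int) -> List[Code]:
    """Piecewise-linear path through the keyframes, ending exactly on the last one."""
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be at least 1")
    if len(keyframes) < 2:
        return [list(k) for k in keyframes]
    path: List[Code] = []
    for a, b in zip(keyframes, keyframes[1:]):
        # Sample t in [0, 1); each segment's end is emitted as the next segment's start.
        for i in range(steps_per_segment):
            path.append(lerp(a, b, i / steps_per_segment))
    path.append(list(keyframes[-1]))
    return path

if __name__ == "__main__":
    # Toy pool of 8 random 4-D codes standing in for the synthetic pre-training expressions.
    gen = random.Random(42)
    pool = [[gen.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(8)]
    keys = sample_keyframes(pool, k=3, seed=1)
    frames = interpolation_path(keys, steps_per_segment=10)
    print(len(frames))  # 2 segments * 10 steps + final keyframe = 21
```

Under these assumptions, each code along the path would be paired with a camera pose and passed to the fine-tuned model to render one frame. Linear interpolation is used here only for simplicity; other schemes (e.g. spherical interpolation) would fit the same loop.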
