The End Of Knowledge - Vault 5/53 - CVPR - 2020 - Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef singleView fill:#d4f9d4, font-weight:bold, font-size:14px
classDef training fill:#d4d4f9, font-weight:bold, font-size:14px
classDef symmetry fill:#f9f9d4, font-weight:bold, font-size:14px
classDef model fill:#f9d4f9, font-weight:bold, font-size:14px
A[Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild] --> B[Unsupervised 3D from single-view images 1]
A --> C[Training: single-view images, object category 2]
A --> D[Predicts instance 3D from single image 3]
A --> E[Many objects exhibit bilateral symmetry 4]
A --> F[Pipeline disentangles shape, pose, texture 5]
F --> G[Trained with reconstruction loss, renderer 6]
E --> H[Symmetry avoids degenerate solutions 7]
H --> I[Flipped predictions minimize reconstructions simultaneously 8]
E --> J[Lighting, albedo separated, symmetry enforced 9]
E --> K[Asymmetries modeled with confidence maps 10]
B --> L[Strong priors learned for faces 11]
B --> M[Applied to video without fine-tuning 12]
J --> N[Easy relighting with albedo, lighting 13]
C --> O[Trained on cats, supervision-free 14]
E --> P[Symmetry plane rendered on inputs 15]
K --> Q[Confidence visualizes modeled asymmetries 16]
E --> R[Ablations: symmetry on albedo, depth 17]
F --> S[Shading helps avoid bumpy shapes 18]
K --> T[Confidence effectively models asymmetric perturbations 19]
A --> U[Unsupervised deformable 3D with cues 20]
F --> V[Unsupervised intrinsic image decomposition 21]
B --> W[Web demo: faces, cats 22]
A --> X[Code available for research, reproducibility 23]
A --> Y[Q&A sessions at CVPR Live 24]
A --> Z[Summary, web demo, code invitation 25]
class A main
class B,C,D,L,M,O,W singleView
class F,G,V,S training
class E,H,I,J,K,P,Q,R,T,U symmetry
class N,X,Y,Z model

Summary:

1.- The method learns deformable 3D objects from single-view images without manual annotations or any other supervision.

2.- Training only requires a set of single-view images of a certain object category.

3.- After training, the model predicts instance-specific 3D shapes from a single input image.

4.- Many objects in the world, including animals and man-made objects, exhibit bilateral symmetry.

5.- Photogeometric autoencoding pipeline disentangles 3D shape (depth map), pose, and texture from an input image.

6.- The pipeline is trained with reconstruction loss using a differentiable renderer.

7.- Symmetry is used to avoid degenerate solutions: the model is forced to predict a canonical view in which the object is symmetric.

8.- The canonical predictions are flipped to obtain a second reconstruction, and the reconstruction losses of both versions are minimized simultaneously.

9.- Lighting and intrinsic albedo are separated, enforcing symmetry only on the albedo to handle asymmetric lighting.

10.- Asymmetries in albedo or deformed shapes are accounted for using uncertainty modeling with confidence maps.

11.- The model learns strong priors on human faces and generalizes well to abstract faces, including drawings and emojis.

12.- The trained model can be applied to video frames without fine-tuning.

13.- Objects can be easily relit with different lighting conditions due to the intrinsic albedo and lighting decomposition.

14.- The model was also trained on cat faces, which wouldn't be possible with methods requiring additional supervision.

15.- Symmetric canonical views allow for easy rendering of the symmetry plane on input images.

16.- Asymmetries modeled by the confidence model can be visualized.

17.- Ablation studies demonstrate the importance of symmetry constraints on both albedo and depth.

18.- Predicting shading under a directional light exploits shading cues and helps avoid bumpy shapes.

19.- Confidence maps effectively model asymmetries, as demonstrated by experiments with asymmetric perturbations on images.

20.- The unsupervised method learns deformable 3D objects from single-view images using symmetry and shading as geometric cues.

21.- Intrinsic image decomposition is achieved without supervision.

22.- A web demo is available for users to try the model with their own faces or cats.

23.- The code is available online for reproducibility and further research.

24.- CVPR Live Q&A sessions are scheduled to address questions and discuss the work.

25.- The presentation concludes with a summary of the key contributions and an invitation to explore the web demo and code.
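
Points 7 to 10 can be illustrated with a small NumPy sketch. It assumes a Laplacian-style negative log-likelihood in which a per-pixel uncertainty map (sigma) scales the L1 error; the exact loss, the flip weight of 0.5, and the helper toy_render are illustrative assumptions, not details taken from this summary. The toy renderer is a hypothetical stand-in for the differentiable renderer of point 6.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def confidence_weighted_loss(recon, target, sigma):
    """Mean Laplacian negative log-likelihood; sigma is a per-pixel uncertainty map."""
    l1 = np.abs(recon - target)
    return float(np.mean(SQRT2 * l1 / sigma + np.log(SQRT2 * sigma)))

def toy_render(depth, albedo):
    """Hypothetical stand-in for a differentiable renderer (frontal view, no warping)."""
    return albedo * (1.0 - 0.1 * depth)

def symmetric_loss(depth, albedo, target, sigma, sigma_flip, flip_weight=0.5):
    """Reconstruct once from the canonical maps and once from their mirror images."""
    recon = toy_render(depth, albedo)
    recon_flip = toy_render(depth[:, ::-1], albedo[:, ::-1])  # horizontal flip
    return (confidence_weighted_loss(recon, target, sigma)
            + flip_weight * confidence_weighted_loss(recon_flip, target, sigma_flip))

# Toy input: a flat object whose albedo has a mark on the left side only.
depth = np.zeros((8, 8))
albedo = np.full((8, 8), 0.5)
albedo[2:4, 0:2] += 0.4
target = toy_render(depth, albedo)

sigma = np.full((8, 8), 0.1)

# Uniform confidence: the flipped reconstruction is penalized heavily at the mark.
sigma_flip_uniform = np.full((8, 8), 0.1)

# Asymmetry-aware confidence: higher uncertainty where the flip cannot match.
sigma_flip_aware = sigma_flip_uniform.copy()
sigma_flip_aware[2:4, 0:2] = 0.6
sigma_flip_aware[2:4, 6:8] = 0.6

loss_uniform = symmetric_loss(depth, albedo, target, sigma, sigma_flip_uniform)
loss_aware = symmetric_loss(depth, albedo, target, sigma, sigma_flip_aware)
print(loss_uniform, loss_aware)
```

In this toy setup the asymmetry-aware confidence map gives the lower loss, which is the mechanism by which a learned confidence map can absorb asymmetries instead of forcing the shape or albedo to explain them.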

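Points 9, 13 and 18 rely on separating albedo from lighting. The sketch below assumes a simple Lambertian model with one ambient and one diffuse term and treats the depth map as a height field facing the viewer; the function names, the coefficient values, and the sign convention are assumptions made for illustration, not details from this summary.

```python
import numpy as np

def normals_from_depth(depth):
    """Unit surface normals from finite differences of a height-field depth map."""
    dz_dy, dz_dx = np.gradient(depth)  # gradients along rows (y) and columns (x)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def shade(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Lambertian image: albedo * (ambient + diffuse * max(0, n . l))."""
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    n_dot_l = np.clip(normals_from_depth(depth) @ l, 0.0, None)
    return albedo * (ambient + diffuse * n_dot_l)

# Toy object: a symmetric bump with constant albedo.
y, x = np.mgrid[-1:1:32j, -1:1:32j]
depth = 5.0 * np.exp(-(x**2 + y**2) / 0.3)
albedo = np.full((32, 32), 0.8)

# Relighting only changes the light direction; albedo and depth stay fixed.
lit_left = shade(albedo, depth, light_dir=(-1.0, 0.0, 1.0))
lit_right = shade(albedo, depth, light_dir=(1.0, 0.0, 1.0))
```

Because the shaded image depends on the surface normals, a bumpy depth map would produce visibly wrong shading, which is one way to read the claim that shading acts as a geometric cue.
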
Knowledge Vault built by David Vivancos 2024