Knowledge Vault 5/7 - CVPR 2015
Category-Specific Object Reconstruction from a Single Image
Abhishek Kar, Shubham Tulsiani, João Carreira, Jitendra Malik

Concept Graph & Resume using Claude 3 Opus | Chat GPT4o | Llama 3:

graph LR
classDef pipeline fill:#f9d4d4, font-weight:bold, font-size:14px
classDef model fill:#d4f9d4, font-weight:bold, font-size:14px
classDef reconstruction fill:#d4d4f9, font-weight:bold, font-size:14px
classDef results fill:#f9f9d4, font-weight:bold, font-size:14px
classDef misc fill:#f9d4f9, font-weight:bold, font-size:14px
A[Category-Specific Object Reconstruction from a Single Image] --> B[3D object reconstruction from single image 1]
A --> C[Offline deformable model, online reconstruction 2]
C --> D[Models capture intra-class shape variation 3]
C --> E[Deformable models from 2D images 4]
A --> F[Pipeline: annotation, pose estimation, modeling 5]
F --> G[Non-rigid structure-from-motion estimates poses, keypoints 6]
F --> H[Models built by deforming mesh 7]
F --> I[Energy minimization: data, priors, manifold 8]
I --> J[Block coordinate descent solves shape 9]
A --> K[Good core shape, thin structures 10]
K --> L[Data-driven deformation modes reflect variation 11]
A --> M[Reconstruction: category knowledge, pose, shape 12]
M --> N[Pipeline: detection, segmentation, pose, fitting 13]
N --> O[Simultaneous detection and segmentation system 14]
N --> P[Viewpoint prediction system predicts angles 15]
N --> Q[Shape estimation combines outputs, models 16]
N --> R[Optimization similar to learning, refinement 17]
A --> S[High-frequency details added using cues 18]
S --> T[Shape-from-shading, intrinsic images leverage cues 19]
S --> U[Category-specific prior injects high-frequency details 20]
A --> V[Automatic reconstructions combine models, cues 21]
A --> W[Various categories reconstructed with structures 22]
A --> X[Evaluation measures mesh, depth errors 23]
X --> Y[Lower error than CAD approach 24]
A --> Z[Robustness to noisy recognition inputs 25]
A --> AA[Related work: viewpoint, correspondence, single-image 26]
A --> AB[Code released for research use 27]
A --> AC[Video demonstrates automatic reconstructions 28]
A --> AD[Possible extension: incorporate 3D meshes 29]
A --> AE[Goal: learn from images, not CAD 30]
class B,M,N,Q,V,W reconstruction
class C,D,E,H,J,L model
class F,G,I,O,P,R pipeline
class K,S,T,U,X,Y,Z results
class AA,AB,AC,AD,AE misc

Resume:

1.- Goal: 3D reconstruction of objects from a single image

2.- Two-stage approach: offline deformable 3D shape model construction, online 3D reconstruction

3.- Deformable shape models capture intra-class shape variation with mean shape and principal components

4.- Deformable models built from 2D images for general object categories

5.- Pipeline: annotated image collection, camera pose estimation, deformable 3D model building

6.- Non-rigid structure-from-motion (NRSFM) estimates camera poses and 3D keypoints

7.- Deformable 3D models built by iteratively deforming mesh to explain silhouettes

8.- Energy minimization framework with data terms, shape priors, and linear manifold constraint

9.- Block coordinate descent minimizes objective to solve for mean shape and deformation basis

10.- Results: models learned from PASCAL VOC give a good estimate of core object shape and capture thin structures

11.- Deformation modes learned from data reflect variations in object categories

12.- Online reconstruction motivated by human perception: category knowledge, pose, prior shape notions

13.- Reconstruction pipeline: object detection, segmentation, pose estimation, shape model fitting, bottom-up cue integration

14.- Simultaneous detection and segmentation system (Hariharan et al.) for object detections and segmentations

15.- Viewpoint prediction system (Tulsiani and Malik) predicts three Euler angles for each detection

16.- Shape estimation by combining recognition outputs with learned shape models

17.- Fitting solves an optimization problem similar to the one used for learning, without the keypoint term and with added camera refinement

18.- High-frequency details added using low-level cues like edges and shading

19.- Shape-from-shading and intrinsic image algorithms (SIRFS by Barron and Malik) leverage bottom-up cues

20.- SIRFS modified to incorporate category-specific shape prior for injecting high-frequency details

21.- Fully automatic reconstructions obtained by combining learned models with bottom-up cues

22.- Results show reconstruction of various object categories with thin structures

23.- Empirical evaluation on the PASCAL3D+ dataset measuring errors for deformed meshes and depth maps

24.- Comparison with CAD-based approach (Kar et al.) shows lower error using learned models and SIRFS

25.- Robustness to noisy recognition inputs demonstrated by graceful performance degradation

26.- Related work: viewpoint prediction system, correspondence-based reconstruction, 3D from single image workshop

27.- Code released for research use

28.- Video demonstrates fully automatic reconstructions using the proposed method

29.- Possible extension: incorporating a moderate number of 3D meshes into the learning framework

30.- Goal: move away from requiring CAD models by learning from images
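
Point 3 describes a category's shape as a mean shape plus a combination of principal deformation components. The sketch below illustrates that linear model only. It assumes NumPy and a toy setting in which aligned 3D shapes are already available, which is not the paper's setting (there the model is learned from 2D annotations). Plain PCA stands in for the learning procedure, and the function names `fit_linear_shape_model` and `synthesize_shape` are hypothetical.

```python
import numpy as np

def fit_linear_shape_model(shapes, k):
    """Toy PCA fit: shapes is (M, N, 3); returns mean (N, 3), basis (k, N, 3), coeffs (M, k).

    Assumption: aligned 3D shapes are given. This is for illustration only.
    """
    shapes = np.asarray(shapes, dtype=float)
    m, n, _ = shapes.shape
    flat = shapes.reshape(m, n * 3)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    coeffs = centered @ basis.T
    return mean.reshape(n, 3), basis.reshape(k, n, 3), coeffs

def synthesize_shape(mean_shape, basis, coeffs):
    """Return mean_shape + sum_k coeffs[k] * basis[k]."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.asarray(mean_shape, dtype=float) + np.tensordot(coeffs, basis, axes=1)

# Toy data: 4 random "shapes" with 5 vertices each (purely synthetic).
rng = np.random.default_rng(0)
toy_shapes = rng.standard_normal((4, 5, 3))
mean_shape, basis, coeffs = fit_linear_shape_model(toy_shapes, k=3)
rebuilt = synthesize_shape(mean_shape, basis, coeffs[0])
```

With four training shapes the centered data has rank at most three, so three components reproduce each toy shape exactly. Setting all coefficients to zero returns the mean shape.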
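
Point 9 mentions block coordinate descent over the mean shape and deformation basis. The sketch below shows the general technique on a much simpler stand-in objective, the squared error `||X - mean - A @ B||^2`, and not the paper's energy (which has silhouette, keypoint and prior terms). The function name `block_coordinate_descent` and the three-block split are assumptions for illustration. Each block is solved exactly by least squares while the others are held fixed.

```python
import numpy as np

def block_coordinate_descent(X, k, iters=20, seed=0):
    """Minimize ||X - mean - A @ B||_F^2 by cycling over three blocks.

    X: (M, D) data, one flattened shape per row (toy stand-in).
    Returns mean (D,), coefficients A (M, k), basis B (k, D), per-iteration losses.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    B = rng.standard_normal((k, X.shape[1]))
    A = np.zeros((X.shape[0], k))
    losses = []
    for _ in range(iters):
        R = X - mean
        # Block 1: coefficients, with mean and basis fixed (solves B.T @ A.T = R.T).
        A = np.linalg.lstsq(B.T, R.T, rcond=None)[0].T
        # Block 2: basis, with mean and coefficients fixed (solves A @ B = R).
        B = np.linalg.lstsq(A, R, rcond=None)[0]
        # Block 3: mean, with coefficients and basis fixed.
        mean = (X - A @ B).mean(axis=0)
        losses.append(float(np.sum((X - mean - A @ B) ** 2)))
    return mean, A, B, losses

# Synthetic data that exactly follows a mean-plus-rank-2 model.
rng = np.random.default_rng(1)
true_mean = rng.standard_normal(12)
X_toy = true_mean + rng.standard_normal((8, 2)) @ rng.standard_normal((2, 12))
mean_est, A_est, B_est, losses = block_coordinate_descent(X_toy, k=2)
```

Because every block update is an exact minimization over that block, the loss never increases from one iteration to the next, which is the property that makes this scheme attractive for objectives that are easy in each block but hard jointly.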
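
Point 15 says the viewpoint system predicts three Euler angles per detection. The sketch below shows one way such angles can be turned into a rotation matrix and used to project a 3D shape into the image. The angle names (`azimuth`, `elevation`, `in_plane`), the axis order, and the scaled orthographic camera are assumptions made for this example, not conventions confirmed by the summary.

```python
import numpy as np

def rotation_from_euler(azimuth, elevation, in_plane):
    """Compose a rotation from three Euler angles in radians.

    Assumed convention: azimuth about y, then elevation about x,
    then in-plane rotation about z (the viewing axis).
    """
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ce, se = np.cos(elevation), np.sin(elevation)
    ct, st = np.cos(in_plane), np.sin(in_plane)
    r_az = np.array([[ca, 0.0, sa], [0.0, 1.0, 0.0], [-sa, 0.0, ca]])
    r_el = np.array([[1.0, 0.0, 0.0], [0.0, ce, -se], [0.0, se, ce]])
    r_ip = np.array([[ct, -st, 0.0], [st, ct, 0.0], [0.0, 0.0, 1.0]])
    return r_ip @ r_el @ r_az

def project_orthographic(points, rotation, scale=1.0, translation=(0.0, 0.0)):
    """Rotate (N, 3) points, drop depth, then apply 2D scale and translation."""
    rotated = np.asarray(points, dtype=float) @ rotation.T
    return scale * rotated[:, :2] + np.asarray(translation, dtype=float)

R_view = rotation_from_euler(0.3, -0.2, 0.1)
projected = project_orthographic([[1.0, 2.0, 3.0]], np.eye(3))
```

A projection of this kind is what lets a predicted viewpoint be combined with a 3D shape model: the projected mesh can be compared against the predicted segmentation mask.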

Knowledge Vault built by David Vivancos 2024