The End Of Knowledge - Vault 5/60 - CVPR - 2020 - Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

graph LR
classDef total fill:#f9d4d4, font-weight:bold, font-size:14px
classDef modules fill:#d4f9d4, font-weight:bold, font-size:14px
classDef training fill:#d4d4f9, font-weight:bold, font-size:14px
classDef detection fill:#f9f9d4, font-weight:bold, font-size:14px
classDef features fill:#f9d4f9, font-weight:bold, font-size:14px
classDef results fill:#d4f9f9, font-weight:bold, font-size:14px
A[Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image] --> B[3D scene understanding from single indoor image 1]
A --> C[Room layout, object poses, meshes joint reconstruction 2]
C --> D[Room layout estimation module 8]
C --> E[Object bounding box prediction module 7]
C --> F[3D mesh generation module for each object 9]
A --> G[Joint training of modules for enriched 3D scene 3]
A --> H[2D object detector generates bounding boxes 4]
A --> I[ResNet extracts geometric, appearance features from 2D detections 5]
A --> J[Attention module relates target object to surroundings 6]
F --> K[AtlasNet regresses 3D shape from template sphere 10]
F --> L[Edge classifier removes redundant faces for topology 11]
A --> M[Inference transforms meshes to camera then layout system 12]
A --> N[Tested on single, multiple object images 13]
N --> O[Smooth surfaces, better topology for single objects 14]
N --> P[Appealing meshes, reasonable placement for multiple objects 15]
A --> Q[Quantitative evaluation on layout, detection, pose, mesh 16]
G --> R[Joint training improves metrics, enriches state-of-the-art 17]
A --> S[Ablation study explores each network design 18]
S --> T[Relational feature, joint training benefit layout, detection, mesh 19]
F --> U[Novel topology modifier for mesh generation 20]
A --> V[End-to-end 3D scene understanding, mesh reconstruction solution 21]
G --> W[Joint training shows complementary components 22]
A --> X[State-of-the-art performance reached on each task 23]
class A,B,C total
class D,E,F,G modules
class H,I,J,K,L detection
class M,N,O,P,Q results
class R,S,T,U,V,W,X training

Summary:

1.- Total 3D Understanding: Joint reconstruction of room layout, object poses, and meshes from a single indoor scene image.

2.- Three modules: Room layout estimation, object bounding box prediction, and 3D mesh generation for each object.

3.- Joint training: The outputs of the three modules are embedded together and trained jointly to produce a semantically enriched 3D scene.

4.- 2D object detector: Generates 2D bounding boxes from the source image.

5.- Geometric and appearance features: Extracted from 2D detections using ResNet.

6.- Attention module: Computes a relational feature between the target object and the other objects around it.

7.- 3D object detector: Regresses bounding box parameters (size, orientation, location) in the camera system.

8.- Layout estimation: Similar structure to the 3D detector, generates layout bounding box parameters.

9.- Mesh generation: Predicts and modifies mesh topology to approximate 3D shape for each object.

10.- AtlasNet: Used to regress 3D shape from a template sphere by concatenating appearance feature and detector category code.

11.- Edge classifier: Removes redundant faces so that the shape topology approximates the ground truth.

12.- Inference: Transforms object meshes from the canonical system to the camera system, then places them with the room layout in the world system.

13.- Results: Tested on single and multiple object images, compared with state-of-the-art methods.

14.- Smooth surfaces and better topology quality achieved for single object cases.

15.- Visually-appealing object meshes with reasonable placement maintained for multiple object images.

16.- Quantitative evaluation: Conducted on layout estimation, 3D object detection, object pose estimation, and mesh reconstruction.

17.- Joint training strategy: Consistently improves the method on each metric and enriches the state-of-the-art.

18.- Ablation study: Explores the effects of each design in the network.

19.- Relational feature and joint training: Both improve the scores for layout estimation, 3D detection, and mesh generation.

20.- Novel topology modifier: Provided for mesh generation.

21.- End-to-end 3D scene understanding and mesh reconstruction solution.

22.- Complementary role of each component demonstrated through joint training strategy.

23.- State-of-the-art performance reached on each task.
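
Items 7 and 12 above describe a chain of coordinate changes: a mesh in a canonical frame is placed using the predicted box parameters (size, orientation, location), then moved into the frame of the room layout. The sketch below illustrates that chain with NumPy. It is not the authors' code: the unit-cube canonical frame, the y-up axis convention, the yaw-only object rotation, and the function names are all assumptions made for illustration.

```python
import numpy as np

def yaw_rotation(theta):
    """Rotation by theta radians about the y axis (assumed to be vertical)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def canonical_to_camera(vertices, size, theta, centroid):
    """Scale, rotate, then translate canonical (N, 3) vertices using box parameters."""
    scaled = vertices * np.asarray(size)        # per-axis box size
    rotated = scaled @ yaw_rotation(theta).T    # box orientation
    return rotated + np.asarray(centroid)       # box location

def camera_to_world(points, cam_to_world_rotation):
    """Apply a 3x3 camera-to-world rotation (hypothetical extrinsic) to (N, 3) points."""
    return points @ np.asarray(cam_to_world_rotation).T

# Stand-in for a canonical mesh: the 8 corners of a unit cube centred at the origin.
unit_cube = np.array([[x, y, z]
                      for x in (-0.5, 0.5)
                      for y in (-0.5, 0.5)
                      for z in (-0.5, 0.5)])

box_cam = canonical_to_camera(unit_cube, size=(2.0, 1.0, 0.5),
                              theta=np.pi / 2, centroid=(0.0, 0.0, 3.0))
box_world = camera_to_world(box_cam, np.eye(3))  # identity: camera aligned with world
```

With a 90 degree yaw the box's long side ends up along the z axis, which is a quick way to check that the rotation is applied after scaling and before translation.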

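Item 11 above mentions an edge classifier that removes redundant faces from the deformed template. A minimal sketch of the pruning step follows, assuming the classifier's output is already available as a keep-probability per undirected edge. The dictionary format, the 0.5 threshold, and the rule that a face is dropped when any of its edges is flagged are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def face_edges(face):
    """Return the three undirected edges of a triangle as sorted vertex-index pairs."""
    a, b, c = (int(i) for i in face)
    return [tuple(sorted(edge)) for edge in ((a, b), (b, c), (c, a))]

def prune_faces(faces, edge_keep_prob, threshold=0.5):
    """Keep only faces whose edges all score at or above the threshold.

    edge_keep_prob is a hypothetical classifier output mapping
    (i, j) vertex pairs with i < j to a probability of keeping that edge.
    Edges missing from the mapping are kept by default.
    """
    kept = [face for face in faces
            if all(edge_keep_prob.get(edge, 1.0) >= threshold
                   for edge in face_edges(face))]
    return np.array(kept, dtype=int).reshape(-1, 3)

# Two triangles sharing the edge (0, 2); the edge (2, 3) is flagged as redundant.
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])
edge_keep_prob = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7,
                  (2, 3): 0.1, (0, 3): 0.6}

pruned = prune_faces(faces, edge_keep_prob)
```

Here only the first triangle survives, because the second one contains the low-scoring edge (2, 3). The vertex array is left untouched, so pruning changes the topology without moving any points.
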
Knowledge Vault built by David Vivancos 2024