Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
Yinyu Nie; Xiaoguang Han; Shihui Guo; Yujian Zheng; Jian Chang; Jian Jun Zhang
1.- Total 3D Understanding: Joint reconstruction of room layout, object poses, and meshes from a single indoor scene image.

2.- Three modules: Room layout estimation, object bounding box prediction, and 3D mesh generation for each object.

3.- Joint training: Embedding outputs from the three modules to produce a semantically enriched 3D scene.

4.- 2D object detector: Generates 2D bounding boxes from the source image.

5.- Geometric and appearance features: Extracted from 2D detections using ResNet.

6.- Attention module: Computes a relational feature between the target object and its surrounding objects.

7.- 3D object detector: Regresses bounding box parameters (size, orientation, location) in the camera system.

8.- Layout estimation: Uses a structure similar to the 3D detector to regress the layout bounding box parameters.

9.- Mesh generation: Predicts and modifies mesh topology to approximate 3D shape for each object.

10.- AtlasNet: Used to regress 3D shape from a template sphere by concatenating appearance feature and detector category code.

11.- Edge classifier: Removes redundant edges and their attached faces so that the shape topology matches the ground truth.

12.- Inference: Transforms object meshes from the canonical system to the camera system, then places them with the room layout in the world system.

13.- Results: Tested on single and multiple object images, compared with state-of-the-art methods.

14.- Single-object results: Smoother surfaces and better topology quality.

15.- Multi-object results: Visually appealing object meshes with reasonable placement.

16.- Quantitative evaluation: Conducted on layout estimation, 3D object detection, object pose estimation, and mesh reconstruction.

17.- Joint training strategy: Consistently improves the method on every metric and advances the state of the art.

18.- Ablation study: Explores the effects of each design in the network.

19.- Relational feature and joint training: Contribute to scores in layout estimation, 3D detection, and mesh generation.

20.- Novel topology modifier: Provided for mesh generation.

21.- End-to-end 3D scene understanding and mesh reconstruction solution.

22.- Complementary role of each component: Demonstrated through the joint training strategy.

23.- State-of-the-art performance reached on each task.
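
Points 7 and 12 can be illustrated with a minimal sketch of the inference-time transform. It assumes a canonical mesh normalized to a unit box, a single yaw angle about the vertical (y) axis for orientation, and a world system that differs from the camera system only by a rotation. The function names and axis conventions are hypothetical and are not taken from the authors' code.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def yaw_matrix(theta):
    """Rotation about the y (up) axis; this axis convention is an assumption."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def canonical_to_camera(vertices, size, yaw, centroid):
    """Scale canonical vertices by the box size, rotate by yaw, move to the centroid."""
    rot = yaw_matrix(yaw)
    out = []
    for v in vertices:
        scaled = [v[k] * size[k] for k in range(3)]
        rotated = mat_vec(rot, scaled)
        out.append([rotated[k] + centroid[k] for k in range(3)])
    return out

def camera_to_world(vertices, cam_rotation):
    """Rotate camera-system vertices into the world system (rotation only, by assumption)."""
    return [mat_vec(cam_rotation, v) for v in vertices]

# Example: one vertex of a 2 x 1 x 1 box, turned 90 degrees, 3 units from the camera.
cam_pts = canonical_to_camera([[0.5, 0.0, 0.0]], [2.0, 1.0, 1.0], math.pi / 2, [0.0, 0.0, 3.0])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world_pts = camera_to_world(cam_pts, identity)
```

The same box parameters (size, orientation, location) that the 3D detector regresses are the only inputs needed to place each reconstructed mesh in the scene.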

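The relational feature in point 6 can be approximated, for illustration only, with plain scaled dot-product attention over per-object feature vectors. This is a simplified stand-in: it omits the learned projections and any geometric weighting that the actual attention module may use, and the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relational_features(features):
    """For each object, return an attention-weighted sum of all object features."""
    dim = len(features[0])
    out = []
    for query in features:
        # Similarity of the target object to every object in the image.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        out.append([sum(w * f[d] for w, f in zip(weights, features))
                    for d in range(dim)])
    return out

# Example: two objects with orthogonal features.
rel = relational_features([[1.0, 0.0], [0.0, 1.0]])
```

Each output row mixes information from all detected objects, which is what lets the box regression for one object depend on its surroundings.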

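The topology modification in point 11 can be sketched as a face-pruning step. The sketch assumes that a classifier has already produced a keep-probability for each undirected edge and that a face is dropped when any of its edges falls below a threshold. The data layout, the threshold value, and the name `prune_faces` are hypothetical.

```python
def prune_faces(faces, edge_keep_prob, threshold=0.5):
    """Keep only triangles whose three edges are all classified as 'keep'.

    faces: list of (a, b, c) vertex-index triples.
    edge_keep_prob: dict mapping a sorted vertex pair (i, j) to a probability;
                    edges missing from the dict are kept by default.
    """
    kept = []
    for a, b, c in faces:
        edges = [tuple(sorted(e)) for e in ((a, b), (b, c), (c, a))]
        if all(edge_keep_prob.get(e, 1.0) >= threshold for e in edges):
            kept.append((a, b, c))
    return kept

# Example: the edge (2, 3) is classified as redundant, so the second face is removed.
faces = [(0, 1, 2), (1, 2, 3)]
pruned = prune_faces(faces, {(2, 3): 0.1})
```

Removing faces this way lets a mesh deformed from a template sphere open up holes or split into parts, which a fixed-topology deformation cannot do.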