Knowledge Vault 5/30 - CVPR 2017
OctNet: Learning Deep 3D Representations at High Resolutions
Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger

Concept Graph & Summary using Claude 3 Opus | ChatGPT-4o | Llama 3:

graph LR
classDef octnet fill:#f9d4d4, font-weight:bold, font-size:14px
classDef learning fill:#d4f9d4, font-weight:bold, font-size:14px
classDef resolution fill:#d4d4f9, font-weight:bold, font-size:14px
classDef future fill:#f9f9d4, font-weight:bold, font-size:14px
A[OctNet: Learning Deep 3D Representations at High Resolutions] --> B[3D deep learning gaining popularity 1]
A --> C[3D learning memory requirements increase cubically 2]
A --> D[3D data typically sparse 3]
A --> E[Prior work leveraged 3D sparsity 4]
A --> F[OctNet: octree partitioning near surfaces 5]
F --> G[Efficient shallow octrees encode volume 6]
F --> H[OctNet operations enable end-to-end learning 7]
F --> I[OctNet: higher resolution, faster performance 8]
I --> J[OctNet maintains accuracy, resolution diminishing returns 9]
I --> K[Higher resolutions benefit certain tasks 10]
A --> L[Future: learning octrees for unknown partitioning 11]
class B,C,D,E learning
class F,G,H,I,J,K octnet
class L future


1.- Deep learning for 3D data is becoming popular, with applications in shape classification, semantic scene completion, 3D reconstruction, etc.

2.- Memory requirements for 3D deep learning increase cubically with input resolution, limiting dense networks to roughly 64^3 voxels on a single GPU.
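The cubic growth is easy to see with a back-of-the-envelope calculation. A minimal sketch, assuming 8 feature channels and float32 activations (the real cost depends on the network's channel counts and layer count):

```python
# Back-of-the-envelope sketch of cubic memory growth in a dense 3D grid.
# Assumed values: 8 feature channels, 4 bytes per float32 value.
def dense_grid_mib(resolution, channels=8, bytes_per_value=4):
    """Memory in MiB for one dense resolution^3 activation grid."""
    return resolution ** 3 * channels * bytes_per_value / 2 ** 20

for r in (32, 64, 128, 256):
    print(f"{r}^3 grid: {dense_grid_mib(r):7.1f} MiB")
# Doubling the resolution multiplies memory by 2^3 = 8.
```

Under these assumptions a single 256^3 activation grid already takes 512 MiB, and a real network stores one such grid per layer, which is why dense approaches stall around 64^3.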

3.- 3D data is usually sparse: point clouds cover large areas at low density, and voxelized meshes have decreasing occupancy at higher resolutions.

4.- Previous work exploited 3D data sparsity: field probing networks, PointNet (which lacks local neighborhood structure), and sparse convolutions (whose memory footprint grows after each convolution as activations dilate).

5.- OctNet focuses memory and computation near surfaces using a space partitioning function - an octree with smaller cells near surfaces.
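The partitioning idea can be illustrated by recursively subdividing only cells that contain surface points, so small cells concentrate near the surface while empty space stays coarse. A minimal sketch with hypothetical names (`build_octree` is not the paper's code; cells are axis-aligned cubes given by origin and size):

```python
# Hypothetical sketch of surface-adaptive octree construction: subdivide
# only cells that contain surface points, so resolution concentrates
# near the surface (the idea behind OctNet's space partitioning).
def build_octree(points, origin, size, max_depth):
    """Return a nested dict octree; leaves record whether they are occupied."""
    inside = [p for p in points
              if all(origin[d] <= p[d] < origin[d] + size for d in range(3))]
    if not inside or max_depth == 0:
        return {"occupied": bool(inside), "size": size}
    half = size / 2
    children = []
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                child_origin = (origin[0] + dx, origin[1] + dy, origin[2] + dz)
                children.append(build_octree(points, child_origin, half,
                                             max_depth - 1))
    return {"children": children, "size": size}
```

For a point cloud sampled from a surface, octants touching the surface recurse down to the finest cells, while each empty octant collapses into a single large leaf.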

6.- Shallow octrees of fixed depth, placed in a regular grid, efficiently cover the volume; each is encoded as a bit string for a fast GPU implementation.
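The bit-string layout can be sketched as follows. This is a minimal illustration with assumed class and method names (only the 73-bit layout for a depth-3 shallow octree follows the paper): one split bit per interior cell, 1 + 8 + 64 = 73 bits in total.

```python
# Minimal sketch of the bit-string encoding of a shallow octree of
# maximum depth 3 (class and method names are assumptions for illustration).
class ShallowOctree:
    def __init__(self):
        # bits[0]: root cell; bits[1:9]: its 8 children; bits[9:73]: grandchildren
        self.bits = [0] * 73

    def split(self, i):
        """Mark cell i as split into 8 children."""
        self.bits[i] = 1

    @staticmethod
    def first_child(i):
        """Bit index of cell i's first child (children of i are 8i+1 .. 8i+8)."""
        return 8 * i + 1

tree = ShallowOctree()
tree.split(0)                             # subdivide the root
tree.split(ShallowOctree.first_child(0))  # subdivide its first child (bit 1)
```

Because the structure is a fixed-length bit string rather than a pointer tree, parent/child lookups reduce to arithmetic on bit indices, which maps well onto GPU kernels.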

7.- OctNet defines convolution, pooling, unpooling operations on the irregular octree structure, which are differentiable for end-to-end learning.

8.- OctNet lets the same network scale to 256^3 input resolution and is faster than dense networks beyond 64^3 resolution.

9.- OctNet matches the classification accuracy of dense networks. Input resolution shows diminishing returns; 32^3 or 64^3 is often sufficient.

10.- Higher resolutions help more for orientation estimation and semantic 3D point cloud labeling; 256^3 is needed for state-of-the-art results.

11.- Future work: learning to generate octrees for tasks like depth completion where the space partitioning is not known a priori.

Knowledge Vault built by David Vivancos 2024