Knowledge Vault 5/35 - CVPR 2018
CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM
Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison.
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT 4o | Llama 3:

graph LR
classDef codeslam fill:#f9d4d4, font-weight:bold, font-size:14px
classDef representations fill:#d4f9d4, font-weight:bold, font-size:14px
classDef training fill:#d4d4f9, font-weight:bold, font-size:14px
classDef autoencoder fill:#f9f9d4, font-weight:bold, font-size:14px
classDef keyframe fill:#f9d4f9, font-weight:bold, font-size:14px
classDef testing fill:#d4f9f9, font-weight:bold, font-size:14px
classDef future fill:#f9d9d4, font-weight:bold, font-size:14px
A[CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM] --> B[CodeSLAM: Deep learning SLAM system 1]
A --> C[Sparse vs dense SLAM representations 2]
A --> D[Depth maps: Subspace, structural correlation 3]
A --> E[Autoencoder: Encodes depth maps 4]
A --> F[Depth prediction: Network modulates 5]
A --> G[Training: SceneNet RGB-D, end-to-end 6]
G --> H[Code size: 128 dimensions 7]
G --> I[Linear decoder, grayscale images 8]
E --> J[Predicted uncertainty: Depth discontinuities 9]
E --> K[Linear decoding: Depth function 10]
E --> L[Jacobian: Constant derivative 11]
E --> M[Smooth code perturbations 12]
B --> N[Keyframe-based SLAM: Pose, code variables 13]
N --> O[Dense bundle adjustment: Photometric error 14]
N --> P[Joint optimization: Pose, codes 15]
N --> Q[Optimization results: Reconstructions achieved 16]
N --> R[Speed: 10 Hz iterations 17]
B --> S[Real-world testing: NYU dataset 18]
B --> T[Visual odometry: NYU dataset 19]
T --> U[Simple system: One optimization 20]
T --> V[Zero code prior: Robustness 21]
B --> W[Future: Real data, self-supervision 22]
B --> X[Network improvements: Architecture, structure 23]
B --> Y[Demo: Preliminary live system 24]
B --> Z[Generalization: Zero-code prediction, optimization 25]
class A,B,Y codeslam
class C,D representations
class E,F,J,K,L,M autoencoder
class G,H,I training
class N,O,P,Q,R keyframe
class S,T,U,V testing
class W,X,Z future

Resume:

1.- CodeSLAM: A SLAM (Simultaneous Localization and Mapping) system that combines deep learning with classical optimisation through a compact, learned depth-map representation.

2.- Sparse vs. dense SLAM representations: Sparse methods use key points; dense methods use point clouds, TSDFs, voxel grids, or meshes.

3.- Depth maps occupy a low-dimensional subspace of all possible per-pixel depth values and are structurally correlated with the corresponding intensity images.

4.- Autoencoder network: Encodes depth maps conditioned on image features, which improves reconstruction and also outputs per-pixel uncertainty.

5.- Depth from monocular prediction: The network can modulate depth predictions using the code.

6.- Training: Used the SceneNet RGB-D dataset, a Laplacian loss, and the Adam optimizer; trained end-to-end from scratch.
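
The Laplacian loss can be read as the negative log-likelihood of a Laplace distribution whose scale is the network-predicted uncertainty. A minimal numpy sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def laplacian_nll(depth_gt, depth_pred, b):
    """Mean negative log-likelihood of a Laplace distribution (constants dropped).

    b is the predicted per-pixel uncertainty (scale): a large b down-weights
    the absolute error at that pixel, at the cost of the log(b) penalty.
    """
    return float(np.mean(np.abs(depth_gt - depth_pred) / b + np.log(b)))

# Toy 2x2 depth maps with a uniform uncertainty of 0.5:
d_gt = np.array([[1.0, 2.0], [3.0, 4.0]])
d_hat = np.array([[1.1, 2.0], [2.5, 4.0]])
b = np.full_like(d_gt, 0.5)
print(laplacian_nll(d_gt, d_hat, b))
```

The log-penalty is what stops the network from declaring every pixel maximally uncertain; in the paper the uncertainty ends up concentrated at depth discontinuities (point 9).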

7.- Code size: Diminishing returns with increased size; settled on 128 dimensions.

8.- Linear decoder and grayscale images: No gain from using RGB or nonlinear decoders.

9.- Predicted uncertainty: Highlights depth discontinuities in the depth map.

10.- Linear decoding: Depth is a function of the code and the image, split into a zero-code prediction plus a term linear in the code.
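
With a linear decoder, depth for a fixed image is an affine function of the code, D(c) = D0 + J c, where D0 is the zero-code prediction and J is a Jacobian that depends on the image but not on the code. A toy numpy sketch with random stand-ins for the network outputs:

```python
import numpy as np

rng = np.random.default_rng(42)
H, W, CODE = 8, 8, 128          # toy resolution; the paper's code is 128-d

# For one image, the network effectively supplies D0 and J:
D0 = rng.random(H * W)                          # zero-code depth prediction
J = rng.standard_normal((H * W, CODE)) * 0.01   # Jacobian dD/dc, constant in c

def decode_depth(code):
    """Affine decode: zero-code prediction plus linear code term."""
    return (D0 + J @ code).reshape(H, W)

# The Jacobian really is constant: finite differences recover column i of J.
eps, i = 1e-6, 0
e_i = np.zeros(CODE)
e_i[i] = eps
fd = (decode_depth(e_i) - decode_depth(np.zeros(CODE))).ravel() / eps
assert np.allclose(fd, J[:, i])
```

Because J does not depend on c, it can be computed once per keyframe and reused across optimisation iterations, which is what makes the 10 Hz figure in point 17 possible.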

11.- Jacobian: Derivative of depth with respect to code is constant for a given image.

12.- Spatially smooth code perturbations: Perturbing a single code entry results in smooth depth changes.

13.- Keyframe-based SLAM: System uses pose and code variables for each keyframe.
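
The per-keyframe state might be organised as follows; this is a hypothetical sketch (class and field names are not from the paper), with the pose stored as a 4x4 matrix for simplicity:

```python
from dataclasses import dataclass, field
import numpy as np

CODE_SIZE = 128  # code dimension used in the paper

@dataclass
class Keyframe:
    """Optimisation variables attached to each keyframe (illustrative).

    pose: 4x4 camera-to-world transform; a real system would optimise a
          minimal 6-dof SE(3) parameterisation instead.
    code: compact depth code, initialised to zero (the zero-code prior).
    """
    image: np.ndarray
    pose: np.ndarray = field(default_factory=lambda: np.eye(4))
    code: np.ndarray = field(default_factory=lambda: np.zeros(CODE_SIZE))

kf = Keyframe(image=np.zeros((480, 640)))
```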

14.- Dense bundle adjustment: Pairs keyframes, warps images using depth and relative pose, and minimizes photometric error.
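
A hedged sketch of the photometric residual between two keyframes, using a plain pinhole model and nearest-neighbour lookup (a real implementation would interpolate bilinearly and handle occlusions; all names here are illustrative):

```python
import numpy as np

def warp_photometric_error(img_i, img_j, depth_i, K, T_ji):
    """Photometric residuals from warping keyframe i's pixels into keyframe j.

    Back-project each pixel of i with its depth, apply the relative pose
    T_ji (4x4, frame i -> frame j), project into j with intrinsics K, and
    compare intensities at the (rounded) target pixel.
    """
    H, W = img_i.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])   # homogeneous
    pts_i = np.linalg.inv(K) @ pix * depth_i.ravel()         # back-project
    pts_j = T_ji[:3, :3] @ pts_i + T_ji[:3, 3:4]             # transform
    proj = K @ pts_j
    uj = np.round(proj[0] / proj[2]).astype(int)
    vj = np.round(proj[1] / proj[2]).astype(int)
    valid = (uj >= 0) & (uj < W) & (vj >= 0) & (vj < H) & (pts_j[2] > 0)
    return img_i.ravel()[valid] - img_j[vj[valid], uj[valid]]

# Sanity check: identical images and an identity pose give zero residual.
K = np.array([[50.0, 0, 16], [0, 50.0, 12], [0, 0, 1]])
img = np.random.default_rng(1).random((24, 32))
res = warp_photometric_error(img, img, np.ones((24, 32)), K, np.eye(4))
```

Stacking these residuals over keyframe pairs, for all poses and codes in the window, gives the dense bundle adjustment cost minimised in point 15.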

15.- Joint optimization of pose and codes: A novel approach in dense SLAM.

16.- Optimization results: Photometric error decreases, and good reconstructions are achieved on the SceneNet RGB-D dataset.

17.- Speed: Iterations at 10 Hz, boosted by pre-computing Jacobians due to linear decoder.

18.- Real-world testing: Applied on the NYU dataset, jointly optimizing 10 frames.

19.- Visual odometry system: Tested on NYU dataset with a sliding window of 5 keyframes.
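
The sliding-window bookkeeping can be sketched with a bounded deque; this is a simplification, since `maxlen` silently discards the oldest keyframe rather than marginalising it:

```python
from collections import deque

WINDOW_SIZE = 5  # keyframes jointly optimised, as in the NYU experiments

window = deque(maxlen=WINDOW_SIZE)

for frame_id in range(8):      # pretend keyframes 0..7 arrive in order
    window.append(frame_id)    # oldest keyframe is dropped automatically

active = list(window)          # variables handed to the joint optimisation
print(active)                  # → [3, 4, 5, 6, 7]
```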

20.- Simple system: Only one optimization problem, no special bootstrapping.

21.- Zero-code prior: Provides robustness, e.g. handling rotation-only motion, where there is no baseline for triangulation.
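
The role of the zero-code prior can be illustrated as an extra quadratic term in the total cost; the weight and names here are illustrative, not the paper's:

```python
import numpy as np

def total_cost(photometric_residuals, codes, prior_weight=1.0):
    """Sketch of the objective: photometric terms plus a zero-code prior.

    The quadratic prior ||c||^2 keeps each keyframe's depth close to the
    network's single-image (zero-code) prediction whenever the photometric
    term is uninformative, e.g. rotation-only motion with no baseline.
    """
    photo = sum(float(r @ r) for r in photometric_residuals)
    prior = prior_weight * sum(float(c @ c) for c in codes)
    return photo + prior

print(total_cost([np.array([1.0, 2.0])], [np.zeros(3)]))  # → 5.0
print(total_cost([np.array([1.0, 2.0])], [np.ones(3)]))   # → 8.0
```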

22.- Future directions: Training on real data using self-supervised costs, closer coupling with optimization.

23.- Network improvements: Exploring different architectures and enforcing more structure.

24.- Demo: Co-author Jan demonstrated a preliminary live system.

25.- Generalization: Zero-code prediction worsens on dissimilar datasets; optimization allows adaptation to different scenes.

Knowledge Vault built by David Vivancos 2024