Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Ce Liu, Bill Freeman and Noah Snavely
Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:
graph LR
classDef depth fill:#f9d4d4, font-weight:bold, font-size:14px
classDef stereo fill:#d4f9d4, font-weight:bold, font-size:14px
classDef dataset fill:#d4d4f9, font-weight:bold, font-size:14px
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
classDef applications fill:#f9d4f9, font-weight:bold, font-size:14px
A[Learning the Depths
of Moving People
by Watching Frozen
People] --> B[Learning depth of
moving people 1]
A --> C[Classical stereo unsuitable
for moving objects 2]
A --> D[Data-driven approach using
Mannequin Challenge dataset 3]
D --> E[Dataset spans scenes,
poses, people 4]
D --> F[Structure-from-motion, multi-view
stereo recover poses, depths 5]
F --> G[Multi-view stereo depths
train neural network 6]
A --> H[Single-image prediction ignores
neighboring frames 7]
A --> I[Flow between frames
converted to depths 8]
I --> J[Inaccurate moving people
depths masked out 9]
A --> K[Model inputs: RGB,
mask, parallax depths, confidence 10]
K --> L[Network inpaints masked
depth, refines scene 11]
K --> M[Model applied to
moving people videos 12]
M --> N[Outperforms baselines on
TUM RGB-D dataset 13]
M --> O[Qualitative comparison shows
model's predictions most similar 14]
M --> P[Accurate, coherent predictions
on internet videos 15]
P --> Q[Enables defocus, focus
pause effects 16]
P --> R[Synthetic objects inserted,
occluded using depth 17]
P --> S[Novel view synthesis
using near-field frames 18]
P --> T[Human regions inpainted
when camera, people move 19]
A --> U[Code, dataset released
on project website 20]
class B,H,I,J depth
class C,F,G stereo
class D,E dataset
class K,L,M,N,O learning
class P,Q,R,S,T applications
Resume:
1.- Learning depth of moving people using frozen people dataset (Mannequin Challenge).
2.- Classical stereo algorithms assume rigid scenes, unsuitable for moving objects.
3.- Data-driven approach using Mannequin Challenge dataset with stationary people.
4.- Dataset spans various scenes, poses, and number of people.
5.- Structure-from-motion and multi-view stereo recover camera poses and depths.
6.- Multi-view stereo depth maps used as ground-truth for training neural network.
7.- Single-image depth prediction ignores 3D information in neighboring frames.
8.- Optical flow between reference and neighbor frames converted to depths using camera poses (see the triangulation sketch after this list).
9.- Inaccurate depths from moving people masked out using segmentation.
10.- Full model inputs: RGB frame, segmentation mask, depths from motion parallax, confidence map (input assembly sketched after this list).
11.- Network learns to inpaint the masked human depth and refine the depth of the entire scene (a toy network sketch follows below).
12.- Model applied to moving people videos during inference.
13.- Outperforms baseline RGB-only, motion-stereo, and single-view methods on the TUM RGB-D dataset (a standard depth error metric is sketched after this list).
14.- Qualitative comparisons show the model's depth predictions are the most similar to ground truth.
15.- Accurate and coherent depth predictions on regular internet video clips.
16.- Depth predictions enable visual effects like synthetic defocus and focus pause (a layered defocus sketch follows below).
17.- Synthetic objects inserted and occluded using depth predictions (occlusion-aware compositing sketched below).
18.- Novel view synthesis, using nearby frames to fill occlusions in near-field views.
19.- Human regions effectively inpainted using depth predictions when camera and people move freely.
20.- Code and dataset released on the project website.
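Point 8 compresses a concrete geometric step: given camera poses recovered by structure-from-motion (points 5-6), dense optical flow between a reference frame and a neighboring frame can be triangulated into per-pixel depth for the static parts of the scene. Below is a minimal sketch using OpenCV; the Farneback flow, the intrinsics `K`, and the relative pose `(R, t)` are stand-ins, since the paper uses its own flow estimator and parallax formulation.

```python
import cv2
import numpy as np

def flow_to_depth(ref_img, src_img, K, R, t):
    """Triangulate per-pixel depth for the reference frame from dense
    optical flow and a known relative camera pose (a generic sketch,
    not the paper's exact formulation)."""
    gray_ref = cv2.cvtColor(ref_img, cv2.COLOR_BGR2GRAY)
    gray_src = cv2.cvtColor(src_img, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_ref, gray_src, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_ref.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts_ref = np.stack([xs.ravel(), ys.ravel()]).astype(np.float64)  # 2xN
    pts_src = pts_ref + flow.reshape(-1, 2).T                        # flow-shifted
    P_ref = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # reference camera
    P_src = K @ np.hstack([R, t.reshape(3, 1)])            # neighbor camera
    X = cv2.triangulatePoints(P_ref, P_src, pts_ref, pts_src)  # 4xN homogeneous
    depth = (X[2] / X[3]).reshape(h, w)   # z-coordinate in the reference frame
    return depth.astype(np.float32)
```

These triangulated depths are only trustworthy where the rigid-scene assumption holds, which is exactly why point 9 masks out the moving people.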
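Points 9-10 describe how the cues are combined into the network input. A minimal sketch of one plausible assembly; the 6-channel layout, the log-depth parameterization, and the normalization are assumptions, not the published preprocessing.

```python
import numpy as np

def build_network_input(rgb, human_mask, parallax_depth, confidence):
    """Stack the cues from point 10 into one H x W x 6 tensor.
    human_mask: bool array, True on people (whose parallax depths
    are unreliable and therefore zeroed out)."""
    env_mask = (~human_mask).astype(np.float32)            # 1 = static scene
    masked_depth = np.where(env_mask > 0, parallax_depth, 0.0)
    masked_conf = np.where(env_mask > 0, confidence, 0.0)
    log_depth = np.log(np.clip(masked_depth, 1e-6, None)) * env_mask
    return np.concatenate([
        rgb.astype(np.float32) / 255.0,                    # 3 channels
        env_mask[..., None],                               # 1 channel
        log_depth[..., None],                              # 1 channel
        masked_conf[..., None],                            # 1 channel
    ], axis=-1)
```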
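Point 11 states the network's job: inpaint depth inside the masked human regions while refining depth everywhere else. The toy PyTorch encoder-decoder below only illustrates the input/output contract; the paper's actual architecture is a much larger hourglass-style network, and the layer counts here are arbitrary.

```python
import torch
import torch.nn as nn

class DepthRefineNet(nn.Module):
    """Toy encoder-decoder: consumes the 6-channel input from the
    sketch above and predicts full-frame log depth, including the
    masked human regions. Illustrative only."""
    def __init__(self, in_ch=6):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):               # x: B x 6 x H x W
        return self.dec(self.enc(x))    # B x 1 x H x W log-depth

# usage sketch: pred = DepthRefineNet()(torch.randn(1, 6, 64, 64))
```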
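For the quantitative comparison in point 13, depth-prediction work in this family is typically evaluated with a scale-invariant error in log depth (in the style of Eigen et al.), since monocular depth is only recoverable up to scale. The sketch below computes that metric; whether it matches the paper's exact TUM RGB-D protocol is an assumption.

```python
import numpy as np

def scale_invariant_rmse(pred, gt, valid):
    """Scale-invariant RMSE in log depth over pixels where the
    boolean mask `valid` is True; pred and gt must be positive."""
    d = np.log(pred[valid]) - np.log(gt[valid])
    return np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)
```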
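The synthetic defocus of point 16 follows directly from a per-pixel depth map: blur strength grows with distance from a chosen focal plane. Below is a layered approximation; the layer count, kernel schedule, and function name are illustrative choices, not the paper's rendering pipeline.

```python
import cv2
import numpy as np

def synthetic_defocus(img, depth, focal_depth, max_kernel=21, n_layers=6):
    """Split the scene into depth layers, blur each with a kernel
    that grows with distance from the focal plane, and composite."""
    img_f = img.astype(np.float32)
    out = np.zeros_like(img_f)
    weight = np.zeros(depth.shape, np.float32)
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (lo + hi)
        r = abs(mid - focal_depth) / (edges[-1] - edges[0] + 1e-6)
        k = 1 + 2 * int(round(r * (max_kernel // 2)))   # odd kernel size
        blurred = img_f if k == 1 else cv2.GaussianBlur(img_f, (k, k), 0)
        mask = ((depth >= lo) & (depth <= hi)).astype(np.float32)
        out += blurred * mask[..., None]
        weight += mask
    return (out / np.clip(weight[..., None], 1e-6, None)).astype(np.uint8)
```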
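Point 17's occlusion-aware insertion reduces to a per-pixel depth test: an inserted object's pixel is shown only where the object is closer than the predicted scene depth. A sketch assuming a hypothetical pre-rendered RGBA sprite with its own depth map, aligned with the frame.

```python
import numpy as np

def insert_object(frame, scene_depth, obj_rgba, obj_depth):
    """Depth-test compositing: scene pixels with smaller predicted
    depth occlude the object (obj_rgba/obj_depth are hypothetical
    pre-rendered inputs, same H x W as the frame)."""
    visible = (obj_depth < scene_depth).astype(np.float32)[..., None]
    alpha = (obj_rgba[..., 3:4].astype(np.float32) / 255.0) * visible
    out = (frame.astype(np.float32) * (1 - alpha)
           + obj_rgba[..., :3].astype(np.float32) * alpha)
    return out.astype(np.uint8)
```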
Knowledge Vault built by David Vivancos 2024