Knowledge Vault 5/59 - CVPR 2020
DeepCap: Monocular Human Performance Capture Using Weak Supervision
Marc Habermann; Weipeng Xu; Michael Zollhöfer; Gerard Pons-Moll; Christian Theobalt
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

```mermaid
graph LR
classDef deepcap fill:#f9d4d4, font-weight:bold, font-size:14px
classDef networks fill:#d4f9d4, font-weight:bold, font-size:14px
classDef losses fill:#d4d4f9, font-weight:bold, font-size:14px
classDef results fill:#f9f9d4, font-weight:bold, font-size:14px
A[DeepCap: Monocular Human Performance Capture Using Weak Supervision] --> B[Monocular human performance capture using RGB camera. 1]
A --> C[Captures pose, clothing deformation for realistic characters. 2]
A --> D[Weakly supervised training avoids complex data processing. 3]
A --> E[Monocular setting: challenging due to depth ambiguities, high dimensions. 4]
A --> F[Previous work: template-free, parametric, template-based methods. 5]
A --> G[DeepCap: personalized 3D template with embedded graph, skeleton. 6]
G --> H[PoseNet regresses skeleton pose. 7]
G --> I[DefNet regresses surface deformation in canonical pose. 7]
A --> J[Networks weakly supervised with multi-view 2D joints, masks. 8]
J --> K[Differentiable 3D to 2D modules required for loss. 9]
A --> L[Training: multi-camera green screen studio. 10]
H --> M[PoseNet: 3D landmarks in camera, root-relative space. 11]
H --> N[Global alignment layer computes, applies rotation, translation. 12]
H --> O[Multi-view sparse keypoint loss: 3D landmarks project to 2D. 13]
I --> P[DefNet regresses embedded graph rotations, translations. 14]
I --> Q[Deformation layer combines pose, deformation via embedded deformation, dual quaternion skinning. 15]
I --> R[Global alignment layer applied for global vertices, landmarks. 16]
I --> S[Multi-view sparse keypoint loss for posed, deformed markers. 17]
I --> T[Silhouette loss: model matches image for dense supervision. 18]
A --> U[DeepCap vs LiveCap: better 3D pose, plausible invisible deformation. 19]
A --> V[DeepCap vs implicit surface methods: consistent geometry, no missing limbs. 20]
A --> W[Multi-view IoU measures surface reconstruction accuracy. 21]
A --> X[DeepCap outperforms previous work: accounts for clothing deformation, consistent 3D cloth prediction. 22]
A --> Y[Weakly supervised training: multi-view 2D joints, foreground masks. 23]
A --> Z[Differentiable 3D to 2D modules enable training loss. 24]
A --> AA[Personalized 3D template with embedded graph, skeleton for regression. 25]
A --> AB[PoseNet, DefNet: main DeepCap networks. 26]
A --> AC[Global alignment layer for global 3D landmark loss computation. 27]
A --> AD[Deformation layer: combines regressed pose, deformation via embedded deformation, dual quaternion skinning. 28]
A --> AE[Silhouette, multi-view keypoint losses: dense, sparse supervision. 29]
A --> AF[DeepCap: realistic capture, consistent geometry, clothing from single RGB. 30]
class A,B,C,D,E,F,G,AA,AF deepcap
class H,I,M,N,P,Q,R,AB networks
class J,K,O,S,T,Y,Z,AE losses
class U,V,W,X results
```

Resume:

1.- DeepCap: monocular human performance capture approach using a single RGB camera.

2.- Captures pose and clothing deformation for realistic virtual characters.

3.- Weakly supervised training avoids the need for complicated 3D ground-truth data acquisition and processing.

4.- Monocular setting is challenging due to depth ambiguities and the high dimensionality of the problem.

5.- Previous work: template-free, parametric body models, and template-based methods.

6.- DeepCap uses a personalized 3D template mesh with an embedded graph and skeleton.

7.- PoseNet regresses the skeleton pose, and DefNet regresses surface deformation in canonical pose.

8.- Networks are weakly supervised with multi-view 2D joint detections and foreground masks.

9.- Differentiable 3D to 2D modules are required for loss evaluation.
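
A minimal NumPy sketch of such a differentiable projection module (names and shapes are illustrative assumptions, not the paper's code; during training the same math runs inside an autodiff framework so gradients flow from 2D losses back to 3D quantities):

```python
import numpy as np

def project(points_3d, K, R, t):
    """Perspective-project (N, 3) world points into one camera view.

    K: (3, 3) intrinsics; R: (3, 3) rotation and t: (3,) translation
    mapping world to camera coordinates. Every step is smooth, so a
    2D loss on the output is differentiable w.r.t. the 3D input.
    """
    cam = points_3d @ R.T + t        # world -> camera coordinates
    hom = cam @ K.T                  # apply intrinsics (homogeneous)
    return hom[:, :2] / hom[:, 2:3]  # perspective divide -> pixels
```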

10.- Capture setup: multi-camera green screen studio for training.

11.- PoseNet outputs 3D landmark positions in camera and root-relative space.

12.- Global alignment layer computes and applies rotation and translation for global 3D landmark positions.
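
One way such a layer can be realized (a sketch of the idea, not necessarily the paper's exact formulation): solve for the root translation that minimizes point-to-ray distances between the root-relative landmarks and the camera rays through the multi-view 2D detections, a linear least-squares problem:

```python
import numpy as np

def global_translation(landmarks_rel, cam_origins, ray_dirs):
    """Least-squares global translation from multi-view rays.

    landmarks_rel: (J, 3) root-relative 3D landmarks
    cam_origins:   (C, 3) camera centers in world space
    ray_dirs:      (C, J, 3) unit rays from each camera through its
                   detected 2D joints
    Minimizes sum_{c,j} ||(I - d d^T)((p_j + t) - o_c)||^2 over t.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c in range(len(cam_origins)):
        for j in range(len(landmarks_rel)):
            d = ray_dirs[c, j]
            P = np.eye(3) - np.outer(d, d)  # removes the along-ray component
            A += P
            b += P @ (cam_origins[c] - landmarks_rel[j])
    return np.linalg.solve(A, b)
```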

13.- Multi-view sparse keypoint loss ensures 3D landmarks project onto 2D joint detections.
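
A small sketch of this loss, assuming known per-view calibration and detector confidences (parameter names are hypothetical):

```python
import numpy as np

def keypoint_loss(landmarks_3d, detections_2d, confidences, Ks, Rs, ts):
    """Confidence-weighted squared reprojection error over C views.

    landmarks_3d:  (J, 3) global landmark positions
    detections_2d: (C, J, 2) 2D joint detections, confidences: (C, J)
    Ks, Rs, ts:    per-view intrinsics and extrinsics
    """
    loss = 0.0
    for c in range(len(Ks)):
        cam = landmarks_3d @ Rs[c].T + ts[c]  # world -> camera
        hom = cam @ Ks[c].T                   # apply intrinsics
        pix = hom[:, :2] / hom[:, 2:3]        # perspective divide
        res = pix - detections_2d[c]
        loss += np.sum(confidences[c] * np.sum(res**2, axis=1))
    return loss
```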

14.- DefNet regresses per-node rotation angles and translations of the embedded graph.

15.- Deformation layer combines regressed pose and deformation using embedded deformation and dual quaternion skinning.
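
A minimal sketch of the embedded-deformation half (Sumner-style blending of per-node rotations and translations; the dual quaternion skinning that then poses the deformed canonical mesh is omitted for brevity):

```python
import numpy as np

def embedded_deform(verts, nodes, node_R, node_t, weights):
    """Deform canonical vertices by blending embedded-graph nodes.

    verts:   (V, 3) canonical template vertices
    nodes:   (G, 3) embedded-graph node positions
    node_R:  (G, 3, 3) per-node rotations, node_t: (G, 3) translations
    weights: (V, G) vertex-to-node weights, each row summing to 1
    Implements v_i' = sum_k w_ik (R_k (v_i - g_k) + g_k + t_k).
    """
    out = np.zeros_like(verts)
    for k in range(len(nodes)):
        local = verts - nodes[k]  # vertex in node-local frame
        moved = local @ node_R[k].T + nodes[k] + node_t[k]
        out += weights[:, [k]] * moved
    return out
```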

16.- Global alignment layer is applied to obtain global vertices and landmarks.

17.- Multi-view sparse keypoint loss is applied to the posed and deformed landmarks.

18.- Silhouette loss enforces that the model silhouette matches the image silhouette, providing dense vertex supervision.
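
One common realization of such a loss (an assumption here, not a quote of the paper's implementation) pulls projected silhouette-boundary vertices toward the mask contour via a distance transform of the foreground mask:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def silhouette_loss(boundary_pix, mask):
    """Distance-transform silhouette loss for one view.

    boundary_pix: (B, 2) projected (x, y) positions of the mesh's
                  silhouette-boundary vertices
    mask:         (H, W) binary foreground mask
    """
    m = mask.astype(bool)
    # Pixel distance to the mask contour, defined inside and outside.
    dist = distance_transform_edt(m) + distance_transform_edt(~m)
    # Nearest-pixel lookup; a differentiable version would sample
    # `dist` with bilinear interpolation instead of rounding.
    x = np.clip(np.round(boundary_pix[:, 0]).astype(int), 0, m.shape[1] - 1)
    y = np.clip(np.round(boundary_pix[:, 1]).astype(int), 0, m.shape[0] - 1)
    return float(np.sum(dist[y, x] ** 2))
```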

19.- Comparison to LiveCap shows improved 3D pose accuracy and plausible deformation of occluded surfaces.

20.- Comparison to implicit surface methods demonstrates geometry that stays consistent over time, without missing limbs.

21.- Multi-view intersection over union measures surface reconstruction accuracy.
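
A sketch of this metric, assuming the reconstruction has already been rendered to a binary mask in every camera view:

```python
import numpy as np

def multiview_iou(pred_masks, gt_masks):
    """Mean intersection-over-union across C views.

    pred_masks, gt_masks: (C, H, W) binary arrays, where pred_masks
    are renderings of the reconstructed surface into each camera.
    """
    ious = []
    for p, g in zip(pred_masks, gt_masks):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))
```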

22.- DeepCap outperforms previous work by accounting for clothing deformation and predicting temporally consistent 3D cloth deformation.

23.- Weakly supervised training using multi-view 2D joint detections and foreground masks.

24.- Differentiable 3D to 2D modules enable loss evaluation during training.

25.- Personalized 3D template mesh with embedded graph and skeleton is used for pose and deformation regression.

26.- PoseNet and DefNet are the two main networks in the DeepCap approach.

27.- Global alignment layer provides global 3D landmark positions for loss computation.

28.- Deformation layer combines regressed pose and deformation using embedded deformation and dual quaternion skinning.

29.- Silhouette loss and multi-view sparse keypoint loss provide dense and sparse supervision, respectively.

30.- DeepCap achieves realistic human performance capture with consistent geometry and clothing deformation from a single RGB camera.

Knowledge Vault built by David Vivancos 2024