Knowledge Vault 5/6 - CVPR 2015
Picture: A Probabilistic Programming Language for Scene Perception
Tejas D. Kulkarni, Pushmeet Kohli, Joshua B. Tenenbaum, Vikash Mansinghka

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

```mermaid
graph LR
    classDef inverse fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef challenges fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef probabilistic fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef variability fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef inference fill:#f9d4f9, font-weight:bold, font-size:14px
    classDef helmholtz fill:#d4f9f9, font-weight:bold, font-size:14px
    classDef memory fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef framework fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef examples fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef future fill:#f9f9d4, font-weight:bold, font-size:14px
    A[Picture: A Probabilistic Programming Language for Scene Perception] --> B[Inverse graphics: scenes from images 1]
    A --> C[Challenges: shape, appearance, real-time 2]
    A --> D[Probabilistic programming: stochastic simulators 3]
    D --> E[Constrained simulation: conditioning on data 4]
    C --> F[Shape variability: rich simulators, processes 5]
    C --> G[Appearance variability: cartoon models, projection 6]
    C --> H[Faster inference: top-down plus bottom-up 7]
    H --> I[Helmholtz machines: train bottom-up 8]
    A --> J[Sleep state: hallucinate, store data 9]
    A --> K[Program traces: stack with outputs 10]
    K --> L[Function approximators: train on traces 11]
    J --> M[Structured memory: project hallucinations 12]
    M --> N[Inference: sample traces from memory 13]
    N --> O[Pattern matching and reasoning: efficient 14]
    A --> P[Conceptual framework: language, renderer, representation 15]
    A --> Q[Human pose: 3D mesh example 16]
    Q --> R[Discrete and continuous: DPM plus top-down 17]
    A --> S[3D shape: flexible mesh program 18]
    A --> T[Intrinsic images: Barron and Malik comparison 19]
    A --> U[Future work: simulators, deep learning 20]
    U --> V[AD engines: mixed CPU/GPU support 21]
    U --> W[Deep integration: CNN encoders, program decoders 22]
    U --> X[Torch or Caffe: close integration 23]
    U --> Y[Learning beyond parameters: programs, subroutines 24]
    U --> Z[Exciting direction: neural nets, programs 25]
    class B,C,H,I,N,O inverse
    class C challenges
    class D,E probabilistic
    class F,G variability
    class H,N,O inference
    class I helmholtz
    class J,K,L,M memory
    class P framework
    class Q,R,S,T examples
    class U,V,W,X,Y,Z future
```

Resume:

1.- Inverse graphics: Analyzing scenes by inverting the graphics rendering process, going from images to scene descriptions.

2.- Challenges: Shape variability, appearance variability, and real-time inference are the main challenges in inverse graphics.

3.- Probabilistic programming: Adding random variables to graphics programs to create stochastic scene simulators for inference.
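
A minimal sketch of the idea in Python, not Picture's actual syntax (the real system embeds programs in Julia): latent scene variables are drawn from priors and pushed through a renderer. The latents and the toy `render` below are hypothetical stand-ins so the snippet is self-contained.

```python
import random

def scene_program(rng=random):
    """Toy stochastic scene simulator: sample latents, then render.

    Picture programs sample far richer variables (meshes, poses,
    lighting); these three scalars are illustrative only.
    """
    azimuth = rng.uniform(-90.0, 90.0)    # camera azimuth in degrees
    elevation = rng.uniform(-30.0, 30.0)  # camera elevation in degrees
    scale = rng.lognormvariate(0.0, 0.2)  # object scale
    latents = {"azimuth": azimuth, "elevation": elevation, "scale": scale}
    return latents, render(latents)

def render(latents):
    # Stand-in for an approximate graphics renderer: a deterministic
    # 1D "image" keeps the example runnable without graphics libraries.
    return [latents["scale"] * (latents["azimuth"] + i * latents["elevation"])
            for i in range(8)]

if __name__ == "__main__":
    latents, image = scene_program()
    print(latents, image[:3])
```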

4.- Constrained simulation: Running probabilistic programs conditioned on test data to infer scene properties.
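
Generic inference, such as Metropolis-Hastings over the program's random choices, can then condition the simulator on an observed image. A toy, self-contained sketch with a single latent and a hypothetical one-line renderer:

```python
import math
import random

def render(z):
    # Hypothetical one-latent renderer: the "scene" is a single slope.
    return [z * i for i in range(8)]

def log_likelihood(image, observed, sigma=0.5):
    # Gaussian pixel-wise comparison of rendered vs. observed image.
    return -sum((a - b) ** 2 for a, b in zip(image, observed)) / (2 * sigma ** 2)

def metropolis_hastings(observed, steps=5000, step_size=0.1):
    z = random.gauss(0.0, 1.0)                        # draw from the N(0,1) prior
    log_p = log_likelihood(render(z), observed) - z * z / 2
    for _ in range(steps):
        z_new = z + random.gauss(0.0, step_size)      # random-walk proposal
        log_p_new = log_likelihood(render(z_new), observed) - z_new * z_new / 2
        if math.log(random.random()) < log_p_new - log_p:  # accept/reject
            z, log_p = z_new, log_p_new
    return z

observed = render(1.7)                                # synthetic test image
print(metropolis_hastings(observed))                  # should approach 1.7
```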

5.- Handling shape variability: Using rich forward graphics simulators and non-parametric statistical processes over 3D meshes.

6.- Appearance variability: Building approximate "cartoon" models and comparing them with real data in an abstract feature space computed by deep networks, rather than pixel by pixel.

7.- Faster inference: Combining top-down inference methods with fast bottom-up recognition proposals.

8.- Helmholtz machines: Using top-down knowledge to train bottom-up discriminative pipelines, as proposed by Hinton et al.

9.- Sleep state: Hallucinating data from the probabilistic program and storing it in an external long-term memory.

10.- Program traces: Running a program once to get a stack with all variables and corresponding outputs.
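
One way to record such a trace, sketched with hypothetical names: wrap each random choice so its address and value land in the trace alongside the program's output. Each run is also one "sleep" hallucination, since it pairs latents with the data they generate.

```python
import random

class Trace:
    """Records every named random choice made during one program run."""
    def __init__(self):
        self.choices = {}            # address -> sampled value

    def sample(self, address, sampler, *args):
        value = sampler(*args)
        self.choices[address] = value
        return value

def scene_program(trace):
    # Toy program: two latents, one deterministic output.
    azimuth = trace.sample("azimuth", random.uniform, -90.0, 90.0)
    scale = trace.sample("scale", random.uniform, 0.5, 2.0)
    image = [scale * azimuth * i for i in range(4)]  # stand-in renderer
    return image

trace = Trace()
output = scene_program(trace)
print(trace.choices)   # e.g. {'azimuth': 12.3, 'scale': 1.4}
print(output)          # the rendered output paired with those choices
```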

11.- Function approximators: Training neural networks on program traces, using the recorded latent values as prediction targets for the corresponding outputs.
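
A sketch of that training step, with linear least squares standing in for the neural network: hallucinate (image, latent) pairs from a toy generative program, then fit a regressor mapping images back to latents.

```python
import numpy as np

rng = np.random.default_rng(0)

def hallucinate(n=1000, pixels=8):
    """Draw latents from the prior and render them (toy linear renderer)."""
    z = rng.normal(0.0, 1.0, size=(n, 2))            # latents: slope, offset
    grid = np.arange(pixels)
    images = z[:, :1] * grid + z[:, 1:]              # render each latent pair
    images += rng.normal(0.0, 0.05, images.shape)    # rendering noise
    return images, z

images, latents = hallucinate()
# Fit an image -> latent regressor (a neural network in the real system).
W, *_ = np.linalg.lstsq(images, latents, rcond=None)

test_images, test_latents = hallucinate(n=5)
print(np.round(test_images @ W - test_latents, 3))   # near-zero errors
```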

12.- Structured long-term memory: Projecting hallucinated data using learned approximators to create a semantically structured memory.

13.- Inference with Helmholtz proposals: Sampling program traces from the structured memory region corresponding to a test image.
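
In inference the learned predictor becomes a data-driven proposal inside the Markov chain. A toy sketch (names hypothetical; a crude slope estimator plays the role of the recognition network) mixing bottom-up jumps with random-walk moves:

```python
import math
import random

def render(z):
    return [z * i for i in range(8)]                  # toy renderer

def bottom_up_proposal(observed):
    # Stand-in for the learned image -> latent predictor: a crude
    # least-squares slope estimate plus noise.
    num = sum(i * p for i, p in enumerate(observed))
    den = sum(i * i for i in range(8))
    return num / den + random.gauss(0.0, 0.05)

def log_post(z, observed, sigma=0.5):
    ll = -sum((a - b) ** 2 for a, b in zip(render(z), observed)) / (2 * sigma ** 2)
    return ll - z * z / 2                             # plus N(0,1) prior

def mh_with_proposals(observed, steps=200):
    z = random.gauss(0.0, 1.0)
    for _ in range(steps):
        if random.random() < 0.5:                     # 50%: bottom-up jump
            z_new = bottom_up_proposal(observed)
        else:                                         # 50%: random walk
            z_new = z + random.gauss(0.0, 0.1)
        # Simplified: a fully correct sampler would also include the
        # proposal density ratio in this acceptance test.
        if math.log(random.random()) < log_post(z_new, observed) - log_post(z, observed):
            z = z_new
    return z

observed = render(1.7)
print(mh_with_proposals(observed))                    # close to 1.7
```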

14.- Pattern matching and reasoning: Doing 90% pattern matching from memory and 10% reasoning for efficient inference.

15.- Conceptual framework: Scene language model, approximate renderer, representation layer, and score function.
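
Read as an interface, the four components compose into a score for any candidate scene; a hypothetical Python sketch:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces mirroring the four components named above.
SceneProgram = Callable[[], Dict]           # scene language: samples a scene
Renderer = Callable[[Dict], List[float]]    # approximate renderer
Representation = Callable[[List[float]], List[float]]  # e.g. deep features
Scorer = Callable[[List[float], List[float]], float]   # likelihood score

@dataclass
class PictureModel:
    program: SceneProgram
    renderer: Renderer
    represent: Representation
    score: Scorer

    def score_scene(self, scene: Dict, observed: List[float]) -> float:
        # Render the hypothesis, map both images into the comparison
        # space, and score their agreement.
        rendered = self.renderer(scene)
        return self.score(self.represent(rendered), self.represent(observed))
```

Inference then searches or samples over the program's latent choices to drive `score_scene` up.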

16.- Human body pose example: Using an off-the-shelf 3D mesh in Blender for pose estimation.

17.- Combining discrete and continuous models: Integrating DPM (deformable part model) pose detectors with top-down inference for improved results.

18.- 3D shape program example: Writing a flexible program to define a distribution over 3D meshes.
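
The flavor of such a program, sketched with a 2D contour standing in for a 3D mesh (the actual programs perturb meshes, e.g. with the nonparametric processes of point 5): random Fourier coefficients whose amplitudes shrink with frequency define a distribution over smooth closed shapes.

```python
import math
import random

def random_shape(n_points=64, n_harmonics=6):
    """Sample a closed 2D contour: a toy stand-in for a program that
    defines a distribution over 3D meshes."""
    # Amplitudes shrink with frequency, so sampled shapes stay smooth.
    amps = [random.gauss(0.0, 0.3 / (k + 1)) for k in range(n_harmonics)]
    phases = [random.uniform(0.0, 2 * math.pi) for _ in range(n_harmonics)]
    points = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        radius = 1.0 + sum(a * math.cos((k + 1) * theta + p)
                           for k, (a, p) in enumerate(zip(amps, phases)))
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points

contour = random_shape()
print(contour[:2])   # first two vertices of one sampled shape
```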

19.- Comparing with intrinsic images: Running Barron and Malik's intrinsic image method (SIRFS) on the test data for comparison.

20.- Future work: Building a library of rich forward simulators and integrating with deep learning frameworks.

21.- Automatic differentiation engines: Developing fast AD engines with mixed CPU and GPU support for probabilistic programs.
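
The core of a forward-mode AD engine is compact, as the self-contained dual-number sketch below shows; production engines add reverse mode and the mixed CPU/GPU kernels mentioned here.

```python
import math

class Dual:
    """Forward-mode automatic differentiation via dual numbers:
    each value carries its derivative through every operation."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.dot * o.val + self.val * o.dot)  # product rule
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)  # chain rule

# d/dx [x * sin(x) + 3x] evaluated at x = 2.0
x = Dual(2.0, 1.0)                    # seed the input's derivative
y = x * sin(x) + 3 * x
print(y.val, y.dot)                   # value and exact derivative
```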

22.- Deep integration of programs and neural networks: Training end-to-end systems with CNN encoders and differentiable probabilistic program decoders.
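
A minimal numpy sketch of the end-to-end idea, with linear maps standing in for both the CNN encoder and the differentiable "program" decoder: because the decoder is differentiable, reconstruction error backpropagates through the render step into the encoder.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: images rendered from 2D latent scenes by a fixed linear
# "graphics program" G (stand-in for a differentiable renderer).
G = rng.normal(scale=0.3, size=(2, 16))      # latent -> image
Z_true = rng.normal(size=(256, 2))
X = Z_true @ G + 0.01 * rng.normal(size=(256, 16))

# Encoder E (stand-in for a CNN) is trained end to end so that
# decoding its latents through the known program reconstructs X.
E = rng.normal(scale=0.1, size=(16, 2))      # image -> latent
lr = 0.1
for step in range(500):
    Z = X @ E                                # encode
    X_hat = Z @ G                            # decode via the "program"
    err = X_hat - X                          # reconstruction error
    grad_E = X.T @ (err @ G.T) / len(X)      # backprop through the decoder
    E -= lr * grad_E

print(float(np.mean((X @ E @ G - X) ** 2)))  # small reconstruction loss
```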

23.- Torch or Caffe integration: Implementing the proposed approach with close integration to deep learning frameworks.

24.- Learning beyond parameters: Exploring learning in the space of programs or subroutines.

25.- Exciting research direction: Developing models that combine deep neural networks with differentiable probabilistic programs.

Knowledge Vault built by David Vivancos 2024