Knowledge Vault 5/31 - CVPR 2018
Taskonomy: Disentangling Task Transfer Learning
Amir R. Zamir, Alexander Sax, William Shen, Leonidas J. Guibas, Jitendra Malik, and Silvio Savarese.
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

graph LR
classDef taskonomy fill:#f9d4d4, font-weight:bold, font-size:14px
classDef motivation fill:#d4f9d4, font-weight:bold, font-size:14px
classDef method fill:#d4d4f9, font-weight:bold, font-size:14px
classDef transfers fill:#f9f9d4, font-weight:bold, font-size:14px
classDef taxonomy fill:#f9d4f9, font-weight:bold, font-size:14px
classDef results fill:#d4f9f9, font-weight:bold, font-size:14px
classDef future fill:#f9d9d4, font-weight:bold, font-size:14px
A[Taskonomy: Disentangling Task Transfer Learning] --> B[Vision tasks: Related, not independent 1]
A --> C[Transfer learning: One task helps another 2]
A --> D[Task dictionary: 26 diverse tasks 3]
D --> E[Dataset: 4M indoor images 4]
D --> F[Task-specific networks: Frozen weights 5]
F --> G[Readout networks: Quantify transfers 6]
G --> H[Complete graph: 26x25 transfers 7]
G --> I[Higher-order: Multi-source transfers 8]
H --> J[AHP normalization: Adjacency matrix 9]
J --> K[Subgraph selection: Minimal sources 10]
K --> L[Novel tasks: Inserted in structure 11]
A --> M[Results: 8-100x less labeled data 12]
M --> N[Gain and quality metrics 13]
M --> O[Outperforms ImageNet features 14]
A --> P[Live web API: Custom taxonomies 15]
A --> Q[Towards generalist perception 16]
class A taskonomy
class B,C motivation
class D,E,F method
class G,H,I,J transfers
class K,L taxonomy
class M,N,O results
class P,Q future

Resume:

1.- Vision tasks are related, not independent (e.g. depth estimation, surface normals, object detection, room layout)

2.- Quantifying task relationships enables seeing tasks in concert rather than in isolation, so their redundancies can be utilized

3.- Reducing the need for labeled data is desirable and is the focus of research on self-supervised learning, unsupervised learning, meta-learning, domain adaptation, ImageNet features, and fine-tuning

4.- Task relationships enable transfer learning - using model developed for one task to help solve another related task

5.- Intuitive example: surface normal estimation benefits more from transfer learning from image reshading task than from segmentation task

6.- Quantifying task relationships at scale allows forming complete graph to understand redundancies between tasks

7.- This enables solving a set of tasks in concert while minimizing supervision by leveraging redundancies (in one computed taxonomy, all tasks are transferred from just 3 sources)

8.- Also enables solving desired novel task without much labeled data by inserting it into the task relationship structure

9.- Taskonomy: fully computational method to quantify task relationships at scale and extract unified transfer learning structure

10.- Defined set of 26 diverse vision tasks (semantic, 3D, 2D) as sample task dictionary

11.- Collected dataset of 4M real indoor images with ground truth for all 26 tasks

12.- Trained a task-specific network for each of the 26 tasks, then froze its weights

13.- Quantify task relationships by using the frozen encoder of one task's network to train a small readout network that solves another task (sketched below)

14.- Readout network performance on test set determines strength of directed task transfer relationship
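
A minimal sketch of this transfer-quantification step (items 12-14), assuming PyTorch; the encoder, data loader, and loss function are hypothetical stand-ins, not the authors' implementation:

import torch
import torch.nn as nn

class ReadoutNetwork(nn.Module):
    """Small decoder trained on top of a frozen source-task encoder."""
    def __init__(self, feat_channels, out_channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, out_channels, kernel_size=1),
        )

    def forward(self, feats):
        return self.head(feats)

def train_transfer(encoder, readout, loader, loss_fn, epochs=5):
    """Train only the readout; the source-task encoder stays frozen (item 12)."""
    for p in encoder.parameters():
        p.requires_grad = False            # freeze source-task weights
    encoder.eval()
    opt = torch.optim.Adam(readout.parameters(), lr=1e-4)
    for _ in range(epochs):
        for images, targets in loader:     # target-task ground truth (item 13)
            with torch.no_grad():
                feats = encoder(images)    # frozen source representation
            loss = loss_fn(readout(feats), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return readout  # its test-set performance gives the edge strength (item 14)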

15.- Computed 26x25 = 650 first-order transfer functions to get a complete directed graph of task relationships

16.- Normalize the adjacency matrix of the graph using the Analytic Hierarchy Process (AHP) to account for tasks' different output spaces and numerical properties
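
A minimal sketch of AHP-style normalization, assuming NumPy and a hypothetical wins matrix where wins[i, j] is the fraction of test images on which source i's transfer beats source j's for a fixed target task; this conveys the win-rate-and-principal-eigenvector idea rather than the paper's exact procedure:

import numpy as np

def ahp_affinities(wins, eps=1e-3):
    """Order-robust source affinities from pairwise win rates."""
    w = np.clip(wins, eps, 1.0 - eps)
    ratios = w / w.T                        # pairwise dominance matrix
    vals, vecs = np.linalg.eig(ratios)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    principal = np.abs(principal)
    return principal / principal.sum()      # one affinity per source task

# Toy example with 3 candidate sources for one target:
wins = np.array([[0.5, 0.7, 0.9],
                 [0.3, 0.5, 0.6],
                 [0.1, 0.4, 0.5]])
print(ahp_affinities(wins))                 # source 0 ranks highest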

17.- Extract an optimal subgraph from the normalized complete graph, solved as a Boolean Integer Program, to maximize collective task performance while minimizing the sources used
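
The paper casts this selection as a Boolean Integer Program; the toy sketch below instead brute-forces small source subsets over a hypothetical normalized affinity matrix P[source, target], which conveys the same objective at small scale:

from itertools import combinations
import numpy as np

def best_sources(P, budget):
    """Pick <= budget source tasks maximizing summed per-target affinity."""
    n_sources, _ = P.shape
    best_score, best_set = -np.inf, None
    for k in range(1, budget + 1):
        for subset in combinations(range(n_sources), k):
            # Each target task uses its best available source
            score = P[list(subset)].max(axis=0).sum()
            if score > best_score:
                best_score, best_set = score, subset
    return best_set, best_score

P = np.random.default_rng(0).random((6, 6))  # toy 6-task affinity matrix
print(best_sources(P, budget=3))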

18.- Subgraph selection also handles transferring to novel tasks not in original dictionary

19.- Higher-order transfers (multiple sources transferring to one target) also included in framework
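
A minimal sketch of a higher-order transfer (item 19), assuming PyTorch and the same hypothetical frozen encoders as above: representations from several source encoders are concatenated channel-wise and fed to a single readout head:

import torch
import torch.nn as nn

class HigherOrderReadout(nn.Module):
    """Readout over multiple frozen source-task encoders."""
    def __init__(self, encoders, feat_channels, out_channels):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)
        for p in self.encoders.parameters():
            p.requires_grad = False         # all source encoders stay frozen
        self.head = nn.Conv2d(feat_channels * len(encoders), out_channels, 1)

    def forward(self, x):
        with torch.no_grad():
            feats = [enc(x) for enc in self.encoders]
        return self.head(torch.cat(feats, dim=1))  # fuse sources channel-wise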

20.- Experimental results: 26 tasks, 26 task-specific networks, ~3,000 transfer functions, 47,000 GPU hours; transfer training used 8-100x less data than training from scratch

21.- Sample computed taxonomy shows intuitive connections (3D tasks connected, semantic tasks connected) and enables solving some tasks with limited labeled data

22.- Gain metric: measures value gained by transfer learning. Quality metric: measures how close transfer results are to task-specific networks.
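
Both metrics are reported as win rates; a hypothetical helper makes the distinction concrete, since only the reference network differs between gain and quality:

import numpy as np

def win_rate(transfer_losses, reference_losses):
    """Fraction of test images where the transfer beats the reference."""
    return float(np.mean(np.asarray(transfer_losses) < np.asarray(reference_losses)))

# Gain:    reference = network trained from scratch on the small labeled set.
# Quality: reference = fully supervised task-specific network.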

23.- Live web API to compute taxonomies with custom arguments and compare to ImageNet features baseline

24.- Additional experiments: significance tests, generalization tests, sensitivity analyses, comparisons to self-supervised/unsupervised baselines

25.- Taskonomy is a step towards understanding the space of vision tasks and treating tasks as a structured space rather than isolated concepts

26.- Provides fully computational framework and unified transfer learning model to move towards generalist perception model

27.- Taskonomy outperforms ImageNet feature transfer learning baselines

28.- Includes mechanism to handle novel tasks not in original task dictionary

29.- Can provide guidance for multi-task learning in terms of gauging similarity between tasks

30.- Optimized subgraph maximizes collective performance on all tasks while minimizing number of source tasks

Knowledge Vault built by David Vivancos 2024