Raquel Urtasun ICLR 2016 - Keynote - Incorporating Structure in Deep Learning

**Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:**

graph LR
classDef deeplearning fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef computervision fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef predicting fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef standarddeeplearning fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef multitasklearning fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef markovrandomfields fill:#d4f9f9, font-weight:bold, font-size:14px;
classDef incorporatingdependencies fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef graphicalmodels fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef conditionalrandomfields fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef learningcrfs fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef deepcrfmodels fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef experiments fill:#d4f9f9, font-weight:bold, font-size:14px;
classDef deepstructuredmodels fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef minimizingtaskloss fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef embeddings fill:#d4d4f9, font-weight:bold, font-size:14px;
A[Raquel Urtasun ICLR 2016] --> B[Deep learning success in various domains 1]
A --> C[Computer vision & machine learning focus 2]
A --> D[Predicting statistically related variables with deep learning 3]
A --> E[Standard deep learning for single output 4]
A --> F[Multitask learning shares parameters & specializes branches 5]
A --> G[Markov random fields for post-processing smoothness 6]
A --> H[Incorporating dependencies while learning features is desirable 7]
H --> I[Graphical models encode dependencies via energy functions 8]
H --> J[Conditional random fields model output given input 9]
H --> K[Learning CRFs: empirical test loss minimization is difficult 10]
K --> L[CRF surrogate losses are convex on parameters 11]
H --> M[Deep CRF models combine CRFs with deep learning 12]
M --> N[Double-loop algorithm for learning deep CRF models 13]
N --> O[Inference approximation & parallelization for efficiency 14]
M --> P[Single-loop algorithm is faster for general models 15]
A --> Q[Experiments show joint training improves performance 16]
Q --> R[Character recognition: deep nets + CRFs boost results 16]
Q --> S[Image tagging: single-loop converges faster 17]
Q --> T[Semantic segmentation: +3% with joint feature/CRF learning 18]
Q --> U[Instance-level segmentation is challenging but addressable 19]
A --> V[Deep structured models enable world mapping 20]
A --> W[Deep structured models applied in various domains 21]
A --> X[Minimizing task loss directly is desirable but challenging 22]
X --> Y[Regularity conditions enable convergence to correct update 23]
X --> Z[Modified update rule allows training with complex losses 24]
X --> AA[Direct loss optimization benefits shown experimentally 25]
X --> AB[Direct optimization is robust to label noise 26]
A --> AC[Deep learning popular for learning embeddings 27]
AC --> AD[Prior knowledge of relationships can be embedded 28]
AC --> AE[Hierarchical relationships can be encoded 29]
AC --> AF[Embedding partial order hierarchies is promising 30]
class A,B deeplearning;
class C computervision;
class D predicting;
class E standarddeeplearning;
class F multitasklearning;
class G markovrandomfields;
class H incorporatingdependencies;
class I,J graphicalmodels;
class K,L learningcrfs;
class M,N,O,P deepcrfmodels;
class Q,R,S,T,U experiments;
class V,W deepstructuredmodels;
class X,Y,Z,AA,AB minimizingtaskloss;
class AC,AD,AE,AF embeddings;


**Resume:**

**1.-**Deep learning has had success in personal assistants, games, robotics, drones, and self-driving cars.

**2.-**Computer vision focuses on applying neural nets, while machine learning focuses on improving neural nets.

**3.-**Many problems involve predicting statistically related random variables, which deep learning can help with.

**4.-**Standard deep learning uses feedforward methods to predict a single output by minimizing a simple loss function.
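
In symbols, this is plain empirical risk minimization over a feedforward predictor (notation mine, not from the slides):

$$
\min_{w} \; \sum_{n} \ell\big(f(x^{(n)}; w),\, y^{(n)}\big),
$$

where $f$ is the network, $\ell$ is a simple per-example loss such as cross-entropy, and each output is predicted independently of the others.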

**5.-**Multitask learning shares network parameters and specializes branches for different prediction types.
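
A minimal sketch of this shared-trunk, specialized-branch pattern in PyTorch; the layer sizes and the two example heads (classification and box regression) are illustrative assumptions, not details from the talk:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared trunk learns common features; per-task branches specialize."""
    def __init__(self, in_dim=256, hidden=128, n_classes=10):
        super().__init__()
        # These parameters are shared across all tasks.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each branch specializes for one prediction type.
        self.classify = nn.Linear(hidden, n_classes)  # hypothetical label head
        self.regress = nn.Linear(hidden, 4)           # hypothetical box head

    def forward(self, x):
        h = self.trunk(x)
        return self.classify(h), self.regress(h)

net = MultiTaskNet()
logits, boxes = net(torch.randn(8, 256))
# Training typically minimizes a weighted sum of the per-task losses.
```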

**6.-**Markov random fields can be used for post-processing to impose smoothness on predictions.

**7.-**Incorporating output variable dependencies while learning deep features is desirable.

**8.-**Graphical models encode dependencies between random variables using energy functions.
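
A common concrete form (the talk's exact potentials may differ) is a pairwise energy over the output variables:

$$
E(y \mid x; w) \;=\; \sum_{i} \phi_i(y_i, x; w) \;+\; \sum_{(i,j) \in \mathcal{E}} \phi_{ij}(y_i, y_j, x; w),
$$

where the unary potentials $\phi_i$ score each variable on its own and the pairwise potentials $\phi_{ij}$ encode dependencies over the edges $\mathcal{E}$, e.g. smoothness between neighboring pixels.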

**9.-**Conditional random fields model the conditional distribution of outputs given inputs.
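
Given such an energy, the CRF takes the Gibbs form

$$
p(y \mid x; w) \;=\; \frac{\exp\big(-E(y \mid x; w)\big)}{Z(x; w)}, \qquad Z(x; w) \;=\; \sum_{y'} \exp\big(-E(y' \mid x; w)\big),
$$

where the partition function $Z$ sums over every joint output configuration, which is what makes exact learning and inference expensive.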

**10.-**Learning CRFs ideally minimizes the loss we care about at test time, which is difficult, so surrogate losses are used instead.

**11.-**CRF surrogate losses are convex in the parameters of log-linear models.
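
For instance, with a log-linear energy $E(y \mid x; w) = -w^\top \phi(x, y)$, the negative conditional log-likelihood

$$
\ell(w) \;=\; \sum_{n} \Big[ \log Z(x^{(n)}; w) \;-\; w^\top \phi\big(x^{(n)}, y^{(n)}\big) \Big]
$$

is convex in $w$, because $\log Z$ is a log-sum-exp of affine functions of $w$.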

**12.-**One solution is to make CRFs less shallow by combining them with deep learning.

**13.-**Learning deep CRF models involves a double-loop algorithm with inference and parameter updates.
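
A toy sketch of that double loop on a chain-structured CRF, with a linear map standing in for the deep unary network; the sizes, data, and the choice of exact forward-backward as the inner inference step are my assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, D = 5, 3, 8                  # chain length, states per node, feature dim

X = rng.normal(size=(T, D))        # toy inputs, one feature vector per node
y = rng.integers(0, K, size=T)     # toy ground-truth labels

W = rng.normal(size=(K, D)) * 0.1  # stand-in for the deep unary net
P = rng.normal(size=(K, K)) * 0.1  # pairwise potentials (held fixed here)

def marginals(U, P):
    """Inner loop: exact chain inference (forward-backward in log space)."""
    a, b = np.zeros((T, K)), np.zeros((T, K))
    for t in range(1, T):          # forward messages
        a[t] = np.logaddexp.reduce(a[t-1][:, None] + U[t-1][:, None] + P, axis=0)
    for t in range(T - 2, -1, -1): # backward messages
        b[t] = np.logaddexp.reduce(b[t+1][None, :] + U[t+1][None, :] + P, axis=1)
    logm = a + U + b
    return np.exp(logm - np.logaddexp.reduce(logm[0]))  # per-node marginals

lr = 0.1
for step in range(200):            # outer loop: parameter updates
    U = X @ W.T                    # unary scores from the current "net" (T, K)
    mu = marginals(U, P)           # inference runs before every update
    grad_U = mu - np.eye(K)[y]     # dNLL/dU = model marginals - empirical
    W -= lr * grad_U.T @ X         # backprop through the unary map
    # (the analogous update for P needs pairwise marginals; omitted here)
```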

**14.-**Inference can be approximated for efficiency, and the algorithm can be parallelized across examples and machines.

**15.-**A single-loop algorithm interleaving learning and inference is faster for general graphical models.
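
Continuing the toy setup from the sketch above, the single-loop idea, heavily simplified, keeps a persistent belief state and refines it once per parameter update instead of nesting a full inference loop; real single-loop methods update the messages of an approximate inference scheme rather than damping exact marginals:

```python
beliefs = np.full((T, K), 1.0 / K)   # persistent approximate marginals

for step in range(200):
    U = X @ W.T
    # one damped inference refinement per step, not an inner loop to convergence
    beliefs = 0.5 * beliefs + 0.5 * marginals(U, P)
    W -= lr * (beliefs - np.eye(K)[y]).T @ X
```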

**16.-**Character recognition experiments show that jointly training deep nets and CRFs improves performance.

**17.-**Image tagging experiments demonstrate faster convergence with the single-loop algorithm.

**18.-**Semantic segmentation performance improves by 3% when jointly learning deep features and CRF parameters.

**19.-**Instance-level segmentation is more challenging due to permutation invariance but can be addressed with ordering heuristics.

**20.-**Building maps of the world from aerial imagery is possible with deep structured models.

**21.-**Deep structured models have been applied in various domains, with increasing popularity.

**22.-**Directly minimizing task loss during training is desirable but challenging due to non-differentiability.

**23.-**Under mild regularity conditions, the update converges to the correct gradient when directly minimizing the task loss.

**24.-**Training with arbitrarily complicated loss functions is possible using a modified update rule.
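
Roughly, in the form given by the direct loss minimization literature (McAllester et al., 2010, extended to deep networks in the work this talk presents), the update replaces the gradient of the non-differentiable task loss $L$ with a finite difference of score gradients:

$$
\nabla_w \, L(y, y_w) \;=\; \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \Big[ \nabla_w F(x, y_\varepsilon; w) \;-\; \nabla_w F(x, y_w; w) \Big],
$$

where $y_w = \arg\max_{y'} F(x, y'; w)$ is the model's prediction and $y_\varepsilon = \arg\max_{y'} \big[ F(x, y'; w) + \varepsilon\, L(y, y') \big]$ is a loss-perturbed prediction; each update thus needs only two inference calls, and any loss that can be folded into inference is usable.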

**25.-**Experiments on average precision ranking, action classification, and object detection show benefits of direct loss optimization.

**26.-**Label noise significantly degrades performance of cross-entropy and hinge loss, but direct loss optimization is robust.

**27.-**Deep learning is popular for learning embeddings of sentences, images, and multimodal data.

**28.-**Prior knowledge of relationships between concepts can be incorporated into embedding spaces.

**29.-**Hierarchical relationships like hypernymy, entailment, and abstraction can be encoded in embeddings.

**30.-**Creating embeddings that respect partial order hierarchies is an interesting research direction.
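
Concretely, in the order-embeddings formulation this direction builds on (Vendrov et al., ICLR 2016, co-authored by the speaker), concepts live in $\mathbb{R}^N_{\ge 0}$ under the reversed product order, and a candidate pair is scored by how badly it violates that order:

$$
x \preceq y \;\iff\; \forall i:\; x_i \ge y_i, \qquad E(x, y) \;=\; \big\lVert \max(0,\, y - x) \big\rVert^2,
$$

so $E(x, y) = 0$ exactly when the pair respects the hierarchy (e.g., an image below its caption, a word below its hypernym) and grows with the violation.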

Knowledge Vault built by David Vivancos 2024