Raquel Urtasun ICLR 2016 - Keynote - Incorporating Structure in Deep Learning

**Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:**

graph LR
classDef deeplearning fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef computervision fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef predicting fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef standarddeeplearning fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef multitasklearning fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef markovrandomfields fill:#d4f9f9, font-weight:bold, font-size:14px;
classDef incorporatingdependencies fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef graphicalmodels fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef conditionalrandomfields fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef learningcrfs fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef deepcrfmodels fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef experiments fill:#d4f9f9, font-weight:bold, font-size:14px;
classDef deepstructuredmodels fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef minimizingtaskloss fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef embeddings fill:#d4d4f9, font-weight:bold, font-size:14px;
A[Raquel Urtasun ICLR 2016] --> B[Deep learning success in various domains 1]
A --> C[Computer vision & machine learning focus 2]
A --> D[Predicting statistically related variables with deep learning 3]
A --> E[Standard deep learning for single output 4]
A --> F[Multitask learning shares parameters & specializes branches 5]
A --> G[Markov random fields for post-processing smoothness 6]
A --> H[Incorporating dependencies while learning features is desirable 7]
H --> I[Graphical models encode dependencies via energy functions 8]
H --> J[Conditional random fields model output given input 9]
H --> K[Learning CRFs: empirical test loss minimization is difficult 10]
K --> L[CRF surrogate losses are convex on parameters 11]
H --> M[Deep CRF models combine CRFs with deep learning 12]
M --> N[Double-loop algorithm for learning deep CRF models 13]
N --> O[Inference approximation & parallelization for efficiency 14]
M --> P[Single-loop algorithm is faster for general models 15]
A --> Q[Experiments show joint training improves performance 16]
Q --> R[Character recognition: deep nets + CRFs boost results 16]
Q --> S[Image tagging: single-loop converges faster 17]
Q --> T[Semantic segmentation: +3% with joint feature/CRF learning 18]
Q --> U[Instance-level segmentation is challenging but addressable 19]
A --> V[Deep structured models enable world mapping 20]
A --> W[Deep structured models applied in various domains 21]
A --> X[Minimizing task loss directly is desirable but challenging 22]
X --> Y[Regularity conditions enable convergence to correct update 23]
X --> Z[Modified update rule allows training with complex losses 24]
X --> AA[Direct loss optimization benefits shown experimentally 25]
X --> AB[Direct optimization is robust to label noise 26]
A --> AC[Deep learning popular for learning embeddings 27]
AC --> AD[Prior knowledge of relationships can be embedded 28]
AC --> AE[Hierarchical relationships can be encoded 29]
AC --> AF[Embedding partial order hierarchies is promising 30]
class A,B deeplearning;
class C computervision;
class D predicting;
class E standarddeeplearning;
class F multitasklearning;
class G markovrandomfields;
class H incorporatingdependencies;
class I,J graphicalmodels;
class K,L learningcrfs;
class M,N,O,P deepcrfmodels;
class Q,R,S,T,U experiments;
class V,W deepstructuredmodels;
class X,Y,Z,AA,AB minimizingtaskloss;
class AC,AD,AE,AF embeddings;


**Resume:**

**1.-**Deep learning has had success in personal assistants, games, robotics, drones, and self-driving cars.

**2.-**Computer vision focuses on applying neural nets, while machine learning focuses on improving neural nets.

**3.-**Many problems involve predicting statistically related random variables, which deep learning can help with.

**4.-**Standard deep learning uses feedforward methods to predict a single output by minimizing a simple loss function.
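
In symbols, this is plain empirical risk minimization over a feedforward predictor (notation mine, not from the slides):

$$
\min_{w} \; \sum_{n} \ell\big(f(x^{(n)}; w),\, y^{(n)}\big),
$$

where $f$ is the network, $\ell$ is a simple per-example loss such as cross-entropy, and each output is predicted independently of the others.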

**5.-**Multitask learning shares network parameters and specializes branches for different prediction types.
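
A minimal sketch of this shared-trunk, specialized-branch pattern in PyTorch; the layer sizes and the two example heads (classification and box regression) are illustrative assumptions, not details from the talk:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared trunk learns common features; per-task branches specialize."""
    def __init__(self, in_dim=256, hidden=128, n_classes=10):
        super().__init__()
        # These parameters are shared across all tasks.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each branch specializes for one prediction type.
        self.classify = nn.Linear(hidden, n_classes)  # hypothetical label head
        self.regress = nn.Linear(hidden, 4)           # hypothetical box head

    def forward(self, x):
        h = self.trunk(x)
        return self.classify(h), self.regress(h)

net = MultiTaskNet()
logits, boxes = net(torch.randn(8, 256))
# Training typically minimizes a weighted sum of the per-task losses.
```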

**6.-**Markov random fields can be used for post-processing to impose smoothness on predictions.

**7.-**Incorporating output variable dependencies while learning deep features is desirable.

**8.-**Graphical models encode dependencies between random variables using energy functions.
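
A common concrete form (the talk's exact potentials may differ) is a pairwise energy over the output variables:

$$
E(y \mid x; w) \;=\; \sum_{i} \phi_i(y_i, x; w) \;+\; \sum_{(i,j) \in \mathcal{E}} \phi_{ij}(y_i, y_j, x; w),
$$

where the unary potentials $\phi_i$ score each variable on its own and the pairwise potentials $\phi_{ij}$ encode dependencies over the edges $\mathcal{E}$, e.g. smoothness between neighboring pixels.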

**9.-**Conditional random fields model the conditional distribution of outputs given inputs.
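
Given such an energy, the CRF takes the Gibbs form

$$
p(y \mid x; w) \;=\; \frac{\exp\big(-E(y \mid x; w)\big)}{Z(x; w)}, \qquad Z(x; w) \;=\; \sum_{y'} \exp\big(-E(y' \mid x; w)\big),
$$

where the partition function $Z$ sums over every joint output configuration, which is what makes exact learning and inference expensive.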

**10.-**Learning CRFs ideally minimizes the loss we care about at test time, which is difficult, so surrogate losses are used instead.

**11.-**CRF surrogate losses are convex in the parameters of log-linear models.
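
For instance, with a log-linear energy $E(y \mid x; w) = -w^\top \phi(x, y)$, the negative conditional log-likelihood

$$
\ell(w) \;=\; \sum_{n} \Big[ \log Z(x^{(n)}; w) \;-\; w^\top \phi\big(x^{(n)}, y^{(n)}\big) \Big]
$$

is convex in $w$, because $\log Z$ is a log-sum-exp of affine functions of $w$.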

**12.-**One solution is to make CRFs less shallow by combining them with deep learning.

**13.-**Learning deep CRF models involves a double-loop algorithm with inference and parameter updates.
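
A toy sketch of that double loop on a chain-structured CRF, with a linear map standing in for the deep unary network; the sizes, data, and the choice of exact forward-backward as the inner inference step are my assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, D = 5, 3, 8                  # chain length, states per node, feature dim

X = rng.normal(size=(T, D))        # toy inputs, one feature vector per node
y = rng.integers(0, K, size=T)     # toy ground-truth labels

W = rng.normal(size=(K, D)) * 0.1  # stand-in for the deep unary net
P = rng.normal(size=(K, K)) * 0.1  # pairwise potentials (held fixed here)

def marginals(U, P):
    """Inner loop: exact chain inference (forward-backward in log space)."""
    a, b = np.zeros((T, K)), np.zeros((T, K))
    for t in range(1, T):          # forward messages
        a[t] = np.logaddexp.reduce(a[t-1][:, None] + U[t-1][:, None] + P, axis=0)
    for t in range(T - 2, -1, -1): # backward messages
        b[t] = np.logaddexp.reduce(b[t+1][None, :] + U[t+1][None, :] + P, axis=1)
    logm = a + U + b
    return np.exp(logm - np.logaddexp.reduce(logm[0]))  # per-node marginals

lr = 0.1
for step in range(200):            # outer loop: parameter updates
    U = X @ W.T                    # unary scores from the current "net" (T, K)
    mu = marginals(U, P)           # inference runs before every update
    grad_U = mu - np.eye(K)[y]     # dNLL/dU = model marginals - empirical
    W -= lr * grad_U.T @ X         # backprop through the unary map
    # (the analogous update for P needs pairwise marginals; omitted here)
```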

**14.-**Inference can be approximated for efficiency, and the algorithm can be parallelized across examples and machines.

**15.-**A single-loop algorithm interleaving learning and inference is faster for general graphical models.
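
Continuing the toy setup from the sketch above, the single-loop idea, heavily simplified, keeps a persistent belief state and refines it once per parameter update instead of nesting a full inference loop; real single-loop methods update the messages of an approximate inference scheme rather than damping exact marginals:

```python
beliefs = np.full((T, K), 1.0 / K)   # persistent approximate marginals

for step in range(200):
    U = X @ W.T
    # one damped inference refinement per step, not an inner loop to convergence
    beliefs = 0.5 * beliefs + 0.5 * marginals(U, P)
    W -= lr * (beliefs - np.eye(K)[y]).T @ X
```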

**16.-**Character recognition experiments show that jointly training deep nets and CRFs improves performance.

**17.-**Image tagging experiments demonstrate faster convergence with the single-loop algorithm.

**18.-**Semantic segmentation performance improves by 3% when jointly learning deep features and CRF parameters.

**19.-**Instance-level segmentation is more challenging due to permutation invariance but can be addressed with ordering heuristics.

**20.-**Building maps of the world from aerial imagery is possible with deep structured models.

**21.-**Deep structured models have been applied in various domains, with increasing popularity.

**22.-**Directly minimizing task loss during training is desirable but challenging due to non-differentiability.

**23.-**Under mild regularity conditions, the update converges to the correct gradient when directly minimizing the task loss.

**24.-**Training with arbitrarily complicated loss functions is possible using a modified update rule.
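
Roughly, in the form given by the direct loss minimization literature (McAllester et al., 2010, extended to deep networks in the work this talk presents), the update replaces the gradient of the non-differentiable task loss $L$ with a finite difference of score gradients:

$$
\nabla_w \, L(y, y_w) \;=\; \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \Big[ \nabla_w F(x, y_\varepsilon; w) \;-\; \nabla_w F(x, y_w; w) \Big],
$$

where $y_w = \arg\max_{y'} F(x, y'; w)$ is the model's prediction and $y_\varepsilon = \arg\max_{y'} \big[ F(x, y'; w) + \varepsilon\, L(y, y') \big]$ is a loss-perturbed prediction; each update thus needs only two inference calls, and any loss that can be folded into inference is usable.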

**25.-**Experiments on average precision ranking, action classification, and object detection show benefits of direct loss optimization.

**26.-**Label noise significantly degrades performance of cross-entropy and hinge loss, but direct loss optimization is robust.

**27.-**Deep learning is popular for learning embeddings of sentences, images, and multimodal data.

**28.-**Prior knowledge of relationships between concepts can be incorporated into embedding spaces.

**29.-**Hierarchical relationships like hypernymy, entailment, and abstraction can be encoded in embeddings.

**30.-**Creating embeddings that respect partial order hierarchies is an interesting research direction.
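
Concretely, in the order-embeddings formulation this direction builds on (Vendrov et al., ICLR 2016, co-authored by the speaker), concepts live in $\mathbb{R}^N_{\ge 0}$ under the reversed product order, and a candidate pair is scored by how badly it violates that order:

$$
x \preceq y \;\iff\; \forall i:\; x_i \ge y_i, \qquad E(x, y) \;=\; \big\lVert \max(0,\, y - x) \big\rVert^2,
$$

so $E(x, y) = 0$ exactly when the pair respects the hierarchy (e.g., an image below its caption, a word below its hypernym) and grows with the violation.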

Knowledge Vault built by David Vivancos 2024