Meta-Learning With Differentiable Convex Optimization

Kwonjoon Lee; Subhransu Maji; Avinash Ravichandran; Stefano Soatto

**Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
classDef fewshot fill:#d4f9d4, font-weight:bold, font-size:14px
classDef metalearning fill:#d4d4f9, font-weight:bold, font-size:14px
classDef results fill:#f9f9d4, font-weight:bold, font-size:14px
A[Meta-Learning With Differentiable<br>Convex Optimization] --> B[Few-shot: Generalize from few samples. 1]
B --> I[Prototypical nets: Average, classify by prototype. 2]
A --> C[Meta-learning: Embeddings generalize across tasks. 3]
C --> D[Process: Model, meta-loss, backpropagate, meta-learn. 4]
A --> E[Linear predictors: SVM, regression in network. 5]
E --> F[SVM beats nearest neighbor: Adaptive, scalable. 6]
E --> G[SVM gradient: Closed-form, leveraging convexity. 7]
E --> H[Dual formulation: Computational efficiency, linear combination. 8]
A --> J[Results]
J --> K[miniImageNet, tieredImageNet: MetaOptNet improves accuracy. 9]
J --> L[CIFAR-FS, FC100: Similar gains, larger gaps. 10]
J --> M[Meta-training shot: More shots, one-time training. 11]
class A main
class B,I fewshot
class C,D metalearning
class J,K,L,M results
```

**Resume:**

**1.-** Few-shot classification: Computing classification models that generalize to unseen test sets, given few training samples per category.

**2.-** Prototypical networks: Embeds training samples, computes class prototypes by averaging, classifies test examples based on nearest prototype.
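The prototype step above can be sketched in a few lines (a minimal NumPy sketch; the toy 2-D "embeddings" and the squared-Euclidean metric are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np

def prototypes(support_emb, support_lbl, n_classes):
    """Average the embedded support samples of each class into a prototype."""
    return np.stack([support_emb[support_lbl == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the class of its nearest prototype."""
    # squared distances: shape (n_query, n_classes)
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# toy 2-way, 2-shot episode in a 2-D embedding space
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, n_classes=2)
preds = classify(np.array([[0., 0.5], [5., 5.5]]), protos)  # → [0, 1]
```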

**3.-** Meta-learning objective: Learning feature embeddings that generalize well across tasks when used with nearest class prototype classifier.

**4.-** Meta-learning process: Compute classification model, calculate meta loss measuring generalization error, meta-learn embedding by backpropagating error signal across tasks.
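Per episode, the meta-loss is simply the query-set loss of the task-specific classifier; a hedged sketch using negative-squared-distance logits and cross-entropy (the softmax-over-prototypes head here is borrowed from prototypical networks for brevity, not the paper's SVM head):

```python
import numpy as np

def meta_loss(protos, query_emb, query_lbl):
    """Cross-entropy generalization error of the episode's classifier on
    the query set; during meta-training this error is backpropagated
    through the classifier into the embedding network."""
    logits = -((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(query_lbl)), query_lbl].mean()

# well-separated prototypes => near-zero meta-loss on these queries
protos = np.array([[0., 0.], [4., 4.]])
loss = meta_loss(protos, np.array([[0., 1.], [4., 3.]]), np.array([0, 1]))
```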

**5.-** Linear predictors (SVM, logistic/ridge regression): Proposed for computing the classification model; a convex optimizer is incorporated into the deep network to solve them.

**6.-** Advantages of SVM over nearest neighbor: Adaptive (task-dependent inference-time adaptation), scalable (less overfitting with larger embeddings, outperforms nearest neighbor in high dimensions).

**7.-** Gradient computation for SVM: Obtained closed-form gradient expression for embedding network without differentiating through optimization, by leveraging convex nature of problem.
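As a concrete instance, the ridge-regression head admits a closed-form task-level solution, so its gradient with respect to the embeddings requires no unrolled optimizer (a sketch of the standard identity; here \(X\) denotes the support embeddings, \(Y\) the one-hot labels, \(\lambda\) the regularizer, and \(\theta\) the embedding-network parameters):

```latex
W^{*} = (X^{\top} X + \lambda I)^{-1} X^{\top} Y,
\qquad
\frac{\partial \mathcal{L}_{\text{meta}}}{\partial \theta}
= \frac{\partial \mathcal{L}_{\text{meta}}}{\partial W^{*}}
  \,\frac{\partial W^{*}}{\partial X}
  \,\frac{\partial X}{\partial \theta}.
```

For the SVM head, which has no such explicit solution, the analogous gradient follows from implicitly differentiating the optimality (KKT) conditions of the convex problem.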

**8.-** Dual formulation: Addressed computational issues with large embedding dimensions by solving dual problem, expressing model as linear combination of training embeddings.
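The ridge-regression case makes the dual trick concrete: with n support points and embedding dimension d >> n, solving an n-by-n system instead of a d-by-d one yields the same predictor, now expressed as a linear combination of support embeddings (a NumPy sketch under these assumptions; variable names are illustrative):

```python
import numpy as np

def ridge_primal(X, Y, lam):
    """Primal solve: inverts a (d x d) matrix."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_dual(X, Y, lam):
    """Dual solve: inverts only an (n x n) matrix; the predictor
    W = X^T alpha is a linear combination of support embeddings."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), Y)
    return X.T @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))  # n=5 support samples, d=100 dims
Y = np.eye(5)                      # one-hot labels, 5-way episode
W1, W2 = ridge_primal(X, Y, 0.1), ridge_dual(X, Y, 0.1)
# the two solutions agree, by the Woodbury matrix identity
```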

**9.-** Results on miniImageNet, tieredImageNet: MetaOptNet improves accuracy over prototypical networks by ~3%, at a 30-50% increase in inference time. The ridge-regression variant is comparable.

**10.-** Results on CIFAR-FS, FC100: Similar performance on CIFAR-FS, 3% improvement over prototypical networks on harder FC100 dataset with larger train/test class gaps.

**11.-** Influence of meta-training shot: Model performance generally increases with more meta-training shots, enabling one-time high-shot training for all meta-test scenarios.

Knowledge Vault built by David Vivancos 2024