Knowledge Vault 2/92 - ICLR 2014-2023
Masashi Sugiyama ICLR 2023 - Invited Talk - Importance-Weighting Approach to Distribution Shift Adaptation

Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

graph LR
classDef reliable fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef weakly fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef noisy fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef transfer fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px;
A[Masashi Sugiyama ICLR 2023] --> B[Reliable ML: challenges, improve reliability. 1]
B --> C[Topics: weakly supervised, noisy labels, transfer learning. 2]
C --> D[Weakly supervised uses weak data, not fully labeled. 3]
D --> E[PU classification: positive, unlabeled samples only. 4]
D --> F[Other binary problems: PC, UU, SD, PNU. 5]
D --> G[Multi-class problems: complementary, partial, single-class labels. 6]
D --> H["Weakly Supervised Learning" book: unified framework. 7]
C --> I[Noisy label learning: train with label noise. 8]
I --> J[Loss correction by estimating noise transition matrix. 9]
I --> K[Volume minimization jointly estimates classifier, noise matrix. 10]
C --> L[Importance weight ratio estimation without separate densities. 11]
L --> M[Joint importance-predictor estimation minimizes test risk bound. 12]
L --> N[Online ensemble for continuous covariate shift. 13]
C --> O[Handle distribution shift beyond covariate shift. 14]
O --> P[Minibatch-wise approach for arbitrary joint shift. 15]
A --> Q[Future directions: combine techniques, handle joint shift. 16]
Q --> R[Practical: balance updates, robustness to malicious data. 17]
Q --> S[Estimate class prior in PU learning. 18]
Q --> T[Estimate noise matrix end-to-end by volume minimization. 19]
Q --> U[Meta-learn dynamic learning rate for online shift. 20]
Q --> V[Estimate input densities from empirical samples. 21]
Q --> W[Bridge theory and deep learning practice gap. 22]
Q --> X[Combine weakly supervised and dynamic feature learning. 23]
Q --> Y[Analyze joint representation, importance, shift methods. 24]
Q --> Z[Scale shift handling in large language models. 25]
Z --> AA[Question need for adaptation in large models. 26]
Z --> AB[Continual learning under shift in language models. 27]
Z --> AC[Limited memory approaches for joint shift adaptation. 28]
A --> AD[Overview: weakly supervised, noisy label, transfer learning. 29]
AD --> AE[Themes: risk estimation, importance, noise matrices, algorithms. 30]
class A,B,AD,AE reliable;
class C,D,E,F,G,H weakly;
class I,J,K noisy;
class L,M,N,O,P transfer;
class Q,R,S,T,U,V,W,X,Y,Z,AA,AB,AC future;

Resume:

1.-The talk focuses on reliable machine learning, addressing challenges like insufficient information, label noise, and data bias to improve system reliability.

2.-Three main topics are covered: weakly supervised learning, noisy label learning, and transfer learning, with the goal of more reliable ML.

3.-Weakly supervised classification uses weak supervision like positive and unlabeled data instead of fully labeled data, which is often too costly.

4.-Positive-Unlabeled (PU) classification trains a classifier using only positive and unlabeled samples, without any negative samples, by estimating risk functionals.
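
The risk-rewriting idea behind PU classification can be made concrete with a small numerical sketch. Below is a minimal NumPy illustration of a non-negative PU risk estimate (in the spirit of the standard unbiased/non-negative PU formulations), assuming the positive class prior pi is known; the sigmoid surrogate loss, the toy scores, and the value pi=0.4 are illustrative assumptions, not the talk's exact formulation.

```python
import numpy as np

def sigmoid_loss(z, t):
    """Surrogate loss in (0, 1); t is the target sign, +1 or -1."""
    return 1.0 / (1.0 + np.exp(t * z))

def nn_pu_risk(scores_p, scores_u, pi):
    """Non-negative PU risk estimate from positive and unlabeled classifier scores.

    R(f) ~ pi * E_p[l(f(x), +1)] + max(0, E_u[l(f(x), -1)] - pi * E_p[l(f(x), -1)]):
    unlabeled data stand in for the negative class, positive data correct the bias,
    and the max(0, .) keeps the negative-part risk estimate non-negative.
    """
    risk_pos = pi * np.mean(sigmoid_loss(scores_p, +1))
    risk_neg = np.mean(sigmoid_loss(scores_u, -1)) - pi * np.mean(sigmoid_loss(scores_p, -1))
    return risk_pos + max(0.0, risk_neg)

# Toy usage: scores f(x) from some classifier on positive and unlabeled samples.
rng = np.random.default_rng(0)
scores_p = rng.normal(+1.0, 1.0, size=100)   # f(x) on positive samples
scores_u = rng.normal(0.0, 1.5, size=500)    # f(x) on unlabeled samples
print(nn_pu_risk(scores_p, scores_u, pi=0.4))
```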

5.-Other weakly supervised binary classification problems include Positive-Confidence, Unlabeled-Unlabeled, Similar-Dissimilar, and Positive-Negative-Unlabeled classification, solvable using the same risk estimation framework.

6.-Multi-class weakly supervised problems like complementary labels, partial labels, and single-class confidence can also be addressed within the empirical risk minimization framework.

7.-The book "Weakly Supervised Learning" covers this topic in detail, providing a unified framework that works with any loss function, classifier, optimizer, and regularizer.

8.-Noisy label learning aims to train classifiers from data with noisy labels, which is challenging especially for input-dependent label noise.

9.-Loss correction methods based on estimating the noise transition matrix T can handle noisy labels, but T is difficult to estimate accurately.
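
As one concrete instantiation of T-based loss correction, the sketch below shows "forward" correction: the model's predicted clean-class posterior is pushed through T to obtain noisy-class probabilities before taking the cross-entropy against the observed noisy label. Here T[i, j] is taken to mean P(observed label j | true label i); this is a hedged illustration of the general technique, not the exact estimator from the talk.

```python
import numpy as np

def forward_corrected_ce(clean_probs, noisy_label, T):
    """Forward loss correction with an (estimated) noise transition matrix T.

    clean_probs: model's predicted distribution over clean classes, shape (K,)
    noisy_label: observed, possibly corrupted label index
    T[i, j] = P(observed label j | true label i); rows sum to 1
    """
    noisy_probs = clean_probs @ T   # p(noisy=j | x) = sum_i p(clean=i | x) * T[i, j]
    return -np.log(noisy_probs[noisy_label] + 1e-12)

# Toy usage: 3 classes with 20% symmetric label noise.
K = 3
T = np.full((K, K), 0.2 / (K - 1)) + np.eye(K) * (0.8 - 0.2 / (K - 1))
clean_probs = np.array([0.7, 0.2, 0.1])
print(forward_corrected_ce(clean_probs, noisy_label=1, T=T))
```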

10.-A volume minimization approach is proposed to jointly estimate the classifier and noise transition matrix T by minimizing the simplex volume.
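
Building on the same forward-correction idea, here is a hedged sketch of what such a joint objective can look like: the classifier and T are fitted together, with the volume of T penalized via a log-determinant term (a common proxy for the volume of the simplex spanned by T's rows). The penalty form, the weight lam, and the function names are assumptions for illustration rather than the talk's exact formulation.

```python
import numpy as np

def volmin_objective(clean_probs_batch, noisy_labels, T, lam=1e-2):
    """Joint objective: forward-corrected cross-entropy plus a volume penalty on T.

    clean_probs_batch: classifier outputs over clean classes, shape (n, K)
    noisy_labels: observed labels, shape (n,)
    T: row-stochastic noise transition matrix, shape (K, K)
    lam: weight of the volume term; log|det T| serves as a proxy for the
         volume of the simplex spanned by the rows of T.
    """
    noisy_probs = clean_probs_batch @ T                               # (n, K)
    idx = np.arange(len(noisy_labels))
    ce = -np.mean(np.log(noisy_probs[idx, noisy_labels] + 1e-12))
    volume = np.log(np.abs(np.linalg.det(T)) + 1e-12)
    return ce + lam * volume

# In practice the classifier producing clean_probs_batch and T (kept row-stochastic,
# e.g. via a row-wise softmax parameterization) are optimized jointly by gradient
# descent on this objective, so T is learned end-to-end with the classifier.
```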

11.-Methods are proposed for directly estimating the importance weight ratio between test and train distributions without estimating them separately.
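
One widely used direct estimator is least-squares importance fitting (uLSIF-style), which fits a kernel model for the ratio w(x) = p_test(x) / p_train(x) without estimating either density. The Gaussian kernel, the choice of centers, and the regularization strength below are illustrative assumptions; this is a minimal sketch, not the talk's specific method.

```python
import numpy as np

def gaussian_kernel(X, centers, sigma=1.0):
    """Gaussian kernel matrix between rows of X and the kernel centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_weights(X_train, X_test, sigma=1.0, lam=1e-3):
    """Least-squares importance fitting: w(x) ~ sum_b alpha_b * k(x, c_b).

    Solves min_alpha 0.5 alpha' H alpha - h' alpha + 0.5 lam ||alpha||^2, where
    H averages k(x, .) k(x, .)' over training points and h averages k(x, .) over
    test points; returns estimated importance weights on the training points.
    """
    centers = X_test[: min(100, len(X_test))]           # kernel centers from the test set
    Phi_tr = gaussian_kernel(X_train, centers, sigma)   # (n_tr, b)
    Phi_te = gaussian_kernel(X_test, centers, sigma)    # (n_te, b)
    H = Phi_tr.T @ Phi_tr / len(X_train)
    h = Phi_te.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return np.maximum(Phi_tr @ alpha, 0.0)              # clip negative ratio estimates

# Toy usage: 1-D covariate shift, training N(0, 1) vs. test N(0.5, 0.8^2).
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))
X_te = rng.normal(0.5, 0.8, size=(200, 1))
print(ulsif_weights(X_tr, X_te)[:5])
```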

12.-A joint importance-predictor estimation method minimizes a justifiable upper bound on the test risk, improving upon two-step importance weighting approaches.

13.-Under continuous covariate shift where the input distribution changes over time, an online ensemble approach achieves optimal dynamic regret without knowing shift speed.
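
A hedged sketch of the online-ensemble idea: several base learners run in parallel with different step sizes (each suited to a different, unknown shift speed) and are combined with multiplicative weight updates. The linear base learner, squared loss, and weighting rule here are generic assumptions for illustration, not the talk's exact algorithm or regret analysis.

```python
import numpy as np

class OnlineEnsemble:
    """Combine online learners with different step sizes via multiplicative weights."""

    def __init__(self, dim, etas=(0.005, 0.02, 0.08), hedge_lr=0.5):
        self.models = [np.zeros(dim) for _ in etas]     # one linear model per step size
        self.etas = etas
        self.weights = np.ones(len(etas)) / len(etas)   # ensemble weights
        self.hedge_lr = hedge_lr

    def predict(self, x):
        preds = np.array([w @ x for w in self.models])
        return float(self.weights @ preds)

    def update(self, x, y):
        # Reweight base learners by their instantaneous squared losses, then let
        # each learner take its own gradient step with its own learning rate.
        losses = np.array([(w @ x - y) ** 2 for w in self.models])
        self.weights *= np.exp(-self.hedge_lr * losses)
        self.weights /= self.weights.sum()
        for w, eta in zip(self.models, self.etas):
            w -= eta * 2.0 * (w @ x - y) * x

# Toy usage: the input distribution drifts slowly over time (continuous covariate shift).
rng = np.random.default_rng(0)
ens = OnlineEnsemble(dim=2)
for t in range(1000):
    x = rng.normal(loc=t / 1000.0, scale=0.5, size=2)
    y = 3.0 * x[0] - 1.0 * x[1]
    ens.update(x, y)
print(ens.predict(np.array([1.0, 1.0])))
```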

14.-Reliable machine learning requires handling distribution shift beyond covariate shift, as the test domain may not be covered by the training domain.

15.-For arbitrary joint shift where both P(x) and P(y|x) change, a minibatch-wise approach dynamically estimates importance weights by loss matching.
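
For reference, the standard identity behind importance weighting, written for the joint-shift case: the test risk equals a weighted expectation under the training distribution, with a weight that now depends on both x and y.

```latex
R_{\mathrm{test}}(f)
  = \mathbb{E}_{p_{\mathrm{test}}(x,y)}\bigl[\ell(f(x),y)\bigr]
  = \mathbb{E}_{p_{\mathrm{train}}(x,y)}\!\left[\frac{p_{\mathrm{test}}(x,y)}{p_{\mathrm{train}}(x,y)}\,\ell(f(x),y)\right]
```

Under covariate shift only p(x) changes, so the weight reduces to p_test(x)/p_train(x) as in point 11; under arbitrary joint shift p(y|x) changes as well, so the full ratio over (x, y) must be tracked, which is what the minibatch-wise estimation in point 15 addresses.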

16.-Future directions include combining joint shift adaptation with weakly supervised learning, handling continuous joint shift, and incorporating limited memory continual learning.

17.-Practical considerations include balancing frequent model updates to reflect new data with robustness to malicious data through periodic/buffered updating schemes.

18.-Estimating the class prior probability p in PU learning is challenging and requires assumptions like positive-negative separability; various estimation methods have been proposed.
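
As one concrete example among the various estimation methods, the sketch below follows an Elkan-Noto-style estimator under the "selected completely at random" assumption: a probabilistic classifier g is trained to separate labeled from unlabeled points, the labeling propensity c = P(labeled | positive) is estimated as the mean of g over the labeled positives, and the prior follows as P(labeled) / c. The logistic-regression choice, function names, and toy data are illustrative assumptions, not necessarily the approach discussed in the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_class_prior(X_pos, X_unl):
    """Estimate the positive class prior pi from positive and unlabeled data.

    Assumes labeled positives are selected completely at random from all positives,
    so P(labeled | x) = c * P(y=+1 | x) with a constant c = P(labeled | y=+1).
    """
    X = np.vstack([X_pos, X_unl])
    s = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])  # labeling indicator
    g = LogisticRegression(max_iter=1000).fit(X, s)
    c = g.predict_proba(X_pos)[:, 1].mean()          # estimate of P(labeled | y=+1)
    p_labeled = len(X_pos) / len(X)                   # empirical P(labeled)
    return min(p_labeled / c, 1.0)                    # pi = P(labeled) / c, clipped

# Toy usage (censoring setup): true prior 0.4, and 30% of positives get labeled.
rng = np.random.default_rng(0)
y = rng.random(2000) < 0.4
X = np.where(y, rng.normal(+2, 1, 2000), rng.normal(-2, 1, 2000)).reshape(-1, 1)
labeled = y & (rng.random(2000) < 0.3)
print(estimate_class_prior(X[labeled], X[~labeled]))  # rough estimate of 0.4
```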

19.-In noisy label learning, the noise transition matrix T can be estimated end-to-end using a volume minimization approach with simplicial constraints.

20.-Meta-learning approaches to dynamically estimate the learning rate in online learning under continuous distribution shift are a promising research direction.

21.-Marginal input densities in importance weighting methods can be estimated from empirical samples, enabling practical implementation with representation learning models.

22.-Bridging the gap between theoretical analysis and deep learning practice in reliable machine learning is an ongoing challenge and opportunity.

23.-Weakly supervised learning techniques can potentially be combined with dynamic feature learning in practice to boost robustness and performance.

24.-Analyzing combined methods that jointly learn representations, estimate importance weights, and adapt to distribution shift remains an open theoretical problem.

25.-Scaling techniques for handling distribution shift in very large language models during fine-tuning is an important research problem.

26.-The need for domain adaptation in large pre-trained models is questioned, as their generality may already suffice for many domains.

27.-Continual learning under distribution shift in large language models is a key scenario requiring techniques that avoid storing all data.

28.-Limited memory approaches for continual joint distribution shift adaptation are crucial for scalability but require further research and development efforts.

29.-The talk gives an overview of reliable machine learning research spanning weakly supervised, noisy label, and transfer learning settings.

30.-Key themes include estimating risk functionals, importance weights and noise transition matrices, aiming to provide practical algorithms with theoretical guarantees.

Knowledge Vault built by David Vivancos 2024