Knowledge Vault 2/76 - ICLR 2014-2023
Kate Saenko ICLR 2021 - Invited Talk - Is My Dataset Biased?

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

```mermaid
graph LR
  classDef bias fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef adaptation fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef alignment fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef challenges fill:#f9f9d4, font-weight:bold, font-size:14px;
  classDef future fill:#f9d4f9, font-weight:bold, font-size:14px;
  A[Kate Saenko ICLR 2021] --> B[Dataset bias: inadequate rare situation coverage 1]
  A --> C[Collecting more data: difficult, costly 2]
  A --> D[Domain adaptation: source to target 3]
  D --> E[Source labeled, target unlabeled, distributional difference 4]
  D --> F[Distribution shift causes poor performance 5]
  A --> G[Domain confusion aligns distributions unsupervised 6]
  A --> H[Adversarial alignment: discriminator vs encoder 7]
  A --> I[Pixel-level adaptation: conditional GANs translate style 8]
  I --> J[Few-shot translation: COCO-FUNIT source to target 9]
  A --> K[Pixel vs feature alignment tradeoffs 10]
  A --> L[Class-conditional alignment: classifier loss aligns features 11]
  A --> M[Open set adaptation: unknown target classes 12]
  M --> N[DANCE: clusters, aligns, rejects unknown 13]
  A --> O[Cross-domain self-supervision: unsupervised nearest neighbor alignment 14]
  A --> P[Office-31, VisDA, DomainNet benchmark progress 15]
  B --> Q[Unbiased dataset difficult: equal coverage hard 16]
  B --> R[Collection shortcuts introduce bias vs control 17]
  A --> S[Adaptation techniques applicable beyond vision 18]
  S --> T[NLP news data faces domain shifts 19]
  A --> U[Unknown test domains without target harder 20]
  A --> V[Time series: continuous online adaptation 21]
  A --> W[Hyperparameter tuning challenges without target labels 22]
  A --> X[Bias/shift overlap fairness but nuanced 23]
  X --> Y[Fairness beyond accuracy: equalizing errors 24]
  X --> Z[People decisions: carefully evaluate target performance 25]
  A --> AA[Unsupervised learning progress may help shift 26]
  A --> AB[Open problems: unknown overlap, unseen generalization 27]
  A --> AC[VisDA 2021: universal adaptation competition 28]
  A --> AD[Research: open set, shift, fairness, multi-domain 29]
  A --> AE[Speaker invites discussion, questions in chat 30]
  class A,B,C,Q,R bias;
  class D,E,F,G,H,I,J,K,L,M,N,O,P,S,T,U,V,W,AA,AB,AC adaptation;
  class X,Y,Z challenges;
  class AD,AE future;
```

Resume:

1.-Dataset bias arises when a dataset fails to adequately cover rare situations or represents various attributes unevenly.

2.-Collecting more data to address bias is difficult and costly in practice, since covering every combination of attributes makes the labeling budget grow exponentially with the number of attributes.

3.-Domain adaptation is the problem of adapting a model trained on a source domain to perform well on a target domain.

4.-The source domain has labeled data while the target domain is unlabeled, and there is a distributional difference between them.

5.-Poor model performance under distribution shift is caused by differences in the distribution of training and test data points.

6.-Domain confusion aligns the source and target distributions by adding an unsupervised loss to encourage similar statistics across domains.
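To make point 6 concrete, here is a minimal PyTorch sketch of a domain-confusion style objective: a statistics-matching penalty (here, matching batch feature means) is added to the supervised source loss so source and target features acquire similar statistics. The toy encoder, dimensions, and 0.1 weight are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of a domain-confusion style loss: penalize the distance
# between source and target feature statistics, added to the usual
# supervised loss on labeled source data.
import torch
import torch.nn as nn

def moment_matching_loss(feat_src: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """Penalize differing batch means of the two feature distributions."""
    return (feat_src.mean(dim=0) - feat_tgt.mean(dim=0)).pow(2).sum()

encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU())  # toy encoder
classifier = nn.Linear(128, 10)                          # toy task head
ce = nn.CrossEntropyLoss()

x_src, y_src = torch.randn(32, 256), torch.randint(0, 10, (32,))
x_tgt = torch.randn(32, 256)                             # unlabeled target batch

f_src, f_tgt = encoder(x_src), encoder(x_tgt)
loss = ce(classifier(f_src), y_src) + 0.1 * moment_matching_loss(f_src, f_tgt)
loss.backward()
```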

7.-Adversarial domain alignment uses a domain discriminator network to distinguish domains while the encoder tries to confuse it, aligning distributions.
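A common way to implement point 7 is the gradient reversal trick from DANN-style methods; the sketch below assumes that formulation. The discriminator learns to tell domains apart, while reversed gradients push the encoder to confuse it. Layer sizes are toy placeholders.

```python
# DANN-style adversarial alignment: a gradient reversal layer flips
# gradients so the encoder learns features the domain discriminator
# cannot separate.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        # Reverse and scale the gradient flowing into the encoder.
        return -ctx.lam * grad, None

encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
ce = nn.CrossEntropyLoss()

x_src, x_tgt = torch.randn(32, 256), torch.randn(32, 256)
feats = encoder(torch.cat([x_src, x_tgt]))
domain_labels = torch.cat([torch.zeros(32), torch.ones(32)]).long()

# Discriminator tries to tell domains apart; reversed gradients make the
# encoder confuse it, aligning the two feature distributions.
logits = discriminator(GradReverse.apply(feats, 1.0))
adv_loss = ce(logits, domain_labels)
adv_loss.backward()
```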

8.-Pixel-level domain adaptation uses conditional GANs to translate source images to match the style of the target domain.

9.-Few-shot pixel-space translation, like COCO-FUNIT, translates a source image to a target style given a few target examples.
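Points 8-9 rest on image-to-image GAN translation. Below is a deliberately tiny sketch of the two adversarial losses involved (a discriminator judging real target images against translated source images, and a generator trying to fool it); the single-layer networks are stand-ins, not COCO-FUNIT's actual architecture.

```python
# Compact sketch of pixel-level adaptation: a generator translates source
# images toward target style while a discriminator judges realism.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())  # source -> target style
D = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.Flatten(),
                  nn.LazyLinear(1))                           # real/fake score
bce = nn.BCEWithLogitsLoss()

x_src, x_tgt = torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32)
fake_tgt = G(x_src)

# Discriminator: real target images vs. translated (detached) source images.
d_loss = bce(D(x_tgt), torch.ones(8, 1)) + bce(D(fake_tgt.detach()), torch.zeros(8, 1))
d_loss.backward()
# Generator: fool the discriminator so translations look like the target domain.
g_loss = bce(D(fake_tgt), torch.ones(8, 1))
g_loss.backward()
```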

10.-Pixel alignment makes adaptation effects visually interpretable but inherits GAN training issues such as instability and artifacts; feature alignment is more flexible but can fail silently.

11.-Class-conditional alignment uses the task classifier loss to align target features with source features and push them away from decision boundaries.
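One simplified way to realize point 11 is entropy minimization on target predictions: confident predictions correspond to features far from decision boundaries. This is a hedged illustration of the idea, not the exact class-conditional method from the talk.

```python
# Entropy minimization on unlabeled target data: lowering prediction
# entropy pushes target features away from decision boundaries and
# toward source class clusters.
import torch
import torch.nn.functional as F

def prediction_entropy(logits: torch.Tensor) -> torch.Tensor:
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

logits_tgt = torch.randn(32, 10, requires_grad=True)  # classifier outputs on target
ent_loss = prediction_entropy(logits_tgt)             # low entropy = far from boundary
ent_loss.backward()
```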

12.-Open set domain adaptation handles differing label spaces between source and target by detecting and rejecting unknown target classes.

13.-DANCE clusters target data, aligns known classes with source, and pushes unknown target classes away via entropy separation loss.
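A sketch in the spirit of DANCE's entropy separation loss: each target sample's prediction entropy is pushed away from a threshold rho once it lies outside a margin m, driving confident samples toward known classes and uncertain ones toward rejection as unknown. The rho and m values here are illustrative hyperparameters, not the paper's settings.

```python
# Entropy separation in the spirit of DANCE: widen the gap between a
# sample's entropy and a threshold, so entropy moves toward 0 (known
# class) or its maximum (unknown class).
import torch
import torch.nn.functional as F

def entropy_separation(logits: torch.Tensor, rho: float = 1.0, m: float = 0.5) -> torch.Tensor:
    p = F.softmax(logits, dim=1)
    ent = -(p * torch.log(p + 1e-8)).sum(dim=1)  # per-sample entropy
    gap = (ent - rho).abs()
    # Only samples clearly above/below the threshold contribute; maximizing
    # the gap separates confident (known) from uncertain (unknown) samples.
    return torch.where(gap > m, -gap, torch.zeros_like(gap)).mean()

logits = torch.randn(32, 10, requires_grad=True)
entropy_separation(logits).backward()
```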

14.-Cross-domain self-supervision finds nearest neighbors between domains to align representations without labels as an unsupervised pre-training step.
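Point 14 can be illustrated with a minimal nearest-neighbor alignment step: each target feature is pulled toward its most similar source feature under cosine similarity, giving a label-free alignment signal. The feature dimensions and the pulling loss are assumptions made for the sketch.

```python
# Cross-domain nearest-neighbor alignment: match each target feature to
# its closest source feature and pull the pair together, with no labels.
import torch
import torch.nn.functional as F

f_src = F.normalize(torch.randn(64, 128), dim=1)                   # source features
f_tgt = F.normalize(torch.randn(32, 128), dim=1).requires_grad_()  # target features

sim = f_tgt @ f_src.t()                    # cosine similarities, target x source
nn_idx = sim.argmax(dim=1)                 # nearest source neighbor per target sample
loss = (1 - sim[torch.arange(32), nn_idx]).mean()  # pull matched pairs together
loss.backward()
```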

15.-Datasets with multiple distributions like Office-31, VisDA, and DomainNet enable benchmarking progress on adaptation algorithms.

16.-Building a truly unbiased dataset is difficult, since equal coverage of all variations and latent factors in complex data is rarely achievable.

17.-Dataset collection takes shortcuts for cost/time, like web scraping, which introduces biases compared to more controlled gathering.

18.-Some adaptation techniques like distribution alignment losses are applicable beyond vision to domains like text that face similar shifts.

19.-In NLP, news data faces domain shifts due to differences in political leaning and format of news sources.

20.-Handling unknown domains at test time without target data is much harder and requires learning to generalize from diverse training domains.

21.-For time series data, domains may change continuously over time, requiring methods that can adapt in an online fashion.

22.-Applying adaptation methods in practice faces challenges in hyperparameter tuning when target labels are unavailable to guide choices.
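One heuristic practitioners use for point 22 is to rank hyperparameter settings by an unsupervised proxy, such as mean prediction entropy on unlabeled target data; the sketch below assumes that proxy. It is only a heuristic and can mislead, which is exactly the difficulty the point describes.

```python
# Unsupervised model selection proxy: prefer the hyperparameter setting
# whose model is most confident (lowest mean entropy) on target data.
import torch
import torch.nn.functional as F

def target_entropy_score(logits: torch.Tensor) -> float:
    p = F.softmax(logits, dim=1)
    return float(-(p * torch.log(p + 1e-8)).sum(dim=1).mean())

# Hypothetical target logits from models trained with different settings.
candidates = {"lr=1e-3": torch.randn(100, 10), "lr=1e-4": torch.randn(100, 10)}
best = min(candidates, key=lambda k: target_entropy_score(candidates[k]))
print("selected:", best)
```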

23.-Some overlap exists between dataset bias/shift issues and fairness issues, as both relate to performance gaps across data attributes.

24.-However, fairness involves additional considerations beyond accuracy, like equalizing error rates, that require more nuance than pure adaptation.

25.-Making decisions about people requires carefully evaluating and understanding performance on target distributions, not just naive adaptation.

26.-Recent progress in unsupervised learning could help address domain shift by learning from unlabeled data across multiple domains.

27.-Open problems remain in handling unknown class overlap between domains and generalizing to fully unseen domains without target data.

28.-A visual domain adaptation competition called VisDA will be featured at NeurIPS 2021, focused on the universal adaptation setting.

29.-Research interests include the intersection of open set recognition, domain shift, fairness, and learning from unlabeled multi-domain data.

30.-The speaker invites further discussion on these topics and will check the chat for any additional questions.

Knowledge Vault built by David Vivancos 2024