The End Of Knowledge - Vault 2 - ICLR (2014-2023) - Girmaw Abebe Tadesse ICLR 2023

graph LR
classDef microsoft fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef ai fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef health fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef data fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef representation fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef collaboration fill:#d4f9f9, font-weight:bold, font-size:14px;
A[Girmaw Abebe Tadesse ICLR 2023] --> B[Girmaw: Microsoft AI for Good research scientist, Africa lead. 1]
A --> C[Trustworthy AI needs data understanding, deviation detection. 2]
A --> D[Girmaw's research: maternal, newborn, child health in Africa. 3]
D --> E[Africa, Global South underrepresented in AI research. 4]
C --> F[Data crucial in AI, ensures trustworthiness, interpretability. 5]
F --> G[Systematic data deviations inform quality, robustness, attacks, drift. 6]
G --> H[Automated deviation identification, characterization overcomes manual limitations. 7]
G --> I[Deviation detection methods vary in expectations, optimization, description. 8]
D --> J[Demographic health surveys detect changes, lagging subpopulations. 9]
D --> K[Better Birth study: reduce newborn deaths, checklist intervention. 10]
K --> L[Deviation techniques found irregularities, high-risk mothers. 11]
K --> M[Intervention helped normal gestation, known parity, no abortion. 12]
D --> N[Dermatology representation issues: skin diseases, types, care. 13]
N --> O[Girmaw validated dermatology robustness, out-of-distribution samples. 14]
N --> P[Dermatology textbooks underrepresent dark skin, impacts care. 15]
E --> Q[Representation beyond healthcare generative models for discovery. 16]
Q --> R[Girmaw applied deviation techniques to small molecule generation. 17]
C --> S[Domain expert collaboration crucial for validation, meaningful solutions. 18]
S --> T[Domain expert training data consideration, cyclic issues. 19]
C --> U[AI Fairness 360, Robustness toolboxes for deviation exploration. 20]
C --> V[Deviation expectations depend on task: representation, detection, effects. 21]
S --> W[Domain experts set expectations based on knowledge, practices. 22]
C --> X[Deviation challenges: analysis, interpretation, validation, expert communication. 23]
X --> Y[Randomization testing validates real deviations, not correlations. 24]
C --> Z[Deviation methods consider available data, not unobserved confounders. 25]
Z --> AA[Deviation findings correlational, not causal, may have confounders. 26]
S --> AB[Early expert involvement key, navigates expectations, shares knowledge. 27]
S --> AC[Deviations uncover unknown insights, confirm findings, foster engagement. 28]
A --> AD[Researchers focus on positive impact in AI pipelines. 29]
S --> AE[Expert collaboration validates AI pipelines, ensures meaningful outcomes. 30]
class B microsoft;
class C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,U,V,W,X,Y,Z,AA,AD,AE ai;
class D,J,K,L,M,N,O,P health;
class F,G,H,I,T,U,V,W,X,Y,Z,AA data;
class E,N,O,P,Q representation;
class S,T,W,AB,AC,AE collaboration;

Summary:

1.-Girmaw is a principal research scientist at the Microsoft AI for Good Lab and leads its efforts on the African continent.

2.-Trustworthy AI solutions require understanding data and detecting systematic deviations to unlock their full potential.

3.-Girmaw's research focuses on healthcare, specifically maternal, newborn, and child health, a critical issue in Africa.

4.-Lack of representation of Africa and the Global South in AI research can adversely affect those populations.

5.-Data is crucial in AI pipelines; understanding data helps interpret model outputs and ensure trustworthiness.

6.-Systematic deviations in data can inform data quality, robustness, adversarial attacks, and temporal drift.

7.-Automated identification and characterization of systematic deviations in data help overcome manual evaluation limitations.

8.-Existing methods for detecting systematic deviations vary in setting expectations, optimizing size vs. severity, and describing subgroups.

9.-Girmaw used demographic health surveys to detect longitudinal changes and identify subpopulations lagging in health improvements.

10.-The Better Birth study aimed to reduce newborn deaths using a Safe Childbirth Checklist intervention.

11.-Systematic deviation techniques identified data collection irregularities and mothers with the highest risk of neonatal death.

12.-The intervention helped mothers with normal gestational age, known parity, and no abortion history, demonstrating heterogeneous treatment effects.

13.-Dermatology faces representation issues, as skin diseases manifest differently across skin types, affecting the quality of care.

14.-Girmaw validated robustness in dermatology datasets by detecting out-of-distribution samples from new disease conditions and environmental settings.

15.-Dermatology textbooks underrepresent images of darker skin tones, potentially impacting the quality of care for diverse populations.

16.-Representation issues extend beyond healthcare; generative models must also be well understood if they are to support scientific discovery effectively.

17.-Girmaw applied systematic deviation techniques to understand patterns in small molecule generation models for various applications.

18.-Collaboration with domain experts is crucial for validating findings and ensuring the developed solutions are meaningful.

19.-Data used to train domain experts should also be considered, as problematic data can lead to cyclic issues.

20.-Toolboxes like AI Fairness 360 and Robustness are available for exploring systematic deviations in data.

21.-Setting expectations for systematic deviations depends on the task, such as representation, out-of-distribution detection, or treatment effects.

22.-Domain experts can also set expectations based on their knowledge and desired deviations from day-to-day practices.

23.-Challenges in identifying systematic deviations include exploratory analysis, interpretation, validation, and communication with domain experts.

24.-Randomization testing helps validate findings as real deviations rather than spurious correlations.

25.-Systematic deviation methods focus on the available data and do not consider unobserved confounders.

26.-Findings from systematic deviations are correlational, not causal, and may be linked to unknown confounders.

27.-Involving domain experts early in the design process is key to navigating expectations and sharing knowledge.

28.-Systematic deviations can uncover insights unknown to domain experts and confirm obvious findings, fostering engagement.

29.-Researchers should focus on the positive impact they want to achieve in their AI pipelines.

30.-Collaboration with domain experts is essential for validating different AI pipelines and ensuring meaningful outcomes.
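
Points 7 and 8 describe automatically identifying subgroups whose outcomes deviate from an expectation, trading off subgroup size against severity. The following is a minimal sketch of that idea, not the method used in the talk: it exhaustively enumerates feature-value subgroups and ranks them with a simple stand-in score (rate gap times the square root of subgroup size). The records are synthetic, and the names `gestation`, `parity_known`, `event`, `make_records`, and `scan_subgroups` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def make_records():
    """Build a small synthetic dataset (illustrative only, not real study data)."""
    # (gestation, parity_known, subgroup size, number of events)
    spec = [("preterm", "no", 20, 10), ("preterm", "yes", 20, 4),
            ("normal", "no", 30, 3), ("normal", "yes", 30, 3)]
    records = []
    for gestation, parity_known, n, events in spec:
        for i in range(n):
            records.append({"gestation": gestation,
                            "parity_known": parity_known,
                            "event": 1 if i < events else 0})
    return records


def scan_subgroups(records, outcome, features, max_order=2, min_size=5):
    """Rank subgroups by how far their outcome rate sits above the global rate."""
    # Expectation: the outcome rate over the whole dataset.
    base_rate = sum(r[outcome] for r in records) / len(records)
    results = []
    for order in range(1, max_order + 1):
        for feats in combinations(features, order):
            # Every value combination that actually occurs for these features.
            seen = {tuple(r[f] for f in feats) for r in records}
            for values in seen:
                members = [r for r in records
                           if tuple(r[f] for f in feats) == values]
                if len(members) < min_size:
                    continue
                rate = sum(r[outcome] for r in members) / len(members)
                # Assumed score: severity (rate gap) weighted by sqrt(size).
                score = (rate - base_rate) * sqrt(len(members))
                results.append((score, dict(zip(feats, values)),
                                len(members), rate))
    results.sort(key=lambda t: t[0], reverse=True)
    return base_rate, results


records = make_records()
base_rate, ranked = scan_subgroups(records, "event",
                                   ["gestation", "parity_known"])
top_score, top_description, top_size, top_rate = ranked[0]
print("global rate:", base_rate)
print("most deviating subgroup:", top_description, top_size, top_rate)
```

Exhaustive enumeration is only practical for a handful of features; the score, the `min_size` cutoff, and the choice of the global rate as the expectation are all assumptions that, per points 21 and 22, would normally be set by the task or by domain experts.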

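Point 24 mentions randomization testing as a check that a detected deviation is real rather than spurious. One common way to do this is a permutation test: shuffle the outcomes, recompute the strongest subgroup score each time, and see how often chance alone matches the observed score. The sketch below assumes that approach and the same simple stand-in score (rate gap times square root of size); the data are synthetic and the function names are hypothetical.

```python
import random
from math import sqrt


def max_deviation(outcomes, subgroups):
    """Largest score over all candidate subgroups (each a collection of indices)."""
    base = sum(outcomes) / len(outcomes)
    best = float("-inf")
    for idx in subgroups:
        rate = sum(outcomes[i] for i in idx) / len(idx)
        best = max(best, (rate - base) * sqrt(len(idx)))
    return best


def randomization_p_value(outcomes, subgroups, n_perm=500, seed=0):
    """Permutation test: how often does shuffled data score at least as high?"""
    rng = random.Random(seed)
    observed = max_deviation(outcomes, subgroups)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any link between subgroup membership and outcome.
        rng.shuffle(shuffled)
        if max_deviation(shuffled, subgroups) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (1 + hits) / (1 + n_perm)


# Synthetic outcomes: 10 events among the first 20 records, 5 in each later block.
outcomes = ([1] * 10 + [0] * 10) + ([1] * 5 + [0] * 35) + ([1] * 5 + [0] * 35)
subgroups = [range(0, 20), range(20, 60), range(60, 100)]

observed, p_value = randomization_p_value(outcomes, subgroups)
print("observed max score:", observed)
print("randomization p-value:", p_value)
```

Taking the maximum over all candidate subgroups inside each permutation accounts for the fact that the subgroup was found by searching. As points 25 and 26 note, a small p-value still indicates an association in the available data, not a causal effect.
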
Knowledge Vault built by David Vivancos 2024