The End Of Knowledge - Vault 2 - ICLR (2014-2023)

graph LR
classDef speaker fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef nlp fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef medical fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef vision fill:#f9d4f9, font-weight:bold, font-size:14px;
A[Regina Barzilay<br/>ICLR 2017] --> B[Speaker: ICLR NLP papers<br/>limited topics, ample data. 1]
A --> C[Speaker: Lacking data,<br/>wanting good performance. 2]
C --> D[Medical domains common issue. 3]
D --> E[Speaker's breast cancer experience:<br/>ML underutilized despite data. 4]
A --> F[Trials: small, biased subset<br/>of oncology decisions. 5]
A --> G[Speaker compelled to work<br/>on impactful NLP problems. 6]
G --> H[Medical records info extraction<br/>challenges without labeled data. 7]
H --> I[Rule-based medical systems<br/>due to lacking data. 8]
A --> J[Goal: High accuracy with<br/>limited doctor supervision. 9]
J --> K[Approach: Task-specific encodings<br/>using keyword supervision. 10]
K --> L[Adversarial training aligns<br/>encodings, single classifier. 11]
L --> M[Experiments: Rivaled in-domain data,<br/>outperformed baselines. 12]
A --> N[MGH system extracts<br/>breast cancer attributes. 13]
N --> O[Interpretability key for<br/>doctor trust, utilization. 14]
O --> P[Goal: Extractive rationales<br/>without annotations. 15]
P --> Q[Beer review experiments:<br/>Maintained accuracy, outperformed. 16]
A --> R[MGH doctors use<br/>rationale system. 17]
A --> S[Beyond NLP: Analyze<br/>raw measurements, images. 18]
S --> T[Mammograms to detect<br/>early cancer signs. 19]
S --> U[Match radiologist ability<br/>to predict BI-RADS. 20]
U --> V[Breast density prediction<br/>92% accuracy, solved. 21]
U --> W[BI-RADS very challenging,<br/>below radiologist accuracy. 22]
W --> X[NYU study corroborated<br/>difficulty despite data. 23]
A --> Y[Abnormality annotation improved<br/>to 0.85 AUC. 24]
A --> Z[Vision model needs<br/>improvement before reports. 25]
A --> AA[Speaker requests ideas<br/>to predict risk. 26]
A --> AB[Speaker motivated by<br/>experience, impactful problems. 27]
A --> AC[Audience can tackle<br/>low-resource challenges. 28]
class A,B,C,AB,AC speaker;
class D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R nlp;
class S,T,U,V,W,X,Y,Z,AA vision;
class D,E,F,N,T,U,V,W,X,Y medical;
class J,K,L,M learning;

Resume:

1.-Speaker was surprised by the many interesting NLP papers at the ICLR conference, but most focused on a limited set of topics with ample training data.

2.-Speaker asked what to do when training data is scarce but good performance is still required, a common situation in medical domains.

3.-Speaker's experience with breast cancer and its treatment revealed that machine learning was barely being used, despite ample available data.

4.-Oncology decisions in the U.S. rely on clinical trials that enroll only 3% of patients, a small and biased subset.

5.-After treatment, speaker felt compelled to work on impactful problems where NLP could make a difference, despite limited training data.

6.-Collaborating with doctors revealed challenges in extracting information from medical records when lacking labeled data for every condition.

7.-Many medical information extraction systems remain rule-based today due to lack of labeled training data.

8.-The goal became achieving high accuracy with the limited supervision doctors could provide in a few minutes, by transferring knowledge between related tasks.

9.-Approach: Generate task-specific encodings of medical records using limited keyword supervision indicating sentence relevance to the condition.

10.-Adversarial training aligns the source and target encodings so that a single classifier works for both tasks, while a reconstruction objective preserves context.

11.-Experiments showed the approach rivaled training on in-domain data and outperformed baselines, discovering sentiment representations that transfer across domains.

12.-System is now implemented and used at Massachusetts General Hospital (MGH) to extract breast cancer attributes from pathology reports.

13.-Adding interpretability is key for doctors to trust and utilize machine learning predictions in medical settings.

14.-Goal: Learn to provide extractive rationales for classifications, without rationale annotations, by jointly training a generator and a predictor.

15.-Beer review experiments showed extractive rationales maintained accuracy while aligning well with human rationales and outperforming baselines.

16.-Rationale system is used by doctors at MGH to quickly see classification explanations, make corrections, and retrain the model.

17.-Speaker realized the biggest impact requires going beyond NLP to analyze raw measurements such as mammograms, which could reveal early signs of cancer.

18.-Studies suggest that patterns of tissue change can be identified in mammograms before cancer appears, possibly allowing cancer to be prevented with chemoprevention.

19.-First step: Match human radiologists' ability to predict breast density and cancer risk (BI-RADS score) from mammograms.

20.-Breast density prediction from mammograms achieved 92% accuracy versus 86% agreement between human readers, making it effectively a solved task.

21.-Predicting the BI-RADS score to identify the 1% of women who need re-examination was very challenging, with accuracy far below that of radiologists.

22.-Poor results were corroborated by a recent NYU study, suggesting a very difficult task despite large datasets.

23.-Annotating abnormality locations in a thousand mammograms improved performance to an AUC of 0.85, better but still not great.

24.-Speaker hoped to use radiology reports to guide the vision model to abnormalities, but the vision model needs to improve first.

25.-Speaker requests ideas from the audience on improving mammogram interpretation to predict cancer risk.

26.-Speaker was motivated by personal experience to work on impactful problems without much labeled data.

27.-Speaker says audience doesn't need to go through a similar ordeal to be motivated to tackle low-resource challenges.
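
Item 9 describes task-specific encodings driven by keyword supervision. The sketch below is a simplified, hypothetical illustration, not the speaker's implementation: it assumes each sentence already has a vector, treats a sentence as relevant when it contains one of the doctor-supplied keywords (where the described approach would learn relevance from that signal), and builds the encoding as a relevance-weighted average of sentence vectors. The function names, keywords, sentences and vectors are all invented.

```python
from typing import List, Sequence


def sentence_relevance(sentence: str, keywords: Sequence[str]) -> float:
    """Hypothetical weak relevance label: 1.0 if any keyword occurs, else 0.0.

    Stands in for a relevance scorer that would be learned from keyword hits.
    """
    text = sentence.lower()
    return 1.0 if any(k.lower() in text for k in keywords) else 0.0


def task_encoding(sentences: List[str], vectors: List[List[float]],
                  keywords: Sequence[str]) -> List[float]:
    """Relevance-weighted average of sentence vectors for one condition."""
    weights = [sentence_relevance(s, keywords) for s in sentences]
    total = sum(weights)
    dim = len(vectors[0])
    if total == 0:
        return [0.0] * dim  # no relevant sentence found
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(dim)]


# Toy report sentences and 2-d sentence vectors (invented for illustration).
report = [
    "Invasive ductal carcinoma identified.",
    "Patient tolerated the procedure well.",
    "Carcinoma is ER positive.",
]
vecs = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
print(task_encoding(report, vecs, ["carcinoma"]))  # [2.0, 0.0]
```

Because irrelevant sentences receive zero weight, the same report yields different encodings for different conditions, which is the sense in which the encodings are task-specific.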
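
The loss balance in item 10 can be made concrete by evaluating it. This minimal sketch assumes a binary task classifier applied to labeled source examples, a binary domain discriminator (source = 0, target = 1) and precomputed per-example reconstruction errors; the weights `lam` and `mu` and all names are hypothetical, and the code only evaluates the objective rather than training a network. The discriminator would be trained to minimize the domain loss, while the encoder minimizes the combined value, which rewards encodings the discriminator cannot separate. A gradient-reversal layer is one common way to implement this min-max; the exact formulation of the system described above is not specified here.

```python
import math
from typing import Sequence, Tuple


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    eps = 1e-12  # avoid log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def encoder_objective(cls_probs: Sequence[float], cls_labels: Sequence[int],
                      dom_probs: Sequence[float], dom_labels: Sequence[int],
                      recon_errors: Sequence[float],
                      lam: float = 0.1, mu: float = 1.0) -> Tuple[float, float]:
    """Return (encoder objective, discriminator loss); hypothetical weighting.

    cls_*: task predictions/labels for labeled *source* examples only.
    dom_*: discriminator outputs/labels for source (0) and target (1) encodings.
    """
    task_loss = mean([bce(p, y) for p, y in zip(cls_probs, cls_labels)])
    domain_loss = mean([bce(p, d) for p, d in zip(dom_probs, dom_labels)])
    recon_loss = mean(recon_errors)
    # Minus sign: the encoder is rewarded when the discriminator fails.
    return task_loss - lam * domain_loss + mu * recon_loss, domain_loss


obj, dom = encoder_objective([0.9], [1], [0.5, 0.5], [0, 1], [0.0])
print(round(dom, 4))  # 0.6931 (discriminator at chance: ln 2)
```

When the discriminator separates the domains confidently, its loss shrinks and the encoder's objective rises, so the encoder is pushed toward domain-invariant encodings while the task and reconstruction terms keep them useful.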
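
For item 14, a common way to train a rationale generator jointly with a predictor is to score each candidate selection by the predictor's error on the selected words plus penalties that keep the selection short and contiguous. The sketch only evaluates that cost for a fixed binary mask; the squared-error loss (suited to a numeric target such as a beer-review rating), the penalty weights and the names are assumptions. Because a discrete mask is not differentiable, training typically samples masks from the generator and updates it with a policy-gradient estimate, which is omitted here.

```python
from typing import List, Sequence


def select(tokens: Sequence[str], mask: Sequence[int]) -> List[str]:
    """Rationale = tokens whose mask bit is 1; the predictor sees only these."""
    return [t for t, z in zip(tokens, mask) if z]


def rationale_cost(mask: Sequence[int], prediction: float, target: float,
                   lam_sparse: float = 0.01, lam_contig: float = 0.02) -> float:
    """Hypothetical weights; lower cost = better rationale."""
    # Error of the predictor when it reads only the selected tokens.
    error = (prediction - target) ** 2
    # Sparsity: number of selected tokens.
    n_selected = sum(mask)
    # Contiguity: number of on/off switches between neighbouring tokens.
    n_switches = sum(abs(a - b) for a, b in zip(mask[1:], mask[:-1]))
    return error + lam_sparse * n_selected + lam_contig * n_switches


tokens = ["pours", "a", "hazy", "golden", "color", "."]
contiguous = [0, 0, 1, 1, 1, 0]
scattered = [1, 0, 1, 0, 1, 0]
print(select(tokens, contiguous))  # ['hazy', 'golden', 'color']
```

With the same prediction error and the same number of selected words, the scattered mask pays for five on/off switches versus two, so the contiguous phrase gets the lower cost.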
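
Agreement with human rationales (item 15) can be measured in several ways; one simple choice, assumed here, is token-level precision: the share of tokens the model selected that a human annotator also marked. Positions are token indices, and the spans below are invented for illustration.

```python
from typing import Iterable, Set


def rationale_precision(selected: Iterable[int], human: Set[int]) -> float:
    """Fraction of model-selected token positions that a human also marked."""
    chosen = set(selected)
    if not chosen:
        return 0.0  # an empty rationale gets no credit
    return len(chosen & human) / len(chosen)


human_span = {2, 3, 4}  # toy annotation: tokens 2-4 describe appearance
print(rationale_precision([2, 3, 4], human_span))     # 1.0
print(rationale_precision([1, 2, 3, 4], human_span))  # 0.75
```

Precision alone would favor very short selections, so it is read together with task accuracy.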
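
Item 20 compares model accuracy with agreement between human readers. The sketch computes both as simple match rates on invented density labels (categories A to D); the labels are not real data, and using raw percent agreement rather than a chance-corrected statistic such as Cohen's kappa is an assumption.

```python
from typing import Sequence


def agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of cases where two label sequences give the same category."""
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy density categories for five exams (invented, not study data).
reference = ["B", "C", "C", "D", "A"]
reader_2 = ["B", "B", "B", "D", "A"]
model = ["B", "C", "C", "D", "B"]

print(agreement(model, reference))     # 0.8 (model accuracy)
print(agreement(reader_2, reference))  # 0.6 (inter-reader agreement)
```

When a model's accuracy against the reference exceeds how often two readers agree with each other, its remaining errors are within ordinary reader variability, which is one reading of calling the task solved.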
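
Item 21 concerns a heavily imbalanced target: when only 1% of exams need re-examination, plain accuracy can be misleading. On an invented cohort of 1,000 exams with 10 positives, a classifier that never flags anyone still scores 99% accuracy while catching none of the cases:

```python
from typing import Sequence


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def recall(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of true positives that the classifier flags."""
    flags_on_positives = [p for p, t in zip(pred, truth) if t == 1]
    return sum(flags_on_positives) / len(flags_on_positives)


truth = [1] * 10 + [0] * 990  # toy cohort: 1% positives
never_flag = [0] * 1000       # trivial classifier: always "no re-exam"

print(accuracy(never_flag, truth))  # 0.99
print(recall(never_flag, truth))    # 0.0
```

This is why ranking metrics such as AUC are the usual yardstick for screening tasks of this kind.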
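
The 0.85 in item 23 is an area under the ROC curve (AUC). AUC equals the probability that a randomly chosen positive exam is scored higher than a randomly chosen negative one, with ties counting half. The pairwise sketch below computes it directly from that definition on invented scores; it is quadratic in the number of exams, so at scale a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be used instead.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # toy model scores
print(auc(scores, labels))  # 0.8333333333333334
```

Here five of the six positive/negative pairs are ordered correctly (the positive scored 0.4 falls below one negative), giving 5/6. A model that assigns the same score to every exam gets exactly 0.5.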

Knowledge Vault built by David Vivancos 2024