Knowledge Vault 6/57 - ICML 2020
Learning despite the unknown - missing data imputation in healthcare
Mihaela van der Schaar

Concept Graph & Summary using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:

graph LR
    classDef main fill:#f9d4f9,font-weight:bold,font-size:14px
    classDef challenges fill:#f9d4d4,font-weight:bold,font-size:14px
    classDef methods fill:#d4f9d4,font-weight:bold,font-size:14px
    classDef models fill:#d4d4f9,font-weight:bold,font-size:14px
    classDef applications fill:#f9f9d4,font-weight:bold,font-size:14px
    classDef future fill:#d4f9f9,font-weight:bold,font-size:14px
    Main[Learning despite the unknown - missing data imputation in healthcare] --> A[Challenges in Healthcare ML]
    Main --> B[Data Handling Methods]
    Main --> C[ML Models and Techniques]
    Main --> D[Applications and Systems]
    Main --> E[Future Directions]
    A --> A1[Healthcare ML: complex, ill-defined, hard verifying 1]
    A --> A2[Missing data imputation crucial for AutoML 3]
    A --> A3[Clinical judgments shape missing data patterns 10]
    A --> A4[Inference requires change point detection, MLE 14]
    A --> A5[Imputation handles mixed data types 28]
    A --> A6[Bidirectional RNNs unsuitable for clinical predictions 29]
    B --> B1[GAIN: effective multiple imputations without complete data 4]
    B --> B2[GAIN generalizes GANs with discriminator hints 5]
    B --> B3[GAIN outperforms as missing rates increase 6]
    B --> B4[MRNN: interpolation, imputation for time series 7]
    B --> B5[MRNN adapts bidirectional RNNs to causal 8]
    B --> B6[MRNN outperforms baselines in various datasets 9]
    C --> C1[Semi-Markov model captures patient trajectories 11]
    C --> C2[Hawkes processes model clinician sampling behavior 12]
    C --> C3[Gaussian processes model irregular vital signs 13]
    C --> C4[Real-time inference uses forward filtering, programming 15]
    C --> C5[Informative sampling improves deterioration predictions 16]
    C --> C6[Model discovers states, provides interpretability 17]
    D --> D1[AutoML adapts models to changing situations 2]
    D --> D2[Active information collection determines screening strategies 18]
    D --> D3[Deep Sensing explores cost-performance trade-offs 19]
    D --> D4[Clairvoyance: unified pipeline for personalized predictions 20]
    D --> D5[Autoprognosis builds entire ML pipelines 25]
    D --> D6[ML system forecasts COVID-19 risks, resources 27]
    E --> E1[ML revolutionizes healthcare with precision medicine 21]
    E --> E2[Augment, not replace, medical personnel 22]
    E --> E3[EHR data improves clinical practice, research 23]
    E --> E4[Causal discovery improves drug development 24]
    E --> E5[COVID-19 requires ML for clinical decisions 26]
    E --> E6[Clinical information value requires adaptive learning 30]
    class Main main
    class A,A1,A2,A3,A4,A5,A6 challenges
    class B,B1,B2,B3,B4,B5,B6 methods
    class C,C1,C2,C3,C4,C5,C6 models
    class D,D1,D2,D3,D4,D5,D6 applications
    class E,E1,E2,E3,E4,E5,E6 future

Summary:

1.- Machine learning for healthcare is complex due to ill-defined problems and solutions that are hard to verify.

2.- Automated machine learning enables crafting models for various diseases and needs, adapting to changing situations.

3.- Missing data imputation is crucial in clinical datasets for effective automated machine learning.

4.- GAIN (Generative Adversarial Imputation Nets) performs effective multiple imputations even when complete data is unavailable.

5.- GAIN generalizes GANs by providing hints to the discriminator about which data is real and which is imputed.

6.- GAIN outperforms other imputation methods, especially as missing rates increase and in complex missing data scenarios.

7.- Multidirectional Recurrent Neural Networks (MRNN) perform both interpolation and imputation for time series data in clinical settings.

8.- MRNN adapts bidirectional RNNs to be causal, learning from current and past data without using future information.

9.- MRNN outperforms state-of-the-art baselines in various datasets with different dimensions, missing data amounts, and sampling rates.

10.- Clinical judgments shape missing data patterns, which can be learned from to improve predictions.

11.- A probabilistic model using a semi-Markov process can capture patient trajectories and informative sampling in clinical settings.

12.- Hawkes point processes model clinicians' sampling behavior, capturing the impact of patient health on observation frequency.

13.- Switching multitask Gaussian processes model temporal correlations in irregularly sampled vital signs and lab tests.

14.- Inference and learning in doubly stochastic models require change point detection and maximum likelihood estimation techniques.

15.- Real-time inference can be performed using forward filtering and dynamic programming.

16.- Learning from informatively sampled data improves performance in predicting patient deterioration compared to traditional risk scores.

17.- The probabilistic model enables discovery of distinct clinical states and provides model interpretability.

18.- Active collection of information determines who to screen, when to screen, and what information to acquire.

19.- Deep Sensing learns the value of information by exploring different cost-performance trade-offs through deliberate missingness.

20.- Clairvoyance is a unified end-to-end pipeline for personalized prediction, treatment planning, and monitoring in longitudinal settings.

21.- Machine learning can revolutionize healthcare by delivering precision medicine and improving clinical pathways.

22.- The vision is to augment clinicians and medical personnel rather than replace them.

23.- Electronic health record data can be used to improve clinical practice and research.

24.- Causal discovery informed by various data sources can lead to better drug development.

25.- Autoprognosis builds entire pipelines including missing data imputation, feature processing, classification, and calibration.

26.- COVID-19 presents complex challenges requiring machine learning to assist in difficult clinical decisions.

27.- A machine learning system for COVID-19 forecasts personalized risks and resource requirements at hospital and national levels.

28.- Imputation methods need to handle mixed data types (categorical and continuous) in clinical datasets.

29.- Bidirectional RNNs, while effective in some domains, are not causal and thus unsuitable for clinical predictions.

30.- The value of information in clinical settings is unknown and dynamically changing, requiring adaptive learning approaches.
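
Items 4 and 5 describe how GAIN works from a mask of observed entries and a hint passed to the discriminator. The sketch below illustrates only that data flow in plain Python, not the adversarial training. The names `make_hint`, `combine`, and `column_mean_generator` are hypothetical. The column-mean "generator" is a stand-in for the trained network. The hint rule (reveal each mask entry with probability `hint_rate`, otherwise emit 0.5) is an assumed formulation and is not taken from the talk.

```python
import random


def make_hint(mask, hint_rate, rng):
    """Reveal each mask entry with probability hint_rate; otherwise emit 0.5 (unknown).

    Assumed hint rule for illustration; the exact rule used in GAIN may differ.
    """
    return [[float(m) if rng.random() < hint_rate else 0.5 for m in row]
            for row in mask]


def combine(data, mask, generated):
    """Keep observed entries (mask == 1) and take generated values where mask == 0."""
    return [[x if m == 1 else g for x, m, g in zip(d_row, m_row, g_row)]
            for d_row, m_row, g_row in zip(data, mask, generated)]


def column_mean_generator(data, mask):
    """Hypothetical stand-in for the trained generator: per-column mean of observed values."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row, m_row in zip(data, mask) if m_row[j] == 1]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [list(means) for _ in data]


# Toy dataset: None marks a missing entry.
data = [[1.0, 2.0], [3.0, None], [None, 6.0]]
mask = [[0 if x is None else 1 for x in row] for row in data]

generated = column_mean_generator(data, mask)
imputed = combine(data, mask, generated)
hint = make_hint(mask, hint_rate=0.9, rng=random.Random(0))

print(imputed)  # observed values kept, missing ones filled by the stand-in generator
print(hint)     # what the discriminator would see alongside the imputed matrix
```

With `hint_rate=1.0` the hint equals the mask and the discriminator's task is trivial; with `hint_rate=0.0` the hint carries no information. Intermediate values sit between these two extremes.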

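Item 12 refers to Hawkes point processes for clinicians' sampling behavior. As a rough illustration of the self-exciting idea, the sketch below evaluates the standard exponential-kernel intensity, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). The function name `hawkes_intensity` and all parameter values are illustrative assumptions; the model in the talk also ties sampling to the patient's state and may be parameterized differently.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Exponential-kernel Hawkes intensity at time t.

    mu    : baseline observation rate
    alpha : jump in intensity caused by each past observation
    beta  : decay rate of that excitation
    Only events strictly before t contribute.
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in event_times if s < t)


# Hypothetical observation times (hours) for one patient.
observations = [1.0, 1.5, 1.8]

for t in (0.5, 2.0, 6.0):
    rate = hawkes_intensity(t, observations, mu=0.2, alpha=0.5, beta=1.0)
    print(f"t={t:.1f}h  intensity={rate:.3f}")
```

The intensity sits at the baseline before any observation, rises right after a cluster of observations, and decays back toward the baseline as time passes without new measurements.
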
Knowledge Vault built by David Vivancos 2024