Knowledge Vault 6/3 - ICML 2015
Learning Treatment Policies in Mobile Health
Susan Murphy
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:

graph LR
    classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef overview fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef data fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef trials fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef policy fill:#f9d4f9, font-weight:bold, font-size:14px
    classDef challenges fill:#d4f9f9, font-weight:bold, font-size:14px

    Main[Learning Treatment Policies in Mobile Health]
    Main --> A[Mobile Health Interventions Overview]
    A --> A1[Adaptive mobile health intervention for behaviors 1]
    A --> A2[Studies: smoking cessation, sedentary behavior reduction 2]
    A --> A3[Determine context for intrusive interventions 3]
    A --> A4[Various interventions, focus on treatment provision 7]
    A --> A5[Contextual smartphone activity suggestion example 8]
    A --> A6[Treatments require availability of person 9]
    Main --> B[Data Collection and Structure]
    B --> B1[Time series data: observations, actions, responses 4]
    B --> B2[Decision times: regular intervals or requests 5]
    B --> B3[Observations: passive sensors, minimal self-reports 6]
    B --> B4[Availability enables intervention, non-availability informative 16]
    Main --> C[Micro-randomized Trials]
    C --> C1[Micro-randomized trials assess intervention effects 10]
    C --> C2[Randomization prevents confounding in observational data 11]
    C --> C3[Assess intervention effect signal over time 12]
    C --> C4[Enable addressing various research questions 13]
    C --> C5[Main effects: difference in proximal responses 14]
    C --> C6[Effects change: habituation, burden, availability 15]
    Main --> D[Statistical Considerations]
    D --> D1[Marginal effect estimated, averaged over context 17]
    D --> D2[Low-dimensional hypotheses enable smaller trials 18]
    D --> D3[Within-person contrasts increase statistical power 19]
    D --> D4[HeartSteps: 40 people, 80% power, 0.1 effect 20]
    Main --> E[Policy Learning]
    E --> E1[Learning when to push interventions 21]
    E --> E2[Current policies use domain theories 22]
    E --> E3[Interpretable, stochastic policies improve engagement 23]
    E --> E4[Average reward aligns with intervention goals 24]
    E --> E5[Bellman equation enables off-policy learning 25]
    E --> E6[Estimators use reversed importance sampling weights 26]
    Main --> F[Case Studies and Challenges]
    F --> F1[Smoking cessation study: twice-daily interventions 27]
    F --> F2[Mindfulness policy based on self-control, burden 28]
    F --> F3[Interventions: no self-control increase, no burden 29]
    F --> F4[Open problems: missing data, sensors, causality 30]

    class Main main
    class A,A1,A2,A3,A4,A5,A6 overview
    class B,B1,B2,B3,B4 data
    class C,C1,C2,C3,C4,C5,C6,D,D1,D2,D3,D4 trials
    class E,E1,E2,E3,E4,E5,E6 policy
    class F,F1,F2,F3,F4 challenges

Resume:

1.- Goal is to construct a continually learning mobile health intervention that helps people maintain healthy behaviors and adjusts to challenges.

2.- Two example studies: smoking cessation coach using wearables and sedentary behavior reduction for heart attack patients using smartphones.

3.- Push interventions on phone/wearables can be intrusive, so it's important to determine the right context to deliver suggestions.

4.- Mobile health studies generate time series data for each person with observations, actions (interventions), and proximal response measures.
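
To make this structure concrete, the short Python sketch below represents one person's trajectory as a list of decision points; the class and field names are illustrative choices, not terminology from the talk.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DecisionPoint:
    """One decision time in a person's mobile-health time series (illustrative fields)."""
    context: dict             # passively sensed and (minimal) self-reported observations
    available: bool           # whether the person can receive a treatment right now
    action: int               # randomized treatment indicator (1 = intervention pushed)
    prob_action: float        # known randomization probability used at this decision time
    proximal_response: float  # near-term outcome, e.g. step count over the following 30 minutes

# A mobile health study yields one such time series per participant.
Trajectory = List[DecisionPoint]
```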

5.- Decision times can be at regular intervals (e.g., every minute or every few hours) or when the person requests support.

6.- Observations include passively collected sensor data and actively collected self-report data. Goal is to minimize self-report burden.

7.- Wide variety of intervention actions possible (cognitive, behavioral, social, etc). Focus here is on whether to provide a treatment.

8.- Example of a smartphone activity suggestion, tailored to context. Person can accept, dismiss or snooze the suggestion.

9.- Providing treatments requires the person to be available; someone who is driving, is already walking, or has turned interventions off is not available at that time.

10.- Micro-randomized trials randomize each individual at each decision point. Enable causal effects of pushing interventions to be assessed.
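
As a minimal sketch of this design (assuming, purely for illustration, a constant treatment probability of 0.5 and a 40% availability rate), micro-randomization amounts to flipping a coin for each available person at every decision time:

```python
import random

def micro_randomize(available: bool, p_treat: float = 0.5) -> int:
    """Randomize one person at one decision time.

    Returns 1 (push the intervention) with probability p_treat if the person
    is available, and 0 otherwise; unavailable decision times are still
    recorded, since non-availability is itself informative.
    """
    if not available:
        return 0
    return 1 if random.random() < p_treat else 0

# Illustrative trial: 40 participants, 210 decision times each.
actions = [[micro_randomize(available=random.random() < 0.4) for _ in range(210)]
           for _ in range(40)]
```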

11.- In observational data, without randomization, the effects of interventions are confounded with the reasons why individuals choose to access them.

12.- Goals include assessing whether there is any signal that pushing interventions has an effect, and how that effect changes over time.

13.- Also want to enable a variety of other questions to be addressed with the resulting data beyond just treatment effects.

14.- Main effects are the difference in average proximal response, among available individuals, between those who received the intervention and those who did not.
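
In symbols (standard micro-randomized-trial notation, not copied from the slides): with availability indicator I_t, randomized action A_t, and proximal response Y_{t+1}, the main effect at decision time t is the contrast among available person-decision points,

```latex
\beta(t) = \mathbb{E}\left[ Y_{t+1} \mid A_t = 1,\ I_t = 1 \right]
         - \mathbb{E}\left[ Y_{t+1} \mid A_t = 0,\ I_t = 1 \right].
```

Because A_t is randomized, this contrast has a causal interpretation, and (as item 17 notes) it is marginal: averaged over whatever contexts available people happen to be in at time t.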

15.- Effects can change over time due to habituation, burden, and changing composition of people who remain available.

16.- Availability means a person can receive the intervention at that time. Non-availability can provide useful information.

17.- A marginal, population-level effect is estimated, averaged over current context. Allows a relatively simple initial analysis.

18.- Propose using low-dimensional, smooth alternative hypotheses to enable sizing the trial with fewer participants while maintaining power.

19.- Within-person contrasts of response when treated vs not increase power and reduce required sample size compared to between-person contrasts.

20.- In HeartSteps, a 40-person study provides 80% power to detect a 0.1 standardized effect size with 40% availability.
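
The published sample-size method for micro-randomized trials is more refined, but the crude back-of-envelope below (my simplification, not the HeartSteps calculation) shows why roughly 40 people can suffice: the unit of analysis is the available person-decision point, so the effective sample size is in the thousands. The 5 decision times per day over 6 weeks and the independence assumption are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_people: int, decision_points: int, availability: float,
                 effect_size: float, p_treat: float = 0.5, alpha: float = 0.05) -> float:
    """Rough two-sided z-test power, treating available person-decision points as
    independent observations (ignores within-person correlation and time trends)."""
    n_avail = n_people * decision_points * availability
    # Standard error of the treated-vs-untreated mean difference, in standardized units.
    se = sqrt(1.0 / (n_avail * p_treat) + 1.0 / (n_avail * (1.0 - p_treat)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - NormalDist().cdf(z_crit - effect_size / se)

# HeartSteps-like configuration: 40 people, 5 decision times/day for 42 days, 40% availability.
print(approx_power(n_people=40, decision_points=5 * 42, availability=0.4,
                   effect_size=0.1))  # ~0.8
```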

21.- How can micro-randomized trial data be used to learn a treatment policy for when to push interventions in each context?

22.- Current approaches fully specify treatment policies using domain theories. Goal is to use data to inform the policy.

23.- Want interpretable policies that experts can vet. Stochastic policies may improve engagement by retarding habituation to messages.
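
One way to get a policy that is both interpretable and stochastic, sketched below as an illustrative parameterization rather than the one from the talk, is a clipped logistic rule over a few hand-picked context features; clipping keeps every context's treatment probability away from 0 and 1, which preserves exploration and may slow habituation.

```python
import math
from typing import Sequence

def treatment_probability(theta: Sequence[float], context: Sequence[float],
                          p_min: float = 0.1, p_max: float = 0.8) -> float:
    """Stochastic policy: probability of pushing an intervention given the context.

    A clipped logistic function of a small, interpretable feature vector
    (e.g. intercept, recent burden, current weather); theta is learned from data.
    """
    logit = sum(t * x for t, x in zip(theta, context))
    p = 1.0 / (1.0 + math.exp(-logit))
    return min(max(p, p_min), p_max)

# Example with made-up parameters: intercept plus two context features.
print(treatment_probability(theta=[0.2, -1.0, 0.5], context=[1.0, 0.3, 1.0]))
```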

24.- Average reward formulation aligns with goal of keeping people in states with lower burden to enable response to interventions.

25.- The Bellman equation forms the basis for off-policy learning, since the conditional expectation does not depend on the stationary distribution induced by the policy.
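
Written out in standard average-reward notation (my paraphrase, not the slide's exact formula), the average reward of a policy pi and the Bellman equation for its differential value function are

```latex
\eta(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} R_t \right],
\qquad
V_{\pi}(s) = \mathbb{E}_{\pi}\!\left[ R_t - \eta(\pi) + V_{\pi}(S_{t+1}) \,\middle|\, S_t = s \right].
```

Because the second equation conditions on the current state, it yields estimating equations that can be evaluated on data collected under the micro-randomization policy, without sampling from the stationary distribution induced by pi.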

26.- This enables forming estimators with reversed importance sampling weights, approximating the differential value function, and maximizing over policy parameters.
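
The sketch below shows the simplest version of the importance-weighting idea: re-weight each decision point by the ratio of the candidate policy's action probability to the known randomization probability, then search over the policy parameters for the highest estimated average reward. This is a simplified per-decision weighting for illustration, not the estimator built on the differential value function described in the talk.

```python
from typing import Callable, Sequence, Tuple

def off_policy_average_reward(
    data: Sequence[Tuple[Sequence[float], int, float, float]],
    policy_prob: Callable[[Sequence[float]], float],
) -> float:
    """Per-decision importance-weighted average reward of a candidate policy.

    data: (context, action, prob_action, reward) tuples from the trial, where
          prob_action is the known randomization probability of the action taken.
    policy_prob: maps context -> P(treat | context).
    """
    num = den = 0.0
    for context, action, prob_action, reward in data:
        p_treat = policy_prob(context)
        pi_prob = p_treat if action == 1 else 1.0 - p_treat
        weight = pi_prob / prob_action  # importance weight
        num += weight * reward
        den += weight
    return num / den if den else 0.0

# Example usage with the clipped logistic policy above, for fixed parameters theta:
# eta_hat = off_policy_average_reward(data, lambda c: treatment_probability(theta, c))
```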

27.- Analysis of a smoking cessation study with no sensor data, with twice-daily self-reports and interventions over 14 days.

28.- Estimating policy for when to provide mindfulness interventions based on self-control demands and indicated burden, despite small sample.

29.- Results suggest providing interventions most often when there's no increase in self-control demands and no indicated burden.

30.- Many open problems remain, including missing data, reducing self-report in favor of sensors, causal inference issues, and confidence intervals.

Knowledge Vault built by David Vivancos 2024