Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Time series definition: sequence of signal observations measured at uniform intervals, with examples from finance, weather, and medicine.
2.- Autoregressive (AR) model: each observation is a linear combination of previous observations plus Gaussian noise. Focus is on the AR(p) model (formula sketched after this list).
3.- Motivation: real-life time series often have missing data due to human error or equipment failures. Existing models make strict statistical assumptions.
4.- Limitations of existing approaches: they assume the data follow an AR model with Gaussian noise and that data are missing at random. Misspecification leads to suboptimal results.
5.- Goal: relax or remove statistical assumptions on time series with missing data.
6.- Setting: adversary chooses signal value arbitrarily at each time point and decides whether to reveal it. Player predicts and suffers loss.
7.- Regret criterion: compares the player's cumulative loss to the loss of the best fixed predictor in hindsight. Standard in online learning (formalized after this list).
8.- Challenge: AR prediction is not well-defined under missing data, since the prediction requires previous values that may themselves be missing.
9.- Solution: replace each missing value with its own prediction, applying the AR predictor recursively up to a finite number of steps back (D); a sketch follows this list.
10.- Example: For AR(1) with D=2, prediction uses observed value if available, else uses previous prediction recursively.
11.- Non-linearity issue: recursive prediction is non-linear in AR coefficients, preventing learning of optimal coefficients using standard online convex optimization techniques.
12.- Non-proper learning: generate predictions from a richer model and compete against best AR predictor, without directly learning AR coefficients.
13.- Expanded prediction: use a weight vector W of dimension 2^D and a feature vector phi, giving a much richer prediction that is linear in W (a construction is sketched after this list).
14.- Advantage of expanded prediction: problem becomes linear in W, enabling online convex optimization. Prediction is richer than original AR.
15.- Main result: regret bound of O(sqrt(T)) w.r.t. the best fixed recursive AR predictor in hindsight, using standard online learning algorithms (an online-gradient-descent sketch follows this list).
16.- Issues: constant factor exponential in D, and time/space complexity also exponential in D due to maintaining/updating high-dimensional W.
17.- Partial solution: efficient computation of inner products between feature vectors phi makes it possible to avoid maintaining W explicitly.
18.- Inner product computation: the coefficients depend on the number of "common good paths", i.e., paths that pass only through missing data points.
19.- Efficient path counting: the number of common good paths of length k equals 2^(number of common missing points), enabling efficient computation (a path-counting sketch follows this list).
20.- Experimental results: comparisons on synthetic data capture merits of different models. Real data experiments in paper.
21.- Stochastic data: all approaches perform similarly when data and missingness are stochastic. Online methods more robust to AR coefficient changes.
22.- Heteroscedastic noise: baselines not robust to non-Gaussian noise, but online approach is robust. Online algorithms faster and simpler to implement.
23.- Conclusions: new approach for time series prediction without requiring stochastic assumptions. Theoretical guarantees and good practical performance.
24.- Open questions: reducing the exponential dependence on D to polynomial (some progress in follow-up work), and generalizing to the multivariate case.
25.- Key aspects: online learning framework, recursive prediction for missing data, non-proper learning, efficient path counting, robustness to model misspecification.
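The sketches below make several of the points above concrete. They are illustrative reconstructions under assumed notation and conventions, not the paper's exact definitions.

For point 2, the standard AR(p) model (the coefficient names alpha_i and the noise variance sigma^2 are notational assumptions):

```latex
% AR(p): each observation is a linear combination of the p previous
% observations plus zero-mean Gaussian noise.
X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2)
```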
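For point 7, the regret of the player against the best fixed predictor in hindsight, here written for squared loss (the specific loss function is an assumption of this sketch):

```latex
% Regret after T rounds: the player's cumulative loss minus that of the
% best fixed AR coefficient vector alpha chosen with full hindsight.
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \big(\hat{X}_t - X_t\big)^2
\;-\; \min_{\alpha} \sum_{t=1}^{T} \big(\hat{X}_t(\alpha) - X_t\big)^2
```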
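For points 9-10, a minimal Python sketch of the recursive AR predictor. Falling back to 0 once the recursion budget is exhausted is an assumption of this sketch, not necessarily the paper's convention:

```python
import numpy as np

def recursive_ar_predict(x, observed, coeffs, t, depth):
    """Predict x[t] with an AR(p) model, substituting each missing past
    value with its own recursive prediction, up to `depth` recursion steps."""
    pred = 0.0
    for i, a in enumerate(coeffs, start=1):
        s = t - i
        if s < 0:
            continue                  # before the start of the series
        if observed[s]:
            past = x[s]               # observed value: use it directly
        elif depth > 0:
            past = recursive_ar_predict(x, observed, coeffs, s, depth - 1)
        else:
            past = 0.0                # recursion budget exhausted (assumption)
        pred += a * past
    return pred

# Point 10's AR(1) flavor with a1 = 0.5: x[1] and x[2] are missing, so the
# prediction at t = 3 unrolls to a1 * (a1 * (a1 * x[0])) = 0.125.
x = np.array([1.0, np.nan, np.nan, np.nan])
observed = ~np.isnan(x)
print(recursive_ar_predict(x, observed, coeffs=[0.5], t=3, depth=2))
```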
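For points 13-14, one plausible concretization of the 2^D-dimensional feature vector phi: each coordinate is indexed by a subset of the last D time points, read as a candidate path back in time, and is nonzero only for a "good path" (every intermediate point visited is missing and the endpoint is observed). The indexing scheme here is an assumption; the paper's construction may differ in detail:

```python
import itertools
import numpy as np

def phi(x, observed, t, D):
    """Expanded feature vector of dimension 2^D for predicting x[t].
    Each binary mask encodes the lags a candidate path visits; the
    coordinate is nonzero only if the path is 'good': all intermediate
    lags missing, endpoint observed. <W, phi> is then linear in W."""
    feats = np.zeros(2 ** D)
    for idx, mask in enumerate(itertools.product([0, 1], repeat=D)):
        lags = [i + 1 for i, b in enumerate(mask) if b]   # visited lags
        if not lags:
            continue
        endpoint = t - lags[-1]                 # oldest visited point
        inner = [t - l for l in lags[:-1]]      # intermediate points
        if endpoint < 0:
            continue
        if observed[endpoint] and all(not observed[s] for s in inner):
            feats[idx] = x[endpoint]            # good path: use endpoint value
    return feats
```

For a genuine recursive AR predictor, the matching coordinate of W would be the product of AR coefficients along the path's jumps, which is what makes this richer class contain every recursive AR predictor.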
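For point 15, a minimal online-gradient-descent sketch over the explicit 2^D-dimensional W (practical only for small D; points 16-17 are precisely about avoiding this explicit representation). It reuses phi() from the sketch above; squared loss and the step size eta are assumptions:

```python
def ogd_over_expanded_weights(x, observed, D, eta=0.1):
    """Online gradient descent on W: predict <W, phi_t>, observe the loss,
    take a gradient step. Updates happen only on rounds where x[t] is
    revealed, since the loss cannot be evaluated otherwise."""
    W = np.zeros(2 ** D)
    losses = []
    for t in range(D, len(x)):
        f = phi(x, observed, t, D)
        pred = W @ f                              # prediction, linear in W
        if observed[t]:
            losses.append((pred - x[t]) ** 2)
            W -= eta * 2.0 * (pred - x[t]) * f    # squared-loss gradient step
    return W, losses
```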
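For points 17-19, the inner product <phi_t1, phi_t2> can be computed in O(D^2) time instead of O(2^D): the coordinates where both vectors are nonzero are exactly the common good paths, and for a common observed endpoint at lag k any subset of the commonly missing intermediate points forms one, giving the 2^(number of common missing points) count of point 19. A sketch under the same indexing assumptions as phi() above:

```python
def phi_inner(x, observed, t1, t2, D):
    """Kernel-style evaluation of <phi_t1, phi_t2> via path counting,
    avoiding the explicit 2^D-dimensional feature vectors."""
    total = 0.0
    for k in range(1, D + 1):                 # lag of the common endpoint
        e1, e2 = t1 - k, t2 - k
        if e1 < 0 or e2 < 0 or not (observed[e1] and observed[e2]):
            continue
        # Intermediate lags 1..k-1 missing in BOTH series positions:
        # any subset of them is a common good path, hence the power of 2.
        common_missing = sum(
            1 for j in range(1, k)
            if not observed[t1 - j] and not observed[t2 - j]
        )
        total += (2 ** common_missing) * x[e1] * x[e2]
    return total
```

Combined with standard kernelized online learning, this lets the algorithm keep W implicitly as a combination of past feature vectors rather than as an exponential-size array.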
Knowledge Vault built by David Vivancos 2024