Bayesian Time Series Modeling: Structured Representations for Scalability

Emily Fox

**Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:**

```mermaid
graph LR
  classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef concepts fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef models fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef methods fill:#f9d4f9, font-weight:bold, font-size:14px
  classDef misc fill:#d4f9f9, font-weight:bold, font-size:14px
  Main["Bayesian Time Series Modeling:<br>Structured Representations for Scalability"]
  Main --> A[Time Series Concepts]
  A --> A1[Time series data: ubiquitous, high-dimensional challenges 1]
  A --> A2[Key concepts: Gaussians, HMMs, VAR, state-space 2]
  A --> A3[HMMs: discrete Markov states, efficient inference 3]
  A --> A4[VARp: linear combination of lags, noise 4]
  A --> A5[State-space: continuous Markov state, linear Gaussian 5]
  Main --> B[Latent Factor Models]
  B --> B1[Latent factor models: low-rank covariance decomposition 6]
  B --> B2[Dynamic factors: Markov latent, high-dimensional projections 7]
  B --> B3[Evolving loadings: time-varying covariance, scalable factors 8]
  Main --> C[Applications]
  C --> C1[MEG experiment: time-varying embeddings outperform others 9]
  C --> C2[House prices: state-space, Dirichlet process clustering 10]
  C --> C3[Clustering innovations: correlated prices, improved predictions 11]
  C --> C4[Local indices: proposed method enables tract-level 13]
  C --> C5[Seattle analysis: downtown volatile, sparse improvement 24]
  Main --> D[Methods and Approaches]
  D --> D1[Bayesian nonparametrics: adaptive complexity, growing clusters 12]
  D --> D2[Gaussian graphical models: conditional independence, inverse 14]
  D --> D3[Identifiability: avoids equivalent parameterizations, computational trade-off 15]
  D --> D4[Common spatial patterns: alternative dimensionality reduction 16]
  D --> D5[Gaussian processes: flexible prior, squared exponential 19]
  Main --> E[Results and Analysis]
  E --> E1[Classification: held-out words, category prediction 20]
  E --> E2[Correlation maps: reveal semantic processing structure 21]
  E --> E3[Changing embeddings: efficient low-dimensional evolving covariance 22]
  E --> E4[Time series clustering: sharing sparse information 23]
  E --> E5[Case-Shiller index: proposed method enables tract-level 25]
  Main --> F[Miscellaneous]
  F --> F1[Break ensures covering remaining material 17]
  F --> F2[Tutorial parts: relationships, scalability, efficient inference 18]
  F --> F3[Bayesian approach: priors, Gaussian process dynamics 26]
  F --> F4[Technical difficulties: Microsoft Surface video playback 27]
  F --> F5[Zillow provided data, local index motivation 28]
  F --> F6[Extensions: spatial kernels, identifiability, non-stationary dynamics 29]
  F --> F7["Q&A: dictionary, stationarity, identifiability, alternative methods 30"]
  class Main main
  class A,A1,A2,A3,A4,A5 concepts
  class B,B1,B2,B3 models
  class C,C1,C2,C3,C4,C5 applications
  class D,D1,D2,D3,D4,D5,E,E1,E2,E3,E4,E5 methods
  class F,F1,F2,F3,F4,F5,F6,F7 misc
```

**Resume:**

**1.-** Time series data is everywhere, from audio features to human motion to stock indices. Modeling high-dimensional time series poses many challenges.

**2.-** Key concepts reviewed: multivariate Gaussians, hidden Markov models (HMMs), vector autoregressive (VAR) processes, state-space models.

**3.-** HMMs assume an underlying discrete state sequence that is Markov. Observations are conditionally independent given the state, and this structure allows efficient inference.
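
As a minimal sketch of where that efficiency comes from, the scaled forward recursion below computes the log-likelihood of an observation sequence in O(TK²) time; the transition, emission, and initial distributions are illustrative, not taken from the tutorial.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Scaled forward algorithm: log p(x_1:T) for a discrete-emission HMM.
    pi: (K,) initial distribution; A: (K,K) transitions; B: (K,M) emissions."""
    alpha = pi * B[:, obs[0]]           # alpha_1(k) = p(z_1 = k, x_1)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid numerical underflow
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]   # O(K^2) recursion per time step
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

# Illustrative 2-state, 2-symbol HMM
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
print(forward_loglik(pi, A, B, [0, 0, 1, 1]))
```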

**4.-** VAR(p) process: each observation vector is a linear combination of its p most recent lags plus noise. Stable if all eigenvalues of the companion matrix have modulus less than 1.
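
To make the stability condition concrete, here is a small sketch that stacks hypothetical VAR(2) coefficient matrices into companion form and checks that every eigenvalue lies inside the unit circle:

```python
import numpy as np

def companion(As):
    """Companion matrix of a VAR(p) with coefficient matrices As = [A1, ..., Ap]."""
    d, p = As[0].shape[0], len(As)
    C = np.zeros((d * p, d * p))
    C[:d, :] = np.hstack(As)             # top block row: [A1 A2 ... Ap]
    C[d:, :-d] = np.eye(d * (p - 1))     # shifted identity blocks below
    return C

# Hypothetical 2-dimensional VAR(2)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.3]])
moduli = np.abs(np.linalg.eigvals(companion([A1, A2])))
print("stable:", bool(np.all(moduli < 1)))
```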

**5.-** State-space model: continuous latent Markov state with linear Gaussian dynamics. Observations are conditionally independent given the state.
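
The classic inference routine for this model is the Kalman filter; below is a single predict/update step, with all model matrices as placeholders:

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One Kalman filter step for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t,
    with w_t ~ N(0, Q), v_t ~ N(0, R); (m, P) is the previous filtered mean/cov."""
    m_pred, P_pred = A @ m, A @ P @ A.T + Q   # predict the latent state forward
    S = C @ P_pred @ C.T + R                  # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)     # condition on the observation y
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```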

**6.-** Latent factor models for IID data: the covariance has a low-rank plus diagonal decomposition, assuming the uncertainty lies in a lower-dimensional subspace.
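
In symbols, x = Λη + ε gives cov(x) = ΛΛᵀ + diag(ψ); a small numeric sketch with an arbitrary loading matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 3                              # observation dim d >> number of factors k
Lambda = rng.normal(size=(d, k))          # factor loadings (arbitrary illustration)
psi = rng.uniform(0.5, 1.5, size=d)       # idiosyncratic noise variances

Sigma = Lambda @ Lambda.T + np.diag(psi)  # low-rank + diagonal covariance
# Generative view: x = Lambda @ eta + eps, eta ~ N(0, I_k), eps ~ N(0, diag(psi))
x = Lambda @ rng.normal(size=k) + rng.normal(size=d) * np.sqrt(psi)
```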

**7.-** Dynamic latent factor models extend this idea to time series: Markov latent factors are projected to high-dimensional observations. They form a subclass of state-space models.

**8.-** Evolving the factor loadings over time allows capturing time-varying covariance structure. Imposing factor structure on the loadings themselves enables scalability.
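
A toy sketch of the idea: if the loadings Λ_t drift over time, the implied covariance Σ_t = Λ_tΛ_tᵀ + diag(ψ) evolves with them. The tutorial places Gaussian process priors on these dynamics (see item 26); a random walk is used here only as the simplest stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, T = 20, 2, 100
Lambda = rng.normal(size=(d, k))          # initial loadings
psi = np.full(d, 0.5)

covariances = []
for t in range(T):
    Lambda = Lambda + 0.05 * rng.normal(size=(d, k))      # slow drift (stand-in dynamics)
    covariances.append(Lambda @ Lambda.T + np.diag(psi))  # time-varying Sigma_t
```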

**9.-** MEG experiment: classifying brain responses to word categories. Time-varying embeddings outperform the alternatives, likely because they capture semantic processing.

**10.-** Modeling sparsely observed house prices by clustering correlated neighborhoods. Combines state-space models with Dirichlet processes to handle an unknown number of clusters.

**11.-** Clustering on the factor model innovations allows the latent price series to be correlated yet distinct. This improves predictions, especially for sparse series.

**12.-** Bayesian nonparametric methods like Dirichlet processes allow model complexity to adapt to the data: the number of clusters grows with the number of observations.
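
A minimal Chinese restaurant process draw (the predictive form of the Dirichlet process) illustrates how the cluster count grows with sample size; the concentration parameter alpha = 1.0 is arbitrary:

```python
import numpy as np

def crp(n, alpha, rng):
    """Sample cluster assignments from a Chinese restaurant process."""
    counts, z = [], []
    for i in range(n):
        probs = np.array(counts + [alpha]) / (i + alpha)  # existing tables + new table
        c = rng.choice(len(probs), p=probs)
        if c == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[c] += 1
        z.append(c)
    return z

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    print(n, "observations ->", max(crp(n, 1.0, rng)) + 1, "clusters")
```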

**13.-** Industry housing indices are very noisy at the local level due to data sparsity. The proposed method enables census-tract-level indices by sharing information across tracts.

**14.-** Gaussian graphical models capture conditional independence via sparsity in the inverse covariance matrix. This is more flexible than assuming marginal independence.
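
A three-variable chain illustrates the distinction: a zero in the precision matrix Ω = Σ⁻¹ encodes conditional independence, even though the corresponding marginal covariance entry is nonzero.

```python
import numpy as np

# Chain X1 - X2 - X3: Omega[0, 2] = 0 means X1 is independent of X3 given X2.
Omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Omega)
print(Sigma[0, 2])   # 0.25: X1 and X3 are still marginally dependent
```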

**15.-** Enforcing identifiability in latent variable models avoids exploring equivalent parameterizations, at the cost of greater computational complexity.

**16.-** Common spatial patterns were mentioned as an alternative dimensionality-reduction approach for brain data, but the tutorial focuses on general time series.

**17.-** Break taken at 3:47 to ensure coverage of the remaining material.

**18.-** Tutorial covers three main parts: capturing relationships in high-dimensional time series, scalable modeling, and computationally efficient inference.

**19.-** Gaussian processes provide a flexible prior over the latent factor evolution. A squared exponential kernel is used, though it may not capture the expected brain dynamics.
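
A sketch of drawing one latent trajectory from a GP prior under the squared exponential kernel; the length-scale and variance are arbitrary:

```python
import numpy as np

def se_kernel(t, lengthscale=1.0, variance=1.0):
    """Squared exponential (RBF) kernel matrix over time points t."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

t = np.linspace(0, 10, 200)
K = se_kernel(t, lengthscale=1.5) + 1e-8 * np.eye(len(t))  # jitter for stability
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(len(t)), K)      # one smooth trajectory
```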

**20.-** Classification performance was assessed by holding out a subset of words and testing category prediction. The time-varying model outperforms the alternatives and chance.

**21.-** Correlation maps at different time points reveal the emergence of differential structure during the semantic processing window, aiding classification.

**22.-** Changing how observations are embedded over time allows a low-dimensional representation to efficiently capture evolving high-dimensional covariance.

**23.-** Clustering of time series allows sharing information when individual series are sparse, e.g. in housing price data.

**24.-** Seattle housing data analyzed with proposed method. Downtown identified as most volatile. Largest improvement in sparse census tracts.

**25.-** The Case-Shiller housing index is very noisy at the zip-code level and cannot be computed at the census-tract level; the proposed method enables this.

**26.-** Bayesian approach taken, placing priors on all parameters. Gaussian processes used for latent factor and factor loading dynamics.

**27.-** Speaker used Microsoft Surface for presentation, causing some technical difficulties with video playback.

**28.-** Zillow, a Seattle-based housing company, provided data and motivation for the local house price index application.

**29.-** Various modeling extensions possible, e.g. alternative spatial kernels for brain data, enforcing identifiability, non-stationary latent dynamics.

**30.-** Q&A covered topics like dictionary choice for factor loadings, stationarity assumptions, identifiability, and alternative methods like common spatial patterns.

Knowledge Vault built by David Vivancos 2024