The End Of Knowledge - Vault 5/9 - CVPR - 2015 - Space-Time Tree Ensemble for Action Recognition

graph LR
classDef action fill:#f9d4d4, font-weight:bold, font-size:14px
classDef structure fill:#d4f9d4, font-weight:bold, font-size:14px
classDef trees fill:#d4d4f9, font-weight:bold, font-size:14px
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
classDef experiments fill:#f9d4f9, font-weight:bold, font-size:14px
A[Space-Time Tree Ensemble for Action Recognition] --> B[Recognizing actions in videos 1]
A --> C[Actions: structured body movements 2]
C --> D[Previous approaches discard or weakly capture structure 3]
A --> E[Space-time trees model actions 4]
E --> F[Tree edges: time, space, hierarchy 5]
E --> G[Tree collections capture variation, partial matching 6]
G --> H[Trees efficient, approximate graphs 7]
E --> I[Tree components learned from data 8]
I --> J[Action words share parameters, reduce complexity 9]
E --> K[Tree ensemble classifies actions 10]
A --> L[Learning action words: discriminative clustering 11]
A --> M[Learning trees: mining frequent subtrees 12]
M --> N[Mining, clustering, ranking for compact set 13]
A --> O[Matching allows partial matches, uses DP 14]
A --> P[Experiments: UCF Sports, Hi5 15]
P --> Q[Outperforms bag-of-words by ~80% 16]
P --> R[Beats predefined structure methods needing bboxes 17]
P --> S[Larger trees more discriminative 18]
A --> T[Inference matches nodes to regions over time 19]
A --> U[Cross-dataset generalization: Hi5 trees beat recent methods 20]
A --> V[Method discovers space-time action trees 21]
A --> W[Tree ensemble captures rich structure for classification 22]
A --> X[Promising results, generalization shown 23]
A --> Y[Sensitive to video segmentation quality 24]
Y --> Z[Experiments on good 400x600 videos 25]
Y --> AA[Learning provides some robustness to noise 26]
A --> AB[Poster: more results 27]
A --> AC[Question on segmentation noise sensitivity 28]
AC --> AD[Uses ICCV 2013 segmentation 29]
Y --> AE[Resolution, quality impact segmentation, learning helps 30]
class A,B,C action
class D,E,F,G,H,I,J,K,L,M,N,O,T,V,W structure
class P,Q,R,S,U,X,AB experiments
class Y,Z,AA,AC,AD,AE learning

Summary:

1.- Human action recognition: Recognizing actions in video sequences

2.- Actions as structured body movements: Spatial, temporal, hierarchical

3.- Previous approaches: Bag-of-words discards structure, space-time pyramids only weakly capture it

4.- Space-time trees model actions: Root nodes for whole body, part nodes for body parts

5.- Tree edges represent time, space, hierarchy

6.- Collection of trees per action for variation and partial matching

7.- Trees are efficient for inference; a collection of trees approximates a richer graph model

8.- Tree components (nodes, edges, weights) learned from data

9.- Action words share parameters between trees, reducing complexity

10.- Ensemble of trees used to classify actions

11.- Learning action words: Discriminative clustering of root/part space-time segments

12.- Learning trees: Hierarchical space-time segment graphs mined for frequent subtrees

13.- Tree mining, clustering, ranking to get compact set

14.- Matching tree to graph allows partial matches, uses DP

15.- Experiments on UCF Sports and Hi5 datasets

16.- Outperforms bag-of-words with same features by ~80%

17.- Beats methods with predefined structure that need bounding-box labels

18.- Larger trees capturing more complex structures are more discriminative

19.- Inference example matches tree nodes to video regions over time

20.- Cross-dataset generalization: Hi5 trees beat recent methods on Hollywood3D without using depth

21.- Method automatically discovers space-time action trees

22.- Ensemble of trees captures rich structure for classification

23.- Promising results and cross-dataset generalization shown

24.- Model is sensitive to video segmentation quality

25.- Experiments used good-quality videos at 400x600 resolution

26.- Learning approach provides robustness to some noise

27.- Poster will demonstrate more results

28.- Question on sensitivity to segmentation noise

29.- Uses ICCV 2013 segmentation method

30.- Resolution and quality impact segmentation, but learning provides some robustness
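
Points 4, 5 and 14 can be illustrated with a small sketch of matching one tree to a video graph by dynamic programming. This is not the authors' implementation: the toy segment scores, the `SKIP_PENALTY` value, the edge-type names and all function names are assumptions made for illustration. Each tree node is scored against a segment, and each child either matches a neighbouring segment connected by the required edge type or is skipped at a fixed penalty, which is what allows partial matches.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    word: str  # action word this node should match (hypothetical id)
    children: list = field(default_factory=list)  # list of (edge_type, TreeNode)


# Hypothetical toy video graph: per-segment response of each action word.
SEGMENT_SCORES = {
    "s1": {"body_run": 0.9, "leg_swing": 0.1},
    "s2": {"body_run": 0.2, "leg_swing": 0.8},
    "s3": {"body_run": 0.7, "leg_swing": 0.3},
}
# (segment, edge_type) -> neighbouring segments reachable by that edge type.
GRAPH_EDGES = {
    ("s1", "hierarchy"): ["s2"],  # s2 is a part of s1
    ("s1", "time"): ["s3"],       # s3 follows s1 in time
}
SKIP_PENALTY = -0.5  # assumed cost of leaving a child unmatched


def match_score(node, segment, memo):
    """Best score of the subtree rooted at `node` when `node` sits on `segment`."""
    key = (id(node), segment)
    if key in memo:
        return memo[key]
    score = SEGMENT_SCORES[segment].get(node.word, 0.0)
    for edge_type, child in node.children:
        candidates = GRAPH_EDGES.get((segment, edge_type), [])
        best = max(
            (match_score(child, s, memo) for s in candidates),
            default=SKIP_PENALTY,
        )
        # Partial matching: skip the child if every placement is worse.
        score += max(best, SKIP_PENALTY)
    memo[key] = score
    return score


def best_match(tree):
    """Try every segment as the root placement and keep the best score."""
    memo = {}
    return max(match_score(tree, s, memo) for s in SEGMENT_SCORES)
```

Because each (node, segment) pair is scored once and cached, the cost grows with the number of tree nodes times the number of candidate placements rather than with the number of full assignments.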

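Points 10 and 22 describe classifying with an ensemble of trees. One simple way to picture this, assumed here rather than taken from the paper, is a linear score per action class over the match scores of its trees. The tree ids, weights and biases below are hypothetical.

```python
def classify(tree_scores, ensembles):
    """Pick the action whose weighted tree responses are highest.

    tree_scores: dict mapping tree id -> match score on the video.
    ensembles: dict mapping action label -> (weights dict, bias).
    Both structures are illustrative assumptions.
    """
    class_scores = {}
    for label, (weights, bias) in ensembles.items():
        # Trees with no score on this video contribute nothing.
        class_scores[label] = bias + sum(
            w * tree_scores.get(tree_id, 0.0) for tree_id, w in weights.items()
        )
    best_label = max(class_scores, key=class_scores.get)
    return best_label, class_scores


# Hypothetical usage with made-up scores and weights.
example_scores = {"t_run_1": 2.4, "t_run_2": 0.4, "t_kick_1": -0.3}
example_ensembles = {
    "run": ({"t_run_1": 1.0, "t_run_2": 0.5}, 0.0),
    "kick": ({"t_kick_1": 1.0}, 0.1),
}
label, scores = classify(example_scores, example_ensembles)
```

In practice the weights would be learned from training videos; the sketch only shows how several tree responses combine into one decision.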
Knowledge Vault built by David Vivancos 2024