Concept Graph & Summary using Claude 3 Opus | Chat GPT4o | Llama 3:
graph LR
classDef action fill:#f9d4d4, font-weight:bold, font-size:14px
classDef structure fill:#d4f9d4, font-weight:bold, font-size:14px
classDef trees fill:#d4d4f9, font-weight:bold, font-size:14px
classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
classDef experiments fill:#f9d4f9, font-weight:bold, font-size:14px
A[Space-Time Tree Ensemble
for Action Recognition] --> B[Recognizing actions in videos 1]
A --> C[Actions: structured body movements 2]
C --> D[Previous approaches discard
or weakly capture structure 3]
A --> E[Space-time trees model actions 4]
E --> F[Tree edges: time, space, hierarchy 5]
E --> G[Tree collections capture
variation, partial matching 6]
G --> H[Trees efficient, approximate graphs 7]
E --> I[Tree components learned from data 8]
I --> J[Action words share parameters,
reduce complexity 9]
E --> K[Tree ensemble classifies actions 10]
A --> L[Learning action words:
discriminative clustering 11]
A --> M[Learning trees: mining
frequent subtrees 12]
M --> N[Mining, clustering, ranking
for compact set 13]
A --> O[Matching allows partial
matches, uses DP 14]
A --> P[Experiments: UCF Sports, Hi5 15]
P --> Q[Outperforms bag-of-words by ~80% 16]
P --> R[Beats predefined structure
methods needing bboxes 17]
P --> S[Larger trees more discriminative 18]
A --> T[Inference matches nodes
to regions over time 19]
A --> U[Cross-dataset generalization:
Hi5 trees beat recent methods 20]
A --> V[Method discovers space-time
action trees 21]
A --> W[Tree ensemble captures rich
structure for classification 22]
A --> X[Promising results, generalization shown 23]
A --> Y[Sensitive to video
segmentation quality 24]
Y --> Z[Experiments on good
400x600 videos 25]
Y --> AA[Learning provides some
robustness to noise 26]
A --> AB[Poster: more results 27]
A --> AC[Question on segmentation
noise sensitivity 28]
AC --> AD[Uses ICCV 2013 segmentation 29]
Y --> AE[Resolution, quality impact
segmentation, learning helps 30]
class A,B,C action
class D,E,F,G,H,I,J,K,L,M,N,O,T,V,W structure
class P,Q,R,S,U,X,AB experiments
class Y,Z,AA,AC,AD,AE learning
Summary:
1.- Human action recognition: Recognizing actions in video sequences
2.- Actions as structured body movements: Spatial, temporal, hierarchical
3.- Previous approaches: Bag-of-words discards structure, space-time pyramids only weakly capture it
4.- Space-time trees model actions: Root nodes for whole body, part nodes for body parts
5.- Tree edges represent time, space, hierarchy
6.- Collection of trees per action for variation and partial matching
7.- Trees efficient to infer, collections approximate graphs
8.- Tree components (nodes, edges, weights) learned from data
9.- Action words share parameters between trees, reducing complexity
10.- Ensemble of trees used to classify actions
11.- Learning action words: Discriminative clustering of root/part space-time segments
12.- Learning trees: Hierarchical space-time segment graphs mined for frequent subtrees
13.- Tree mining, clustering, ranking to get compact set
14.- Matching tree to graph allows partial matches, uses DP
15.- Experiments on UCF Sports and Hi5 datasets
16.- Outperforms bag-of-words with same features by ~80%
17.- Beats predefined structure methods needing bbox labels
18.- Larger trees capturing more complex structures are more discriminative
19.- Inference example matches tree nodes to video regions over time
20.- Cross-dataset generalization: Hi5 trees beat recent methods on Hollywood3D without using depth
21.- Method automatically discovers space-time action trees
22.- Ensemble of trees captures rich structure for classification
23.- Promising results and cross-dataset generalization shown
24.- Model is sensitive to video segmentation quality
25.- Experiments used good-quality videos at 400x600 resolution
26.- Learning approach provides robustness to some noise
27.- Poster will demonstrate more results
28.- Question on sensitivity to segmentation noise
29.- Uses the ICCV 2013 segmentation method
30.- Resolution and quality impact segmentation, but learning provides some robustness
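Illustrative sketch for items 10 and 14: matching a tree to a video's segment graph with dynamic programming while allowing partial matches, then classifying with an ensemble of trees. This is a toy version under stated assumptions, not the authors' formulation. The data layout (dictionaries keyed by node names), the additive score, the fixed `skip_penalty` for an unmatched child subtree, and the function names `match_tree`, `ensemble_score`, and `classify` are all hypothetical. Tree node names double as action-word identifiers here, so trees that reuse a word share its appearance scores.

```python
from functools import lru_cache

NEG_INF = float("-inf")

def match_tree(tree, root, unary, graph_edges, skip_penalty=-1.0):
    """Best score for matching a tree into a segment graph (toy DP).

    tree:         dict, tree node -> list of child tree nodes
    unary:        dict, (tree node, graph node) -> appearance score
    graph_edges:  dict, graph node -> list of neighbouring graph nodes
    skip_penalty: score charged when a child subtree is left unmatched
                  (assumed constant; this is what permits partial matches)
    """
    @lru_cache(maxsize=None)
    def best(t, g):
        score = unary.get((t, g), NEG_INF)
        if score == NEG_INF:
            return NEG_INF  # tree node t cannot be placed on segment g
        for child in tree.get(t, []):
            # Either place the child on a neighbour of g, or skip it.
            options = [best(child, h) for h in graph_edges.get(g, [])]
            score += max([skip_penalty] + options)
        return score

    # The root may be placed on any segment; recursion depth is bounded
    # by the tree, so cycles in the segment graph are harmless.
    return max((best(root, g) for g in graph_edges), default=NEG_INF)

def ensemble_score(trees, unary, graph_edges):
    """Weighted sum of match scores; trees whose root finds no match add 0."""
    total = 0.0
    for tree, root, weight in trees:
        s = match_tree(tree, root, unary, graph_edges)
        if s != NEG_INF:
            total += weight * s
    return total

def classify(ensembles, unary, graph_edges):
    """Return the action whose tree ensemble scores highest, plus all scores."""
    scores = {a: ensemble_score(t, unary, graph_edges) for a, t in ensembles.items()}
    return max(scores, key=scores.get), scores

# Toy video: one root segment s1 linked to two part segments s2 and s3.
graph_edges = {"s1": ["s2", "s3"], "s2": [], "s3": []}
unary = {("body", "s1"): 2.0, ("arm", "s2"): 1.5, ("leg", "s3"): 0.25}

tree_a = {"body": ["arm", "leg"]}   # fully matched
tree_b = {"body": ["arm", "head"]}  # "head" has no match, so it is skipped

ensembles = {"kick": [(tree_a, "body", 1.0)], "wave": [(tree_b, "body", 1.0)]}

print(match_tree(tree_a, "body", unary, graph_edges))  # 3.75
print(match_tree(tree_b, "body", unary, graph_edges))  # 2.5 (partial match)
print(classify(ensembles, unary, graph_edges)[0])      # kick
```

Memoising on (tree node, segment) pairs is what keeps the search cheap: each pair is scored once, which reflects the point in item 7 that trees are efficient to infer compared with general graphs.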
Knowledge Vault built by David Vivancos 2024