Knowledge Vault 5/39 - CVPR 2018
PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz

Concept Graph & Resume using Claude 3 Opus | Chat GPT4o | Llama 3:

graph LR
classDef flow fill:#d4f9d4, font-weight:bold, font-size:14px
classDef pwcnet fill:#d4d4f9, font-weight:bold, font-size:14px
classDef costvolume fill:#f9d4d4, font-weight:bold, font-size:14px
classDef performance fill:#f9f9d4, font-weight:bold, font-size:14px
classDef misc fill:#f9d4f9, font-weight:bold, font-size:14px
A[PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume] --> B[Optical flow: pixel motion. 1]
A --> C[FlowNet2: fast, near state-of-art. 2]
A --> D[Ideal algorithm: outperforms, fast. 3]
A --> E[Performance correlates with model size. 4]
A --> F[PWC-Net: compact, state-of-art leveraging. 5]
B --> G[Brightness constancy: pixel retains brightness. 6]
B --> H[Patch comparison reveals true motion. 7]
G --> I[Correlation invariant to color changes. 9]
G --> J[Aperture problem: patch ambiguity issue. 11]
F --> K[PWC-Net constructs multi-resolution cost volumes. 12]
F --> L[Always uses small search range. 13]
F --> M[Feature pyramid: large receptive field. 14]
K --> N[Cost volume correlates features, not pixels. 15]
K --> O[Concatenates cost volume, uses CNN. 16]
K --> P[Upsamples, rescales flow to pyramid. 17]
L --> Q[Warping aligns images using flow. 18]
L --> R[Smaller motion in warped image. 19]
L --> S[Warps features, not images. 20]
N --> T[Constructs cost volume at every level. 21]
N --> U[Pyramid, warping, cost volume critical. 22]
M --> V[Compact model: competitive performance. 23]
V --> W[Data augmentation critical for datasets. 24]
W --> X[Won Robust Vision Challenge flow. 25]
D --> Y[TVNet converts TV-L1 to CNN. 27]
D --> Z[PWC-Net, TVNet encode domain knowledge. 28]
D --> AA[PWC-Net: pyramids, warping, small range. 29]
D --> AB[Cost volumes affordable at coarse resolutions. 30]
class B,G,H flow
class F,K,L,M,N,O,P,Q,R,S pwcnet
class T,U,V,W,X costvolume
class Y,Z,AA,AB performance
class I,J misc

Resume:

1.- Optical flow estimates 2D motion vectors for every pixel between frames.

2.- FlowNet2, a CNN-based method, performs close to the state of the art while running much faster.

3.- An ideal algorithm would outperform the state of the art while remaining fast.

4.- Performance correlates with model size for published CNN optical flow models.

5.- PWC-Net is compact yet outperforms the state of the art by leveraging domain knowledge.

6.- Brightness constancy: a pixel retains its brightness as its position changes over time.

7.- Exhaustively comparing patches between frames with normalized cross-correlation reveals the true motion.

8.- A cost volume stores, for each pixel, the patch similarity for every candidate motion vector.

9.- Correlation has some invariance to color changes.

10.- Cost volumes are used for stereo (1D search) but not for flow (2D search) because of the computational cost.

11.- Aperture problem: local patches are ambiguous, so the patch size must be chosen carefully.

12.- PWC-Net constructs cost volumes at multiple resolutions using feature pyramids.

13.- PWC-Net always uses a small search range when constructing the cost volume.

14.- The feature pyramid has a large receptive field at its smallest resolution (16x8).

15.- The cost volume is constructed by correlating features, not raw pixels.

16.- The cost volume is concatenated with features and passed to a CNN that estimates flow.

17.- The flow is upsampled and rescaled for the next pyramid level.

18.- Warping aligns the second image to the first using the upsampled flow.

19.- The motion between the first image and the warped second image is smaller.

20.- PWC-Net warps features, not raw images, to propagate information through the pyramid.

21.- A cost volume is constructed at each pyramid level using a small search range.

22.- The feature pyramid, feature warping, and per-level cost volume all contribute significantly.

23.- The compact model performs competitively with the state of the art.

24.- Data augmentation (no Gaussian noise, horizontal flipping) is critical for small datasets.

25.- PWC-Net won the optical flow track of the Robust Vision Challenge.

26.- The code is available on GitHub.

27.- TVNet converts the classic TV-L1 optimization into a CNN.

28.- PWC-Net and TVNet share the spirit of encoding domain knowledge into the network.

29.- PWC-Net principles: feature pyramids, feature warping, cost volume with small search range.

30.- Constructing cost volumes is computationally affordable at coarse resolutions.
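
Items 8, 13, 15 and 21 describe a cost volume built by correlating features over a small search range. The NumPy sketch below is one way to picture that step, not the authors' implementation. The function name `cost_volume`, the parameter `max_disp`, its value of 4, the zero padding at the border, and the unit-length normalization of the toy features are all assumptions made for illustration.

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=4):
    """Correlate feat1 with shifted copies of feat2 (illustrative sketch).

    feat1, feat2: arrays of shape (C, H, W).
    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W). Each channel
    holds the mean dot product between feat1 at (y, x) and feat2 at
    (y + dy, x + dx) for one candidate displacement (dy, dx).
    """
    c, h, w = feat1.shape
    d = max_disp
    # Zero-pad feat2 so out-of-frame candidates correlate against zeros.
    padded = np.pad(feat2, ((0, 0), (d, d), (d, d)))
    costs = []
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[:, y, x] == feat2[:, y + dy, x + dx] where in frame.
            shifted = padded[:, d + dy:d + dy + h, d + dx:d + dx + w]
            costs.append((feat1 * shifted).mean(axis=0))
    return np.stack(costs)

# Toy check: frame-2 features are frame-1 features moved by (dy=1, dx=2).
rng = np.random.default_rng(0)
f1 = rng.normal(size=(16, 8, 10))
f1 /= np.linalg.norm(f1, axis=0, keepdims=True)  # unit-length feature vectors
f2 = np.roll(f1, shift=(1, 2), axis=(1, 2))

cv = cost_volume(f1, f2, max_disp=4)
best = cv.argmax(axis=0)                 # best candidate index per pixel
true_index = (1 + 4) * 9 + (2 + 4)       # row-major index of (dy=1, dx=2)
print(cv.shape, (best[:-1, :-2] == true_index).all())
```

The number of channels grows with the square of the search range but not with the image size, which is why keeping the range small at every pyramid level (items 13 and 30) keeps the cost manageable.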
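
Items 17 to 20 describe upsampling and rescaling the coarse flow, then warping the second frame's features toward the first so that only a small residual motion remains. The sketch below illustrates that idea under simplifying assumptions: `upsample_flow` and `warp` are hypothetical helper names, nearest-neighbour upsampling stands in for whatever upsampling the network uses, and out-of-frame samples are clamped to the border.

```python
import numpy as np

def upsample_flow(flow, factor=2):
    """Nearest-neighbour upsampling of a (2, H, W) flow field.

    Displacements are measured in pixels, so they are multiplied by the
    same factor as the resolution.
    """
    up = flow.repeat(factor, axis=1).repeat(factor, axis=2)
    return up * factor

def warp(feat2, flow):
    """Backward-warp feat2 (C, H, W) toward frame 1 using flow (2, H, W).

    flow[0] is the horizontal (x) displacement, flow[1] the vertical (y).
    The output at (y, x) is feat2 sampled bilinearly at (y + v, x + u).
    Sample positions outside the frame are clamped to the border
    (a simplification made for this sketch).
    """
    c, h, w = feat2.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = feat2[:, y0, x0] * (1 - wx) + feat2[:, y0, x1] * wx
    bottom = feat2[:, y1, x0] * (1 - wx) + feat2[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy check: frame-2 features are frame-1 features moved by (u=2, v=1).
rng = np.random.default_rng(0)
feat1 = rng.normal(size=(3, 8, 8))
feat2 = np.roll(feat1, shift=(1, 2), axis=(1, 2))

# Flow estimated at half resolution: (u, v) = (1, 0.5) coarse pixels.
coarse_flow = np.zeros((2, 4, 4))
coarse_flow[0] = 1.0
coarse_flow[1] = 0.5

flow = upsample_flow(coarse_flow)        # (2, 8, 8), (u, v) = (2, 1)
warped = warp(feat2, flow)
aligned = np.allclose(warped[:, :-1, :-2], feat1[:, :-1, :-2])
print(flow.shape, aligned)
```

After warping with a correct flow, the warped features line up with the first frame's features away from the border, so the next cost volume only needs to search a few pixels around zero displacement.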

Knowledge Vault built by David Vivancos 2024