The End Of Knowledge - Vault 5/57 - CVPR - 2020 - Disentangled image generation through structured noise injection

graph LR
classDef noise fill:#f9d4d4, font-weight:bold, font-size:14px
classDef editing fill:#d4f9d4, font-weight:bold, font-size:14px
classDef architecture fill:#d4d4f9, font-weight:bold, font-size:14px
classDef disentanglement fill:#f9f9d4, font-weight:bold, font-size:14px
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
A[Disentangled image generation through structured noise injection] --> B[Structured noise injection enables image editing. 1]
A --> C[Networks generate realistic images, fail at editing. 2]
A --> D[Goal: Restrict noise influence, separate details. 3]
A --> E[Noise codes: direct mapping, instance normalization. 4]
A --> F[Spatial correspondence: input tensor, final image. 5]
A --> G[Architecture: high disentanglement, two noise codes. 6]
G --> H[Spatially variable codes resample image regions. 7]
G --> I[Spatially invariable code defines style, color. 8]
G --> J[Spatial disentanglement: structured variable codes. 9]
J --> K[Variable code cells: local, shared, global. 10]
K --> L[Global, shared codes encode spanning information. 11]
K --> M[Independent layers guarantee local code independence. 12]
G --> N[Invariable code: unique local, stylistic information. 13]
N --> O[Without invariable code, local changes background. 14]
A --> P[Method outperforms StyleGAN in disentanglement. 15]
P --> Q[PathLength measures invariable code interpolation influence. 16]
P --> R[Linear separability measures classifier inaccuracy. 17]
P --> S[Higher scores indicate entangled mapping. 18]
H --> T[Resampling variable global affects pose, maintains style. 19]
H --> U[Resampling variable shared affects age, accessories, dimensions. 20]
H --> V[Resampling mouth local codes changes shape. 21]
H --> W[Resampling top local codes changes hairstyle. 22]
I --> X[Resampling invariable maintains pose, changes background, ethnicity. 23]
A --> Y[Future: content-style separation, invariable code control. 24]
Y --> Z[Potential ethnicity change, maintaining stylistic aspects. 25]
P --> AA[Disentanglement scores compare methods. 26]
F --> AB[Spatial correspondence enables targeted image editing. 27]
G --> AC[Architecture achieves independence, local-shared-global control. 28]
G --> AD[Variable, invariable codes allow fine-grained editing. 29]
A --> AE[Future research: complete separation, attribute manipulation. 30]
class A,B,C noise
class D,E,F,AB,AC,AD,AE editing
class G,H,I,J,K,L,M,N,O,T,U,V,W,X,Y,Z architecture
class P,Q,R,S,AA disentanglement
class Y,Z,AE future

Summary:

1.- Disentangled image generation through structured noise injection enables editing of randomly generated images.

2.- Networks generate realistic images but fail at editing.

3.- Goal: Restrict influence of noise code entries to specific image regions, separate global/stylistic details from local details.

4.- Two ways of using input noise codes: direct mapping (DCGAN) and instance normalization parameter computation (style-based generators).

5.- Spatial correspondence exists between input tensor and final image.

6.- Proposed architecture achieves high disentanglement using two input noise codes: spatially variable and spatially invariable.

7.- Spatially variable codes allow resampling of specific image regions.

8.- Spatially invariable code defines most style and color information.

9.- Spatial disentanglement achieved by structuring spatially variable codes with local, shared, and global codes.

10.- Each cell of spatially variable code has unique local code, shared code with neighbors, and shared global code.

11.- Global and shared codes encode information spanning multiple locations (pose, accessories).

12.- Each cell has independent fully connected layer, guaranteeing local code independence after mapping.

13.- Spatially invariable code contains unique local code, leveraged for expressing stylistic information.

14.- Without spatially invariable code, local codes can change background and style in addition to local details.

15.- Method outperforms state-of-the-art StyleGAN in disentanglement scores.

16.- PathLength measures how strongly the generated image changes when the spatially invariable code is interpolated.

17.- Linear separability measures inaccuracy of linear attribute classifiers trained on input codes.

18.- Higher PathLength and linear separability scores indicate entangled mapping.

19.- Resampling global part of spatially variable code affects pose while maintaining likeness and background style.

20.- Resampling shared part of spatially variable code affects age, accessories, and face dimensions.

21.- Resampling local codes around mouth changes mouth shape.

22.- Resampling local codes in top rows changes hairstyle.

23.- Resampling spatially invariable codes maintains pose, age, facial expressions, and clothing shape while changing background, ethnicity, and sex.

24.- Future work: separating content and style, offering more control in spatially invariable code, determining suitable decomposition of generation process.

25.- Potential to change ethnicity while maintaining other stylistic aspects of face image.

26.- Disentanglement scores (PathLength and linear separability) used to compare methods.

27.- Spatial correspondence enables targeted editing of generated images.

28.- Architecture structured to achieve independence and control over local, shared, and global image aspects.

29.- Combination of spatially variable and invariable codes allows for fine-grained editing capabilities.

30.- The approach opens up possibilities for future research in complete content-style separation and controlled attribute manipulation in generated images.
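
Points 9 to 12 describe how the spatially variable code is structured, and a small NumPy sketch may make the independence argument more concrete. Everything specific here is an assumption for illustration and is not taken from the paper: the 4x4 grid, the 2x2 sharing neighborhoods, the code sizes, and the helper names `sample_codes`, `init_cell_layers`, `build_input_tensor` and `changed_cells`. The sketch covers only the input tensor, not the generator that follows it.

```python
import numpy as np

GRID = 4                                  # assumed 4x4 grid of cells
D_LOCAL, D_SHARED, D_GLOBAL = 8, 8, 8     # assumed code sizes
D_OUT = 16                                # assumed output channels per cell


def sample_codes(rng):
    """Draw one local code per cell, one shared code per 2x2 block, one global code."""
    return {
        "local": rng.standard_normal((GRID, GRID, D_LOCAL)),
        "shared": rng.standard_normal((GRID // 2, GRID // 2, D_SHARED)),
        "global": rng.standard_normal(D_GLOBAL),
    }


def init_cell_layers(rng):
    """One independent fully connected layer (weights, bias) per cell."""
    d_in = D_LOCAL + D_SHARED + D_GLOBAL
    weights = rng.standard_normal((GRID, GRID, d_in, D_OUT)) / np.sqrt(d_in)
    biases = np.zeros((GRID, GRID, D_OUT))
    return weights, biases


def build_input_tensor(codes, layers):
    """Map the structured codes to a GRID x GRID x D_OUT input tensor."""
    weights, biases = layers
    # Repeat each shared code over the 2x2 block of cells that shares it.
    shared = np.repeat(np.repeat(codes["shared"], 2, axis=0), 2, axis=1)
    # Every cell sees the same global code.
    glob = np.broadcast_to(codes["global"], (GRID, GRID, D_GLOBAL))
    cell_in = np.concatenate([codes["local"], shared, glob], axis=-1)
    # Each cell is mapped by its own layer, so cells never mix.
    return np.einsum("ijk,ijko->ijo", cell_in, weights) + biases


def changed_cells(before, after):
    """Boolean GRID x GRID mask of cells whose mapped values differ."""
    return np.any(~np.isclose(before, after), axis=-1)


rng = np.random.default_rng(0)
layers = init_cell_layers(rng)
codes = sample_codes(rng)
base = build_input_tensor(codes, layers)

# Resample the local code of a single cell and see which cells change.
edited = {name: value.copy() for name, value in codes.items()}
edited["local"][3, 1] = rng.standard_normal(D_LOCAL)
changed = changed_cells(base, build_input_tensor(edited, layers))
print(changed.astype(int))
```

In this sketch, resampling one local code changes only that cell of the input tensor, while resampling a shared code changes the whole block that shares it. Because the input tensor corresponds spatially to the final image (point 5), such a change is expected to stay concentrated in the matching image region, although the later generator layers are not modeled here.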

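The path length score from points 16 and 18 can be illustrated with a toy computation. This is a minimal sketch under stated assumptions and not the evaluation code behind the reported scores: the generator is a stand-in function, squared Euclidean distance replaces the perceptual image distance used in the StyleGAN metric, linear interpolation is used throughout, and `path_length`, `smooth_generator` and `abrupt_generator` are hypothetical names.

```python
import numpy as np


def lerp(a, b, t):
    """Linear interpolation between two codes."""
    return a + t * (b - a)


def path_length(generator, code_dim, n_pairs=100, eps=1e-4, seed=0):
    """Average scaled output change for a small step along random code interpolations."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_pairs):
        z1 = rng.standard_normal(code_dim)
        z2 = rng.standard_normal(code_dim)
        t = rng.uniform(0.0, 1.0)
        img_a = generator(lerp(z1, z2, t))
        img_b = generator(lerp(z1, z2, t + eps))
        # Squared Euclidean distance is an assumed stand-in for a
        # perceptual distance between the two generated images.
        total += np.sum((img_a - img_b) ** 2) / eps**2
    return total / n_pairs


def smooth_generator(z):
    """Toy generator whose output changes slowly with the code."""
    return 0.1 * z


def abrupt_generator(z):
    """Toy generator whose output changes quickly with the code."""
    return 10.0 * z


score_smooth = path_length(smooth_generator, code_dim=16)
score_abrupt = path_length(abrupt_generator, code_dim=16)
print(score_smooth, score_abrupt)
```

The generator whose output reacts more strongly to a small step in the code receives the higher score, which matches the reading in point 18 that higher values indicate a more entangled mapping.
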
Knowledge Vault built by David Vivancos 2024