Knowledge Vault 5/25 - CVPR 2017
Learning from Simulated and Unsupervised Images through Adversarial Training
Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, & Russell Webb
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

graph LR
  classDef simgan fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef refiner fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef discriminator fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef training fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef performance fill:#f9d4f9, font-weight:bold, font-size:14px
  A[Learning from Simulated and Unsupervised Images through Adversarial Training] --> B[SimGAN: Bridging synthetic-real image distribution gap. 1]
  B --> C[Refiner network: Outputs realistic refined images. 2]
  B --> D[Discriminator network: Classifies real vs. refined. 3]
  B --> E[Alternating training: Refiner, discriminator updated alternately. 4]
  B --> F[Self-regularization loss: Preserves annotations in refinement. 5]
  B --> G[Visual Turing test: Synthetic vs. real difficulty. 8]
  B --> H[No correspondence required: Synthetic-real pairing unnecessary. 18]
  C --> I[Reducing artifacts: Local loss, history help. 19]
  C --> J[Fully convolutional networks: Refiner and discriminator. 20]
  D --> K[Discriminator loss: Real/refined cross-entropy loss. 9]
  D --> L[Buffer of refined images: Improves discriminator stability. 13]
  E --> M[Refiner loss: Fools discriminator, appears real. 10]
  E --> N[Unstable training: Moving refiner/discriminator targets. 11]
  E --> O[Local adversarial loss: Fully convolutional, reduces artifacts. 12]
  A --> P[Eye gaze estimation: Image input, gaze output. 6]
  A --> Q[Hand pose estimation: Depth image to joints. 7]
  A --> R[Synthetic data generation: Simulators produce infinite data. 21]
  A --> S[Refining synthetic data: Refiner processes synthetic data. 22]
  A --> T[Quantitative experiments: Synthetic, refined, real performance. 14]
  T --> U[Comparing performance: Synthetic, refined, real models. 23]
  T --> V[Performance improvement: Refined beats synthetic. 15]
  T --> W[Outperforming limited real data: Refined can outperform. 16]
  T --> X[Improving simulated data utility: SimGAN enhances utility. 24]
  A --> Y[Preserving annotations: SimGAN maintains synthetic annotations. 17]
  A --> Z[Blog post: More SimGAN info available. 25]
  class A,B,G,H,Y,Z simgan
  class C,I,J,R,S refiner
  class D,K,L discriminator
  class E,M,N,O,P,Q training
  class T,U,V,W,X performance

Resume:

1.- SimGAN: A data-driven approach to bridge the distribution gap between synthetic and real images.

2.- Refiner network: A fully convolutional neural network that outputs refined images that look realistic.
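
A minimal PyTorch sketch of such a refiner. The 3x3 input convolution, a few ResNet blocks, and a 1x1 output convolution loosely follow the paper's gaze setup; the channel counts and the Tanh output here are illustrative assumptions, not the paper's exact configuration:

import torch
import torch.nn as nn

class ResnetBlock(nn.Module):
    """Two 3x3 convolutions plus an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.conv(x))

class Refiner(nn.Module):
    """Fully convolutional refiner: same-size image in, refined image out."""
    def __init__(self, in_channels=1, features=64, n_blocks=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            *[ResnetBlock(features) for _ in range(n_blocks)],
            nn.Conv2d(features, in_channels, kernel_size=1),
            nn.Tanh(),  # bounded output, assuming inputs scaled to [-1, 1]
        )

    def forward(self, x):
        return self.net(x)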

3.- Discriminator network: A two-class classification network that distinguishes between real and refined images.
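
A matching discriminator sketch. Being fully convolutional, it emits a map of per-patch logits rather than one global score (see points 12 and 20). The paper uses a two-class softmax output; this sketch swaps that for a one-channel logit with a sigmoid loss, and the layer sizes are approximate:

class Discriminator(nn.Module):
    """Fully convolutional discriminator: returns a (batch, 1, w, h) map
    of per-patch logits, not a single global real/refined score."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 96, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),  # one logit per local patch
        )

    def forward(self, x):
        return self.net(x)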

4.- Alternating training: Refiner and discriminator networks are updated alternately to generate realistic images.
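
One alternating step, using the Refiner and Discriminator sketched above, might look as follows. The label convention follows the paper (the discriminator predicts the probability that an image is refined, so real images get label 0); lambda_reg and the learning rates are illustrative, and the paper actually interleaves several refiner updates per discriminator update:

import torch
import torch.nn.functional as F

refiner, discriminator = Refiner(), Discriminator()
opt_r = torch.optim.Adam(refiner.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
lambda_reg = 0.1  # weight of the self-regularization term (illustrative)

def train_step(synthetic, real):
    # Refiner update: fool D while staying close to the synthetic input.
    refined = refiner(synthetic)
    d_out = discriminator(refined)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.zeros_like(d_out))
    reg = F.l1_loss(refined, synthetic)  # self-regularization (point 5)
    loss_r = adv + lambda_reg * reg
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # Discriminator update: refined images -> label 1, real images -> label 0.
    refined = refiner(synthetic).detach()
    d_fake, d_real = discriminator(refined), discriminator(real)
    loss_d = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + F.binary_cross_entropy_with_logits(d_real, torch.zeros_like(d_real)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return loss_r.item(), loss_d.item()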

5.- Self-regularization loss: Minimizes a per-pixel (or feature-space) distance between each synthetic image and its refined version, so the annotation information is preserved.
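
In the paper's notation, the full refiner objective combines the realism (adversarial) term with this regularizer; ψ maps images to a feature space (the identity map in most of the paper's experiments) and λ is a weighting hyperparameter:

\[
\mathcal{L}_R(\theta) = \sum_i \Big[ -\log\big(1 - D_\phi(R_\theta(\mathbf{x}_i))\big) + \lambda \,\big\| \psi(R_\theta(\mathbf{x}_i)) - \psi(\mathbf{x}_i) \big\|_1 \Big]
\]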

6.- Eye gaze estimation: A task where the input is an eye image and the output is the gaze direction.

7.- Hand pose estimation: A task where the input is a hand depth image, and the output is joint locations.

8.- Visual Turing test: Human subjects try to tell images apart; refined vs. real is far harder to distinguish than synthetic vs. real.

9.- Discriminator loss: A two-class cross-entropy loss for classifying real and refined images.
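
Written out, with x̃_i = R_θ(x_i) the refined images, y_j the real images, and D_φ the predicted probability that its input is refined:

\[
\mathcal{L}_D(\phi) = -\sum_i \log D_\phi(\tilde{\mathbf{x}}_i) - \sum_j \log\big(1 - D_\phi(\mathbf{y}_j)\big)
\]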

10.- Refiner loss: Tries to fool the discriminator by generating refined images that appear real.

11.- Unstable training: Alternating training can be unstable due to moving targets for refiner and discriminator.

12.- Local adversarial loss: Using a fully convolutional discriminator to make local changes and reduce artifacts.
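
A sketch of that loss against the logit-map discriminator above: averaging the cross-entropy over every spatial cell classifies each local receptive field independently, so no single patch can hide an artifact.

import torch
import torch.nn.functional as F

def local_adversarial_loss(logit_map, is_refined):
    # logit_map: (batch, 1, w, h) output of the fully convolutional
    # discriminator; each spatial cell judges one local image patch.
    target = (torch.ones_like(logit_map) if is_refined
              else torch.zeros_like(logit_map))
    # Averaging over all w x h cells forces *every* patch to look real,
    # which suppresses localized artifacts.
    return F.binary_cross_entropy_with_logits(logit_map, target)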

13.- Buffer of refined images: Using a history of refined images to update the discriminator and improve stability.
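
A simple version of such a buffer. Drawing half of each discriminator minibatch from history follows the paper's description; the capacity and the random-eviction policy here are assumptions:

import random
import torch

class ImageHistoryBuffer:
    """Pool of past refined images; half of each discriminator minibatch
    is drawn from this history so D does not forget earlier refiner outputs."""
    def __init__(self, capacity=512):  # capacity is illustrative
        self.capacity = capacity
        self.images = []

    def sample_and_update(self, refined):
        half = refined.size(0) // 2
        if half > 0 and len(self.images) >= half:
            old = torch.stack(random.sample(self.images, half)).to(refined.device)
            batch = torch.cat([refined[:refined.size(0) - half], old])
        else:
            batch = refined  # buffer still warming up
        # Store the current refined images, evicting random entries when full.
        for img in refined:
            if len(self.images) < self.capacity:
                self.images.append(img.detach().cpu())
            else:
                self.images[random.randrange(self.capacity)] = img.detach().cpu()
        return batch

In the discriminator step sketched earlier, one would pass refiner(synthetic).detach() through sample_and_update before computing the discriminator loss.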

14.- Quantitative experiments: Evaluating the performance of ML models trained on synthetic, refined, and real images.
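
That protocol can be summarized as training one copy of the same estimator per data source and comparing error on a real test set. In the sketch below, train and test_error are hypothetical helpers standing in for a full pipeline:

def compare_training_sources(make_model, train_loaders, real_test_loader):
    # train_loaders: e.g. {"synthetic": ..., "refined": ..., "real": ...}
    results = {}
    for name, loader in train_loaders.items():
        model = make_model()   # identical architecture for each data source
        train(model, loader)   # hypothetical training routine
        results[name] = test_error(model, real_test_loader)  # hypothetical metric
    return results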

15.- Performance improvement: Refined images lead to better gaze-estimation performance than synthetic images.

16.- Outperforming limited real data: Models trained on refined images can outperform models trained on a limited amount of real data.

17.- Preserving annotations: SimGAN preserves annotations from synthetic images in the refined images.

18.- No correspondence required: SimGAN does not require correspondence between synthetic and real images.

19.- Reducing artifacts: Local adversarial loss and using a history of refined images help reduce artifacts.

20.- Fully convolutional networks: Both refiner and discriminator networks are fully convolutional.

21.- Synthetic data generation: Simulators can generate an almost infinite amount of synthetic data.

22.- Refining synthetic data: Synthetic data is refined by feeding it through the refiner network.

23.- Comparing performance: Performance is compared between models trained on synthetic, refined, and real data.

24.- Improving simulated data utility: SimGAN improves the utility of simulated data for training ML models.

25.- Blog post: Additional information about SimGAN is available on the Apple Machine Learning blog.

Knowledge Vault built by David Vivancos 2024