Knowledge Vault 5 /37 - CVPR 2018
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi ; Minje Choi ; Munyoung Kim ; Jung-Woo Ha ; Sunghun Kim ; Jaegul Choo

Concept Graph & Resume using Claude 3 Opus | Chat GPT4o | Llama 3:

graph LR
classDef imagetranslation fill:#f9d4d4, font-weight:bold, font-size:14px
classDef stargan fill:#d4f9d4, font-weight:bold, font-size:14px
classDef losses fill:#d4d4f9, font-weight:bold, font-size:14px
classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px
classDef results fill:#f9d4f9, font-weight:bold, font-size:14px
A[StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation] --> B[Source to target domain conversion 1]
A --> C[Translating multiple domains/attributes 2]
A --> D[Separate models per domain pair 3]
A --> E[Unified multi-domain model 4]
E --> F[Image + domain label input 5]
E --> G[Encoder-decoder with residual blocks 6]
A --> H[Translated images match real ones 7]
A --> I[Target domain classification 8]
A --> J[Preserves input content 9]
A --> K[Facial attribute transfer on CelebA 10]
A --> L[Outperforms DIAT, CycleGAN, IcGAN 11]
A --> M[Benefits from multi-task learning 12]
M --> N[Preferred for realism, quality, identity 13]
A --> O[Facial expression synthesis 14]
O --> P[Leverages all domains data 15]
O --> Q[Most realistic expressions 16]
A --> R[Single model vs many 17]
A --> S[Pytorch implementation available 18]
A --> T[Limited diversity in CelebA 19]
A --> U[Potential for style transfer 20]
class B,C imagetranslation
class E,F,G stargan
class H,I,J losses
class K,O applications
class L,M,N,P,Q,R,S,T,U results

Resume:

1.- Image-to-image translation: Converting images from source to target domain.

2.- Multi-domain translation: Translating between multiple domains/attributes (e.g. hair colors).

3.- Previous work limitations: Separate model per domain pair; not scalable.

4.- StarGAN: Unified model for multi-domain translation using a single generator.

5.- Generator inputs: Image + target domain label. Learns to flexibly translate to target domain.

6.- Generator architecture: Encoder-decoder with residual blocks.

7.- Adversarial loss: Makes translated images indistinguishable from real ones.

8.- Domain classification loss: Ensures translated images are properly classified to target domain.

9.- Reconstruction loss: A cycle-consistency term; translating the output back to the original domain must recover the input, so content is preserved and only domain-related parts change.

10.- Facial attribute transfer: Translating attributes like hair color, gender, age on CelebA dataset.

11.- Comparison to baselines: StarGAN outperforms DIAT, CycleGAN, IcGAN in visual quality.

12.- Multi-task learning effect: StarGAN benefits from learning multiple translations in one model.

13.- Mechanical Turk user study: Users preferred StarGAN over baselines for realism, transfer quality, identity preservation.

14.- Facial expression synthesis: Imposing target facial expression on input face image.

15.- Implicit data augmentation effect: StarGAN trains on data from all domains at once; the pairwise baselines can use only 2 domains at a time.

16.- Facial expression classification error: Lowest for StarGAN, indicating most realistic expressions.

17.- Number of parameters: StarGAN needs just 1 model; the baselines need many models (DIAT: 7, CycleGAN: 14).

18.- PyTorch implementation available.

19.- Dataset bias: CelebA has mostly Western celebrities; performance drops on Eastern faces & makeup.

20.- Applicability beyond faces: Tested on style transfer (e.g. Van Gogh) but results not shown.
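
Item 5 says the generator takes an image plus a target domain label. A minimal sketch of one common way to feed both to a convolutional generator is given below, assuming the label vector is replicated over the spatial dimensions and concatenated with the image along the channel axis. The function name `concat_image_and_label`, the 128x128 image size, and the 5-attribute label are illustrative assumptions, not values taken from this summary.

```python
import numpy as np

def concat_image_and_label(image, label):
    """Tile a domain label over the image grid and stack it as extra channels.

    image: array of shape (C, H, W)
    label: array of shape (n_domains,), e.g. a binary attribute vector
    returns: array of shape (C + n_domains, H, W)
    """
    _, h, w = image.shape
    # Each label entry becomes one constant H x W feature map.
    label_maps = np.broadcast_to(label[:, None, None], (label.shape[0], h, w))
    return np.concatenate([image, label_maps], axis=0)

# Hypothetical example: an RGB image and a 5-attribute target label.
image = np.zeros((3, 128, 128), dtype=np.float32)
label = np.array([1, 0, 0, 1, 0], dtype=np.float32)
generator_input = concat_image_and_label(image, label)
print(generator_input.shape)  # (8, 128, 128)
```

Because the label enters as ordinary input channels, the same generator weights can serve every target domain; changing the label is enough to request a different translation.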

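Items 7 to 9 list the three terms the generator is trained with. The sketch below shows how such terms could be combined into a single objective. It assumes a Wasserstein-style adversarial term, a sigmoid cross-entropy classification term for multi-attribute labels, and an L1 reconstruction term; the weights `lambda_cls` and `lambda_rec` and all input values are placeholders chosen for illustration, not results from the paper.

```python
import numpy as np

def adversarial_loss(d_src_fake):
    # Assumed Wasserstein-style generator term: raise the critic score on fakes.
    return -np.mean(d_src_fake)

def domain_classification_loss(logits, target):
    # Numerically stable sigmoid cross-entropy, averaged over attributes.
    z, t = logits, target
    return np.mean(np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z))))

def reconstruction_loss(x, x_rec):
    # L1 distance between the input and its round-trip translation.
    return np.mean(np.abs(x - x_rec))

def generator_loss(adv, cls, rec, lambda_cls=1.0, lambda_rec=10.0):
    # Weighted sum of the three terms; the weights are hyperparameters.
    return adv + lambda_cls * cls + lambda_rec * rec

# Placeholder tensors standing in for network outputs.
x = np.ones((3, 4, 4))
x_rec = 0.9 * x                       # imperfect round-trip reconstruction
d_src_fake = np.array([0.5])          # critic score on the translated image
cls_logits = np.zeros(5)              # classifier logits on the translated image
target_label = np.array([1.0, 0.0, 0.0, 1.0, 0.0])

adv = adversarial_loss(d_src_fake)
cls = domain_classification_loss(cls_logits, target_label)
rec = reconstruction_loss(x, x_rec)
total = generator_loss(adv, cls, rec)
print(adv, cls, rec, total)
```

The reconstruction weight is set larger than the classification weight here so that content preservation dominates; in practice both would be tuned.
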
Knowledge Vault built by David Vivancos 2024