Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi
Concept Graph & Summary using Claude 3 Opus | Chat GPT4o | Llama 3:
graph LR
classDef superresolution fill:#f9d4d4, font-weight:bold, font-size:14px
classDef learning fill:#d4f9d4, font-weight:bold, font-size:14px
classDef applications fill:#d4d4f9, font-weight:bold, font-size:14px
classDef srresnet fill:#f9f9d4, font-weight:bold, font-size:14px
classDef srgan fill:#f9d4f9, font-weight:bold, font-size:14px
classDef evaluation fill:#d4f9f9, font-weight:bold, font-size:14px
classDef limitations fill:#f9d4d4, font-weight:bold, font-size:14px
classDef future fill:#d4f9d4, font-weight:bold, font-size:14px
A[Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network] --> B[Photo-realistic single image super-resolution 1]
B --> C[Increase resolution, add texture 2]
B --> D[Applications: satellite, media, medical, surveillance 3]
A --> E[SRResNet: deep residual CNN 4]
E --> F[Trained on ImageNet 5]
E --> G[Deeper network, residual blocks, skip connections 6]
E --> H[Efficient sub-pixel convolution upscaling 7]
E --> I[More residual blocks improve PSNR 8]
E --> J[Improves bicubic but lacks perceptual quality 9]
E --> K[Regression to mean with MSE loss 10]
A --> L[SRGAN: GAN-based approach 11]
L --> M[Related work: feature, adversarial losses 12]
L --> N[Perceptual loss: MSE in VGG space 13]
L --> O[Adversarial loss: discriminator distinguishes real/fake 14]
O --> P[Generator fools discriminator, reconstructs details 15]
O --> Q[Discriminator based on modified VGG 16]
O --> R[Minimax optimization of cross-entropy 17]
L --> S[Adversarial loss preserves manifold 18]
L --> T[Content loss allows texture freedom 19]
L --> U[SRGAN adds texture, convincing results 20]
A --> V[Evaluation using MOS test 21]
V --> W[SRGAN outperforms in MOS 22]
V --> X[SRResNet excels in PSNR, SRGAN in perception 23]
V --> Y[SRGAN performs well at higher scales 24]
A --> Z[Limitations: text, numbers difficult 25]
Z --> AA[Importance of diverse training data 26]
A --> AB[Interest in improved GAN training 27]
A --> AC[Need for perceptual quality metrics 28]
A --> AD[Acknowledgments to co-authors 29]
A --> AE[Invitation to poster session 30]
class A,B,C superresolution
class E,F,G,H,I,J,K srresnet
class L,M,N,O,P,Q,R,S,T,U srgan
class D applications
class V,W,X,Y evaluation
class Z,AA limitations
class AB,AC future
Summary:
1.- Photo-realistic single image super-resolution using deep learning
2.- Increase spatial resolution and add fine texture detail
3.- Applications in satellite imaging, media content, medical imaging, surveillance
4.- SRResNet: deep residual CNN optimized for PSNR
5.- Trained on 350,000 ImageNet images
6.- Deeper network with identical residual blocks and skip connections
7.- Efficient sub-pixel convolution for upscaling
8.- More residual blocks improve PSNR
9.- SRResNet improves upon bicubic interpolation but lacks perceptual quality
10.- Regression to the mean problem with MSE loss
11.- SRGAN: GAN-based approach to overcome limitations of MSE
12.- Related work: feature space losses and adversarial losses
13.- Perceptual loss functions: MSE in VGG feature space
14.- Adversarial loss: discriminator network to distinguish real/fake images
15.- Generator trained to fool discriminator by reconstructing realistic details
16.- Discriminator architecture based on VGG with modifications
17.- Minimax optimization of cross-entropy loss
18.- Adversarial loss pulls reconstructions back to natural image manifold
19.- Content loss in feature space allows more freedom for texture details
20.- SRGAN adds fine texture details, perceptually convincing results
21.- Evaluation using Mean Opinion Score (MOS) test with human raters
22.- SRGAN outperforms SRResNet and reference methods in MOS
23.- SRResNet excels in PSNR, but SRGAN provides superior perceptual quality
24.- SRGAN performs well for higher upscaling factors (8x, 16x)
25.- Limitations: Difficulty reconstructing text and numbers
26.- Importance of training data diversity
27.- Interest in improved GAN training techniques
28.- Need for better objective functions capturing perceptual quality
29.- Acknowledgments to co-authors, particularly Wenzhe Shi
30.- Invitation to poster session for further discussion
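
The sub-pixel convolution upscaling in point 7 can be illustrated with a small NumPy sketch of the rearrangement step alone. This is a sketch under stated assumptions, not the authors' implementation: the helper name `pixel_shuffle` is hypothetical, the channel ordering follows the convention of PyTorch's `PixelShuffle` layer (which the summary does not specify), and the learned convolution that would produce the input channels is omitted.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) feature map into a (C, H*r, W*r) image.

    Hypothetical helper. Assumed convention:
    out[c, h*r + dy, w*r + dx] = x[c*r*r + dy*r + dx, h, w]
    """
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, dy, dx)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, dy, w, dx)
    return x.reshape(c, h * r, w * r)  # interleave dy into rows, dx into columns


if __name__ == "__main__":
    low_res_features = np.arange(16, dtype=float).reshape(4, 2, 2)
    high_res = pixel_shuffle(low_res_features, r=2)
    print(high_res.shape)
```

With `r = 2`, a `(4, 2, 2)` input becomes a `(1, 4, 4)` output: each group of four channels supplies the four pixels of one 2x2 output block. All convolutions before this step operate at low resolution, which is where the efficiency comes from.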
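
Points 9 and 10 (good PSNR but weak perceptual quality, caused by regression to the mean under MSE) can be made concrete with a toy example. The setup below is an illustrative assumption, not data from the talk: two checkerboard textures stand in for equally plausible high-resolution images, and the helper names `psnr`, `checkerboard`, and `expected_mse` are hypothetical.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def checkerboard(size: int, phase: int) -> np.ndarray:
    """Binary checkerboard texture; `phase` shifts the pattern by one pixel."""
    yy, xx = np.indices((size, size))
    return ((yy + xx + phase) % 2).astype(float)


def expected_mse(estimate: np.ndarray, candidates: list) -> float:
    """Average MSE of one estimate against equally likely ground truths."""
    return float(np.mean([np.mean((estimate - c) ** 2) for c in candidates]))


# Two textures assumed to be equally plausible explanations of the same LR input.
candidates = [checkerboard(8, 0), checkerboard(8, 1)]
blurry = np.mean(candidates, axis=0)   # pixel-wise average: flat gray, no texture
sharp = candidates[0]                  # commits to one plausible texture

print(expected_mse(blurry, candidates))  # 0.25
print(expected_mse(sharp, candidates))   # 0.5
```

The flat gray image halves the expected MSE, so it scores a higher PSNR on average, yet it contains no texture at all. The sharp guess is penalized whenever it picks the wrong phase, even though it looks like a natural sample.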
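
The loss terms in points 13 to 17 can be sketched as plain functions. This is a minimal sketch under stated assumptions rather than the authors' training code: the feature arrays stand in for VGG activations (no network is loaded), the discriminator outputs are taken to be probabilities in (0, 1), the function names are hypothetical, and the adversarial weight of `1e-3` is an assumed value taken from the SRGAN paper, not from this summary.

```python
import numpy as np

ADV_WEIGHT = 1e-3  # assumption: weight of the adversarial term (not stated in the summary)
EPS = 1e-12        # numerical guard against log(0)


def content_loss(feat_hr: np.ndarray, feat_sr: np.ndarray) -> float:
    """MSE between feature maps of the real (HR) and reconstructed (SR) image."""
    return float(np.mean((feat_hr - feat_sr) ** 2))


def generator_adversarial_loss(d_on_sr: np.ndarray) -> float:
    """Low when the discriminator assigns high 'real' probability to SR images."""
    return float(np.mean(-np.log(d_on_sr + EPS)))


def discriminator_loss(d_on_hr: np.ndarray, d_on_sr: np.ndarray) -> float:
    """Binary cross-entropy: HR images labeled real, SR images labeled fake."""
    real_term = np.mean(-np.log(d_on_hr + EPS))
    fake_term = np.mean(-np.log(1.0 - d_on_sr + EPS))
    return float(real_term + fake_term)


def perceptual_loss(feat_hr, feat_sr, d_on_sr, adv_weight: float = ADV_WEIGHT) -> float:
    """Content loss in feature space plus a weighted adversarial term."""
    return content_loss(feat_hr, feat_sr) + adv_weight * generator_adversarial_loss(d_on_sr)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_hr = rng.normal(size=(16, 8, 8))                       # stand-in for VGG features
    feat_sr = feat_hr + 0.1 * rng.normal(size=feat_hr.shape)
    print(perceptual_loss(feat_hr, feat_sr, d_on_sr=np.array([0.3, 0.4])))
```

In a training loop the two networks would alternate: the discriminator minimizes `discriminator_loss`, while the generator minimizes `perceptual_loss`, which is the minimax game described in point 17.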
Knowledge Vault built by David Vivancos 2024