Knowledge Vault 5/79 - CVPR 2022
Imagination inspired AI for Art and Culture
Mohamed Elhoseiny
< Resume Image >

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4o | Llama 3:

```mermaid
graph LR
  classDef imagination fill:#f9d4d4, font-weight:bold, font-size:14px
  classDef emotion fill:#d4f9d4, font-weight:bold, font-size:14px
  classDef creativity fill:#d4d4f9, font-weight:bold, font-size:14px
  classDef learning fill:#f9f9d4, font-weight:bold, font-size:14px
  classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
  A[Imagination inspired AI for Art and Culture] --> B[AI for art, culture: creation, affective expression. 1]
  A --> C[CANs: GANs create novel art, deviate from style norms. 2]
  C --> D[Wundt curve: novelty impacts creative appreciation. 3]
  C --> E[Style loss: discriminator aware of norms, deviates. 4]
  C --> F[Fashion GANs: novel element mixing. 5]
  A --> G[Affective AI: emotionally aware models communicate visually. 6]
  G --> H[Emotions constructed in moment, not just retrieved. 7]
  G --> I[Experience enjoyment depends on context. 8]
  G --> J[ArtEmis: paintings with affective language captions. 9]
  J --> K[ArtEmis captions more emotional than COCO. 10]
  J --> L[Interface collected ArtEmis data, emotions explained. 11]
  J --> M[ArtEmis biased positive, ArtEmis II corrected. 12]
  M --> N[ArtEmis II: similar paintings, contrasting emotions. 13]
  N --> O[Captions contrast emotions for similar images, detail sensitivity. 14]
  J --> P[ArtEmis II models: 20% CIDEr gain, 73% human preference. 15]
  C --> Q[CWAN: deviation as random walk across style classes. 16]
  Q --> R[Uniform landing probabilities promote style norm deviation. 17]
  C --> S[CWAN: more likeable, deviant images than StyleGAN. 18]
  C --> T[CWAN constructs diverse emotional experiences unsupervised. 19]
  B --> U[Generating paintings from desired emotional impact text. 20]
  B --> V[INR-GAN: continuous images from text, arbitrary resolution. 21]
  B --> W[VisualGPT: GPT-2 knowledge for data-efficient captioning. 22]
  W --> X[Gating regulates GPT-2 influence for knowledge distillation. 23]
  W --> Y[VisualGPT outperforms baselines with limited data. 24]
  W --> Z[VisualGPT techniques could improve multilingual affective captioning. 25]
  B --> AA[Video model: hour-long videos, 5% more compute, 50% quality. 26]
  AA --> AB[Model enables longer artistic AI-generated videos. 27]
  G --> AC[Interfaces collecting contrasting examples mitigate affective bias. 28]
  C --> AD[Random walks model and encourage generative norm deviation. 29]
  G --> AE[Knowledge distillation from language models: data-efficient affective captioning. 30]
  class A,B,U,V,W,X,Y,Z,AA,AB imagination
  class G,H,I,J,K,L,M,N,O,P,AC,AE emotion
  class C,D,E,F,Q,R,S,T,AD creativity
  class W,X,Y,Z learning
  class AA,AB,Z future
```

Resume:

1.- The talk covers imagination-inspired AI for art and culture, focusing on creation and affective expression.

2.- Creative Adversarial Networks (CANs) encourage GANs to produce novel art by deviating from existing style norms.

3.- The Wundt curve suggests that appreciation of a creative work peaks at moderate novelty: too little or too much novelty reduces it.
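
As a rough illustration (not from the talk), the Wundt curve is often modeled as the difference between a reward sigmoid and an aversion sigmoid over novelty. The toy sketch below uses made-up parameters to show appreciation peaking at moderate novelty.

```python
import numpy as np

def wundt_curve(novelty, reward_mid=0.4, aversion_mid=0.7, steepness=12.0):
    """Toy Wundt curve: appreciation = reward sigmoid - aversion sigmoid.

    All parameters are illustrative, not fitted to any data.
    """
    reward = 1.0 / (1.0 + np.exp(-steepness * (novelty - reward_mid)))
    aversion = 1.0 / (1.0 + np.exp(-steepness * (novelty - aversion_mid)))
    return reward - aversion

novelty = np.linspace(0.0, 1.0, 11)
for n, a in zip(novelty, wundt_curve(novelty)):
    print(f"novelty={n:.1f}  appreciation={a:+.3f}")
# Appreciation rises with novelty, peaks between the two midpoints,
# then falls again: too little or too much novelty scores low.
```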

4.- A style classification loss makes the discriminator aware of style norms, while a high-entropy (style-ambiguity) loss encourages the generator to deviate from them.
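
A minimal sketch of the two losses, assuming a discriminator with an auxiliary style-classifier head (a paraphrase of the CAN objective in PyTorch, not the authors' code): the classifier head is trained on real artworks' style labels, while the generator is penalized unless that head is maximally uncertain, i.e. uniform, on generated images.

```python
import torch
import torch.nn.functional as F

def style_classification_loss(style_logits, style_labels):
    """Discriminator-side loss: classify real artworks into their
    annotated styles, making the critic aware of style norms."""
    return F.cross_entropy(style_logits, style_labels)

def style_ambiguity_loss(style_logits):
    """Generator-side loss: cross-entropy against the uniform
    distribution over K styles, i.e. maximal style uncertainty."""
    k = style_logits.size(-1)
    log_probs = F.log_softmax(style_logits, dim=-1)
    return -(log_probs / k).sum(dim=-1).mean()

# Toy usage: a batch of 8 images and an illustrative style count
k_styles = 10                 # hypothetical; not the talk's actual count
logits = torch.randn(8, k_styles)
labels = torch.randint(0, k_styles, (8,))
print(style_classification_loss(logits, labels).item())  # D's aux loss
print(style_ambiguity_loss(logits).item())                # G's CAN loss
```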

5.- Creative fashion GANs produced designs mixing elements like vests with pants in novel ways.

6.- Affective AI aims to build emotionally aware models that can effectively communicate about visual stimuli.

7.- The theory of constructed emotion proposes that emotions are constructed in the moment rather than simply retrieved.

8.- Experiences such as fear can be enjoyable or unpleasant depending on the context, e.g., a horror film versus a real threat.

9.- The ArtEmis dataset pairs paintings with affective language captions focused on emotional experiences.

10.- ArtEmis captions are more emotionally descriptive than those in datasets like COCO.

11.- An interface was built to collect the ArtEmis data, having participants select the emotions evoked and explain them.

12.- The initial ArtEmis dataset was biased towards positive emotions; ArtEmis II aimed to correct this.

13.- For ArtEmis II, each original painting was paired with a visually similar one evoking a different emotion.

14.- This encouraged captions with contrasting emotional experiences for similar images, making models more sensitive to emotion-evoking details.
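
A hedged sketch of the pairing idea behind ArtEmis II (the real collection pipeline is more involved): given image embeddings and each image's dominant emotion label, pick the most visually similar image whose emotion differs.

```python
import numpy as np

def contrastive_pairs(embeddings: np.ndarray, emotions: list):
    """For each image, return the index of the most visually similar
    image whose dominant emotion label differs.

    embeddings: (N, D) array, assumed L2-normalized.
    emotions:   N dominant-emotion labels, e.g. "awe", "fear".
    """
    sims = embeddings @ embeddings.T          # cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)           # exclude self-matches
    pairs = []
    for i, emo in enumerate(emotions):
        mask = np.array([e != emo for e in emotions])
        candidates = np.where(mask, sims[i], -np.inf)
        pairs.append(int(np.argmax(candidates)))
    return pairs

# Toy example: 4 images with random 3-d embeddings and emotion labels
emb = np.random.randn(4, 3)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(contrastive_pairs(emb, ["awe", "awe", "fear", "sadness"]))
```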

15.- Models trained on ArtEmis II showed a 20% gain in CIDEr score and were preferred by humans 73% of the time.

16.- Creative Walk Adversarial Networks (CWAN) use message passing to model deviation as a random walk across art style classes.

17.- Landing probabilities from the random walk are encouraged to be uniform, promoting deviation from style norms.
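
A minimal sketch of the random-walk idea, assuming class affinities and a style-class similarity matrix (a reconstruction of the spirit of CWAN, not the published code): propagate each generated sample's class distribution through a few walk steps, then penalize divergence of the landing distribution from uniform.

```python
import torch
import torch.nn.functional as F

def creative_walk_loss(gen_to_class: torch.Tensor,
                       class_sim: torch.Tensor,
                       steps: int = 3) -> torch.Tensor:
    """Encourage uniform landing probabilities over style classes.

    gen_to_class: (B, K) affinities of generated samples to K styles.
    class_sim:    (K, K) similarity between style-class prototypes.
    """
    transition = F.softmax(class_sim, dim=-1)     # row-stochastic walk
    landing = F.softmax(gen_to_class, dim=-1)
    for _ in range(steps):
        landing = landing @ transition            # one walk step
    uniform = torch.full_like(landing, 1.0 / landing.size(-1))
    # KL(uniform || landing): minimized when landings are uniform,
    # i.e. the sample commits to no existing style norm.
    return F.kl_div(landing.log(), uniform, reduction="batchmean")

affinities = torch.randn(8, 10)       # 8 generated samples, 10 styles
proto_sim = torch.randn(10, 10)       # hypothetical prototype similarities
print(creative_walk_loss(affinities, proto_sim).item())
```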

18.- CWAN produces more likeable images that deviate further from training data compared to StyleGAN models.

19.- CWAN can construct diverse emotional experiences in an unsupervised way based on human evaluations.

20.- Additional work explored generating paintings from textual descriptions of desired emotional impacts.

21.- INR-GAN generates continuous images from text, allowing arbitrary output resolutions and generation beyond the training image frame.
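
To make "continuous image" concrete, here is a generic coordinate-MLP sketch (not INR-GAN's actual architecture): a network maps (x, y) coordinates to RGB, so the same model can be sampled at any resolution, including coordinates outside the original frame.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Toy implicit image: maps 2-d coordinates to RGB values."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

def render(model: nn.Module, height: int, width: int,
           lo: float = 0.0, hi: float = 1.0) -> torch.Tensor:
    """Sample the implicit image on an arbitrary grid; widening
    [lo, hi] beyond [0, 1] extrapolates past the training frame."""
    ys = torch.linspace(lo, hi, height)
    xs = torch.linspace(lo, hi, width)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
    return model(grid.reshape(-1, 2)).reshape(height, width, 3)

model = CoordinateMLP()
print(render(model, 32, 32).shape)             # torch.Size([32, 32, 3])
print(render(model, 256, 256).shape)           # same model, higher res
print(render(model, 64, 64, -0.5, 1.5).shape)  # beyond the frame
```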

22.- VisualGPT leverages knowledge from GPT-2 to enable data-efficient image captioning when training examples are limited.

23.- A gating mechanism regulates the influence of the pretrained GPT-2 weights, balancing linguistic knowledge against visual input for effective knowledge distillation.
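
A simplified stand-in for the gating idea (VisualGPT's actual mechanism is a self-resurrecting encoder-decoder attention gate; this sketch only captures the interpolation intuition): a learned gate in [0, 1] mixes, per token, pretrained language-model features with attended visual features.

```python
import torch
import torch.nn as nn

class VisualLanguageGate(nn.Module):
    """Simplified gate: per-token convex mix of attended visual
    features and frozen language-model (GPT-2-like) features."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, lm_feats: torch.Tensor,
                visual_feats: torch.Tensor) -> torch.Tensor:
        # g -> 1 trusts the pretrained linguistic knowledge;
        # g -> 0 trusts the visual evidence for this token.
        g = torch.sigmoid(self.gate(torch.cat([lm_feats, visual_feats], -1)))
        return g * lm_feats + (1.0 - g) * visual_feats

gate = VisualLanguageGate(dim=768)    # GPT-2 hidden size
lm = torch.randn(1, 12, 768)          # 12 caption-token features
vis = torch.randn(1, 12, 768)         # attended image features
print(gate(lm, vis).shape)            # torch.Size([1, 12, 768])
```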

24.- VisualGPT outperforms baselines when trained on 0.5%, 0.1%, and 0.01% of the data on COCO and medical report datasets.

25.- Applying techniques like VisualGPT to distill knowledge from language models could improve affective captioning in more languages.

26.- A video generation model was proposed that can produce hour-long videos with only about 5% additional compute while improving quality by roughly 50%.

27.- The video model shows promise for enabling longer artistic AI-generated videos.

28.- Evaluation interfaces designed to collect emotionally contrasting examples help mitigate bias in affective datasets.

29.- Random walks in latent space provide an approach to modeling and encouraging deviation from existing norms in generative models.

30.- Distilling knowledge from large pre-trained language models is a promising approach for data-efficient affective captioning.

Knowledge Vault built by David Vivancos 2024