Knowledge Vault 5/84 - CVPR 2023
Vision, Language, and Creativity
Devi Parikh, Michal Irani, Aaron Hertzmann, Jason Salavon

Concept Graph & Summary using Claude 3 Opus | Chat GPT4o | Llama 3:

graph LR
classDef experts fill:#f9d4d4, font-weight:bold, font-size:14px
classDef research fill:#d4f9d4, font-weight:bold, font-size:14px
classDef art fill:#d4d4f9, font-weight:bold, font-size:14px
classDef intelligence fill:#f9f9d4, font-weight:bold, font-size:14px
classDef data fill:#f9d4f9, font-weight:bold, font-size:14px
classDef future fill:#d4f9f9, font-weight:bold, font-size:14px
A[Vision, Language, and Creativity] --> B[AI, vision, creativity, art experts 1]
A --> C[Parikh: vision-language, creativity enhancement 2]
A --> D[Salavon: generative art, culture, tech 3]
A --> E[Hertzmann: computational creativity, historical perspective 4]
B --> F[Irani: machines memorize, humans generalize 5]
F --> G[Deep internal learning: abundant patch recurrence 6]
F --> H[Deep external learning: large datasets 7]
C --> I[Vision-language models lack humor, cognition 8]
C --> J[Models lack compositionality, relations understanding 9]
D --> K[Artists feel work stolen for training 10]
K --> L[Historically, artists learn by copying 11]
K --> M[Precarious artists most concerned 12]
L --> N[Creative work may become commodity 13]
C --> O[Over-reliance on quantitative benchmarks problematic 14]
O --> P[Creativity, humor hard to measure 15]
C --> Q[Future: multimodal knowledge leveraging 16]
Q --> R[New high-quality datasets needed 17]
D --> S[CV community should collaborate with artists 18]
S --> T[Rapid iteration tools for artists 19]
S --> U[Complex copyright, usage rights issues 20]
C --> V[Unsolved: compositionality, object relations 21]
F --> W[Differences in human vs machine intelligence 22]
F --> X[Mind-reading: learning from limited brain data 23]
E --> Y[Modeling curiosity, serendipity computationally 24]
S --> Z[Making AI tools controllable, useful 25]
S --> AA[AI democratizes high-quality creative production 26]
AA --> AB[Next gen will innovate with AI 27]
Z --> AC[More user control over outputs needed 28]
A --> AD[Embrace large models, pursue exciting research 29]
A --> AE[Interdisciplinary collaboration leads to novel insights 30]
class A,B,C,D,E experts
class F,G,H,I,J,O,P,Q,R,V,W,X,Y research
class K,L,M,N,S,T,U art
class AA,AB,AC,Z future
class AD,AE intelligence

Summary:

1.- Panelists introduced: Devi Parikh, Jason Salavon, Aaron Hertzmann, Michal Irani - experts in AI, computer vision, computational creativity, and art.

2.- Devi Parikh's research: Attributes for human-machine communication, vision-language models, systems for enhancing human creativity, multimodal foundation models, generative AI.

3.- Jason Salavon's art: Software-based fine arts at intersection of art, culture, technology. Generative artworks from culturally loaded material.

4.- Aaron Hertzmann's work: Simulating artistic creativity computationally, style transfer, motion style learning, bringing historical perspective to AI art.

5.- Michal Irani's view: Machines memorize better than humans but can't generalize outside training distribution. Humans generalize from few examples.

6.- Deep internal learning: Abundant patch recurrence in an image/video provides enough info to learn. Adapts to image-specific data/degradation.

7.- Deep external learning: Training extensively on large datasets. True intelligence/creativity lies between the two extremes.

8.- Humor and AI: Current vision-language models lack sense of humor and cognition of what's funny in images.

9.- Vision-language model limitations: Models perform bag-of-words captioning without understanding object relations/compositionality. An open challenge.

10.- Training data concerns: Many artists feel work is "stolen" when ingested for training. Complex issue touching ownership, copyright, compensation.

11.- Transformative use: Historically, artists learn by copying. Once art is public, hard to control its use. Cultural adjustment needed.

12.- Threatened artists: Those already in precarious positions are most concerned about AI displacing their creative work and income.

13.- Commodification of pixels: Concern that creative visual work may become a commodity, with pricing based on output resolution. Complex to navigate.

14.- Evaluation and metrics: Over-reliance on quantitative benchmarks. Reviewers should allow for qualitative arguments. Numbers can encourage lazy reviewing.

15.- Limitations of metrics: Difficult to quantitatively measure aspects like creativity and humor. Human evaluation still most useful.

16.- Future vision systems: Will likely be multimodal, leveraging knowledge across vision, language, speech, audio to expand capabilities.

17.- Data limitations: Expanding multimodal systems requires new high-quality datasets. A key challenge for the vision community.

18.- Engaging artists: CV community should collaborate with artists/designers. Open-source the models and make them easy for non-programmers to use.

19.- Rapid iteration tools: Artists want ability to rapidly experiment with new techniques. Waiting for model output can be part of creative process.

20.- Copyright questions: As in other domains like music, complex issues around data ownership and usage rights must be navigated.

21.- Intersection of vision/language: Many unsolved problems remain, e.g. visual compositionality, relations between objects, moving beyond bag-of-words.

22.- Human vs machine intelligence: Differences not fully understood. Adversarial examples fool AI but not humans. An area for study.

23.- Mind-reading research: Decoding visual experiences from brain activity (fMRI). Requires learning from limited data, an interesting challenge.

24.- Modeling curiosity/creativity: Can aspects of human creative process, like open-ended exploration and serendipity, be captured computationally?

25.- Practical model utility: Making generative tools controllable and useful for end-users' actual creative needs is important, e.g. ControlNet.

26.- Democratizing creation: AI tools have potential to make high-quality creative production accessible to the masses. New art forms may result.

27.- Next generation's creativity: Children who grow up native to these tools will likely produce innovative, hard-to-predict new works and genres.

28.- Control and editability: Giving users more control over generative model outputs is an underexplored but important research direction.

29.- Advice for students: Embrace large models as enabling infrastructure. Pursue research you're personally excited about and rally others around it.

30.- Interdisciplinary collaboration: Combining expertise across computer science, art, cognitive science, etc. can lead to novel insights and impactful work.
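Items 9 and 21 describe models that behave like bag-of-words captioners. A minimal sketch of why an order-free representation cannot capture relations between objects, assuming a toy word-count model (the `bag_of_words` helper and the example captions are hypothetical illustrations, not anything presented by the panel):

```python
from collections import Counter


def bag_of_words(caption: str) -> Counter:
    """Order-free representation: keeps word counts only (toy illustration)."""
    return Counter(caption.lower().split())


# Two captions that describe opposite relations between the same objects.
a = "the dog chases the cat"
b = "the cat chases the dog"

# The bags are identical, so any model that only sees the bag
# cannot tell who is chasing whom.
same_bag = bag_of_words(a) == bag_of_words(b)

# The ordered token sequences differ, which is where the relation lives.
same_sentence = a.split() == b.split()

print(same_bag, same_sentence)
```

The two captions collapse to the same bag even though they describe opposite events, which is the kind of compositional failure the panel flags as an open challenge.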

Knowledge Vault built by David Vivancos 2024