Concept Graph & Summary using Claude 3 Opus | ChatGPT-4o | Llama 3:
Summary:
1.- Visual question answering: Answering natural-language questions about an input image.
2.- Neural Module Networks: Dynamically constructing neural networks based on the syntactic structure of the question.
3.- Abstract scenes dataset: A dataset of colored shapes used for pedagogical examples.
4.- Question-specific neural networks: Networks built on the fly from modules based on the question's syntactic analysis.
5.- Applying dynamic networks: Using the constructed network to process the input image and produce an answer.
6.- Structured neural models for vision: Related work on incorporating explicit structure into neural networks for vision.
7.- Semantic parsing: Related work in natural language processing on mapping sentences to formal meaning representations.
8.- Neural representation of question-specific computation: Representing the dynamically constructed computation as a neural network.
9.- Capabilities of VQA models: Expectations for what visual question answering models should be able to do.
10.- Understanding "red": Identifying red objects in an image.
11.- Visual attention mechanism: Focusing on relevant parts of the image, used in vision and language models.
12.- "Red" as a function: Mapping an image to an attention map highlighting red objects.
13.- Understanding "above": Transforming attention from one set of objects (circles) to related regions (the areas above those circles); see the second sketch after this list.
14.- Complex questions: Combining multiple concepts (e.g., "red shape above a circle") to answer a question.
15.- Syntactic analysis: Parsing the structure of the question to guide network construction.
16.- Modules: Small network fragments used to build the question-specific neural network.
17.- Dynamically constructing networks: Building a custom network for each question based on its syntactic structure (see the third sketch after this list).
18.- Applying constructed networks to images: Using the dynamically built network to process the input image.
19.- Producing answers: Generating an answer to the question based on the network's output.
20.- Related work in structured neural models: Other research on incorporating structure into neural networks for vision tasks.
21.- Related work in semantic parsing: Other research on mapping natural language to executable representations.
22.- Neural representation of constructed computation: Encoding the dynamically built question-specific computation as a neural network.
23.- Expectations for VQA models: Capabilities that visual question answering models should possess.
24.- Mapping words to visual concepts: Associating words like "red" with their corresponding visual representations.
25.- Transforming attention: Using words like "above" to modify attention from one object to another in the image.
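Code sketches:
The three Python/PyTorch sketches below illustrate the module ideas above. They are minimal illustrations under our own assumptions: the class names (FindModule, TransformModule, AnswerModule), tensor shapes, and layout format are ours, not the original Neural Module Networks implementation. First, a "find"-style module (concepts 10-12): mapping a word such as "red", together with convolutional image features, to a spatial attention map.

```python
import torch
import torch.nn as nn

class FindModule(nn.Module):
    """Maps image features plus a word embedding (e.g. "red") to a spatial
    attention map over the image. Names and shapes are illustrative."""
    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        self.word_proj = nn.Linear(embed_dim, feat_dim)     # project word into feature space
        self.score = nn.Conv2d(feat_dim, 1, kernel_size=1)  # per-location relevance score

    def forward(self, feats: torch.Tensor, word_vec: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) conv features; word_vec: (B, E) embedding of the word
        w = self.word_proj(word_vec)[:, :, None, None]      # (B, C, 1, 1)
        scores = self.score(feats * w)                      # (B, 1, H, W)
        b, _, h, wd = scores.shape
        # normalize over spatial positions so the attention map sums to 1
        return torch.softmax(scores.view(b, -1), dim=1).view(b, 1, h, wd)

# usage: highlight "red" regions in a 14x14 feature map
finder = FindModule(feat_dim=512, embed_dim=300)
attn = finder(torch.randn(1, 512, 14, 14), torch.randn(1, 300))  # (1, 1, 14, 14)
```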
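Second, a "transform"-style module (concepts 13 and 25): re-attending from one attention map to another, as with "above". Parameterizing the shift as a learned spatial convolution is our simplification, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TransformModule(nn.Module):
    """Maps one attention map to another, e.g. turning attention on circles
    into attention on the regions above them for the word "above"."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # a learned convolution over the attention map; once trained, its
        # kernel can shift attention mass in a consistent spatial direction
        self.shift = nn.Conv2d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        b, _, h, w = attn.shape
        shifted = self.shift(attn)  # (B, 1, H, W)
        # renormalize so the output is again a distribution over locations
        return torch.softmax(shifted.view(b, -1), dim=1).view(b, 1, h, w)

# usage: take attention on circles and produce attention "above" them
above = TransformModule()
new_attn = above(torch.rand(1, 1, 14, 14))  # (1, 1, 14, 14)
```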
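Third, dynamic assembly (concepts 15-19): a layout derived from the question's syntactic parse is evaluated recursively, composing the modules above and reading out an answer. The nested-tuple layout format and the AnswerModule readout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AnswerModule(nn.Module):
    """Reads an answer distribution out of attention-weighted image features."""
    def __init__(self, feat_dim: int, num_answers: int):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_answers)

    def forward(self, feats: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        pooled = (feats * attn).sum(dim=(2, 3))  # attention-weighted sum -> (B, C)
        return self.cls(pooled)                  # answer logits

def run_layout(layout, feats, embed, modules):
    """Recursively evaluates a parse-derived layout such as
    ("answer", ("and", ("find", "red"), ("above", ("find", "circle"))))
    for "Is there a red shape above a circle?"."""
    op = layout[0]
    if op == "find":    # word -> attention map
        return modules["find"](feats, embed(layout[1]))
    if op == "above":   # attention -> transformed attention
        return modules["above"](run_layout(layout[1], feats, embed, modules))
    if op == "and":     # intersect two attention maps
        return (run_layout(layout[1], feats, embed, modules)
                * run_layout(layout[2], feats, embed, modules))
    if op == "answer":  # attention + features -> answer logits
        return modules["answer"](feats, run_layout(layout[1], feats, embed, modules))
    raise ValueError(f"unknown module: {op}")

# usage, reusing FindModule and TransformModule from the sketches above
modules = {"find": FindModule(512, 300), "above": TransformModule(),
           "answer": AnswerModule(512, num_answers=2)}
embed = lambda word: torch.randn(1, 300)  # stand-in for a real word embedding
layout = ("answer", ("and", ("find", "red"), ("above", ("find", "circle"))))
logits = run_layout(layout, torch.randn(1, 512, 14, 14), embed, modules)
```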
Knowledge Vault built by David Vivancos 2024