Knowledge Vault 6/17 - ICML 2016
Memory Networks for Language Understanding
Jason Weston
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

```mermaid
graph LR
  classDef main fill:#f9d9c9,font-weight:bold,font-size:14px
  classDef memory fill:#d4f9d4,font-weight:bold,font-size:14px
  classDef dialogue fill:#d4d4f9,font-weight:bold,font-size:14px
  classDef model fill:#f9f9d4,font-weight:bold,font-size:14px
  classDef future fill:#f9d4f9,font-weight:bold,font-size:14px
  Main["Memory Networks for Language Understanding"]
  Main --> A[Memory Networks]
  Main --> B[Dialogue Systems]
  Main --> C[Model Architectures]
  Main --> D[Future Directions]
  A --> A1["Weston: SVMs, NLP, neural nets expert 1"]
  A --> A2["Combine memory with learning component 3"]
  A --> A3["Hard attention finds supporting dialogue facts 5"]
  A --> A4["Multiple hops improve multi-fact task performance 6"]
  A --> A5["Continuous attention trains without fact supervision 7"]
  A --> A6["End-to-end networks improve with multiple hops 8"]
  B --> B1["Build intelligent agent learning from dialogue 2"]
  B --> B2["Toy tasks test dialogue reasoning capabilities 4"]
  B --> B3["Language modeling tests long-term context use 10"]
  B --> B4["QA datasets assess long context reasoning 11"]
  B --> B5["Movie dialogue tests QA, recommendation abilities 15"]
  B --> B6["RNN-CNN excels on Ubuntu dialogue corpus 16"]
  C --> C1["Related: NTM, RNNs, attention models, RAM 9"]
  C --> C2["Self-supervision, multi-hop attention aid performance 12"]
  C --> C3["Memory networks competitive on QA datasets 13"]
  C --> C4["Key-value separation improves memory network performance 14"]
  C --> C5["Realistic tasks drive innovative model architectures 17"]
  C --> C6["Response prediction alternative to rewards training 19"]
  D --> D1["Reinforcement learning needed for dialogue mastery 18"]
  D --> D2["Architectures for learning from interactive feedback 20"]
  D --> D3["Code, data available questions remain open 21"]
  D --> D4["Building models for meaningful dialogue engagement 22"]
  D --> D5["Attention enables scaling to large memories 23"]
  D --> D6["Self-supervised memory improves answer retrieval performance 24"]
  A --> E[Advanced Concepts]
  E --> E1["Key-value separation enhances retrieval, prediction representations 25"]
  E --> E2["Joint model tackles QA, recommendations challenges 26"]
  E --> E3["Aiming for versatile open-ended dialogue model 27"]
  E --> E4["Interaction-based learning key for dialogue agents 28"]
  E --> E5["Rich feedback trains answer comprehension skills 29"]
  E --> E6["Future work: reasoning, attention, memory challenges 30"]
  class Main main
  class A,A1,A2,A3,A4,A5,A6 memory
  class B,B1,B2,B3,B4,B5,B6 dialogue
  class C,C1,C2,C3,C4,C5,C6 model
  class D,D1,D2,D3,D4,D5,D6,E,E1,E2,E3,E4,E5,E6 future
```

Resume:

1.- Jason Weston received his PhD in 2000, co-supervised by Vladimir Vapnik. He is known for his work on SVMs, NLP with neural networks, and memory networks.

2.- The goal is to build an intelligent conversational agent that can learn from dialogue. Challenges include reasoning, long conversations, and learning new knowledge.

3.- Memory networks combine a large memory with a learning component that can read from and write to that memory. Many architectural variations are possible.
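
A minimal Python sketch of the framework's four components (I, G, O, R); the bag-of-words features and dot-product scoring here are illustrative stand-ins for the learned components in the papers:

```python
import numpy as np

# Sketch of the memory network framework: I (input feature map),
# G (generalization / memory write), O (output / memory read), R (response).
class MemoryNetworkSketch:
    def __init__(self, vocab):
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.memory = []                          # stored (text, vector) facts

    def I(self, text):
        """I: map raw text to a feature vector (here, bag of words)."""
        v = np.zeros(len(self.vocab))
        for w in text.lower().split():
            if w in self.vocab:
                v[self.vocab[w]] += 1.0
        return v

    def G(self, text):
        """G: write the new fact into memory (here, simply append)."""
        self.memory.append((text, self.I(text)))

    def O(self, question):
        """O: score memories against the question; return the best match.
        A real model learns this scoring; a dot product is a stand-in."""
        q = self.I(question)
        scores = [q @ m for _, m in self.memory]
        return self.memory[int(np.argmax(scores))][0]

    def R(self, fact):
        """R: produce a response from the retrieved fact (here, echo it)."""
        return fact

net = MemoryNetworkSketch("mary john went to the kitchen garden".split())
net.G("john went to the kitchen")
net.G("mary went to the garden")
print(net.R(net.O("where is mary")))   # -> "mary went to the garden"
```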

4.- Toy tasks are designed to test reasoning capabilities needed for dialogue, such as tracking the location of objects, counting, deduction, and pathfinding.

5.- The first memory network used hard attention over memories to find supporting facts, and was trained with the supporting facts as additional supervision.
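
A sketch of that supervised objective, assuming a margin ranking loss as in the original paper; the scores and margin below are illustrative:

```python
import numpy as np

# The margin ranking loss pushes the annotated supporting fact's score
# above every other memory's score. Real scores come from learned
# embeddings; these numbers are invented for illustration.
def supporting_fact_loss(scores, true_idx, margin=0.1):
    loss = 0.0
    for i, s in enumerate(scores):
        if i != true_idx:
            loss += max(0.0, margin - scores[true_idx] + s)
    return loss

scores = np.array([0.2, 0.9, 0.1])          # match score per memory
print(supporting_fact_loss(scores, 1))      # 0.0: fact 2 wins by the margin
# At inference time, hard attention simply takes the argmax memory.
```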

6.- Increasing the number of memory hops improves performance on toy tasks that require multiple supporting facts. Some tasks, like pathfinding, remain challenging.

7.- End-to-end memory networks use continuous (soft) attention over memories, so they can be trained without supervision of supporting facts. The attention is interpretable.

8.- On the toy tasks, multiple hops improve accuracy for end-to-end networks, but they still fall short of the strongly supervised version on some tasks.
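
A minimal sketch of one multi-hop forward pass in the end-to-end (MemN2N) style; the random embeddings, dimensions, and hop count are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_mem, hops = 20, 5, 3

A = rng.normal(size=(n_mem, d))   # memory embeddings used for addressing
C = rng.normal(size=(n_mem, d))   # memory embeddings used for the readout
u = rng.normal(size=d)            # embedded question (controller state)

for _ in range(hops):
    p = softmax(A @ u)            # continuous (soft) attention over memories
    o = C.T @ p                   # attention-weighted sum of memories
    u = u + o                     # update the controller for the next hop

print(np.round(p, 3))             # attention weights are interpretable
```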

9.- Related work includes the Neural Turing Machine, stack-augmented RNNs, and attention-based models for machine translation and other NLP tasks. The RAM workshop at NIPS explores reasoning, attention, and memory.

10.- Large language modeling datasets test a model's ability to use long-term context. Analysis shows attention hops flip between nearby and faraway words.

11.- Newer datasets test reasoning over long contexts through cloze-style QA (the Children's Book Test (CBT) and CNN/Daily Mail). Humans use the context to improve their accuracy.
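
For illustration, a sketch of how a cloze-style question is built; the sentences and candidate list below are invented examples, not actual CBT data:

```python
# Cloze-style QA construction (CBT-style): hide one word of a query
# sentence and ask the model to restore it from the context.
context = ["alice opened the little door",
           "behind it lay a beautiful garden"]
query = "alice stepped into the garden"
answer = "garden"

cloze = query.replace(answer, "XXXXX")
candidates = ["door", "garden", "alice", "key"]
print(cloze)                  # "alice stepped into the XXXXX"
print(answer in candidates)   # the answer is among the given candidates
```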

12.- Self-supervision on the memories (assuming the answer is contained in them) and multi-hop attention help on CBT. A gap to human performance remains.
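
A sketch of that self-supervision heuristic with invented sentences: any memory containing the answer word is treated as a positive attention target.

```python
# Since the answer is assumed to appear somewhere in the memories, each
# memory containing the answer word becomes a target for the attention.
memories = ["mary went to the kitchen",
            "john picked up the apple",
            "mary travelled to the garden"]
answer = "garden"

targets = [1.0 if answer in m.split() else 0.0 for m in memories]
print(targets)   # [0.0, 0.0, 1.0] -> trains attention toward memory 3
```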

13.- Memory networks are competitive on QA datasets like WebQuestions and WikiQA, though the focus has been more on learning algorithms than on feature engineering.

14.- Key-value memory networks separate memories into keys for addressing and values for reading. This allows different representations for each, improving performance.
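
A minimal sketch of key-value addressing; the random vectors stand in for learned embeddings (for example, keys could embed question patterns or KB triples, values the corresponding answers):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d, n_mem = 16, 4

keys = rng.normal(size=(n_mem, d))     # addressing representation
values = rng.normal(size=(n_mem, d))   # reading representation
q = rng.normal(size=d)                 # embedded question

p = softmax(keys @ q)                  # attend using the keys
o = values.T @ p                       # read out using the values
print(np.round(p, 3))
```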

15.- Movie dialogue dataset tests both QA and recommendation abilities in conversations. Baseline models provided, but challenges remain.

16.- Memory networks achieve strong results on Ubuntu dialogue corpus, but best model so far is an RNN-CNN architecture.

17.- More realistic toy dialogue tasks could help drive innovative model architectures. Understanding successes/failures on real data remains challenging.

18.- Supervised datasets exist, but reinforcement learning through interaction may be needed, similar to how children learn language.

19.- Forward prediction of conversational responses provides an alternative training signal to rewards. Textual feedback can be more informative than binary rewards.

20.- Dialogue-based language learning paper proposes architectures and training procedures for learning from various types of interactive feedback without explicit rewards.

21.- Code and data available online for memory networks and related research. Many open questions remain in reasoning, attention and memory.

22.- Motivation is building models that can engage in meaningful dialogue by combining reasoning, attention, and memory.

23.- Attention enables scaling to large memories by retrieving relevant information as needed. Increasing hops allows deeper reasoning.

24.- Self-supervised memory helps performance by assuming answers are present in the input and learning to pick them out.

25.- Separating memory into keys and values allows different representations for retrieval and prediction, improving performance on WikiQA.

26.- Movie dialogue data tests both factual QA and recommendation. Joint model does both but still room for improvement.

27.- Goal is to have one model that can engage in open-ended dialogue, asking and answering questions, making recommendations, etc.

28.- Reinforcement learning from conversational interaction, rather than supervised datasets, may be key to achieving general dialogue agents.

29.- Rich textual feedback provides more than just a reward signal. Predicting feedback trains model to understand answers.

30.- Much future work remains to solve reasoning, attention, memory challenges and build intelligent dialogue agents that can learn.

Knowledge Vault built by David Vivancos 2024