The End Of Knowledge - Vault 2 - ICLR (2014-2023)

graph LR
classDef semantic fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef challenges fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef architecture fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef decoding fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef paraphrasing fill:#f9d4f9, font-weight:bold, font-size:14px;
classDef future fill:#d4f9f9, font-weight:bold, font-size:14px;
A[Mirella Lapata ICLR 2019] --> B[Map natural to machine language since 1960s. 1]
A --> C[Query databases, instruct robots, question knowledge bases. 2]
A --> D[Digital assistants: popular, important breakthrough. 3]
A --> E[Structural mismatch, well-formed output, multiple phrasings. 4]
A --> F[Encoder-decoder LSTM backbone. 5]
F --> G[Attention improves over final hidden state. 6]
F --> H[Sequential hierarchical decoding for output structure. 7]
H --> I[Parentheses errors, stronger approach needed. 8]
A --> J[Two-stage: sketch, then details. 9]
J --> K[Sketch preserves core structure, omits details. 10]
J --> L[Separates high-level, low-level semantics. 11]
J --> M[Jointly learns sketches and full outputs. 12]
J --> N[Deterministic sketch templates. 13]
J --> O[Improves accuracy across datasets, languages. 14]
A --> P[Paraphrases handle linguistic variation. 15]
P --> Q[Jointly train paraphrase scoring and question-answering. 16]
P --> R[Integrated end-to-end system. 17]
P --> S[Paraphrase model crucial for success. 18]
S --> T[Pivoting generates paraphrases. 19]
S --> U[Neural machine translation for pivoting. 20]
P --> V[Handles differently phrased unseen questions. 21]
P --> W[Outperforms previous approaches. 22]
A --> X[Encoder-decoder works well, constrained decoding important. 23]
A --> Y[Models general across datasets, meaning representations. 24]
A --> Z[Paraphrase generation alternatives: GANs vs pivoting. 25]
A --> AA[Higher sketch loss weight improves predictions. 26]
A --> AB[Fine-tuning pre-trained models helps small datasets. 27]
A --> AC[BERT embeddings could improve similarity scoring. 28]
A --> AD[RL for training with only final answer. 29]
AD --> AE[RL rewards well-formedness when many forms correct. 30]
class A,B,C semantic;
class D,E challenges;
class F,G,H,I architecture;
class J,K,L,M,N,O decoding;
class P,Q,R,S,T,U,V,W paraphrasing;
class X,Y,Z,AA,AB,AC,AD,AE future;

Summary:

1.-The goal of semantic parsing is to map natural language to machine-executable language, a challenge since the 1960s.

2.-Examples include querying databases, instructing robots, and asking questions of knowledge bases like Google's Knowledge Graph.

3.-Digital assistants like Alexa, Cortana and Google Home are becoming popular, and Bill Gates sees them as an important technological breakthrough.

4.-Three main challenges: structural mismatch between natural and machine language, generating well-formed output, and handling different phrasings of the same meaning.

5.-A neural encoder-decoder architecture is used as the backbone: an encoder LSTM represents the input, and a decoder LSTM generates the output.

6.-Attention mechanisms allow the decoder to attend to relevant parts of the input representation, improving over just using the final hidden state.

7.-To handle output structure, decoding is modified to generate the output sequentially but hierarchically, using non-terminal tokens to denote hierarchy.

8.-This sequential hierarchical decoding helps but still produces mismatched parentheses, so a stronger approach is needed to ensure well-formed output.

9.-A two-stage decoding approach first generates an abstract sketch of the output, then fills in the details to get the final output.

10.-The sketch omits low-level details but preserves the core output structure shared by examples with the same basic meaning.

11.-Separating high-level and low-level semantics makes the meaning representation more compact at the sketch level and provides context for the final decoding.

12.-The two-stage model jointly learns to predict sketches and full outputs, maximizing likelihood of meaning representations given natural language inputs.

13.-Templates for sketches are created deterministically by removing variable info, predicate arguments, anonymizing tokens, and collapsing clauses, depending on the meaning representation language.

14.-Two-stage decoding improves accuracy across multiple datasets and meaning representation languages, showing the approach is general.

15.-To handle linguistic variation in how meanings are expressed, paraphrases of the input question are used.

16.-Previous work used paraphrases but generated them separately from question answering; the two components need to be integrated.

17.-The proposed approach jointly trains a paraphrase scoring model along with the question-answering model for an integrated end-to-end system.

18.-The paraphrase model is crucial: if the paraphrases are bad, the whole system fails. Paraphrases are generated via pivoting.

19.-Pivoting translates the input to a foreign language and back to obtain paraphrases. Multiple pivot languages are used for robustness.

20.-A neural machine translation system is used for pivoting, typically based on an encoder-decoder architecture with attention.

21.-Generating paraphrases allows the integrated system to handle questions that are phrased differently from those seen in training.

22.-The integrated paraphrasing and question-answering system outperforms previous approaches on benchmark datasets for semantic parsing.

23.-Key takeaways: encoder-decoder architectures work well for semantic parsing, constrained decoding is important for well-formedness, and paraphrasing improves robustness.

24.-The models are general across datasets and meaning representations. Future work includes testing on more languages and domains, and learning from databases alone.

25.-Generating paraphrases via GANs is raised as an alternative to pivoting through other languages that could be explored.

26.-Weighting the sketch loss more than the final output loss helps, as predicting good sketches is important.

27.-Fine-tuning large pre-trained language models could help, especially for small datasets, but benefits may diminish for very large training sets.

28.-Using BERT embeddings could potentially improve the similarity scoring between paraphrases and original questions.

29.-Reinforcement learning could be applied when only the final answer, not the full logical form, is available for training.

30.-With only the final answer, many logical forms could be correct, so RL with rewards for well-formedness could guide the search.
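
Item 6 says that attention lets the decoder look at relevant parts of the input instead of relying only on the final hidden state. The following is a minimal sketch of that idea, assuming dot-product scoring and plain Python lists as vectors; the function names and the toy vectors are illustrative assumptions and are not taken from the talk.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the context vector.

    Scoring is a dot product here (an assumption); other scoring
    functions could be substituted.
    """
    scores = [sum(d * h_i for d, h_i in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: the decoder state is most similar to the first input position.
weights, context = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
print(weights, context)
```

The context vector is a weighted mix of all encoder states, so information from every input position remains reachable at each decoding step.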
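
Items 9 to 13 describe deriving sketches deterministically by stripping low-level details from the meaning representation. The snippet below is a rough illustration only, assuming a Lisp-style logical form in which variables start with `$` and typed constants contain `:`. The `to_sketch` helper, these token conventions, and the example queries are hypothetical and are not the templates used in the talk.

```python
import re

def tokenize(logical_form):
    """Split a parenthesized logical form into tokens."""
    return re.findall(r"\(|\)|[^\s()]+", logical_form)

def to_sketch(logical_form):
    """Drop variables and anonymize constants, keeping the structure."""
    sketch = []
    for tok in tokenize(logical_form):
        if tok.startswith("$"):
            continue                      # variable information is removed
        if ":" in tok or tok.isdigit():
            tok = "?"                     # constants and numbers are anonymized
        sketch.append(tok)
    return " ".join(sketch)

a = "(lambda $0 (and (flight $0) (from $0 dallas:ci) (< (departure_time $0) 1600)))"
b = "(lambda $0 (and (flight $0) (from $0 boston:ci) (< (departure_time $0) 900)))"
print(to_sketch(a))
print(to_sketch(a) == to_sketch(b))
```

Both example queries differ only in their constants, so they map to the same sketch, which is the property item 10 describes.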
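
Items 12 and 26 say that sketches and full outputs are learned jointly and that weighting the sketch loss more heavily helps. One possible reading is a weighted sum of two negative log-likelihoods, sketched below. The `sketch_weight` parameter and the probabilities are placeholders chosen for illustration; the summary gives no actual weight value.

```python
import math

def sequence_nll(token_probs):
    """Negative log-likelihood of a sequence from per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)

def joint_loss(sketch_probs, output_probs, sketch_weight=1.0):
    """Weighted sum of the sketch loss and the full-output loss.

    sketch_weight > 1 emphasizes getting the sketch right
    (hypothetical parameter, not a value from the talk).
    """
    return sketch_weight * sequence_nll(sketch_probs) + sequence_nll(output_probs)

# Placeholder probabilities that a model might assign to the gold tokens.
sketch_probs = [0.9, 0.8]
output_probs = [0.9, 0.7, 0.95]
print(joint_loss(sketch_probs, output_probs, sketch_weight=1.0))
print(joint_loss(sketch_probs, output_probs, sketch_weight=2.0))
```

With a larger `sketch_weight`, the same sketch errors cost more, which pushes training toward better sketches.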
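
Items 15 to 17 describe scoring paraphrases jointly with question answering. The summary does not spell out how the two are combined; the sketch below assumes one plausible scheme, in which each paraphrase's answer distribution is weighted by its normalized paraphrase score. All names, scores, and probabilities are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw paraphrase scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def answer_distribution(paraphrase_scores, qa_probs_per_paraphrase):
    """Mix per-paraphrase answer distributions by paraphrase weight."""
    weights = softmax(paraphrase_scores)
    answers = {}
    for w, qa_probs in zip(weights, qa_probs_per_paraphrase):
        for answer, p in qa_probs.items():
            answers[answer] = answers.get(answer, 0.0) + w * p
    return answers

# Two paraphrases of one question, with made-up scores and QA outputs.
scores = [2.0, 0.5]
qa_probs = [{"Paris": 0.9, "Lyon": 0.1}, {"Paris": 0.4, "Lyon": 0.6}]
dist = answer_distribution(scores, qa_probs)
print(max(dist, key=dist.get))
```

Because the final answer depends on both components, a training signal on the answer can flow back into the paraphrase scorer, which is one way to realize the end-to-end integration the summary mentions.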

Knowledge Vault built by David Vivancos 2024