Knowledge Vault 2/22 - ICLR 2014-2023
Chris Dyer ICLR 2016 - Keynote - Should Model Architecture Reflect Linguistic Structure?
<Resume Image>

Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

graph LR
  classDef arbitrariness fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef compositionality fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef morphology fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef hierarchy fill:#f9f9d4, font-weight:bold, font-size:14px;
  classDef rnng fill:#f9d4f9, font-weight:bold, font-size:14px;
  classDef grasping fill:#f9d4f9, font-weight:bold, font-size:14px;
  A[Chris Dyer ICLR 2016] --> B[No relationship between word forms and meanings 1]
  A --> C[Meaning changes systematically when parts combined 2]
  C --> D[Memorization via embeddings, generalization via composition 3]
  C --> E[Idioms challenge word-level compositionality 4]
  A --> F[Morphology: subword structure matters 5]
  F --> G[Character-level models capture arbitrariness and compositionality 6]
  F --> H[Subword models require fewer parameters 7]
  F --> I[Character models generate plausible nonce embeddings 8]
  F --> J[FSTs analyze words into morphemes 9]
  F --> K[Open-vocabulary LMs model all possible strings 10]
  F --> L[FST morphological knowledge improves neural LMs 11]
  F --> M[Character/subword models help for morphology, OOVs 12]
  A --> N[Hierarchical structure uncontroversial, details debated 13]
  N --> O[NPIs follow negation in structural configuration 14]
  N --> P[Cross-linguistic evidence supports hierarchical generalizations 15]
  N --> Q[RNNGs capture hierarchy with minimal extensions 16]
  Q --> R[RNNGs generate terminals and nonterminal symbols 17]
  Q --> S[Composition: pop nonterminal and children, compose 18]
  Q --> T[RNNGs capture headedness using bidirectional RNNs 19]
  Q --> U[RNNGs avoid marginalization over trees 20]
  Q --> V[Generative RNNGs outperform for constituency parsing 21]
  Q --> W[RNNGs are strong language models 22]
  A --> X[Character/subword models and structure: linguistic approaches 23]
  X --> Y[Linguistic structure, hierarchy benefit neural models 24]
  Y --> Z[Designing models around linguistic principles improves performance 25]
  class B arbitrariness;
  class C,D,E compositionality;
  class F,G,H,I,J,K,L,M morphology;
  class N,O,P hierarchy;
  class Q,R,S,T,U,V,W rnng;

Resume:

1.-Arbitrariness: There is no systematic relationship between word forms and their meanings; changing a single letter produces an unpredictable change in meaning (e.g. "car" vs "bar").

2.-Compositionality: Meaning changes systematically when parts are combined (e.g. "John dances" vs "Mary dances"). Allows generalization beyond memorization.

3.-Classical neural language models: Memorization via word embeddings, generalization via learned composition functions over embeddings.
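A minimal PyTorch sketch of this split (the sizes and vocabulary below are illustrative assumptions, not figures from the talk): the embedding table handles per-word memorization, while the recurrent composition function provides generalization over contexts.

```python
# Minimal sketch of a classical word-level neural LM: an embedding table
# memorizes one vector per word form, and a learned composition function
# (here an LSTM) generalizes over the observed prefix.
import torch
import torch.nn as nn

class WordLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)              # memorization: one row per word type
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)   # composition over the prefix
        self.out = nn.Linear(hidden_dim, vocab_size)                # predict the next word

    def forward(self, token_ids):                                   # token_ids: (batch, time)
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)                                          # next-word logits at each position

# Toy usage: next-word logits for a 5-token input from a 1000-word vocabulary.
lm = WordLM(vocab_size=1000)
logits = lm(torch.randint(0, 1000, (1, 5)))
print(logits.shape)  # torch.Size([1, 5, 1000])
```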

4.-Idioms challenge word-level compositionality, requiring memorization at the sentence level too (e.g. "kicked the bucket").

5.-Morphology shows word forms are not independent of each other - subword structure matters, especially in morphologically rich languages.

6.-Character-level models can capture both arbitrariness and compositionality. They improve over word-lookup embeddings for morphologically rich languages.
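A sketch of a character-to-word encoder in the spirit of compose-characters-into-words models such as Ling et al.'s C2W (dimensions and the character encoding are assumptions, not the talk's exact architecture): a bidirectional LSTM reads the character sequence, so related forms like "cat"/"cats" share structure while unrelated forms like "car"/"bar" can still get very different vectors.

```python
# Character-to-word representation sketch: compose a word vector from its characters.
import torch
import torch.nn as nn

class CharToWord(nn.Module):
    def __init__(self, n_chars=128, char_dim=16, word_dim=64):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, word_dim // 2, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                       # char_ids: (1, word_length)
        _, (h, _) = self.bilstm(self.char_embed(char_ids))
        return torch.cat([h[0], h[1]], dim=-1)         # final forward + backward states -> word vector

def encode(word):
    return torch.tensor([[ord(c) % 128 for c in word]])

c2w = CharToWord()
print(c2w(encode("cats")).shape)   # torch.Size([1, 64]); works for any string, even unseen ones
```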

7.-Subword models require fewer parameters to represent a language than word-level models, which benefits low-resource settings.
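A back-of-the-envelope parameter count makes the contrast concrete (all numbers here are illustrative assumptions, not figures from the talk):

```python
# A word-level embedding table grows with vocabulary size, while a character-based
# encoder's size is fixed by the character set and the RNN width.
vocab_size, word_dim = 100_000, 128
word_table = vocab_size * word_dim                       # 12,800,000 parameters

n_chars, char_dim, hidden = 200, 16, 64
char_table = n_chars * char_dim                          # 3,200 parameters
# one LSTM layer, roughly: 4 gates x (input + hidden + bias) x hidden units, in both directions
char_bilstm = 2 * 4 * ((char_dim + hidden) * hidden + hidden)
print(word_table, char_table + char_bilstm)              # ~12.8M vs ~45K
```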

8.-Character models generate plausible embeddings for nonce words, demonstrating generalization ability.

9.-Finite-state transducers (FSTs) can analyze words into morphemes, but their analyses are ambiguous at the type level; disambiguation requires tokens in context.
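A toy Python stand-in for such an analyzer (the lexicon and rules below are invented for illustration; a real analyzer would be compiled with a finite-state toolkit) shows the type-level ambiguity:

```python
# Toy morphological analyzer: "leaves" receives two analyses, and only a token
# in context can be disambiguated.
SUFFIXES = {"s": "+PL", "ed": "+PAST", "ing": "+PROG"}
LEXICON = {"walk": "V", "leave": "V", "leaf": "N", "cat": "N"}

def stem_candidates(stem):
    cands = {stem, stem + "e"}              # undo e-deletion, e.g. "danc" -> "dance"
    if stem.endswith("ve"):
        cands.add(stem[:-2] + "f")          # f/v alternation, e.g. "leave" -> "leaf"
    return cands

def analyze(word):
    analyses = set()
    if word in LEXICON:
        analyses.add(f"{word}[{LEXICON[word]}]")
    for suffix, tag in SUFFIXES.items():
        if word.endswith(suffix):
            for stem in stem_candidates(word[: -len(suffix)]):
                if stem in LEXICON:
                    analyses.add(f"{stem}[{LEXICON[stem]}]{tag}")
    return sorted(analyses)

print(analyze("walked"))  # ['walk[V]+PAST']
print(analyze("leaves"))  # ['leaf[N]+PL', 'leave[V]+PL'] -- ambiguous at the type level
```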

10.-Open-vocabulary language models aim to model all possible strings, not a fixed vocabulary. Useful for morphologically rich languages.
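As a minimal illustration of the open-vocabulary property, here is a deliberately simple smoothed character bigram model standing in for the neural models discussed in the talk: any string over its character set, including words never seen in training, receives nonzero probability.

```python
# Open-vocabulary toy LM: add-one-smoothed character bigrams over a tiny corpus.
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz#"    # '#' marks the start/end of a word
corpus = ["cat", "cats", "car", "bar", "walked", "walking"]

bigrams, unigrams = Counter(), Counter()
for w in corpus:
    chars = "#" + w + "#"
    for a, b in zip(chars, chars[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def log_prob(word):
    chars = "#" + word + "#"
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(ALPHABET)))
        for a, b in zip(chars, chars[1:])
    )

print(log_prob("cat"))   # seen in training: relatively high log-probability
print(log_prob("wug"))   # never seen, still assigned a (lower) probability
```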

11.-Incorporating FST-based morphological knowledge into neural LMs improves perplexity. Shows benefit of explicit linguistic structure.

12.-Summary so far: Character/subword models help for morphology and out-of-vocabulary issues. Explicit structure provides further gains.

13.-Hierarchical structure of language is uncontroversial, though the exact details are debated. Supported by phenomena like negative polarity item (NPI) licensing.

14.-NPIs like "anybody" must follow a negation like "not" in a precise structural configuration, not just linearly.

15.-Cross-linguistic evidence supports hierarchical generalizations based on perceived groupings, not linear order. An unbiased learner could acquire either generalization.

16.-Recurrent Neural Network Grammars (RNNGs) aim to capture hierarchical structure with minimal extensions to RNNs.

17.-RNNGs generate both terminals (words) and nonterminal symbols indicating phrasal groupings. Nonterminals trigger composition operations.
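For illustration, an action sequence a generative RNNG would use for a toy sentence, following the transition scheme of Dyer et al.'s RNNG paper (the sentence and bracketing are invented): NT(X) opens a nonterminal, GEN(w) emits a terminal, and REDUCE closes the most recent open nonterminal, triggering the composition step described in the next points.

```python
# Generative RNNG action sequence for "the cat sleeps".
actions = [
    "NT(S)",
    "NT(NP)", "GEN(the)", "GEN(cat)", "REDUCE",   # -> (NP the cat)
    "NT(VP)", "GEN(sleeps)", "REDUCE",            # -> (VP sleeps)
    "REDUCE",                                     # -> (S (NP the cat) (VP sleeps))
]
print(actions)
```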

18.-Composition involves popping the nonterminal and its child constituents off the stack, composing their embeddings, and pushing the result back as a single constituent.

19.-Syntactic composition in RNNGs captures linguistic notion of headedness using bidirectional RNNs over children.
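A sketch of the REDUCE/composition step from points 18 and 19, with invented dimensions and a plain bidirectional LSTM as the composition function: pop the completed children and their open nonterminal off the stack, run the biLSTM over them, and push a single composed vector back.

```python
import torch
import torch.nn as nn

DIM = 32
compose_rnn = nn.LSTM(DIM, DIM // 2, bidirectional=True, batch_first=True)

def reduce_step(stack):
    """stack: list of (label, vector); the most recent open nonterminal is marked 'NT*'."""
    children = []
    while not stack[-1][0].startswith("NT*"):
        children.append(stack.pop())                 # pop completed children
    nt_label, nt_vec = stack.pop()                   # pop the open nonterminal itself
    children.append((nt_label, nt_vec))
    children.reverse()                               # nonterminal first, then children left-to-right
    seq = torch.stack([vec for _, vec in children]).unsqueeze(0)   # (1, n, DIM)
    _, (h, _) = compose_rnn(seq)
    composed = torch.cat([h[0], h[1]], dim=-1).squeeze(0)          # (DIM,) composed constituent
    stack.append((nt_label.replace("NT*", "NT"), composed))        # push one constituent back
    return stack

# Toy usage: an open NP with two generated terminals gets reduced to one vector.
stack = [("NT*(NP)", torch.randn(DIM)), ("the", torch.randn(DIM)), ("cat", torch.randn(DIM))]
reduce_step(stack)
print(len(stack), stack[0][0])   # 1 NT(NP)
```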

20.-Because RNNGs drop the independence assumptions of symbolic grammars, they cannot marginalize over trees exactly (e.g. by dynamic programming); importance sampling enables inference for parsing and language modeling.
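A sketch of the importance-sampling estimate mentioned here: sample trees y from a tractable proposal q(y|x) (in the RNNG paper, a discriminative parser) and average p(x,y)/q(y|x). The probability tables below are dummy values chosen so the exact answer is known.

```python
import math
import random

def estimate_log_px(sample_tree, log_joint, log_proposal, n_samples=100):
    """Monte Carlo estimate of log p(x) = log sum_y p(x, y) via importance sampling."""
    weights = []
    for _ in range(n_samples):
        y = sample_tree()                                   # y ~ q(y | x)
        weights.append(log_joint(y) - log_proposal(y))      # log [ p(x, y) / q(y | x) ]
    m = max(weights)                                        # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(w - m) for w in weights) / n_samples)

# Dummy example: two possible "trees", so the exact marginal is 0.03 + 0.01 = 0.04.
joint = {"treeA": 0.03, "treeB": 0.01}
proposal = {"treeA": 0.7, "treeB": 0.3}

estimate = estimate_log_px(
    sample_tree=lambda: "treeA" if random.random() < 0.7 else "treeB",
    log_joint=lambda y: math.log(joint[y]),
    log_proposal=lambda y: math.log(proposal[y]),
    n_samples=5000,
)
print(math.exp(estimate))   # approximately 0.04
```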

21.-Generative RNNGs outperform discriminative models for constituency parsing, possibly because a generative model better matches the generative nature of the underlying syntactic process.

22.-RNNGs are also strong language models, outperforming LSTM baselines. A single model serves as both parser and LM.

23.-Character/subword models and explicit structure represent two approaches to imbuing neural models with linguistic knowledge.

24.-Results suggest linguistic structure, especially hierarchy, benefits neural models for language processing.

25.-Guiding hypothesis: Designing models around key linguistic principles leads to better language technologies compared to ignoring linguistic structure.

Knowledge Vault built by David Vivancos 2024