Knowledge Vault 6/9 - ICML 2015
Natural Language Understanding: Foundations and State-of-the-Art
Percy Liang
< Resume Image >

Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
    classDef main fill:#f9d4d4, font-weight:bold, font-size:14px
    classDef intro fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef linguistics fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef distributional fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef models fill:#f9d4f9, font-weight:bold, font-size:14px
    Main[Natural Language Understanding: Foundations and State-of-the-Art]
    Main --> A[Introduction to NLU]
    A --> A1[NLU: Defining, relating to Turing test 1]
    A --> A2[IBM Watson: Jeopardy winner, NLU applications 2]
    A --> A3[NLU techniques: Vectors, trees, frames, forms 3]
    A --> A4[NLP/ML idea transfer: HMMs, CRFs, LSTMs 4]
    A --> A5[Tutorial goals: NL intuitions, challenges, opportunities 5]
    Main --> B[Linguistic Analysis]
    B --> B1[Linguistic analysis: Syntax, semantics, pragmatics analogies 6]
    B --> B2[Syntax: Dependency trees, parts of speech 7]
    B --> B3[Semantics: Word meanings, composition, ambiguity 8]
    B --> B4[Lexical semantics: Similar words, distance metric 9]
    B --> B5[Compositional semantics: World reference, meaning composition 10]
    B --> B6[Pragmatics: Context-based conveyance, background assumptions 11]
    Main --> C[Language Processing Challenges]
    C --> C1[Challenges: Vagueness, ambiguity, uncertainty examples 12]
    C --> C2[Coreference: Context-dependent pronoun resolution, Winograd 13]
    C --> C3[Recap: Syntax, semantics, pragmatics definitions 14]
    Main --> D[Distributional Semantics]
    D --> D1[Distributional semantics: Context reveals word meaning 15]
    D --> D2[Recipe: Word-context matrix, dimensionality reduction 16]
    D --> D3[LSA: Documents as context, SVD reduction 17]
    D --> D4[POS induction: Unsupervised learning, context, SVD 18]
    D --> D5[SGNS/Word2vec: Word-context prediction, dense embeddings 19]
    Main --> E[Word Vectors and Models]
    E --> E1[Word vectors: Captures relations, downstream implications 20]
    E --> E2[Vector analogies: Differences capture semantic relations 21]
    E --> E3[Other models: HMMs, LDA, neural nets 22]
    E --> E4[Hearst patterns: Hypernymy-revealing lexical patterns 23]
    E --> E5[Distributional summary: Context-based, nuanced, useful 24]
    Main --> F[Considerations and Limitations]
    F --> F1[Context considerations: No unsupervised, model part 25]
    F --> F2[Limitations: Parameterized, compression view, intention 26]
    F --> F3[Examples: Same/different meanings, distributional handling 27]
    class Main main
    class A,A1,A2,A3,A4,A5 intro
    class B,B1,B2,B3,B4,B5,B6,C,C1,C2,C3 linguistics
    class D,D1,D2,D3,D4,D5 distributional
    class E,E1,E2,E3,E4,E5,F,F1,F2,F3 models

Resume:

1.- Introduction to natural language understanding (NLU): Defining NLU, relating it to the Turing test and AI/intelligence.

2.- IBM Watson and real-world NLU: Watson winning at Jeopardy in 2011, ensemble of techniques. NLU creeping into daily life.

3.- Potential techniques/representations for NLU: Word/phrase vectors, dependency trees, frames, logical forms. Interconnections will be explored.

4.- NLP & machine learning - transfer of ideas: HMMs, CRFs, LDA developed for NLP problems. Sequence-to-sequence models and LSTMs driven by machine translation.

5.- Goals of tutorial: Provide NL intuitions, describe state-of-the-art, understand challenges & opportunities. Conceptual, not algorithmic.

6.- Syntax, semantics, pragmatics: Levels of linguistic analysis. Analogies with programming languages.

7.- Syntax - constituency & dependency: Dependency parse trees, parts of speech, relations. Capturing meaning requires more than just syntax.
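
As an illustration of dependency structure (not part of the tutorial itself), the small spaCy sketch below prints each word's part of speech, dependency relation, and syntactic head; it assumes the en_core_web_sm model is installed.

# Minimal dependency-parse illustration using spaCy (illustrative only;
# assumes "en_core_web_sm" has been downloaded).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat chased the mouse into the garden.")

for token in doc:
    # word, part of speech, dependency relation, and its syntactic head
    print(f"{token.text:<8} {token.pos_:<6} {token.dep_:<10} head={token.head.text}")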

8.- Semantics - lexical & compositional: Figuring out word meanings and composing them. Word sense ambiguity.

9.- Lexical semantics - synonymy: Semantically similar words. Better captured by a graded distance metric than by equivalence classes. Other relations include hyponymy and meronymy.
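
A small sketch of these lexical relations using NLTK's WordNet interface (illustrative only, not the tutorial's material; assumes the WordNet data has been fetched with nltk.download("wordnet")):

# Exploring synonymy, hypernymy, hyponymy, and meronymy with WordNet.
from nltk.corpus import wordnet as wn

car = wn.synsets("car")[0]          # first sense of "car"
print(car.lemma_names())            # (near-)synonyms in this synset
print(car.hypernyms())              # more general concepts (e.g. motor_vehicle)
print(car.hyponyms()[:3])           # more specific concepts
print(car.part_meronyms()[:3])      # parts of a car

# Similarity as a graded score rather than an equivalence class:
dog, cat = wn.synsets("dog")[0], wn.synsets("cat")[0]
print(dog.path_similarity(cat))     # value in (0, 1], higher = more similar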

10.- Compositional semantics - model theory & compositionality: Sentences refer to the world. Meaning of the whole composed from parts. Quantifiers, modals, beliefs.
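
A toy model-theoretic sketch, purely illustrative: entities and predicate denotations are sets, and quantifiers are functions that compose the meanings of the parts.

# Toy model-theoretic semantics (illustrative sketch, not from the tutorial).
entities = {"alice", "bob", "rex"}
student  = {"alice", "bob"}         # [[student]]
sleeps   = {"alice", "bob", "rex"}  # [[sleeps]]

# Quantifiers as higher-order functions composed from the parts:
def every(restrictor, scope):
    return restrictor <= scope          # subset test

def some(restrictor, scope):
    return len(restrictor & scope) > 0

print(every(student, sleeps))   # "Every student sleeps."  -> True
print(some(student, sleeps))    # "Some student sleeps."   -> True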

11.- Pragmatics - conversational implicature, presupposition: What the speaker is trying to convey based on context. Background assumptions.

12.- Language processing challenges - vagueness, ambiguity, uncertainty: Definitions and examples of each. Uncertainty due to lack of knowledge.

13.- Coreference and anaphora: Resolving pronouns like "it" depending on context. The Winograd Schema Challenge.

14.- Recap - syntax, semantics, pragmatics: Definitions, key ideas, and challenges in each.

15.- Distributional semantics motivation: Context reveals a lot about word meaning. Historical basis in linguistics (the distributional hypothesis: words that occur in similar contexts tend to have similar meanings).

16.- General recipe for distributional semantics: Form word-context matrix of counts, do dimensionality reduction. Context and dimensionality reduction vary.
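
A minimal numpy sketch of this recipe on a toy corpus; the window size and number of dimensions are placeholder assumptions, not values from the tutorial.

# Distributional recipe: word-context counts, then truncated SVD.
import numpy as np

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# 1) word-context count matrix (context = neighbouring words in a +/-1 window)
M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                M[idx[w], idx[sent[j]]] += 1

# 2) dimensionality reduction via truncated SVD
U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]     # k-dimensional word embeddings
print(dict(zip(vocab, word_vectors.round(2))))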

17.- Latent Semantic Analysis (LSA): Early distributional semantics method using documents as context and SVD for dimensionality reduction.
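
A hedged LSA sketch using scikit-learn, with toy documents standing in for a real corpus: term-document counts followed by truncated SVD.

# LSA sketch: documents as contexts, SVD for dimensionality reduction.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "stocks fell as markets reacted to earnings",
    "the team won the match in overtime",
    "investors sold stocks after weak earnings",
]
X = CountVectorizer().fit_transform(docs)     # document-term count matrix
lsa = TruncatedSVD(n_components=2)
doc_vectors = lsa.fit_transform(X)            # low-dimensional document vectors
word_vectors = lsa.components_.T              # low-dimensional word vectors
print(doc_vectors.shape, word_vectors.shape)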

18.- Part-of-speech induction: Unsupervised learning of parts-of-speech using surrounding word contexts and SVD.
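
One possible sketch of this idea (an assumed setup for illustration, not the tutorial's exact method): describe each word by counts of its left and right neighbours, reduce with SVD, then cluster into induced word classes.

# POS induction sketch: context counts -> SVD -> k-means word classes.
import numpy as np
from sklearn.cluster import KMeans

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Each word is described by counts of its left and right neighbours.
C = np.zeros((len(vocab), 2 * len(vocab)))
for s in corpus:
    for i, w in enumerate(s):
        if i > 0:
            C[idx[w], idx[s[i - 1]]] += 1                  # left context
        if i < len(s) - 1:
            C[idx[w], len(vocab) + idx[s[i + 1]]] += 1     # right context

U, S, _ = np.linalg.svd(C, full_matrices=False)
reduced = U[:, :2] * S[:2]
labels = KMeans(n_clusters=3, n_init=10).fit_predict(reduced)
print(dict(zip(vocab, labels)))    # words grouped into induced classes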

19.- SGNS/Word2vec model: Predicts observed word-context pairs with logistic regression against sampled negatives; implicitly related to factorizing a (shifted) PMI matrix. Produces dense word embeddings.
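
A minimal numpy sketch of the SGNS objective on a single toy pair; the embedding size, learning rate, and word indices are placeholders, and this is a didactic update rule rather than the word2vec implementation.

# One SGD step on the SGNS objective: push an observed (word, context) pair
# together and sampled negative pairs apart, via logistic regression.
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 5                              # vocabulary size, embedding dimension
W = rng.normal(scale=0.1, size=(V, d))    # word vectors
C = rng.normal(scale=0.1, size=(V, d))    # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(w, c, negatives, lr=0.1):
    grad_w = np.zeros(d)
    # observed pair: increase sigma(w . c)
    g = 1.0 - sigmoid(W[w] @ C[c])
    grad_w += g * C[c]
    C[c] += lr * g * W[w]
    # negative samples: decrease sigma(w . c_neg)
    for n in negatives:
        g = -sigmoid(W[w] @ C[n])
        grad_w += g * C[n]
        C[n] += lr * g * W[w]
    W[w] += lr * grad_w

sgns_step(w=1, c=2, negatives=[5, 7])     # one update on a toy pair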

20.- Probing word vectors: Captures notions of synonymy, co-hyponymy, even antonymy. Implications differ across downstream tasks (e.g., sentiment analysis vs. instruction following).
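
A simple cosine nearest-neighbour probe; the vectors below are random placeholders, and with real pretrained embeddings (e.g., word2vec or GloVe) near-synonyms and antonyms often rank comparably high.

# Probing word vectors by cosine nearest neighbours (placeholder vectors).
import numpy as np

rng = np.random.default_rng(0)
words = ["good", "great", "bad", "cat", "dog"]
vectors = {w: rng.normal(size=50) for w in words}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def neighbours(query, k=3):
    scores = [(w, cosine(vectors[query], v)) for w, v in vectors.items() if w != query]
    return sorted(scores, key=lambda x: -x[1])[:k]

print(neighbours("good"))   # with real embeddings, "great" and "bad" both rank high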

21.- Word vector analogies: Vector differences capture relations like gender, plurality. Explanation via context differences.
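
The standard vector-offset analogy, sketched with placeholder vectors; with real embeddings the query man : king :: woman : ? returns queen.

# Analogy by vector offset: answer = argmax cosine(d, b - a + c), excluding queries.
import numpy as np

rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple"]
E = {w: rng.normal(size=50) for w in words}

def analogy(a, b, c):
    target = E[b] - E[a] + E[c]
    best, best_score = None, -np.inf
    for w, v in E.items():
        if w in (a, b, c):
            continue
        score = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if score > best_score:
            best, best_score = w, score
    return best

print(analogy("man", "king", "woman"))   # "queen" with real embeddings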

22.- Other distributional models: HMMs, LDA, neural nets, recursive neural nets for phrase embeddings.

23.- Hearst patterns: Lexico-syntactic patterns like "X such as Y" that reveal hypernymy. Mining them from large corpora.
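
A minimal regex sketch of the "X such as Y" pattern; real systems use many more patterns, POS tags or parses, and web-scale corpora.

# Mine "X such as Y (and Z)" as (hyponym, hypernym) pairs with a simple regex.
import re

text = ("Companies such as Google and Apple are investing in AI. "
        "Fruits such as apples and oranges are rich in vitamins.")

pattern = re.compile(r"(\w+) such as (\w+)(?:,? and (\w+))?")
pairs = []
for m in pattern.finditer(text):
    hypernym = m.group(1).lower()
    for hyponym in m.groups()[1:]:
        if hyponym:
            pairs.append((hyponym.lower(), hypernym))

print(pairs)   # [('google', 'companies'), ('apple', 'companies'), ...]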

24.- Summary of distributional semantics: Context as lossless semantics, general recipe, captures nuanced usage information, useful for downstream tasks.

25.- Context selection considerations: There is no purely unsupervised learning; the choice of context is part of the model. Global vs. local contexts, patterns, non-textual context.

26.- Limitations of current distributional models: Highly parameterized, compression may not be the right view, intention not fully captured.

27.- Examples to ponder: Sentences with same/different meanings, how distributional methods handle or fail on them.

Knowledge Vault built by David Vivancos 2024