Knowledge Vault 2/93 - ICLR 2014-2023
Dilek Hakkani-Tur ICLR 2023 - Invited Talk - Dialogue Research in the Era of LLMs
[Resume image]

Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Gemini Advanced | Llama 3:

graph LR
classDef NLP fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef challenges fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef solutions fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef evaluation fill:#f9f9d4, font-weight:bold, font-size:14px;
classDef future fill:#f9d4f9, font-weight:bold, font-size:14px;
A[Dilek Hakkani-Tur ICLR 2023] --> B[Large language models generate natural responses 1]
A --> C[Challenges remain in conversational machines 2]
C --> D[Historical focus: task-oriented, open-domain chatbots 3]
C --> E[Language models hallucinate, produce inaccurate responses 4]
C --> F[Lack of boundaries: knowledge-seeking, chatting, task-oriented 5]
A --> G[Synthetic data generation with language models 6]
A --> H[Avoid unsafe, abusive, unethical responses 7]
H --> I[Control interactions, take initiative, pursue agenda 8]
A --> J[Automated evaluation of dialogue remains difficult 9]
C --> K[Integrate diverse knowledge sources 10]
A --> L[Tools like APIs help models respond accurately 11]
A --> M[Response safety to prevent unfair, biased content 12]
H --> N[Initiative, agenda-based dialogues require more data 13]
J --> O[Standardized protocols, agreement between communities needed 14]
A --> P[Personalization to learn user preferences 15]
A --> Q[Speech poses challenges: disfluencies, lack of punctuation 16]
A --> R[Visual information key for many applications 17]
A --> S[Ingesting context helps interpretation, coherence 18]
C --> T[Language models not yet dialogue models 19]
A --> U[Prompting may benefit data generation, control 20]
A --> V[Symbolic AI with abstraction, LLMs as meta-programming 21]
J --> W[AI may surpass humans on summarization 22]
A --> X[Converting queries to knowledge graph queries 23]
I --> Y[Controlling persuasive dialogues safely requires data 24]
A --> Z[Verifying quality of prompts and responses 25]
A --> AA[Active learning with human-in-the-loop promising 26]
C --> AB[Lack of open-domain spoken conversation data 27]
Q --> AC[Revival of prosody, disfluency, punctuation prediction research 28]
Q --> AD[Integration of neural outputs with linguistic knowledge 29]
C --> AE[Revisit classic dialogue challenges with modern tools 30]
class B,E,F,K,T,AB,AE challenges;
class D,G,I,L,M,N,P,R,S,U,V,X,Y,Z,AA solutions;
class H,Q NLP;
class J,O,W evaluation;
class C,AC,AD future;

Resume:

1.-Recent large language models generate natural responses and keep improving, aided by cheaper compute and public conversational datasets.

2.-Challenges remain in reaching ultimate conversational machines, despite progress. The talk discusses challenges based on interviews with dialogue researchers.

3.-Historically, research focused on task-oriented and open-domain chatbots. Recent approaches combine knowledge integration and end-to-end methods.

4.-Language models hallucinate, producing inaccurate responses. Knowledge grounding to textual resources during generation can help but has challenges.
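Knowledge grounding as mentioned above can be sketched as retrieve-then-generate. The toy word-overlap retriever, corpus, and prompt template below are illustrative assumptions, not the systems discussed in the talk:

```python
# Minimal sketch of knowledge-grounded response generation:
# retrieve supporting text, then constrain the model to it.
# The retriever, corpus, and prompt wording are toy stand-ins.

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank passages by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def grounded_prompt(query: str, corpus: list[str]) -> str:
    """Build a prompt that asks the model to answer only from evidence."""
    evidence = "\n".join(retrieve(query, corpus))
    return ("Answer using ONLY the evidence below; say 'unknown' otherwise.\n"
            f"Evidence:\n{evidence}\n"
            f"User: {query}\nAssistant:")

corpus = [
    "ICLR 2023 was held in Kigali, Rwanda.",
    "Transformers use self-attention over token sequences.",
]
print(grounded_prompt("Where was ICLR 2023 held?", corpus))
```

In a real system the retriever would be a dense or sparse index and the prompt would feed an LLM; the restriction to retrieved evidence is what reduces hallucination.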

5.-Clear boundaries between knowledge-seeking, chatting, and task-oriented turns are lacking. More work is needed on transitions between them.

6.-Generating synthetic conversational data with large language models is promising to augment limited human-annotated datasets.
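A minimal sketch of the synthetic-data idea: in practice an LLM would paraphrase templated turns into varied conversations; here a schema-and-template filler stands in for the model, and the domains and slots are invented for illustration:

```python
# Toy synthetic dialogue generation from domain schemas.
# An LLM would normally rewrite these turns for diversity;
# the domains, slots, and template wording are placeholders.

def make_dialogue(domain: str, slot: str, value: str) -> list[dict]:
    """One two-turn task-oriented dialogue for a (domain, slot, value)."""
    return [
        {"speaker": "user",
         "text": f"I'd like to book a {domain} with {slot} {value}."},
        {"speaker": "system",
         "text": f"Sure, booking a {domain} ({slot}={value})."},
    ]

domains = {"restaurant": ("cuisine", ["thai", "italian"]),
           "hotel": ("stars", ["3", "5"])}

synthetic = [make_dialogue(d, s, v)
             for d, (s, values) in domains.items()
             for v in values]
print(len(synthetic))  # 4 dialogues
```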

7.-Unsafe, abusive, unethical responses must be avoided. Progress has been made with human feedback and reinforcement learning, but more work is needed.

8.-Dialogue systems should control interactions, take initiative, and pursue an agenda. Complex developer policies are challenging to enforce.

9.-Automated evaluation of dialogue response generation remains difficult. Human evaluation is recommended but can be expensive and subjective.

10.-Integrating diverse knowledge sources, including structured and unstructured, static and dynamic information, is an open challenge.

11.-Tools like APIs, calculators, and translators help models respond accurately. Tool integration can be done in pre-training, fine-tuning, or prompting.
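The tool-integration pattern can be sketched as a dispatch loop: the model emits a structured tool call, a controller executes it, and the result is fed back. The `CALC:` tag format and the calculator tool below are illustrative assumptions, not a specific framework:

```python
# Sketch of tool dispatch for an LLM controller. The model is assumed
# to emit calls like "CALC: 12 * 7"; the controller routes them to a
# tool. Tag format and tool set are illustrative placeholders.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate a small arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def dispatch(model_output: str) -> str:
    """Route 'CALC: <expr>' calls to the calculator; pass text through."""
    if model_output.startswith("CALC:"):
        return str(safe_eval(model_output[len("CALC:"):].strip()))
    return model_output

print(dispatch("CALC: 12 * 7"))  # 84
```

Whether the model learns to emit such calls in pre-training, fine-tuning, or purely via prompting is exactly the design choice the talk raises.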

12.-Response safety to prevent unfair, unethical, biased content is critical. Filtering, human feedback, learning to rewrite are promising approaches.
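Of the approaches listed, output filtering is the simplest to sketch. The blocklist and fallback message below are toy placeholders; real safety layers use learned classifiers rather than word lists:

```python
# Toy response-safety filter: block flagged outputs and substitute a
# safe fallback. The blocklist tokens and fallback wording are
# illustrative placeholders, not a real safety lexicon.

BLOCKLIST = {"insult_x", "slur_y"}  # placeholder tokens

FALLBACK = "I'd rather not say that. Can we talk about something else?"

def filter_response(text: str) -> str:
    """Return the response unchanged unless it contains flagged tokens."""
    if set(text.lower().split()) & BLOCKLIST:
        return FALLBACK
    return text

print(filter_response("hello there"))
```

The "learning to rewrite" approach mentioned above would replace the static fallback with a model that regenerates a safe version of the flagged response.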

13.-Initiative and agenda-based dialogues, beyond just user-driven interactions, are important but require more annotated data and research.

14.-Evaluation through human judgments or automatic metrics remains challenging. Standardized protocols and more agreement between communities are needed.

15.-Personalization to learn user preferences over time through past interactions and user teaching is a broad and important topic.

16.-Speech-based interactions enable new applications but pose challenges due to disfluencies, lack of punctuation, and different noise types.
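The disfluency problem can be illustrated with a toy transcript cleaner that drops fillers and collapses immediate word repetitions. Real systems learn this from data; the filler list and rules here are assumptions for illustration:

```python
# Toy disfluency cleanup for ASR-style transcripts: remove filler
# words and collapse immediate repetitions. Learned models handle far
# more (restarts, repairs, prosody); this is only a sketch.

FILLERS = {"uh", "um", "erm"}  # illustrative filler list

def clean(transcript: str) -> str:
    """Drop fillers, then collapse adjacent duplicate words."""
    words = [w for w in transcript.lower().split() if w not in FILLERS]
    out = []
    for w in words:
        if not out or out[-1] != w:
            out.append(w)
    return " ".join(out)

print(clean("I uh I want um want a table"))  # "i want a table"
```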

17.-Visual information from user video, shared content, situational context is key for many applications. Vision-language models show promise.

18.-Ingesting context - conversational history, previous sessions, ambient signals, world events - can help interpretation and coherence but is complex.

19.-Language models are not yet equivalent to dialogue models. Recent progress is exciting but challenges remain to inspire new ideas.

20.-Prompting language models is an empirical approach that may benefit data generation, control, but less clear for speech/vision integration.

21.-Symbolic AI approaches with abstraction and association, treating LLMs as meta-programming platforms, are another research direction warranting more investigation.

22.-AI may surpass humans on summarization but lag on empathy. Comparisons depend on the task. More comprehensive evaluation than Turing test needed.

23.-Converting user queries to knowledge graph queries can leverage structured knowledge bases. Data-to-text generation is also important but under-researched.
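The query-conversion idea can be sketched as mapping a natural-language question to a triple lookup over a knowledge graph. The regex pattern, relations, and tiny KG below are invented placeholders; real systems use semantic parsers producing SPARQL or similar:

```python
import re

# Toy sketch: convert a natural-language question into a
# knowledge-graph triple query. Pattern, relation names, and the
# KG contents are illustrative placeholders.

KG = {("Kigali", "capital_of"): "Rwanda",
      ("ICLR_2023", "held_in"): "Kigali"}

def to_kg_query(question: str):
    """Map one question shape to a (subject, relation) query."""
    m = re.match(r"Where was (\w+) held\?", question)
    if m:
        return (m.group(1), "held_in")
    return None  # unparseable question

def answer(question: str):
    q = to_kg_query(question)
    return KG.get(q) if q else None

print(answer("Where was ICLR_2023 held?"))  # Kigali
```

The inverse direction mentioned above, data-to-text generation, would verbalize such triples back into fluent responses.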

24.-Controlling persuasive dialogues to take initiative and pursue an agenda safely requires more annotated data or unsupervised learning approaches.

25.-Verifying quality of generated prompts and responses can use top sampling, rejection, or automated metrics, but human screening may still be needed.
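The sample-and-reject idea can be sketched as best-of-n selection under an automated metric. The generator and metric below are toy stand-ins for an LLM and a learned quality scorer:

```python
import random

# Sketch of quality verification by sampling and rejection: draw
# several candidates, score each with an automated metric, keep the
# best only if it clears a threshold. Generator and metric are toys.

def toy_generator(prompt: str, rng: random.Random) -> str:
    """Stand-in for an LLM: appends three random quality tokens."""
    return prompt + " " + " ".join(rng.choice(["good", "meh"]) for _ in range(3))

def toy_metric(text: str) -> float:
    """Stand-in for a quality scorer: fraction of 'good' tokens."""
    return text.split().count("good") / 3

def best_of_n(prompt: str, n: int = 8, threshold: float = 0.5, seed: int = 0):
    """Return the highest-scoring candidate, or None if all are rejected."""
    rng = random.Random(seed)
    candidates = [toy_generator(prompt, rng) for _ in range(n)]
    best = max(candidates, key=toy_metric)
    return best if toy_metric(best) >= threshold else None
```

As the resume notes, automated rejection like this may still need a final human screening pass.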

26.-Active learning with human-in-the-loop is promising to reduce data needs but can be hard to scale compared to reinforcement learning from feedback.
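A minimal sketch of the human-in-the-loop loop: pool-based active learning picks the unlabeled examples the model is least confident about and sends only those to annotators. The confidence scores below are invented for illustration:

```python
# Sketch of pool-based active learning with least-confidence sampling:
# route the examples the model is least sure about to human labelers.
# The pool and confidence values are illustrative placeholders.

def least_confident(pool: dict, k: int = 2) -> list:
    """pool maps example id -> model confidence in its top prediction;
    return the k ids with the lowest confidence."""
    return sorted(pool, key=pool.get)[:k]

pool = {"utt_a": 0.95, "utt_b": 0.40, "utt_c": 0.60, "utt_d": 0.99}
print(least_confident(pool))  # ['utt_b', 'utt_c']
```

Scaling this is the hard part the resume mentions: each round waits on human labels, whereas reinforcement learning from feedback amortizes human input across many updates.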

27.-Lack of open-domain spoken conversation data is a major challenge. Collecting and releasing speech datasets has practical difficulties.

28.-Beyond data, a revival of prosody, disfluency, and punctuation prediction research from speech signals could improve the robustness of recent open-domain chatbots.

29.-Integration of neural outputs with syntactic rules and linguistic knowledge to fix speech recognition errors is an open question. Data may still be key.

30.-In the new era of open-domain chatbots, revisiting classic dialogue challenges with modern tools can reveal problems and new solutions.

Knowledge Vault built by David Vivancos 2024