Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Importance of modeling feedback details in preference learning systems
2.- Robots maximize the objectives humans specify, which can lead to goal misalignment
3.- Context matters when designing AI reward functions
4.- Inverse reward design uses Bayesian inference for better generalization (see sketch below)
5.- Incompleteness is fundamental uncertainty in goal specification
6.- Automatic guardrails from inverse prompt engineering prevent misuse
7.- Travel assistant chatbot example demonstrates filtering jailbreak attempts
8.- Self-attack evaluations show increased robustness through preference inference
9.- Missing features affect robot obedience and decision-making
10.- Overconfidence occurs when robots reason over a restricted set of world features
11.- Proxy rewards with fewer features lead to utility misalignment
12.- RLHF attempts to capture subjective preferences through data (see sketch below)
13.- Hidden context affects preference data collection
14.- Borda count voting mechanism underlies RLHF preference aggregation (see sketch below)
15.- Distributional preference learning helps manage uncertainty (see sketch below)
16.- Jailbreak robustness improves with uncertainty modeling
17.- Learning affects human preferences over time
18.- Robot assistance must account for human learning process
19.- Win-stay-lose-shift strategy reveals previous reward information (see sketch below)
20.- Mutual information bounds team performance in learning
21.- Information-dense preference communication increases brittleness
22.- Pedagogical approaches are more sensitive to errors
23.- Teaching feedback varies based on expected time horizon
24.- Uncertainty about horizons can match known-horizon performance
25.- Information density correlates with model sensitivity
26.- Uncertainty-aware preference learning improves alignment robustness
27.- Unmodeled context requires ongoing management
28.- Information-revealing policies trade optimal performance for robustness
29.- Social choice theory connects to AI alignment
30.- Preference aggregation methods may build in risk aversion (see sketch below)
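
Sketches:

For point 4, a minimal sketch of the Bayesian inference behind inverse reward design, assuming a toy training environment with three hand-made trajectories and a coarse grid of candidate reward weights; the feature names and numbers are illustrative, not from the talk.

```python
# Inverse reward design sketch: treat the designer's proxy reward as evidence
# about the true reward, and compute a posterior over candidate true rewards.
import itertools
import numpy as np

# Feature vectors (e.g. [distance, dirt_collected, lava_crossed]) of the
# trajectories available in the *training* environment. No training
# trajectory ever crosses lava.
train_trajs = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.2, 1.0, 0.0],
])

def best_features(w, trajs):
    """Feature counts of the trajectory that is optimal under weights w."""
    return trajs[np.argmax(trajs @ w)]

# Candidate weight vectors: used both as possible true rewards and as the
# space of proxies the designer could have written (a coarse grid).
grid = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=3)))

def proxy_likelihood(w_true, w_proxy, trajs, beta=5.0):
    """P(proxy | true): the designer tends to pick proxies whose induced
    behaviour scores well under the true reward in the training environment."""
    scores = np.array([w_true @ best_features(w, trajs) for w in grid])
    probs = np.exp(beta * scores)
    probs /= probs.sum()
    idx = np.flatnonzero((grid == w_proxy).all(axis=1))[0]
    return probs[idx]

# Observed proxy: the designer rewarded dirt collection, said nothing about lava.
w_proxy = np.array([0.0, 1.0, 0.0])

# Bayesian posterior over the true weights, uniform prior over the grid.
posterior = np.array([proxy_likelihood(w, w_proxy, train_trajs) for w in grid])
posterior /= posterior.sum()

# The posterior stays uncertain about the lava feature: no training trajectory
# crossed lava, so the proxy carries no evidence about it.
for w, p in sorted(zip(grid.tolist(), posterior), key=lambda t: -t[1])[:5]:
    print(w, round(float(p), 3))
```

The flat posterior along the untouched feature is the point of the technique: the robot knows what the proxy does not pin down, instead of assuming the proxy is the whole objective.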
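For point 12, a minimal sketch of how a reward model is commonly fit from pairwise preference data with a Bradley-Terry (logistic) objective; the linear reward model and the synthetic annotators are assumptions for illustration, not the setup from the talk.

```python
# Fit a linear reward model from pairwise comparisons: the human is assumed
# to prefer A over B with probability sigmoid(r(A) - r(B)).
import numpy as np

rng = np.random.default_rng(0)

dim, n = 4, 200
true_w = np.array([1.0, -0.5, 2.0, 0.0])        # hidden "true" reward weights
A = rng.normal(size=(n, dim))                    # features of response A
B = rng.normal(size=(n, dim))                    # features of response B
p_prefer_A = 1.0 / (1.0 + np.exp(-(A @ true_w - B @ true_w)))
prefer_A = rng.random(n) < p_prefer_A            # noisy human labels

w = np.zeros(dim)
lr = 0.1
for _ in range(500):
    margin = A @ w - B @ w                       # r(A) - r(B) under current model
    p = 1.0 / (1.0 + np.exp(-margin))            # model's P(A preferred)
    grad = ((p - prefer_A)[:, None] * (A - B)).mean(axis=0)
    w -= lr * grad                               # gradient step on the logistic loss

print("recovered weights:", np.round(w, 2))      # approximately recovers true_w
```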
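For point 14, a small Borda count example; the candidate responses and rankings are made up. The connection claimed in the talk is that aggregating many pairwise preferences, as RLHF does, implicitly scores options by how often they beat alternatives, which is what Borda count formalises.

```python
# Borda count: each option earns one point per option ranked below it.
from collections import defaultdict

def borda_scores(rankings):
    """Each ranking lists options from most to least preferred."""
    scores = defaultdict(int)
    for ranking in rankings:
        for i, winner in enumerate(ranking):
            scores[winner] += len(ranking) - 1 - i
    return dict(scores)

# Three annotators ranking three candidate responses.
rankings = [
    ["concise", "verbose", "evasive"],
    ["concise", "evasive", "verbose"],
    ["verbose", "concise", "evasive"],
]
print(borda_scores(rankings))   # prints {'concise': 5, 'verbose': 3, 'evasive': 1}
```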
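For point 15, one simple way to keep preference uncertainty explicit instead of collapsing it to a point estimate: a Beta posterior over P(A preferred to B). This illustrates the general idea of distributional preference learning, not the specific method discussed in the talk.

```python
# Beta-Bernoulli model per comparison pair: the posterior width shows how
# much the data really says, which a single preference rate hides.
from scipy.stats import beta

def preference_posterior(wins_a, wins_b, prior=1.0):
    """Beta posterior over P(A preferred to B) after observed comparisons."""
    return beta(prior + wins_a, prior + wins_b)

# Two datasets with nearly the same preference rate: the small one leaves far
# more posterior probability on "B is actually better".
for wins_a, wins_b in [(6, 4), (60, 40)]:
    post = preference_posterior(wins_a, wins_b)
    print(wins_a, wins_b,
          "mean P(A>B) =", round(float(post.mean()), 2),
          "P(P(A>B) < 0.5) =", round(float(post.cdf(0.5)), 3))
```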
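For point 19, a toy two-armed bandit showing why a win-stay-lose-shift strategy reveals the previous reward: an observer who only watches the choices can read off whether the last round was a win. The reward probabilities below are made up.

```python
# Win-stay-lose-shift leaks reward information: "stayed" implies a win,
# "switched" implies a loss, so the choice sequence encodes the rewards.
import random

random.seed(0)
reward_probs = [0.8, 0.3]   # hidden per-arm reward probabilities

def wsls(prev_arm, prev_reward):
    """Win-stay-lose-shift: repeat a rewarded arm, switch after a failure."""
    if prev_arm is None:
        return random.choice([0, 1])
    return prev_arm if prev_reward else 1 - prev_arm

prev_arm, prev_reward = None, None
correct = 0
rounds = 1000
for _ in range(rounds):
    arm = wsls(prev_arm, prev_reward)
    if prev_arm is not None:
        # Observer sees only the choices and infers last round's reward.
        observer_guess = (arm == prev_arm)
        correct += (observer_guess == prev_reward)
    prev_arm = arm
    prev_reward = random.random() < reward_probs[arm]

# Prints 1.0: the deterministic strategy fully reveals the previous reward.
print("observer recovers previous reward:", correct / (rounds - 1))
```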
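For points 26 and 30, a sketch of how acting on a distribution over rewards rather than a point estimate builds in risk aversion: a lower-quantile criterion avoids options whose value hinges on poorly understood features. The posterior samples and trajectories are invented for illustration.

```python
# Risk-averse choice under reward uncertainty: compare mean value with a
# lower quantile of value across posterior samples of the reward weights.
import numpy as np

rng = np.random.default_rng(0)

# Posterior samples over reward weights (e.g. from something like the
# inverse reward design sketch above).
samples = np.column_stack([
    rng.normal(1.0, 0.1, size=1000),    # weight on feature 0: well pinned down
    rng.normal(0.3, 1.5, size=1000),    # weight on feature 1: could be harmful
])

trajectories = {
    "cautious":  np.array([1.0, 0.0]),  # only uses the well-understood feature
    "ambitious": np.array([0.9, 0.8]),  # leans on the uncertain feature
}

for name, feats in trajectories.items():
    values = samples @ feats            # value under each posterior sample
    print(name,
          "mean:", round(float(values.mean()), 2),
          "5th percentile:", round(float(np.percentile(values, 5)), 2))

# Maximising the mean favours "ambitious"; maximising the 5th percentile
# (a risk-averse criterion) favours "cautious".
```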
Knowledge Vault built by David Vivancos 2024