Concept Graph & Resume using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:
Resume:
1.- Supervised learning is the bread and butter of machine learning, but ignores valuable interaction data.
2.- Interactive machine learning, including contextual bandits, can leverage interaction data to improve models.
3.- In interactive learning, the algorithm learns from features, actions, and rewards in a continuous loop.
4.- Full reinforcement learning requires special domains and large sample sizes, while active learning suffers from a wrong-signal problem.
5.- Contextual bandits provide the right reward signal, handle non-stationarity, and act as economically viable AI agents.
6.- Contextual bandits are a good fit for many real-world problems like recommendations, ads, education, music, robotics and wellness.
7.- The tutorial covers algorithms, theory, evaluation, learning, exploration, practical issues, systems, and experiences.
8.- In contextual bandits, features are observed, an action is chosen, and a reward is received, with the goal of maximizing reward (a minimal loop sketch follows the list).
9.- Policies map features to actions. Exploration, usually randomized, is critical to gather needed information.
10.- Offline policy evaluation is possible using techniques like inverse propensity scoring, enabling rapid testing of new policies (IPS sketch after the list).
11.- Offline learning from exploration data is feasible by reducing the problem to importance weighted multi-class classification (reduction sketch after the list).
12.- Exploration algorithms like epsilon-greedy, Thompson sampling, and EXP4 balance exploration and exploitation.
13.- Progressive validation enables unbiased offline evaluation of learning algorithms on streaming data (sketch after the list).
14.- Rejection sampling allows offline evaluation of exploration algorithms, considering the full interaction data loop (sketch after the list).
15.- Failure modes in practice include mismatched action probabilities, non-stationary features, and delayed or unobserved rewards.
16.- Learning systems rather than just algorithms are needed, with modular, scalable designs, generality, and offline reproducibility.
17.- The Decision Service is an open-source and managed contextual bandit system that addresses many practical issues by design.
18.- Other recent contextual bandit systems include NEXT and StreamingBandit, with some differences in capabilities.
19.- Non-stationarity is a key issue in practice, requiring time-based and ensemble techniques beyond standard theory.
20.- Combinatorial action spaces like rankings require special approaches based on semi-bandits, submodularity, or cascading models.
21.- Reward function specification is critical and complex, often requiring mapping long-term goals to good short-term proxies.
22.- Smart reward encoding, like using infrequent nonzero rewards, can greatly reduce variance and improve data efficiency (numeric illustration after the list).
23.- Despite gaps between theory and practice, workable recipes exist for common scenarios in framing contextual bandit problems.
24.- A Complex.com case study demonstrates how contextual bandit approaches can provide substantial real-world benefits.
25.- Offline progressive validation enables rapid evaluation of new models, features, and exploration algorithms on real data.
26.- Contextual bandit techniques have matured to be fit for broad consumption, providing gains over supervised learning with less complexity than RL.
27.- More research is needed on automatic/parameter-free algorithms and expanding the tractable subset of RL problems.
28.- For practitioners, contextual bandits are becoming more reliable, robust and usable for real applications.
29.- An example application is using contextual bandits to personalize EEG-based typing for disabled individuals.
30.- The research has benefited from many collaborators, with slides and references available on hunch.net.
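The sketches below are illustrative additions, not code from the tutorial or its slides. First, a minimal contextual-bandit loop (items 8-9) with epsilon-greedy exploration (item 12); the env object with observe()/reward() methods, the per-action linear scorer, and the learning rate are assumptions made only for the sketch.

import numpy as np

def epsilon_greedy_loop(env, n_actions, n_features, epsilon=0.1, rounds=1000, seed=0):
    # Contextual-bandit loop: observe features, choose an action with
    # epsilon-greedy exploration, receive a reward, and log the action's
    # probability so the data can be reused offline.
    rng = np.random.default_rng(seed)
    weights = np.zeros((n_actions, n_features))   # one linear scorer per action
    log = []                                      # (features, action, reward, propensity)
    for _ in range(rounds):
        x = env.observe()                         # hypothetical environment interface
        scores = weights @ x
        greedy = int(np.argmax(scores))
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions)) # explore uniformly at random
        else:
            action = greedy                       # exploit the current policy
        prob = epsilon / n_actions + (1.0 - epsilon) * (action == greedy)
        reward = env.reward(action)               # hypothetical environment interface
        log.append((x, action, reward, prob))
        # crude incremental update of the chosen action's scorer
        weights[action] += 0.01 * (reward - scores[action]) * x
    return weights, log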
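A sketch of inverse propensity scoring for offline policy evaluation (item 10), assuming exploration data logged as (features, action, reward, probability) tuples like the log above; policy is any function mapping features to an action.

def ips_value(policy, logged_data):
    # Unbiased offline estimate of a new policy's average reward:
    # keep only rounds where the new policy would have played the logged
    # action, and reweight those rewards by the inverse logging probability.
    total = 0.0
    for x, action, reward, prob in logged_data:
        if policy(x) == action:
            total += reward / prob
    return total / len(logged_data)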
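Item 11's reduction, sketched: each logged round becomes a multiclass example labeled with the logged action and weighted by reward/probability, so a classifier maximizing weighted accuracy maximizes the IPS estimate of its value as a policy. The scikit-learn call in the comment is only one illustrative way to consume such weights.

def to_importance_weighted_multiclass(logged_data):
    # (features, action, reward, prob) -> (features, label=action, weight=reward/prob)
    examples = []
    for x, action, reward, prob in logged_data:
        examples.append((x, action, reward / prob))
    return examples

# Illustrative use with any classifier that accepts per-example weights, e.g.:
#   X, y, w = zip(*to_importance_weighted_multiclass(log))
#   sklearn.linear_model.LogisticRegression().fit(np.array(X), y, sample_weight=w)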
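Progressive validation (item 13), sketched for a generic online learner with predict/update methods (an assumed interface): each streaming example is scored before it is trained on, so the running average behaves like a held-out estimate.

def progressive_validation(learner, stream):
    # Score each streaming example before training on it; the average of
    # these pre-update losses is the progressive-validation estimate.
    loss_sum, n = 0.0, 0
    for x, y in stream:
        y_hat = learner.predict(x)     # prediction made before seeing y
        loss_sum += (y_hat - y) ** 2   # squared loss, an illustrative choice
        learner.update(x, y)           # only now learn from this example
        n += 1
    return loss_sum / max(n, 1)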
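Rejection-sampling style offline evaluation of a full exploration algorithm (item 14), sketched under the simplifying assumption that logged actions were chosen uniformly at random; algorithm is a hypothetical object with choose/update methods.

def rejection_sampling_eval(algorithm, logged_data):
    # Replay logged rounds: accept a round only when the algorithm picks the
    # same action that was logged, so accepted rounds look like live interaction
    # and the algorithm's own explore/learn loop is exercised end to end.
    total, accepted = 0.0, 0
    for x, logged_action, reward, prob in logged_data:
        action = algorithm.choose(x)
        if action == logged_action:
            algorithm.update(x, action, reward)
            total += reward
            accepted += 1
    return (total / accepted if accepted else 0.0), accepted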
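Finally, a numeric illustration of item 22 under assumed numbers: the same click problem encoded so nonzero rewards are rare (click=1, no-click=0) versus frequent (click=0, no-click=-1). The two encodings differ only by a constant shift in the estimated value, but the per-round IPS terms spread far less under the first; uniform-random logging is assumed, so the match rate equals the propensity.

import numpy as np

def reward_encoding_spread(p_click=0.05, propensity=0.01, n=100_000, seed=0):
    # Compare the spread of per-round IPS terms under two reward encodings.
    rng = np.random.default_rng(seed)
    match = rng.random(n) < propensity   # target policy matched the logged action
    click = rng.random(n) < p_click
    rare_nonzero = np.where(match, click / propensity, 0.0)           # click=1, no-click=0
    freq_nonzero = np.where(match, (click - 1.0) / propensity, 0.0)   # click=0, no-click=-1
    return rare_nonzero.std(), freq_nonzero.std()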
Knowledge Vault built by David Vivancos 2024