graph LR
classDef tasks fill:#f9d4d4,font-weight:bold,font-size:14px;
classDef models fill:#d4f9d4,font-weight:bold,font-size:14px;
classDef testing fill:#d4d4f9,font-weight:bold,font-size:14px;
classDef future fill:#f9f9d4,font-weight:bold,font-size:14px;
classDef community fill:#f9d4f9,font-weight:bold,font-size:14px;
A[Antoine Bordes<br/>ICLR 2015] --> B[AI explores artificial<br/>tasks for intelligence 1]
B --> C[Artificial tasks control<br/>complexity and reasoning 5]
C --> D[Toy problems historically<br/>advanced machine learning 6]
D --> E[Well-known AI examples:<br/>Winograd, Hinton 7]
B --> F[Facebook: artificial games<br/>generate stories, questions 9]
F --> G[Simulations test models'<br/>world state tracking 10]
F --> H[20 tasks developed<br/>to test skills 11]
H --> I[Task complexity increased<br/>with supporting facts 13]
H --> J[Tasks: order, yes/no,<br/>position, tracking, counting 14]
A --> K[AI breakthroughs rely<br/>on deep models 2]
K --> L[Examples show models'<br/>semantic limitations 3]
K --> M[More data can't solve<br/>AI generalization 4]
A --> N[Models: LSTMs, memory<br/>networks, SVMs 15]
A --> O[Key question: artificial task<br/>transfer capability 16]
O --> P[Artificial tasks build<br/>understanding before scaling 17]
O --> Q[Goal: improve general systems,<br/>not overfitting 18]
A --> R[Simulations tune<br/>linguistic complexity 19]
R --> S[Avoid linguistic complexity<br/>becoming a confound 20]
A --> T[Design tasks that<br/>break models, repeat 21]
T --> U[Environments enable control,<br/>gaming drives RL 22]
A --> V[Controlled training and<br/>testing drive advances 23]
A --> W[Tasks are a starting<br/>proposal, feedback welcome 24]
W --> X[Debates on whether tasks<br/>test claimed aspects 25]
W --> Y[Post-hoc analysis of<br/>model learning crucial 26]
A --> Z[Community openness to<br/>task improvement feedback 27]
A --> AA[Test generalization with<br/>different data distributions 28]
A --> AB[Annotators can incorporate<br/>realistic language 29]
A --> AC[Task-model virtuous cycle<br/>key to progress 30]
class B,C,D,E,F,G,H,I,J tasks;
class K,L,M,N models;
class O,P,Q,R,S,T,U,V,W,X,Y testing;
class AA,AB,AC future;
class Z community;


1.-Facebook AI is exploring the use of artificial tasks to build intelligent machines and to assess the reasoning capabilities of models.

2.-Recent AI breakthroughs rely on high-capacity deep models and large amounts of labeled data, but these models' reasoning abilities may still be limited.

3.-Examples show that deep models do not truly understand semantics: they can be fooled, fail on unusual images, and make translation errors.

4.-More data alone is unlikely to solve AI; models may still fail to generalize to the full range of real-world situations.

5.-Artificial tasks make it possible to control complexity, the kind of reasoning required, and the interpretability of results, so that model capabilities and limitations can be analyzed.

6.-Historically, artificial toy problems have driven advances in machine learning, e.g. in clustering and neural networks, and through datasets such as the UCI collection.

7.-Well-known AI examples include Winograd's 1970s blocks world for question answering and Hinton's 1986 family-tree reasoning task; both were useful but limited.

8.-New evaluation platforms, such as Allen AI's project, are being developed to assess intelligent systems, but they do not give full control over training.

9.-Facebook's approach uses artificial game environments to generate stories, questions, and answers, and controls task difficulty to probe model reasoning.

10.-The simulation outputs sentences describing actions; questions test whether models track the world state, and answers are restricted to single words (or short lists) for easy evaluation.
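A minimal sketch of such a simulation, under stated assumptions: the names, locations, and functions below (`simulate_story`, `ask_where`) are hypothetical and not the actual Facebook generator. The simulator keeps a hidden world state (who is where), emits one sentence per action, and reads the answer to a "Where is X?" question directly from that state.

```python
import random

# Hypothetical vocabulary, chosen only for illustration.
PEOPLE = ["Mary", "John", "Sandra", "Daniel"]
LOCATIONS = ["kitchen", "garden", "bathroom", "hallway"]

def simulate_story(num_moves, seed=0):
    """Simulate agents moving between locations; return (sentences, world_state)."""
    rng = random.Random(seed)
    world = {}       # person -> current location: the hidden world state
    sentences = []
    for _ in range(num_moves):
        person = rng.choice(PEOPLE)
        place = rng.choice(LOCATIONS)
        world[person] = place                                 # update the state
        sentences.append(f"{person} went to the {place}.")    # describe the action
    return sentences, world

def ask_where(world, seed=0):
    """Pick a person with a known location; return (question, one-word answer)."""
    rng = random.Random(seed)
    person = rng.choice(sorted(world))
    return f"Where is {person}?", world[person]

story, world = simulate_story(num_moves=5, seed=42)
question, answer = ask_where(world, seed=42)
print("\n".join(story))
print(question, answer)
```

A model only sees `story` and `question`; answering correctly requires tracking the most recent move of the person asked about, which is exactly the world-state tracking the task is meant to probe.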

11.-Twenty tasks have been developed so far, each testing a different skill; the goal is a single model that solves them all, not a separate specialized model per task.

12.-Supporting facts are provided with the answers, so one can check whether models leverage the relevant information or merely pattern-match.
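To make the role of supporting facts concrete, here is a hypothetical sketch (the function names and the 1-based fact ids are assumptions, not the original format). For a "Where is X?" question the supporting fact is the most recent sentence that moved X, and a model's cited facts can be compared against it.

```python
def answer_with_support(story, person):
    """Return (answer, supporting_fact_ids) for 'Where is <person>?'.

    Each story line is a (person, location) event. The supporting fact is the
    most recent line that moved this person; ids are 1-based (an assumption).
    """
    for idx in range(len(story) - 1, -1, -1):
        who, where = story[idx]
        if who == person:
            return where, [idx + 1]
    raise ValueError(f"{person} never appears in the story")

def support_matches(predicted_ids, gold_ids):
    """True only if the model cited exactly the gold supporting facts."""
    return set(predicted_ids) == set(gold_ids)

story = [("Mary", "kitchen"), ("John", "garden"), ("Mary", "hallway")]
answer, support = answer_with_support(story, "Mary")
print(answer, support)             # hallway [3]
# Citing fact 1 means relying on a stale fact about Mary.
print(support_matches([1], support))  # False
```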

13.-Task complexity can be increased, e.g. by raising the number of supporting facts needed to answer a question from one to two to three.
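Moving from one to two supporting facts can be sketched with a hypothetical object-location question: to answer "Where is the football?" a model must combine who picked the object up with where that person last went. The event encoding and `where_is_object` are illustrative assumptions, and objects are assumed never to be dropped or handed over.

```python
def where_is_object(events, obj):
    """Answer 'Where is <obj>?' and return the two supporting fact ids (1-based).

    events: tuples ("move", person, place) or ("get", person, obj).
    Assumes the object is never dropped or passed on after being picked up.
    """
    holder, get_id = None, None
    for i, (kind, who, arg) in enumerate(events, start=1):
        if kind == "get" and arg == obj:
            holder, get_id = who, i            # fact 1: who has the object
    if holder is None:
        raise ValueError(f"nobody picked up {obj}")
    for i in range(len(events), 0, -1):
        kind, who, arg = events[i - 1]
        if kind == "move" and who == holder:
            return arg, sorted([get_id, i])    # fact 2: where the holder is
    raise ValueError(f"{holder}'s location is unknown")

events = [
    ("move", "John", "garden"),
    ("get", "John", "football"),
    ("move", "Mary", "kitchen"),
    ("move", "John", "office"),
]
print(where_is_object(events, "football"))  # ('office', [2, 4])
```

A three-fact variant would chain one more dependency, for example asking where the object was before its current location.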

14.-Other tasks test sensitivity to word order, yes/no questions, positional reasoning, object tracking, counting, comparisons, and the use of external knowledge.
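Two of these task types can be sketched with the same event-driven approach. The helpers below are hypothetical: counting tracks the set of objects a person holds across "get" and "drop" events, and the yes/no check compares a person's latest location with the one asked about. Treating an unknown location as "no" is a simplifying assumption of this sketch.

```python
def count_carried(events, person):
    """Counting task: how many objects is <person> holding after all events?"""
    carried = set()
    for kind, who, obj in events:
        if who != person:
            continue
        if kind == "get":
            carried.add(obj)
        elif kind == "drop":
            carried.discard(obj)
    return len(carried)

def is_in(moves, person, place):
    """Yes/no task: is <person> currently in <place>? Unknown counts as 'no' here."""
    location = None
    for who, where in moves:
        if who == person:
            location = where
    return "yes" if location == place else "no"

events = [("get", "Daniel", "apple"), ("get", "Daniel", "milk"), ("drop", "Daniel", "apple")]
print(count_carried(events, "Daniel"))  # 1
moves = [("John", "kitchen"), ("John", "garden")]
print(is_in(moves, "John", "kitchen"))  # no
```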

15.-Models tested include weakly supervised LSTMs, memory networks with strong supervision, and feature-rich SVMs; results are summarized in a performance dashboard.
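A performance dashboard of this kind can be sketched as per-task accuracy with a pass/fail mark. This is a hypothetical sketch: the 0.95 pass threshold is an assumption, and the model name and predictions below are made up purely to exercise the code, not real results.

```python
def accuracy(predictions, answers):
    """Fraction of questions answered exactly right."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need equal-length, non-empty prediction and answer lists")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def dashboard(results, pass_threshold=0.95):
    """Map {model: {task_id: (preds, gold)}} to {model: {task_id: (acc, passed)}}."""
    table = {}
    for model, per_task in results.items():
        row = {}
        for task_id, (preds, gold) in sorted(per_task.items()):
            acc = accuracy(preds, gold)
            row[task_id] = (acc, acc >= pass_threshold)
        table[model] = row
    return table

def tasks_passed(row):
    """Number of tasks in one model's row that clear the threshold."""
    return sum(1 for _, passed in row.values() if passed)

# Toy, made-up predictions; these are not real results for any model.
results = {
    "toy_model": {
        1: (["kitchen", "garden"], ["kitchen", "garden"]),
        2: (["office", "no"], ["office", "yes"]),
    }
}
table = dashboard(results)
for model, row in table.items():
    cells = "  ".join(f"T{t}:{acc:.0%}{'*' if ok else ''}" for t, (acc, ok) in row.items())
    print(f"{model:<12}{cells}  passed={tasks_passed(row)}/{len(row)}")
```

Reporting per-task results, rather than one averaged score, fits the goal of a single model that must pass every task.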

16.-Key question: how well can capabilities trained on artificial tasks transfer to real-world language understanding problems?

17.-Artificial tasks are important for building an understanding of methods, even if they are not perfectly realistic; they are a prerequisite before scaling up.

18.-Models should not be tailored to these specific tasks; the goal is to incrementally improve general systems that can handle new tasks.

19.-Simulations provide "knobs" for tuning linguistic complexity: removing language, testing memory requirements, adding coreference, and varying time expressions.

20.-Care is needed so that linguistic complexity does not become a confound; symbolic versions of the tasks help factor it out.
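Two of these knobs can be sketched as rendering options over the same underlying events. The `render` function and the pronoun table are hypothetical: `symbolic=True` strips natural language entirely, so a failure cannot be blamed on surface form, while `coreference=True` replaces a subject repeated from the previous sentence with a pronoun the model must resolve.

```python
# Hypothetical pronoun table; names not listed fall back to "They".
PRONOUNS = {"Mary": "She", "Sandra": "She", "John": "He", "Daniel": "He"}

def render(events, symbolic=False, coreference=False):
    """Render (person, place) move events symbolically or as English sentences."""
    lines, previous = [], None
    for person, place in events:
        if symbolic:
            # No natural language at all: isolates reasoning from linguistics.
            lines.append(f"move({person.lower()}, {place})")
        else:
            subject = person
            if coreference and person == previous:
                # Repeated subject becomes a pronoun that must be resolved.
                subject = PRONOUNS.get(person, "They")
            lines.append(f"{subject} went to the {place}.")
        previous = person
    return lines

events = [("Mary", "kitchen"), ("Mary", "garden"), ("John", "office")]
print(render(events, symbolic=True))
print(render(events, coreference=True))
```

Because both renderings come from identical events, comparing a model's accuracy across them separates reasoning errors from language-handling errors.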

21.-The envisioned virtuous cycle: design tasks that break models, improve the models until they solve those tasks, and repeat. This requires careful experimental design.

22.-Artificial environments enable this controlled development, though other approaches are possible; gaming environments, for example, have driven progress in reinforcement learning (RL).

23.-Powerful models and big data are valuable, but more controlled training and testing is also needed to make fundamental advances.

24.-The particular tasks shown are a starting proposal, and others may design better versions; feedback and discussion are welcome.

25.-There are open debates about whether the current tasks test what they claim to, and whether models are learning the right things.

26.-Post-hoc analysis of what models have learned is important; the simplistic tasks so far are only a foundation to build on.

27.-It is crucial that the community stay open to feedback on improving the tasks, so that they better serve these ambitious goals.

28.-A key future direction is testing generalization by training and testing on different distributions over the simulated data.
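One way to create such a train/test distribution shift is sketched below under stated assumptions (the vocabulary, `split_pairs`, and `sample_story` are hypothetical). Every (person, location) pair is assigned to either training or testing, and test stories are also longer, so a model cannot succeed by memorizing co-occurrences or story lengths seen in training.

```python
import itertools
import random

# Hypothetical vocabulary, chosen only for illustration.
PEOPLE = ["Mary", "John", "Sandra", "Daniel"]
LOCATIONS = ["kitchen", "garden", "bathroom", "hallway"]

def split_pairs(seed=0, held_out_fraction=0.25):
    """Partition every (person, location) pair into disjoint train and test sets."""
    pairs = list(itertools.product(PEOPLE, LOCATIONS))
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * held_out_fraction)
    return pairs[n_test:], pairs[:n_test]

def sample_story(allowed_pairs, num_moves, rng):
    """Sample '<person> went to the <place>' events using only the allowed pairs."""
    return [rng.choice(allowed_pairs) for _ in range(num_moves)]

train_pairs, test_pairs = split_pairs(seed=1)
rng = random.Random(7)
# Two shifts at once: person/location pairs unseen in training, and longer stories.
train_story = sample_story(train_pairs, num_moves=4, rng=rng)
test_story = sample_story(test_pairs, num_moves=10, rng=rng)
```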

29.-More realistic language can be incorporated by having annotators interpret the symbolic simulation outputs as natural English.

30.-The paradigm of a virtuous cycle between tasks that break models and models that solve tasks is key to driving progress.

