The End Of Knowledge - Vault 2 - ICLR (2014-2023)

Knowledge Vault 2/59 - ICLR 2014-2023

Yikang Shen · Shawn Tan · Alessandro Sordoni · Aaron Courville, ICLR 2019 - Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Concept Graph & Summary using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

graph LR
classDef on fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef results fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef tech fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef benefits fill:#f9f9d4, font-weight:bold, font-size:14px;
A[Yikang Shen et al ICLR 2019] --> B[ON: inductive bias for hierarchical RNNs 1]
A --> C[ON-LSTM: strong language modeling, parsing results 4]
B --> D[ON: high-ranking neurons update less frequently 2]
B --> E[cumax activation enables ON inductive bias 3]
C --> F[ON-LSTM induces meaningful tree structures 5]
B --> G[ON: neurons allocated to long-short-term info 6]
G --> H[ON-LSTM generalizes better to longer sequences 7]
B --> I[ON induces parse-trees, hierarchical patterns 8]
E --> J[cumax: soft, differentiable version of mask 9]
B --> K[ON: sequential and hierarchical representations 10]
class A,B,D,E,G,I,K on;
class C,F results;
class H,J tech;
class K benefits;

Summary:

1.-The paper proposes Ordered Neurons (ON), an inductive bias that lets recurrent neural networks model hierarchical structure in sequential data.

2.-ON imposes an order on the update frequency of neurons: high-ranking neurons are updated less frequently, so they retain long-term information.

3.-The paper introduces the cumax activation function, a cumulative sum over a softmax, which enables the ON inductive bias by controlling how much each neuron is updated.

4.-ON-LSTM, an LSTM variant implementing the ON idea, achieves strong results on language modeling, unsupervised parsing, syntactic evaluation and logical inference.

5.-Results suggest ON-LSTM induces linguistically meaningful tree structures from raw text data, capturing syntax better than previous unsupervised approaches.

6.-ON enables RNNs to separately allocate hidden neurons to short-term and long-term information, improving performance on tasks requiring long-distance dependencies.

7.-Experiments show that ON-LSTM generalizes better to longer sequences than standard LSTMs, a gain attributed to the hierarchical separation of long-term and short-term information.

8.-The inductive bias of ON allows RNNs to implicitly induce parse-tree-like structures and model non-sequential hierarchical patterns in sequences.

9.-The cumax activation can be seen as a soft, differentiable version of a binary mask that controls the update frequency of chunks of neurons.

10.-ON provides a novel way for RNNs to learn both sequential and hierarchical representations, combining the strengths of RNNs and tree-structured models.
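
Points 3 and 9 can be illustrated with a minimal NumPy sketch. It assumes cumax is the cumulative sum of a softmax and checks that the result equals the expected value of a hard 0/1 mask whose split point is drawn from that softmax. The helper names (`softmax`, `cumax`, `binary_mask`), the example logits, and the final gating step are hypothetical choices for illustration, not code from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cumax(x):
    """Cumulative sum of a softmax: non-decreasing values in [0, 1]."""
    return np.cumsum(softmax(x))

def binary_mask(split, size):
    """Hard mask: 0 before index `split`, 1 from `split` onward."""
    return (np.arange(size) >= split).astype(float)

# Hypothetical pre-activations for a 4-neuron layer (illustration only).
logits = np.array([0.5, 2.0, -1.0, 0.0])

p = softmax(logits)   # probability of each split point
soft = cumax(logits)  # soft, differentiable mask

# Expected hard mask when the split point is distributed according to p.
expected = sum(p[d] * binary_mask(d, logits.size) for d in range(logits.size))

# Assumption for illustration: using the soft mask as a forget-style gate
# keeps later (higher-ranking) neurons nearly intact and attenuates earlier ones.
cell = np.ones(logits.size)
kept = soft * cell
```

Because the soft mask never decreases along the neuron index, neurons later in the order are always retained at least as strongly as earlier ones, which is one way to read the ordering described in point 2.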

Knowledge Vault built by David Vivancos 2024