Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-Learning rules adjust synaptic weights based on local variables available to each synapse in a physical neural system.
2.-A systematic framework for studying local learning rules defines the local variables and the functional form combining them (the polynomial form is sketched after this list).
3.-Polynomial local learning rules are analyzed in linear and non-linear networks to understand their behavior and capabilities.
4.-The framework enables discovery of new learning rules and reveals connections between learning rules and group symmetries.
5.-Deep local learning, i.e. stacking local rules in a feedforward network, can learn representations but not complex input-output functions.
6.-Learning complex input-output functions requires local deep learning where target information is propagated to deep layers.
7.-How target information is propagated to deep layers partitions the space of possible learning algorithms.
8.-The capacity of a learning algorithm's feedback channel is defined as the number of bits of gradient information delivered per weight divided by the number of operations spent per weight (restated after this list).
9.-Calculations show backpropagation outperforms alternatives, achieving the maximum possible feedback channel capacity.
10.-The theory clarifies the concept of Hebbian learning, what it can learn, and why so few learning rules have been discovered so far.
11.-Hebbian learning should be replaced with a clear definition of local variables and the functional form combining them.
12.-In linear networks, the expected weight change depends only on the first and second moments of the data (a worked single-unit example follows this list).
13.-When the learning recurrence is linear in the weights, it can be solved exactly in linear networks.
14.-In non-linear networks, expectations of activity-dependent terms can be estimated using a dropout approximation and Taylor expansions (illustrated after this list).
15.-Many local rules lead to divergent weights in linear networks, with some exceptions like gradient descent on a convex objective.
16.-Local learning in a single linear threshold unit is limited to learning linearly separable functions.
17.-In deep feedforward networks, deep local learning cannot produce weights that are critical points of the error function.
18.-For deep networks to learn complex functions, target information must be fed back to influence the deep weights.
19.-In an optimal system, deep weights must depend on both the inputs and targets/outputs of the system.
20.-Physical implementations of optimal deep learning require a feedback channel to send target information to deep weights.
21.-Feedback to deep weights can either use forward connections in reverse or a separate set of backward connections.
22.-Feedback channel capacity calculations show backpropagation is optimal, achieving the highest possible capacity.
23.-An open question is whether biological neural systems have discovered some form of stochastic gradient descent during evolution.
24.-The simple Hebb rule is the only isometry-invariant learning rule for Hopfield networks.
25.-The gradient descent learning rule is the same for binary units with logistic or tanh activation functions.
26.-Many new convergent learning rules can be derived by adding decay terms to Hebb rules or by bounding the weights (see the code sketch after this list).
27.-Sampling-based deep targets algorithms can train non-differentiable networks reasonably well.
28.-These algorithms sample activations to generate targets that optimize a layer while holding the rest of the network fixed (see the sketch after this list).
29.-Sampling multiple perturbations provides more gradient information at additional computational cost.
30.-Backpropagation is optimal in terms of bits transmitted and improvement in the error function per operation.
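
The framework in points 2 and 3 can be written compactly as below. This is a minimal sketch with assumed notation (pre-synaptic activity O_j, post-synaptic activity O_i, weight w_ij, optional target T_i, learning rate eta), not a quotation from the talk.

```latex
\begin{align*}
  \Delta w_{ij} &= \eta \, F\bigl(O_i,\, O_j,\, w_{ij},\, T_i\bigr)
      && \text{local rule: } F \text{ sees only synapse-local variables} \\
  \Delta w_{ij} &= \eta \, O_i O_j
      && \text{degree-2 polynomial: the simple Hebb rule} \\
  \Delta w_{ij} &= \eta \, (T_i - O_i)\, O_j
      && \text{degree-2 polynomial that uses the target (gradient-style rule)}
\end{align*}
```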
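
Points 8 and 9 define a figure of merit for the feedback channel. A hedged restatement, with assumed symbol names:

```latex
% I = bits of information about the gradient delivered to each deep weight,
% N = number of operations spent per weight to deliver them.
\begin{equation*}
  C \;=\; \frac{I}{N}
  \qquad
  \left[\frac{\text{bits of gradient information per weight}}{\text{operations per weight}}\right]
\end{equation*}
% Points 9, 22 and 30 state that, among the algorithms compared, backpropagation
% attains the largest achievable value of C.
```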
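
Points 12, 13 and 15 summarize the linear-network analysis. A worked single-unit case (my choice of illustration, not a quotation): a linear unit O = w^T x trained with the simple Hebb rule.

```latex
\begin{align*}
  \Delta w &= \eta \, O \, x \;=\; \eta \, (w^{\top} x)\, x \\
  \mathbb{E}[\Delta w] &= \eta \, \Sigma \, w,
      \qquad \Sigma = \mathbb{E}[x x^{\top}]
      && \text{only moments of the data enter the expectation} \\
  w(t+1) &= (I + \eta \Sigma)\, w(t)
      \;\Longrightarrow\; w(t) = (I + \eta \Sigma)^{t}\, w(0)
      && \text{linear recurrence, solvable exactly}
\end{align*}
% Because \Sigma is positive semi-definite, the weights grow without bound along
% eigendirections with positive eigenvalues, illustrating the divergence in point 15.
```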
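
Point 14 names two estimation tools without detail. The display below is one standard way to write them and is an assumption about what was meant, not a quotation; S is a unit's pre-activation sum and f its activation function.

```latex
\begin{align*}
  \mathbb{E}[f(S)] &\approx f\bigl(\mathbb{E}[S]\bigr)
      && \text{dropout-style approximation: move the expectation inside } f \\
  \mathbb{E}[f(S)] &\approx f\bigl(\mathbb{E}[S]\bigr)
      + \tfrac{1}{2}\, f''\bigl(\mathbb{E}[S]\bigr)\, \mathrm{Var}(S)
      && \text{second-order Taylor expansion around } \mathbb{E}[S]
\end{align*}
```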
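
Points 24 and 26 mention the simple Hebb rule for Hopfield networks and convergent variants obtained by adding decay terms or bounding weights. The Python sketch below is a minimal illustration under those assumptions; the Oja-style decay term is one possible choice of decay, not necessarily the one used in the talk.

```python
import numpy as np

def hopfield_hebb(patterns):
    """Simple Hebb rule for a Hopfield network (point 24): sum of outer
    products of the stored +/-1 patterns, with self-connections zeroed."""
    patterns = np.asarray(patterns, dtype=float)
    W = patterns.T @ patterns / patterns.shape[0]
    np.fill_diagonal(W, 0.0)                      # no self-coupling
    return W

def hebb_with_decay(x, w, eta=0.01):
    """One update of a Hebb rule with an Oja-style decay term (point 26):
    the -y**2 * w term keeps the weight vector bounded (illustrative choice)."""
    y = w @ x                                     # post-synaptic activity
    return w + eta * (y * x - (y ** 2) * w)

# Tiny usage with random data (hypothetical values).
rng = np.random.default_rng(0)
pats = rng.choice([-1.0, 1.0], size=(3, 8))       # three 8-unit patterns
W = hopfield_hebb(pats)

w = rng.normal(size=5)
for _ in range(2000):
    w = hebb_with_decay(rng.normal(size=5), w)
print(np.linalg.norm(w))                          # stays bounded, close to 1
```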
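
Points 27 and 28 describe sampling-based deep-targets training. The sketch below is a minimal Python illustration under assumed details (random bit-flip perturbations of one hidden layer's binary activations, with the upper part of the network held fixed); it is not the exact algorithm from the talk.

```python
import numpy as np

def sample_deep_target(h, upper_forward, loss, n_samples=32, flip_prob=0.1, seed=0):
    """Sample perturbed candidate activations for one hidden layer and keep the
    one with the lowest downstream loss, computed through the fixed upper layers
    (points 27-28). The winner becomes the layer-local training target."""
    rng = np.random.default_rng(seed)
    best_h, best_loss = h, loss(upper_forward(h))
    for _ in range(n_samples):
        flips = rng.random(h.shape) < flip_prob   # flip a few binary units
        cand = np.where(flips, 1.0 - h, h)
        cand_loss = loss(upper_forward(cand))
        if cand_loss < best_loss:
            best_h, best_loss = cand, cand_loss
    return best_h

# Hypothetical smoke test: a fixed random readout plays the role of the upper layers.
rng = np.random.default_rng(1)
V = rng.normal(size=(4, 6))                       # fixed upper-layer weights
y_true = rng.normal(size=4)
upper = lambda h: V @ h
mse = lambda o: float(np.mean((o - y_true) ** 2))
h0 = (rng.random(6) > 0.5).astype(float)
h_star = sample_deep_target(h0, upper, mse)       # target for the layer below
```

The layer below would then be trained toward h_star with a local rule; sampling more perturbations (point 29) trades extra computation for more gradient-like information.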
Knowledge Vault built by David Vivancos 2024