Concept Graph & Resume using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:
Resume:
1.- Generalization on the Unseen (GOTU): A strong case of out-of-distribution generalization in which part of the input domain is entirely unseen during training, yet the model is evaluated there (sketch after the list).
2.- Boolean functions: Functions mapping binary inputs to real outputs, representing discrete and combinatorial tasks like arithmetic or logic.
3.- Min-degree interpolator: An interpolating function with the minimal degree-profile, favoring lower-degree monomials in its Fourier-Walsh expansion.
4.- Degree-profile: A vector representing the energy distribution across different degrees in a function's Fourier-Walsh expansion.
5.- Random features model: A two-layer approximation in which a frozen random projection is followed by a nonlinear activation, and only the linear readout is trained (sketch after the list).
6.- Diagonal linear neural network: A deep neural network with only diagonal weight matrices and a single bias term.
7.- Transformer: A neural network architecture using self-attention mechanisms, commonly used in natural language processing and computer vision.
8.- Mean-field neural network: A two-layer neural network in the mean-field parametrization, analyzed in the infinite-width limit.
9.- Fourier-Walsh transform: The decomposition of a Boolean function into a weighted sum of monomials (products of input variables); the squared coefficients, grouped by degree, give the degree-profile (sketch after the list).
10.- Implicit bias: The tendency of learning algorithms to favor certain solutions over others, even without explicit regularization.
11.- Length generalization: The ability of models to generalize to input lengths beyond what was seen during training.
12.- Curriculum learning: A training strategy that gradually increases the complexity of training samples.
13.- Degree-Curriculum algorithm: A curriculum learning approach that incrementally increases the Hamming weight of the training samples (sketch after the list).
14.- Leaky min-degree bias: When models learn solutions that mostly follow the min-degree bias but retain some higher-degree terms.
15.- Vanishing ideals: For a given set of points, the set of all polynomials that vanish on those points; used to characterize the unseen part of the domain.
16.- Strongly expressive activation: A property of activation functions that allows for effective representation of low-degree monomials.
17.- Boolean influence: A measure of how much a variable matters to a Boolean function; for a Boolean-valued function it equals the probability that flipping that variable changes the output (sketch after the list).
18.- Spectral bias: The tendency of neural networks to learn lower-frequency components faster in continuous settings.
19.- Parity function: A Boolean function that outputs the product of its ±1-valued input bits (equivalently, the XOR of the bits in 0/1 encoding).
20.- Majority function: A Boolean function that outputs 1 if more than half of its inputs are 1, and 0 otherwise.
21.- Hamming weight: The number of non-zero elements in a binary vector.
22.- Stochastic gradient descent (SGD): An optimization algorithm that updates parameters using estimated gradients from random subsets of data.
23.- Adam optimizer: An adaptive learning rate optimization algorithm commonly used in deep learning.
24.- Interpolating solution: A function that exactly matches the training data.
25.- Out-of-distribution generalization: The ability of models to perform well on data from a different distribution than the training data.
26.- Invariance: When a function's output remains unchanged under certain transformations of its input.
27.- Equivariance: When a function's output transforms predictably under certain transformations of its input.
28.- Sparse Boolean functions: Boolean functions that depend on only a small subset of their input variables.
29.- Neural tangent kernel (NTK): A kernel that describes the training dynamics of very wide neural networks in the lazy (linearized) regime.
30.- Polynomial activation functions: Activation functions in neural networks that are polynomial functions of their input.
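The sketches below illustrate a few of the concepts above. They are minimal Python examples written for this summary (using only itertools and NumPy), not code from the underlying paper, and all function and variable names are chosen here for convenience.

Sketch for concepts 1 and 3 (GOTU and the min-degree interpolator): one coordinate is frozen to +1 at training time, so a degree-2 and a degree-3 interpolator agree on the seen domain, but only the min-degree one matches the target on the unseen half of the cube.

import itertools

n = 3
cube = list(itertools.product([-1, 1], repeat=n))
seen = [x for x in cube if x[2] == +1]     # training (seen) domain: x3 frozen to +1
unseen = [x for x in cube if x[2] == -1]   # evaluation (unseen) domain

target = lambda x: x[0] * x[1]             # ground truth: x1 * x2
min_deg = lambda x: x[0] * x[1]            # degree-2 interpolator (min-degree)
high_deg = lambda x: x[0] * x[1] * x[2]    # degree-3 interpolator (uses the frozen bit)

# Both interpolate the seen data exactly ...
assert all(min_deg(x) == target(x) == high_deg(x) for x in seen)
# ... but only the min-degree interpolator matches the target on the unseen domain.
print([min_deg(x) == target(x) for x in unseen])   # [True, True, True, True]
print([high_deg(x) == target(x) for x in unseen])  # [False, False, False, False]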
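Sketch for concept 5 (random features model): a frozen random projection followed by a ReLU, with only the linear readout fitted; least squares is used here instead of SGD for brevity.

import itertools
import numpy as np

rng = np.random.default_rng(0)
n, width = 6, 512
X = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)  # all 2^6 inputs
y = X[:, 0] * X[:, 1]                            # target: the degree-2 parity x1 * x2

W = rng.normal(size=(n, width)) / np.sqrt(n)     # random, frozen first layer
b = rng.normal(size=width)
features = np.maximum(X @ W + b, 0.0)            # ReLU random features

a, *_ = np.linalg.lstsq(features, y, rcond=None) # train only the linear readout
print(np.mean((features @ a - y) ** 2))          # training error close to zero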
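Sketch for concepts 4, 9, 19 and 20 (degree-profile, Fourier-Walsh transform, parity, majority): the coefficients are computed by brute-force averaging over the cube, and the degree-profile collects the squared coefficients by degree.

import itertools
import numpy as np

def fourier_walsh(f, n):
    """Return {S: f_hat(S)} with f_hat(S) = E_x[f(x) * prod_{i in S} x_i]."""
    cube = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            chi = [np.prod([x[i] for i in S]) for x in cube]   # monomial chi_S(x)
            coeffs[S] = np.mean([f(x) * c for x, c in zip(cube, chi)])
    return coeffs

def degree_profile(coeffs, n):
    """Energy (sum of squared coefficients) at each degree 0..n."""
    profile = np.zeros(n + 1)
    for S, c in coeffs.items():
        profile[len(S)] += c ** 2
    return profile

n = 3
parity = lambda x: np.prod(x)          # the degree-n monomial x1 * ... * xn
majority = lambda x: np.sign(sum(x))   # +1 iff more than half the bits are +1 (n odd)
print(degree_profile(fourier_walsh(parity, n), n))    # all energy at degree 3
print(degree_profile(fourier_walsh(majority, n), n))  # energy mostly at degree 1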
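Sketch in the spirit of concepts 13 and 21 (Degree-Curriculum and Hamming weight): training inputs are presented in order of increasing Hamming weight, here taken as the number of -1 coordinates of a ±1 vector (the exact convention is an assumption of this sketch).

import itertools

def hamming_weight(x):
    """Number of -1 coordinates of a +/-1 vector (the convention assumed here)."""
    return sum(1 for v in x if v == -1)

n = 4
cube = list(itertools.product([-1, 1], repeat=n))
curriculum = sorted(cube, key=hamming_weight)   # low-weight samples first

for x in curriculum[:5]:
    print(hamming_weight(x), x)
# A training loop would sweep over `curriculum` in stages, adding higher-weight
# samples only once the model fits the lower-weight ones.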
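Sketch for concept 17 (Boolean influence): for a Boolean-valued function, the influence of a coordinate equals the probability that flipping it changes the output, computed here by exact enumeration.

import itertools
import numpy as np

def influence(f, n, i):
    """Probability over a uniform x in {-1,+1}^n that flipping coordinate i changes f(x)."""
    cube = list(itertools.product([-1, 1], repeat=n))
    flips = 0
    for x in cube:
        y = list(x)
        y[i] = -y[i]                   # flip coordinate i
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / len(cube)

majority = lambda x: np.sign(sum(x))
print([influence(majority, 3, i) for i in range(3)])   # 0.5 for each coordinate of Maj3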
Knowledge Vault built by David Vivancos 2024