Knowledge Vault 2/6 - ICLR 2014-2023
Arto Klami; Guillaume Bouchard; Abhishek Tripathi ICLR 2014 - Group-sparse Embeddings in Collective Matrix Factorization
<Resume Image >

Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:

```mermaid
graph LR
  classDef factorization fill:#f9d4d4, font-weight:bold, font-size:14px;
  classDef data fill:#d4f9d4, font-weight:bold, font-size:14px;
  classDef methods fill:#d4d4f9, font-weight:bold, font-size:14px;
  classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px;

  A[Main] --> B[Data-check technology, Markov logic-based approaches 1]
  A --> C[Large biological datasets across research centers 2]
  A --> E[Collective matrix factorization focus, not Markov 4]
  A --> I[Examples: websites, items, users, databases 8]
  A --> R[Alternating optimization, non-Gaussian data 17]
  A --> W[Variational Bayesian learning: no tuning 22]
  C --> D[Answer questions: genes, diseases, environment 3]
  E --> F[Single relation matrix 5]
  E --> G[Multi-view learning: concatenated matrices 6]
  E --> H[Circular relationships represented 7]
  E --> L[Factorization: decomposition of symmetric missing data 11]
  E --> M[Schemas as low-rank symmetric matrix 12]
  E --> N[Fixed-rank matrices undesirable 13]
  E --> O[Group sparse embeddings: Gaussian priors 14]
  E --> Q[Compression: low-rank matrix representation 16]
  E --> V[Flexible, generic, augmented multi-view setup 21]
  I --> J[Matrix values: latent representations, bias 9]
  I --> AA[Compositional data: feature matrix concatenation 26]
  I --> AB[Missing relations as zeros impact orthogonality 27]
  I --> AC[Large symmetric matrices: non-existent relation blocks 28]
  O --> P[Bayesian relevance determination prunes dimensions 15]
  R --> K[Standard model: math recurrence, improvements 10]
  R --> S[Multi-view gene expression benefits from group-sparsity 18]
  R --> T[Face images: pixel proximity helps 19]
  R --> U[Simulations: binary data, variational Bayes advantages 20]
  W --> X[Ongoing work: querying, reasoning, convex Bayes 23]
  W --> Y[Privacy-preserving learning via shared embeddings 24]
  W --> Z[R package available 25]
  AC --> AD[Treat blocks as zeros/missing data 29]

  class E,F,G,H,L,M,N,O,P,Q,V factorization;
  class B,C,I,AA,AB,AC,AD data;
  class D,J,K,R,S,T,U,W methods;
  class Y,Z applications;
```

Resume:

1.-Working with Arto Klami on data-check technology, applying Markov logic-based approaches to relational databases.

2.-Dealing with large biological datasets distributed across research centers, with tables of genes, individuals, and gene expressions.

3.-Goal is to answer questions about links between genes and diseases, environmental impacts, etc. using a global model.

4.-Focus is on collective matrix factorization rather than Markov logic networks or tensor factorization.

5.-Basic matrix factorization works on a single relation matrix with row and column entity types.

6.-Multi-view learning concatenates several matrices, with m views having m+1 entity types.

7.-Collective matrix factorization allows "circular relationships" that can't be represented by simple matrix concatenation.

8.-Examples are websites linked to items sold to users, or general relational databases with arbitrary schemas.
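
To make points 5-8 concrete, here is a minimal sketch (hypothetical names and data, not taken from the paper or its R package) of how an arbitrary relational schema can be stored as a collection of views, each tagged with its row and column entity types; the third view closes a circular relationship that plain concatenation could not express:

```python
import numpy as np

# Hypothetical example schema: three entity types and three relation matrices
# ("views") linking them.
entity_sizes = {"user": 100, "item": 50, "website": 20}

rng = np.random.default_rng(0)
views = [
    # (row entity type, column entity type, data matrix; NaN would mark missing cells)
    ("user", "item", rng.normal(size=(100, 50))),
    ("website", "item", rng.normal(size=(20, 50))),
    ("website", "user", rng.normal(size=(20, 100))),  # closes the circle
]

for row_e, col_e, X in views:
    assert X.shape == (entity_sizes[row_e], entity_sizes[col_e])
```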

9.-Mathematically, matrix values are expressed using latent representations with a bias term. Various noise distributions are possible.
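
In symbols, a generic form of the model described in point 9 (the notation here is illustrative and may differ from the paper's):

```latex
% One view X^{(m)} couples row entity type r_m and column entity type c_m;
% the u's are shared latent vectors, the b's are row/column bias terms.
x^{(m)}_{ij} \approx \sum_{k=1}^{K} u^{(r_m)}_{ik}\, u^{(c_m)}_{jk}
  + b^{(m,\mathrm{row})}_{i} + b^{(m,\mathrm{col})}_{j} + \varepsilon^{(m)}_{ij},
\qquad \varepsilon^{(m)}_{ij} \sim \mathcal{N}\!\left(0, \tau_m^{-1}\right).
```

Swapping the Gaussian noise for another likelihood, e.g. Bernoulli through a link function, covers the "various noise distributions" mentioned above.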

10.-The standard model is typically learned with stochastic gradient-based optimization, but improvements are possible for large-scale applications.

11.-Collective matrix factorization can be interpreted as decomposition of a symmetric matrix with missing data.

12.-General database schemas can be represented as a low-rank symmetric matrix to be factorized.
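
A small numpy sketch of the reading in points 11-12 (illustrative only): stack the embeddings of all entity types into one matrix U, and view the whole database as a large symmetric matrix whose observed off-diagonal blocks are the relation matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
sizes = {"user": 100, "item": 50, "website": 20}

# One shared embedding matrix per entity type, stacked into a single U.
U = {e: rng.normal(size=(n, K)) for e, n in sizes.items()}
U_all = np.vstack([U[e] for e in sizes])          # shape (170, K)

# The full symmetric matrix Y ~ U_all @ U_all.T; only some off-diagonal
# blocks (the actual relation matrices) are observed, the rest is missing.
Y = U_all @ U_all.T

offsets, start = {}, 0
for e, n in sizes.items():
    offsets[e] = start
    start += n

def block(row_e, col_e):
    """Extract the block of Y corresponding to one relation matrix."""
    r, c = offsets[row_e], offsets[col_e]
    return Y[r:r + sizes[row_e], c:c + sizes[col_e]]

X_user_item = block("user", "item")               # approximates that view
```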

13.-However, fixing a single rank shared by all views is undesirable, since views may have differing complexities.

14.-Group sparse embeddings are introduced, with Gaussian priors grouped by entity type, allowing view-specific ranks.

15.-Bayesian automatic relevance determination prunes irrelevant dimensions, leading to exact zeros in the embeddings.
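
A rough sketch of the group-wise automatic relevance determination idea behind points 14-15: each entity type gets its own per-component precision, and components whose precision grows large are effectively pruned for that entity type. This is a generic ARD-style update, not necessarily the paper's exact variational formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8
U = {"user": rng.normal(size=(100, K)), "item": rng.normal(size=(50, K))}

a0, b0 = 1e-3, 1e-3   # vague Gamma hyperprior on the precisions
alpha = {}
for e, Ue in U.items():
    n = Ue.shape[0]
    # Per entity type e and component k: a large precision means a small prior
    # variance, so that component is driven towards zero for this entity type.
    alpha[e] = (a0 + 0.5 * n) / (b0 + 0.5 * (Ue ** 2).sum(axis=0))

# Components considered pruned for each entity type (precision above a threshold).
pruned = {e: np.where(a > 1e3)[0] for e, a in alpha.items()}
```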

16.-Enables compression by representing each matrix in its relevant low-rank form.

17.-Uses alternating closed-form optimization rather than SGD. Special handling for non-Gaussian data.
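
To illustrate what an alternating closed-form update looks like, here is a bare-bones regularized least-squares version for a single view with Gaussian noise (hypothetical code, not the paper's implementation; the actual method is a variational Bayesian analogue that handles many views and, for non-Gaussian data, typically relies on quadratic bounds):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K, lam = 100, 50, 5, 1.0
X = rng.normal(size=(n, m))
U, V = rng.normal(size=(n, K)), rng.normal(size=(m, K))

for _ in range(20):
    # With V fixed, each row of U has a closed-form ridge-regression solution;
    # the same holds for V with U fixed.
    U = X @ V @ np.linalg.inv(V.T @ V + lam * np.eye(K))
    V = X.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(K))

print(np.linalg.norm(X - U @ V.T))  # reconstruction error decreases over iterations
```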

18.-Experiment on multi-view gene expression data shows benefit of group-sparsity and collective factorization over alternatives.

19.-Face image experiment demonstrates utility of incorporating pixel proximity information when data is limited.

20.-Simulations confirm advantages of handling binary data properly and using variational Bayes over maximum a posteriori.

21.-Collective matrix factorization is flexible and generic. Many datasets have an augmented multi-view setup.

22.-Variational Bayesian learning works well with no tuning parameters, unlike optimization-based methods.

23.-Ongoing work includes querying the model, approximate reasoning, convex Bayesian approaches, and handling missing links.

24.-Exciting potential application to privacy-preserving learning by sharing embeddings rather than raw data.

25.-R package available to use the method.

26.-Compositional data like images can be handled by creating a feature matrix and concatenating it.
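
One way to read point 26 (illustrative sketch, hypothetical names): image-like data enters as an extra view that shares an entity type with an existing view, e.g. an images-by-pixels feature matrix next to an images-by-labels matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels, n_labels = 200, 32 * 32, 10

views = [
    ("image", "label", rng.integers(0, 2, size=(n_images, n_labels)).astype(float)),
    # Extra view: raw pixel features, sharing the "image" entity type, so the
    # image embeddings are informed by both relations.
    ("image", "pixel", rng.normal(size=(n_images, n_pixels))),
]
```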

27.-Open question about treating missing relations as zeros and the impact on embedding orthogonality.

28.-Large symmetric matrices representing the full database can have many blocks corresponding to non-existent relations.

29.-Presenter is interested in experimenting with the impact of treating these as zeros or missing data.
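
A small sketch of the two options discussed in points 27-29, using an explicit observation mask (hypothetical helper, not from the R package): a non-existent relation block either contributes observed zeros to the fit or is excluded from it entirely:

```python
import numpy as np

n_rows, n_cols = 20, 100          # e.g. a website-by-user block with no data
block = np.zeros((n_rows, n_cols))

# Option 1: treat the block as observed zeros (it then pushes the embeddings
# of the two entity types towards being orthogonal to each other).
mask_as_zeros = np.ones((n_rows, n_cols), dtype=bool)

# Option 2: treat the block as missing, so it contributes nothing to the fit.
mask_as_missing = np.zeros((n_rows, n_cols), dtype=bool)

def squared_loss(X, M, U, V):
    """Masked squared error: only entries with M[i, j] == True count."""
    R = (X - U @ V.T) ** 2
    return R[M].sum()

U0 = np.random.default_rng(1).normal(size=(n_rows, 3))
V0 = np.random.default_rng(2).normal(size=(n_cols, 3))
print(squared_loss(block, mask_as_zeros, U0, V0))    # penalizes non-zero predictions
print(squared_loss(block, mask_as_missing, U0, V0))  # always 0.0: block is ignored
```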

Knowledge Vault built by David Vivancos 2024