Arto Klami; Guillaume Bouchard; Abhishek Tripathi ICLR 2014 - Group-sparse Embeddings in Collective Matrix Factorization

**Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:**

graph LR
classDef factorization fill:#f9d4d4, font-weight:bold, font-size:14px;
classDef data fill:#d4f9d4, font-weight:bold, font-size:14px;
classDef methods fill:#d4d4f9, font-weight:bold, font-size:14px;
classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px;
A[Main] --> B[Data-check technology, Markov logic-based approaches 1]
A --> C[Large biological datasets across research centers 2]
A --> E[Collective matrix factorization focus, not Markov 4]
A --> I[Examples: websites, items, users, databases 8]
A --> R[Alternating optimization, non-Gaussian data 17]
A --> W[Variational Bayesian learning: no tuning 22]
C --> D[Answer questions: genes, diseases, environment 3]
E --> F[Single relation matrix 5]
E --> G[Multi-view learning: concatenated matrices 6]
E --> H[Circular relationships represented 7]
E --> L[Factorization: decomposition of symmetric missing data 11]
E --> M[Schemas as low-rank symmetric matrix 12]
E --> N[Fixed-rank matrices undesirable 13]
E --> O[Group sparse embeddings: Gaussian priors 14]
E --> Q[Compression: low-rank matrix representation 16]
E --> V[Flexible, generic, augmented multi-view setup 21]
I --> J[Matrix values: latent representations, bias 9]
I --> AA[Compositional data: feature matrix concatenation 26]
I --> AB[Missing relations as zeros impact orthogonality 27]
I --> AC[Large symmetric matrices: non-existent relation blocks 28]
O --> P[Bayesian relevance determination prunes dimensions 15]
R --> K[Standard model: math recurrence, improvements 10]
R --> S[Multi-view gene expression benefits from group-sparsity 18]
R --> T[Face images: pixel proximity helps 19]
R --> U[Simulations: binary data, variational Bayes advantages 20]
W --> X[Ongoing work: querying, reasoning, convex Bayes 23]
W --> Y[Privacy-preserving learning via shared embeddings 24]
W --> Z[R package available 25]
AC --> AD[Treat blocks as zeros/missing data 29]
class E,F,G,H,L,M,N,O,P,Q,V factorization;
class B,C,I,AA,AB,AC,AD data;
class D,J,K,R,S,T,U,W methods;
class Y,Z applications;


**Resume:**

**1.-**Working with Arto Klami on data-check technology, applying Markov logic-based approaches to relational databases.

**2.-**Dealing with large biological datasets distributed across research centers, with tables of genes, individuals, and gene expressions.

**3.-**Goal is to answer questions about links between genes and diseases, environmental impacts, etc. using a global model.

**4.-**Focus is on collective matrix factorization rather than Markov logic networks or tensor factorization.

**5.-**Collective matrix factorization works on a single relation matrix with row and column entity types.

**6.-**Multi-view learning concatenates several matrices, with m views having m+1 entity types.

**7.-**Collective matrix factorization allows "circular relationships" that can't be represented by simple matrix concatenation.

**8.-**Examples are websites linked to items sold to users, or general relational databases with arbitrary schemas.

**9.-**Mathematically, matrix values are expressed using latent representations with a bias term. Various noise distributions are possible.

**10.-**Standard model is learned using math recurrence, but improvements are possible for large-scale applications.

**11.-**Collective matrix factorization can be interpreted as the decomposition of a symmetric matrix with missing data.

**12.-**General database schemas can be represented as a low-rank symmetric matrix to be factorized.
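To illustrate points 11-12 with a hypothetical three-entity-set schema (our own small example): stacking all entity embeddings into one matrix U turns the whole database into a single symmetric low-rank factorization in which the unobserved blocks are simply missing data.

```latex
\[
  Y \;=\;
  \begin{pmatrix}
    \ast        & X^{(1)}     & \ast \\
    X^{(1)\top} & \ast        & X^{(2)} \\
    \ast        & X^{(2)\top} & \ast
  \end{pmatrix}
  \;\approx\; U U^{\top},
  \qquad
  U =
  \begin{pmatrix}
    U^{(1)} \\ U^{(2)} \\ U^{(3)}
  \end{pmatrix},
\]
% where the blocks marked with an asterisk are treated as missing.
```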

**13.-**However, fixing the rank of each view's matrix is undesirable. Views may have differing complexities.

**14.-**Group sparse embeddings are introduced, with Gaussian priors grouped by entity type, allowing view-specific ranks.

**15.-**Bayesian automatic relevance determination prunes irrelevant dimensions, leading to exact zeros in the embeddings.
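A sketch of the prior behind points 14-15 (hyperparameter names ours): the variance is shared by all entities of one set for each latent component, so automatic relevance determination can switch a component off for an entire entity set, giving each relation its own effective rank.

```latex
% Component k of entity set e shares one precision alpha_{e,k}:
\[
  u^{(e)}_{ik} \sim \mathcal{N}\!\big(0,\; \alpha_{e,k}^{-1}\big),
  \qquad
  \alpha_{e,k} \sim \mathrm{Gamma}(a_0, b_0).
\]
% When alpha_{e,k} grows large, the whole group {u_{ik}^{(e)} for all i}
% is pruned to (effectively) zero, so component k no longer contributes
% to any matrix involving entity set e.
```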

**16.-**Enables compression by representing each matrix in its relevant low-rank form.

**17.-**The method uses alternating closed-form optimization rather than SGD, with special handling for non-Gaussian data.
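For intuition on point 17, here is a minimal alternating least-squares sketch (Gaussian noise, fully observed matrices, no biases; the function name and data format are our own, and this is a MAP-style simplification rather than the talk's variational-Bayes updates):

```python
# Minimal alternating closed-form sketch of collective matrix
# factorization (MAP / ridge flavour, NOT the talk's variational Bayes).
import numpy as np

def cmf_als(relations, sizes, K=5, lam=0.1, n_iters=50, seed=0):
    """relations: list of (X, row_entity, col_entity) triples.
    sizes: dict mapping each entity set to its number of entities."""
    rng = np.random.default_rng(seed)
    U = {e: 0.1 * rng.standard_normal((n, K)) for e, n in sizes.items()}
    for _ in range(n_iters):
        for e in U:  # closed-form ridge update for one entity set
            A = lam * np.eye(K)           # normal-equation accumulator
            B = np.zeros((sizes[e], K))
            for X, r, c in relations:
                if r == e:                # e indexes the rows of X
                    A += U[c].T @ U[c]
                    B += X @ U[c]
                elif c == e:              # e indexes the columns of X
                    A += U[r].T @ U[r]
                    B += X.T @ U[r]
            U[e] = np.linalg.solve(A, B.T).T
    return U

# Hypothetical usage: a users-x-items matrix plus an items-x-tags matrix.
rng = np.random.default_rng(1)
ratings = rng.standard_normal((30, 20))
tags = rng.standard_normal((20, 10))
U = cmf_als([(ratings, "user", "item"), (tags, "item", "tag")],
            {"user": 30, "item": 20, "tag": 10})
```

Note how the "item" factors are updated against both matrices at once; that shared update is what makes the factorization collective.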

**18.-**Experiment on multi-view gene expression data shows benefit of group-sparsity and collective factorization over alternatives.

**19.-**Face image experiment demonstrates utility of incorporating pixel proximity information when data is limited.

**20.-**Simulations confirm advantages of handling binary data properly and using variational Bayes over maximum a posteriori.

**21.-**Collective matrix factorization is flexible and generic. Many datasets have an augmented multi-view setup.

**22.-**Variational Bayesian learning works well with no tuning parameters, unlike optimization-based methods.

**23.-**Ongoing work includes querying the model, approximate reasoning, convex Bayesian approaches, and handling missing links.

**24.-**Exciting potential application to privacy-preserving learning by sharing embeddings rather than raw data.

**25.-**An R package implementing the method is available.

**26.-**Compositional data like images can be handled by creating a feature matrix and concatenating it.

**27.-**Open question about treating missing relations as zeros and the impact on embedding orthogonality.

**28.-**Large symmetric matrices representing the full database can have many blocks corresponding to non-existent relations.

**29.-**Presenter is interested in experimenting with the impact of treating these as zeros or missing data.
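To make points 27-29 concrete, a small illustration (entirely ours, not from the talk) of the two treatments via a mask in a squared loss:

```python
# Illustration of points 27-29 (toy code): the same relation block can
# be treated as observed zeros or as missing data via a mask.
import numpy as np

def masked_sq_loss(X, U, V, mask):
    """mask[i, j] = 1 where X[i, j] is observed, 0 where it is missing."""
    R = (X - U @ V.T) * mask
    return float(np.sum(R ** 2))

# "Zeros": set X = 0 and mask = 1 on the non-existent block, which
# pushes the corresponding embeddings toward orthogonality.
# "Missing": set mask = 0 there, so the block exerts no force at all.
```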

Knowledge Vault built by David Vivancos 2024