Concept Graph & Resume using Claude 3 Opus | Chat GPT4 | Gemini Adv | Llama 3:
Resume:
1.-Working with Artur Klein on data-checking technology, applying Markov logic-based approaches to relational databases.
2.-Dealing with large biological datasets distributed across research centers, with tables of genes, individuals, and gene expressions.
3.-Goal is to answer questions about links between genes and diseases, environmental impacts, etc. using a global model.
4.-Focus is on collective matrix factorization rather than Markov logic networks or tensor factorization.
5.-The building block of collective matrix factorization is a single relation matrix whose rows and columns correspond to two entity types.
6.-Multi-view learning concatenates several such matrices along a shared entity type, so m views involve m+1 entity types.
7.-Collective matrix factorization allows "circular relationships" that can't be represented by simple matrix concatenation.
8.-Examples include websites linked to items sold to users, or general relational databases with arbitrary schemas.
9.-Mathematically, matrix values are expressed using latent representations with a bias term; various noise distributions are possible (formalized in the first sketch after this list).
10.-The standard model is learned with gradient-based optimization, but improvements are possible for large-scale applications.
11.-Collective matrix factorization can be interpreted as decomposition of a symmetric matrix with missing data.
12.-General database schemas can be represented as a low-rank symmetric matrix to be factorized.
13.-However, fixing a single common rank for all views is undesirable, since views may have differing complexities.
14.-Group-sparse embeddings are introduced, with Gaussian priors grouped by entity type, allowing view-specific ranks (see the prior sketch after this list).
15.-Bayesian automatic relevance determination prunes irrelevant dimensions, leading to exact zeros in the embeddings.
16.-Enables compression by representing each matrix in its relevant low-rank form.
17.-Learning uses alternating closed-form optimization rather than SGD, with special handling for non-Gaussian data (a minimal NumPy sketch follows this list).
18.-Experiment on multi-view gene expression data shows benefit of group-sparsity and collective factorization over alternatives.
19.-Face image experiment demonstrates utility of incorporating pixel proximity information when data is limited.
20.-Simulations confirm advantages of handling binary data properly and using variational Bayes over maximum a posteriori.
21.-Collective matrix factorization is flexible and generic; many datasets naturally fit this augmented multi-view setup.
22.-Variational Bayesian learning works well with no tuning parameters, unlike optimization-based methods.
23.-Ongoing work includes querying the model, approximate reasoning, convex Bayesian approaches, and handling missing links.
24.-Exciting potential application to privacy-preserving learning by sharing embeddings rather than raw data.
25.-An R package is available for applying the method.
26.-Structured data such as images can be handled by creating a feature matrix and concatenating it as an additional view (illustrated in the usage example after this list).
27.-Open question about treating missing relations as zeros and the impact on embedding orthogonality.
28.-Large symmetric matrices representing the full database can have many blocks corresponding to non-existent relations.
29.-Presenter is interested in experimenting with the impact of treating these as zeros or missing data (the final usage example below contrasts the two choices via an observation mask).
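
To make items 9, 11, and 12 concrete, here is a minimal formal sketch of the per-view model, reconstructed from standard collective-matrix-factorization formulations rather than taken verbatim from the talk; the notation (entity sets indexed by e, views by m, embeddings U^(e)) is assumed.

```latex
% View m relates row entity set r_m to column entity set c_m;
% every entity set e has one shared embedding matrix U^{(e)} with K columns.
\mathbb{E}\!\left[y^{(m)}_{ij}\right]
  = \sum_{k=1}^{K} u^{(r_m)}_{ik}\, u^{(c_m)}_{jk}
  + b^{(m,\mathrm{row})}_{i} + b^{(m,\mathrm{col})}_{j},
\qquad
y^{(m)}_{ij} \sim \mathcal{N}\!\bigl(\mathbb{E}[y^{(m)}_{ij}],\ \tau_m^{-1}\bigr)
\ \text{or another noise model.}
```

Stacking all embeddings into one tall matrix U (one block per entity set) gives the symmetric view of items 11-12: the whole database corresponds to a symmetric matrix Y ≈ U Uᵀ in which each relation occupies one off-diagonal block, and blocks for non-existent relations (item 28) are simply missing data.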
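
Likewise, a sketch of the group-sparse prior of items 14-15, again under assumed notation; the key point is that the ARD precision α_{e,k} is shared by all entities in set e, so pruning acts on whole components per entity set.

```latex
u^{(e)}_{ik} \sim \mathcal{N}\!\left(0,\ \alpha_{e,k}^{-1}\right),
\qquad
\alpha_{e,k} \sim \mathrm{Gamma}(a_0,\ b_0).
```

When inference drives α_{e,k} to infinity, component k is zeroed for every entity in set e; the effective rank of view m is then the number of components still active in both r_m and c_m, which is how view-specific ranks and the compression of item 16 emerge from a single shared K.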
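
The following is a minimal runnable sketch of the alternating closed-form updates of item 17, simplified to MAP estimation with Gaussian noise and a fixed ridge penalty; the variational-Bayes machinery, bias terms, ARD pruning, and non-Gaussian likelihoods of the actual method are omitted, and all names (cmf_fit, views, sizes) are hypothetical, not the R package's API.

```python
# Minimal sketch: collective matrix factorization via alternating
# closed-form (ridge) updates on shared entity embeddings.
# MAP simplification with Gaussian noise only; biases, ARD pruning,
# variational Bayes, and non-Gaussian likelihoods are omitted.
import numpy as np

def cmf_fit(views, sizes, K=5, lam=0.1, n_iters=50, seed=0):
    """views : list of (Y, mask, row_set, col_set); mask is 1 where the
    entry of Y is observed, 0 where it is treated as missing.
    sizes : number of entities in each entity set."""
    rng = np.random.default_rng(seed)
    U = [rng.normal(scale=0.1, size=(n, K)) for n in sizes]
    for _ in range(n_iters):
        for e in range(len(sizes)):
            # Normal equations for every row of U[e], pooling evidence
            # from all views in which entity set e participates.
            A = np.tile(lam * np.eye(K), (sizes[e], 1, 1))  # (n_e, K, K)
            c = np.zeros((sizes[e], K))
            for Y, mask, rs, cs in views:
                if rs == e:        # entity set e indexes the rows
                    V, M, Ym = U[cs], mask, Y
                elif cs == e:      # entity set e indexes the columns
                    V, M, Ym = U[rs], mask.T, Y.T
                else:
                    continue
                # A_i += sum_j m_ij v_j v_j^T ;  c_i += sum_j m_ij y_ij v_j
                A += np.einsum('ij,jk,jl->ikl', M, V, V)
                c += (M * Ym) @ V
            # One K-by-K linear solve per entity: the closed-form update.
            U[e] = np.linalg.solve(A, c[..., None])[..., 0]
    return U
```

In this simplification each update is one small K-by-K solve per entity, which is why alternating closed-form steps can be attractive compared with tuning SGD step sizes.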
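
Finally, a hypothetical usage example for the sketch above, touching items 26-29: an extra feature-matrix view (e.g., pixel features for images) would be appended to the list of views in exactly the same way, and the zeros-versus-missing question reduces to the choice of observation mask. All data here is synthetic, and a proper treatment of the binary link matrix would use a non-Gaussian likelihood (item 20).

```python
# Hypothetical usage of cmf_fit (defined in the sketch above) on
# synthetic data: two views sharing the gene entity set.
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_patients, n_diseases = 40, 30, 10
sizes = [n_genes, n_patients, n_diseases]        # entity sets 0, 1, 2

expr = rng.normal(size=(n_genes, n_patients))            # genes x patients
links = (rng.random((n_genes, n_diseases)) < 0.1) * 1.0  # genes x diseases

# Items 27-29: unobserved gene-disease pairs as missing vs. as zeros.
mask_missing = (rng.random(links.shape) < 0.5) * 1.0  # half the cells observed
mask_zeros = np.ones_like(links)                      # everything "observed"

U_missing = cmf_fit([(expr, np.ones_like(expr), 0, 1),
                     (links, mask_missing, 0, 2)], sizes, K=4)
U_zeros = cmf_fit([(expr, np.ones_like(expr), 0, 1),
                   (links, mask_zeros, 0, 2)], sizes, K=4)
print("gene embedding norm, missing vs zeros:",
      np.linalg.norm(U_missing[0]), np.linalg.norm(U_zeros[0]))
```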
Knowledge Vault built by David Vivancos 2024