Concept Graph & Summary using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:
Summary:
1.- CRISPR is a bacterial immune system that captures viral DNA sequences and uses them to target and cut matching viral DNA.
2.- CRISPR arrays expand over time as bacteria acquire new viral DNA sequences, forming a memory of past infections.
3.- CRISPR arrays are transcribed into RNA, which combines with Cas proteins to search for and cut matching DNA sequences.
4.- Jennifer Doudna and Emmanuelle Charpentier showed that the Cas9 protein uses RNA guides to unwind and cut targeted DNA.
5.- They simplified the system to a single guide RNA, allowing Cas9 to be programmed to cut any desired DNA sequence.
6.- Cutting DNA at specific sites can induce repair, allowing precise changes or insertion of new genetic information into genomes.
7.- CRISPR has enabled rapid development of new therapies, such as a one-time treatment for sickle cell disease.
8.- Efforts are underway to reduce the cost and expand access to CRISPR-based therapies, which are currently very expensive.
9.- CRISPR has many potential applications beyond healthcare, including in addressing climate change challenges.
10.- Biological data sets are often limited in size compared to those in other fields, which poses challenges for machine learning applications.
11.- The Protein Data Bank (PDB) is a prime example of a highly curated, high-quality biological data set.
12.- The PDB has grown from 7 to over 200,000 structures since 1971, mostly from X-ray crystallography.
13.- Structure quality in the PDB is assessed using R-free values, which measure how well a model agrees with a subset of the experimental data that was withheld from refinement.
14.- Introduction of R-free greatly improved the quality of structures in the PDB by reducing overfitting of data.
15.- Machine learning models like AlphaFold2 rely on high-quality data like the PDB to accurately predict protein structures.
16.- Predicting protein function remains challenging, as similar structures can have different functions and annotations are often incomplete or inaccurate.
17.- Even in simple organisms, a large percentage of essential genes have unknown functions that can't be predicted from structure alone.
18.- Ron Boger is developing improved methods for using protein structures to predict function, which he will present at the conference.
19.- Determining what proteins actually do biologically still requires experimental validation, not just structural predictions.
20.- Biological questions that require machine learning include understanding genetic interactions, discovering protein and RNA functions, and predicting RNA structures.
21.- CRISPR can be used to generate large data sets by simultaneously targeting many genes to assess their functions and interactions.
22.- These multiplexed CRISPR screens can be done in cells, tissues, or whole animals to study gene function, drug responses, and related questions.
23.- Automation allows rapid generation of large CRISPR screening data sets, but library sizes are still relatively small.
24.- Machine learning could help answer questions like why some people with disease-related mutations develop the disease while others don't.
25.- Developing machine learning infrastructure for biology should consider lessons learned from successful data resources like the PDB.
26.- Key challenges include curating data from different sources, assessing data quality, and combining data sets in a meaningful way.
27.- Many CRISPR screening data sets are already publicly available, but lack standardized quality metrics akin to crystallographic R-free values.
28.- Efforts are underway to generate larger, more standardized CRISPR screening data sets that could enable more powerful machine learning analyses.
29.- Careful design of guide RNAs is critical for ensuring precise targeting and minimizing off-target effects in CRISPR-based therapies and screens.
30.- Given CRISPR's power and potential for unintended consequences, responsible development and use of the technology is an active area of discussion.
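Items 5, 13-14, and 29 describe mechanics that a short sketch can make concrete. The Python below is a minimal illustration under stated assumptions, not the method used by the speakers. It assumes the common Streptococcus pyogenes Cas9 convention of a 20-nucleotide target followed by an NGG PAM (a detail the summary does not state), uses a plain mismatch count as a crude stand-in for off-target scoring, and computes the standard crystallographic R-factor, which becomes R-free when evaluated only on reflections withheld from refinement. All function names and the toy sequence are hypothetical.

```python
import re

# Complement table for DNA bases (uppercase A/C/G/T only, an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_sites(genome, guide_len=20):
    """List (strand, start, protospacer) for every site followed by an NGG PAM.

    `start` is a 0-based index into the strand that was scanned, so
    minus-strand positions refer to the reverse complement of `genome`.
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    sites = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1)))
    return sites


def mismatches(guide, site):
    """Count positions where a guide and an equal-length DNA site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide, site))


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections.

    Passing only reflections that were withheld from refinement gives R-free.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


# Toy example: one 20-nt target followed by a TGG PAM on the plus strand.
toy_genome = "ATGACGTTACATCGATCAAGTCTGGATAT"
sites = find_cas9_sites(toy_genome)
print(sites)

# Compare the first candidate guide against a hypothetical near-match site.
guide = sites[0][2]
print(mismatches(guide, "GACGTTACATCGATCAAGTA"))

# Hypothetical structure-factor amplitudes for two withheld reflections.
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

A lower mismatch count against some other genomic site suggests a higher risk that the guide also cuts there. Practical guide design tools weigh mismatch position and other factors, which this sketch ignores. Likewise, an R-free that stays well above the R-factor of the refined data is the usual sign of the overfitting mentioned in item 14.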
Knowledge Vault built by David Vivancos 2024