Knowledge Vault 6/83 - ICML 2023
The Future of ML in Biology: CRISPR for Health and Climate
Jennifer Doudna

Concept Graph & Summary using Claude 3.5 Sonnet | ChatGPT-4o | Llama 3:

graph LR
    classDef crispr fill:#f9d4d4,font-weight:bold,font-size:14px
    classDef data fill:#d4f9d4,font-weight:bold,font-size:14px
    classDef challenges fill:#d4d4f9,font-weight:bold,font-size:14px
    classDef development fill:#f9f9d4,font-weight:bold,font-size:14px
    A[The Future of ML in Biology: CRISPR for Health and Climate] --> B[CRISPR Technology]
    A --> C[Biological Data]
    A --> D[Challenges and Limitations]
    A --> E[Development and Applications]
    B --> B1[Bacterial immune system targeting viral DNA. 1]
    B --> B2[CRISPR arrays expand forming infection memory. 2]
    B --> B3[CRISPR transcribed RNA combines with Cas proteins. 3]
    B --> B4[Cas9 protein uses RNA guides to cut DNA. 4]
    B --> B5[Simplified Cas9 system for programmable cuts. 5]
    B --> B6[Cutting DNA allows precise genetic changes. 6]
    C --> C1[High-quality biological data in Protein Data Bank. 11]
    C --> C2[PDB growth from 7 to 200,000 structures. 12]
    C --> C3[Structure quality assessed using R-free values. 13]
    C --> C4[R-free improves PDB structure quality. 14]
    C --> C5[AlphaFold2 relies on high-quality PDB data. 15]
    C --> C6[Experimental validation needed for predictions. 19]
    D --> D1[Biological data often limits ML applications. 10]
    D --> D2[Predicting protein function remains challenging. 16]
    D --> D3[Essential genes have unknown functions. 17]
    D --> D4[Improved methods to predict protein function. 18]
    D --> D5[Develop ML infrastructure considering PDB lessons. 25]
    D --> D6[Challenges include curating and combining data. 26]
    E --> E1[Enabled rapid development of new therapies. 7]
    E --> E2[Efforts to reduce CRISPR therapy costs. 8]
    E --> E3[CRISPR's potential applications beyond healthcare. 9]
    E --> E4[ML for genetic interactions, RNA structures. 20]
    E --> E5[CRISPR generates large functional data sets. 21]
    E --> E6[Multiplexed CRISPR screens for gene studies. 22]
    class A,B,B1,B2,B3,B4,B5,B6 crispr
    class C,C1,C2,C3,C4,C5,C6 data
    class D,D1,D2,D3,D4,D5,D6 challenges
    class E,E1,E2,E3,E4,E5,E6 development

Summary:

1.- CRISPR is a bacterial immune system that captures viral DNA sequences and uses them to target and cut matching viral DNA.

2.- CRISPR arrays expand over time as bacteria acquire new viral DNA sequences, forming a memory of past infections.

3.- CRISPR arrays are transcribed into RNA, which combines with Cas proteins to search for and cut matching DNA sequences.

4.- Jennifer Doudna and Emmanuelle Charpentier showed that the Cas9 protein uses RNA guides to unwind and cut targeted DNA.

5.- They simplified the system to a single guide RNA, allowing Cas9 to be programmed to cut any desired DNA sequence.

6.- Cutting DNA at specific sites can induce repair, allowing precise changes or insertion of new genetic information into genomes.

7.- CRISPR has enabled rapid development of new therapies, such as a one-time treatment for sickle cell disease.

8.- Efforts are underway to reduce the cost and expand access to CRISPR-based therapies, which are currently very expensive.

9.- CRISPR has many potential applications beyond healthcare, including addressing challenges posed by climate change.

10.- Biological data sets are often small compared with data sets in other fields, which limits machine learning applications.

11.- The Protein Data Bank (PDB) is a prime example of a highly curated, high-quality biological data set.

12.- The PDB has grown from 7 to over 200,000 structures since 1971, mostly from X-ray crystallography.

13.- Structure quality in the PDB is assessed using R-free values, which measure how well a model agrees with a subset of the experimental data held out from refinement.

14.- The introduction of R-free greatly improved the quality of structures in the PDB by reducing overfitting of models to the data.

15.- Machine learning models like AlphaFold2 rely on high-quality data like the PDB to accurately predict protein structures.

16.- Predicting protein function remains challenging, as similar structures can have different functions and annotations are often incomplete or inaccurate.

17.- Even in simple organisms, a large percentage of essential genes have unknown functions that can't be predicted from structure alone.

18.- Ron Boger is developing improved methods for using protein structures to predict function, which he will present at the conference.

19.- Determining what proteins actually do biologically still requires experimental validation, not just structural predictions.

20.- Biological questions that require machine learning include understanding genetic interactions, discovering protein and RNA functions, and predicting RNA structures.

21.- CRISPR can be used to generate large data sets by simultaneously targeting many genes to assess their functions and interactions.

22.- These multiplexed CRISPR screens can be done in cells, tissues, or whole animals to study gene function, drug responses, etc.

23.- Automation allows rapid generation of large CRISPR screening data sets, but library sizes are still relatively small.

24.- Machine learning could help answer questions like why some people with disease-related mutations develop the disease while others don't.

25.- Developing machine learning infrastructure for biology should consider lessons learned from successful data resources like the PDB.

26.- Key challenges include curating data from different sources, assessing data quality, and combining data sets in a meaningful way.

27.- Many CRISPR screening data sets are already publicly available, but lack standardized quality metrics akin to crystallographic R-free values.

28.- Efforts are underway to generate larger, more standardized CRISPR screening data sets that could enable more powerful machine learning analyses.

29.- Careful design of guide RNAs is critical for ensuring precise targeting and minimizing off-target effects in CRISPR-based therapies and screens.

30.- Given CRISPR's power and potential for unintended consequences, responsible development and use of the technology is an active area of discussion.
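
Points 4, 5 and 29 describe programming Cas9 with a guide RNA and designing guides to limit off-target cuts. The sketch below illustrates the idea in Python and is not taken from the talk. It assumes the widely used Streptococcus pyogenes Cas9, which needs an "NGG" motif (the PAM) immediately after a 20-nucleotide target; the PAM rule is background knowledge rather than something stated in the summary. The function names `find_cas9_sites` and `mismatches` are hypothetical, and a plain mismatch count is only a crude stand-in for real off-target scoring.

```python
import re

# Assumption (not from the talk): S. pyogenes Cas9 with a 20-nt target
# followed by an "NGG" PAM on the same strand.
GUIDE_LENGTH = 20
SITE_PATTERN = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % GUIDE_LENGTH)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_sites(dna):
    """List candidate target sites as (strand, start, target, pam) tuples.

    `start` is the 0-based position of the target on the strand being read
    ("+" is the input, "-" is its reverse complement).
    """
    dna = dna.upper()
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The lookahead makes matches zero-width, so overlapping sites
        # are all reported.
        for match in SITE_PATTERN.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


def mismatches(guide, site):
    """Count positions where a guide and a same-length site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide, site))


if __name__ == "__main__":
    # Toy sequence invented for illustration.
    example = "AT" + "GATTACAGATTACAGATTAC" + "TGG" + "AT"
    for site in find_cas9_sites(example):
        print(site)
```

In practice a guide would be compared against every candidate site in a whole genome, and sites that differ from it by only a few positions would be flagged as potential off-targets.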

Knowledge Vault built by David Vivancos 2024