Knowledge Vault 6/23 - ICML 2017
Genomics, Big Data, and Machine Learning
Peter Donnelly

Concept Graph & Summary using Claude 3.5 Sonnet | Chat GPT4o | Llama 3:

graph LR
    classDef main fill:#f9d9c9, font-weight:bold, font-size:14px
    classDef basics fill:#d4f9d4, font-weight:bold, font-size:14px
    classDef analysis fill:#d4d4f9, font-weight:bold, font-size:14px
    classDef applications fill:#f9f9d4, font-weight:bold, font-size:14px
    classDef future fill:#f9d4f9, font-weight:bold, font-size:14px
    Main[Genomics, Big Data, and Machine Learning]
    Main --> A[Genetics Basics]
    Main --> B[Genetic Analysis]
    Main --> C[Applications in Healthcare]
    Main --> D[Future Directions]
    A --> A1[Genetics excites, insights biology, transforms healthcare 1]
    A --> A2[DNA, genes, chromosomes, SNPs basics explained 2]
    A --> A3[Human DNA inheritance patterns show correlation 7]
    A --> A4[Genome sequencing cost decreased, data expanding 20]
    B --> B1[Platypus genetics reveal population differences 3]
    B --> B2[PCA separates Tasmanian, mainland platypus populations 4]
    B --> B3[STRUCTURE reveals finer-scale platypus genetic structure 5]
    B --> B4[STRUCTURE model similar to topic model 6]
    B --> B5[Statistical methods impute markers, infer chromosomes 8]
    B --> B6[HMMs capture genetic data correlation patterns 9]
    C --> C1[UK genetic differences studied, 2000 individuals 10]
    C --> C2[UK genetic clusters match historical patterns 11]
    C --> C3[Genetic data reveals unknown England migration 12]
    C --> C4[Genetics influences simple, complex disease risk 13]
    C --> C5[GWAS compares variant frequencies, finds risk 14]
    C --> C6[GWAS variants have small complex disease effects 15]
    D --> D1[Genetic variants inform drug development, effects 16]
    D --> D2[Mendelian randomization establishes causal relationships 17]
    D --> D3[Genetics identifies disease-relevant tissue types 18]
    D --> D4[Genetic correlations provide biological insights 19]
    D --> D5[App predicts disease risks, ancestry breakdown 21]
    D --> D6[Partitioned risk scores inform interventions 22]
    E[Additional Considerations] --> E1[Genetics plays smaller disease prediction role 23]
    E --> E2[Genetic, health data improve risk prediction 24]
    E --> E3[Royal Society report: data, education recommendations 25]
    E --> E4[Genomics plc: genetic data improves healthcare 26]
    E --> E5[Genetic discrimination prohibited or avoided 27]
    E --> E6[Balance privacy, research benefits crucial 28]
    Main --> E
    class Main main
    class A,A1,A2,A3,A4 basics
    class B,B1,B2,B3,B4,B5,B6 analysis
    class C,C1,C2,C3,C4,C5,C6 applications
    class D,D1,D2,D3,D4,D5,D6,E,E1,E2,E3,E4,E5,E6 future

Summary:

1.- Peter Donnelly discusses the excitement around genetics and genomics, which are providing insights into human biology and transforming healthcare.

2.- He gives an overview of genetics basics, including DNA, genes, chromosomes, and single nucleotide polymorphisms (SNPs).

3.- Donnelly studies platypuses, unique mammals found in Australia, using genetic information to understand their natural history and population differences.

4.- Principal component analysis of platypus genetic data separates Tasmanian platypuses from mainland ones and shows a north-to-south gradient.

5.- STRUCTURE, a model-based clustering approach, reveals finer-scale genetic structure in platypus populations across Australia.

6.- The STRUCTURE model is similar to the topic model/LDA, independently developed in genetics and machine learning.

7.- Human genetic data shows patterns of correlation due to inheritance of DNA chunks and recombination events.

8.- Statistical methods can impute missing genetic markers and infer separate chromosomes (phasing) from genetic data.

9.- Hidden Markov models can capture mosaic structure and correlation patterns in genetic data.

10.- Donnelly studied genetic differences in the UK, sampling 2000 individuals at 500,000 SNPs.

11.- Fine-scale genetic clusters in the UK match historical migration patterns, kingdoms, and cultural/linguistic divisions.

12.- Genetic data reveals a previously unknown, substantial migration into England whose genetic signature best matches DNA from modern-day France.

13.- Genetics plays a role in both simple Mendelian diseases and complex disorders influenced by many genetic variants and environment.

14.- Genome-wide association studies (GWAS) compare genetic variant frequencies between disease cases and controls to find risk loci.

15.- Most GWAS variants have small effects on complex diseases, as natural selection limits the frequency of large-effect variants.

16.- Genetic variants can inform drug development by mimicking the effect of drugs and predicting efficacy and side effects.

17.- Genetics can establish causal relationships between risk factors and diseases using Mendelian randomization.

18.- Genetic data can identify which tissue types are relevant for a disease, such as immune tissues in Alzheimer's disease.

19.- Genetic correlations between traits, such as between fluid intelligence and educational attainment or lifespan, can provide biological insights.

20.- The cost of genome sequencing has dramatically decreased, and the amount of available genetic data is rapidly expanding.

21.- Donnelly demonstrates an app prototype that predicts disease risks and ancestry breakdowns from an individual's genetic data.

22.- Partitioned risk scores can inform interventions by separating modifiable and non-modifiable risk factors.

23.- Genetics alone will likely play a smaller role in disease risk prediction than other health data.

24.- The combination of genetic and other health data can improve disease risk prediction at an individual and population level.

25.- A Royal Society machine learning report makes recommendations on open data, digital skills education, research priorities, and ethical algorithm development.

26.- Donnelly's company, Genomics plc, aims to use genetic data to understand biology and transform healthcare.

27.- Genetic discrimination in insurance is prohibited by law in the US and voluntarily avoided by UK insurers.

28.- Balancing privacy concerns with the benefits of large-scale genetic research for drug development and risk prediction is crucial.

29.- Integrating genetic data with other omics data (mitochondrial DNA, microbiome) can provide a more complete biological picture.

30.- Rigorous statistical practices, like replication in independent datasets, are essential to avoid false discoveries in large-scale genetic studies.
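
The mosaic structure in item 9 can be illustrated with a toy haplotype-copying HMM: a target haplotype is treated as a patchwork of segments copied from reference haplotypes, and Viterbi decoding recovers the most likely copying path. This is a minimal sketch under simplifying assumptions, not the method used in the talk; the function name `viterbi_mosaic` and the parameters `switch_prob` and `error_prob` are hypothetical and chosen only for illustration.

```python
import math


def viterbi_mosaic(target, references, switch_prob=0.05, error_prob=0.01):
    """Return, per site, the index of the reference haplotype most likely copied.

    Assumptions (illustrative only): alleles are coded 0/1, the chance of
    switching to a new reference is the same between every pair of adjacent
    sites, and a copied allele is miscopied with probability error_prob.
    """
    n_refs = len(references)
    n_sites = len(target)

    def log_emit(ref_idx, site):
        match = references[ref_idx][site] == target[site]
        return math.log(1.0 - error_prob if match else error_prob)

    # After a switch event the new reference is drawn uniformly, so staying
    # on the same reference also includes "switching" back to it.
    log_stay = math.log(1.0 - switch_prob + switch_prob / n_refs)
    log_switch = math.log(switch_prob / n_refs)

    score = [math.log(1.0 / n_refs) + log_emit(k, 0) for k in range(n_refs)]
    back = []
    for site in range(1, n_sites):
        new_score, pointers = [], []
        for k in range(n_refs):
            best_prev = max(
                range(n_refs),
                key=lambda j: score[j] + (log_stay if j == k else log_switch),
            )
            trans = log_stay if best_prev == k else log_switch
            new_score.append(score[best_prev] + trans + log_emit(k, site))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_refs), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up example: the target matches reference 0 at first, then reference 1.
references = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
target = [0, 0, 0, 1, 1, 1]
print(viterbi_mosaic(target, references))
```

The same state space, summed over with the forward-backward algorithm instead of maximized, would give per-site copying probabilities, which is the kind of quantity imputation and phasing (item 8) rely on.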
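
The case-control comparison in item 14 can be sketched for a single variant as a chi-square test on a 2x2 table of allele counts. This is a simplified illustration, not the pipeline from the talk: real GWAS analyses typically use regression with covariates, and the function name `allelic_chi_square` and all counts below are made up.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more common in cases.
chi2, p_value = allelic_chi_square(600, 400, 500, 500)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

Because a genome-wide scan repeats this test at hundreds of thousands of variants, the per-variant significance threshold has to be far stricter than 0.05, which is one reason item 30 stresses replication in independent datasets.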
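
The idea behind Mendelian randomization in item 17 can be shown with the Wald ratio, one common single-variant estimator: the variant's effect on the outcome divided by its effect on the risk factor. The summary does not say which estimator the talk used, so this is an assumed choice; the function name `wald_ratio` and the effect sizes are hypothetical. The estimate is only meaningful if the variant affects the outcome solely through the risk factor.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Wald ratio estimate of the causal effect of an exposure on an outcome.

    Returns (estimate, std_error). The standard error is the usual first-order
    approximation that ignores uncertainty in the variant-exposure effect.
    """
    if beta_variant_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_variant_outcome / beta_variant_exposure
    std_error = abs(se_variant_outcome / beta_variant_exposure)
    return estimate, std_error


# Hypothetical effect sizes: per-allele effect on the risk factor and on
# disease (log-odds), plus the standard error of the disease effect.
estimate, std_error = wald_ratio(0.5, 0.1, 0.02)
print(f"causal effect per unit of exposure: {estimate:.2f} (SE {std_error:.2f})")
```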
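
One way to read the partitioned risk scores in item 22 is as a polygenic score (a weighted sum of allele dosages) whose contributions are grouped by the risk factor each variant acts through. The grouping scheme, the function name `partitioned_risk_score`, the variant identifiers, and the weights below are all assumptions for illustration; the summary does not describe how the talk's scores were actually constructed.

```python
def partitioned_risk_score(dosages, weights, risk_factors):
    """Sum weight * dosage per risk factor.

    dosages:      variant id -> number of risk alleles carried (0, 1 or 2)
    weights:      variant id -> per-allele effect size (e.g. log odds ratio)
    risk_factors: variant id -> label of the risk factor the variant acts through
    Returns (per-factor dict, total score).
    """
    partial = {}
    for variant, dosage in dosages.items():
        factor = risk_factors[variant]
        partial[factor] = partial.get(factor, 0.0) + weights[variant] * dosage
    return partial, sum(partial.values())


# Made-up individual with three hypothetical variants.
dosages = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
weights = {"variant_1": 0.1, "variant_2": 0.2, "variant_3": 0.3}
risk_factors = {
    "variant_1": "lipids",
    "variant_2": "blood_pressure",
    "variant_3": "lipids",
}
partial, total = partitioned_risk_score(dosages, weights, risk_factors)
print(partial, total)
```

Splitting the score this way shows which component drives an individual's risk, so that a component tied to a modifiable factor can be targeted by an intervention.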

Knowledge Vault built by David Vivancos 2024