Overview

This project showcases various assignments and analyses completed as part of my statistical genetics coursework. The collection demonstrates my ability to apply statistical methods to genomic data and extract meaningful biological insights.

GWAS Analysis

For this assignment, we conducted a genome-wide association study (GWAS) to identify genetic variants associated with a simulated quantitative trait, using European-ancestry samples from the 1000 Genomes Project. We implemented the analysis in Python, fitting a linear regression for each SNP with the statsmodels and sklearn libraries. Each model included age, sex, and the first four principal components (PC1-4) as covariates, and we generated association statistics for approximately 700,000 SNPs.

The analysis identified several suggestive associations, primarily on chromosome 12, with the strongest signal at rs4348470 (p = 1.37×10⁻⁷). We visualized the results with Manhattan and QQ plots. These confirmed that no variant reached the conventional genome-wide significance threshold (p < 5×10⁻⁸), although several variants approached it.
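As a rough sketch of what sits behind the two plots, the snippet below computes the coordinates a Manhattan plot and a QQ plot would draw, leaving the actual plotting to whatever library is preferred. The helper names, the column names (`chrom`, `pos`, `p`), and the toy rows are assumptions for illustration. The 1×10⁻⁵ suggestive cutoff is a commonly used convention that is assumed here, not a value taken from the assignment.

```python
import numpy as np
import pandas as pd

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold
SUGGESTIVE = 1e-5   # assumed suggestive cutoff (common convention)


def manhattan_coords(df):
    """Add a cumulative x position and -log10(p) for a Manhattan plot.

    Expects hypothetical columns: chrom (int), pos (int), p (float).
    """
    df = df.sort_values(["chrom", "pos"]).reset_index(drop=True)
    chrom_max = df.groupby("chrom")["pos"].max()
    # Each chromosome starts where the previous ones end.
    offsets = chrom_max.cumsum().shift(1, fill_value=0)
    df["x"] = df["pos"] + df["chrom"].map(offsets)
    df["neg_log10_p"] = -np.log10(df["p"])
    return df


def qq_points(pvalues):
    """Return expected and observed -log10(p) under a uniform null."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    expected = -np.log10((np.arange(1, n + 1) - 0.5) / n)
    observed = -np.log10(p)
    return expected, observed


# Toy rows; only the 1.37e-7 p-value comes from the write-up above.
toy = pd.DataFrame({
    "chrom": [1, 1, 2, 2],
    "pos": [100, 200, 50, 150],
    "p": [0.5, 1.37e-7, 0.01, 0.2],
})
coords = manhattan_coords(toy)
expected, observed = qq_points(toy["p"])

top_p = coords["p"].min()
is_genome_wide = top_p < GENOME_WIDE
is_suggestive = (not is_genome_wide) and top_p < SUGGESTIVE
print(coords[["chrom", "pos", "x", "neg_log10_p"]])
print("genome-wide:", is_genome_wide, "suggestive:", is_suggestive)
```

Plotting `x` against `neg_log10_p` gives the Manhattan plot, and plotting `expected` against `observed` gives the QQ plot; horizontal lines at -log10 of the two thresholds make the suggestive and significant regions visible.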

Manhattan Plot

Future Assignments

This section will be expanded with additional statistical genetics assignments and projects as they are completed.