This project contains scripts for performing quantitative trait association analysis and meta-analysis of GWAS summary statistics.
Install the required packages listed in requirements.txt using pip:
pip install -r requirements.txt
The main script is association_analysis.py. It can be run with default parameters:
python association_analysis.py
Or with custom parameters:
python association_analysis.py --pheno path/to/phenotype.txt.gz --covar path/to/covariates.txt.gz --bed path/to/genotype.bed.gz --bim path/to/genotype.bim.gz --fam path/to/genotype.fam.gz --out results.txt --chunk-size 500 --manhattan manhattan.png --qq qq.png
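As an illustration, the documented options and their defaults could be declared with Python's standard argparse module roughly as follows. This is a hypothetical sketch, not the script's actual code, and the function name build_parser is ours. Note that argparse exposes --chunk-size as the attribute chunk_size.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    data_dir = "Data/Problem 1"
    geno_dir = f"{data_dir}/genotype_data"
    parser = argparse.ArgumentParser(description="Quantitative trait association analysis")
    parser.add_argument("--pheno", default=f"{data_dir}/eur_phenotype.txt.gz",
                        help="Path to phenotype file")
    parser.add_argument("--covar", default=f"{data_dir}/eur_covariates.txt.gz",
                        help="Path to covariate file")
    parser.add_argument("--bed", default=f"{geno_dir}/P1_data.bed.gz", help="Path to BED file")
    parser.add_argument("--bim", default=f"{geno_dir}/P1_data.bim.gz", help="Path to BIM file")
    parser.add_argument("--fam", default=f"{geno_dir}/P1_data.fam.gz", help="Path to FAM file")
    parser.add_argument("--out", default="association_results.txt", help="Path to output file")
    # Parsed as an integer; available afterwards as args.chunk_size
    parser.add_argument("--chunk-size", type=int, default=1000,
                        help="Number of SNPs to process in each chunk")
    parser.add_argument("--manhattan", default="manhattan_plot.png",
                        help="Path to save Manhattan plot")
    parser.add_argument("--qq", default="qq_plot.png", help="Path to save QQ plot")
    return parser
```

For example, build_parser().parse_args(["--chunk-size", "500"]) yields chunk_size == 500 with every other option at its default.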
The available options are:
--pheno: Path to phenotype file (default: 'Data/Problem 1/eur_phenotype.txt.gz')
--covar: Path to covariate file (default: 'Data/Problem 1/eur_covariates.txt.gz')
--bed: Path to BED file (default: 'Data/Problem 1/genotype_data/P1_data.bed.gz')
--bim: Path to BIM file (default: 'Data/Problem 1/genotype_data/P1_data.bim.gz')
--fam: Path to FAM file (default: 'Data/Problem 1/genotype_data/P1_data.fam.gz')
--out: Path to output file (default: 'association_results.txt')
--chunk-size: Number of SNPs to process in each chunk (default: 1000)
--manhattan: Path to save Manhattan plot (default: 'manhattan_plot.png')
--qq: Path to save QQ plot (default: 'qq_plot.png')
The script produces the following outputs:
A tab-separated text file containing association results with columns:
A Manhattan plot showing -log10(p-value) across the genome
A QQ plot comparing observed vs. expected p-values
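The coordinates behind the two plots can be computed as in the minimal, standard-library-only sketch below. The function names are hypothetical, chromosomes are assumed to be integer-coded, and the (i - 0.5)/n expected quantile is one common convention, not necessarily the one the script uses. Drawing the points (for example with matplotlib.pyplot.scatter) is left out.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values for a QQ plot.

    Observed p-values are sorted ascending, so the i-th observed value pairs
    with the i-th smallest uniform quantile (i - 0.5) / n.
    """
    observed = sorted(pvalues)
    n = len(observed)
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return expected, [-math.log10(p) for p in observed]


def manhattan_positions(chroms, positions):
    """Map (chromosome, base-pair position) pairs to one genome-wide x axis.

    Chromosomes are laid out in ascending numeric order; each is shifted by
    the sum of the largest positions seen on all preceding chromosomes.
    """
    max_pos = {}
    for chrom, bp in zip(chroms, positions):
        max_pos[chrom] = max(max_pos.get(chrom, 0), bp)
    offsets, running = {}, 0
    for chrom in sorted(max_pos):
        offsets[chrom] = running
        running += max_pos[chrom]
    return [offsets[chrom] + bp for chrom, bp in zip(chroms, positions)]
```

The Manhattan plot then places each SNP at manhattan_positions(...) on the x axis with -log10(p) on the y axis.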
The script performs the following steps:
The association analysis uses a linear regression model with the following covariates:
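A per-SNP linear regression of this kind can be sketched with NumPy as follows. The sketch makes several assumptions. Genotypes are already decoded into an (individuals x SNPs) dosage matrix with no missing values. Phenotype and covariates are aligned to the same individuals. Each SNP varies, since a monomorphic SNP makes X'X singular. The function names are hypothetical. The p-value uses a normal approximation to the t distribution, which is close for GWAS-sized samples but is not an exact t-test. The chunk loop mirrors the --chunk-size option.

```python
import math

import numpy as np


def assoc_chunk(y, covars, genotypes):
    """Fit y ~ 1 + covariates + SNP separately for each SNP column.

    y: (n,) phenotype; covars: (n, k) covariates; genotypes: (n, m) dosages.
    Returns a list of (beta, se, p) tuples for the SNP term.
    """
    n = y.shape[0]
    base = np.column_stack([np.ones(n), covars])  # intercept + covariates
    results = []
    for j in range(genotypes.shape[1]):
        X = np.column_stack([base, genotypes[:, j]])
        coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        dof = n - X.shape[1]
        sigma2 = (resid @ resid) / dof  # residual variance estimate
        cov = sigma2 * np.linalg.inv(X.T @ X)  # coefficient covariance
        beta = float(coef[-1])  # SNP effect is the last coefficient
        se = math.sqrt(float(cov[-1, -1]))
        z = beta / se
        # Two-sided p-value under a normal approximation; erfc keeps
        # precision in the far tail better than 1 - cdf would
        p = math.erfc(abs(z) / math.sqrt(2))
        results.append((beta, se, p))
    return results


def run_in_chunks(y, covars, genotypes, chunk_size=1000):
    """Process SNP columns in blocks of chunk_size and concatenate results."""
    out = []
    for start in range(0, genotypes.shape[1], chunk_size):
        out.extend(assoc_chunk(y, covars, genotypes[:, start:start + chunk_size]))
    return out
```

Each tuple corresponds to one row of the tab-separated results file. Refitting the full model per SNP is simple but not the fastest approach. Residualizing y and the genotypes on the covariates once per chunk is a common speed-up.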
This repository also includes scripts for performing fixed-effect meta-analysis on GWAS summary statistics. The R script can be run with:
Rscript meta_analysis.R
The meta-analysis scripts expect two summary statistics files with the following columns:
Default file paths:
The meta-analysis produces a tab-separated text file containing:
The meta-analysis implements a fixed-effect model using inverse-variance weighting:
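For reference, the textbook fixed-effect inverse-variance weighted estimator combines per-study effect estimates β̂_i with standard errors SE_i as shown below, where Φ is the standard normal CDF. This assumes the script follows the standard formulation and that effects from both files are aligned to the same effect allele.

```latex
w_i = \frac{1}{\mathrm{SE}_i^2}, \qquad
\hat\beta_{\text{meta}} = \frac{\sum_i w_i \hat\beta_i}{\sum_i w_i}, \qquad
\mathrm{SE}_{\text{meta}} = \sqrt{\frac{1}{\sum_i w_i}}, \qquad
Z_{\text{meta}} = \frac{\hat\beta_{\text{meta}}}{\mathrm{SE}_{\text{meta}}}, \qquad
p_{\text{meta}} = 2\,\Phi\!\left(-\lvert Z_{\text{meta}} \rvert\right)
```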
The script determines which associations are strengthened by meta-analysis by identifying SNPs where the meta-analysis p-value is smaller than both original p-values.
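A minimal Python sketch of the per-SNP combination and the "strengthened" check follows. The dict-of-tuples input (SNP ID mapped to beta, SE, p) and all function names are hypothetical stand-ins for the two summary statistics files. Effects are assumed to be aligned to the same effect allele already. Only SNPs present in both studies are combined.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    Returns (beta_meta, se_meta, p_meta) with a two-sided normal p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Phi(-|z|)
    return beta, se, p


def strengthened(meta_p, p1, p2):
    """True when the meta-analysis p-value is smaller than both study p-values."""
    return meta_p < p1 and meta_p < p2


def meta_analyze(study1, study2):
    """Combine two hypothetical {snp_id: (beta, se, p)} dicts.

    Returns {snp_id: (beta_meta, se_meta, p_meta, is_strengthened)}.
    """
    results = {}
    for snp in study1.keys() & study2.keys():  # SNPs present in both studies
        b1, s1, p1 = study1[snp]
        b2, s2, p2 = study2[snp]
        beta, se, p = ivw_fixed_effect([b1, b2], [s1, s2])
        results[snp] = (beta, se, p, strengthened(p, p1, p2))
    return results
```

Two studies that agree in direction typically yield a smaller meta-analysis p-value than either alone. Effects of opposite sign can cancel, which is why the comparison against both original p-values is informative.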