Quantitative Trait Association Analysis

This project contains scripts for performing quantitative trait association analysis and meta-analysis of GWAS summary statistics.

Association Analysis

Requirements

Installation

Install the required packages using pip:

pip install -r requirements.txt

Usage

The main script is association_analysis.py. It can be run with default parameters:

python association_analysis.py

Or with custom parameters:

python association_analysis.py --pheno path/to/phenotype.txt.gz --covar path/to/covariates.txt.gz --bed path/to/genotype.bed.gz --bim path/to/genotype.bim.gz --fam path/to/genotype.fam.gz --out results.txt --chunk-size 500 --manhattan manhattan.png --qq qq.png

Command-line Arguments

Output

The script produces the following outputs:

  1. A tab-separated text file containing association results with columns:

  2. A Manhattan plot showing -log10(p-value) across the genome

  3. A QQ plot comparing observed vs. expected p-values
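The points behind a QQ plot can be sketched as follows. This is a minimal illustration, not the code in association_analysis.py: the function name qq_points is hypothetical, and the expected quantiles use the (i - 0.5)/n convention, which is one common choice and may differ from what the script does. It assumes every p-value is strictly greater than zero.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Hypothetical helper: expected p-values follow the (i - 0.5) / n
    convention for the i-th smallest of n uniform p-values.
    """
    observed = sorted(pvalues)  # ascending p, i.e. most significant first
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected_p = (i - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Under the null hypothesis the points fall close to the diagonal; systematic departure above the diagonal across the whole range usually indicates inflation rather than true signal.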

Implementation Details

The script performs the following steps:

  1. Loads phenotype and covariate data
  2. Reads SNP and sample information from BIM and FAM files
  3. Processes the BED file in chunks (see --chunk-size) to keep memory use bounded
  4. For each SNP, performs linear regression with the specified covariates
  5. Generates summary statistics and plots
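Step 4 can be sketched for a single SNP as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes NumPy is installed, the function name snp_association is hypothetical, missing genotype calls are encoded as NaN, and the p-value uses a normal approximation to the t distribution (reasonable for large samples; the script itself may use the t distribution).

```python
import math

import numpy as np


def snp_association(genotype, phenotype, covariates):
    """Regress the phenotype on one SNP's dosage plus covariates.

    Hypothetical helper. Returns (beta, se, p) for the SNP term.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    c = np.asarray(covariates, dtype=float).reshape(len(y), -1)

    # Assumption: missing genotype calls are NaN; drop those samples.
    keep = ~np.isnan(g)
    g, y, c = g[keep], y[keep], c[keep]

    # Design matrix: intercept, SNP dosage, then the covariates.
    X = np.column_stack([np.ones(len(y)), g, c])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

    # Residual variance and the covariance matrix of the coefficients.
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)

    beta = float(coef[1])
    se = math.sqrt(float(cov[1, 1]))
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p
```

Refitting the full model per SNP is the simplest form of the idea; for a whole chunk it is usually faster to residualize the phenotype and genotypes on the covariates once and then regress in a vectorized way.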

The association analysis uses a linear regression model with the following covariates:

Meta-Analysis

This repository also includes scripts for performing fixed-effect meta-analysis on GWAS summary statistics.

Requirements

Usage

R Implementation

Rscript meta_analysis.R

Input Files

The meta-analysis scripts expect two summary statistics files with the following columns:

Default file paths:

Output

The meta-analysis produces a tab-separated text file containing:

Implementation Details

The meta-analysis implements a fixed-effect model using inverse-variance weighting:

  1. Each study is weighted by the inverse of its variance (1/SE²)
  2. The combined effect size is the weighted average of individual effect sizes
  3. The combined standard error is the square root of the inverse of the summed weights
  4. P-values are derived from the z-score (combined effect divided by combined standard error) using the normal distribution
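The four steps above can be sketched for one SNP and two studies. This is a minimal Python illustration rather than the code in meta_analysis.R: the function name and argument names are hypothetical, and the p-value is assumed to be two-sided.

```python
import math


def fixed_effect_meta(beta1, se1, beta2, se2):
    """Inverse-variance-weighted fixed-effect meta-analysis of two studies.

    Hypothetical helper. Returns (beta, se, p) for the combined estimate.
    """
    # Step 1: weight each study by the inverse of its variance.
    w1 = 1.0 / se1 ** 2
    w2 = 1.0 / se2 ** 2

    # Step 2: combined effect is the weighted average of the effects.
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)

    # Step 3: combined standard error from the summed weights.
    se = math.sqrt(1.0 / (w1 + w2))

    # Step 4: two-sided p-value (assumed) from the standard normal.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

For example, effects of 0.1 (SE 0.02) and 0.2 (SE 0.04) give weights of 2500 and 625, so the combined effect is 375 / 3125 = 0.12, pulled toward the more precise study.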

The script flags an association as strengthened by the meta-analysis when the SNP's meta-analysis p-value is smaller than both of its original p-values.
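That rule can be written as a one-line check. The sketch below is illustrative only: the function name, the dictionary keys (snp, p1, p2, p_meta), and the values in the usage example are hypothetical and do not reflect the script's actual column names or results.

```python
def is_strengthened(p_meta, p_study1, p_study2):
    """True when the meta-analysis p-value beats both original p-values."""
    return p_meta < min(p_study1, p_study2)


def strengthened_snps(rows):
    """Filter records (dicts with hypothetical keys) to strengthened SNPs."""
    return [
        row["snp"]
        for row in rows
        if is_strengthened(row["p_meta"], row["p1"], row["p2"])
    ]
```

With made-up inputs, a SNP with original p-values 1e-4 and 1e-3 and a meta-analysis p-value of 1e-6 is kept, while one whose meta-analysis p-value lies between its two original p-values is not.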

Notes