Author: Noah Nicol
Date: March 2025
This document outlines the RNA-seq analysis process for three zebrafish normoxia samples (SRR19627923, SRR19627924, SRR19627925). The analysis includes data acquisition, quality control, alignment, expression quantification, and visualization, with a specific focus on NOD2 and housekeeping genes for comparison.
The analysis extracts raw gene counts from featureCounts output files, calculates FPKM (fragments per kilobase of transcript per million mapped fragments) values, and compares expression levels across the three RNA-seq replicates.
conda install -c bioconda sra-tools
conda install -c bioconda fastqc
conda install -c bioconda trimmomatic
conda install -c bioconda hisat2
conda install -c bioconda samtools
conda install -c bioconda subread
Install the Python packages using pip or conda:
pip install matplotlib seaborn pandas numpy jupyter
prefetch SRR19627925
fasterq-dump SRR19627925
fastqc SRR19627925_1.fastq SRR19627925_2.fastq
Note: Because these samples are standard bulk RNA-seq reads, I did not perform deduplication (generally used to reduce PCR amplification bias). The FastQC warnings that applied were ‘Per base sequence content’ (which looked as I would expect) and ‘Adapter Content’ (which I tried to address, but I had difficulty finding the specific sequence corresponding to the Illumina Universal Adapter).
trimmomatic PE SRR19627925_1.fastq SRR19627925_2.fastq \
SRR19627925_1_paired_trimmed.fastq SRR19627925_1_unpaired_trimmed.fastq \
SRR19627925_2_paired_trimmed.fastq SRR19627925_2_unpaired_trimmed.fastq \
AVGQUAL:20 TRAILING:20 MINLEN:50
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/GCF_000002035.6_GRCz11_genomic.fna.gz
gunzip GCF_000002035.6_GRCz11_genomic.fna.gz
hisat2-build GCF_000002035.6_GRCz11_genomic.fna zebrafish_index
hisat2 -x zebrafish_index \
-1 SRR19627925_1_paired_trimmed.fastq \
-2 SRR19627925_2_paired_trimmed.fastq \
-S SRR19627925.sam \
--summary-file SRR19627925_alignment_summary.txt
samtools view -bS SRR19627925.sam -o SRR19627925.bam
samtools sort SRR19627925.bam -o SRR19627925_sorted.bam
samtools index SRR19627925_sorted.bam
featureCounts -p -t exon -g gene_id \
-a genomic.gtf -o gene_counts.txt SRR19627925_sorted.bam
After generating the gene count files for each replicate, I created custom scripts to extract and analyze the expression data:
- get_gene_counts.py: Extracts raw counts for specific genes from the featureCounts output files.
  python get_gene_counts.py
- calculate_fpkm.py: Calculates FPKM values from the gene counts and creates comparison tables with visualizations.
  python calculate_fpkm.py
- NOD2_expression.ipynb: Jupyter notebook for exploring and visualizing the results.
  jupyter notebook NOD2_expression.ipynb
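The extraction step can be illustrated with a minimal sketch of how raw counts for selected genes might be pulled from a featureCounts table using pandas. This is not the content of get_gene_counts.py: the function name `extract_counts`, the in-memory example table, the placeholder chromosome names, and every value except the nod2 count and length are illustrative assumptions. The sketch relies only on the featureCounts output layout: one comment line starting with `#`, then the tab-separated columns Geneid, Chr, Start, End, Strand, and Length, followed by one count column per BAM file.

```python
import io

import pandas as pd


def extract_counts(source, genes):
    """Return the Length and count columns for the requested gene IDs."""
    # featureCounts writes one '#' comment line before the header row.
    table = pd.read_csv(source, sep="\t", comment="#", index_col="Geneid")
    # Drop annotation columns that are not needed for FPKM.
    table = table.drop(columns=["Chr", "Start", "End", "Strand"])
    # reindex keeps the requested order and gives NaN rows for absent genes.
    return table.reindex(genes)


# Tiny stand-in for gene_counts.txt (hypothetical apart from the nod2 numbers).
example = (
    "# Program:featureCounts (illustrative comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tSRR19627925_sorted.bam\n"
    "nod2\tchrA\t1\t4561\t+\t4561\t220\n"
    "tbp\tchrB\t1\t2000\t-\t2000\t1500\n"
)

counts = extract_counts(io.StringIO(example), ["nod2", "tbp", "il6"])
print(counts)
```

Passing a file path such as gene_counts.txt instead of the `StringIO` object should work the same way; a gene that is missing from the annotation shows up as a row of NaN values rather than raising an error.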
The analysis focused on the following genes:
FPKM values were calculated using the standard formula:
# Example FPKM calculation for NOD2
NOD2reads = 220
length_bases = 4561
length_kb = length_bases / 1000  # Convert to kilobases
total_reads = 29743387
# FPKM formula (1e6 because the length is already in kilobases)
fpkm = (NOD2reads * 1e6) / (length_kb * total_reads)
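As a cross-check on the units, the same calculation can be written as a small reusable function. This is a sketch rather than the code in calculate_fpkm.py; the function name `fpkm` and its parameter names are assumptions. It takes the gene length in bases, so the scaling factor is 1e9 (1e3 for bases to kilobases times 1e6 for fragments to millions).

```python
def fpkm(fragments, length_bases, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bases * total mapped fragments)."""
    return fragments * 1e9 / (length_bases * total_mapped_fragments)


# Example numbers for NOD2 taken from the calculation above.
nod2_fpkm = fpkm(220, 4561, 29_743_387)
print(f"nod2 FPKM: {nod2_fpkm:.2f}")  # prints "nod2 FPKM: 1.62"
```

With these inputs the result is about 1.62, a single-sample value. If the length is first converted to kilobases, the factor must drop to 1e6 to give the same number.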
The genes were categorized based on their FPKM values:
- Very High: FPKM > 100
- High: 10 < FPKM ≤ 100
- Moderate: 1 < FPKM ≤ 10
- Low: FPKM ≤ 1
Gene | Mean FPKM | Expression Level |
---|---|---|
actb2 | 3221.67 | Very High |
b2m | 767.95 | Very High |
rpl13a | 402.77 | Very High |
actb1 | 324.06 | Very High |
sqstm1 | 81.74 | High |
hif1ab | 28.79 | High |
map1lc3b | 25.64 | High |
tbp | 21.50 | High |
map1lc3a | 15.93 | High |
nod2 | 1.88 | Moderate |
tnfa | 0.23 | Low |
hif1aa | 0.20 | Low |
il6 | 0.03 | Low |
il1b | 0.03 | Low |
gapdh | 0.02 | Low |
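The FPKM thresholds behind the Expression Level column can be sketched as a short function. The name `expression_level` is an assumption for illustration; the cut-offs and the four example values are taken from the categories and table in this document.

```python
def expression_level(fpkm):
    """Map an FPKM value to the expression category used in this document."""
    if fpkm > 100:
        return "Very High"
    if fpkm > 10:
        return "High"
    if fpkm > 1:
        return "Moderate"
    return "Low"


# A few mean FPKM values copied from the table above.
mean_fpkm = {"actb2": 3221.67, "sqstm1": 81.74, "nod2": 1.88, "tnfa": 0.23}
levels = {gene: expression_level(value) for gene, value in mean_fpkm.items()}
print(levels)
```

Checking the boundaries from highest to lowest keeps each interval half-open in the same way as the written definitions, so a value of exactly 10 falls in Moderate and exactly 1 falls in Low.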
The analysis included several visualizations:
- Bar plots of FPKM values across genes
- Heatmap showing expression patterns across replicates
- Coefficient of variation analysis for replicate consistency
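The coefficient of variation (CV) used for replicate consistency is the standard deviation across replicates divided by the mean. The sketch below shows one way to compute it with pandas. The per-replicate FPKM values and the gene labels `gene_a` and `gene_b` are hypothetical placeholders, not results from this analysis, and the use of the sample standard deviation (`ddof=1`) is an assumption about how the CV was defined.

```python
import pandas as pd

# Hypothetical per-replicate FPKM values (rows: genes, columns: replicates).
replicate_fpkm = pd.DataFrame(
    {
        "SRR19627923": [1.6, 20.0],
        "SRR19627924": [1.9, 22.0],
        "SRR19627925": [2.2, 24.0],
    },
    index=["gene_a", "gene_b"],
)

mean = replicate_fpkm.mean(axis=1)
sd = replicate_fpkm.std(axis=1, ddof=1)  # sample standard deviation
cv_percent = 100 * sd / mean             # CV expressed as a percentage

print(cv_percent.round(1))
```

A lower CV indicates closer agreement between replicates. For genes with a mean FPKM near zero the ratio becomes unstable, so such genes may need to be filtered out or interpreted with care.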
This analysis characterizes NOD2 expression in zebrafish under normoxic conditions. The moderate expression level of NOD2 (mean FPKM ≈ 1.9) shows that the gene is transcribed in the zebrafish liver (ZFL) cell line under normoxic conditions and suggests that it may play a functional role there.
The expression patterns observed largely align with expectations: housekeeping genes are highly expressed, with the exception of gapdh (mean FPKM 0.02); inflammatory genes (tnfa, il6, il1b) show low expression in normoxia; and autophagy markers show basal expression.
[1] Eltzschig HK, Carmeliet P. Hypoxia and inflammation. N Engl J Med. 2011 Feb 17;364(7):656-65. doi: 10.1056/NEJMra0910283. PMID: 21323543; PMCID: PMC3930928.