Author: Noah Nicol
Date: March 2025
This document outlines the RNA-seq analysis process for three zebrafish normoxia samples (SRR19627923, SRR19627924, SRR19627925). The analysis includes data acquisition, quality control, alignment, expression quantification, and visualization, with a specific focus on NOD2 and housekeeping genes for comparison.
The analysis extracts raw gene counts from featureCounts output files, calculates FPKM (fragments per kilobase of transcript per million mapped fragments) values, and compares expression levels across the three RNA-seq replicates.
Install the command-line tools with conda:

    conda install -c bioconda sra-tools
    conda install -c bioconda fastqc
    conda install -c bioconda trimmomatic
    conda install -c bioconda hisat2
    conda install -c bioconda samtools
    conda install -c bioconda subread

Install the Python packages using pip or conda:

    pip install matplotlib seaborn pandas numpy jupyter

Download the reads (shown here for SRR19627925):

    prefetch SRR19627925
    fasterq-dump SRR19627925

Run quality control on the paired FASTQ files:

    fastqc SRR19627925_1.fastq SRR19627925_2.fastq

Note: Because these samples are standard bulk RNA-seq reads, I did not
deduplicate them (deduplication is generally used to reduce PCR
amplification bias). FastQC raised two relevant warnings: ‘Per base
sequence content’, which matched what I would expect, and ‘Adapter
Content’, which I tried to address but could not, because I was unable
to identify the specific sequence corresponding to the Illumina
Universal Adapter.
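For the adapter warning, one possible starting point is the 13-base prefix that FastQC lists under the name "Illumina Universal Adapter", AGATCGGAAGAGC. The sketch below was not part of the original analysis. It shows one way to estimate what fraction of reads contain that prefix before deciding whether adapter clipping is worthwhile. The function name and the two example reads are illustrative assumptions.

```python
import io

# Assumption: the 13-base prefix FastQC lists as "Illumina Universal Adapter".
ILLUMINA_UNIVERSAL_ADAPTER = "AGATCGGAAGAGC"


def adapter_fraction(fastq_handle, adapter=ILLUMINA_UNIVERSAL_ADAPTER):
    """Return the fraction of reads whose sequence contains `adapter`."""
    total = 0
    hits = 0
    for line_number, line in enumerate(fastq_handle):
        # In a four-line FASTQ record, the sequence is the second line.
        if line_number % 4 == 1:
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0


# Tiny made-up FASTQ text, for illustration only (not real reads).
example_fastq = io.StringIO(
    "@read1\nACGTACGTAGATCGGAAGAGCACACG\n+\nIIIIIIIIIIIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGTACGTACGTAC\n+\nIIIIIIIIIIIIIIIIIIIIIIIIII\n"
)
example_fraction = adapter_fraction(example_fastq)
print(f"{example_fraction:.2f}")
```

On real data the handle would be an opened FASTQ file such as SRR19627925_1.fastq. If the fraction turns out to be non-trivial, Trimmomatic's ILLUMINACLIP step, which takes an adapter FASTA file, is the usual place to handle it.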
Trim and filter the reads by quality with Trimmomatic:

    trimmomatic PE SRR19627925_1.fastq SRR19627925_2.fastq \
        SRR19627925_1_paired_trimmed.fastq SRR19627925_1_unpaired_trimmed.fastq \
        SRR19627925_2_paired_trimmed.fastq SRR19627925_2_unpaired_trimmed.fastq \
        AVGQUAL:20 TRAILING:20 MINLEN:50

Download the GRCz11 reference genome and build the HISAT2 index:

    wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/GCF_000002035.6_GRCz11_genomic.fna.gz
    gunzip GCF_000002035.6_GRCz11_genomic.fna.gz
    hisat2-build GCF_000002035.6_GRCz11_genomic.fna zebrafish_index

Align the trimmed read pairs:

    hisat2 -x zebrafish_index \
        -1 SRR19627925_1_paired_trimmed.fastq \
        -2 SRR19627925_2_paired_trimmed.fastq \
        -S SRR19627925.sam \
        --summary-file SRR19627925_alignment_summary.txt

Convert the alignments to BAM, then sort and index them:

    samtools view -bS SRR19627925.sam -o SRR19627925.bam
    samtools sort SRR19627925.bam -o SRR19627925_sorted.bam
    samtools index SRR19627925_sorted.bam

Generate gene-level counts with featureCounts:

    featureCounts -p -t exon -g gene_id \
        -a genomic.gtf -o gene_counts.txt SRR19627925_sorted.bam

After generating the gene count files for each replicate, I created custom scripts to extract and analyze the expression data:
- get_gene_counts.py: extracts the raw counts for specific genes from the featureCounts output files.

      python get_gene_counts.py

- calculate_fpkm.py: calculates FPKM values from the gene counts and creates comparison tables with visualizations.

      python calculate_fpkm.py

- NOD2_expression.ipynb: Jupyter notebook for exploring and visualizing the results.

      jupyter notebook NOD2_expression.ipynb

The analysis focused on NOD2 alongside housekeeping, inflammatory, and autophagy genes; the full set is listed in the results table.
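To make the count-extraction step concrete, here is a minimal sketch of how raw counts for a chosen gene set might be pulled from featureCounts output. It is not the contents of get_gene_counts.py, which are not shown in this document. The function name, the gene names, and the example values are all hypothetical. The sketch assumes the usual featureCounts layout: a leading comment line, a tab-separated header, and one count column for a single BAM file.

```python
import csv
import io


def extract_counts(handle, target_genes):
    """Return {gene_id: (length, count)} for the genes in `target_genes`.

    Assumes featureCounts output for a single BAM file: a '#' comment line,
    then a header (Geneid, Chr, Start, End, Strand, Length, <bam name>).
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    length_col = header.index("Length")
    count_col = len(header) - 1  # single-sample run: last column holds counts
    wanted = set(target_genes)
    result = {}
    for fields in reader:
        if fields and fields[0] in wanted:
            result[fields[0]] = (int(fields[length_col]), int(fields[count_col]))
    return result


# Made-up two-gene example in featureCounts format (placeholder values).
example = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tSRR19627925_sorted.bam\n"
    "geneA\tchr1\t100\t1100\t+\t1001\t42\n"
    "geneB\tchr2\t500\t900\t-\t401\t7\n"
)
counts = extract_counts(example, ["geneB"])
print(counts)
```

Returning the gene length together with the count keeps both inputs to the FPKM calculation in one place.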
FPKM (fragments per kilobase of transcript per million mapped fragments) values were calculated using the standard formula:

    # Example FPKM calculation for NOD2
    nod2_reads = 220
    length_bases = 4561
    length_kb = length_bases / 1000  # Convert to kilobases
    total_mapped = 29743387  # Library size used for normalization
    # FPKM formula: fragments per kilobase, per million mapped fragments
    fpkm = (nod2_reads * 1e6) / (length_kb * total_mapped)  # ~1.62 for these inputs

The genes were categorized based on their FPKM values:

- Very High: FPKM > 100
- High: 10 < FPKM ≤ 100
- Moderate: 1 < FPKM ≤ 10
- Low: FPKM ≤ 1

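The formula and the category thresholds can be wrapped into two small helpers, sketched below. The function names are illustrative, and this is not the code in calculate_fpkm.py. The sketch uses the bases form of the formula (1e9 in the numerator, gene length in bases), which is equivalent to dividing by gene length in kilobases and by millions of mapped fragments.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bases * total mapped fragments)."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def expression_level(value):
    """Map an FPKM value to the categories used in this document."""
    if value > 100:
        return "Very High"
    if value > 10:
        return "High"
    if value > 1:
        return "Moderate"
    return "Low"


# NOD2 inputs taken from the worked example in this section.
nod2_fpkm = fpkm(220, 4561, 29743387)
print(round(nod2_fpkm, 2), expression_level(nod2_fpkm))
```

For the NOD2 inputs this gives roughly 1.62, which falls in the Moderate band. The table reports the mean across replicates, so a single-sample value need not match it exactly.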
| Gene | Mean FPKM | Expression Level |
|---|---|---|
| actb2 | 3221.67 | Very High |
| b2m | 767.95 | Very High |
| rpl13a | 402.77 | Very High |
| actb1 | 324.06 | Very High |
| sqstm1 | 81.74 | High |
| hif1ab | 28.79 | High |
| map1lc3b | 25.64 | High |
| tbp | 21.50 | High |
| map1lc3a | 15.93 | High |
| nod2 | 1.88 | Moderate |
| tnfa | 0.23 | Low |
| hif1aa | 0.20 | Low |
| il6 | 0.03 | Low |
| il1b | 0.03 | Low |
| gapdh | 0.02 | Low |

The analysis included several visualizations:

- Bar plots of FPKM values across genes
- Heatmap showing expression patterns across replicates
- Coefficient of variation analysis for replicate consistency

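As a rough illustration of the replicate-consistency check, the coefficient of variation (CV) is the standard deviation divided by the mean. The sketch below uses the sample standard deviation and placeholder gene names and values, not the measured FPKMs. The exact definition used in the original scripts is not shown in this document, so treat this as one reasonable choice.

```python
import statistics


def coefficient_of_variation(values):
    """Return sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return statistics.stdev(values) / mean * 100


# Placeholder per-replicate FPKM values, NOT the measured data.
replicate_fpkm = {
    "geneA": [10.0, 12.0, 14.0],
    "geneB": [0.5, 0.5, 0.5],
}
cv_by_gene = {
    gene: coefficient_of_variation(values)
    for gene, values in replicate_fpkm.items()
}
for gene, cv in cv_by_gene.items():
    print(f"{gene}: CV = {cv:.1f}%")
```

With only three replicates the estimate is noisy, and genes with very low FPKM tend to show inflated CVs, so the values are best read alongside the expression level.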
This analysis characterizes NOD2 expression in zebrafish under normoxic conditions. The moderate expression level of NOD2 (mean FPKM ≈ 1.9) is consistent with a functional role in the ZFL line under atmospheric oxygen conditions.
The expression patterns largely align with expectations: most housekeeping genes are highly expressed, inflammatory genes (tnfa, il6, il1b) show low expression in normoxia, and autophagy markers (sqstm1, map1lc3a, map1lc3b) show basal expression. One exception is gapdh, which falls in the Low category (mean FPKM 0.02) despite being a housekeeping gene.