Overview

This project investigates the DNA binding patterns of GRHL2 (Grainyhead-like protein 2), a transcription factor implicated in cancer progression. Using microarray data and unsupervised machine learning techniques, I analyzed binding motifs to better understand how GRHL2 interacts with DNA.

Key Components

  • Data Processing: Cleaned and normalized microarray data to identify binding regions
  • Motif Analysis: Applied unsupervised learning to discover potential binding motifs
  • Visualization: Created interactive visualizations of binding patterns and motif clusters
  • Integration: Connected findings with known GRHL2 functions in cancer pathways

Technical Details

  • Languages: Python, R
  • Key Libraries: scikit-learn, BioPython, ggplot2
  • Analysis Methods:
    • K-means clustering
    • Position Weight Matrices (PWM)
    • Sequence logo generation
    • Statistical significance testing

Results

The analysis revealed several interesting patterns in GRHL2 binding preferences, including:

  • Correlation between binding strength and specific sequence features
  • Clustering of binding sites with similar characteristics