HiC-TE: a Nextflow pipeline for HiC data analysis sheds light on the role of repeats in genome organization
1 Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, Brno, Czech Republic
2 Department of Plant Genomics, Biophysical Institute of the Czech Academy of Sciences, Brno, Czech Republic
Abstract
- High-throughput chromosome conformation capture (Hi-C) detects physical proximity of DNA segments
- Hi-C experiments are now available in public repositories (e.g. NCBI SRA)
- We combined these data with Tandem Repeat Finder, PlantSat database, TE-greedy-nester and Repeat Explorer 2
- We built a Nextflow pipeline that maps and clusters reads to identify Hi-C pairs in specific repeat classes
- Results are visualized as heatmaps and circular chromosome plots, and exported as data tables
- First experiments show biologically important interactions, such as those of ribosomal DNA clusters or centromeric repeats, that are clearly visible in most plant species
- LTR retrotransposon families with high interaction rates are often species-specific
- The pipeline represents a novel and reproducible way to analyze the role of repetitive elements in the 3D organization of genomes
Objectives
- Map repeats in the reference genome (TE-greedy-nester) or in the reads (Repeat Explorer 2).
- Find Hi-C reads that overlap with annotated repeats.
- Count them by repeat family and bin them by distance (local, far and interchromosomal contacts).
- Generate heatmaps and circos plots showing repeat families.
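The counting and binning objectives can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the 20 kb cutoff between local and far contacts, the requirement that both ends of a pair fall inside an annotated repeat, the family names and all function names are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical cutoff between "local" and "far" intrachromosomal contacts;
# the value used by the pipeline is not stated in the text.
LOCAL_MAX_BP = 20_000


def find_family(chrom, pos, repeats):
    """Return the family of the first annotated repeat covering (chrom, pos), or None."""
    for r_chrom, start, end, family in repeats:
        if r_chrom == chrom and start <= pos <= end:
            return family
    return None


def distance_bin(chrom_a, pos_a, chrom_b, pos_b, local_max=LOCAL_MAX_BP):
    """Classify a read pair as local, far or interchromosomal."""
    if chrom_a != chrom_b:
        return "interchromosomal"
    return "local" if abs(pos_a - pos_b) <= local_max else "far"


def count_contacts(pairs, repeats):
    """Count Hi-C pairs per (family pair, distance bin)."""
    counts = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        fam_a = find_family(chrom_a, pos_a, repeats)
        fam_b = find_family(chrom_b, pos_b, repeats)
        if fam_a is None or fam_b is None:
            continue  # assumption: keep only pairs with both ends in repeats
        families = tuple(sorted((fam_a, fam_b)))  # order-independent key
        counts[(families, distance_bin(chrom_a, pos_a, chrom_b, pos_b))] += 1
    return counts


# Toy annotation: (chromosome, start, end, family); names are made up.
repeats = [
    ("chr1", 100, 500, "LTR_family_A"),
    ("chr1", 50_000, 51_000, "rDNA"),
    ("chr2", 200, 900, "rDNA"),
]

# Toy Hi-C pairs: (chromosome A, position A, chromosome B, position B).
pairs = [
    ("chr1", 150, "chr1", 400),
    ("chr1", 300, "chr1", 50_500),
    ("chr1", 50_100, "chr2", 500),
    ("chr1", 10_000, "chr2", 300),  # first end lies outside any repeat
]

counts = count_contacts(pairs, repeats)
for (families, bin_name), n in sorted(counts.items()):
    print(families, bin_name, n)
```

The resulting table of counts per family pair and distance bin is the kind of input a heatmap or circos plot could be drawn from. For real data, the linear scan in `find_family` would need to be replaced by an interval index.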
Hi-C data analysis with a focus on transposable elements (TEs): contacts between repeats can be visualized with our Nextflow pipeline