HiC-TE: a Nextflow pipeline for HiC data analysis sheds light on the role of repeats in genome organization


Matej Lexa1, Monika Cechova1, Son Hoang Nguyen2, Pavel Jedlicka2, Zdenek Kubat2, Roman Hobza2, Eduard Kejnovsky2

1 Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, Brno, Czech Republic
2 Department of Plant Genomics, Biophysical Institute of the Czech Academy of Sciences, Brno, Czech Republic

Abstract

  • High-throughput chromosome conformation capture (Hi-C) detects physical proximity of DNA segments
  • Hi-C experiments are now available in public repositories (e.g. NCBI SRA)
  • We combined these data with repeat annotations from Tandem Repeat Finder, the PlantSat database, TE-greedy-nester and Repeat Explorer 2
  • We built a Nextflow pipeline that maps and clusters reads to identify Hi-C pairs in specific repeat classes
  • Results are visualized as heatmaps and circular chromosome plots, and exported as data tables
  • First experiments show biologically important interactions of ribosomal DNA clusters or centromeric repeats that are clearly visible in most plant species
  • LTR retrotransposon families with high interaction rates are often species-specific
  • The pipeline offers a novel and reproducible way to analyze the role of repetitive elements in the 3D organization of genomes

 

Objectives

  1. Map repeats in the reference genome (TE-greedy-nester) or in the reads (Repeat Explorer 2).
  2. Find Hi-C reads that overlap annotated repeats.
  3. Count them by repeat family and bin the contacts by distance (local, far and interchromosomal).
  4. Generate heatmaps and circos plots of contacts between repeat families.
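
Objectives 2 and 3 can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the annotation layout, the input format (one tuple of two mapped positions per Hi-C pair), the toy family names and the 100 kb boundary between local and far contacts are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical repeat annotation: chromosome -> sorted, non-overlapping
# half-open intervals (start, end, family). Family names are made up.
ANNOTATION = {
    "chr1": [(100, 500, "LTR_A"), (2_000, 2_600, "rDNA"),
             (900_000, 901_000, "centromeric")],
    "chr2": [(50, 400, "rDNA")],
}

# Assumed boundary between "local" and "far" intrachromosomal contacts.
LOCAL_MAX = 100_000


def family_at(chrom, pos):
    """Return the repeat family covering pos, or None if pos is not in a repeat."""
    intervals = ANNOTATION.get(chrom, [])
    starts = [start for start, _, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return intervals[i][2]
    return None


def distance_bin(chrom_a, pos_a, chrom_b, pos_b):
    """Classify a contact as local, far or interchromosomal."""
    if chrom_a != chrom_b:
        return "interchromosomal"
    return "local" if abs(pos_a - pos_b) <= LOCAL_MAX else "far"


def count_contacts(pairs):
    """Count Hi-C pairs whose two ends both fall in annotated repeats."""
    counts = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        fam_a = family_at(chrom_a, pos_a)
        fam_b = family_at(chrom_b, pos_b)
        if fam_a is None or fam_b is None:
            continue  # at least one end lies outside annotated repeats
        first, second = sorted((fam_a, fam_b))  # family order is irrelevant
        counts[(first, second, distance_bin(chrom_a, pos_a, chrom_b, pos_b))] += 1
    return counts


# Toy Hi-C pairs: (chromosome A, position A, chromosome B, position B).
PAIRS = [
    ("chr1", 150, "chr1", 2_100),
    ("chr1", 2_100, "chr2", 60),
    ("chr1", 120, "chr1", 900_500),
    ("chr1", 700, "chr2", 60),      # first end is not in a repeat
    ("chr1", 2_500, "chr1", 300),
]

counts = count_contacts(PAIRS)
for key, n in sorted(counts.items()):
    print(key, n)
```

The resulting counter, keyed by family pair and distance bin, is the kind of table from which a family-by-family heatmap could be drawn. A real run would read positions from aligned reads rather than from a hard-coded list.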

Methods

 

Nextflow pipeline execution


Nextflow pipeline graph


Results

Repeat family heatmap

Repeat family pair circos plot

Hi-C data analysis with a focus on TEs: contacts between repeats can be visualized with our Nextflow pipeline.