Commit 52eac3cb authored by Matej Lexa
Added Repeat Masker notes and extra heatmap visualization parameters after manuscript revision

parent 6511372f
@@ -86,6 +86,12 @@ nextflow run -profile test,singularity"
**TE-greedy-nester settings**
gff_suffix = "_genome_browser"
**Use Repeat Masker output instead**
repeat_masker_gff = ""
repeat_masker_out = ""
@@ -116,6 +122,11 @@ nextflow run -profile test,singularity"
clustering_threshold = 0.01
RE_run = "RE_output_" + params.sra_run + "_" + params.clustering_threshold
**heatmap visualization**
norm_ratio_threshold = 2
min_fam_pair_count = 30
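The commit adds these two heatmap parameters without describing how they are applied. As a hedged illustration only, the Python sketch below shows one way such thresholds could filter repeat-family pairs before plotting. It assumes that `min_fam_pair_count` is the minimum number of HiC pairs supporting a family pair and that `norm_ratio_threshold` is a minimum normalized (for example, observed/expected) contact ratio. The `FamilyPair` record and `select_for_heatmap` function are hypothetical, and the pipeline may apply the thresholds differently (for example, symmetrically, so that depleted pairs are also kept).

```python
from typing import List, NamedTuple


class FamilyPair(NamedTuple):
    """Hypothetical row of a repeat-family contact table (field names are assumptions)."""
    fam_a: str
    fam_b: str
    pair_count: int    # assumed: number of HiC read pairs linking the two families
    norm_ratio: float  # assumed: normalized contact ratio, e.g. observed / expected


def select_for_heatmap(
    pairs: List[FamilyPair],
    norm_ratio_threshold: float = 2.0,
    min_fam_pair_count: int = 30,
) -> List[FamilyPair]:
    """Keep family pairs with enough supporting HiC pairs and a high enough ratio."""
    return [
        p
        for p in pairs
        if p.pair_count >= min_fam_pair_count and p.norm_ratio >= norm_ratio_threshold
    ]


# Illustrative input only; these numbers are made up, not pipeline results.
example_pairs = [
    FamilyPair("famA", "famB", 120, 2.5),  # kept: enough pairs and ratio >= 2
    FamilyPair("famA", "famC", 12, 5.0),   # dropped: fewer than 30 pairs
    FamilyPair("famB", "famC", 80, 1.1),   # dropped: ratio below 2
]
kept = select_for_heatmap(example_pairs)
```

The defaults mirror the values set in the config (2 and 30), so changing either parameter in the config would correspond to passing a different keyword argument here.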
@@ -147,6 +158,10 @@ While creating the HiC pair table, a parallel table is made, which uses chromoso
While the pipeline will happily run on any HiC data and the corresponding reference genome, there are some limitations when running the vanilla GitLab version in this way. Repeat classification by Repeat Explorer, TE-greedy-nester and the internal BLAST annotation scripts is plant-oriented and uses the Neumann et al. classification scheme, and TE-greedy-nester enriches the annotations for LTR retrotransposons. Alternative repeat annotations may be preferable for other organisms. Tandem repeats are supplied as input in the PlantSat (other_annotations) file. Because this file is mapped onto the reference genome, any sequence can be added to it in FASTA format.
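Since the PlantSat file is plain FASTA, extra sequences can be appended with any FASTA-aware tool. The Python sketch below is a minimal illustration that formats one record and appends it to a file. The function names and the example record name are hypothetical, and any header naming convention the pipeline may expect for classification is not covered here.

```python
def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence every `width` characters."""
    seq = "".join(seq.split()).upper()  # remove whitespace/newlines, normalize case
    if not name or not seq:
        raise ValueError("a FASTA record needs a non-empty name and sequence")
    lines = [seq[i : i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def append_to_fasta(path: str, name: str, seq: str, width: int = 60) -> None:
    """Append one record to an existing (or new) FASTA file, e.g. a PlantSat file."""
    with open(path, "a", encoding="ascii") as fh:
        fh.write(fasta_record(name, seq, width))
```

For example, `append_to_fasta("PlantSat.fa", "my_tandem_repeat_1", monomer_sequence)` would add one record (both names are placeholders). This assumes the existing file ends with a newline; otherwise the new header would be glued to the last sequence line.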
**using Repeat Masker for reference-based annotation instead of TE-greedy-nester**
To make the analysis more meaningful for animal species, where LTR retrotransposons are not the main category of repeats, or to annotate repeat classes beyond the LTR retrotransposons annotated by TE-greedy-nester, the main reference-based repeat annotation can instead be provided as a GFF3 file. The pipeline is specifically tuned to accept a combination of *.out and *.gff files from RepeatMasker, but can be adapted to other annotation sources. The main requirements are that the GFF3 file contains an annot="repeat_family" attribute and that the corresponding Perl script can add that name from the available output (here, the *.out and *.gff files produced by Repeat Masker and by UCSC Genome Browser bed_to_gff3 or GenomeTools gt bed2gff3).
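The Perl script that adds the annot attribute is not shown here. As a hedged Python illustration of the same idea, the sketch below reads the class/family column (column 11) of a RepeatMasker *.out file, keys it by sequence, start and end, and appends it as annot=... to matching GFF3 features. It assumes that the GFF3 features carry the same 1-based coordinates as the *.out hits (which holds when a BED-to-GFF3 conversion shifts BED's 0-based starts correctly). The function names, the exact-coordinate join, and the use of the class/family column as the annot value are all assumptions, not the pipeline's actual implementation.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive


def parse_rm_out(lines: Iterable[str]) -> Dict[Key, str]:
    """Map each RepeatMasker .out hit to its class/family column (e.g. "LTR/Gypsy")."""
    families: Dict[Key, str] = {}
    for line in lines:
        fields = line.split()
        # Data lines start with an integer SW score; header lines and blanks do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        seqid, start, end = fields[4], int(fields[5]), int(fields[6])
        families[(seqid, start, end)] = fields[10]
    return families


def add_annot(gff_line: str, families: Dict[Key, str]) -> str:
    """Append annot=<class/family> to a GFF3 feature whose coordinates match a .out hit."""
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line  # comments, directives and blank lines pass through unchanged
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return gff_line  # not a 9-column GFF3 feature line
    family = families.get((cols[0], int(cols[3]), int(cols[4])))
    if family is None:
        return gff_line  # no exact coordinate match: leave the feature as it is
    attrs = "" if cols[8] == "." else cols[8]
    cols[8] = f"{attrs};annot={family}" if attrs else f"annot={family}"
    return "\t".join(cols) + "\n"


# Illustrative, made-up .out lines (two header lines, a blank line, two hits).
out_lines = [
    "   SW   perc perc perc  query     position in query   matching  repeat",
    "score   div. del. ins.  sequence  begin  end  (left)  repeat  class/family",
    "",
    "  463   1.3  0.6  1.7  chr1  1001  1468  (5000) +  (CCCTAA)n  Simple_repeat  1  463  (0)  1",
    " 2100  12.0  1.5  0.8  chr1  2001  3200  (3268) C  LTR_element_1  LTR/Gypsy  (50)  1250  60  2",
]
families = parse_rm_out(out_lines)
annotated = add_annot("chr1\tRepeatMasker\trepeat_region\t2001\t3200\t2100\t-\t.\tID=rm2\n", families)
```

An exact join is the simplest choice; if the *.gff and *.out files were produced by different tools, an overlap-based match may be more robust.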
**envisioned modifications**
+ adding tandem repeats to PlantSat.fa
