|
|
This project explores ways to evaluate contacts involving repetitive sequences in genomes using Hi-C data. Traditional Hi-C data processing relies on mapping paired Hi-C reads to a reference genome and inferring the genome's 3D organization from the resulting contact maps. Repeat sequences and multi-mapping reads are typically ignored in these approaches.
|
|
|
|
|
|
Our first idea was to group Hi-C reads by repeat family to see whether these families interact. Many, of course, do. However, for this to be biologically interesting, two families should interact more (or less) often than the background rate would predict. We therefore try to normalize the number of interactions and estimate how much the rate is elevated (or decreased). Although we initially used consensus sequences to classify reads by repeat family, we later switched to RepeatExplorer for this step.
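As a rough illustration of this normalization, the sketch below computes a log2 observed/expected ratio for each pair of repeat families. It is not the project's actual implementation. The function name `family_enrichment`, the input format (one `(family_a, family_b)` tuple per classified read pair), the pseudocount, and the background model are all assumptions made for the example. The background model treats the two ends of a read pair as independent draws from the overall family frequencies and ignores genomic distance and other Hi-C biases.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def family_enrichment(pairs, pseudocount=0.5):
    """Return log2(observed / expected) for every unordered family pair.

    pairs: iterable of (family_a, family_b), one tuple per Hi-C read pair
           whose two ends were both classified (hypothetical input format).
    pseudocount: assumed smoothing value that avoids division by zero.
    """
    pair_counts = Counter()  # observed counts per unordered family pair
    end_counts = Counter()   # how often each family occurs at a read end
    for a, b in pairs:
        pair_counts[tuple(sorted((a, b)))] += 1
        end_counts[a] += 1
        end_counts[b] += 1

    n_pairs = sum(pair_counts.values())
    n_ends = 2 * n_pairs

    enrichment = {}
    for a, b in combinations_with_replacement(sorted(end_counts), 2):
        p_a = end_counts[a] / n_ends
        p_b = end_counts[b] / n_ends
        # Assumed background: independent ends. An unordered pair of two
        # different families can arise in two orders, hence the factor 2.
        expected = n_pairs * p_a * p_b * (1 if a == b else 2)
        observed = pair_counts[(a, b)]
        enrichment[(a, b)] = math.log2(
            (observed + pseudocount) / (expected + pseudocount)
        )
    return enrichment
```

Positive values indicate more contacts than this simple background predicts, negative values fewer. A more realistic background would also need to account for the distance-dependent decay of Hi-C contacts.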
|
|
|
|
|
|
This approach seems to provide valid biological insights, as exemplified by the interactions uncovered between 18S rDNA and 25S rDNA. However, it relies on only a handful of Hi-C reads that happen to be repetitive enough to allow classification and, at the same time, unique enough to map to the reference genome. Such a small, non-random subset of reads may bias the results. As an improvement, we may be able to use Hi-C reads that map in close proximity to repetitive sequences, which would achieve the same goal with a higher number of valid read pairs.
|
|
|
|
|
|
An extreme version of the above improvement would be to take the results of a traditional Hi-C experiment and assign the contacts to all repetitive sequences in close proximity (in sequence, not in 3D). This approach would not require read clustering, only an annotation of the reference genome, and would be a radical change from what we set out to do initially (see above).
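The annotation-based idea can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing part of the project. The helper names (`build_index`, `nearby_families`, `assign_contacts`), the annotation format (`(chrom, start, end, family)` with half-open coordinates), the contact format (`((chrom1, pos1), (chrom2, pos2))`), and the default window size are all hypothetical.

```python
from collections import Counter, defaultdict


def build_index(annotations):
    """Group repeat annotations by chromosome, sorted by start position.

    annotations: iterable of (chrom, start, end, family) with half-open
                 [start, end) coordinates (assumed format).
    """
    index = defaultdict(list)
    for chrom, start, end, family in annotations:
        index[chrom].append((start, end, family))
    for intervals in index.values():
        intervals.sort()
    return index


def nearby_families(index, chrom, pos, window):
    """Return the families annotated within `window` bp of `pos`."""
    found = set()
    for start, end, family in index.get(chrom, []):
        if start > pos + window:
            break  # intervals are sorted by start, so none further can match
        if end > pos - window:
            found.add(family)
    return found


def assign_contacts(contacts, index, window=1000):
    """Count contacts per unordered pair of nearby repeat families.

    contacts: iterable of ((chrom1, pos1), (chrom2, pos2)), one per valid
              Hi-C read pair (assumed format).
    window: assumed proximity threshold in bp (linear sequence, not 3D).
    """
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in contacts:
        families1 = nearby_families(index, chrom1, pos1, window)
        families2 = nearby_families(index, chrom2, pos2, window)
        # A contact is credited to every combination of nearby families.
        for a in families1:
            for b in families2:
                counts[tuple(sorted((a, b)))] += 1
    return counts
```

The linear scan per lookup keeps the sketch short. For genome-scale annotations, an interval tree or a binned index would likely be a better choice.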
|
|