Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA

Gergely Nagy, Erik Czipa, László Steiner, Tibor Nagy, Sándor Pongor, L. Nagy, E. Barta

Research output: Contribution to journal, Article

11 Citations (Scopus)

Abstract

Background: ChIP-seq provides a wealth of information on the approximate location of DNA-binding proteins genome-wide. The targeted motifs are known to lie at the peak centers in most cases. High-resolution mapping of ChIP-seq peaks could in principle allow fine mapping of the protein constituents within protein complexes, but current ChIP-seq analysis pipelines do not target basepair-resolution, strand-specific mapping of peak summits.

Results: The approach proposed here is based on i) locating regions that are bound by a sufficient number of proteins constituting a complex; ii) determining the position of the underlying motif using either a direct or a de novo motif search approach; and iii) determining the exact location of the peak summits with respect to the binding motif in a strand-specific manner. We applied this method to analyze the CTCF/cohesin complex, which holds together DNA loops. The relative positions of the constituents of the complex were determined with an estimated accuracy of one basepair. Mapping the positions onto a 3D model of DNA made it possible to deduce the approximate local topology of the complex, which allowed us to predict how the CTCF/cohesin complex locks the DNA loops. As the positioning of the proteins was not compatible with previous models of loop closure, we proposed a plausible "double embrace" model in which the DNA loop is held together by two adjacent cohesin rings in such a way that the ring anchored by CTCF to one DNA duplex encircles the other DNA double helix and vice versa.

Conclusions: A motif-centered, strand-specific analysis of ChIP-seq data improves the accuracy of determining peak positions. If a genome contains a large number of binding sites for a given protein complex, such as transcription factor heterodimers or transcription factor/cofactor complexes, the relative position of the constituent proteins on the DNA can be established with an accuracy that allows one to deduce the local topology of the protein complex. The proposed high-resolution mapping approach is applicable to detecting the contact topology of DNA-binding protein complexes.
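
Step iii of the approach, expressing each peak summit relative to its binding motif, can be pictured with a small Python sketch. This is an illustration only, not the authors' pipeline: the `Motif` record, the function names, and the example coordinates are all hypothetical, and it covers only the motif orientation, not the separate handling of forward and reverse read strands. It flips the sign of the summit-to-motif distance for motifs on the minus strand, so that offsets from many sites share one orientation and can be pooled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Hypothetical motif hit; coordinates are 0-based, end-exclusive."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def summit_offset(motif: Motif, summit: int) -> int:
    """Signed distance in bp from the motif midpoint to a peak summit.

    The sign is expressed in the motif's own orientation: positive values
    lie downstream of the midpoint when reading along the motif strand.
    For even-length motifs the midpoint is rounded down in genomic
    coordinates (a simplification of this sketch).
    """
    center = (motif.start + motif.end - 1) // 2
    delta = summit - center
    return delta if motif.strand == "+" else -delta


def offset_profile(pairs):
    """Count how often each motif-relative offset occurs across sites."""
    return Counter(summit_offset(motif, summit) for motif, summit in pairs)


def modal_offset(pairs):
    """Most frequent offset; ties go to the offset closest to the motif.

    Raises ValueError if `pairs` is empty.
    """
    profile = offset_profile(pairs)
    return max(profile.items(), key=lambda kv: (kv[1], -abs(kv[0])))[0]


# Made-up example sites: two motifs on opposite strands whose summits sit
# at the same motif-relative position, plus one summit elsewhere.
sites = [
    (Motif("chr1", 100, 119, "+"), 120),
    (Motif("chr1", 500, 519, "-"), 498),
    (Motif("chr1", 100, 119, "+"), 105),
]
print(modal_offset(sites))  # prints 11
```

Pooling offsets over many sites is what makes a sharp, basepair-level estimate plausible even when individual summits are noisy; the mode is used here only as the simplest possible summary.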

Original language: English
Article number: 637
Journal: BMC Genomics
Volume: 17
Issue number: 1
DOIs: 10.1186/s12864-016-2940-7
Publication status: Published - Aug 15 2016

Fingerprint

DNA
Proteins
DNA-Binding Proteins
Transcription Factors
Genome
cohesins
CCCTC-binding factor
Binding Sites

Keywords

  • ChIP-seq
  • Cohesin
  • CTCF
  • DNA loop

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA. / Nagy, Gergely; Czipa, Erik; Steiner, László; Nagy, Tibor; Pongor, Sándor; Nagy, L.; Barta, E.

In: BMC Genomics, Vol. 17, No. 1, 637, 15.08.2016.

Nagy, Gergely ; Czipa, Erik ; Steiner, László ; Nagy, Tibor ; Pongor, Sándor ; Nagy, L. ; Barta, E. / Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA. In: BMC Genomics. 2016 ; Vol. 17, No. 1.
@article{ba1ae7a979b94e0d9e3a58a26148607b,
title = "Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA",
keywords = "ChIP-seq, Cohesin, CTCF, DNA loop",
author = "Gergely Nagy and Erik Czipa and L{\'a}szl{\'o} Steiner and Tibor Nagy and S{\'a}ndor Pongor and L. Nagy and E. Barta",
year = "2016",
month = "8",
day = "15",
doi = "10.1186/s12864-016-2940-7",
language = "English",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Motif oriented high-resolution analysis of ChIP-seq data reveals the topological order of CTCF and cohesin proteins on DNA

AU - Nagy, Gergely

AU - Czipa, Erik

AU - Steiner, László

AU - Nagy, Tibor

AU - Pongor, Sándor

AU - Nagy, L.

AU - Barta, E.

PY - 2016/8/15

Y1 - 2016/8/15

KW - ChIP-seq

KW - Cohesin

KW - CTCF

KW - DNA loop

UR - http://www.scopus.com/inward/record.url?scp=84982091042&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84982091042&partnerID=8YFLogxK

U2 - 10.1186/s12864-016-2940-7

DO - 10.1186/s12864-016-2940-7

M3 - Article

C2 - 27526722

AN - SCOPUS:84982091042

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 637

ER -