
Ay et al. BMC Genomics (2015) 16:121
DOI 10.1186/s12864-015-1236-7
RESEARCH ARTICLE Open Access

Identifying multi-locus chromatin contacts in human cells using tethered multiple 3C

Ferhat Ay1, Thanh H Vu2, Michael J Zeitz2, Nelle Varoquaux3,4,5, Jan E Carette6, Jean-Philippe Vert3,4,5, Andrew R Hoffman2* and William S Noble1,7*

Abstract
Background: Several recently developed experimental methods, each an extension of the chromatin conformation capture (3C) assay, have enabled the genome-wide profiling of chromatin contacts between pairs of genomic loci in 3D. Especially in complex eukaryotes, data generated by these methods, coupled with other genome-wide datasets, demonstrated that non-random chromatin folding correlates strongly with cellular processes such as gene expression and DNA replication.
Results: We describe a genome architecture assay, tethered multiple 3C (TM3C), that maps genome-wide chromatin contacts via a simple protocol of restriction enzyme digestion and religation of fragments upon agarose gel beads, followed by paired-end sequencing. In addition to identifying contacts between pairs of loci, TM3C enables identification of contacts among more than two loci simultaneously. We use TM3C to assay the genome architectures of two human cell lines: KBM7, a near-haploid chronic leukemia cell line, and NHEK, a normal diploid human epidermal keratinocyte cell line. We confirm that the contact frequency maps produced by TM3C exhibit features characteristic of existing genome architecture datasets, including the expected scaling of contact probabilities with genomic distance, megabase-scale chromosomal compartments and sub-megabase-scale topological domains. We also confirm that TM3C captures several known cell type-specific contacts, ploidy shifts and translocations, such as Philadelphia chromosome formation (Ph+) in KBM7. We confirm a subset of the triple contacts involving the IGF2-H19 imprinting control region (ICR) using PCR analysis for KBM7 cells.
Our genome-wide analysis of pairwise and triple contacts demonstrates their preference for linking open chromatin regions to each other and for linking regions with higher numbers of DNase hypersensitive sites (DHSs) to each other. For near-haploid KBM7 cells, we infer whole-genome 3D models that exhibit clustering of small chromosomes with each other and of large chromosomes with each other, consistent with previous studies of the genome architectures of other human cell lines.
Conclusion: TM3C is a simple protocol for ascertaining genome architecture and can be used to identify simultaneous contacts among three or four loci. Application of TM3C to a near-haploid human cell line revealed large-scale features of chromosomal organization and multi-way chromatin contacts that preferentially link regions of open chromatin.
Keywords: Genome architecture, Chromatin conformation capture, Multi-locus chromatin contacts, Near-haploid human cells, Leukemia, Three-dimensional modeling

*Correspondence: [email protected]; [email protected]
1Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
2Veterans Affairs Palo Alto Health Care System, Stanford University Medical School, Palo Alto, CA 94304, USA
Full list of author information is available at the end of the article

© 2015 Ay et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication applies to the data made available in this article, unless otherwise stated.
Background
A variety of microscopic imaging techniques have long been used to study chromatin architecture and nuclear organization [1-3]. Recent advances triggered by the invention of chromatin conformation capture (3C) enable ascertainment of genome architecture on a genome-wide scale for virtually any genome, including human [4-6], mouse [5,7], budding yeast [8], bacteria [9], fruit fly [10] and a malarial parasite [11]. These studies have revealed that the three-dimensional form of the genome in vivo is closely related to genome function through processes such as gene expression and replication timing. Therefore, understanding how chromosomes fold and fit within nuclei, and how this folding relates to function and fitness, is crucial for gathering a thorough picture of the epigenetic control of gene regulation in eukaryotic organisms.

Hi-C was the first molecular assay to measure genome architecture on a genome-wide scale [4], and the assay continues to be widely used [6,11,12]. Hi-C involves seven steps: (1) crosslinking cells with formaldehyde, (2) digesting the DNA with a six-cutter restriction enzyme, (3) filling overhangs with biotinylated residues, (4) ligating the fragments, (5) creating a sequence library using streptavidin pull-down, (6) high-throughput paired-end sequencing, and (7) mapping paired ends independently to the genome to infer contacts. A subsequently described assay by Duan et al. [8] is more complex, involving a pair of restriction enzymes (REs) applied in three separate steps (RE1, RE2, circularization, then RE1 again), as well as the introduction of EcoP15I restriction sites to produce paired tags of 25–27 bp. More recently, the tethered conformation capture (TCC) assay enhances the signal-to-noise ratio by carrying out a Hi-C-like protocol using DNA that is tethered to a solid substrate [13].

One limitation of current genome architecture assays is their inability to identify simultaneous interactions among multiple loci.
Chromosomes are composed of complex higher-order chromatin structures that bring many distal loci into close proximity. In particular, evidence suggests that eukaryotic transcription occurs in factories containing many genes [14]. Recently, multiple gene interaction complexes associated with promoters were found to contain an average of nearly nine genes [15]. However, currently available experimental data cannot ascertain to what extent these multiple gene interactions occur simultaneously or are confined to different sub-populations of nuclei. This distinction is analogous to the distinction between "party hubs" and "date hubs" in protein-protein interaction networks, in which a hub protein interacts with a series of partner proteins either simultaneously or serially [16]. In the context of genome architecture assays, distinguishing between "party loci" and "date loci" will be a crucial first step in elucidating the role of combinatorial regulation of gene expression.

A molecular colony technique recently developed by Gavrilov et al. [17] investigated multicomponent interactions among remote enhancers and active β-globin genes in mouse erythroid cells. This assay, however, is PCR-based and requires a primer design step, which prevents it from providing a genome-wide picture of potential multicomponent contacts. An earlier genome-wide assay by Sexton et al., which is adapted from the traditional Hi-C protocol and is similar to the assay we present here, acknowledged the existence of multi-locus contacts that can be identified from paired-end reads in their data [10]. However, because of a number of differences between that protocol and TM3C (e.g., size selection for larger fragments, shorter read lengths and no in-gel ligation step), identifying a substantial number of multi-locus contacts was not possible when we applied our two-phase mapping pipeline to the Sexton et al. data (~0.0004% triples and no quadruples).
Therefore, genome-wide methods that distinguish between simultaneous contacts among multiple loci and pairwise contacts that occur in different sub-populations of cells are still necessary.

To address this issue, we developed the tethered multiple chromosome conformation capture assay (TM3C), which involves a simple protocol of restriction enzyme digestion and religation of fragments within agarose gel beads (tethering step) followed by high-throughput paired-end sequencing (Figure 1, steps 1–4). We apply TM3C to two human cell lines and confirm that the DNA–DNA contact matrices produced by TM3C exhibit features characteristic of existing genome architecture datasets, including the expected scaling of contact probabilities with genomic distance, enrichment of intrachromosomal contacts, megabase-scale chromosomal compartments and sub-megabase-scale topological domains. We confirm that TM3C in KBM7 cells captures several known cell type-specific contacts, ploidy shifts and translocations, such as Ph+ formation. In addition, we demonstrate that TM3C enables genome-wide identification of contacts among more than two loci simultaneously. We identify multi-locus contacts involving three (triple) or four (quadruple) loci by a two-phase mapping strategy that separately maps chimeric subsequences within a single read (Figure 1, steps 5–8). This mapping strategy potentially allows us to identify co-regulation or combinatorial regulation events, while also greatly increasing the number of distinct pairwise contacts (doubles) identified. We also validate a subset of the triple contacts involving the IGF2-H19 imprinting control region (ICR) using PCR for KBM7 cells. We demonstrate that pairwise and triple contacts prefer to link open chromatin regions to each other and regions with higher numbers of DHSs to each other.

Finally, we use the contact maps gathered from TM3C to infer a local 3D structure of the IGF2-H19 region at 40 kb resolution and a whole-genome 3D model at 1 Mb resolution for the near-haploid KBM7 genome. Our 3D models place the H19 and IGF2 genes far away from each other, consistent with their opposite transcriptional status, and place gene-rich small chromosomes (chrs. 16, 17, 19–22) near each other and large chromosomes (chrs. 1–5) near each other, confirming previous observations of gene-density-correlated arrangements of higher-order chromatin in human cells [18].

Figure 1 Overview of TM3C experimental protocol and mapping of paired-end reads to the human genome. 1. Cells are treated with formaldehyde, covalently crosslinking proteins to one another and to the DNA. The DNA is then digested with either a single 4-cutter enzyme (DpnII) or a cocktail of enzymes (AluI, DpnII, MspI, and NlaIII). 2. Melted low-melting agarose solution is added to the digested nuclei to tether the DNA to agarose beads. Thin strings of the hot nuclei plus agarose solution are then transferred to an ice-cold ligation cocktail overnight. 3. After reversal of formaldehyde crosslinks and purification via gel extraction, the TM3C molecules are sonicated and size-selected for 250 bp fragments. 4. Size-selected fragments are paired-end sequenced (100 bp per end) after addition of sequencing adaptors. 5. Each end of a paired-end read is mapped to the human reference genome. If both ends are mapped, the pair is considered a double and retained because it is informative for genome architecture. 6. Read ends that do not map to the reference genome are identified and segregated according to the number of cleavage sites they contain for the restriction enzyme(s) used for digestion. 7. Reads with exactly one cleavage site are considered for the second phase of mapping. These reads are split into two at the cleavage site, and each of the two pieces is mapped back to the reference genome. 8. Read pairs with one or both ends not mapped in the first mapping phase are reconsidered after the second phase. Depending on how many pieces stemming from the original reads are mapped in the second phase, such pairs lead to either no informative contacts, doubles, triples or quadruples.

Results
Tethered multiple chromatin conformation capture (TM3C)
To identify simultaneous chromatin contacts among two or more loci, we digest crosslinked chromatin with one or more 4-cutter restriction enzymes (REs) (Step 1 of Figure 1). When using multiple REs, we select a set of enzymes such that the sticky or blunt ends left by one enzyme are incompatible with the ends left by any other, thereby preventing ligation between fragments generated by different enzymes. We then encapsulate and ligate the digested DNA within agarose beads (Step 2 of Figure 1), which replaces the tethering step of Kalhor et al. [13]. We then size-select DNA fragments of around 250 bp and subject the selected fragments to high-throughput paired-end sequencing (Steps 3 and 4 of Figure 1). Our assay differs from the original Hi-C assay in three primary ways: (i) TM3C can use multiple REs simultaneously, (ii) TM3C does not include a step in which the sticky ends of restriction fragments are biotinylated, and (iii) TM3C carries out the ligation step within agarose gel beads. Digestion using multiple REs greatly increases the resolution that can be achieved via these genome-wide 3C-based techniques (Additional file 1: Figure S1). However, comparison of two libraries, one generated with four 4-cutters and the other with only one, suggests that the noise-to-signal ratio is much higher for the multiple 4-cutter case. Our second modification, elimination of the biotinylation step, greatly reduces the complexity of the overall protocol and has already been applied successfully by Sexton et al. [10]. This simplification, however, comes with the drawback of sequencing many uninformative, unligated sonication products in both the TM3C and the Sexton et al. protocols. Fortunately, because detecting such uninformative read pairs is computationally trivial, this simplification does not contribute an additional noise factor. The third modification we implement, in-gel ligation, is similar to but simpler than the tethering achieved using protein biotinylation in the tethered conformation capture (TCC) assay [13].
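Why combining 4-cutters raises resolution can be seen with a toy in-silico digest. The Python sketch below is a hypothetical illustration, not part of the TM3C pipeline: it locates the recognition sites of the four enzymes named in the Figure 1 caption in a DNA string and reports approximate fragment lengths, assuming a complete digest and ignoring the exact cleavage offset inside each site. The names `ENZYMES`, `site_positions` and `fragment_lengths` are invented for this example.

```python
import re

# Recognition sequences of the four 4-cutters used in TM3C.
# All four are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "AluI": "AGCT",
    "DpnII": "GATC",
    "MspI": "CCGG",
    "NlaIII": "CATG",
}

def site_positions(seq, enzyme_names):
    """Sorted start positions of all recognition sites for the chosen enzymes."""
    seq = seq.upper()
    positions = set()
    for name in enzyme_names:
        motif = ENZYMES[name]
        # Zero-width lookahead so overlapping occurrences are also reported.
        for match in re.finditer(f"(?={motif})", seq):
            positions.add(match.start())
    return sorted(positions)

def fragment_lengths(seq, enzyme_names):
    """Approximate fragment lengths after a complete digest.

    Boundaries are placed at site starts, so each length is off by at most
    the 4 bp width of a recognition site."""
    cuts = [0] + site_positions(seq, enzyme_names) + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]

seq = "TTGATCAAAGCTTTCCGGAACATGTT"
print(fragment_lengths(seq, ["DpnII"]))       # single enzyme
print(fragment_lengths(seq, list(ENZYMES)))   # four-enzyme cocktail
```

On this 26 bp toy sequence, DpnII alone yields two fragments, whereas the four-enzyme cocktail yields five shorter ones; on a real genome the same effect shortens the average restriction fragment, which is what allows finer-grained contact maps.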
Our initial experimental data, which omitted the in-gel ligation, demonstrated that without this step the resulting signal-to-noise ratio for the case of four 4-cutters is very low (95% of the contacts are interchromosomal). Adding the in-gel ligation step improved the percentage of intrachromosomal contacts from 5% to 20% and 48% for the four 4-cutter (KBM7-TM3C-4) and one 4-cutter (KBM7-TM3C-1) libraries, respectively. Therefore, we only present the results from the libraries generated using in-gel ligation and focus mainly on the results from our one 4-cutter library for both the KBM7 and NHEK cell lines.

We use TM3C to investigate the chromatin architecture of the near-haploid cell line KBM7 (25, XY, +8, Ph+), derived from a heterogeneous chronic leukemia cell line [19], and NHEK, a normal diploid human keratinocyte primary cell line (Lonza Walkersville Inc.). We construct libraries using only one four-base cutter restriction enzyme (TM3C-1) for both KBM7 and NHEK. We also create two libraries from KBM7 cells using four different four-base cutters, one from crosslinked cells (KBM7-TM3C-4) and one from non-crosslinked cells (KBM7-MCcont-4) as a control (Table 1). In what follows, we report results from the application of TM3C to these two human cell lines, focusing mainly on KBM7.

TM3C reveals multi-locus chromatin contacts
In addition to providing higher resolution, the use of frequently cutting REs (4-cutters) or of multiple REs together allows identification of simultaneous contacts among more than two loci, even with reads as short as 100 bp. The original Hi-C method only retains read pairs in which both reads map completely to the reference genome. Here we refer to this type of contact as type F-F (fully mapped/fully mapped, Step 5 of Figure 1). Unlike current Hi-C mapping pipelines, after identifying F-F pairs, we further process the unmapped paired-end reads to see whether we can still rescue some informative chromatin contacts from them.
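The rescue of unmapped read ends (Steps 5–8 of Figure 1) can be sketched in Python as follows. This is a simplified, hypothetical illustration rather than the authors' pipeline: a plain dictionary (`index`) stands in for a real short-read aligner, only exact matches count as mapped, and each of the two pieces produced by splitting a chimeric read keeps a copy of the restriction site at the ligation junction. All function names are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

Locus = Tuple[str, int]  # (chromosome, position); a toy representation

def map_exact(seq: str, index: Dict[str, Locus]) -> Optional[Locus]:
    """Stand-in for a real aligner: exact lookup in a toy sequence index."""
    return index.get(seq)

def map_end(read: str, site: str, index: Dict[str, Locus]) -> List[Locus]:
    """Two-phase mapping of one read end; returns the loci it supports."""
    hit = map_exact(read, index)
    if hit is not None:                  # phase 1: fully mapped end (F)
        return [hit]
    if read.count(site) != 1:            # phase 2 only for exactly one cut site
        return []
    cut = read.index(site)
    # Split at the site; both pieces keep the reconstituted junction site.
    pieces = [read[:cut + len(site)], read[cut:]]
    hits = (map_exact(piece, index) for piece in pieces)
    return [h for h in hits if h is not None]   # partially mapped end (P)

def classify_pair(end1: str, end2: str, site: str,
                  index: Dict[str, Locus]) -> str:
    """Count distinct loci across both ends and name the contact type."""
    loci = set(map_end(end1, site, index)) | set(map_end(end2, site, index))
    return {2: "double", 3: "triple", 4: "quadruple"}.get(len(loci),
                                                          "uninformative")

index = {
    "AAAAGATC": ("chr1", 100), "GATCCCCC": ("chr2", 500),
    "TTTTTTTT": ("chr3", 900), "GGGGGGGG": ("chr4", 50),
}
print(classify_pair("AAAAGATCCCCC", "TTTTTTTT", "GATC", index))  # one chimeric end
```

Here the first end fails to map as a whole, contains one DpnII site, and its two pieces map to different loci, so together with the fully mapped second end the pair supports three loci (a P-F triple). Two fully mapped ends give an F-F double, and two chimeric ends whose four pieces all map give a quadruple.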
Our motivation to pursue these reads stems from the striking difference between the number of restriction sites within fully mapped versus non-mapped reads (Figure 2a). In both the TM3C-1 and TM3C-4 libraries, more than 70% of the non-mapped reads contain at least one RE cut site, whereas 90% of the mapped reads contain no cut sites for the TM3C-1 library (two-sample Kolmogorov-Smirnov test p-values for both TM3C-1 and TM3C-4 are approximately equal to 0). This difference suggests that read ends that fail to map as a whole can still be informative of chromatin contacts, because they potentially contain real ligation events that produce chimeric reads. To extract this contact information, we further process the read ends containing one restriction site, thereby identifying contacts between a partially mapped read and a fully mapped read (P-F) or between two partially mapped reads (P-P, Steps 6–8 of Figure 1, Methods). This two-phase mapping strategy not only identifies a greater number of pairwise contacts (doubles) but also allows us to identify contacts involving three or four loci from only one paired-end read. Step 8 of Figure 1 summarizes the different cases arising from the second mapping phase for a read pair that did not qualify as F-F in the first phase. Overall, after excluding intrachromosomal contacts with genomic distance <20 kb, we identify more than 210K triples from our KBM7-TM3C-1 library, together with 10.1M and 857K additional pairwise contacts from P-F and P-P type read pairs, respectively (Table 2, Additional file 2). We also investigate the mapping orientations (signs) of ligated fragments that create different contact types (Table 3). The distribution of reads among all possible sign combinations is expected to be biased for reads that are sonication products (undigested or religated) and to be uniform for de novo chromatin contacts arising from ligation events. Table 3 shows that this is the case both for the contacts identified by traditional Hi-C pipelines (F-F) and for the contacts we identify here that produce triples. Since we size-select for fragments of approximately 250 bp, the genomic distance threshold of 1 kb eliminates all sonication products, resulting in a uniform distribution for the remaining contacts from TM3C.

Table 1 Summary of datasets generated in this paper
Cell type | Tethering | Restriction enzymes (REs) | Identifier
NHEK | Yes | MboI/DpnII (^GATC) | NHEK-TM3C-1
KBM7 | Yes | MboI/DpnII (^GATC) | KBM7-TM3C-1
KBM7 | Yes | AluI (AG^CT), MboI/DpnII (^GATC), MspI (C^CGG), NlaIII (CATG^) | KBM7-TM3C-4
KBM7 (gDNA) | No | AluI (AG^CT), MboI/DpnII (^GATC), MspI (C^CGG), NlaIII (CATG^) | KBM7-MCcont-4
^ marks the cleavage position within each recognition site.

Figure 2 Consistency of TM3C data with known organizational principles and KBM7 karyotype. (a) Number of RE cut sites within reads that are fully mapped and non-mapped in the first-phase mapping for KBM7 libraries. (b) Scaling of contact probability with genomic distance for three crosslinked libraries and one non-crosslinked control library. (c) Scaling of contact probability in log–log scale for three different sets of contacts identified in the KBM7-TM3C-1 library. Pairwise chromosome contact matrices for (d) KBM7-TM3C-1, (e) KBM7-TM3C-4, (f) NHEK-TM3C-1 and (g) KBM7-MCcont-4 libraries. For these plots, contact counts are averaged over all pairs of mappable 1 Mb windows between the two chromosomes.

Two-phase mapping rescues contacts informative of genome architecture
Following identification of all three types of contacts (F-F, P-F, and P-P), we evaluate the quality of the resulting contact sets for each library in four ways. First, we confirm that the contact probability between two intrachromosomal loci exhibits a sharp decay with increasing genomic distance for crosslinked libraries but not for the control library when all contact types are pooled (Figure 2b). Second, we observe that this scaling relationship is consistent across contact types (Figure 2c), and that the scaling is log-linear for the genomic distance range of 0.5–7 Mb, consistent with observations from Hi-C data [4].
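The log-linear decay can be quantified by fitting a straight line to log P(s) against log s over the 0.5–7 Mb range. The sketch below shows one simple way to do this, assuming contact distances are available as a list of base-pair separations; the log-spaced binning, the bin density and the ordinary least-squares fit are choices made for illustration and are not necessarily those behind Figure 2b–c.

```python
import math

def log_binned_probability(distances, lo=1e4, hi=1e8, bins_per_decade=5):
    """Estimate P(s): histogram intrachromosomal contact distances (bp) in
    log-spaced bins, normalized by bin width and by the total contact count."""
    n_bins = round(math.log10(hi / lo) * bins_per_decade)
    edges = [lo * 10 ** (i / bins_per_decade) for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for d in distances:
        if lo <= d < hi:
            i = int(math.log10(d / lo) * bins_per_decade)
            counts[min(i, n_bins - 1)] += 1
    total = max(len(distances), 1)
    mids = [math.sqrt(a * b) for a, b in zip(edges, edges[1:])]  # geometric centers
    probs = [c / total / (b - a) for c, a, b in zip(counts, edges, edges[1:])]
    return mids, probs

def scaling_exponent(s_values, p_values, s_min=5e5, s_max=7e6):
    """Least-squares slope of log10 P(s) versus log10 s within [s_min, s_max]."""
    pts = [(math.log10(s), math.log10(p))
           for s, p in zip(s_values, p_values)
           if s_min <= s <= s_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins in range")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

# Synthetic check: an exact power law P(s) = 1/s has exponent -1.
s = [5e5, 1e6, 2e6, 4e6, 7e6]
print(scaling_exponent(s, [1.0 / x for x in s]))
```

In practice one would call `log_binned_probability` on the pooled or per-type contact distances and pass its output to `scaling_exponent`; comparing the exponents for F-F, P-F and P-P contacts is one way to make the consistency shown in Figure 2c quantitative.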
Third, we confirm visually and quantitatively that the interchromosomal contact maps we obtain from each contact type are consistent with each other (Additional file 1: Figure S2; pairwise matrix correlations are 0.997, 0.964 and 0.954 for (F-F, P-F), (F-F, P-P) and (P-F, P-P), respectively) and that the contact maps are consistent with known organizational hallmarks of human genome architecture, such as the increased number of contacts between small chromosomes (16–22 except 18) (Figure 2d–f, Additional file 1: Figure S2). Fourth, we confirm that our contact profiles capture known karyotypic abnormalities of KBM7 cells, such as diploidy of chromosome 8 (+8), partial diploidy of chromosome 15, and the t(9;22)(q34;q11) translocation between chromosomes 9 and 22 that leads to Philadelphia chromosome formation [19,20] (Figure 2d, e, Additional file 1: Figure S3). Normal diploid human keratinocyte (NHEK) cells exhibit no karyotypic abnormalities, apart from higher average contact counts between chromosomes 17, 19 and 22 (Figure 2f). For the non-crosslinked KBM7 control library, only the changes related to copy number (e.g., diploidy) are apparent from the heatmap (Figure 2g). Translocations are not visible in the control because digestion of non-crosslinked chromatin does not preserve genomic distances. Together, these results indicate that TM3C successfully assays the genome architecture of human cells and suggest that contacts recovered by our two-phase mapping strategy, which are traditionally discarded from Hi-C analysis, are consistent with traditionally retained contacts.
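The pairwise matrix correlations quoted above compare interchromosomal contact maps built from different contact types. A minimal sketch, assuming each map is a symmetric chromosome-by-chromosome matrix of average contact counts and that the comparison is a Pearson correlation over the distinct chromosome pairs (the text does not state the exact correlation measure, so this is an assumption):

```python
import math

def off_diagonal_upper(m):
    """Entries above the diagonal of a symmetric matrix, i.e. one value per
    distinct chromosome pair; diagonal (intrachromosomal) entries are skipped."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def matrix_correlation(m1, m2):
    """Correlate two interchromosomal contact maps pair by pair."""
    return pearson(off_diagonal_upper(m1), off_diagonal_upper(m2))

# Toy 3-chromosome maps; the diagonal is ignored by construction.
ff = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
pf = [[9, 2, 4], [2, 9, 6], [4, 6, 9]]
print(matrix_correlation(ff, pf))
```

Restricting the comparison to off-diagonal entries keeps the much larger intrachromosomal counts from dominating the correlation; with whole-genome maps the same functions apply to a 23 x 23 or 24 x 24 matrix.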
Therefore, for all remaining analyses with pairwise contacts, we combine all three types (F-F, P-F, P-P) into an aggregated contact map for each library.

TM3C data confirms chromatin compartments and topological domains
In addition to evaluating whether results from the TM3C datasets are consistent with polymer models of chromatin folding and with the karyotypic properties of the assayed cell lines, we assess whether TM3C contact maps exhibit the expected compartment-scale and domain-scale organization. For this purpose, we perform eigenvalue decomposition on our contact maps and compare our compartment calls to those of previous Hi-C datasets on other human cell lines [4,5]. The resulting compartment calls exhibit a nearly perfect overlap for chromosome 17 between KBM7 and GM06990 (Figures 3a–b) and a high level of genome-wide conservation (82%) between these two cell lines. Conservation between pairs of contact maps from the five previously published contact maps ranged between 70% and 82%.

Similarly, we perform topological domain decomposition at 40 kb resolution on KBM7 contact maps and compare our calls to those of two human cell lines published by Dixon et al. [5] (Methods). Figures 3c–d demonstrate the significant overlap of topological domain calls from KBM7 and IMR90 contact maps on a 12 Mb region of chromosome 6.

Table 2 Summary of informative pairwise and multi-locus contacts for each KBM7 library
Library | Total reads | Doubles | Triples | Quadruples
KBM7-TM3C-1 | 95,000,000 | (15.61%) inter: 8,036,033; intra: 6,794,444 | (0.22%) inter: 92,959; intra: 28,930; mixed: 89,360 | (0.002%) inter: 672; intra: 38; mixed:
KBM7-TM3C-4 | 72,800,218 | inter: 11,544,137; intra: 2,314,848 | inter: 594,052; intra: 22,787; mixed: 199,786 | inter: 15,889; intra: 85; mixed: 9,184

Table 3 Summary of intrachromosomal read orientations for different contact types (KBM7-TM3C-1)
Contact type | Genomic distance | Read orientations (end1/end2), percentage of reads in each of four sign combinations
Doubles (F-F) | All | +/+ 1.8%, +/- 48.2%, -/+ 48.2%, -/- 1.8%
Doubles (F-F) | >1 kb | +/+ 24.9%, +/- 25.1%, -/+ 25.1%, -/- 24.9%
Triples (F-P) | All | 0.1%, 49.7%, 0.2%, 50%
Triples (F-P) | >1 kb | 24.5%, 25.8%, 25.3%, 24.4%
Triples (P-F) | All | 0.2%, 49.9%, 49.7%, 0.2%
Triples (P-F) | >1 kb | 25.6%, 24.1%, 25.4%, 25.0%
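Compartment calls of this kind are commonly derived from the leading eigenvector of a correlation matrix of distance-normalized contacts, following the original Hi-C study [4]. The pure-Python sketch below follows that general recipe under simplifying assumptions: a small dense intrachromosomal matrix, power iteration in place of a library eigensolver, and a user-supplied reference track (gene density or DHS counts, for example; an assumption here) used only to fix the arbitrary sign of the eigenvector. The `agreement` function computes the fraction of bins with identical labels, one plausible reading of the conservation percentages above. All names are hypothetical.

```python
import math

def observed_over_expected(m):
    """Divide each entry by the mean contact count at its diagonal offset."""
    n = len(m)
    expected = [sum(m[i][i + k] for i in range(n - k)) / (n - k) for k in range(n)]
    return [[m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / den if den else 0.0

def correlation_matrix(m):
    """Row-by-row Pearson correlations (symmetric, positive semidefinite)."""
    return [[pearson(r, s) for s in m] for r in m]

def leading_eigenvector(a, iters=1000):
    """Power iteration; assumes one eigenvalue dominates in magnitude."""
    n = len(a)
    v = [float(i + 1) for i in range(n)]  # non-uniform start vector
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def call_compartments(m, reference):
    """Label bins 'A'/'B' by the sign of the leading eigenvector, oriented so
    that it correlates positively with the reference track."""
    pc1 = leading_eigenvector(correlation_matrix(observed_over_expected(m)))
    if pearson(pc1, reference) < 0:
        pc1 = [-x for x in pc1]
    return ["A" if x > 0 else "B" for x in pc1]

def agreement(labels1, labels2):
    """Fraction of bins that receive the same label in two call sets."""
    return sum(a == b for a, b in zip(labels1, labels2)) / len(labels1)
```

The non-uniform start vector matters: a compartment eigenvector often sums to roughly zero, so a constant start vector could be nearly orthogonal to it and slow or stall convergence.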
Overall, 73% of IMR90 and 72.8% of ESC domain boundaries overlap with the boundaries that we identify for the KBM7 cell line (Fisher's exact test p-values compared to random overlap are <10^-100 in each case).

Together, the compartment-scale and domain-scale similarities between our data and previous Hi-C data suggest that TM3C, a simpler protocol, provides results similar to Hi-C, and that KBM7, which has a distinct karyotype, preserves the large-scale organizational features of other human cell lines.

Genome-wide characterization of triple contacts
After identifying chromatin compartments at 1 Mb resolution and topological domains at 40 kb resolution for the KBM7 cell line, we evaluate whether the triple contacts identified by TM3C preferentially link regions with the same compartment labels and regions within the boundaries of a topological domain. Figure 4a shows that triple contacts, like doubles, are enriched among regions of open chromatin (observed 14.6% compared to expected 8.33%, Methods). Of all intrachromosomal triples (triples that link three loci on the same chromosome), 16.5% are within the same topological domain. Note that we exclude from this percentage all short-range intrachromosomal triples (<20 kb) as well as all triples that link at least two loci within the same 40 kb window, which would otherwise inflate the reported percentage. We assess the significance of this observed percentage of intradomain triples by generating a null model with 100 shuffled topological domain decompositions for each chromosome (Methods).
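The shuffled-domain null model can be sketched as follows. The shuffling scheme here, which keeps a chromosome's domain lengths but permutes their order, is an assumption for illustration, since the text defers the actual procedure to the Methods. Domains are represented by sorted end coordinates, triples by tuples of three positions on one chromosome, and the distance filters described above are assumed to have been applied beforehand. Function names are hypothetical.

```python
import random
import statistics
from bisect import bisect_right

def domain_index(pos, boundaries):
    """Index of the domain containing pos; domain k spans
    [boundaries[k-1], boundaries[k]) with an implicit start at 0."""
    return bisect_right(boundaries, pos)

def fraction_intradomain(triples, boundaries):
    """Fraction of triples whose three loci fall in a single domain."""
    hits = sum(1 for t in triples
               if len({domain_index(p, boundaries) for p in t}) == 1)
    return hits / len(triples)

def shuffled_boundaries(boundaries, rng):
    """Hypothetical null: same domain lengths, randomly permuted order."""
    starts = [0] + boundaries[:-1]
    lengths = [end - start for start, end in zip(starts, boundaries)]
    rng.shuffle(lengths)
    out, acc = [], 0
    for length in lengths:
        acc += length
        out.append(acc)
    return out

def domain_enrichment(triples, boundaries, n_shuffles=100, seed=0):
    """Observed intradomain fraction, null mean and SD, and a z-score."""
    rng = random.Random(seed)
    observed = fraction_intradomain(triples, boundaries)
    null = [fraction_intradomain(triples, shuffled_boundaries(boundaries, rng))
            for _ in range(n_shuffles)]
    mu, sd = statistics.mean(null), statistics.stdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, sd, z
```

The z-score is undefined (reported as NaN) when every shuffle gives the same fraction, which happens, for instance, when all domains on a chromosome have equal length.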
The median and the mean percentages for the null model are both 14.1%, with a standard deviation of 0.16%, indicating a statistically significant enrichment of intradomain triples for the observed domain decomposition compared to shuffled configurations (p-value ≈ 0, z-score 14.67).

Next, we carry out an analysis similar to the compartment label analysis described above, using the numbers of DNase hypersensitive sites within each 1 Mb window (Methods). Figure 4b shows that, consistent with and slightly surpassing the enrichment for open chromatin compartments, triple contacts as well as doubles are enriched among regions with higher numbers of DHSs (for triples, observed 23.7% compared to expected 12.4%, Methods).

Verification of triples involving IGF2-H19 locus
We next investigate whether the multi-locus contacts identified by the TM3C assay correspond to possible combinatorial regulatory interactions in KBM7 cells.
Figure 3 Comparison of TM3C data with existing genome architecture datasets. Eigenvalue decomposition to identify open/closed chromatin compartments of chromosome 17 (a) from the KBM7 cell line assayed by TM3C and (b) from the GM06990 cell line assayed by Hi-C [4]. Topological domain calls and contact count heatmaps of a 6 Mb region of chromosome 6 (c) for the KBM7 cell line assayed by TM3C and (d) for the IMR90 cell line assayed by Hi-C [5].

Figure 4 Genome-wide characterization of triple contacts. (a) Observed over expected percentages of double and triple contacts that link 1 Mb regions with the same (either open or closed) or different (mixed) compartment labels for the KBM7-TM3C-1 library (Methods). Both double and triple contacts prefer to link open compartments to each other, with triples showing slightly more enrichment for this trend. (b) Similar percentages as in (a), but with 1 Mb windows segregated according to the number of DHSs they contain (Methods). Contacts linking regions with higher numbers of DHSs than the median number are enriched within the doubles and the triples of the KBM7-TM3C-1 library. Because DNase data are lacking for KBM7 cells, we use data from six other human cell lines for this analysis. Since the results are very similar among the different cell lines, here we only plot the results for K562, which is also a leukemia cell line.

Specifically, we focus on triples (contacts involving three loci) involving the IGF2-H19 locus, a classic example of imprinting that leads to allele-specific gene expression and regulation in both mouse and human [21-24]. Our previous work in human cells has shown that a region located just upstream of the H19 promoter, which is differentially methylated between the maternal and paternal copies, is involved in the formation of allele-specific long-range chromatin loops [23]. The methylation status of this imprinting control region (ICR) determines whether IGF2 is transcribed (paternal allele) or not (maternal allele). Because KBM7 cells are haploid for chromosome 11, we expect our TM3C data to be consistent with only one mode of operation of this ICR. Analyzing the triples inferred from KBM7-TM3C-1 data that involve the ICR region (within 20 kb), we observe contacts that link this ICR region to distal loci on the same chromosome as well as to trans loci on other chromosomes (Figure 5a).

To verify these contacts, we design three primers for each triple and perform PCR experiments (Additional file 1: Table S1). We test whether pairs of forward/reverse primers give rise to PCR products of the expected sizes to confirm contacts identified by our two-phase mapping (Figure 5b). For triple 3, we use primers 3a and 3c, designed for two loci that are 80 kb apart and are linked by a contact found from a ligation occurring within one end of a paired-end read. For triple 5, we use primers 5a and 5c, which link two loci that are 24 kb apart on chromosome 11 and are found (one of them only partially) in two separate ends of a paired-end read. For both of these cases, we observe PCR products near the expected size from our primer design (Additional file 1: Table S1). Validation of contacts found by our two-phase mapping, either within a single end of a read or from two different ends, supports the idea that chimeric reads contain information about genuine chromatin contacts.

Next, we perform PCR on all the triples shown in Figure 5a using all three primers simultaneously. Of the 10 triples tested, 6 (triples 1–6) yielded one or more PCR products of the expected size(s), confirming these contacts (Additional file 1: Figures S4, S5 and Table S1).
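The expected product sizes used to judge such PCR bands follow from simple arithmetic on the chimeric molecule. In the hypothetical sketch below, each primer is described only by its distance (in bp) to the ligation junction it faces, and any whole restriction fragments lying between two junctions in a multi-fragment chimera are added in; the 10% tolerance is an arbitrary illustration, not a threshold used in this study, and the function names are invented.

```python
def expected_product_bp(dist_first, dist_last, intervening=()):
    """Approximate length of a PCR product that reads across ligation junctions.

    dist_first / dist_last: bp from each primer's 5' end to the junction it
    faces (hypothetical inputs, e.g. derived from primer coordinates).
    intervening: lengths of whole restriction fragments between the two
    junctions; empty when the primers flank a single junction.
    """
    return dist_first + sum(intervening) + dist_last

def near_expected(observed_bp, expected_bp, tolerance=0.10):
    """True if a band size lies within a fractional tolerance of the expectation."""
    return abs(observed_bp - expected_bp) <= tolerance * expected_bp

print(expected_product_bp(150, 120))          # primers flanking one junction
print(expected_product_bp(150, 120, [300]))   # a 300 bp fragment in between
```

The second call illustrates why primers on the two outer loci of a triple can produce a longer band than a simple pairwise junction: the middle fragment of the chimera is carried along inside the amplicon.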
Detailed analysis of the distal loci that are contact partners of ICR (either interchromosomal or inte