ChIP-seq: what is it?

While ChIP-chip was a revolutionary approach, the technique is limited by the array technologies it relies on. For example, microarray design depends on the quality of the reference genome and on the ability to design probes that will work on an array.

There are also issues with bias in the amplification of ChIP DNA fragments, with normalisation of array data, and with the comparability of array platforms. ChIP-seq protocols have been adapted from ChIP-chip methods: proteins are cross-linked to their bound DNA by formaldehyde treatment, cells are homogenized, and chromatin is sheared and immunoprecipitated with antibody-bound magnetic beads. The immunoprecipitated DNA is then used as the input for a next-generation sequencing library preparation; the library is sequenced and the reads are analysed to locate DNA binding sites.

Although most ChIP-seq papers published so far were sequenced on the Illumina platform, ChIP-seq can be performed on any next-generation sequencer (Wold). ChIP-seq has been widely adopted since it was first reported. In fact, it has almost entirely supplanted ChIP-chip, because it allows genome-wide analysis without the array limitations discussed above.

ChIP-seq is a powerful and versatile tool, and there are many great examples of its use in the literature. I have picked a couple of my favourites from work performed in the core facility I manage to illustrate what is possible, and have included examples where ChIP-seq has inspired the development of new methods:

Barski et al., High-resolution profiling of histone methylations in the human genome. Cell.
Hurtado et al., FOXA1 is a key determinant of estrogen receptor function and endocrine response. Nature Genetics.

How could you use ChIP-seq in your research?

The first step depends on the proteins under investigation (Figure 1). For many protein-DNA interactions, particularly for transiently bound factors, the first step might be to fix the interaction using formaldehyde as a cross-linker.

This may not be necessary, however, for localizing histone modifications or simply determining nucleosome positioning, because histone-DNA interactions are generally strong enough to be maintained without a cross-linking agent; in this case, native ChIP (n-ChIP) without cross-linking might be preferable [1]. In the case of chromatin-remodeling enzymes such as histone deacetylases (HDACs) or histone acetyltransferases (HATs), an additional cross-linking step using disuccinimidyl glutarate can be included to preserve protein-protein complexes before cross-linking with formaldehyde [2].

After cross-linking, the chromatin is fragmented into short pieces. For ChIP of transcription factors, and under cross-linked conditions, this is done by sonication. It is important to achieve sufficient and reproducible fragmentation, because preparing the subsequent sequencing library requires fragments within a defined size range. In the case of n-ChIP, the DNA is instead digested with micrococcal nuclease, which gives slightly better resolution because it leaves the nucleosome as the smallest unit.

After fragmentation, the next step is immunoprecipitation, using a specific antibody against the protein of interest. The success of a ChIP-seq project depends crucially on strong enrichment of the chromatin specifically bound by the protein under study.

We routinely test a number of antibodies and choose the one that consistently gives high enrichment of DNA at a known binding site, compared with the DNA immunoprecipitated by a nonspecific control antibody such as anti-IgG, and no enrichment at negative control sites. Once the enrichment is convincing, the material is ready to be sequenced. If the amount of material is not a limiting factor (for example, when it comes from tissue culture), about 10 to 15 ng of DNA is used for library preparation.
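
As a minimal sketch of the antibody check described above, assume enrichment is measured by qPCR at a known binding site and at a negative control site (the text does not name the readout). Fold enrichment over the IgG control can then be estimated from threshold cycles (Ct). The function names and the pass/fail thresholds are hypothetical, and the model assumes equal input amounts and ideal doubling per cycle.

```python
def fold_enrichment(ct_specific: float, ct_igg: float, efficiency: float = 2.0) -> float:
    """Fold enrichment of the specific antibody over the IgG control at one site.

    Assumes equal input amounts and a per-cycle amplification factor of
    `efficiency` (2.0 = perfect doubling). A lower Ct means more template.
    """
    return efficiency ** (ct_igg - ct_specific)


def passes_antibody_check(pos_site, neg_site, min_pos_fold=5.0, max_neg_fold=2.0) -> bool:
    """pos_site and neg_site are (ct_specific, ct_igg) tuples measured at a
    known binding site and at a negative control site.

    The thresholds are hypothetical: the antibody passes if it is strongly
    enriched at the positive site and not enriched at the negative site.
    """
    pos = fold_enrichment(*pos_site)
    neg = fold_enrichment(*neg_site)
    return pos >= min_pos_fold and neg <= max_neg_fold
```

For example, a Ct of 24 with the specific antibody against 29 with IgG corresponds to 2^5 = 32-fold enrichment.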

If the sequencing platform requires the incorporation of linkers and involves a PCR amplification step, this can be a considerable source of bias [3, 4], and it is advisable to keep the number of cycles as low as possible. Once the material is amplified, DNA fragments of the required length are selected and sequenced. Cross-contamination is a risk both before and after PCR, but it can be minimized by preparing only a small number of libraries in parallel and by using separate gels when purifying the amplified libraries.
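
One way to gauge how much PCR has distorted a library is to estimate its duplicate rate. The sketch below assumes single-end tags summarized as (chromosome, 5' position, strand) tuples, a hypothetical representation. Tags that share all three values are counted as likely PCR duplicates. At very high sequencing depth some identical positions also arise by chance, so this is only a rough proxy.

```python
from typing import Iterable, Tuple

# A mapped tag summarized as (chromosome, 5' position, strand).
Tag = Tuple[str, int, str]


def duplicate_rate(tags: Iterable[Tag]) -> float:
    """Fraction of tags that repeat the chromosome, 5' position and strand of
    an earlier tag: a common proxy for PCR duplicates in single-end data."""
    seen = set()
    total = dups = 0
    for tag in tags:
        total += 1
        if tag in seen:
            dups += 1
        else:
            seen.add(tag)
    return dups / total if total else 0.0
```

A rising duplicate rate as cycle numbers go up is one sign that the library's complexity is being exhausted.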

When material is limited, as is often the case with primary cell or tissue samples, smaller starting amounts of DNA have to be used. This usually comes at the cost of additional rounds of amplification, which introduce amplification biases. One way of avoiding this might be to use the Helicos single-molecule next-generation sequencing platform, which can generate a sequencing library from 50 pg of starting material without amplification [4].
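
The trade-off between starting material and amplification can be made concrete with an idealized PCR model in which every cycle multiplies the DNA by (1 + efficiency). This is a sketch with a hypothetical function name. It ignores losses during library preparation, so real protocols need more cycles than it reports.

```python
import math


def cycles_needed(start_pg: float, target_pg: float, efficiency: float = 1.0) -> int:
    """Minimum PCR cycles to amplify start_pg of DNA to at least target_pg.

    Idealized model: each cycle multiplies the DNA by (1 + efficiency), where
    efficiency = 1.0 means perfect doubling. Library-prep losses are ignored.
    """
    if start_pg <= 0 or not 0 < efficiency <= 1:
        raise ValueError("start_pg must be positive and 0 < efficiency <= 1")
    fold = target_pg / start_pg
    if fold <= 1:
        return 0  # already enough material
    return math.ceil(math.log(fold) / math.log(1 + efficiency))
```

Going from the 50 pg mentioned above to 10 ng (10,000 pg) needs at least 8 cycles with perfect doubling, and 9 cycles at 90% efficiency.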

Finally, the short sequenced fragments, known as tags, are computationally mapped by alignment to a reference genome, and regions with enriched tag counts are identified, a step known as peak-calling.

ChIP itself has been around for a while. The problem with analysing it site by site is that only predetermined individual sites of known sequence can be studied. In DamID, the protein of interest is instead fused to DNA adenine methyltransferase (Dam). When this fusion protein is expressed in cells, the adenines in the DNA adjacent to its binding site are methylated.

These sites can then be identified by mapping with methylation-sensitive restriction endonucleases. However, this technique is cumbersome and requires overexpressing an artificial construct, which limits analysis to transfectable cell lines.

In ChIP-chip, the DNA bound by the protein of interest is hybridized to a DNA microarray with probes that cover either the entire genome or specific portions of it (for example, promoter regions). This is the closest methodology to ChIP-seq, but its mapping precision is lower and the dynamic range of its readout is significantly narrower.

The resolution and sensitivity of the two techniques are compared in Figure 2. Moreover, all hybridization approaches mask repetitive sequences. However, we still use ChIP-chip with custom arrays when specific binding sites are to be interrogated repeatedly over many experimental conditions.

Roughly speaking, ChIP-seq has three key steps that determine its success. The first and most crucial is antibody selection; the second is the actual sequencing, which is subject to several possible biases; and the third is the algorithmic analysis, including mapping and peak-calling.

The first requirement, obviously, is that the antibody has some specificity for the protein under study: this can be tested using a panel of recombinant proteins or cell lines transfected with different protein targets. Then, the antibody must be able to immunoprecipitate the target protein. Not all antibodies immunoprecipitate, and even when they do, they may not do well in ChIP.

Ideally, earlier studies will have identified genomic sites where the protein is known to bind, and these sites can be used to optimize the ChIP conditions. The second issue is sequencing, which is a 'black box' for many biologists, who are familiar with what goes in and what comes out, but perhaps not with the possible biases introduced in between. Next-generation sequencing approaches require bulk processing of DNA fragments and massively parallel sequencing.

This means that even the slightest bias in the ligation of linkers, in PCR amplification, or in hybridization might result in some platform-dependent biases in the population data emerging from 10 million or more reads. The technologies are still evolving and the different formats have different biases. The third issue is mapping, which with short tags around 25 to 35 bp can be ambiguous in regions of high homology or in repeat regions.
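
The mapping ambiguity described above can be illustrated with a toy mappability measure: the fraction of positions in a reference whose k-mer occurs exactly once. This sketch is a deliberate simplification (forward strand only, exact matches, hypothetical function name); real mappability tracks also consider the reverse strand and mismatches.

```python
from collections import Counter


def unique_kmer_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose k-mer occurs exactly once in `genome`
    (forward strand only). A tag drawn from a non-unique k-mer cannot be
    placed unambiguously, so this is a toy stand-in for mappability."""
    genome = genome.upper()
    n = len(genome) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(genome[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[genome[i:i + k]] == 1)
    return unique / n
```

On the toy sequence ACGTACGTTT, only 5 of the 7 4-mers are unique, while every 5-mer is, echoing the point that longer tags map less ambiguously.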

As the tag sequences get longer, this becomes less of a problem, but base-calling and sequencing errors then limit mappability. In ChIP-seq, the density of mapped sequence tags is a prime determinant of success. There is now a large number of free and commercial peak-calling software packages. Peak-calling algorithms look for 'peaks': regions of significant tag enrichment that are typically assumed to reflect transcription factor binding to the region.
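
To make the idea of a peak concrete, here is a deliberately naive peak caller for tags on a single chromosome. It counts tags in fixed, non-overlapping windows, keeps windows at or above a count threshold and merges adjacent ones. The window size and threshold are hypothetical parameters; real peak-callers replace the fixed threshold with a statistical model and a control library.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def call_peaks(tag_positions: Iterable[int], window: int,
               min_tags: int) -> List[Tuple[int, int, int]]:
    """Naive peak caller for one chromosome.

    Counts tags in non-overlapping windows of `window` bp, keeps windows with
    at least `min_tags` tags and merges contiguous kept windows. Returns
    (start, end, tag_count) tuples with half-open [start, end) bounds.
    """
    counts = Counter(pos // window for pos in tag_positions)
    kept = sorted(b for b, c in counts.items() if c >= min_tags)
    peaks: List[Tuple[int, int, int]] = []
    for b in kept:
        start, end = b * window, (b + 1) * window
        if peaks and peaks[-1][1] == start:
            # Contiguous with the previous peak: extend it.
            prev_start, _, prev_count = peaks[-1]
            peaks[-1] = (prev_start, end, prev_count + counts[b])
        else:
            peaks.append((start, end, counts[b]))
    return peaks
```

With 100 bp windows and a threshold of 3 tags, the positions 5, 12, 18, 105, 110, 115, 150, 190 and 400 yield one merged peak covering 0 to 200 with 8 tags; the isolated tag at 400 is ignored.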

While some packages simply aggregate mapped tags without regard to strand, others use strand information to locate the peaks more sensitively. Some peak-calling algorithms require the user to supply a control library, whereas others can work without one. However, there are several known sources of bias in ChIP-seq reads, so estimating confidence in the peaks without a control library is highly unreliable and should be avoided [6].
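
Strand information helps because, in single-end ChIP-seq, each tag marks one end of a longer fragment: tags on the + strand pile up upstream of the binding site and tags on the - strand pile up downstream of it. One common strand-aware approach, sketched here under the assumption that the average fragment length is already known (the function name is hypothetical), shifts each tag's 5' end half a fragment length toward the fragment centre so that the two strands' distributions line up over the site.

```python
from typing import Iterable, List, Tuple


def shift_tags(tags: Iterable[Tuple[int, str]], fragment_length: int) -> List[int]:
    """Move each tag's 5' end half a fragment length toward the fragment
    centre: downstream for '+' strand tags, upstream for '-' strand tags.

    `tags` are (five_prime_position, strand) pairs; `fragment_length` is an
    estimated average fragment size supplied by the caller.
    """
    half = fragment_length // 2
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + half)
        elif strand == "-":
            shifted.append(pos - half)
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return shifted
```

For a site at position 1000 and 200 bp fragments, a + tag at 900 and a - tag at 1100 both move to 1000, sharpening the pile-up over the site.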

Confidence in the peaks is quantified using measures such as the P-value or false discovery rate (FDR), typically based on comparisons of the ChIP library and the control library, though different peak-calling packages differ in exactly how this is done. Some publicly available peak-calling algorithms are listed in Table 1, and several excellent and detailed reviews are available [7-9], although differences in performance between peak-callers are not well understood [9, 10].
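
A minimal sketch of one such scheme, assuming the ChIP tag count in a window follows a Poisson distribution whose mean is estimated from the control library scaled to the ChIP sequencing depth. P-values across windows are then converted to FDR-adjusted values with the Benjamini-Hochberg procedure. The helper names and the floor of one control tag (to avoid a zero mean) are assumptions for illustration; as noted above, real packages differ in exactly how they model the background.

```python
import math
from typing import List


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam). Fine for illustration; very small
    tail probabilities lose precision because of the 1 - cdf subtraction."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_pvalue(chip_count: int, control_count: int,
                  chip_total: int, control_total: int) -> float:
    """P-value of the ChIP count in one window, with the expected count taken
    from the control window scaled to ChIP depth (floored at one tag)."""
    lam = max(control_count, 1) * chip_total / control_total
    return poisson_sf(chip_count, lam)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A window with 20 ChIP tags where the equally deep control has 5 gets a very small P-value, whereas 5 ChIP tags against 5 control tags does not.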

Many commercial software packages also contain peak-calling functionality.

Many kinds of systematic bias have been described in next-generation sequencing in general and in ChIP-seq in particular. Mapping bias results from the frequency of occurrence of particular short homologous sequences in the genome, and from genomic amplifications and repeats.

Some biases seem to remain even in the control library; in particular, genomic landmarks such as transcription start sites tend to have higher read counts even in control libraries [12]. Chromatin structure also introduces biases into the physical manipulation of DNA in ChIP experiments as a result of non-uniform shearing [13].

Specifically, silenced chromatin is harder to shear than euchromatin and is therefore underrepresented in sequence reads. As a result, regions in transcribed genes appear more highly represented than regions in silent genes.
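
Because biases like these also affect the control library, one common way to cancel the shared part is to compare ChIP and control signal after normalizing for sequencing depth. A minimal sketch, assuming per-window tag counts and library totals are available (the function name and the pseudocount of 1 are hypothetical choices): a window that is over-represented in both libraries, such as a region of easily sheared chromatin, ends up near zero, while biases that differ between ChIP and control are not removed.

```python
import math


def log2_enrichment(chip_count: int, control_count: int,
                    chip_total: int, control_total: int,
                    pseudocount: float = 1.0) -> float:
    """log2 ratio of depth-normalized ChIP signal to control signal in one
    window. The pseudocount avoids division by zero in empty windows."""
    ratio = ((chip_count + pseudocount) * control_total) / (
        (control_count + pseudocount) * chip_total)
    return math.log2(ratio)
```

With equal library sizes, 63 ChIP tags against 31 control tags gives a log2 enrichment of 1; the same 63 tags in both windows gives -1 if the ChIP library was sequenced twice as deeply.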


