Shotgun sequencing
Shotgun sequencing is a method used in genetics for sequencing long DNA strands. Since the chain termination method of DNA sequencing can only be used for fairly short strands, it is necessary to divide longer sequences up and then assemble the results to give the overall sequence. In chromosome walking, this division is done by progressing through the entire strand, piece by piece; shotgun sequencing uses a faster, but more complex, process to assemble random pieces of the sequence.
In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence.
For example, consider the following two rounds of shotgun reads:
Original strand: AGCATGCTGCAGTCATGCTTAGGCTA
First round of shotgun reads: AGCATGCTGCAG, TCATGCTTAGGCTA
Second round of shotgun reads: TTAGGCTA, AGCATGCTGCAGTCATGC
In this extremely simplified example, the four reads can be assembled into the original sequence by using the overlap of their ends to align and order them. In practice, assembly involves enormous numbers of reads that are rife with ambiguities and sequencing errors. Assembly of complex genomes is further complicated by the abundance of repetitive sequence, which means that similar short reads may come from completely different parts of the genome.
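The overlap-based assembly in the example above can be sketched in a few lines of Python. This is a toy greedy merger, not the method used by production assemblers: it assumes error-free reads that all come from the same strand, and the function names (`overlap`, `greedy_assemble`) and the `min_len` threshold are illustrative choices rather than part of any real tool.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest suffix/prefix overlap."""
    # Drop duplicates, then drop reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no usable overlap left; the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# The four reads from the example above.
reads = ["AGCATGCTGCAG", "TCATGCTTAGGCTA", "TTAGGCTA", "AGCATGCTGCAGTCATGC"]
print(greedy_assemble(reads))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

Two of the four reads are wholly contained in the other two, so the sketch discards them first and then joins the remaining pair through their six-base overlap (TCATGC). With real data, sequencing errors and repeats make such exact, unambiguous overlaps the exception rather than the rule.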
Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12x or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.
Whole genome shotgun sequencing
High-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Since the chain termination method can usually produce reads of only 500 to 1000 bases, mate pairs will rarely overlap in all but the smallest clones.
The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can then be linked together into scaffolds by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the fragment size of the library is known and varies only within a narrow range.
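As a rough illustration of how the distance between two contigs might be inferred from mate pairs, the following Python sketch averages the gap implied by each linking pair. The coordinate convention is an assumption made for this example only: each link is given as the start offset of one read in the left contig and the end offset of its mate in the right contig, and every clone is assumed to have the same known insert size. The function name `estimate_gap` is hypothetical.

```python
def estimate_gap(insert_size, contig_a_len, links):
    """Estimate the gap between contig A (left) and contig B (right).

    links is a list of (start_in_a, end_in_b) tuples, one per mate pair:
      start_in_a: 0-based offset in contig A where the first read begins
      end_in_b:   offset in contig B where the mate ends (exclusive)
    Each clone spans (contig_a_len - start_in_a) + gap + end_in_b bases,
    which is assumed to equal insert_size.
    """
    if not links:
        raise ValueError("need at least one linking mate pair")
    gaps = [insert_size - (contig_a_len - start_a) - end_b
            for start_a, end_b in links]
    return sum(gaps) / len(gaps)


# Two mate pairs from a 2 kb library linking a 1,000-base contig to its neighbour.
print(estimate_gap(2000, 1000, [(400, 300), (700, 610)]))  # 1095.0
```

Averaging over several links is what makes a narrow fragment-size distribution useful: the wider the spread of clone lengths, the less each individual mate pair says about the true gap.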
Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as <math>{NL \over G}</math>. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x coverage, since 8 × 500 / 2,000 = 2.
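The coverage formula is simple enough to check directly. The short Python sketch below computes coverage from N, L, and G, and also inverts the formula to give the number of reads needed for a target coverage; the function names are illustrative, and the second function assumes all reads have the average length.

```python
import math


def coverage(num_reads, avg_read_len, genome_len):
    """Average coverage N * L / G."""
    return num_reads * avg_read_len / genome_len


def reads_needed(target_coverage, avg_read_len, genome_len):
    """Smallest number of reads N such that N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_len / avg_read_len)


print(coverage(8, 500, 2000))       # 2.0
print(reads_needed(12, 500, 2000))  # 48
```

Coverage is only an average: because fragments are generated at random, some bases will be covered by many more reads than this figure and others by fewer or none.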
Proponents of this approach argue that it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeating regions. As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may become possible to overcome this limitation.
References
This article contains material from the NCBI Handbook published by the NCBI, which, as a US government publication, is in the public domain (see http://www.ncbi.nlm.nih.gov/About/disclaimer.html).