Last year's triumphs in sequencing entire microbial genomes [HGN 7(1), 5 (May-June 1995)] left little doubt that the era of large-scale sequencing had begun. The latest whole-genome sequencing feat was presented at the workshop, this time for the heat-loving, methane-producing M. jannaschii.
These remarkable achievements mark a major step toward developing and optimizing the technologies and strategies needed to fully sequence the 3 billion bases of human DNA. Optimism runs high that the first human genome reference sequence can be obtained on time, even without revolutionary technical advances. However, significant improvements still are needed for increasing the accuracy and efficiency and reducing the cost of conventional gel-electrophoretic methods.
Current large-scale genome sequencing focuses on using gel-based instruments with random (shotgun) or directed strategies, often combining facets of both approaches. Pilot initiatives for production sequencing of human DNA concentrate on identifying and analyzing regions of known biological importance.
Support for the whole-genome and other microorganism sequencing efforts described below comes from DOE's Microbial Genome Program, which aims to sequence the genomes of microbes with potential industrial, environmental, and economic importance.
In addition to microbial production-sequencing reports, speakers presented updates on technology improvements with current gel-based systems; emerging technologies, especially capillary electrophoresis (CE); and alternative, potentially high-payoff technologies such as mass spectrometry (MS). Many of these DNA-analysis technologies will feed into the rapidly expanding biotechnology industry, where they will have broader applications in clinical diagnostics, environmental testing, industrial process monitoring, forensics, and agriculture.
Highlights of presentations follow.
Microbial Genome Sequencing
M. jannaschii
Carol Bult [The Institute for Genomic Research (TIGR)] described the whole-genome shotgun-sequencing strategy used to obtain the complete genomic sequence of M. jannaschii, an organism first isolated from a deep-sea hydrothermal vent in 1982. This genome is the first to be completed from the Archaea domain of life, a group of microbes that are genetically different from both bacteria and eukaryotes. The Archaea, which include methanogens, thermoacidophiles, and extreme halophiles, may represent some of the earliest forms of living cells.
Bult identified several factors critical to the group's success. These factors include the availability of a random genomic 2.5-kb-insert plasmid library from Gary Olsen (University of Illinois) and a representative 20-kb-insert lambda library for building a genome scaffold; high-quality sequence data from both ends of the plasmid and lambda clones (using ABI 373 and 377 automated sequencers with fluorescence technologies); and a robust sequence-fragment assembly tool (TIGR Assembler, discussed below by G. Sutton). Coverage was obtained for the entire genome. Bult emphasized the importance of tightly integrating data flow with tools for managing and analyzing data. Sequence annotation is now complete, and all data and clones will be available by early summer.
Pyrococcus furiosus
Robert Weiss (University of Utah) described a project to sequence the 2-Mb genome of P. furiosus, another of the hyperthermophilic Archaea. Investigators are using a multiplexed, transposon-based directed approach with an end-sequencing strategy. In both the mapping and sequencing phases, automated devices detect enzyme-linked fluorescence from DNA hybrids on nylon membranes.
Summarizing mapping progress, Weiss reported that 1.4 Mb (two-thirds of the genome) had been completed; 2500 mapped clones, representing minimal-set coverage of 0.76 Mb, have been sequenced. Mapping transposons, he noted, is much simpler than performing the base-calling, assembly, and editing steps required by other sequencing approaches, and the simplicity of the method encourages scaleup. The group is now entering the end-sequencing phase and will piece together the genome by aligning end sequence with the transposon maps and walking their way to continuity. They have about 500 kb of consensus sequence.
Borrelia burgdorferi
John Dunn [Brookhaven National Laboratory (BNL)] and colleagues are sequencing the 935-kb genome of B. burgdorferi, the spirochete that causes Lyme disease, to develop methods for DNA sequencing by primer walking with a hexamer library. The goal is to eliminate up-front work and enable a tenfold reduction in the template preparations required for a typical shotgun project. The approach involves using end sequencing and either hexamer strings or ligated hexamers (18-mers) to walk along the template. As a demonstration of the method, a 35-kb Borrelia fosmid clone (based on the Fos vector) was sequenced by means of this approach. Dunn believes the group can achieve fourfold redundancy, with both strands completely sequenced. The next step in the project, he said, is to bring online a high-volume CE system to allow faster sequencing (http://www.biology.bnl.gov/cellbio/dunn.html).
Sequencing Technologies
Researchers at the workshop reported on efforts to optimize current approaches in scaling up for multimegabase sequencing. Goals are to minimize or eliminate some of the bottlenecks at each step in the sequencing process: template isolation, sequencing reactions, fragment separation and detection, and data collection and analysis.
Emphasis is on developing fully automated, integrated, modular systems that can process a sample from template isolation to data analysis with little or no human intervention.
Front-End Automation: The "Sequatron"
The "Sequatron" described by Trevor Hawkins (Whitehead-MIT) uses readily available components to automate and integrate the tasks of DNA isolation, setup of sequencing reactions, thermal cycling, and sample purification and concentration for separation on gels. The major element is an articulated CRS 255A robot arm. Solid-phase reversible immobilization is used to isolate and manipulate the DNA on magnetic particles throughout the process.
Current throughput is 80 microtiter plates of samples (about 8000) from M13 phage supernatants or crude PCR products to sequence-ready samples every 24 hours. New enzymes, energy-transfer primers, and higher-density microtiter plates may increase throughput to 25,000 samples.
Speeding Up DNA Separation and Detection: Capillary Gel Systems
Researchers reported significant and exciting results over the past year in developing newer CE strategies for dramatically faster and higher-resolution DNA fragment separation. Advantages of CE include longer read lengths; more efficient heat transfer in the long, thin, polymer-filled capillaries vs that in conventional slab gels; automatability; and online fragment detection.
Edward Yeung (Iowa State University) described a CE system based on novel separation, detection, and imaging concepts for real-time monitoring that he believes will soon enable sequencing of 40 Mb of DNA in a single day. A first-generation system has been constructed that uses fluid-gel matrices in 100 capillaries that are read simultaneously in real time, compared with the much slower sequential-reading technology available with current instruments. Most processes are computerized. The team is scaling up the technology to allow sequencing in up to 1000 capillaries.
Multiplexed CE DNA Sequencer: The new system, the ESY9600 Multiplexed Capillary Electrophoresis DNA Sequencer, is scheduled for release this year by Premier American Technologies Corporation, which licensed the technology from DOE's Ames Laboratory at Iowa State. The system uses cartridges of 96 fused-silica capillaries for the electrophoresis and a PC with software for automation control and data processing. Fragments are detected using an argon laser and a CCD camera. The new system offers a 100-fold gain in speed and 24-fold gain in throughput over conventional automated sequencers.
Norman Dovichi (University of Alberta, Edmonton) achieved reads of 1000 bases in 135 min. by decreasing the electric field to 100 to 150 V/cm, increasing temperature to 70°C, and using a noncrosslinked polyacrylamide matrix (3% concentration) in the capillaries. His presentation concentrated on high-sensitivity fluorescence-detection cuvettes made from glass microscope slides into which are etched a series of grooves that align precisely with the capillaries. The 16-capillary machine is now online, and the group has demonstrated success with a two-dimensional array design (4-by-5 and 8-by-12 rows of capillaries), for which they recently filed a patent. They are working on a 24-by-24 array (576 capillaries) that has the potential to generate 500 bases per capillary per run, or up to 300 kb of raw sequence, in about 2 hours. With 4 runs a day, this might yield 1 Mb of sequence per day per instrument.
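The figures quoted for the 24-by-24 array can be checked with simple arithmetic. The short Python sketch below is illustrative only: the function names are hypothetical, and it assumes every capillary delivers the full 500-base read on every run, which a real instrument would not guarantee.

```python
def array_yield_bases(rows, cols, bases_per_capillary):
    """Raw bases produced by one run of a rows-by-cols capillary array."""
    return rows * cols * bases_per_capillary


def daily_yield_bases(bases_per_run, runs_per_day):
    """Raw bases produced per instrument per day."""
    return bases_per_run * runs_per_day


# Values taken from the text: 24-by-24 array, 500 bases per capillary, 4 runs a day.
per_run = array_yield_bases(24, 24, 500)
per_day = daily_yield_bases(per_run, 4)

print(f"capillaries: {24 * 24}")
print(f"per run: {per_run / 1e3:.0f} kb")
print(f"per day: {per_day / 1e6:.2f} Mb")
```

Under these assumptions one run gives 288 kb (roughly 300 kb) and four runs give about 1.15 Mb, consistent with the stated estimate of 1 Mb per day per instrument.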
Barry Karger (Northeastern University) updated progress toward production of a robust CE instrument using replaceable polymer matrices. Factors to optimize include type and concentration of polymer matrix, column conditions, dye chemistries, and software. Karger's group found that low (2%) concentrations of very high molecular-weight replaceable noncrosslinked polymer, elevated temperatures, and signal-processing software allow fast sequencing and long reads (1000 bases in 80 min. with 97% accuracy). A multiplex capillary-array system with CCD camera currently is being tested with a goal of 1 kb per column every 1.5 hours.
Richard Mathies [University of California, Berkeley (UCB)] discussed his collaboration with Alexander Glazer (UCB) and others to develop new DNA detection methods and CE systems for ultrahigh-speed DNA analysis. Jingyue Ju, former Human Genome Postdoctoral Fellow now at Incyte Pharmaceuticals (Palo Alto, California), reported on the development of high-sensitivity energy-transfer dye-labeled primers for DNA sequencing and PCR analysis. Mathies and Glazer are using four sets of these dyes to accomplish four-color confocal sequencing of mitochondrial DNA in CE. They are also developing a miniaturized integrated DNA analysis system (MIDAS), combining their CE arrays and Allen Northrup's (LLNL) fast PCR methods. Mathies envisions applying this technique to samples generated by cycle sequencing to produce and analyze 15 kb per unit per hour.
Mark Quesada (BNL) described a prototype 8-capillary DNA sequencer that uses replaceable linear polyacrylamide matrices and fiber optics for illumination and detection. Resolution limits are about 400 to 450 bases per hour. The team hopes to have a 21-capillary system this summer to use in the directed sequencing project aimed at the genome of the microorganism that causes Lyme disease.
Innovative Gel-Less Technologies: Mass Spectrometry
Replacing the gel-separation step with MS promises resolution, accuracy, and speed surpassing those of electrophoretic methods. MS uses the difference in mass-to-charge ratios of ionized atoms or molecules to separate them. Problems are encountered in ionizing large DNA fragments with minimal sample degradation.
Richard Smith (Pacific Northwest National Laboratory) and Lloyd Smith (University of Wisconsin) talked about the challenges and potential capabilities of DNA sequencing using MS with electrospray (ES) ionization and matrix-assisted laser desorption ionization (MALDI).
The gentler, solution-based ES ionization produces ions with a distribution of net charge states that spreads the molecule signal among several mass-spectrometric peaks, complicating interpretation of the resulting mass spectrum. This problem is compounded by salts associating with the molecule during the sample-preparation process. Richard Smith reported striking reduction of salt content by incorporating an online microdialysis unit before the ionizing step. Either single- or double-stranded PCR products of >100 bp may be analyzed and accurate mass measurements obtained without the detectable fragmentation that would make sequencing much more difficult. The researchers also demonstrated a method for dynamic range expansion using Fourier transform ion cyclotron resonance MS that significantly extends the effective read length for sequencing.
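To illustrate why multiple charge states complicate an ES spectrum, the sketch below computes the mass-to-charge positions of a single molecule across several charge states and then recovers the neutral mass from two adjacent peaks. It is a simplified sketch, not the speakers' method: it assumes negative-ion mode (each charge arises from loss of one proton), ignores salt adducts and isotopes, and uses a hypothetical neutral mass of 31,000 Da as a stand-in for a PCR product of roughly 100 nucleotides.

```python
PROTON_MASS = 1.007276  # Da


def mz_negative_mode(neutral_mass, charge):
    """m/z of an ion formed by removing `charge` protons from the molecule."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass - charge * PROTON_MASS) / charge


def charge_state_series(neutral_mass, charges):
    """Map each charge state to the m/z at which its peak appears."""
    return {z: mz_negative_mode(neutral_mass, z) for z in charges}


def infer_mass_from_adjacent_peaks(mz_low_charge, mz_high_charge):
    """Recover charge and neutral mass from peaks with charges z and z + 1.

    mz_low_charge is the peak at higher m/z (charge z); mz_high_charge is
    the neighboring peak at lower m/z (charge z + 1).
    """
    z = round((mz_high_charge + PROTON_MASS) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge + PROTON_MASS)
    return z, mass


ASSUMED_MASS = 31000.0  # hypothetical neutral mass in Da
series = charge_state_series(ASSUMED_MASS, range(18, 23))
for z, mz in series.items():
    print(f"z = {z}: m/z = {mz:.2f}")

z_found, mass_found = infer_mass_from_adjacent_peaks(series[20], series[21])
print(z_found, round(mass_found, 3))
```

One molecule yields one peak per charge state, so signal that would otherwise sit in a single peak is divided among several. Salt adducts add further peaks at each charge state, which is why desalting before ionization helps.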
Lloyd Smith's laboratory is exploring reasons for DNA fragmentation during MALDI from a solid matrix. Fragmentation is dependent on sequence as well as matrix, and chemical modification may limit the breakage. Smith's group identified a base-protonation mechanism they believe is involved in strand breakage, and the team is developing a set of modified nucleotides that could offer protection.
Sequence Finishing and Analysis
Sequence Finishing
A key component in high-throughput sequencing systems will be the integration of improved software tools for assembling and finishing the sequence. Steps in this process include automated reading of sequencing gels (base calling), assembly of contiguous sequence from separate DNA fragments (sequence assembly), and editing to resolve ambiguities.
Many assembly algorithms are available, but they cannot tackle data sets with large repeat regions. Other assembly challenges include large numbers of pairwise comparisons to determine overlaps in large data sets, sequence-assembly uncertainties due to clone chimerism, and sequencing errors. Granger Sutton (TIGR) described TIGR Assembler, which was used to assemble the shotgun-sequenced genomes of H. influenzae, M. genitalium, and M. jannaschii. It is also being used to assemble shotgun-sequenced BAC clones. Source code and executables for TIGR Assembler are available to nonprofit researchers (grange@tigr.org).
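The pairwise-overlap step described above can be illustrated with a toy greedy assembler. This is a teaching sketch only and is not the algorithm used by TIGR Assembler: it assumes error-free reads from one strand, requires exact suffix-prefix matches, and compares every pair of reads, which is exactly the quadratic cost that becomes a problem on large data sets. All names are hypothetical.

```python
def overlap_len(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGTAC", "GCGTACGTT", "TACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

A repeat longer than the reads produces equally good overlaps between fragments from different genomic locations, so a greedy merge of this kind can join the wrong pieces. That is the failure mode the text attributes to data sets with large repeat regions.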
Phil Green (University of Washington, Seattle) discussed two software tools that improve accuracy and minimize human effort. Phred improves base calls substantially and assesses the quality of processed ABI 373A and 377 trace data. Phrap is a sequence-assembly program that uses information from Phred and from read comparisons to identify reliable base calls; this helps identify repeats and allows use of the full reads in assembly. Phrap also identifies data anomalies such as chimeras and vector DNA and will soon incorporate mapping information. Green's group is nearing the point at which unedited, automatically assembled sequences will give an error rate of 1 per 10 kb on typical cosmid data sets. Phred and Phrap are now being beta tested.
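Base-call quality is conventionally reported on a logarithmic scale, Q = -10 log10(p), where p is the estimated probability that a call is wrong. The report itself does not give this formula; it is the convention that became associated with Phred and is shown here only to put the quoted error rate in context. The function names are hypothetical.

```python
import math


def error_prob_to_quality(p):
    """Convert a per-base error probability into a logarithmic quality value."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def quality_to_error_prob(q):
    """Convert a quality value back into a per-base error probability."""
    return 10 ** (-q / 10)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read with these quality values."""
    return sum(quality_to_error_prob(q) for q in qualities)


# An error rate of 1 per 10 kb corresponds to p = 1e-4.
target_quality = error_prob_to_quality(1e-4)
print(f"1 error per 10 kb -> Q = {target_quality:.0f}")

# A hypothetical 500-base read in which every base has quality 30.
print(f"expected errors: {expected_errors([30] * 500):.2f}")
```

On this scale the target of 1 error per 10 kb corresponds to an average quality of about 40 in the assembled consensus.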
Noting that sequence accuracy improves exponentially with incremental project cost, David States (Washington University, St. Louis) argued for setting accuracy criteria when large-scale projects are implemented. The ultimate value of a sequence to the wider biochemical and clinical communities is an important consideration, he said, noting that an error rate of 3 in 10^4 would be achievable and acceptable. States' team is pursuing a mathematical, model-based approach to the entire sequencing process and has completed a nonproprietary data path from gel images through finished sequence as prototype code. In a panel discussion, States noted that the group's lane-tracking utility is available from their ftp site and that trace extraction and other code, still in the developmental stage, is available to collaborators.
Sequence Analysis
Computers analyze stretches of DNA sequence for patterns that identify such biologically important features as protein-coding regions, regulatory regions, and RNA splice sites. Other computer tools are used to compare a new sequence (e.g., a gene) against all other entries in a database and retrieve any homologous sequences that have already been entered.
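The idea of comparing a new sequence against every database entry can be sketched with a simple shared-word count. This is a toy illustration and not the method used by any tool named in this report: it scores each entry by the fraction of the query's k-letter words that also occur in the entry, with no alignment, no statistics, and no tolerance for mismatches. The database contents and names are hypothetical.

```python
def kmers(seq, k):
    """Set of all overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, subject, k=4):
    """Fraction of the query's k-mers that also appear in the subject."""
    query_words = kmers(query, k)
    if not query_words:
        return 0.0
    return len(query_words & kmers(subject, k)) / len(query_words)


def search(query, database, k=4, threshold=0.5):
    """Return (score, name) for entries at or above the threshold, best first."""
    hits = [(shared_kmer_fraction(query, seq, k), name)
            for name, seq in database.items()]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


database = {
    "entry1": "ATGGCGTACGTTAGC",   # hypothetical entries
    "entry2": "TTTTTTTTTTTT",
}
hits = search("GCGTACGTT", database)
print(hits)
```

Production search programs refine candidate matches like these with alignment and significance estimates, which this sketch omits.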
The popular GRAIL and GenQuest servers locate genes and other biologically important features in sequence and search other databases for homologous DNA and protein sequences and structural motifs. Over 17 million bases of sequence are processed each month in 13,000 sessions. GRAIL is a comprehensive system that uses artificial intelligence and machine learning to recognize many different signals, allowing it to incorporate more complex relationships among the data than might be anticipated a priori. GenQuest takes information generated by GRAIL and compares it with data in protein, DNA, and motif databases.
Ed Uberbacher and Richard Mural (both at Oak Ridge National Laboratory) presented the latest version (1.3) of GRAIL. This version features improved sensitivity and splice-site accuracy, better performance in AT-rich regions, new analysis programs for four model organisms, frameshift detection, batch processing, and a wider variety of ways to access GRAIL automatically or interactively. GRAIL 1.3 also builds annotation reports; processing a cosmid or an even larger sequence can be completed in under an hour. Responding to requests for a more automated approach for GRAIL analysis, the group will provide UNIX socket access to all functions of the new version (http://compbio.ornl.gov/Grail-1.3/).
Randy Smith [Baylor College of Medicine (BCM)] described efforts to improve user access to the wide variety of database-search tools available on the WWW. The BCM Search Launcher features a single point of entry for related searches, the addition of hypertext links to results returned by remote servers, and a batch client.
Smith outlined other activities of the BCM group. FASTA-SWAP, a new pattern-search tool for databases, improves sensitivity and specificity to help detect related sequences. BEAUTY, an enhanced version of the BLAST database-search program, improves access to information about the functions of matched sequences and incorporates additional hypertext links. The graphical displays allow correlation of hit positions with annotated domain positions.
Future plans include developing a post-processor version of BEAUTY and providing access to information from and direct links to other databases, including organism-specific databases. The BCM group is also providing external analysis services to the Genome Sequence Data Base annotator. Human Genome Postdoctoral Fellow Mark Graves (BCM) reported on a simple database-management system for biologists to use in designing their own laboratory databases (mgraves@bcm.tmc.edu).
Gary Stormo (University of Colorado) described an approach for predicting coding regions in genomic DNA; it uses multiple types of evidence, combines them into a single scoring function, and returns both optimal and ranked suboptimal solutions. The approach is robust to substitution errors but sensitive to frameshift errors. Stormo's group is now exploring methods for predicting different classes of sequence regions, especially promoters.
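The general idea of folding several kinds of evidence into one score and then reporting ranked alternatives can be sketched as a weighted sum over candidate regions. This is not Stormo's method, whose details are not given here: the evidence types, weights, and candidate names below are all hypothetical, and a real predictor would search over gene structures rather than score a fixed list.

```python
def combined_score(evidence, weights):
    """Weighted sum of the evidence values for one candidate region."""
    return sum(weights[name] * value for name, value in evidence.items())


def rank_candidates(candidates, weights, top_n=3):
    """Return the top_n (score, name) pairs, best first."""
    scored = [(combined_score(evidence, weights), name)
              for name, evidence in candidates.items()]
    scored.sort(reverse=True)
    return scored[:top_n]


# Hypothetical weights and integer evidence scores, for illustration only.
weights = {"codon_bias": 2, "splice_signal": 1, "homology": 3}
candidates = {
    "region_A": {"codon_bias": 3, "splice_signal": 1, "homology": 0},
    "region_B": {"codon_bias": 1, "splice_signal": 2, "homology": 2},
    "region_C": {"codon_bias": 0, "splice_signal": 0, "homology": 1},
}

ranked = rank_candidates(candidates, weights)
for score, name in ranked:
    print(name, score)
```

Reporting the runners-up alongside the best-scoring candidate lets a user see how close the alternatives are before accepting a prediction.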








