A parallel effort by Celera Genomics to shotgun-sequence entire genomes, including human, required considerable investment in proprietary software development to avoid the pitfalls of coassembly of regions that are similar in sequence but reside in distant regions of the genome. Each platform has disparate output and unique error profiles, negating a Swiss-army-knife approach to universal base-calling and sequence analysis. Cloud-computing solutions providing internet access to large clusters of computers offer some hope of accessing data pipelines in conjunction with significant hardware power at a reasonable cost. Solving these issues may simply shift the software gap from sequence processing (base-calling, alignment or assembly, positional counting and variant detection) to sequence analysis (annotation and functional impact). Although the data-collection phase of DNA sequencing was greatly simplified by the advent of capillary-based fluorescence sequencers, the massive scale of templates needed for production-scale sequencing limited high-throughput DNA sequencing to relatively few specialized laboratories. Although this is an impressive first for next-generation sequence assemblers, there is room for much improvement, as the effort generated more than 7 million contigs longer than 100 bp and 680,000 contigs longer than 1,000 bp. Next-generation platforms now put the power of high-throughput sequencing within the grasp of a single investigator-led laboratory or core facility. One approach is a mapped assembly, in which sequence reads are first aligned to a reference genome and a consensus sequence is then generated for the new genome. By far the largest gap between DNA sequencing and analysis was seen in annotation and visualization, with several groups scrambling to package the new human genome in a usable format (UCSC Genome Browser, Ensembl Genome Browser and NCBI Map Viewer).
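The consensus step of a mapped assembly can be illustrated with a minimal sketch. It assumes the reads have already been aligned to the reference without gaps and are supplied as (start position, sequence) pairs. The function name `mapped_consensus`, the `min_depth` parameter and the toy data are hypothetical and are not taken from any particular assembler. Real pipelines also weigh base qualities, handle insertions and deletions, and model platform-specific error profiles, none of which is attempted here.

```python
from collections import Counter


def mapped_consensus(reference, aligned_reads, min_depth=1):
    """Majority-vote consensus from ungapped read alignments (illustrative only).

    reference     -- reference sequence, used here only to define the coordinates
    aligned_reads -- iterable of (start, read) pairs, start being 0-based
    min_depth     -- positions covered by fewer reads are reported as 'N'
    """
    # One counter of observed bases per reference position (a simple pileup).
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too little coverage to call a base
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy data (assumed): three reads that agree on a T at position 4,
# where the reference carries an A.
reference = "ACGTACGTAC"
reads = [(0, "ACGTT"), (2, "GTTCG"), (3, "TTCGTA")]
consensus = mapped_consensus(reference, reads)
print(consensus)  # ACGTTCGTAN
```

The last position is reported as N because no read covers it. Whether to fall back to the reference base at such positions is a design choice that depends on how the consensus will be used.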
In many cases, next-generation sequencing will not provide the answer but will instead be only one of many investigational tools needed. Second-generation sequencing technology promises to deliver cost-effective genome coverage in the very near future, but the software and computational hardware needed for de novo assembly are likely to lag behind these developments.
Material captured by chromatin immunoprecipitation (ChIP) has traditionally been analyzed using microarrays, but microarrays are readily replaced by direct sequencing of the captured material (ChIP-seq; ref. 25).
Unfortunately, the software and computer hardware demands of these analyses are not much lower than those faced by the large genome centers.