This example illustrates, however, how inclusion of homologous ESTs and proteins of non-native origin enhances the coverage attainable by spliced alignment and can improve the accuracy of gene annotation. Fifteen non-Arabidopsis sequences could be reliably aligned.

The tiling of the spliced alignments across the entire region supports the existence of a single gene in the region. Strikingly, an open reading frame spans all 27 predicted exons. Figure 2D summarizes the evidence, which strongly supports the conclusion that this region encompasses a single gene for a transportin-SR protein. Genomic sequence assemblies submitted by high-throughput sequencing projects as draft-quality sequences are a great resource for the community even prior to complete annotation.
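The two lines of evidence used for the transportin-SR region (alignments that tile the region without gaps, and an open reading frame that runs through all predicted exons) can be illustrated with a minimal Python sketch. This is not the GeneSeqer implementation. The function names, the 1-based inclusive forward-strand coordinates and the toy sequence are assumptions made for illustration only.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), 1-based inclusive, forward strand (assumed convention)

STOP_CODONS = {"TAA", "TAG", "TGA"}


def tiles_region(spans: List[Span], start: int, end: int) -> bool:
    """Return True if the union of alignment spans covers start..end with no gap."""
    reach = start - 1  # rightmost position covered so far
    for s, e in sorted(spans):
        if s > reach + 1:
            return False  # uncovered positions between two alignments
        reach = max(reach, e)
    return reach >= end


def spliced_sequence(genome: str, exons: List[Span]) -> str:
    """Concatenate the exon segments of a predicted gene structure."""
    return "".join(genome[s - 1:e] for s, e in exons).upper()


def is_open_reading_frame(cds: str, frame: int = 0) -> bool:
    """Return True if no stop codon occurs before the final complete codon."""
    seq = cds.upper()[frame:]
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return all(codon not in STOP_CODONS for codon in codons[:-1])


# Toy data (hypothetical): two exons separated by an intron that
# carries an in-frame stop codon.
genome = "ATGAAA" + "GTATGAAAG" + "CCCTAA"
exons = [(1, 6), (16, 21)]

covered = tiles_region([(1, 10), (8, 21)], 1, 21)
orf_open = is_open_reading_frame(spliced_sequence(genome, exons))
```

In this toy case the spliced exons read through to the terminal stop codon, whereas the unspliced sequence is interrupted by the stop codon inside the intron. A reading frame that stays open across every predicted exon is the kind of evidence the text cites in favor of a single gene.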

For example, a researcher may identify a BAC clone by hybridization with a probe representing a gene of interest. Rather than waiting for annotation of the corresponding sequence by the sequence providers, this researcher will welcome tools to annotate the region encoding his or her gene.

Of course, such annotation may still be difficult for the same reasons that confound gene structure annotation on a large scale. To illustrate applications of this type, we briefly discuss annotation of a segment of genomic sequence from Sorghum bicolor (GenBank accession AF). After initial annotation using the spliced alignment of the TUG consensus sequences, a more detailed analysis of individual regions can be performed to visualize the constituent spliced alignments or to probe for interesting biological properties such as evidence of alternative splicing.

In this example, five new gene models were identified. Figure 3 shows gene structure prediction in a representative kb segment. Based on Blastp analysis against the NCBI non-redundant protein database, the three loci putatively represent a mitochondrial carrier tricarboxylate transport protein, subunit 1 of a cleavage stimulation factor, and a serine/threonine kinase. The GeneSeqer PlantGDB web service provides a convenient workbench interface to the tools necessary for complex gene structure annotation tasks.

Immediate access to expansive transcribed sequence data collections and interactive visualization of spliced alignment results give users the ability to generate high-quality gene structure annotation without specific bioinformatics training or experience. The Web service is not meant to replace offline annotation pipelines. Figure 1. The server implements a simple four-step protocol.

Steps 1 and 2 were omitted from the figure for clarity. Figure 2. Refined analysis of an existing gene annotation. A, B and C depict three stages in the annotation of gene structure for a region of A. In each panel, spliced alignments and inferred established gene structures are represented by arrows extending from the first exon to the last, pointing in the most probable direction of transcription.

Exons are represented as boxes connected by introns shown as single lines. A In this display, which is available for all A. The current gene annotations, as established by AGI (Arabidopsis Genome Initiative), are shown in blue, with start and stop codons labeled with green and red triangles, respectively.

The National Plant Genome Initiative: Objectives for 2003-2008

Alternative gene structures, representing consistent predictions from multiple ESTs, are shown in green, and long open reading frames in the predicted gene structures are indicated in orange. Established gene annotation, as reported in GenBank, is shown in blue. D summarizes the most probable gene structure prediction for this region. The blue, orange, green and red structures represent, respectively, the established gene annotation, the longest predicted open reading frame, the predicted gene structure and the consensus transcribed sequence spliced alignment derived from B.

The purple structure represents the alignment with the most closely related protein, a Drosophila protein GenBank gi: with high similarity to vertebrate transportin-SR proteins. Figure 3. Large-scale annotation by spliced alignment. Graphical representations are as in Figure 2. Oxford University Press is a department of the University of Oxford.



Correspondence to Pamela S Soltis. Abstract: The nuclear genome sequence of Amborella trichopoda, the sister species to all other extant angiosperms, will be an exceptional resource for plant genomics.



