Welcome, maize researcher!


This is an exciting time for maize researchers: significant portions of the maize genome have now been sequenced. Sequence coverage has reached >93%, and although the assembly still has a long way to go, there are already two completed grass genomes, rice and sorghum, to which the current maize sequence can be compared. With several more grass species also in progress, maize researchers (and grass researchers in general) have an opportunity to leverage orthologs from multiple species to study their favorite genes. However, the plethora of genomic sequence brings its own problems: finding your sequence in a given genome, finding putative paralogs and homeologs in the parent genome, finding putative orthologs in related species, and working out how best to make sense of all the sequence data. CoGe, a comparative genomics toolset developed by the Freeling Laboratory in the Department of Plant Biology at the University of California, Berkeley, aims to simplify these processes. Eric Lyons is the lead developer of CoGe and has prepared this maize-centric tutorial to walk you through using CoGe to:


  1. Identify your sequence in the partially sequenced and partially assembled maize genome.

  2. Identify putative orthologs in related grass species.

  3. Use comparative genomics to find regions of synteny (hence validating the putative orthologs).

  4. Use comparative genomics to find conserved non-coding sequences in orthologous sequences.

  5. Uncover the process of fractionation of conserved non-coding sequence.

  6. Discover subfunctionalization of conserved non-coding sequences for your sequence.


The evolutionary history of the maize genome is a bit complicated. The grass lineage underwent a tetraploidy event ~80 Mya, and maize subsequently underwent another tetraploidy event ~10 Mya. These genome-wide duplication events create a contemporaneous copy of every genomic feature (e.g. gene) in the genome. Over evolutionary time, most of the duplicated features are lost through the process of fractionation, and those that are retained in duplicate are subjected to subfunctionalization. Through the use of comparative genomics, you can identify syntenic regions from these tetraploidy events in the maize lineage, as well as syntenic regions in other grass species, in order to understand and characterize the types of evolution a genomic region may have experienced. [If any of these terms are unfamiliar, please refer to CoGe's Definition page for more details.]
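To make fractionation concrete, here is a toy sketch in Python. It is not part of CoGe and is not how CoGe detects anything. It simply assumes you already know the ancestral gene order of a region and which of those genes survive in each of the two duplicated (homeologous) regions. The gene names and the function `classify_fractionation` are hypothetical, chosen only for illustration.

```python
def classify_fractionation(ancestral_order, region_a, region_b):
    """Label each ancestral gene by which duplicated region still carries it.

    ancestral_order: gene names in their assumed pre-duplication order.
    region_a, region_b: gene names still present in each homeologous region.
    """
    in_a, in_b = set(region_a), set(region_b)
    status = {}
    for gene in ancestral_order:
        if gene in in_a and gene in in_b:
            status[gene] = "retained in both"   # candidate for subfunctionalization
        elif gene in in_a:
            status[gene] = "lost from B"        # fractionated out of region B
        elif gene in in_b:
            status[gene] = "lost from A"        # fractionated out of region A
        else:
            status[gene] = "lost from both"
    return status


# Hypothetical example: five ancestral genes, two duplicated regions.
ancestral = ["g1", "g2", "g3", "g4", "g5"]
region_a = ["g1", "g2", "g4"]
region_b = ["g1", "g3", "g4", "g5"]

status = classify_fractionation(ancestral, region_a, region_b)
for gene in ancestral:
    print(gene, status[gene])
```

In this made-up example only g1 and g4 remain in duplicate; the other genes each survive in just one of the two regions, which is the interleaved pattern of loss that fractionation leaves behind.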


This tutorial begins with a putative coding sequence from maize identified in an unordered contig sequenced by The Maize Sequencing Consortium and deposited in NCBI under the accession AC210314.1. This contig is typical of in-progress sequencing projects, as it has 21 unordered and unannotated pieces. The specific sequence from this contig can be downloaded here. The tutorial that follows uses this maize sequence, but feel free to follow along with your own favorite sequence.
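If you are following along with your own sequence, it can help to check what you have before pasting it into a tool. The sketch below assumes your sequence is in FASTA format (an assumption; the tutorial does not specify the download format) and uses only the Python standard library. The function `read_fasta` and the record names `piece_1` and `piece_2` are hypothetical, and the sequences are made up rather than taken from AC210314.1.

```python
from io import StringIO


def read_fasta(handle):
    """Read FASTA text from a file-like object into a {name: sequence} dict."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # first word after '>' is the record name
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Made-up two-piece example standing in for an unordered contig.
example = StringIO(">piece_1 hypothetical\nACGTAC\nGT\n>piece_2 hypothetical\nGGCC\n")
pieces = read_fasta(example)
lengths = {name: len(seq) for name, seq in pieces.items()}
print(lengths)
```

To read a real file instead, replace the `StringIO` object with `open("your_sequence.fasta")`, where the file name is whatever you saved your sequence as.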



CoGe's Homepage


When you first enter CoGe's website, you'll have to log in. If you have a CoGe user account, enter your user name and password; otherwise, press the "Use Public Login" button. Once in, you'll be sent to CoGe's home page (either the animated version or the flat-text page).

From the homepage you can use the links to go to some of CoGe's tools. When a researcher already has a sequence and is interested in finding the sequence (or homologs) in a genome, CoGeBlast is a good place to start.


Photos credit: Damon Lisch 2007