
Gene variation by Gene Ontology group in Drosophila genomes

Version 9 (Dec 2006).
See also older Version 8 analysis (May 2006)

Potential gains and losses of gene functions and biological processes among species genomes are shown here. They suggest where the species' genes differ in functional categories. Statistically significant deviations are brightly colored. 'Lost genes' may reflect divergence in genes rather than true loss. 'Gained genes' should also be interpreted with caution, as these data appear to contain a variety of effects (see the notes below).
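
The test behind the coloring is not stated on this page. As one hedged illustration, assuming a table of gene match counts per GO class and species, Pearson residuals (the quantity commonly used to shade mosaic plots) flag cells that deviate from the counts expected if GO class and species were independent. The counts, the labels, and the |residual| > 2 cutoff below are hypothetical and are not taken from the analysis.

```python
import math

def pearson_residuals(table):
    """table: {go_class: {species: count}}. Returns residuals in the same shape."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r][c] for c in cols) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_tot.values())
    resid = {}
    for r in rows:
        resid[r] = {}
        for c in cols:
            # Expected count if GO class and species were independent
            expected = row_tot[r] * col_tot[c] / total
            if expected == 0:
                resid[r][c] = 0.0
            else:
                resid[r][c] = (table[r][c] - expected) / math.sqrt(expected)
    return resid

def flag_deviations(resid, cutoff=2.0):
    """List (go_class, species) cells whose |residual| exceeds the assumed cutoff."""
    return [(r, c) for r in resid for c in resid[r] if abs(resid[r][c]) > cutoff]

# Hypothetical counts, for illustration only
counts = {
    "GO_class_A": {"sp1": 10, "sp2": 10, "sp3": 40},
    "GO_class_B": {"sp1": 30, "sp2": 30, "sp3": 30},
}
print(flag_deviations(pearson_residuals(counts)))  # [('GO_class_A', 'sp3')]
```

A positive residual corresponds to an apparent gain in that species for that GO class, and a negative one to an apparent loss, relative to the other species in the table.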

Target genomes                               Fruitfly (Dmel) query proteins  Mouse query proteins  Worm query proteins
Drosophila-12                                Dros-12-Fruitfly                Dros-12-Mouse         Dros-12-Worm
C. elegans, Daphnia pulex, and 3 Drosophila  Arthropod-5-Fruitfly            Arthropod-5-Mouse     Arthropod-5-Worm

In each folder above, the file index-brief.html provides a text table, and index-summary.html a graphic table, of species gains and losses in molecular function, biological process and cellular component. Each index links every GO term to a gene detail table with counts and links to genome maps showing the gene matches found.

Genome views of example duplications
  • Alternate exons (Dvir)
  • New gene, parts of 4 reproductive genes (Dsec)
  • New gene 2, parts of 4 reproductive genes (Dsec)
  • Duplicate gene (Dpse and Dper)
  • Duplicate region (Dper, not Dpse)


  • The "gene match counts" here are High-scoring Segment Pair (HSP) groupings, and include various events: gene duplications, alternate splice exons within genes, new genes that appear composed of exons from known genes, as well as computational artifacts (see notes below). The detail pages provide links to GBrowse genome map views showing all secondary HSPs.

  • New: Use of gene-level GO associations. The prior analysis (v8) collapsed gene-level GO associations into higher-level GO categories (GO-Slim groupings). Retaining the gene-level GO terms yields clearer phylogenetic changes, though spread over more GO classes, each with fewer contributing genes. Species changes in different directions at the detailed level cancelled out when collapsed into higher-level GO classes.

  • New: Gene HSP matches were first mapped onto predicted genes, to better group HSPs into the same or distinct genes. Secondary HSPs overlapping on the same predicted gene were eliminated from the analyses. This reduces the total number of HSPs and the number of apparent duplications, yielding a clearer distinction among gene gain/loss events.

  • All tBLASTn protein matches with probability <= 1e-3 are used, including duplicate matches. Low-scoring matches contained within the location of better matches are removed.

  • Gene counts are based on High-scoring Segment Pair (HSP) groupings, where a group is determined from the overlap of query protein parts and of target genome locations. Included are HSP groups that are distinct protein parts in the same gene region (alternate exons), as well as protein parts found at distinct genome locations. The data include computational artifacts, especially where paralogs exist: a secondary HSP group for paralog-A can partially overlap primary HSP matches to paralog-B.

  • Proteome source subsets are from those organisms with extensive GO annotations: Dmel, Mouse, Worm, Yeast.
    Target genomes analyzed include the Drosophila species along with the outgroup species Anopheles gambiae, Daphnia pulex and C. elegans.

  • Data tables (genes6-data) used in this analysis, extracted from BLAST output:
     go3-spp-modCE.tab   : Gene HSP match counts for species genomes per GO class, C. elegans source proteins
     gene-gop3-modCE.tab : Association table of WormBase gene ID x GO class, Ngenes=9670
     go3-spp-modMM.tab   : Gene HSP match counts for species genomes per GO class, Mouse source proteins
     gene-gop3-modMM.tab : Association table of MGI gene ID x GO class, Ngenes=9919
     go3-spp-modDM.tab   : Gene HSP match counts for species genomes per GO class, Fruitfly (Dmel) source proteins
     gene-gop3-modDM.tab : Association table of FlyBase gene ID x GO class, Ngenes=8638
     gocatall.tab        : Table of GO IDs, GO-Parent class, and term
    The Ngenes used in the analyses are 1/2 to 2/3 of the total proteome data set used in BLAST matching, because of subsequent filtering by coincident gene predictions and GO associations.

  • Exploratory Mosaic plots, used in preliminary analyses, showing high level GO-Slim groupings for all target species genomes by source proteomes.
  • Software (Perl and R-statistics) for these analyses (documentation is lacking).
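
The match-filtering rule in the notes above (keep tBLASTn matches with probability <= 1e-3, then remove low-scoring matches contained in the location of better matches) can be sketched as follows. This is a minimal illustration and not the Perl software listed above: the `Hsp` record, its field names, and the choices to ignore strand and to compare matches from any query protein on the same target sequence are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hsp:
    # Hypothetical record layout, not the format of the genes6-data tables
    query: str    # query protein ID
    target: str   # target scaffold or chromosome
    start: int    # target start, inclusive
    end: int      # target end, inclusive
    score: float  # match score
    prob: float   # BLAST probability

def filter_hsps(hsps, max_prob=1e-3):
    """Keep significant matches that are not contained within a better match."""
    passing = [h for h in hsps if h.prob <= max_prob]
    kept = []
    # Visit best-scoring matches first, so each match is only
    # compared against matches that score at least as well.
    for h in sorted(passing, key=lambda h: h.score, reverse=True):
        contained = any(
            k.target == h.target and k.start <= h.start and h.end <= k.end
            for k in kept
        )
        if not contained:
            kept.append(h)
    return kept

# Hypothetical matches, for illustration only
hsps = [
    Hsp("protA", "scaffold_1", 100, 500, 250.0, 1e-60),
    Hsp("protB", "scaffold_1", 150, 300, 80.0, 1e-10),   # inside the first match
    Hsp("protA", "scaffold_1", 900, 1200, 120.0, 1e-20), # distinct location
    Hsp("protC", "scaffold_2", 150, 300, 40.0, 0.5),     # fails the probability cutoff
]
for h in filter_hsps(hsps):
    print(h.query, h.target, h.start, h.end)
```

In this sketch the second protA match survives because it lies at a distinct genome location, which is how duplicate matches remain in the counts. Grouping the surviving HSPs by predicted gene, as described in the notes, would be a further step not shown here.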

Don Gilbert, December 2006
      Name                           Last modified       Size  Description

[DIR] Parent Directory               13-Aug-2007 19:47   -
[DIR] duplgene-examples/             10-May-2006 12:14   -
[DIR] egstats8/                      02-Jan-2007 17:15   -
[DIR] genes6-cele5-fruitfly-v9g3/    01-Jan-2007 21:15   -
[DIR] genes6-cele5-mouse-v9g3/       01-Jan-2007 21:13   -
[DIR] genes6-cele5-worm-v9g3/        01-Jan-2007 21:08   -
[DIR] genes6-data/                   31-Dec-2006 21:11   -
[DIR] genes6-dmel12-fruitfly-v9g3/   01-Jan-2007 21:16   -
[DIR] genes6-dmel12-mouse-v9g3/      01-Jan-2007 21:16   -
[DIR] genes6-dmel12-worm-v9g3/       01-Jan-2007 21:16   -
[DIR] mosaic-plots/                  31-Dec-2006 21:16   -
[DIR] soft/                          01-Jan-2007 00:22   -

Developed at the Genome Informatics Lab of Indiana University Biology Department