Gene variation by Gene Ontology group in Drosophila genomes
Version 9 (Dec 2006).
See also older Version 8 analysis (May 2006)
Potential gain and loss of gene functions and biological processes
among species genomes is shown.
These suggest where species genes differ in functional categories.
Statistically significant deviations are brightly colored.
'Lost genes' may be due to divergence in genes rather than loss.
'Gained genes' should be interpreted with caution also, as
these data appear to contain a variety of effects (noted below).
In each folder above, the file index-brief.html provides a text table,
and index-summary.html provides a graphic table, of
species gains and losses in molecular function, biological processes and
cellular components. Each of these indexes links each GO term
to a gene detail table with counts and links to genome maps showing
the found gene matches.
Notes:
- The "gene match counts" here are High-scoring Segment Pair (HSP) groupings,
and include various events: gene duplications, alternate
splice exons within genes, new genes that appear composed of exons
from known genes, as well as computational artifacts (see notes
below). The detail pages provide links to GBrowse genome map views
showing all secondary HSPs.
- New: Use of Gene-level GO associations. The prior
analysis (v8) collapsed gene-level GO associations into higher-level
GO categories (GO-Slim groupings). Retaining the gene-level GO terms
yields clearer phylogenetic changes, though spread over more GO classes, each
with fewer contributing genes. Species changes in opposite directions at the
detailed level cancelled out when collapsed into higher-level GO
classes.
- New: Gene HSP matches were first mapped onto predicted
genes, to better group HSPs into same or distinct genes. Secondary
HSPs overlapping on the same predicted gene were eliminated for
analyses. This reduces the total number of HSPs and the
number of apparent duplications, yielding a clearer distinction among
gene gain/loss events.
- All tBLASTn protein matches with probability <= 1e-3 are used, including duplicate matches.
Low-scoring matches contained within the location of better matches are removed.
- Gene counts are based on High-scoring Segment Pair (HSP) groupings, where the
group is determined from overlap of query protein parts, and target genome overlaps.
Included are HSP groups that are distinct protein parts in the same gene region (alternate
exons), as well as protein parts found at distinct genome locations. The data include
computational artifacts, especially where paralogs exist: a secondary HSP group for paralog-A
can partially overlap primary HSP matches to paralog-B.
- Proteome source subsets are those organisms with extensive
GO annotations: Dmel, Mouse, Worm, Yeast.
Target genomes analyzed include Drosophila species along with the outgroup species
Anopheles gambiae, Daphnia pulex and C. elegans.
- Data tables, genes6-data, used in this analysis,
extracted from BLAST output:
go3-spp-modCE.tab : Gene HSP match counts for species genomes per GO class, C.elegans source proteins
gene-gop3-modCE.tab : Association table of WormBase gene ID x GO class, Ngenes=9670
go3-spp-modMM.tab : Gene HSP match counts for species genomes per GO class, Mouse source proteins
gene-gop3-modMM.tab : Association table of MGI gene ID x GO class, Ngenes=9919
go3-spp-modDM.tab : Gene HSP match counts for species genomes per GO class, Fruitfly (Dmel) source proteins
gene-gop3-modDM.tab : Association table of FlyBase gene IDs x GO class, Ngenes=8638
gocatall.tab : Table of GO IDs, GO-Parent class, and term
NGenes used in analyses are 1/2 to 2/3 of the total proteome data set used in
BLAST matching, due to subsequent filtering by coincident
gene predictions and GO associations.
- Exploratory Mosaic plots, used in preliminary analyses,
showing high level GO-Slim groupings for all target species genomes by source proteomes.
- Software (Perl and R-statistics) for these analyses (documentation
is lacking).
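To make the HSP filtering described in the notes above more concrete, here is a minimal Python sketch. It is not the Perl/R software used for this analysis. The record fields, function names, and example coordinates are all hypothetical. It assumes that "contained" means fully inside a better-scoring match on the same target sequence, that secondary HSPs are dropped per (query protein, predicted gene) pair, and that HSPs overlapping no predicted gene are discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    query: str     # source protein ID (hypothetical field names throughout)
    seqid: str     # target scaffold or chromosome
    start: int     # target start, inclusive
    end: int       # target end, inclusive
    score: float   # alignment score; higher is better
    evalue: float  # tBLASTn expectation value


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def drop_contained(hsps, max_evalue=1e-3):
    """Keep HSPs passing the e-value cutoff, removing any that lie
    entirely inside a better-scoring HSP on the same target sequence."""
    kept = []
    passing = [h for h in hsps if h.evalue <= max_evalue]
    for h in sorted(passing, key=lambda x: x.score, reverse=True):
        inside_better = any(
            k.seqid == h.seqid and k.start <= h.start and h.end <= k.end
            for k in kept
        )
        if not inside_better:
            kept.append(h)
    return kept


def best_hsp_per_gene(hsps, genes):
    """Map HSPs onto predicted genes and keep only the best-scoring HSP
    for each (query, gene) pair. `genes` maps gene ID -> (seqid, start, end).
    HSPs overlapping no predicted gene are dropped (an assumption)."""
    best = {}
    for h in hsps:
        for gene_id, (seqid, g_start, g_end) in genes.items():
            if seqid == h.seqid and overlaps(h.start, h.end, g_start, g_end):
                key = (h.query, gene_id)
                if key not in best or h.score > best[key].score:
                    best[key] = h
    return best


# Invented example data: one query protein matched against two scaffolds.
hsps = [
    HSP("protA", "2L", 100, 500, 200.0, 1e-50),   # primary match
    HSP("protA", "2L", 150, 300, 60.0, 1e-8),     # inside the primary match
    HSP("protA", "2L", 600, 800, 90.0, 1e-12),    # secondary HSP, same gene
    HSP("protA", "3R", 1000, 1400, 150.0, 1e-30), # distinct genome location
    HSP("protA", "3R", 5000, 5100, 30.0, 0.5),    # fails the e-value cutoff
]
genes = {"g1": ("2L", 50, 900), "g2": ("3R", 950, 1500)}

filtered = drop_contained(hsps)
per_gene = best_hsp_per_gene(filtered, genes)
for (query, gene_id), h in sorted(per_gene.items()):
    print(query, gene_id, h.seqid, h.start, h.end, h.score)
```

In this toy case the second match on 2L is removed because it sits inside a better match, and the remaining two 2L HSPs collapse to one count for predicted gene g1, so protA is counted once on 2L and once on 3R. A real pipeline would also need strand handling and an interval index for genome-scale data.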
Don Gilbert, December 2006