This directory has computed annotations for the first draft genome release (Acyr_1.0) of the pea aphid, Acyrthosiphon pisum. These data were produced at the Genome Informatics Lab, Indiana University, by Don Gilbert, June 2008. Find out more at http://insects.eugenes.org/species/

The source data are:
- Acyr 1.0 genome assembly from Baylor CM [Dec. 2007]
- EST data set for this species as of 2008-06 from NCBI Genbank [count: ~160,000]
- Protein sets from two "nearby" species, Nasonia wasp (~25k) and Daphnia crustacean (~30k) (off the root end of the insect tree)

The analysis tools used include:
- PASA for EST assembly and production of cDNA-gene models.
- NCBI BLAST for locating proteins roughly (tblastn), then annotating predicted proteins (blastp).
- exonerate to provide refined gene mappings of BLAST-located proteins.
- Augustus, trained on cDNA genes and using cDNA hints (optionally also mapped protein hints) to call genes.
- Snap, trained on the same cDNA genes, to also call genes that Augustus may miss.
- EvidenceModeler to combine the gene calls and mix in cDNA and mapped protein evidence to give a "final" model set.

A complete automated prediction and annotation gene set is now available in the annotation/ section of http://insects.eugenes.org/aphid/data/ (July 2008). See the notes there on methods and contents, generally drawn from the above data, including annotation/Aphid-annotation-notes.txt

This is a test case for a GMOD Genome Grid community genome analysis system, http://gmod.org/Genome_grid

What I hope this will offer those with a new genome, or wanting to update an old one, is the general equivalent of what NCBI and some sequencing centers do in their experienced way. Smaller groups and labs are now able to sequence their favorite genome(s), but generally lack the resources (human and computational) to easily compute a genome annotation.

The methods used here are similar to those developed at TIGR and now continued at the Institute for Genome Sciences at the U of Maryland Sch. of Medicine (Owen White and colleagues) and elsewhere. The tools are open-source, tested genome analysis/annotation tools that will work on TeraGrid cyberinfrastructure, using genome data partitioning/parallelization.
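
As a rough illustration of what genome data partitioning can look like, the sketch below splits a set of scaffolds into parts of similar total length, so that each part could be handed to a separate compute job. This is not the code used for this release. The function names (read_fasta, partition_scaffolds), the scaffold names, and the greedy longest-first balancing strategy are all assumptions chosen for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The ID is the first word after '>'; the description is dropped.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def partition_scaffolds(lengths: Dict[str, int], n_parts: int) -> List[List[str]]:
    """Assign scaffolds to n_parts groups with roughly equal total length.

    Hypothetical strategy: take scaffolds longest first and always add the
    next one to the group with the smallest running total.
    """
    parts: List[List[str]] = [[] for _ in range(n_parts)]
    totals = [0] * n_parts
    ordered = sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        target = totals.index(min(totals))
        parts[target].append(name)
        totals[target] += length
    return parts


if __name__ == "__main__":
    # Made-up scaffold names and sequences, for illustration only.
    fasta = [">scaf1 demo", "ACGTACGTAC", ">scaf2", "ACGTACGT", ">scaf3", "ACGTA"]
    lengths = {name: len(seq) for name, seq in read_fasta(fasta)}
    for i, part in enumerate(partition_scaffolds(lengths, 2)):
        print(f"part {i}: {part}")
```

Each resulting part could then be written to its own FASTA file and run through the gene callers independently, with the per-part results merged afterward.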