
Index of /species/data/acyr/augustus

Name                         Last modified       Size
acyr-augmap4a.aa.gz          13-Jun-2008 11:31   6.3M
acyr-augmap4a.gff.gz         13-Jun-2008 11:31   9.9M
acyr-augmap5c-best4.aa.gz    18-Jul-2008 15:16   6.5M
acyr-augmap5c-best4.gff.gz   18-Jul-2008 13:48  15.4M
acyr-augrun4a.gff.gz         13-Jun-2008 11:25  14.7M
acyr-cdna.hints.gz           05-Jun-2008 11:26  1016k
acyr-estval.hints.gz         05-Jun-2008 11:21   6.0M
acyrthosiphon_pisum/         09-Jun-2008 21:57      -
augextrun5.cfg               05-Jan-2008 14:02     2k


NOTES on Augustus predictions for the pea aphid Acyr1.0 genome

acyr-augmap5c-best4.gff.gz  : Augustus+cDNA+proteins+Gnomon  (July 2008)
acyr-augmap5c-best4.aa.gz   : model proteins
        : acyr-augmap5c is a second prediction set, made by running Augustus in combiner
         mode with a full range of evidence: NCBI Gnomon models, proteins from Tribolium,
         Nasonia and Daphnia mapped with exonerate, and all pea aphid EST assemblies.
         This set also includes gene annotations derived from best BLAST matches
         against UniProt and NCBI-NR.
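To get a quick overview of a protein set such as acyr-augmap5c-best4.aa.gz, a minimal Python sketch like the one below could count the models and report length statistics. It assumes (not stated in these notes) that the .aa.gz files are gzip-compressed protein FASTA with the model ID as the first word of each header, and that a trailing '*' may mark a stop codon; the function names are hypothetical.

```python
import gzip
import statistics
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (model_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the model ID.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Count models and report length statistics in residues.

    A trailing '*' is ignored if present; whether these .aa files
    include stop symbols is an assumption to check.
    """
    lengths: List[int] = [len(seq.rstrip("*")) for _, seq in records]
    return {
        "models": len(lengths),
        "median_len": statistics.median(lengths) if lengths else 0,
        "max_len": max(lengths, default=0),
    }


def summarize_file(path: str) -> Dict[str, float]:
    """Open a plain or gzip-compressed FASTA file and summarize it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return summarize(parse_fasta(fh))
```

For example, `summarize_file("acyr-augmap5c-best4.aa.gz")` and `summarize_file("acyr-augmap4a.aa.gz")` would give a first side-by-side comparison of the two prediction sets. Parsing is kept separate from file opening so the same code works on any iterable of lines.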

acyr-augmap4a.gff.gz : Augustus+cDNA gene models  (June 2008)
acyr-augmap4a.aa.gz  : model proteins
acyr-augrun4a        : direct Augustus output for the same predictions as the two files above
acyrthosiphon_pisum/ : Augustus config/species/ parameter set, trained in an optimize run on the pasa_out cDNA genes
acyr-cdna.hints      : genes from the full cDNA assembly (pasa_out), used as Augustus hints
acyr-estval.hints    : validated EST matches (pasa_out)
augextrun5.cfg       : hint weights (Augustus extrinsic configuration)
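In Augustus, hints are GFF lines whose ninth column carries a source tag such as `src=E`, and the extrinsic configuration weights each source differently. A minimal sketch for checking what a hints file contains, assuming that layout (the actual tags used in acyr-cdna.hints and acyr-estval.hints are not documented here), tallies lines by source tag and feature type:

```python
import gzip
import re
from collections import Counter
from typing import Iterable

# Assumption: hint lines are tab-separated GFF with an Augustus-style
# "src=X" tag in column 9, e.g. "grp=asmbl_1;pri=4;src=E".
SRC_RE = re.compile(r"(?:^|;)\s*src=([^;\s]+)")


def tally_hints(lines: Iterable[str]) -> Counter:
    """Count hints per (source tag, feature type); '?' if no src= tag."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a GFF feature line
        m = SRC_RE.search(cols[8])
        src = m.group(1) if m else "?"
        counts[(src, cols[2])] += 1
    return counts


def tally_hints_file(path: str) -> Counter:
    """Open a plain or gzip-compressed hints file and tally it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return tally_hints(fh)
```

Running `tally_hints_file("acyr-cdna.hints.gz")` would show which source tags appear, which can then be checked against the sources that augextrun5.cfg assigns weights to.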

The acyr-augmap4a run used the cDNA-trained Augustus HMM and acyr-cdna.hints. It did not
use acyr-estval.hints (the unassembled EST data from which acyr-cdna.hints was built), nor
any mapped-protein hints.

The acyr-augmap5c-best4 run appears to be the better prediction set: its mapped proteins
and alternate gene models call genes that the augmap4a run missed. Agreement with the
NCBI Gnomon set is fairly high. For many genes this set may prove slightly better than
Gnomon, because it uses the two new proteome sets from Tribolium and Nasonia.
Statistics will follow.
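The agreement with Gnomon is not yet quantified. One simple, common measure is exon-level agreement: the share of CDS exons whose coordinates match exactly. The sketch below is an illustration only, not the statistic planned for these notes; it assumes both sets are tab-separated GFF with `CDS` features and use the same sequence IDs (Gnomon files may name scaffolds by accession, which would need mapping first). Function names are hypothetical.

```python
import gzip
from typing import Iterable, Set, Tuple

Exon = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def cds_exons(lines: Iterable[str], feature: str = "CDS") -> Set[Exon]:
    """Collect distinct CDS exon coordinates from GFF lines."""
    exons: Set[Exon] = set()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        exons.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return exons


def exact_agreement(pred: Set[Exon], ref: Set[Exon]) -> Tuple[float, float]:
    """Exon-level sensitivity (shared/ref) and specificity (shared/pred)."""
    shared = len(pred & ref)
    sens = shared / len(ref) if ref else 0.0
    spec = shared / len(pred) if pred else 0.0
    return sens, spec


def load_exons(path: str) -> Set[Exon]:
    """Open a plain or gzip-compressed GFF file and collect CDS exons."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return cds_exons(fh)
```

Exact matching is strict: a one-base shift in a splice site counts as a miss, so overlap-based measures would report higher agreement on the same data.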

Developed at the Genome Informatics Lab of Indiana University Biology Department