Index of /genes2/pea_aphid2/genes

      Name                          Last modified       Size  Description

[DIR] Parent Directory              16-Jul-2011 12:40      -
[   ] aphid2_evigene8f.tr.gz        05-Jun-2011 23:50  24.1M
[   ] aphid2_evigene8f.cds.gz       05-Jun-2011 23:52  14.0M
[   ] aphid2_evigene8f.gff.gz       05-Jun-2011 23:45  11.0M
[   ] aphid2_evigene8f.aa.gz        05-Jun-2011 23:41   8.9M
[   ] aphid2_evigene8f.tbl.gz       05-Jun-2011 23:16   3.8M
[TXT] evigene_aphid2.conf           30-May-2011 18:08    15k
[TXT] evigene_aphid2ndary.conf      17-Apr-2011 11:49     5k
[TXT] aphid2_evigene8e.readme.txt   03-Jun-2011 19:13     2k
[DIR] quality/                      03-Jun-2011 19:12      -
[DIR] other/                        03-Jun-2011 18:47      -
[DIR] aphid2_genemodels/            03-Jun-2011 19:09      -
[DIR] aphid2_evigene3_2010/         03-Jun-2011 19:09      -

Evidential Gene for Pea aphid assembly 2
June 2011, by Don Gilbert

aphid2_evigene8e.gff  : annotated gene models, GFFv3 format
aphid2_evigene8e.tbl  : table of gene annotations, tabbed
aphid2_evigene8e.aa   : FASTA sequences of aa (proteins); likewise .tr (transcript na) and .cds (coding na)

quality/         : gene quality information, including validated chimeric splits of ACYPI v1 genes

other/           : additional gene models and supporting information

  Names are derived from protein homology to Uniprot of May 2011, uniref50-arthropods,
  and related named gene data sets.  A match of >33% alignment is required to assign a name,
  and the alignment percentage is noted on each name as (nn%).
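
  As a minimal sketch of reading the (nn%) marker back off a name: the exact
  layout of names in the .tbl and .gff files is not shown here, so this assumes
  the percentage is a trailing "(nn%)" on the name text.  The function name and
  the example names are hypothetical.

```python
import re

# Assumption: a name ends with its alignment percentage, e.g. "Some protein (87%)".
NAME_PCT = re.compile(r"^(?P<name>.*?)\s*\((?P<pct>\d+)%\)\s*$")

def split_name_pct(raw):
    """Return (name, percent), or (name, None) when no trailing (nn%) is found."""
    match = NAME_PCT.match(raw)
    if match is None:
        return raw.strip(), None
    return match.group("name"), int(match.group("pct"))

if __name__ == "__main__":
    for example in ["Dicer-1 (87%)", "hypothetical protein"]:
        print(split_name_pct(example))
```

  Names without the marker come back with None, so they can be told apart
  from names that passed the >33% criterion.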

  A small set of curated proteins that cannot be computed from gene.gff is included,
  mostly from chimeric splits.  See the quality=Protein:curated flag.

  GFF format is 3-level (gene/mRNA/exon,CDS), with alternate transcripts flagged as isoform=N
  and with ID=...t1,t2,t3 to indicate alternates.  All primary models have an ID with the t1
  suffix, but may not be the "best" form (longest protein).
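
  A rough sketch of grouping transcripts by gene from such a file.  It assumes
  standard 9-column, tab-separated GFFv3 with mRNA rows carrying ID= and Parent=
  attributes; the Parent= link and the sample IDs below are assumptions, not
  taken from the actual release.

```python
from collections import defaultdict

def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def transcripts_by_gene(gff_lines):
    """Map each gene ID (the mRNA's Parent) to the list of its mRNA IDs."""
    genes = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            genes[attrs["Parent"]].append(attrs["ID"])
    return dict(genes)

def is_primary(mrna_id):
    """Primary models carry the t1 suffix; alternates are t2, t3, ..."""
    return mrna_id.endswith("t1")
```

  For example, after decompressing the gff.gz file, the line iterator from
  open() can be passed to transcripts_by_gene(), and is_primary() then picks
  out the t1 model of each gene.  Since t1 is not guaranteed to be the longest
  protein, a "best" model would have to be chosen by comparing the isoforms.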

  Long introns in gene models are all supported by evidence from RNA/EST assemblies:
     many are >20kb, a few are >100kb, and >35 genes span over 250kb (more than bee, but same ballpark).
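
  To check intron sizes for one transcript, intron lengths can be derived from
  its exon coordinates.  This is an illustrative sketch only: it assumes exons
  are given as 1-based inclusive (start, end) pairs, as in GFF, and the function
  names and the 20kb default are chosen here for the example.

```python
def intron_lengths(exons):
    """Return intron lengths between consecutive exons of one transcript.

    exons: iterable of (start, end) pairs, 1-based inclusive as in GFF.
    """
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap > 0:  # ignore abutting or overlapping exon records
            lengths.append(gap)
    return lengths

def long_introns(exons, min_len=20_000):
    """Return only the introns longer than min_len bases."""
    return [n for n in intron_lengths(exons) if n > min_len]

def transcript_span(exons):
    """Genomic span covered from the first exon start to the last exon end."""
    starts, ends = zip(*exons)
    return max(ends) - min(starts) + 1
```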

  False UTRs were worked over, and many but not all were removed.
     These extend into the next gene or include introns, sometimes with many UTR exons.
     They are areas of high expression joined to gene ends when they should not be,
     or coding sections artifactually broken into non-coding ones;
     they are commonest in EST/RNA assemblies by PASA and cufflinks.

  Chimera/split genes from version 1: 1000 computed but <100 validated;
     some matched alternate models.  These include a few well-known genes like
     dicer-1, maleless, and sex-determining fem-1.
  Chimeric genes are entered 2 times in genes.gff, with 2 separate IDs, to conform
  to GFF format requirements.  The protein is listed only once.  See annotation chimeria=1,2

Developed at the Genome Informatics Lab of Indiana University Biology Department