Drosophila yakuba Genome Data:
Name Last modified Size Description
Parent Directory 19-Jun-2008 13:51 -
PASA_EST -
blast/ 02-Sep-2006 20:15 -
dmel-dyak2-algn.stats 21-Nov-2005 13:43 1k
dyak-chromosomes-caf1-060302.fa.count 18-Jul-2006 17:23 1k
dyak-chromosomes-caf1-060302.fa.gapcount 04-Sep-2006 12:54 1k
dyak2-dmel-algn.stats 18-Dec-2005 18:36 2k
fasta/ 31-May-2006 18:45 -
gff/ 14-Mar-2007 14:06 -
Subject: BRIEF INFO about these DroSpeGe Annotations
2007 Nov:
Basic analysis methods and software scripts used for DroSpeGe are provided at
http://gmod.cvs.sourceforge.net/gmod/genogrid/drospege/
This repository, however, does not cover all of the detailed analyses. Many of
these are one-time or exploratory analysis scripts that are not well suited for
general use.
21 Sep 2005:
... I've done tblastn of 9 eukaryote proteomes against the new dros. assemblies.
Documentation is not available just now (it is in my working folders, and a
manuscript is being written now...), but you are welcome to ask me what you
need to know.
The GFF feature locations for each d. species are at
http://insects.eugenes.org/species/data/dana/gff/ {etc. for each /dxxx/ species}
dana-dmel-algn.gff.gz 15-Aug-2005 16:36 427k = summarized dmel dna blastn alignments
dana-dmel-dna.gff0.gz 10-Aug-2005 18:59 7.0M = all hits of dmel dna blastn alignments
dana-markers.gff.gz 20-Aug-2005 16:51 12k = dmel marker genes
dana-prot9.gff.gz 15-Aug-2005 20:39 15.0M = 9 proteome tblastn
dana-scaffolds.gff 09-Aug-2005 20:38 880k = dspp-scaffolds/chromosomes as gff
dros-dana-micsat.gff.gz 20-Aug-2005 12:32 17k = dros. microsatellites
Raw data is at ftp://ftp.eugenes.org/eugenes/genomes/
in folders per-organism, per-assembly (e.g. dana1,dana2,..)
dana-scaffolds.gff 880 KB 8/9/05 --- dspp-scaffolds/chromosomes as gff
dana_ag01aug05db.tgz 59445 KB 8/5/05 --- dspp-scaffolds blast database
danadmelc.blout.tgz 25813 KB 8/10/05 --- dspp-dna x dmel-dna blastn output (format #9)
dana-dmel-dna.gff.gz 7206 KB 8/10/05 --- dspp-dna x dmel-dna blastn to gff
danaprot9.blout.tgz 40232 KB 8/10/05 --- dspp-dna x 9 euk. proteomes tblastn output (format #9)
dana-prot9.gff.gz 12306 KB 8/10/05 --- dspp-dna x 9 euk. proteomes tblastn to gff
Ditto for other dros. species.
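The "blastn to gff" and "tblastn to gff" files above are conversions of tabular
BLAST output. As a rough illustration only, here is a minimal Python sketch of
such a conversion. It assumes that "format #9" means the legacy blastall
tabular output with comment lines (-m 9) and its usual 12 tab-separated
columns. The function name, the GFF source/type values and the Target
attribute are hypothetical choices, not the layout actually used in the
DroSpeGe GFF files.

```python
def blast_tab_to_gff(lines, source="tblastn", feature="match"):
    """Yield one GFF line per tabular BLAST hit (assumed 12-column layout)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment lines of the -m 9 format
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a hit line in the assumed layout
        query, subject = cols[0], cols[1]
        q_start, q_end = int(cols[6]), int(cols[7])
        s_start, s_end = int(cols[8]), int(cols[9])
        bit_score = cols[11].strip()
        # Hits on the reverse strand are reported with subject start > end.
        strand = "+" if s_start <= s_end else "-"
        start, end = min(s_start, s_end), max(s_start, s_end)
        # Illustrative attribute column: where the hit lies on the query.
        attributes = f"Target={query} {q_start} {q_end}"
        yield "\t".join([subject, source, feature, str(start), str(end),
                         bit_score, strand, ".", attributes])
```

The features are placed on the subject sequence because the genome scaffolds
are the BLAST database here and the dmel DNA or proteome sequences are the
queries.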
The prot9 data set = Dros. melanogaster, C.elegans, Bee, Mosquito,
Human, Mouse, Zebrafish, Arabidopsis, Yeast
(find at ftp://ftp.eugenes.org/eugenes/proteomes/)
March 2006:
The annotations at DroSpeGe now include a set of gene predictions
made with SNAP (I.Korf) for Drosophila species genomes. These were
generated using HMM files for SNAP that were bootstrapped using Ian's
D.melanogaster.hmm on each species genome for an initial prediction
set, then that initial set was used to train the HMM predictor
for each species.
- Don Gilbert
E.g. http://insects.eugenes.org/species/data/dana/gff/
snap-dana_caf051209.aa.gz 07-Feb-2006 19:29 6.3M -- Gene prediction proteins
snap-dana_caf051209.gff.gz 10-Feb-2006 20:15 2.0M -- GFF gene + exon predictions
snap-dana_caf051209.hmm 07-Feb-2006 18:38 45k -- Bootstrapped HMM for SNAP predictions
snap-dana_caf051209.tr.gz 07-Feb-2006 19:29 10.0M -- Gene prediction transcripts
The script used to generate these is
http://insects.eugenes.org/species/data/work/snap-predictions/snapmake.script
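The bootstrapping described above (predict with the D.melanogaster HMM, train a
species HMM on those predictions, then predict again) can be outlined as a
small control loop. This is only a sketch of the idea in Python, not the
contents of snapmake.script. The `predict` and `train` callables are
hypothetical placeholders for whatever runs SNAP and its training tools, and
the `rounds` parameter is an assumption added for illustration.

```python
def bootstrap_species_model(genome, seed_model, predict, train, rounds=1):
    """Bootstrap a species-specific gene model from a seed model.

    predict(model, genome) -> predictions   (hypothetical wrapper)
    train(genome, predictions) -> model     (hypothetical wrapper)
    """
    model = seed_model
    for _ in range(rounds):
        # Initial prediction set made with the current (at first: seed) model.
        predictions = predict(model, genome)
        # Train a new model for this species on those predictions.
        model = train(genome, predictions)
    # Final prediction set made with the species-trained model.
    return model, predict(model, genome)
```

With `rounds=1` this matches the two-step description in the note: one initial
prediction set from the seed HMM, then one trained HMM per species.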
Jan 2006:
> This raises a general issue that I think is almost sorted out with the
> CAF assemblies. Unfortunately until we have a single, clearly marked,
> MD5 checksummed fasta file for each species, I think there is still
> room for some error in merging information among groups.
I generate the MD5 and SwissProt/EMBL CRC64 checksums in headers
for my uses of these assemblies.
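For readers who want to add similar checksums to their own copies, here is a
minimal Python sketch that computes an MD5 and a size for each FASTA record.
It assumes the checksum is taken over the sequence residues exactly as
written, with line breaks removed. Whether the original headers were computed
that way (for example with or without case folding) is not stated here, so
treat that as an assumption. The CRC64 value is not reproduced in this sketch,
and the function names are hypothetical.

```python
import hashlib

def fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def checksum_header(header, sequence):
    """Return a FASTA header line extended with MD5 and size fields."""
    # Assumption: hash the residues as written, without newlines.
    md5 = hashlib.md5(sequence.encode("ascii")).hexdigest()
    return f">{header} MD5={md5}; size={len(sequence)};"
```

Usage: open the uncompressed scaffold file, pass the file handle to
`fasta_records`, and call `checksum_header` on each pair.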
These are the comparative annotation freeze 1 (CAF1) assemblies. My
blast results are available here as GFF for others who want to use them.
Index of ftp://ftp.eugenes.org/eugenes/genomes/
Directory: dana3 1/12/06 7:52:00 PM
Directory: dere3 1/12/06 7:54:00 PM
Directory: dgri3 1/12/06 7:59:00 PM
Directory: dmel4 11/4/05 11:06:00 PM
Directory: dmoj3 1/12/06 7:50:00 PM
Directory: dper1 11/11/05 6:11:00 PM
Directory: dpse2 1/11/06 5:26:00 PM
Directory: dsec1 11/11/05 6:10:00 PM
Directory: dsim2 1/11/06 5:29:00 PM
Directory: dvir3 1/12/06 7:26:00 PM
Directory: dyak3 1/12/06 5:15:00 PM
For example,
Index of ftp://ftp.eugenes.org/eugenes/genomes/dere3
File: dere_caf051209.fa.gz << original scaffolds.bases with expanded headers
File: dere_caf051209shred.fa.gz << shredded to 50KB chunks with overlap for grid blasts
File: dere_caf051209shreddb.tgz << ncbi blast db
File: dere_scaffolds.gff << all the scaffold headers with checksums
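The "shred" file cuts each scaffold into overlapping chunks so that blast jobs
can be spread over a grid. A minimal Python sketch of that kind of shredding
follows. It assumes that 50KB means 50,000 bases. The overlap length is not
given above, so the 1,000-base default is a placeholder, and the
"id:start-end" chunk naming is hypothetical rather than the naming used in the
actual file.

```python
def shred(seq_id, sequence, chunk=50_000, overlap=1_000):
    """Yield (chunk_id, subsequence) pieces that overlap by `overlap` bases."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be >= 0 and smaller than chunk")
    if not sequence:
        return
    step = chunk - overlap
    start = 0
    while True:
        piece = sequence[start:start + chunk]
        # 1-based, inclusive coordinates on the original scaffold.
        yield f"{seq_id}:{start + 1}-{start + len(piece)}", piece
        if start + chunk >= len(sequence):
            break  # this piece reached the end of the scaffold
        start += step
```

Keeping the original coordinates in each chunk name lets hits on a chunk be
mapped back to scaffold positions by adding the chunk start minus one.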
dere_caf051209.fa.gz:
>scaffold_1 Drosophila erecta scaffold_1 WGS
CRC64=79943112DB1F8801; MD5=eb75e438205017245952738d1cc924b2; size=1468;
dere_scaffolds.gff:
scaffold_1 dere_caf051209 chromosome 1 1468 . + .
ID=scaffold_1;Drosophila+erecta+scaffold_1+WGS;
CRC64=79943112DB1F8801;MD5=eb75e438205017245952738d1cc924b2;size=1468
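The checksum fields shown above make it possible to confirm that two groups
are working from the same scaffold sequence. Here is a minimal Python sketch
that parses the semicolon-separated fields and compares the MD5 and size with
a sequence in hand. The function names are hypothetical, and the comparison
assumes the MD5 was computed over the residues as written without line breaks,
which is not confirmed here.

```python
import hashlib

def parse_attributes(text):
    """Split 'key=value;' fields; bare fields are returned as notes."""
    attrs, notes = {}, []
    for field in text.strip().strip(";").split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
        else:
            # e.g. 'Drosophila+erecta+scaffold_1+WGS' (spaces written as '+')
            notes.append(field.replace("+", " "))
    return attrs, notes

def matches_record(attrs, sequence):
    """True if the size and MD5 fields agree with the given sequence."""
    md5 = hashlib.md5(sequence.encode("ascii")).hexdigest()
    return int(attrs["size"]) == len(sequence) and attrs["MD5"].lower() == md5
```

The same parser works on the FASTA header fields, since they use the same
"key=value;" form with a space after each semicolon.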
- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/