DroSpeGe: Drosophila persimilis genome data

Drosophila persimilis Genome Data:

      Name                        Last modified     Size  Description

[DIR] Parent Directory            19-Jun-2008 13:51    -
[DIR] request/                    09-Sep-2006 17:11    -
[DIR] gff/                        14-Mar-2007 14:04    -
[DIR] fasta/                      26-Feb-2006 19:12    -
[TXT] dper_caf060213.fa.gapcount  13-Feb-2006 19:18   1k
[TXT] dper_caf060213.fa.count     13-Feb-2006 19:18 483k
[TXT] dper1-dmel-algn.stats       22-Feb-2006 09:42 171k
[TXT] dmel-dper1-algn.stats       22-Feb-2006 09:41   3k
[DIR] blast/                      02-Sep-2006 20:11    -

Subject: BRIEF INFO about these DroSpeGe Annotations

2007 Nov:
Basic analysis methods and software scripts used for DroSpeGe are provided at

This repository, however, does not cover all of the detailed analyses, many of
which are one-time or exploratory analysis scripts not well suited for
general use.

21 Sep 2005:

... I've run tblastn with 9 eukaryote proteomes against the new Drosophila assemblies.
Documentation is not yet available (it is in my working folders, and a manuscript
is being written now...), but you are welcome to ask me what you need to 

The GFF feature locations for each Drosophila species are at
  http://insects.eugenes.org/species/data/dana/gff/   {etc. for each /dxxx/ species}
dana-dmel-algn.gff.gz   15-Aug-2005 16:36   427k  = summarized dmel dna blastn alignments
dana-dmel-dna.gff0.gz   10-Aug-2005 18:59   7.0M  = all hits of dmel dna blastn alignments
dana-markers.gff.gz     20-Aug-2005 16:51    12k  = dmel marker genes
dana-prot9.gff.gz       15-Aug-2005 20:39  15.0M  = 9 proteome tblastn
dana-scaffolds.gff      09-Aug-2005 20:38   880k  = dspp-scaffolds/chromosomes as gff
dros-dana-micsat.gff.gz 20-Aug-2005 12:32    17k  = dros. microsatellites
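These GFF files can be summarized with a few lines of code. The following is a minimal Python sketch, assuming the standard tab-separated GFF columns (seqid, source, type, start, end, score, strand, phase, attributes); it is not part of the DroSpeGe scripts, and the function names `open_gff`, `parse_gff_line` and `count_features` are hypothetical.

```python
import gzip
from collections import Counter


def open_gff(path):
    """Open a GFF file as text, transparently decompressing *.gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_gff_line(line):
    """Return one feature line's 9 GFF columns as a dict, or None for comment, blank or short lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) < 8:
        return None
    cols += ["."] * (9 - len(cols))  # supply a placeholder attribute column if absent
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols[:9]
    return {"seqid": seqid, "source": source, "type": ftype,
            "start": int(start), "end": int(end), "score": score,
            "strand": strand, "phase": phase, "attributes": attrs}


def count_features(lines):
    """Count feature lines per (source, type) pair."""
    counts = Counter()
    for line in lines:
        rec = parse_gff_line(line)
        if rec is not None:
            counts[(rec["source"], rec["type"])] += 1
    return counts
```

For example, `with open_gff("dana-prot9.gff.gz") as fh: print(count_features(fh).most_common(5))` would list the most frequent feature kinds in the 9-proteome tblastn file.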

Raw data is at ftp://ftp.eugenes.org/eugenes/genomes/
 in per-organism, per-assembly folders (e.g. dana1, dana2, ...)
dana-scaffolds.gff         880 KB  8/9/05   --- dspp-scaffolds/chromosomes as gff
dana_ag01aug05db.tgz     59445 KB  8/5/05   --- dspp-scaffolds blast database
danadmelc.blout.tgz      25813 KB  8/10/05  --- dspp-dna x dmel-dna blastn output (format #9)
dana-dmel-dna.gff.gz      7206 KB  8/10/05  --- dspp-dna x dmel-dna blastn to gff
danaprot9.blout.tgz      40232 KB  8/10/05  --- dspp-dna x 9 euk. proteomes tblastn output (format #9)
dana-prot9.gff.gz        12306 KB  8/10/05  --- dspp-dna x 9 euk. proteomes tblastn to gff
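The blastn/tblastn output above is BLAST tabular format #9 (blastall `-m 9`: one tab-separated hit per line, with `#` comment lines), and the matching .gff.gz files hold the same hits as GFF. A minimal sketch of such a conversion follows, assuming the 12 standard tabular columns (query, subject, % identity, length, mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score) and placing each feature on the subject (scaffold) sequence. It is an illustration, not the converter actually used here; the function name, the default source/type values, and the `pct_id` attribute are hypothetical.

```python
def blast_tab_to_gff(lines, source="tblastn", ftype="match_part"):
    """Convert BLAST tabular (-m 9 style) hits into GFF3 lines on the subject sequence."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip '# Query:' / '# Fields:' comment lines
        f = line.split("\t")
        if len(f) < 12:
            continue  # not a complete tabular hit line
        qid, sid = f[0], f[1]
        pident = float(f[2])
        qstart, qend = int(f[6]), int(f[7])
        sstart, send = int(f[8]), int(f[9])
        evalue, bits = f[10], float(f[11])
        # Reversed subject coordinates mean the hit lies on the minus strand.
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        attrs = f"Target={qid} {qstart} {qend};pct_id={pident:g};evalue={evalue}"
        out.append("\t".join([sid, source, ftype, str(start), str(end),
                              f"{bits:g}", strand, ".", attrs]))
    return out
```

Using the bit score as the GFF score keeps hits comparable across queries; the e-value is kept as an attribute string so its original formatting is not altered.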

The same files are available for the other Drosophila species.
The prot9 data set = D. melanogaster, C. elegans, Bee, Mosquito,
  Human, Mouse, Zebrafish, Arabidopsis, Yeast
(find them at ftp://ftp.eugenes.org/eugenes/proteomes/)

March 2006:
The annotations at DroSpeGe now include a set of gene predictions
made with SNAP (I. Korf) for Drosophila species genomes. These were
generated with species-specific SNAP HMM files, bootstrapped by first
running Ian's D.melanogaster.hmm on each species genome to obtain an
initial prediction set, then using that initial set to train the HMM
predictor for each species.
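The bootstrap described above is a form of self-training: predict with a model from a related species, then retrain on those predictions. The sketch below shows only the control flow, under stated assumptions: `predict` and `train` are hypothetical callables standing in for running SNAP with an HMM and for building a new HMM (SNAP ships its own training utilities, such as fathom and forge, which are not modeled here). The final prediction pass with the species-trained model is an assumption based on the .hmm and .gff files listed below.

```python
from typing import Callable, Tuple, TypeVar

G = TypeVar("G")  # a genome (e.g. path to a FASTA file)
M = TypeVar("M")  # gene-finder parameters (e.g. a SNAP HMM file)
P = TypeVar("P")  # a set of gene predictions


def bootstrap_gene_model(genome: G, seed_model: M,
                         predict: Callable[[M, G], P],
                         train: Callable[[P, G], M],
                         rounds: int = 1) -> Tuple[M, P]:
    """Self-train a species-specific gene model, starting from another species' model.

    Each round predicts genes with the current model and retrains on those
    predictions; a final pass predicts with the species-trained model.
    """
    model = seed_model
    for _ in range(rounds):
        initial_predictions = predict(model, genome)
        model = train(initial_predictions, genome)
    final_predictions = predict(model, genome)
    return model, final_predictions
```

With `rounds=1` this matches the single bootstrap step described in the text; more rounds would repeat the predict/retrain cycle.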

- Don Gilbert

E.g. http://insects.eugenes.org/species/data/dana/gff/
 snap-dana_caf051209.aa.gz  07-Feb-2006 19:29   6.3M  -- Gene prediction proteins
 snap-dana_caf051209.gff.gz 10-Feb-2006 20:15   2.0M  -- GFF gene + exon predictions 
 snap-dana_caf051209.hmm    07-Feb-2006 18:38    45k  -- Bootstrapped HMM for SNAP predictions
 snap-dana_caf051209.tr.gz  07-Feb-2006 19:29  10.0M  -- Gene prediction transcripts

The script used to generate these is 

Jan 2006:
> This raises a general issue that I think is almost sorted out with the  
> CAF assemblies. Unfortunately until we have a single, clearly marked,  
> MD5 checksummed fasta file for each species, I think there is still  
> room for some error in merging information among groups. 

I add MD5 and SwissProt/EMBL CRC64 checksums to the FASTA headers
of these assemblies for my own use.
These are the comparative annotation freeze 1 (CAF1) assemblies. My
BLAST results are available here as GFF for others to use.
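A minimal sketch of computing such checksums follows. It assumes the checksums are taken over the sequence with line breaks removed and letters left as-is; whether the original headers were computed on upper-cased sequence is not stated here. The CRC64 shown is the SwissProt/EMBL variant (reflected polynomial x^64 + x^4 + x^3 + x + 1, zero initial value, no final XOR). The header layout only imitates the dere3 example below, and all function names are hypothetical.

```python
import hashlib


def _make_crc64_table():
    """Lookup table for the SwissProt/EMBL CRC64 (reflected form of x^64 + x^4 + x^3 + x + 1)."""
    poly = 0xD800000000000000  # bit-reversed generator polynomial
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
        table.append(crc)
    return table


_CRC64_TABLE = _make_crc64_table()


def crc64_hex(seq: str) -> str:
    """CRC64 of the sequence bytes, as 16 upper-case hex digits."""
    crc = 0
    for byte in seq.encode("ascii"):
        crc = _CRC64_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


def md5_hex(seq: str) -> str:
    """MD5 of the sequence bytes, as 32 lower-case hex digits."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def checksum_header(seq_id: str, description: str, seq: str) -> str:
    """Build a one-line expanded FASTA header carrying CRC64, MD5 and size."""
    seq = "".join(seq.split())  # drop line breaks and other whitespace
    return (f">{seq_id} {description} CRC64={crc64_hex(seq)}; "
            f"MD5={md5_hex(seq)}; size={len(seq)};")
```

The table-driven CRC processes one byte per step; it gives the same result as the slower bit-by-bit shift-and-XOR loop used to build the table.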

Index of ftp://ftp.eugenes.org/eugenes/genomes/
Directory: dana3 		1/12/06 	7:52:00 PM
Directory: dere3 		1/12/06 	7:54:00 PM
Directory: dgri3 		1/12/06 	7:59:00 PM
Directory: dmel4 		11/4/05 	11:06:00 PM
Directory: dmoj3 		1/12/06 	7:50:00 PM
Directory: dper1 		11/11/05 	6:11:00 PM
Directory: dpse2 		1/11/06 	5:26:00 PM
Directory: dsec1 		11/11/05 	6:10:00 PM
Directory: dsim2 		1/11/06 	5:29:00 PM
Directory: dvir3 		1/12/06 	7:26:00 PM
Directory: dyak3 		1/12/06 	5:15:00 PM

For example,
Index of ftp://ftp.eugenes.org/eugenes/genomes/dere3
File: dere_caf051209.fa.gz 	    << original scaffolds.bases with expanded headers
File: dere_caf051209shred.fa.gz 	 << shredded to 50KB chunks with overlap for grid blasts
File: dere_caf051209shreddb.tgz 	 << ncbi blast db
File: dere_scaffolds.gff       << all the scaffold headers with checksums
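The shredded file listed above (dere_caf051209shred.fa.gz) cuts each scaffold into 50 KB windows that overlap, so a hit spanning a window boundary still falls whole inside one chunk of a grid BLAST job. The sketch below illustrates that idea; the 1 KB overlap default is an assumed value (the actual overlap is not given here), and the function names are hypothetical.

```python
from typing import Iterator, Tuple


def shred(seq: str, chunk_size: int = 50_000,
          overlap: int = 1_000) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, chunk) windows of seq, using 1-based inclusive coordinates.

    Consecutive windows share `overlap` bases; the last window may be shorter.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0  # 0-based offset of the current window
    while start < len(seq):
        end = min(start + chunk_size, len(seq))
        yield start + 1, end, seq[start:end]
        if end == len(seq):
            break
        start += step


def chunk_to_scaffold(chunk_start: int, pos_in_chunk: int) -> int:
    """Map a 1-based position inside a chunk back to its 1-based scaffold position."""
    return chunk_start + pos_in_chunk - 1
```

Recording each chunk's scaffold start (for example in its FASTA name) is what lets BLAST hits on chunks be mapped back with `chunk_to_scaffold`.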

  >scaffold_1 Drosophila erecta scaffold_1 WGS 
   CRC64=79943112DB1F8801; MD5=eb75e438205017245952738d1cc924b2; size=1468;

  scaffold_1	dere_caf051209	chromosome	1	1468	.	+	.	
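The checksummed header and the scaffold GFF record can be cross-checked against each other. This sketch parses the `KEY=value;` fields from the dere3 example header above and confirms that the sequence ID and size agree with the GFF line; the function names are hypothetical.

```python
import re

_FIELD = re.compile(r"(\w+)=([^;]*);")


def parse_header_fields(header: str) -> dict:
    """Collect the KEY=value; pairs (CRC64, MD5, size, ...) from an expanded FASTA header."""
    return {key: value.strip() for key, value in _FIELD.findall(header)}


def header_matches_gff(header: str, gff_line: str) -> bool:
    """True if the header's sequence ID and size agree with a scaffold GFF record."""
    seq_id = header.lstrip(">").split()[0]
    fields = parse_header_fields(header)
    cols = gff_line.rstrip("\n").split("\t")
    start, end = int(cols[3]), int(cols[4])
    return cols[0] == seq_id and int(fields["size"]) == end - start + 1
```

For the example above, the header reports size=1468 and the GFF record spans 1..1468, so the two agree.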

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

Developed at the Genome Informatics Lab of Indiana University Biology Department