
Index of /arthropods/data/polymorphism

Name                                         Last modified      Size

acyr1_aug5.genes.mblastdupsc-pi80.genes      22-Sep-2008 00:31  148k
acyr1_gnomon.genes.mblastdupsc-pi80.genes    23-Sep-2008 19:26  200k
acyrgeno-smallsc-mblastdupsc-pi80.ids        22-Sep-2008 00:25  89k
acyrgeno-smallsc.mblast.stats                22-Sep-2008 15:03  951k
arp-acyr1_gnomon-pi80dupgene.tab             23-Sep-2008 20:39  11k
dpulex1_gnomon.genes.mblastdupsc-pi80.genes  22-Sep-2008 00:29  15k
dpulexgeno-smallsc-mblastdupsc-pi80.ids      22-Sep-2008 00:26  4k
dpulexgeno-smallsc.mblast.stats              22-Sep-2008 14:58  246k
nasoniageno-smallsc-mblastdupsc-pi80.genes   23-Sep-2008 19:20  20k
nasoniageno-smallsc-mblastdupsc-pi80.ids     23-Sep-2008 19:20  11k
nasoniageno-smallsc.mblast.stats             23-Sep-2008 19:06  160k


Estimate of polymorphic (spurious) scaffolds in new genome assemblies.
Re http://insects.eugenes.org/arthropods/data/polymorphism/
2008 Sept 23, D. Gilbert

Method:
  1. Select the subset of scaffolds under 10 kb, as these are often scaffolds with partial
     read coverage that failed to assemble because they are redundant with existing larger
     scaffolds. For Daphnia, where per-scaffold read coverage statistics are available,
     almost all of these have < 4x coverage (many only 1x), versus 8x overall coverage.
  2. Megablast each small scaffold against the full genome assembly.
  3. Combine the HSPs for each small/large scaffold pair and select the small scaffolds with
     > 80% overall identity to a larger scaffold (other criteria could be used).
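Steps 1 and 3 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original 2008 pipeline: it assumes the Megablast hits of step 2 are available as BLAST+ tabular output (for example from `blastn -task megablast -query small.fa -db genome -outfmt 6`), approximates identical bases per HSP as pident × length / 100, and does not merge overlapping HSPs. The function names, and the rule that a target must be longer than its query, are illustrative assumptions.

```python
from collections import defaultdict

# Thresholds mirroring the text: "small" means under 10 kb, and a small scaffold
# is flagged when > 80% of its length is identical to a larger scaffold.
MAX_SMALL_LEN = 10_000
MIN_PCT_IDENT = 80.0


def small_scaffolds(lengths):
    """Step 1: IDs of scaffolds shorter than MAX_SMALL_LEN (lengths: id -> bp)."""
    return {sid for sid, n in lengths.items() if n < MAX_SMALL_LEN}


def combine_hsps(blast_lines, lengths):
    """Step 3, first half: sum approximate identical bases over all HSPs
    of each (query, larger target) pair.

    Assumes BLAST+ -outfmt 6 rows, whose first four tab-separated columns are
    qseqid, sseqid, pident, length. Overlapping HSPs are counted twice.
    """
    ident = defaultdict(float)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qid, sid, pident, length = line.split("\t")[:4]
        # Skip self-hits and hits to scaffolds that are not larger than the query.
        if qid == sid or lengths[sid] <= lengths[qid]:
            continue
        ident[(qid, sid)] += float(pident) * int(length) / 100.0
    return ident


def pi80_scaffolds(blast_lines, lengths):
    """Step 3, second half: small scaffolds with > MIN_PCT_IDENT overall identity."""
    small = small_scaffolds(lengths)
    hits = set()
    for (qid, _sid), n_ident in combine_hsps(blast_lines, lengths).items():
        if qid in small and 100.0 * n_ident / lengths[qid] > MIN_PCT_IDENT:
            hits.add(qid)
    return hits
```

Summing over all HSPs of a pair, rather than taking the single best HSP, matches the "overall identity" wording in step 3; swapping in a different criterion only requires changing `pi80_scaffolds`.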

Genomes tested: Acyrthosiphon pisum, Nasonia vitripennis, Daphnia pulex.

Results:
Files *smallsc.mblast.stats summarize the Megablast results for each small (querysc) x large
(targsc) scaffold pair: query length, aligned length, mismatches, gaps, identical bases,
% identity, and target locations.
-------- dpulexgeno-smallsc.mblast.stats --------
querysc       targsc        qlen  align  mismat  gaps  ident  %ident  targloc
scaffold_764  scaffold_65   9938  2770   45      23    2698   27      17302-18434,...
scaffold_821  scaffold_257  9208  3273   46      13    3206   34      77559-80732,38245-38162,
scaffold_872  scaffold_65   8340  8145   15      6     8123   97      101197-98515,....
--------
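In the three example rows, %ident equals identical bases as a percentage of query length, truncated to an integer (100 × 2698 / 9938 = 27.1, giving 27; 100 × 3206 / 9208 = 34.8, giving 34). That definition is inferred from these rows only and is not stated in the files. A minimal sketch that parses such rows (splitting on whitespace, so it handles both the tab-separated files and the aligned layout above) and recomputes the column; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StatsRow:
    querysc: str
    targsc: str
    qlen: int
    align: int
    mismat: int
    gaps: int
    ident: int
    pct_ident: int
    targloc: str


def parse_stats(lines):
    """Parse *smallsc.mblast.stats rows, skipping header and separator lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data rows have 9 fields with an integer qlen; header/separators do not.
        if len(fields) != 9 or not fields[2].isdigit():
            continue
        q, t, *nums, loc = fields
        rows.append(StatsRow(q, t, *map(int, nums), loc))
    return rows


def recomputed_pct_ident(row):
    """Inferred definition: identical bases as a truncated percentage of qlen."""
    return 100 * row.ident // row.qlen


if __name__ == "__main__":
    # The three example rows above, tab-separated as in the .stats files.
    example = [
        "querysc\ttargsc\tqlen\talign\tmismat\tgaps\tident\t%ident\ttargloc",
        "scaffold_764\tscaffold_65\t9938\t2770\t45\t23\t2698\t27\t17302-18434,...",
        "scaffold_821\tscaffold_257\t9208\t3273\t46\t13\t3206\t34\t77559-80732,38245-38162,",
        "scaffold_872\tscaffold_65\t8340\t8145\t15\t6\t8123\t97\t101197-98515,....",
    ]
    for row in parse_stats(example):
        # Columns printed: query, reported %ident, recomputed %ident, passes pi80
        print(row.querysc, row.pct_ident, recomputed_pct_ident(row), row.pct_ident > 80)
```

Integer division reproduces all three reported values; rounding instead would give 35 for scaffold_821, which is why truncation is assumed.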
Files *smallsc-mblastdupsc-pi80.ids list the IDs of small scaffolds with > 80% identity to a larger scaffold.
In the example above, Daphnia scaffold_872 is 97% identical to part of scaffold_65.
Files *gnomon.genes.mblastdupsc-pi80.genes list the genes located on these pi80 scaffolds.
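Listing the genes on the pi80 scaffolds amounts to a set-membership filter on the gene annotation. The format of the original .genes files is not described here, so this sketch assumes, hypothetically, that the gene models are in GFF3 (column 1 is the scaffold ID, column 3 the feature type, column 9 the attributes) and that a .ids file holds one scaffold ID per line. Function names are illustrative.

```python
def read_ids(lines):
    """Read one scaffold ID per line (assumed layout of a *-pi80.ids file)."""
    return {line.strip() for line in lines if line.strip()}


def genes_on_scaffolds(gff_lines, scaffold_ids):
    """Return IDs of GFF3 'gene' features located on any of scaffold_ids."""
    genes = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene" or cols[0] not in scaffold_ids:
            continue
        # Column 9 holds ';'-separated key=value attributes, e.g. ID=gene1;Name=abc.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Fall back to a seqid:start-end label if the feature has no ID attribute.
        genes.append(attrs.get("ID", f"{cols[0]}:{cols[3]}-{cols[4]}"))
    return genes
```

Keeping only `gene` features avoids counting each transcript or exon separately, so the result is one entry per gene model.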

Counts of putative spurious (polymorphic) scaffolds
  6197 acyr1-geno-smallsc-mblastdupsc-pi80.ids
   821 nasonia1-geno-smallsc-mblastdupsc-pi80.ids
   233 dpulex1-geno-smallsc-mblastdupsc-pi80.ids

Gene counts on putative spurious (polymorphic) scaffolds
  1866 acyr1_gnomon.genes.mblastdupsc-pi80.genes  
   208 nasonia1_gnomon.genes.mblastdupsc-pi80.genes
    89 dpulex1_gnomon.genes.mblastdupsc-pi80.genes

These are likely underestimates: the criteria select only scaffolds < 10 kb (the most likely
partial-coverage heterozygous assemblies) and require > 80% identity over the whole scaffold,
whereas high identity over only part of a scaffold may also indicate a heterozygous contig.
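A worked example of why the whole-scaffold criterion can miss partial duplicates: a hypothetical 9,000 bp small scaffold whose only HSP covers 5,000 bp at 99% identity has overall identity 5,000 × 0.99 / 9,000 = 55%, below the 80% cutoff, even though more than half of it is a near-identical copy. The sketch below contrasts the whole-scaffold measure with one possible local criterion; the thresholds `min_len` and `min_pident` are hypothetical and were not used in this analysis.

```python
def whole_scaffold_identity(qlen, hsps):
    """Approximate identical bases summed over HSPs, as a percentage of qlen.

    hsps: list of (aligned_length, percent_identity) tuples for one scaffold pair.
    """
    return 100.0 * sum(length * pident / 100.0 for length, pident in hsps) / qlen


def has_local_duplicate(hsps, min_len=2000, min_pident=95.0):
    """Hypothetical alternative criterion: any single long, highly identical HSP."""
    return any(length >= min_len and pident >= min_pident for length, pident in hsps)


if __name__ == "__main__":
    qlen, hsps = 9000, [(5000, 99.0)]  # hypothetical scaffold from the text above
    print(whole_scaffold_identity(qlen, hsps))  # 55.0: fails the > 80% filter
    print(has_local_duplicate(hsps))            # True: flagged by the local criterion
```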


Developed at the Genome Informatics Lab of the Indiana University Biology Department