Index of /arthropods/data/polymorphism
Name Last modified Size Description
acyr1_aug5.genes.mblastdupsc-pi80.genes 22-Sep-2008 00:31 148k
acyr1_gnomon.genes.mblastdupsc-pi80.genes 23-Sep-2008 19:26 200k
acyrgeno-smallsc-mblastdupsc-pi80.ids 22-Sep-2008 00:25 89k
acyrgeno-smallsc.mblast.stats 22-Sep-2008 15:03 951k
arp-acyr1_gnomon-pi80dupgene.tab 23-Sep-2008 20:39 11k
dpulex1_gnomon.genes.mblastdupsc-pi80.genes 22-Sep-2008 00:29 15k
dpulexgeno-smallsc-mblastdupsc-pi80.ids 22-Sep-2008 00:26 4k
dpulexgeno-smallsc.mblast.stats 22-Sep-2008 14:58 246k
nasoniageno-smallsc-mblastdupsc-pi80.genes 23-Sep-2008 19:20 20k
nasoniageno-smallsc-mblastdupsc-pi80.ids 23-Sep-2008 19:20 11k
nasoniageno-smallsc.mblast.stats 23-Sep-2008 19:06 160k
Estimate of polymorphic (spurious) scaffolds in new genome assemblies.
Re http://insects.eugenes.org/arthropods/data/polymorphism/
2008 Sept 23, D. Gilbert
Method:
1. Select the subset of scaffolds under 10 kb. These often have only partial read coverage and
could not be assembled further because they are redundant with existing larger scaffolds. For Daphnia,
which has scaffold read coverage stats, almost all are < 4x (many 1x) compared with the 8x overall coverage.
2. Megablast each small scaffold against the full genome assembly.
3. Combine the HSPs per scaffold and select the small scaffolds with > 80% overall identity to a larger scaffold
(other criteria could be used).
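Step 3 can be sketched in Python as below. This is only an illustration, not the script used for these files: it assumes the HSPs are available as (query, target, percent identity, alignment length) tuples, such as the first four columns of BLAST tabular output, and that scaffold lengths are known. The names overall_identity and select_dups are hypothetical, and overlapping HSPs are not merged, so identity can be over-counted.

```python
from collections import defaultdict

SMALL_MAX = 10_000   # scaffolds under 10 kb count as "small" (step 1)
MIN_PCT = 80.0       # overall identity threshold (step 3)

def overall_identity(hsps, scaffold_len):
    """Return {(query, target): % identity over the whole query scaffold}.

    hsps: iterable of (query, target, pident, align_len) tuples
          (assumed input layout, not taken from the original pipeline).
    scaffold_len: dict mapping scaffold id -> length in bases.
    """
    ident = defaultdict(float)
    for query, target, pident, align_len in hsps:
        if query == target:                              # skip self-hits
            continue
        if scaffold_len[query] >= SMALL_MAX:             # small queries only
            continue
        if scaffold_len[target] <= scaffold_len[query]:  # target must be larger
            continue
        # Approximate identical bases in this HSP and add them up per pair.
        ident[(query, target)] += pident / 100.0 * align_len
    return {pair: 100.0 * bases / scaffold_len[pair[0]]
            for pair, bases in ident.items()}

def select_dups(hsps, scaffold_len, min_pct=MIN_PCT):
    """Ids of small scaffolds exceeding min_pct identity to some larger scaffold."""
    pct = overall_identity(hsps, scaffold_len)
    return sorted({query for (query, _target), p in pct.items() if p > min_pct})
```

Dividing by the full query length rather than the aligned length is what makes this an "overall" identity: a short, perfect local match does not pass the threshold.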
Genomes tested: Acyrthosiphon pisum, Nasonia vitripennis, Daphnia pulex.
Results:
Files *smallsc.mblast.stats summarize the megablast results for each small (querysc) x large (targetsc)
scaffold pair: query length, blast alignment length, mismatches, gaps, identical bases, and % identity.
-------- dpulexgeno-smallsc.mblast.stats -----
querysc       targetsc      qlen  align  mismat  gaps  ident  %ident  targloc
scaffold_764  scaffold_65   9938  2770   45      23    2698   27      17302-18434,...
scaffold_821  scaffold_257  9208  3273   46      13    3206   34      77559-80732,38245-38162,
scaffold_872  scaffold_65   8340  8145   15      6     8123   97      101197-98515,....
--------
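For readers who want to load these rows, here is a minimal Python sketch of a parser. It assumes whitespace-separated fields in the column order shown in the example; the StatsRow type and the function names are hypothetical. In the three rows shown, %ident matches ident / qlen truncated to an integer; that relation is an observation from this sample, not a documented formula.

```python
from typing import NamedTuple

class StatsRow(NamedTuple):
    querysc: str
    targetsc: str
    qlen: int
    align: int
    mismat: int
    gaps: int
    ident: int
    pct_ident: int
    targloc: str

def parse_stats_line(line):
    """Parse one data row of a *smallsc.mblast.stats file (assumed layout)."""
    f = line.split()
    return StatsRow(f[0], f[1], int(f[2]), int(f[3]), int(f[4]),
                    int(f[5]), int(f[6]), int(f[7]), f[8])

def pct_from_counts(row):
    """Identity over the whole query scaffold, truncated to an integer."""
    return row.ident * 100 // row.qlen

row = parse_stats_line("scaffold_872 scaffold_65 8340 8145 15 6 8123 97 101197-98515,....")
print(row.querysc, row.pct_ident, pct_from_counts(row))
```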
Files *smallsc-mblastdupsc-pi80.ids list the ids of scaffolds with > 80% identity to a larger scaffold.
In the example above, Daphnia scaffold_872 is 97% identical to part of scaffold_65.
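A rough Python sketch of how such an id list could be derived from the stats rows follows. It assumes %ident is the eighth whitespace-separated field, as in the example; pi80_ids is a hypothetical name, and this is not the script that produced the .ids files.

```python
def pi80_ids(lines, min_pct=80):
    """Return query scaffold ids whose %ident exceeds min_pct, in file order."""
    ids = []
    for line in lines:
        fields = line.split()
        # Skip separator lines, the header row, and anything malformed.
        if len(fields) < 8 or not fields[7].isdigit():
            continue
        if int(fields[7]) > min_pct:
            ids.append(fields[0])
    # A scaffold may match several targets; keep each id once.
    return list(dict.fromkeys(ids))

example = [
    "-------- dpulexgeno-smallsc.mblast.stats -----",
    "querysc targetsc qlen align mismat gaps ident %ident targloc",
    "scaffold_764 scaffold_65 9938 2770 45 23 2698 27 17302-18434,...",
    "scaffold_821 scaffold_257 9208 3273 46 13 3206 34 77559-80732,38245-38162,",
    "scaffold_872 scaffold_65 8340 8145 15 6 8123 97 101197-98515,....",
    "--------",
]
print(pi80_ids(example))
```

Applied to the three Daphnia rows shown, only scaffold_872 passes the threshold.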
Files *gnomon.genes.mblastdupsc-pi80.genes list genes on these pi80 scaffolds.
Scaffold counts of putative spurious (polymorphic) scaffolds
6197 acyr1-geno-smallsc-mblastdupsc-pi80.ids
821 nasonia1-geno-smallsc-mblastdupsc-pi80.ids
233 dpulex1-geno-smallsc-mblastdupsc-pi80.ids
Gene counts on putative spurious (polymorphic) scaffolds
1866 acyr1_gnomon.genes.mblastdupsc-pi80.genes
208 nasonia1_gnomon.genes.mblastdupsc-pi80.genes
89 dpulex1_gnomon.genes.mblastdupsc-pi80.genes
These are likely underestimates, since the criteria are restrictive: only scaffolds < 10 kb were
considered (as the most likely partial-coverage heterozygous assemblies), and > 80% identity was
required over the whole scaffold (whereas high identity over only part of a scaffold may also
indicate a heterozygous contig).