Index of /species/data/acyr/dupgenes
Name Last modified Size Description
acyr1_gnomon.genes.mblastdupsc-pi80.genes 23-Sep-2008 18:26 200k
acyrgeno-smallsc-mblastdupsc-pi80.ids 21-Sep-2008 23:25 89k
acyrgeno-smallsc.mblast.stats 22-Sep-2008 14:03 951k
arp-acyr1_gnomon-pi80dupgene.tab 23-Sep-2008 19:39 11k
Estimate of polymorphic (spurious) scaffolds in new genome assemblies.
Re http://insects.eugenes.org/arthropods/data/aphid/dupgenes/
2008 Sept 23, D. Gilbert
Method:
1. Select the subset of scaffolds under 10 kb. These often have only partial read coverage
and could not be assembled because they are redundant with existing larger scaffolds. For
Daphnia, where per-scaffold read coverage stats are available, almost all of them are
< 4x (many only 1x) of the 8x genome coverage.
2. Use megablast to match each small scaffold against the full genome assembly.
3. Combine the HSPs for each small-scaffold/larger-scaffold pair and select the small scaffolds
with > 80% overall identity to a larger scaffold (other criteria could be used).
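The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the script that produced the files listed above. It assumes the megablast hits are in the 12-column BLAST tabular layout (as from BLAST+ `-outfmt 6`), that every scaffold length is available in a dict, and that "combining" HSPs means summing identical bases over non-overlapping HSPs (taken in bitscore order) and dividing by the small scaffold's length. The function names and that combining rule are hypothetical.

```python
"""Sketch of the small-scaffold duplicate screen (steps 1-3).

Assumption: hits are BLAST tabular lines with columns
qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
"""
from collections import defaultdict

MAX_SMALL_LEN = 10_000  # step 1: only scaffolds under 10 kb
MIN_IDENTITY = 0.80     # step 3: > 80% overall identity


def parse_hsps(lines):
    """Yield (query, subject, pident, aln_len, qstart, qend, bitscore) per HSP."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qstart, qend = sorted((int(f[6]), int(f[7])))
        yield f[0], f[1], float(f[2]), int(f[3]), qstart, qend, float(f[11])


def overall_identity(hsps, query_len):
    """Hypothetical combining rule: identical bases in non-overlapping HSPs / query length.

    HSPs are (pident, aln_len, qstart, qend, bitscore); they are taken best bitscore
    first and skipped if their query interval overlaps one already accepted.
    """
    accepted, identical = [], 0.0
    for pident, aln_len, qs, qe, _bits in sorted(hsps, key=lambda h: -h[4]):
        if any(qs <= ae and as_ <= qe for as_, ae in accepted):
            continue
        accepted.append((qs, qe))
        identical += pident / 100.0 * aln_len
    return min(identical / query_len, 1.0)  # gapped alignments can exceed query length


def flag_duplicate_scaffolds(blast_lines, scaffold_lengths):
    """Return sorted IDs of small scaffolds with > 80% overall identity to a larger one."""
    pairs = defaultdict(list)
    for q, s, pident, aln_len, qs, qe, bits in parse_hsps(blast_lines):
        qlen = scaffold_lengths[q]
        # Keep small queries only; drop self-hits and hits to scaffolds that are not larger.
        if qlen >= MAX_SMALL_LEN or q == s or scaffold_lengths[s] <= qlen:
            continue
        pairs[(q, s)].append((pident, aln_len, qs, qe, bits))
    flagged = set()
    for (q, _s), hsps in pairs.items():
        if overall_identity(hsps, scaffold_lengths[q]) > MIN_IDENTITY:
            flagged.add(q)
    return sorted(flagged)


if __name__ == "__main__":
    lengths = {"scaf1": 500_000, "scaf2": 4_000, "scaf3": 6_000}
    hits = [
        "scaf2\tscaf1\t99.0\t2000\t20\t0\t1\t2000\t100\t2099\t0.0\t3600",
        "scaf2\tscaf1\t98.0\t1800\t36\t0\t2101\t3900\t5000\t6799\t0.0\t3200",
        "scaf3\tscaf1\t97.0\t1500\t45\t0\t1\t1500\t9000\t10499\t0.0\t2600",
    ]
    print(flag_duplicate_scaffolds(hits, lengths))  # → ['scaf2']
```

In the usage example, scaf2 has (1980 + 1764) / 4000 ≈ 0.94 of its bases identical to scaf1 and is flagged; scaf3 reaches only about 0.24 and is not. Greedy non-overlap selection is one simple way to avoid counting a query base twice; other combining rules would give different counts.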
Genomes tested: Acyrthosiphon pisum, Nasonia vitripennis, Daphnia pulex.
Scaffold counts of putative spurious (polymorphic) scaffolds
6197 acyr1-geno-smallsc-mblastdupsc-pi80.ids
821 nasonia1-geno-smallsc-mblastdupsc-pi80.ids
233 dpulex1-geno-smallsc-mblastdupsc-pi80.ids
Gene counts on putative spurious (polymorphic) scaffolds
1866 acyr1_gnomon.genes.mblastdupsc-pi80.genes
208 nasonia1_gnomon.genes.mblastdupsc-pi80.genes
89 dpulex1_gnomon.genes.mblastdupsc-pi80.genes
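Gene counts like those above could, in principle, be obtained by intersecting a gene annotation with the flagged scaffold IDs. This note does not say how the .genes files were made, so the sketch below rests on assumptions: a GFF3 annotation (9 tab-separated columns, seqid first, feature type third) and .ids files holding one scaffold ID per line. The function names are hypothetical.

```python
"""Sketch: list annotated genes that lie on flagged scaffolds.

Assumptions: GFF3 input and a one-ID-per-line scaffold list.
"""


def read_ids(lines):
    """Read one scaffold ID per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def genes_on_scaffolds(gff_lines, scaffold_ids, feature_type="gene"):
    """Return the ID= attribute (or seqid:start-end) of each gene on a flagged scaffold."""
    genes = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type or cols[0] not in scaffold_ids:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append(attrs.get("ID", f"{cols[0]}:{cols[3]}-{cols[4]}"))
    return genes


if __name__ == "__main__":
    ids = read_ids(["scaf2\n", "scaf7\n", "\n"])
    gff = [
        "##gff-version 3",
        "scaf1\tGnomon\tgene\t100\t900\t.\t+\t.\tID=gene1",
        "scaf2\tGnomon\tgene\t10\t800\t.\t-\t.\tID=gene2;Name=abc",
        "scaf2\tGnomon\tmRNA\t10\t800\t.\t-\t.\tID=rna2;Parent=gene2",
    ]
    found = genes_on_scaffolds(gff, ids)
    print(len(found), found)  # → 1 ['gene2']
```

Counting only `gene` features avoids counting a gene once per transcript; a Gnomon gene set in another format would need a different parser.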
These are likely underestimates: the criteria consider only scaffolds < 10 kb (the most
likely partial-coverage heterozygous assemblies) and require > 80% identity over the whole
scaffold, whereas high identity over only part of a scaffold may also indicate a
heterozygous contig.
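A toy calculation shows how the whole-scaffold criterion can miss a partially matching scaffold. The scaffold, its single HSP, and the `has_high_identity_block` rule with its 2 kb / 95% thresholds are all hypothetical, included only for contrast with the > 80% whole-scaffold identity of step 3; HSPs are assumed not to overlap.

```python
"""Toy contrast: whole-scaffold identity vs. a hypothetical partial-identity rule."""


def whole_scaffold_identity(scaffold_len, hsps):
    """Identical bases over non-overlapping (pident, aln_len) HSPs, divided by scaffold length."""
    identical = sum(pident / 100.0 * aln_len for pident, aln_len in hsps)
    return min(identical / scaffold_len, 1.0)


def has_high_identity_block(hsps, min_len=2_000, min_pident=95.0):
    """Hypothetical rule: any single HSP that is both long and near-identical."""
    return any(aln_len >= min_len and pident >= min_pident for pident, aln_len in hsps)


if __name__ == "__main__":
    # Hypothetical 8 kb scaffold whose first 5 kb matches a larger scaffold at 99%.
    hsps = [(99.0, 5_000)]
    ident = whole_scaffold_identity(8_000, hsps)
    print(f"{ident:.3f}", ident > 0.80, has_high_identity_block(hsps))
    # → 0.619 False True
```

The 5 kb near-identical region covers only about 62% of the scaffold, so the > 80% test rejects it even though it looks like a heterozygous copy; scaffolds of 10 kb or more are excluded before identity is even computed.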