FAQ for DroSpeGe: BioMart and other functions

 1. BioMart: Finding features by D. melanogaster gene name
 2. DroSpeGe BioMart copies
 3. DroSpeGe BLAST report Links to Sequence

...........................................................................

3. DroSpeGe BLAST report Links to Sequence

** As of 10 December 2006, the FastA sequence links in BLAST reports
return the stranded match region without expansion. That is, the sequence
is reverse-complemented when appropriate (the FastA header says so), and it
does not include the +/- 1000 bases expansion that the MapView links show.
With some multi-HSP BLAST matches, if they include both forward and
reversed matches, the FastA link result may be ambiguous. EMBL and GenBank
output do not provide reverse complement sequence.

Prior to this, sequence links in the BLAST reports provided the BLAST hit
region (+/- 1000 bases around), on forward strand only.

From help requests:

> I have been trying to use DroSpeGe to study a region of the X
> chromosome. When I BLAST against sechellia, yakuba, erecta and
> simulans with the sequence pasted below, it provides output that
> looks OK at first, but a CLUSTAL alignment of the sequence is the
> first indication that something is wrong. ...

I think I know what happened: from the DroSpeGe blast output, you chose the
FastA links to collect the sequence for clustal alignment. Because of the
complexity of blast output, these links return the *region, on forward
strand* of the blast hit shown, not the stranded blast hits as shown in
blast's alignment, which may be multiple. The region for the FastA link
includes +/- 1000 bases around the full blast hit range, which may consist
of several sections, same as in the map view links. Dsec and Dyak have
reversed strand chromosomes for this region relative to Dmel.

If you want to see the alignment, you can change the Blast output format
to give you alignments.
At the bottom of insects.eugenes.org/species/blast/ choose
   Alignment view : flat master-slave with identities

I realize this isn't obvious for those doing the kind of additional
analyses you want to do, and the blast output should be improved to add
further options. Maybe I can figure out how to have it reverse the FastA
sequence when blast is matching reverse strands.

- Don Gilbert

...........................................................................

2. DroSpeGe BioMart copies

DroSpeGe's BioMart of drosophila genome annotations is now dumped out for
your and others' use, at
   ftp://eugenes.org/eugenes/biomart/drospege_mart_caf1/
Read
   ftp://eugenes.org/eugenes/biomart/drospege_mart_caf1/Example-drospege-mart.txt
See also dsppbiomart6.script therein, which lists the data sources; the
Perl script that creates the mart is one folder above.

...........................................................................

1. Finding features by D. melanogaster gene name

> thank you for the great work making biomart for the drosophilas. It is
> truly a valuable tool. My concern is that right now I can not download
> sequences (CAF1 and filter: gene names oc,sls ) for D. pseudoobscura

Thanks for your question. The BioMart service generally works, but there
are a few things one needs to know about it to have it work well.

First, D. melanogaster gene names are not the primary data set, but are
available for many searches through protein similarity features. There is
a set of well known genes (markers), including the oc and sls genes, which
are retrievable by name. If names don't work, try the CG ID instead
(e.g. CG12154-PA, CG1915-PA).

If you are new to using BioMart, please see the documentation, including a
good tutorial from the DictyBase folks, through this link:
   http://www.biomart.org/ > Documentation > MartView tutorials from DictyBase

For single genes or a small selection, you may find that the GBrowse Maps
are easier to use. You can type gene names into its search box and
retrieve the map location, and from that the sequence output.
See in GBrowse the choice for Reports & Analysis: [Display Feature Fasta]

I tried your query for oc,sls with D. pseudoobscura, and it worked for me.
The first thing to try with such a query is the BioMart FILTER [Count]
box. If that says "No entries found!" then your query needs to be
modified.

   BioMart > FILTER
     Feature Attributes:
       [x] Feature ID lists [ID(s)] : nosuchgene
     "No entries found!"

There are two field choices for searching Feature ID lists. "ID(s)"
generally holds CG12345-PA style protein IDs, but for marker genes it also
includes their names (oc, sls). The "Name" choice is more likely to match
common D. mel. gene names.

If your Filter returns a number of entries, but the OUTPUT stage fails to
return sequence, first check OUTPUT: select the Attribute Page for FEATURE
TABLE, and see if that returns the feature fields you expect. Then at the
Attribute Page: FEATURE SEQUENCE, be sure you have selected the button for
"Unspliced (Gene)" or "Flank (Gene)", and also the Header Information:
"Ft_ID (required)".

If you can let me know in more detail the steps you used and where it
failed, I can help further.

............
http://insects.eugenes.org/BioMart/martview/
Dataset: Drosophila_pseudoobscura

FILTER >
   FILTER Feature Attributes:
     [x] Feature ID lists [ID(s)] : oc,sls

OUTPUT >
   start
   Schema: DroSpeGe_CAF1
   Dataset: Drosophila_pseudoobscura
   305498 Entries Total
   filter ID(s): Uploaded
   8 Entries pass Filters

FEATURE TABLE

Chromosome  Source        Biotype  Start     End       Strand  Score    ID   Name  Dbxref
XL_group1e  marker:modDM  match    7764323   7772191   -1      0.00000  oc   oc    FlyBase:FBpp0088593,GB_protein:AAF46400.3,FlyBase:FBgn0004102,GB_protein:AAF46400.2,Gadfly:CG12154-PA
XL_group1e  marker:modDM  match    7764323   7772191   -1      0.00000  oc   oc    FlyBase:FBpp0088593,GB_protein:AAF46400.3,FlyBase:FBgn0004102,GB_protein:AAF46400.2,Gadfly:CG12154-PA
XR_group6   marker:modDM  match    11947084  11948069  -1      0.00000  sls
XR_group6   marker:modDM  match    11947084  11948069  -1      0.00000  sls
XR_group6   marker:modDM  match    11950705  11996361  -1      0.00000  sls
XR_group6   marker:modDM  match    11950705  11996361  -1      0.00000  sls
XR_group6   marker:modDM  match    12000763  12016806  -1      0.00000  sls
XR_group6   marker:modDM  match    12000763  12016806  -1      0.00000  sls

FEATURE SEQUENCE

>XL_group1e|7764323|7772191|-1|270|match|marker:modDM|oc|oc|FlyBase:FBpp0088593,GB_protein:AAF46400.3,FlyBase:FBgn0004102,GB_protein:AAF46400.2,Gadfly:CG12154-PA
AGGTGTCAACACAAGAAAACAGCGTCGGGAGCGCACCACATTCACACGCGCCCAATTGGACGTCCTGGAGGCACTGTTCG
...
>XR_group6|11947084|11948069|-1|320|match|marker:modDM|sls||
CCCGCCATCGCCTCCACAGAATCTGCGAGCCCCAGACGTGACGAGCCGCAGCGTGACCCTCGATTGGGAGATTCCAGCGC
..
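
If you save FEATURE SEQUENCE output like the above and want the header
fields back as separate values, a small Python sketch along these lines
may help. It is not part of DroSpeGe or BioMart. The function name
parse_header and the field names are my own choices, inferred by comparing
the example headers with the FEATURE TABLE columns. The fifth header field
(270, 320 in the example) is not explained in this FAQ, so it is kept
under the placeholder name "field5".

```python
# Sketch: split a DroSpeGe BioMart FastA header into named fields.
# ASSUMPTION: the field order below is inferred from the example output
# in this FAQ; it is not taken from any BioMart documentation.
HEADER_FIELDS = [
    "chromosome", "start", "end", "strand",
    "field5",      # meaning not stated in the FAQ; kept as a raw string
    "biotype", "source", "id", "name", "dbxref",
]


def parse_header(line):
    """Return a dict of header fields for one '>' line of FastA output."""
    if not line.startswith(">"):
        raise ValueError("not a FastA header line: %r" % line)
    parts = line[1:].strip().split("|")
    if len(parts) != len(HEADER_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(HEADER_FIELDS), len(parts)))
    record = dict(zip(HEADER_FIELDS, parts))
    # Coordinates and strand are integers in the example output.
    for key in ("start", "end", "strand"):
        record[key] = int(record[key])
    # Dbxref is a comma-separated list and may be empty (as for sls).
    record["dbxref"] = [x for x in record["dbxref"].split(",") if x]
    return record


if __name__ == "__main__":
    example = ">XR_group6|11947084|11948069|-1|320|match|marker:modDM|sls||"
    rec = parse_header(example)
    print(rec["id"], rec["chromosome"], rec["start"], rec["end"], rec["strand"])
```

The function raises an error on an unexpected field count instead of
guessing, since other datasets or header selections may produce a
different layout.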