NIEHS SNPs Workshop
January 30-31, 2005

Interactive Tutorial 1: SNP Database Resources

The tutorial is designed to take you through the steps necessary to access SNP data from the primary database resources:
1. Entrez SNP/dbSNP
2. HapMap Genome Browser
3. NIEHS - GeneSNPs and the NIEHS SNPs Websites
4. Other Tools: PolyPhen, ECR, PolyDoms, Transfac

Note: Answers to questions from this tutorial are included at the end of this document.

As a launching point, we will begin our searching at the Entrez cross-database browser. This can be accessed on the NCBI home page. For these exercises we will be accessing data for the gene nitric oxide synthase 2A (HUGO name: NOS2A).

For a cross-database search:
1. Type the gene symbol (NOS2A) into the empty box next to ‘Search All Databases’ and click on the GO button, or simply hit the return key on your keyboard.
   Which NCBI database gives the greatest number of results? What is the database? Hint: Mouse over the ‘?’ next to the icon and click for a popup explanation of this database.
2. In the left column, note the results returned for the ‘SNP’ and ‘Gene’ databases.
3. How many results were returned for the ‘SNP’ and ‘Gene’ databases?
4. Why did the ‘Gene’ database return more than one result?

Entrez Gene
5. From the cross-database search, click on the ‘Gene’ database icon.
6. Click on the result that corresponds to the ‘Homo sapiens’ NOS2A gene.
7. NOS2A maps to which chromosome?
8. What are the genes 5’ and 3’ of NOS2A? (Hint: look at the genomic context.)
9. On the far right of the page, next to the NOS2A gene name and description, note the word ‘Links’ (see Figure 1 below).
10. Scroll down this list and select ‘SNP: Geneview’.

dbSNP
1. The initial dbSNP Geneview only shows SNPs that are located in the coding region of the gene (cSNPs).
2. How many cSNPs are found in dbSNP for NOS2A? How many are validated?
   Under the ‘Gene Model’ heading, select the ‘gene region’ button to view all SNPs in the gene region, and then select the ‘view rs’ button.
3. After selecting this, the page will update and show all SNPs in this gene.
4. How many SNPs are found in dbSNP for NOS2A?
5. Note: this number will appear just above the SNP map picture of the gene.
6. How many of the SNPs shown have been validated by the HapMap project, i.e. have an ‘H’ symbol in the validation column?
7. How many SNPs have frequency data (i.e. a heterozygosity value) associated with them? Hint: count the number without this data and subtract from the total.
8. Click on the rs# link for the intronic SNP rs7208775. What methods were used to validate this SNP?
9. How many submitters have recorded a discovery of this SNP?
10. Click on the ss# (ss38342908) next to the ‘EGP SNPS NOS2A-0044457’ SNP submission.
11. On this page, scroll down and find the frequency data for this SNP in each of the four populations studied by this submitter (EGP AD, EGP ASIAN, EGP YORUB, EGP CEPH). What is the allele frequency of the C and G alleles in each of these populations?

12. Using the ‘BACK’ button in your browser, return to the Entrez Gene page for NOS2A.

Entrez SNP
1. Starting from the Entrez Gene page again, use the ‘Links’ menu on the right side to view the linkout choices and select the ‘SNP’ option.
2. This will automatically query the Entrez SNP database for all SNPs in dbSNP for the NOS2A gene for the species you are viewing (i.e. ‘Homo sapiens’).
3. How many SNPs are returned?
4. Below the search box and tabbed menu choices (i.e. ‘Limits’, ‘Preview/Index’, etc.), use the ‘Display’ menu to show this list as ‘FASTA’. The page should automatically update when you make your selection.
5. In the ‘Send To’ drop-down menu, select the ‘Text’ option. The page should update the results in plain text format. This selection can be directly copied to a file on your computer.
6. Use the ‘BACK’ button on your browser. Alternatively, this data can be ‘Sent To’ a ‘File’ directly, which is saved on your computer.
7. Select the ‘Limits’ tab below the main search box. In the main search box type the gene name ‘NOS2A’.
8. Select the following search limits from the selections on this page:
   a. Organism: Homo sapiens
   b. Validation: 2hit-2allele
9. After making these selections, use the ‘Go’ button next to the main search box to get the result.
10. How many results are returned for validated 2hit-2allele SNPs in this gene?
11. Experiment with saving these in different formats using both the ‘Send To’ ‘Text’ option and the ‘Send To’ ‘File’ option.
12. Go to the ‘Limits’ option again and select the following search items:
   a. Organism: Homo sapiens
   b. Has Genotype: True
13. How many results are returned for SNPs with genotype data?
14. In the ‘Display’ drop-down menu select ‘Genotype’. Genotype data for each rs# SNP will be displayed.
15. Make sure that the checkbox in the ‘Limits’ tab is unchecked. Finally, to demonstrate the use of search field tags directly in the main search box, type the following:
   NOS2A[gene] AND "EGP SNPS"[handle]
   How many total entries are in dbSNP for this gene and submitted by the NIEHS EGP project (our handle is EGP SNPS)?
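The field-tagged query in step 15 can also be assembled in a script. The following is a minimal sketch, not part of the workshop material: it builds the same search string and wraps it in a URL for the public NCBI E-utilities ESearch endpoint. The helper names (field_term, build_snp_query, esearch_url) are hypothetical, and whether the [gene] and [handle] field tags are still indexed by the current SNP database is an assumption. The code only constructs strings and makes no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint. Using it here is an assumption:
# the tutorial itself only uses the Entrez web interface.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, field):
    """Return one Entrez search term, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{field}]"


def build_snp_query(gene, handle=None):
    """Combine a gene symbol and an optional submitter handle with AND."""
    terms = [field_term(gene, "gene")]
    if handle:
        terms.append(field_term(handle, "handle"))
    return " AND ".join(terms)


def esearch_url(term, db="snp"):
    """Return an ESearch URL for the given term (no request is sent)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


if __name__ == "__main__":
    query = build_snp_query("NOS2A", "EGP SNPS")
    print(query)               # the same string typed into the search box
    print(esearch_url(query))  # URL-encoded form of the same query
```

Keeping the term builder separate from the URL builder means the same string can be pasted into the web search box or sent through a script.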

HapMap Browser
The HapMap Genome Browser is linked directly from the main page by selecting ‘Browse Project Data’.
1. In the main search box enter the abbreviated gene name with a wild card: ‘NOS*’.
2. A list of results from many genes will be displayed; select NOS2A.
3. The browser page with tracks will be presented.
4. How many HapMap SNPs are in this gene?
5. Zoom in to view 20kb using the Scroll/Zoom drop-down menu.
6. Note the display of frequency data for each population using the pie graphs for each SNP. Click on the first HapMap SNP in this gene (rs2297516).
7. Note the allele frequencies for each population. Select the ‘retrieve genotypes’ link in the far right column. Genotype data for this SNP and population will be displayed.
8. Use the ‘BACK’ button to get to the main gene view.
9. Below the gene structure image (exons), click on the ‘Tracks’ setting. Under the ‘Analysis’ subheading select ‘plugin:tagSNP picker’. Next select ‘Update Image’ using the button on the right side of the page.
10. SNPs correlated with rs2297516 should appear on a track. Select the first one (rs2297516). In the ‘Search’ section of the layout (near where you entered the gene symbol), select ‘Download tagSNP data’ from the ‘Reports & Analysis’ pull-down menu and select ‘GO’.
11. Which SNPs are captured by this s/

GeneSNPs
1. Select ‘Cell Cycle’ from the ‘Gene List’ drop-down menu.
2. Select the ‘UCSC:hg16:4’ Gene Model link for the gene CCNA2.
3. What is the orientation of the gene?
4. In the ‘All Submitter’ drop-down menu, select EGP SNPS.
5. Scroll down the list of SNPs to the first non-synonymous cSNP. In what position of the codon is this SNP?
6. On what population was this gene re-sequenced? Click on the GS link at the right. What is the genotype of the sample P012?

NIEHS SNPs
Variation data for NOS2A can be accessed through the search box on the top right part of the home page, or via the ‘A-Z Finished Genes Directory’ link in the left-hand navigation bar.

Using this last link, find NOS2A in the alphabetical listing of genes.
1. Under the ‘Mapping Data’ section, click on the ‘cSNPs’ link.
2. How many non-synonymous cSNPs were discovered in this gene? What is the position in our reference sequence of the first synonymous SNP?
3. What was the cDNA position of this SNP?
4. In which population was it discovered?
5. Go ‘BACK’. In the ‘Genotyping’ category, click on the ‘Visual Genotype’ link. An image of all the genotyping data for this gene is displayed. Using the location of the synonymous SNP, determine which individual carries this polymorphism.
6. Explore other links in the ‘Mapping Data’, ‘Genotyping Data’ and ‘Predictive Data’ sections. The ‘Linkage Data’ and ‘Haplotype Data’ will be covered in a subsequent talk and tutorial. Compare the data from the NIEHS SNPs Environmental Genome Project to that found in the other database resources.

Genome Variation Server
The Genome Variation Server (GVS) can also be accessed from the SeattleSNPs home page.
Note: Check that your browser is set to NOT block pop-up windows. GVS uses pop-up windows to display the images and tables.
1. Click the Gene Name search button.
2. Enter ‘NOS2A’.
3. A table of ‘Populations’ and ‘Submitters’ will appear along with the number of genotyped SNPs for each entry.
4. Select the button for ‘HapMap-CEU’. De-select AFD-CHN.
5. Select the ‘Display Genotypes’ button at the bottom of the page.
6. Two windows should pop up, showing a visual genotype image and a table with the genotypes for every sample at every SNP site.
7. Experiment with the other buttons for displaying results (e.g. display tagSNPs, etc.).
8. Parameters can be used to customize these ensky/

PolyPhen
1. Enter the amino acid sequence for BRCA1 from the FASTA file provided. Include the first line, which starts with the ‘>’ character.
2. Enter 356 for position, Q (glutamine) and R (arginine) for AA1 and AA2, then click ‘Process query’.

3. The PolyPhen site returns its prediction, based on alignments of both polypeptide sequences to sequences in the SwissProt database, the potential for disrupting known structural motifs (coils, active sites, disulfide bridges, phosphorylation sites, etc.), and the steric changes to the three-dimensional structure. This substitution is predicted to be “probably damaging” (and is a known BRCA1 mutation).
4. Navigate to the NIEHS SNPs EGP web site, use the ‘A-Z Finished Genes Directory’ link to list the “B” genes and go to BRCA1.
5. Click on the non-synonymous cSNP analysis link in the Predictive Analysis section and examine the Polyphen and Sift predictions.
6. Which substitutions are predicted by Polyphen to be damaging? Which substitutions are predicted to be intolerant by Sift?
7. Extra assignment: Determine which sample in the PDR carries the BRCA1 mutation. Hint: see NIEHS SNPs ms/

PolyDoms
1. Enter BRCA1 in the ‘Gene/Protein Symbol’ field.
2. In the Search Results section, 13 proteins (1 unique) should be listed. Click on the ‘NonSynonymous’ link of the first Gene Description (protein NP_009225).
3. Find the Q356R substitution. What non-EGP submitters have identified this variant?
4. Use the color key under the gene schematic to find the “mutation.” What substitution is coded as a mutation by OMIM?
5. In the Pathway(s) section, click on the ATM signaling pathway to view an image of BRCA1 interacting proteins from Biocarta.
6. Go back to the NIEHS SNPs site and find the Pathway Image Maps section on the navigation bar at the left side of the page. Click on the ‘Double Strand Break Repair’ link. Mouse over ATM and the other protein icons to find other genes in this pathway scanned for variations. Note how these two resources can be used to identify SNPs in interacting genes and pathways.
7. Return to the PolyDoms site and click on the Download PolyDoms Annotations link under Search Results. Select All and Download table. Save to a file (snpAnnotation.xls) and open using Excel.
8. Examine the table. How many cSNPs are reported in BRCA1?

ECR Browser and Transfac
1. Enter BRCA1 in the search field next to the submit button, then click ‘Submit’.

2. Click on the RefSeq corresponding to chr17:38449842-38530657.
3. Note the horizontal blue lines indicating the gene structure of BRCA1 isoforms. The arrows indicate the gene is shown 3’ to 5’ (right to left). Use the arrow buttons on the bottom left of the page (next to the and buttons) to flip the orientation of the gene.
4. Zoom out 1.5 X using the green “-1.5 X” button to show the genomic context around BRCA1. What is the gene shown upstream of BRCA1? What is the orientation of the gene?
5. Note the pink colored peaks in the mouse and chicken BRCA1 orthologues: pink indicates conserved intronic regions, yellow UTR, red intergenic regions (potential enhancers), and blue exons.
6. Click on the pink region conserved in chicken to center the browser on this region.
7. Click on the “+10 X” button to zoom in on this region.
8. Click on the “Grab ECR” button.
9. Click on the pink region in the mouse corresponding to the pink region in chicken. A conserved sequence alignment of human vs. mouse will appear in a new window.
10. Swipe the human sequence at the bottom of the page and select ‘Edit’ and ‘Copy’ from the browser menu.
11. Go to the Transfac page at the URL listed above. Transfac is a database of consensus transcription factor binding sequences (TFBS).
12. Paste the sequence into the Nucleic Acid sequence window in the page. Select the ‘Mammal’ class and click the ‘submit’ button. What is the longest consensus signal sequence of the hits listed? How many of these potential transcription factor binding sites are in this ECR? Is this TFBS also found in chicken? Hint: click on the ‘Site #’ R01162.
13. Are any SNPs located in these potential TFBS? Hint: this is a difficult question to address. What is the strand the TFBS is found on? A more direct approach may be to ask if there are any potential TFBS around a specific polymorphism.
14. Go back to BRCA1 on the NIEHS SNPs EGP page and click on the ‘SNP Context’ link. Scroll down to the sequence for the insertion-deletion polymorphism at 013578. Swipe the three lines of sequence (corresponding to the upstream flanking, inserted, and downstream flanking sequence), ‘Copy’ and ‘Paste’ them into the Nucleic Acid sequence window of the Transfac page, then click ‘submit’. Note the potential TFBS.
15. Swipe and ‘Cut’ the middle line of sequence (corresponding to the inserted sequence) from the Nucleic Acid sequence window and resubmit. Compare the results from the two searches and determine the potential sites that differ between the insertion and deletion alleles, and are therefore altered by the insertion-deletion polymorphism. Which site is specific to the insertion?
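The comparison in steps 14 and 15 can be sketched in a few lines of code. This is an illustration under stated assumptions, not the Transfac method: it uses exact matching of IUPAC consensus strings on the forward strand only, whereas Transfac scores imperfect matches. The flanking and inserted sequences are made-up placeholders rather than the BRCA1 sequence, the motif names are hypothetical labels, and WGATAR is used simply as a commonly cited GATA-type consensus.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_sites(sequence, consensus):
    """Return 0-based start positions of exact consensus matches.

    A lookahead is used so that overlapping matches are also reported.
    Only the forward strand is scanned.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def allele_specific_sites(upstream, insertion, downstream, motifs):
    """Report motifs whose match count differs between the two alleles.

    Returns {motif name: (count in insertion allele, count in deletion allele)}.
    """
    ins_allele = upstream + insertion + downstream
    del_allele = upstream + downstream
    changed = {}
    for name, consensus in motifs.items():
        n_ins = len(find_sites(ins_allele, consensus))
        n_del = len(find_sites(del_allele, consensus))
        if n_ins != n_del:
            changed[name] = (n_ins, n_del)
    return changed


# Hypothetical example data (NOT the real BRCA1 flanking sequence).
UPSTREAM = "CCTTGCAAGT"
INSERTION = "AGATAAGC"
DOWNSTREAM = "TTGACCTGCA"
MOTIFS = {"GATA-like": "WGATAR", "other-site": "TTGAC"}

if __name__ == "__main__":
    print(allele_specific_sites(UPSTREAM, INSERTION, DOWNSTREAM, MOTIFS))
```

With these placeholder sequences, the GATA-like motif occurs only in the insertion allele, while the second motif is present in both alleles and is therefore not reported.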

Answer Key

Cross-database Search
1. Geo Profiles
3. SNP 563 and Gene 14
4. The search hits aliases, descriptions of interacting proteins, and sequences from other species.

Entrez Gene
7. NOS2A is on chromosome 17
8. 5’ LOC441789 and 3’ LOC201229

dbSNP
2. How many cSNPs are found in dbSNP for NOS2A? 19 (12 synonymous, 7 nonsynonymous). How many cSNPs are validated? Eight. (828036 is listed twice with two rs numbers)
4. 294
6. 13 SNPs HapMap confirmed
7. Hint: count the number without this data and subtract from total: 244
8. Multiple independent submissions and allele frequency or genotype data.
9. 2, EGP and BCM
11. AD (African Descent) G 0.857, C 0.143; ASIAN G 0.818, C 0.182; YORUBAN G 0.875, C 0.125; CEPH G 0.955, C 0.045

Entrez SNP
3. 334 SNPs
10. 46
13. 15 (if you don’t release previous search); if you clear Validation: 2hit-2allele then 215
15. How many total entries are in dbSNP for this gene and submitted by the NIEHS EGP project (our handle is EGP SNPS)? 252

HapMap Browser
4. How many HapMap SNPs in this gene? 51
11. rs4462652, rs2274894, rs3729508, rs4795067, rs2297516, rs944725, rs8068149, rs8072199

GeneSNPs
3. 3’ to 5’ or right to left.
5. The first position of codon 163 causing an isoleucine to valine substitution.
6. The PDR, G/G

NIEHS SNPs Variation Data
2. How many non-synonymous cSNPs were discovered in this gene? Four. What is the position of the first synonymous SNP in our reference sequence? 3816
3. What was the cDNA position of this SNP? 38
4. In which population was it discovered? European
5. Go ‘BACK’. In the ‘Genotyping’ category, click on the ‘Visual Genotype’ link. An image of all the genotyping data for this gene is displayed. Using the location of the synonymous SNP, determine which individual carries this polymorphism. E119

Genome Variation Server
No answers to report.

Polyphen
6. Q356R, Q356R and S1040N.
7. P073

PolyDoms
3. HGBASE, SNP500 Cancer, and Sequenom.
4. N1040S
8. 345

ECR Browser and Transfac
4. NBR2, 3’ to 5’, opposite of BRCA1.
12. NF-1B1, 2, yes.
13. Identifying SNPs in the potential TFBS requires tools beyond these websites, since the alignment of the consensus TFBS is imperfect. With a default 80% sequence alignment to an often degenerate TFBS sequence, it is difficult to find the alignment in the original sequence. Once it is located, it should be determined whether the SNP position is conserved both in the alignment to the consensus TFBS and between the human and mouse sequences. The TFBS is on the (-) strand, which is the sequence displayed in the alignment; the (+) strand is the reverse complement of the displayed sequence.
15. GATA-1 at 77, 78 (the site is listed twice by two entries in Transfac)
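The strand relationship mentioned in the answer to question 13 can be made concrete with a short sketch. The function below is a generic reverse-complement helper written for illustration; the example sequence is hypothetical and is not taken from the BRCA1 alignment.

```python
# Translation table that swaps each base for its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # A site displayed as TTATCT on one strand reads AGATAA on the other.
    print(reverse_complement("TTATCT"))
```

Applying the function twice returns the original sequence, which is a quick check when moving a site between strands.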
