Added by Nuno Bandeira, last edited by Nuno Bandeira on Jun 05, 2016

ProteoSAFe Demo

ASMS 2016, Bioinformatics of Protein Identification

URL: http://bix-lab.ucsd.edu/display/PS/ASMS

Introduction

The ProteoSAFe environment was developed at the UCSD Center for Computational Mass Spectrometry to enable Scalable (i.e., fast), Accessible (i.e., easy to use) and Flexible (many tools) identification of tandem mass spectra. Here we will explore some of ProteoSAFe's search capabilities using a demonstration dataset of Thermo LCQ spectra acquired from a sample of human lens proteins from a 93-year-old patient with cataracts. Lens proteins with low or no turnover accumulate many post-translational modifications (PTMs) over time, so this sample is expected to be especially rich in PTMs.

Data files used for this demo:

Spectra: 93S_cat_29.mgf (one of 38 files for the whole sample)

Sequences: ncbi_human_lens_10-24-2009.fasta (only Human lens proteins)
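MGF is a plain-text format in which each spectrum sits between BEGIN IONS and END IONS lines, with KEY=value header lines (such as TITLE, PEPMASS and CHARGE) followed by one "m/z intensity" pair per peak. A quick way to sanity-check a file like 93S_cat_29.mgf before uploading it is to parse it and count its spectra. The sketch below assumes that common layout and is not ProteoSAFe code; it only treats lines starting with "#" as comments and keeps header values as strings.

```python
from typing import Any, Dict, List, Optional


def parse_mgf(text: str) -> List[Dict[str, Any]]:
    """Split MGF text into spectra.

    Each spectrum is a dict of its KEY=value header lines (values kept as
    strings) plus a "peaks" list of (m/z, intensity) tuples.
    """
    spectra: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line == "BEGIN IONS":
            current = {"peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=2+
                key, value = line.split("=", 1)
                current[key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by an optional intensity
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Toy input for illustration only
example = """BEGIN IONS
TITLE=toy spectrum 1
PEPMASS=410.2051
CHARGE=2+
175.119 120.0
262.151 80.5
END IONS
"""
spectra = parse_mgf(example)
print(len(spectra), spectra[0]["CHARGE"], spectra[0]["peaks"])
```

To check the demo file itself, something like `with open("93S_cat_29.mgf") as fh: print(len(parse_mgf(fh.read())))` should work, assuming the file is in the current directory.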

ProteoSAFe database search

We will explore two types of database search: (a) with a restricted set of modifications (MS-GFDB), or (b) allowing for any type of unexpected modification (MODa).

  1. Download 93S_cat_29.mgf and ncbi_human_lens_10-24-2009.fasta
  2. Go to http://proteomics2.ucsd.edu/ProteoSAFe
  3. Create a new user account
    • Click the link labeled 'Register' on the top-right
    • Enter your chosen user name and password
    • Email address is optional but recommended, since ProteoSAFe emails you when your searches complete
    • After registration, return to the main page
    • Log in at the top right: enter your user name and password, then click "Sign in"
  4. Upload 93S_cat_29.mgf and ncbi_human_lens_10-24-2009.fasta to your newly-created user account
    1. Click the button labeled "Select Input Files" (your browser may require that you explicitly allow pop-up windows)
    2. Click the "Upload Files" tab
      • Upload 93S_cat_29.mgf and ncbi_human_lens_10-24-2009.fasta
    3. Click the "Select input files" tab
      • Open your files folder by clicking on "+"
      • Select 93S_cat_29.mgf and click the "Select spectrum files" button
      • Select ncbi_human_lens_10-24-2009.fasta and click the "Select Sequence files" button
    4. Close the pop-up window
  5. Configure MS-GFDB (PubMed reference)
    1. Select "MS-GFDB" from the "Tool" drop-down box
    2. Set "Fragmentation Method" to "CID"
    3. Set "Parent mass tolerance" to "1.5 Da"
    4. Scroll to "More options" at the bottom of the page
      • Check the box labeled "Include common contaminants"
      • Enter your email address in the text box labeled "Email me at"
    5. Scroll to "Allowed post-translational modifications"
      • Check the box for "Oxidation"
      • Check the box for "Pyroglutamate Formation"
      • Check the box for "N-terminal Acetylation"
      • On the bottom row of the same table, enter the values for Deamidation
        • Enter "0.984016" under "Mass (Da)"
        • Enter "NQ" under "Residues"
        • Choose "Optional" under "Type"
        • Submit the new modification by clicking the "+" sign on the leftmost table column
    6. Scroll to the bottom of the page and click "Search"
  6. Configure MODa for discovery of unexpected post-translational modifications (PTMs) (PubMed reference)
    1. Click "Clone" on the status page of your MS-GFDB search job
    2. Select "MODa" from the "Tool" drop-down box
    3. Set "Parent mass tolerance" to "1.5 Da"
    4. Scroll to "More options" at the bottom of the page
      • Check the box labeled "Include common contaminants"
      • Enter your email address in the text box labeled "Email me at"
    5. Under "More options"
      • Set "Spectrum-Level FDR" to 0.05
    6. Scroll to "Allowed post-translational modifications"
      • Select the option "1 (blind search)" for "Blind Mode (number of modifications per peptide)"
    7. Scroll to the bottom of the page and click "Search"
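The deamidation mass entered in step 5 follows from the chemistry: deamidation turns the side-chain amide (-CONH2) of N or Q into a carboxylic acid (-COOH), a net gain of O and loss of NH. The sketch below reproduces that value, and the masses of the other modifications selected above, from standard monoisotopic element masses. The `delta_mass` helper and the composition entries are illustrative, not taken from ProteoSAFe; pyroglutamate is shown for N-terminal Q (loss of NH3), whereas N-terminal E instead loses H2O.

```python
from typing import Dict

# Standard monoisotopic element masses in Da
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def delta_mass(gained: Dict[str, int], lost: Dict[str, int]) -> float:
    """Mass change when the atoms in `gained` are added and those in `lost` removed."""
    plus = sum(MONO[el] * n for el, n in gained.items())
    minus = sum(MONO[el] * n for el, n in lost.items())
    return plus - minus


mods = {
    # Side-chain -CONH2 becomes -COOH: +O, -N, -H
    "Deamidation (N, Q)": delta_mass({"O": 1}, {"N": 1, "H": 1}),
    "Oxidation (M)": delta_mass({"O": 1}, {}),
    # Acetyl group C2H3O replaces one H: net +C2H2O
    "Acetylation (N-term)": delta_mass({"C": 2, "H": 2, "O": 1}, {}),
    # N-terminal Q cyclizes with loss of ammonia
    "Pyroglutamate from Q": delta_mass({}, {"N": 1, "H": 3}),
}
for name, delta in mods.items():
    print(f"{name}: {delta:+.6f}")
```

The printed deamidation value rounds to the 0.984016 Da entered in the "Mass (Da)" box.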

Impact of different search spaces

MS-GFDB: searching for 4 PTMs with 1 PTM per peptide vs 3 PTMs per peptide

Search | #PTMs searched | Max PTMs/peptide | FDR | #PSMs | URL
MS-GFDB, common mods only, 1.5 Da | 4 | 1 | 1% | 11,661 | http://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=698214a03a4a46d1bcbe86f20c4475b1
MS-GFDB, common mods only, 1.5 Da | 4 | 3 | 1% | 11,869 | http://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=513b27bd04df41d99bf567c656e38a0b

MS-GFDB: searching for 17 PTMs with 1 PTM per peptide vs 2 PTMs per peptide

Search | #PTMs searched | Max PTMs/peptide | FDR | #PSMs | URL
MS-GFDB, w/single blind | 17 | 1 | 1% | 8,476 |
MS-GFDB, w/two single blind | 17 | 2 | 1% | 6,785 |
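The two comparisons above can be summarized as relative changes in PSM counts. The helper below is plain arithmetic on the numbers in the tables. One common explanation for the drop with 17 PTMs is that a larger search space produces more high-scoring random matches, so a stricter score cutoff is needed to hold the FDR at 1%.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# #PSMs at 1% FDR, copied from the two tables above
change_4 = percent_change(11661, 11869)   # 4 PTMs: 1 -> 3 PTMs per peptide
change_17 = percent_change(8476, 6785)    # 17 PTMs: 1 -> 2 PTMs per peptide
print(f"4 PTMs, 1 -> 3 per peptide:  {change_4:+.1f}%")
print(f"17 PTMs, 1 -> 2 per peptide: {change_17:+.1f}%")
```

Allowing more PTMs per peptide gains about 1.8% PSMs with 4 PTMs but loses about 20% with 17 PTMs.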

MODa: search allowing for one vs 2+ unexpected PTMs per peptide (5% FDR)

MS-GFDB: searching for 17 PTMs with 1 PTM per peptide vs 2 PTMs per peptide (5% FDR)

Search | #PTMs searched | Max PTMs/peptide | FDR | #PSMs | URL
MS-GFDB, w/single blind | 17 | 1 | 5% | 8,915 | http://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=8da3fa0bd3c94cca9cf2ce3e770f4739
MS-GFDB, w/two single blind | 17 | 2 | 5% | 7,042 | http://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=84108cd67da2477c9273d87261f6e20c
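Raising the FDR from 1% to 5% increases the counts (8,476 to 8,915 PSMs with one PTM per peptide) because a looser error budget admits lower-scoring matches. The sketch below illustrates the generic target-decoy idea behind such thresholds: rank PSMs by score, estimate the FDR at each cutoff as decoys divided by targets, and keep the largest target set whose estimate stays within the limit. It is a simplified illustration, not necessarily the exact estimator that MS-GFDB or MODa use; score ties and q-value smoothing are ignored, and the example scores are made up.

```python
from typing import List, Tuple


def targets_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float) -> int:
    """Largest number of target PSMs at any score cutoff with estimated FDR <= max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are better. The FDR at a
    cutoff is estimated as (#decoys) / (#targets) among PSMs at or above it.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


# Made-up scores: (score, is_decoy)
example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True),
           (6.0, False), (5.0, False), (4.0, True), (3.0, False)]
print(targets_at_fdr(example, 0.01), targets_at_fdr(example, 0.25))
```

On this toy list the strict limit keeps only the three targets above the first decoy, while the looser limit accepts five.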

Selecting PTMs from blind modification search

Example search results from a MODa workflow search job:

Browsing and downloading search results
  1. Downloading search results
    1. Click "Group by spectrum" on the status page for a completed job
    2. Click on the "Download" tab at the top of the page
    3. Click the "Download" button
  2. Browsing and filtering search results
    1. Sorting: the first table row holds the column headers (e.g., "Filename"). Next to each title, up/down arrows re-sort the table by that field in ascending/descending order, respectively
    2. Filtering: in the second row, under each column header, text boxes let you narrow the search results to specific subsets of identifications; apply the filters with the "Filter" button at the top left of the results table.
      1. Text fields, such as "Filename", can be filtered for any substring of the field. For example, to see only spectrum identifications from "93S_cat_43.mgf", it suffices to enter "43" or "cat_43" in the filter box and click the "Filter" button
      2. Numeric fields, such as "Score", can be filtered using numeric ranges; the left box is the minimum accepted value and the right box is the maximum accepted value
    3. Spectrum images: in the leftmost column in the results table, clicking on the spectrum image icon opens a new row in the results table showing an image of the experimental spectrum annotated with the identification shown in the results table
    4. Spectrum groups: the "Group by peptide" view (e.g., http://proteomics2.ucsd.edu/ProteoSAFe/result.jsp?task=ed1c9a43cc074fbf9f3bef93d311f488&view=group_by_peptide_old) groups spectra identified with the same peptide sequence in a single table row. In these views, the set of spectra supporting a peptide identification can be viewed using the double-down-arrows to the right of the spectrum image icon.
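The filter boxes behave like simple per-row predicates: a substring test for text columns and a min/max test for numeric columns, with an empty box leaving that side open. The sketch below reproduces that logic, which can be handy when re-filtering downloaded results offline. The column names Filename, Peptide and Score and the rows themselves are illustrative toy data, not actual ProteoSAFe output.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def filter_rows(
    rows: List[Row],
    text_filters: Optional[Dict[str, str]] = None,
    ranges: Optional[Dict[str, Tuple[Optional[float], Optional[float]]]] = None,
) -> List[Row]:
    """Keep rows matching every substring filter and every numeric range.

    A range bound of None leaves that side open, like an empty
    minimum or maximum filter box.
    """
    text_filters = text_filters or {}
    ranges = ranges or {}
    kept = []
    for row in rows:
        if any(sub not in str(row[col]) for col, sub in text_filters.items()):
            continue
        in_range = True
        for col, (lo, hi) in ranges.items():
            value = float(row[col])
            if (lo is not None and value < lo) or (hi is not None and value > hi):
                in_range = False
                break
        if in_range:
            kept.append(row)
    return kept


# Toy rows for illustration only (not real search output)
rows = [
    {"Filename": "93S_cat_29.mgf", "Peptide": "PEPTIDEK", "Score": 12.5},
    {"Filename": "93S_cat_43.mgf", "Peptide": "SAMPLER", "Score": 30.1},
    {"Filename": "93S_cat_43.mgf", "Peptide": "TESTPEPK", "Score": 8.0},
]

hits = filter_rows(rows, text_filters={"Filename": "cat_43"}, ranges={"Score": (10.0, None)})
# Sort by descending score, like the "down" arrow on the Score column
hits.sort(key=lambda r: r["Score"], reverse=True)
print([r["Peptide"] for r in hits])
```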

Analyzing the mass-shift frequency table

The mass-shift frequency table from the MODa search is available in the "Modification Counts" view of MODa's search results: http://proteomics2.ucsd.edu/ProteoSAFe/result.jsp?task=bf10de33c7c648f5941a682bf85bc585&view=ptm_details
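In a blind search, a reported mass shift is essentially the observed neutral precursor mass minus the theoretical mass of the unmodified peptide, and the frequency table tallies how often each shift is assigned to each site. The sketch below shows that arithmetic with standard monoisotopic residue masses. It assumes shifts are binned to the nearest integer Dalton, as labels such as +16 and +58 suggest; the PSM list is a hypothetical example, not MODa output, and the code is not MODa's implementation.

```python
from collections import Counter

# Standard monoisotopic residue masses (Da), unmodified
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the peptide termini
PROTON = 1.007276   # proton mass


def peptide_mass(seq: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def mass_shift(precursor_mz: float, charge: int, seq: str) -> float:
    """Observed neutral precursor mass minus the unmodified peptide mass."""
    neutral = precursor_mz * charge - charge * PROTON
    return neutral - peptide_mass(seq)


# Hypothetical PSMs: (precursor m/z, charge, peptide, site reported for the shift)
psms = [
    (410.2051, 2, "SAMPLER", "M"),
    (410.2051, 2, "SAMPLER", "M"),
    (195.6028, 2, "GNAK", "N"),
]
table = Counter((round(mass_shift(mz, z, seq)), site) for mz, z, seq, site in psms)
print(table)
```

Here the SAMPLER spectra carry a shift of about +15.995 Da (oxidation-like, on M) and the GNAK spectrum about +0.984 Da (deamidation-like, on N).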

If you download the search results, the same table is in the "ptmResult" subfolder of the downloaded zip file. This text file can be loaded in a spreadsheet program like Excel, or accessed as a Google Docs spreadsheet at https://docs.google.com/spreadsheets/d/17LbGmdQrlJJH5Gj3bp8e5gELqTh7QPkc7Aopozhvxy4/edit. If you have a Google account, you can get an editable copy of the file with "File -> Make a copy" after signing in.
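The downloaded table can also be processed in a script instead of a spreadsheet. The sketch below sorts rows by decreasing total count and lists the cells whose count exceeds 9, mirroring the sorting and conditional-formatting steps in the tips that follow. It assumes a tab-separated file with a header row, a first column named "Mass" holding the shift and a "Sum" column; those column names and the toy counts are assumptions, so check them against your actual ptmResult file first.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Row = Dict[str, str]


def load_sorted_by_sum(handle: TextIO) -> List[Row]:
    """Read a tab-separated table with a header row; sort rows by decreasing Sum."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda r: float(r["Sum"]), reverse=True)
    return rows


def high_cells(row: Row, threshold: float = 9,
               skip: Tuple[str, ...] = ("Mass", "Sum")) -> List[str]:
    """Columns whose count exceeds the threshold (the conditional-formatting step)."""
    return [col for col, value in row.items()
            if col not in skip and float(value) > threshold]


# Toy table: the "Mass"/"Sum" column names and all counts are assumptions
toy = (
    "Mass\tM\tN\tQ\tSum\n"
    "+1\t0\t12\t7\t19\n"
    "+16\t25\t0\t1\t26\n"
    "+58\t3\t0\t0\t3\n"
)
rows = load_sorted_by_sum(io.StringIO(toy))
for row in rows:
    print(row["Mass"], row["Sum"], high_cells(row))
```

For the real file, replace `io.StringIO(toy)` with a handle opened on the table from the ptmResult folder using `open(..., newline="")`.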

The question we want to answer is: Which mass offsets correspond to real post-translational modifications on which sites?

In particular:

  • Question: Is the +16 mass shift believable? Would you classify it as an artifact modification or a post-translational modification?
  • Question: Is the +58 mass shift believable? Would you classify it as an artifact modification or a post-translational modification?

PTM analysis guidelines/tips:

  • Sorting mass-shifts by decreasing occurrence counts
    • Click the top-most, left-most cell in the spreadsheet to select all cells
    • Use "Data -> Sort range" to open the sort dialog box
    • In the pop-up dialog box, check "Data has header row", select "Sum" under "Sort by" and choose "Z -> A"
  • Highlight cells with high occurrence counts
    • Select columns B through U
    • Use "Format -> Conditional formatting" to open the dialog box
    • In the pop-up dialog box, click "Text contains" and change it to "Greater than", enter "9" in the box next to "Greater than", click on the box to the right of "Background" and select your favorite highlight color
  • Use ProteoSAFe results filters to decide whether a mass offset is a realistic modification.
  • Observations that reinforce confidence in observed mass-shifts
    • Mass shift corresponds to known PTM on indicated amino acid
    • PSMs with overlapping sequences help reinforce PTM IDs, especially if the same PTM occurs on the same site on different peptides
    • Look for same identification with multiple precursor charge states
    • Look for correlated peptide fragmentation intensities between multiple peptide variants (same sequence, different mass shifts)
  • Common artifacts in mass-shift frequency tables
    • Site localization is especially inaccurate for blind searches because all amino acids are accepted as modification sites for all modifications
    • C13 isotopic peaks often generate -1/+1 "shadow" versions of real modifications; modifications on highly abundant peptides may also show -2/+2 offsets, though these are less common
    • Summed-mass artifacts: mass shifts determined from blind modification searches sometimes add up the masses of two modifications. A common example is when the protein N-terminus is acetylated (+42 Da) and the N-terminal methionine is oxidized (+16 Da): blind searches will often return N-term +58, which can be confused with N-terminal carboxymethylation (a single modification of mass +58 Da)
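Two of the artifact patterns above can be screened for automatically once counts are summed per nominal mass shift: "shadow" shifts one or two Daltons away from a more frequent shift, and shifts equal to the sum of two frequent shifts. The sketch below only flags candidates for manual review; it does not decide which shift is real, and it ignores sites, which a careful check should compare. The input dictionary, the `min_count` cutoff and the toy counts are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def isotope_shadows(counts: Dict[int, int], max_offset: int = 2) -> Dict[int, int]:
    """Map each shift to a more frequent shift within +/-max_offset Da, if any.

    Such neighbours may be C13 "shadow" versions of the more frequent shift.
    When several neighbours are more frequent, the most frequent one is kept.
    """
    flagged: Dict[int, int] = {}
    for shift, n in counts.items():
        neighbours = [shift + d for d in range(-max_offset, max_offset + 1) if d != 0]
        stronger = [p for p in neighbours if counts.get(p, 0) > n]
        if stronger:
            flagged[shift] = max(stronger, key=lambda p: counts[p])
    return flagged


def summed_masses(counts: Dict[int, int],
                  min_count: int = 10) -> Dict[int, List[Tuple[int, int]]]:
    """Map each shift to pairs of frequent shifts (count >= min_count) summing to it."""
    frequent = sorted(s for s, n in counts.items() if n >= min_count)
    found: Dict[int, List[Tuple[int, int]]] = {}
    for shift in counts:
        for i, a in enumerate(frequent):
            for b in frequent[i:]:
                if a + b == shift and shift not in (a, b):
                    found.setdefault(shift, []).append((a, b))
    return found


# Toy totals per nominal mass shift (illustrative, not real MODa counts)
toy = {1: 25, 2: 6, 16: 40, 17: 8, 42: 30, 58: 12}
print(isotope_shadows(toy))
print(summed_masses(toy))
```

On the toy counts, +58 is flagged as 16 + 42, and +17 is flagged both as a shadow of +16 and as 16 + 1, which shows why each flag still needs inspection. The acetylation-plus-oxidation case is especially hard: together they add C2H2O2 (monoisotopic +58.0055 Da), the same elemental composition as carboxymethylation, so precursor mass alone cannot separate them.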

Other related searches of the same dataset

Search | #PTMs searched | Max PTMs/peptide | FDR | #PSMs | URL
MSPLIT spectral library search | n/a | n/a | 1% | 11,732 (20 mixture spectra) | http://proteomics2.ucsd.edu/ProteoSAFe/status.jsp?task=446f93c02c864c44bd4da1264c626ce0

Tip: you can filter for Mixture-spectrum identifications using "!" (no quotes) on the "Peptide" column in MSPLIT search results.