Added by Mingxun Wang, last edited by Mingxun Wang on Aug 04, 2017



Spectral Library Search

One of the major limitations in discovering new chemical entities is determining which metabolites in complex biological samples are already known compounds. One way to overcome this limitation is to compare the MS2 spectrum of an unknown metabolite against a library of MS2 spectra acquired from structurally characterized metabolites. Here, the comparison is based on the cosine similarity score between MS/MS spectra, the same scoring used in the Molecular Networking workflow.
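The idea behind cosine scoring can be illustrated with a short Python sketch. This is a simplified, unshifted variant written for illustration, not the GNPS implementation: the square-root intensity scaling, the greedy one-to-one peak assignment, and the function name `cosine_score` are assumptions. Peaks are (m/z, intensity) pairs, and two peaks are considered matching when their m/z values differ by at most the ion tolerance.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def cosine_score(query: List[Peak], library: List[Peak], tolerance: float = 0.5) -> Tuple[float, int]:
    """Return (cosine score, number of matched peaks) for two MS/MS spectra.

    Sketch only: intensities are square-root scaled (an assumption) and
    L2-normalized, candidate peak pairs within `tolerance` are ranked by
    their contribution, and each peak is used at most once (greedy).
    """
    def normalize(peaks: List[Peak]) -> List[Peak]:
        scaled = [(mz, math.sqrt(inten)) for mz, inten in peaks]
        norm = math.sqrt(sum(i * i for _, i in scaled))
        return [(mz, i / norm) for mz, i in scaled] if norm > 0 else scaled

    q, lib = normalize(query), normalize(library)

    # Every query/library peak pair that is close enough in m/z to match.
    candidates = [
        (qi * li, a, b)
        for a, (qmz, qi) in enumerate(q)
        for b, (lmz, li) in enumerate(lib)
        if abs(qmz - lmz) <= tolerance
    ]
    candidates.sort(reverse=True)  # largest contributions first

    used_q, used_l = set(), set()
    score, matched = 0.0, 0
    for product, a, b in candidates:
        if a in used_q or b in used_l:
            continue  # each peak may be matched only once
        used_q.add(a)
        used_l.add(b)
        score += product
        matched += 1
    return score, matched
```

For two identical spectra with well-separated peaks the score is 1.0 (up to floating-point rounding), and for spectra that share no peaks it is 0.0.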

This tutorial covers the Dereplication workflow on GNPS. Once you have registered and logged in, you can open the Dereplication workflow on GNPS by clicking the dereplication workflow link.

Data Format

All files must be in a compatible, universal format (mzXML or mgf). For a basic overview of data conversion, see Data Conversion to GNPS Compatible Formats - .mzXML and .mzML.
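To show what an mgf file contains, here is a minimal Python sketch of a reader for simple mgf text. It assumes the common layout of `BEGIN IONS` / `END IONS` blocks holding `KEY=value` lines (such as `PEPMASS` and `TITLE`) followed by two-column "m/z intensity" peak lines. The function name `read_mgf` is hypothetical, and real-world files may need a dedicated parser.

```python
from typing import Dict, List, Optional


def read_mgf(text: str) -> List[Dict]:
    """Parse simple MGF text into a list of spectra.

    Each spectrum is a dict with a 'params' dict (e.g. TITLE, PEPMASS, CHARGE)
    and a 'peaks' list of (m/z, intensity) tuples.
    """
    spectra: List[Dict] = []
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as PEPMASS=301.1; split on the first '=' only.
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: assumed to be "m/z intensity".
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra
```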

Dereplication Workflow

Give your dereplication workflow a detailed title. This title will be helpful when you retrieve your data after the workflow is completed.

Upload Data

There are two ways to upload data: by FTP or through the web interface. Files larger than 20 MB must be uploaded by FTP, and FTP is the preferred method in general. For further instructions on web-based and FTP uploads, click here.
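The upload rule above can be scripted. The Python sketch below chooses an upload method from the file size and uploads files with the standard-library `ftplib`. The function names are hypothetical, the host and credentials are placeholders to be replaced with the FTP details from the GNPS upload instructions, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
import os
from ftplib import FTP

# Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
WEB_UPLOAD_LIMIT_BYTES = 20 * 1024 * 1024


def upload_method(path: str) -> str:
    """Return 'ftp' for files over the web-upload limit, otherwise 'web or ftp'."""
    return "ftp" if os.path.getsize(path) > WEB_UPLOAD_LIMIT_BYTES else "web or ftp"


def upload_via_ftp(host: str, user: str, password: str, paths: list) -> None:
    """Upload local files into the login directory of an FTP account.

    host, user and password are placeholders; use the values from the
    GNPS upload instructions.
    """
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        for path in paths:
            with open(path, "rb") as handle:
                # Store each file under its base name in binary mode.
                ftp.storbinary(f"STOR {os.path.basename(path)}", handle)
```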

Selecting spectral libraries and input files

Click any of the Select Input Files buttons.

Click the Select Input Files tab.

Select your Library Files.

Select your Spectrum Files.

You can dereplicate either single MS2 spectra or multiple MS2 spectra from a single acquisition. In both cases, multiple files can be chosen.

Click Finish Selection and close the input window.

Dereplication Parameters

Search Options

Parameter | Description | Default
Parent Mass Tolerance | Parent (precursor) mass tolerance, in Da |
Ion Tolerance | MS2 fragment peak tolerance, in Da | 0.5
Min Pairs Cos | Minimum cosine score required to make a match | 0.5
Min Matched Peaks | Minimum number of matched peaks required to make a match | 6
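How these thresholds combine can be sketched as a single acceptance check in Python. This is an illustration under assumptions: the function name is hypothetical, the comparisons are assumed to be inclusive, and because the table gives no default for Parent Mass Tolerance it is a required argument here. It covers ordinary (non-analog) matches only.

```python
def passes_search_thresholds(
    query_precursor_mz: float,
    library_precursor_mz: float,
    cosine: float,
    matched_peaks: int,
    parent_mass_tolerance: float,
    min_pairs_cos: float = 0.5,
    min_matched_peaks: int = 6,
) -> bool:
    """Return True if a query/library pair satisfies the Search Options thresholds."""
    # Parent masses must agree within the tolerance (assumed inclusive).
    precursor_ok = abs(query_precursor_mz - library_precursor_mz) <= parent_mass_tolerance
    # Both the cosine score and the matched-peak count must reach their minimums.
    return precursor_ok and cosine >= min_pairs_cos and matched_peaks >= min_matched_peaks
```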

Advanced Search Options

Parameter | Description | Default
Library Class | Minimum library class to consider in the search | Gold
Top Hits Per Spectrum | Number of results to return per query spectrum | 1
Search Analogs | Allows peak-shifted matches based on the parent mass difference, similar to spectral networks | No
Maximum Analog Search Mass Difference | If analog search is turned on, the maximum mass shift tolerated | 100.0
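Analog search can be illustrated by how peak pairs are found when the parent masses differ. In the Python sketch below, a library peak may match a query peak either at the same m/z or shifted by the parent mass difference, on the assumption that fragments carrying the modification move by that amount. The function name, the preference for unshifted matches when both apply, and the tolerance handling are assumptions, not the GNPS implementation. The resulting pairs would then be scored as in a cosine calculation.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def analog_peak_pairs(
    query: List[Peak],
    library: List[Peak],
    query_precursor_mz: float,
    library_precursor_mz: float,
    tolerance: float = 0.5,
    max_mass_diff: float = 100.0,
) -> List[Tuple[int, int, bool]]:
    """List (query index, library index, shifted?) peak pairs for an analog match.

    Returns an empty list when the parent mass difference exceeds
    max_mass_diff, i.e. the library spectrum is not an analog candidate.
    """
    delta = query_precursor_mz - library_precursor_mz
    if abs(delta) > max_mass_diff:
        return []
    pairs = []
    for a, (qmz, _) in enumerate(query):
        for b, (lmz, _) in enumerate(library):
            if abs(qmz - lmz) <= tolerance:
                pairs.append((a, b, False))  # same fragment m/z in both spectra
            elif abs(qmz - (lmz + delta)) <= tolerance:
                pairs.append((a, b, True))  # fragment shifted by the parent mass difference
    return pairs
```

For example, with parent masses 314.0 and 300.0 (a 14 Da difference), a query peak at 214.0 pairs with a library peak at 200.0 as a shifted match.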

Filtering Options

Parameter | Description | Default
Filter StdDev Intensity | Filters out low-intensity peaks. Determines the mean and standard deviation of the lowest 25% of the spectrum peaks and removes all peaks below mean + n*stdDev, where n is this parameter. | 2.0
Min Peak Int | Minimum peak intensity; all peaks below it are removed | 50.0
Filter Precursor Window | Removes peaks around the precursor mass | No
Filter Library | Applies the peak filters to the library spectra as well | No
Filter peaks in 50Da Window | |
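The two intensity filters in this table can be sketched in Python as follows. Assumptions: the population standard deviation is used, "lowest 25%" means the lowest quarter of peaks by intensity (at least one peak), peaks exactly at a threshold are kept, and the function name is hypothetical. The precursor-window and 50 Da window filters are not modeled because their exact settings are not given here.

```python
import statistics
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_peaks(peaks: List[Peak], n_std: float = 2.0, min_intensity: float = 50.0) -> List[Peak]:
    """Apply a Filter StdDev Intensity step and a Min Peak Int step to one spectrum."""
    if not peaks:
        return []
    intensities = sorted(i for _, i in peaks)
    # Noise estimate from the lowest 25% of peaks by intensity (at least one peak).
    low = intensities[: max(1, len(intensities) // 4)]
    noise_threshold = statistics.mean(low) + n_std * statistics.pstdev(low)
    # A peak must clear both the noise threshold and the minimum intensity.
    threshold = max(noise_threshold, min_intensity)
    return [(mz, i) for mz, i in peaks if i >= threshold]
```

For intensities 10, 30, 45, 60, 100, 200, 400 and 800, the lowest quarter is 10 and 30 (mean 20, standard deviation 10), so the noise threshold is 40. With the default Min Peak Int of 50.0, only the peaks from 60 upward remain.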

Submit Your Workflow

Click Submit to submit your workflow. You will receive an email when the workflow is complete.

Results Display

To access your data, either click the link in the email or click the DONE link under Status on the jobs menu page.

You can view your library matches in two ways: Group by Spectrum or View All Spectra DB.

Group by Spectrum reads the annotations from the MGF library file, while View All Spectra DB reads the annotations from our database, which can be updated. To learn about library creation, click here.

Regardless of how you view the library matches, each library spectrum carries metadata, including the compound name, a CAS number if it is a commercial compound, and a PubMed ID linking to the published data.

The score shown is the cosine score, which ranges from 0 to 1, where 1 is an exact match.

To view the queried MS2 spectrum alongside the matching database/library spectrum, click the spectrum cartoon icon. The queried spectrum is shown on the left and the database/library spectrum on the right.

While library/database matches are a good indication that the compound of interest has been previously characterized, we recommend verifying them by manually comparing the MS2 spectra and/or by analyzing a commercial standard of the compound.