One of the major limitations in discovering new chemical entities is determining which metabolites in a complex biological sample are already known compounds. One way to overcome this limitation is to compare the MS2 spectrum of the unknown metabolite against a library of MS2 spectra generated from structurally characterized metabolites. Here, the comparison is based on cosine similarity scoring of MS/MS spectra, the same scoring used in the Molecular Networking workflow.
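To make the idea of cosine scoring concrete, here is a minimal sketch in Python. It is not the GNPS implementation: the function names, the greedy peak pairing, and the plain unit-norm intensity scaling are all assumptions chosen for illustration, and GNPS may transform or weight intensities differently. Spectra are assumed to be lists of (m/z, intensity) pairs.

```python
import math

def normalize(peaks):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    return [(mz, i / norm) for mz, i in peaks] if norm > 0 else []

def cosine_score(query, library, ion_tolerance=0.5):
    """Return (score, matched_peaks) for two spectra of (m/z, intensity) pairs.

    Illustrative only: peaks are paired greedily, which is an assumption,
    not a description of how GNPS pairs peaks.
    """
    q = normalize(query)
    lib = normalize(library)
    # Candidate pairs: every query/library peak pair within the m/z tolerance.
    candidates = [
        (qi * li, a, b)
        for a, (qmz, qi) in enumerate(q)
        for b, (lmz, li) in enumerate(lib)
        if abs(qmz - lmz) <= ion_tolerance
    ]
    # Take the highest intensity products first; each peak is used at most once.
    candidates.sort(reverse=True)
    used_q, used_lib = set(), set()
    score, matched = 0.0, 0
    for product, a, b in candidates:
        if a in used_q or b in used_lib:
            continue
        used_q.add(a)
        used_lib.add(b)
        score += product
        matched += 1
    return score, matched

# Hypothetical spectra, made up for this example.
unknown = [(100.0, 10.0), (200.0, 20.0), (300.0, 5.0)]
reference = [(100.1, 12.0), (200.2, 18.0), (350.0, 4.0)]
print(cosine_score(unknown, reference))
```

A spectrum compared with itself scores 1 (up to floating-point rounding), and two spectra with no peaks within the tolerance of each other score 0.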
This tutorial covers the Dereplication workflow on GNPS. Once you have registered and logged in, the Dereplication workflow can be accessed at GNPS by clicking the dereplication workflow link.
All files must be in a compatible, universal format (mzXML or mgf). For a basic overview of data conversion, see Data Conversion to GNPS Compatible Formats - .mzXML and .mzML.
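Before uploading, it can help to confirm that an mgf file actually contains the MS2 spectra you expect. The sketch below is a minimal reader for the common `BEGIN IONS` / `END IONS` layout of mgf files. The function name `read_mgf` and the returned dictionary shape are hypothetical, and the parser only handles the basic form of the format (key=value parameter lines followed by "m/z intensity" peak lines).

```python
def read_mgf(lines):
    """Yield one dict per spectrum from an iterable of MGF text lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as PEPMASS=500.25
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z followed by intensity
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                spectrum["peaks"].append((float(fields[0]), intensity))

# Made-up example content; with a real file, pass the open file object instead.
example = """BEGIN IONS
TITLE=example spectrum
PEPMASS=500.25
CHARGE=1+
100.0 10.0
200.5 25.0
END IONS
""".splitlines()

for spec in read_mgf(example):
    print(spec["params"].get("PEPMASS"), len(spec["peaks"]))
```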
Give your dereplication workflow a detailed title. This title will be helpful when you retrieve your data after the workflow is completed.
There are two ways to upload data: ftp or the web. Data files larger than 20MB must be uploaded via ftp, and ftp is the preferred method in general. For further instructions on the web-based and ftp upload methods, click here.
Click any of the Select Input Files buttons.
Click the Select Input Files tab.
Select your Library Files.
Select your Spectrum Files.
You can dereplicate either a single MS2 spectrum or multiple MS2 spectra from a single acquisition. In either case, multiple files can be chosen.
Click Finish Selection and close the input window.
|Parameter||Description||Default|
|Parent Mass Tolerance||Parent mass peak tolerance|| |
|Ion Tolerance||MS2 peak tolerance||0.5|
|Min Pairs Cos||Cosine score threshold to make a match||0.5|
|Min Matched Peaks||Minimum matched peaks to make a match||6|
|Library Class||Minimum Library Class to Consider in Search||Gold|
|Top Hits Per Spectrum||Number of results to return per query spectrum||1|
|Search Analogs||Allows peak shifted matches by parent mass difference similar to spectral networks||No|
|Maximum Analog Search Mass Difference||If analog search is turned on, max mass shift tolerance||100.0|
|Filter StdDev Intensity||Filters out low intensity peaks. Determines mean and std dev of lowest 25% of the spectrum peaks and removes all peaks below mean + n*stdDev.||2.0|
|Min Peak Int||Minimum peak intensity, everything below is removed||50.0|
|Filter Precursor Window||Removes peaks around the precursor mass||No|
|Filter Library||Apply peak filters to library||No|
|Filter peaks in 50Da Window|| || |
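The Filter StdDev Intensity and Min Peak Int rows above describe simple intensity filters. The following sketch shows one way those two descriptions could be combined. It is an illustration under stated assumptions, not the GNPS code: the function name, the use of the population standard deviation, and rounding the "lowest 25%" down to a whole number of peaks (with at least one peak) are all choices made here.

```python
import statistics

def filter_peaks(peaks, n_std=2.0, min_peak_int=50.0):
    """Drop low-intensity peaks from a list of (m/z, intensity) pairs.

    Assumptions for illustration: population std dev, and the lowest 25%
    is len(peaks) // 4 peaks, with a minimum of one peak.
    """
    if not peaks:
        return []
    intensities = sorted(i for _, i in peaks)
    k = max(1, len(intensities) // 4)
    lowest = intensities[:k]
    mean = statistics.fmean(lowest)
    std = statistics.pstdev(lowest)
    # A peak must clear both the noise threshold and the absolute minimum.
    threshold = max(mean + n_std * std, min_peak_int)
    return [(mz, i) for mz, i in peaks if i >= threshold]

# Hypothetical spectrum, made up for this example.
spectrum = [(100, 10), (110, 12), (120, 60), (130, 500),
            (140, 800), (150, 55), (160, 1000), (170, 14)]
print(filter_peaks(spectrum))
```

With the table defaults (2.0 and 50.0), the absolute minimum intensity often dominates for spectra with a low noise floor; setting `min_peak_int=0` isolates the effect of the standard-deviation filter.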
Click Submit to submit your workflow. You will be emailed when the workflow is complete.
To access your data, either click the link in the email or click the DONE link under Status on the jobs menu page.
You can view your library matches in two ways: Group by Spectrum or View All Spectra DB.
Group by Spectrum reads the annotation from the MGF library file, while View All Spectra DB reads the annotation from our database, which can be updated. To learn about library creation, click here.
Regardless of how you view the library matches, metadata is associated with each library spectrum, including the compound name, a CAS number if it is a commercial compound, and a PubMed ID for the published data.
The score shown is the cosine score between the queried spectrum and the library spectrum, where 1 is an exact match.
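As a rough illustration of how the score interacts with the Min Pairs Cos, Min Matched Peaks, and Top Hits Per Spectrum parameters, the sketch below filters and ranks candidate matches for one query spectrum. The function name, the dictionary keys (`name`, `score`, `matched_peaks`), and the use of inclusive (>=) thresholds are assumptions made for this example, not the GNPS implementation.

```python
def select_hits(candidates, min_cos=0.5, min_matched_peaks=6, top_hits=1):
    """Keep candidates that pass both thresholds, best score first."""
    passing = [
        c for c in candidates
        if c["score"] >= min_cos and c["matched_peaks"] >= min_matched_peaks
    ]
    passing.sort(key=lambda c: c["score"], reverse=True)
    return passing[:top_hits]

# Hypothetical candidate matches for a single query spectrum.
candidates = [
    {"name": "compound A", "score": 0.92, "matched_peaks": 8},
    {"name": "compound B", "score": 0.97, "matched_peaks": 4},  # too few peaks
    {"name": "compound C", "score": 0.61, "matched_peaks": 7},
    {"name": "compound D", "score": 0.40, "matched_peaks": 9},  # score too low
]
for hit in select_hits(candidates, top_hits=2):
    print(hit["name"], hit["score"])
```

Note that a high score alone is not enough in this sketch: compound B is rejected because it shares too few peaks with the query.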
To view the spectra corresponding to the queried MS2 spectrum and the database/library spectrum, click the spectrum cartoon icon. The queried spectrum is on the left and the database/library spectrum is on the right.
While library/database matches are a good indication that the compound of interest has been previously characterized, manual comparison of the MS2 spectra and/or comparison against the commercial compound is recommended for verification.