The Insilico Peptidic Natural Products Dereplicator is a bioinformatic tool that annotates known peptidic natural products (PNPs) in MS/MS data using in silico fragmentation trees.
Please cite the following if you are using Dereplicator:
Hosein Mohimani, Alexey Gurevich, Alla Mikheenko, Neha Garg, Louis-Felix Nothias, Akihiro Ninomiya, Kentaro Takada, Pieter C. Dorrestein, Pavel A. Pevzner. Dereplication of Peptidic Natural Products Through Database Search of Mass Spectra. Nature Chemical Biology 13, 30-37 (2017). doi:10.1038/nchembio.2219
Step 1: Go to http://gnps.ucsd.edu and create an account or log in.
You have the option to import an existing GNPS dataset (by clicking on "Share Files") or upload your own data (by clicking on "Upload Files"). See the corresponding GNPS documentation for FTP upload.
When you have finished selecting files, click on Finish Selection.
Enter a descriptive title for your job, and adjust the parameters according to your data and the mass spectrometer used.
-- Precursor Ion Mass Tolerance: This value specifies how much the precursor ion can be shifted from its expected m/z value. Default value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 0.5 Da for low-resolution instruments (ion traps, QqQ).
-- Fragment ion Mass Tolerance: This value specifies how much fragment ions can be shifted from their expected m/z values. Default value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 0.5 Da for low-resolution instruments (ion traps, QqQ).
-- Search analog (VarQuest) [RECOMMENDED TO USE]: The VarQuest algorithm searches for analogs of known natural products in MS/MS data using Dereplicator. VarQuest is a modification-tolerant database search tool that discovers unexpected modifications in a blind mode. Since variable PNP identification is computationally expensive, VarQuest first constructs a set of feasible peptide-spectrum matches (PSMs) using a simple scoring approach, then filters this set using a rigorous scoring method. Enabling VarQuest increases the job time, but the option is recommended because it expands the search space.
-- PNP database: a choice of two databases, Regular or Extended. The Extended database is more informative but twice as large, so processing takes about twice as long.
-- Minimum number of AA: Dereplicator is able to search for short peptides. This parameter sets the minimum number of amino acids a compound must contain to be considered in the Dereplicator search.
-- Accurate P-values: use an accurate but slow algorithm (MS-DPR) for P-value computation. By default, a rough estimate is used.
-- Max charge: Maximum charge allowed.
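To illustrate how the two tolerance settings above act, here is a minimal Python sketch that checks whether an observed m/z falls within an absolute tolerance window around the expected value. The function name `within_tolerance` and the m/z values are hypothetical and chosen only for illustration; this is not Dereplicator's own code.

```python
def within_tolerance(observed_mz: float, expected_mz: float, tol_da: float) -> bool:
    """Return True if the observed m/z lies within +/- tol_da of the expected m/z."""
    return abs(observed_mz - expected_mz) <= tol_da


HIGH_RES_TOL_DA = 0.02  # default quoted above for q-TOF / q-Orbitrap data
LOW_RES_TOL_DA = 0.5    # default quoted above for ion trap / QqQ data

expected = 500.30  # illustrative expected fragment m/z
observed = 500.45  # illustrative measured fragment m/z, 0.15 Da away

print(within_tolerance(observed, expected, HIGH_RES_TOL_DA))  # False
print(within_tolerance(observed, expected, LOW_RES_TOL_DA))   # True
```

A peak 0.15 Da away from its expected position is rejected with the high-resolution setting but accepted with the low-resolution one, which is why the tolerance should reflect the instrument that produced the data.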
ADVANCED DEREPLICATOR OPTIONS:
-- Max Isotopic Shift: When working with molecules with masses of 1000 Da or higher, mass spectrometers commonly report a precursor mass 1 or 2 Da higher than the monoisotopic mass because of the isotopic pattern. Setting Max Isotopic Shift to a value higher than 0 allows Dereplicator to consider these shifts in the precursor mass.
-- Adducts: In addition to protonated adducts, Dereplicator can search for sodiated and potassiated adducts.
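The effect of these two options on the precursor mass can be sketched in Python as follows. The snippet enumerates the neutral monoisotopic masses that a single precursor m/z could correspond to, given a set of adducts and a maximum isotopic shift. The function name `candidate_neutral_masses` is hypothetical, and the sketch makes the simplifying assumption that every charge is carried by the same adduct ion; it is not how Dereplicator is implemented.

```python
# Mass added per charge by each singly charged adduct ion, in Da.
ADDUCT_MASS = {
    "H": 1.007276,    # proton
    "Na": 22.989218,  # sodium cation
    "K": 38.963158,   # potassium cation
}
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference, in Da


def candidate_neutral_masses(precursor_mz, charge, max_isotopic_shift, adducts=("H",)):
    """List (adduct, isotopic shift, neutral mass) candidates for one precursor.

    Simplifying assumption: all `charge` charges come from the same adduct ion.
    """
    candidates = []
    for adduct in adducts:
        for shift in range(max_isotopic_shift + 1):
            # Undo the isotopic shift to recover the monoisotopic m/z.
            mono_mz = precursor_mz - shift * ISOTOPE_SPACING / charge
            # Remove the adduct mass and scale by the charge.
            neutral_mass = (mono_mz - ADDUCT_MASS[adduct]) * charge
            candidates.append((adduct, shift, round(neutral_mass, 4)))
    return candidates


# Illustrative precursor: m/z 1058.68, charge 1, shifts 0 and 1, H and Na adducts.
for candidate in candidate_neutral_masses(1058.68, 1, 1, ("H", "Na")):
    print(candidate)
```

Each extra adduct or isotopic shift adds candidate masses to compare against the database, so these options widen the search at the cost of more comparisons.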
ADVANCED VARQUEST OPTIONS:
-- Max Allowed Modification Mass : Maximum difference allowed between the precursor ion of an unknown modified peptide and the (dereplicated) known peptide.
-- Min Matched Peaks with Known Compound : Minimum number of common peaks between the MS/MS spectrum of an unknown modified peptide and the (dereplicated) known peptide.
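As a rough illustration of what these two thresholds express, the Python sketch below accepts a candidate analog only if its precursor mass differs from the known peptide by no more than a maximum modification mass and if enough peaks are shared between the two peak lists. This is a deliberately simplified stand-in: it counts only directly matching peaks, ignores peaks shifted by the modification, and does not reproduce VarQuest's scoring. All function names, peak lists, and threshold values are hypothetical.

```python
def count_shared_peaks(peaks_a, peaks_b, tol_da=0.02):
    """Count peaks in peaks_a that have a partner in peaks_b within tol_da."""
    return sum(any(abs(a - b) <= tol_da for b in peaks_b) for a in peaks_a)


def passes_analog_filters(unknown_mass, known_mass, unknown_peaks, known_peaks,
                          max_mod_mass, min_matched_peaks, tol_da=0.02):
    """Apply the two thresholds: precursor mass difference and shared peaks."""
    mass_ok = abs(unknown_mass - known_mass) <= max_mod_mass
    shared = count_shared_peaks(unknown_peaks, known_peaks, tol_da)
    return mass_ok and shared >= min_matched_peaks


# Illustrative values only: the unknown is about 14.02 Da heavier than the known peptide.
known_peaks = [200.10, 313.18, 426.27, 539.35]
unknown_peaks = [200.10, 313.19, 440.29, 553.37]

print(count_shared_peaks(unknown_peaks, known_peaks))  # 2
print(passes_analog_filters(1014.52, 1000.50, unknown_peaks, known_peaks,
                            max_mod_mass=300, min_matched_peaks=2))  # True
print(passes_analog_filters(1014.52, 1000.50, unknown_peaks, known_peaks,
                            max_mod_mass=300, min_matched_peaks=3))  # False
```

Raising the minimum number of matched peaks makes the filter stricter, while raising the maximum modification mass admits analogs that differ more from the known peptide.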
Enter an email address, then click on Submit.
You will receive an email with a link to the results when your job is finished. You can also check the status of your job in the "Jobs" section of your account.
The job can be cloned to rerun it with modified parameters. Summary Statistics and Workflow Parameters can be consulted through their respective links.
Click on View Unique Peptides (recommended) to get the list of annotated molecules.
Click on View All PSM to get a detailed view of the peptide-spectrum matches.
Annotations can be sorted by any column (usually the score or the p-value), and a compound name can be searched with the filter function.
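The same sorting and filtering can be reproduced offline on an exported results table. The Python sketch below works on a small in-memory tab-separated table; the column names ("Name", "Score", "P-Value") and all values are hypothetical placeholders and may not match the headers of an actual GNPS export.

```python
import csv
import io

# Illustrative stand-in for an exported results table.
# Column names and values are hypothetical.
TSV_TEXT = (
    "Name\tScore\tP-Value\n"
    "Compound A\t9\t3.2e-12\n"
    "Compound B\t14\t1.5e-20\n"
    "Compound C\t11\t8.0e-16\n"
)

rows = list(csv.DictReader(io.StringIO(TSV_TEXT), delimiter="\t"))

# Most significant annotations first (smallest p-value).
by_pvalue = sorted(rows, key=lambda row: float(row["P-Value"]))

# Highest-scoring annotations first.
by_score = sorted(rows, key=lambda row: int(row["Score"]), reverse=True)


def filter_by_name(table, query):
    """Case-insensitive substring search on the compound name."""
    return [row for row in table if query.lower() in row["Name"].lower()]


print([row["Name"] for row in by_pvalue])  # ['Compound B', 'Compound C', 'Compound A']
print(filter_by_name(rows, "compound c"))
```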
Click on Show Annotation to see the experimental spectrum and how it matches the fragmentation tree of the compound in the Dereplicator database.
The experimental MS/MS spectrum can be viewed in the left panel. The blue peaks are fragment ions that matched the fragmentation tree of peptidic natural products in the database.
The structure of the molecule is displayed on the right.
The Annotated fragments table lists the fragment ions that matched the fragmentation tree, along with their mass error, charge and intensity.
Click on a blue peak to highlight the corresponding fragment on the molecular structure, along with its properties.
More than 60% of Dereplicator annotations on GNPS/MassIVE datasets (April 2016) were manually curated, and the results can be consulted here.
To confirm a Dereplicator annotation:
- Inspect the MS/MS spectrum in the raw file (consistency, noise level, ...).
- Confirm the detected adduct using the MS1 spectrum (singly or doubly charged? protonated adduct? ...).
- Manually annotate the main fragment ions that were not annotated.
- Look up the biological source(s) of the peptide (Google Scholar, Dictionary of Natural Products, AntiMarin, MarinLit, ...) to see whether it is consistent with the sample.
- If genome sequence(s) are available, perform genome mining to search for potential biosynthetic gene clusters.
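For the charge-state check in the MS1 spectrum, one common approach is to look at the spacing between consecutive isotope peaks, which is close to 1.003 Da divided by the charge. The Python sketch below applies that rule; the function name `infer_charge`, the tolerance, and the peak values are hypothetical and serve only as an illustration of the manual check.

```python
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference, in Da


def infer_charge(isotope_mzs, max_charge=4, tol_da=0.01):
    """Infer the charge state from the mean spacing of consecutive isotope peaks.

    Returns None if there are fewer than two peaks or no charge fits.
    """
    spacings = [b - a for a, b in zip(isotope_mzs, isotope_mzs[1:])]
    if not spacings:
        return None
    mean_spacing = sum(spacings) / len(spacings)
    for charge in range(1, max_charge + 1):
        if abs(mean_spacing - ISOTOPE_SPACING / charge) <= tol_da:
            return charge
    return None


# Illustrative isotope clusters read from an MS1 spectrum.
print(infer_charge([1058.68, 1059.68, 1060.69]))  # 1 (spacing close to 1.0)
print(infer_charge([529.840, 530.342, 530.843]))  # 2 (spacing close to 0.5)
```

If the inferred charge disagrees with the charge reported for the annotation, the match deserves a closer look before it is accepted.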