Added by Mingxun Wang, last edited by Mingxun Wang on Sep 27, 2017

Introduction to Molecular Networking

Molecular networks are visual displays of the chemical space present in tandem mass spectrometry (MS/MS) experiments. This visualization approach can detect sets of spectra from related molecules (so-called spectral networks), even when the spectra themselves are not matched to any known compounds.
The visualization of molecular networks in GNPS represents each spectrum as a node, and spectrum-to-spectrum alignments as edges (connections) between nodes. Nodes can be supplemented with metadata, including dereplication matches or information provided by the user, such as abundance, origin of product, biochemical activity or hydrophobicity, which can be reflected in a node’s size or color. The resulting map of related molecules is the molecular network.
Ideally, a single chemical species is represented as one node, and the relatedness between spectra is represented as an edge. For more information, see the GNPS paper published in Nature Biotechnology by Wang et al. (full reference under Citation).

This tutorial will cover the Molecular Networking workflow on GNPS. Once you have registered and logged into GNPS, the Molecular Networking workflow can be accessed at GNPS by clicking on Data Analysis.

Separate documentation is available for Molecular Networking Analysis, Cytoscape 2.8, and Cytoscape 3.2.

Citation

GNPS: Wang, Mingxun, et al. "Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking" Nature Biotechnology 34.8 (2016): 828-837. PMID: 27504778. https://www.nature.com/nbt/journal/v34/n8/full/nbt.3597.html

If you use MS-Cluster in your molecular networks, also cite the following: Frank, Ari M., et al. "Clustering millions of tandem mass spectra." Journal of Proteome Research 7.1 (2007): 113-122. http://pubs.acs.org/doi/abs/10.1021/pr070361e

Data Analysis on GNPS

(a) Molecular networks are constructed from the alignment of MS/MS spectra to one another. Edges connecting nodes (MS/MS spectra) are defined by a modified cosine scoring scheme that determines the similarity of two MS/MS spectra with scores ranging from 0 (totally dissimilar) to 1 (completely identical). MS/MS spectra are also searched against GNPS spectral libraries, seeding putative node matches in the molecular networks. Networks are visualized online in-browser or exported for third-party visualization software such as Cytoscape.

(b) An example alignment between three MS/MS spectra of compounds with structural modifications that are captured by modification-tolerant spectral matching used in variable dereplication and molecular networking.

(c) In-browser molecular network visualization enables users to interactively explore molecular networks without requiring any external software. To date, >11,000 molecular networks have been analyzed using this feature. Within this interface, (i) users are able to define cohorts of input data; correspondingly, nodes within the network are represented as pie charts to visualize spectral count differences for each molecule across cohorts. (ii) Node labels indicate matches made to GNPS spectral libraries, with additional information displayed on mouseover. These matches provide users a starting point to annotate unidentified MS/MS spectra within the network. (iii) To facilitate identification of unknowns, users can display MS/MS spectra in the right panels by clicking on the nodes in the network, giving direct interactive access to the underlying MS/MS peak data. Furthermore, alignments between spectra are displayed in the top-right and bottom-right panels to give insight into which underlying characteristics of the molecule could cause the observed differences in fragmentation.
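
The modified cosine score described in (a) can be illustrated with a simplified sketch. This is not the GNPS implementation: the function name, the square-root intensity scaling, and the greedy peak assignment are assumptions made for illustration. The idea shown is that a peak pair counts as matched either directly or after shifting by the precursor mass difference, which is what lets spectra of structurally modified molecules still align.

```python
import math

def modified_cosine(spec_a, spec_b, prec_a, prec_b, tol=0.02):
    """Simplified modified cosine between two MS/MS spectra.

    spec_a, spec_b: lists of (mz, intensity) tuples.
    prec_a, prec_b: precursor m/z values.
    tol: fragment ion mass tolerance in Da.
    Returns (score, number_of_matched_peaks).
    """
    def normalize(spec):
        # Square-root scale intensities (an assumption), then scale to unit length.
        scaled = [(mz, math.sqrt(i)) for mz, i in spec]
        norm = math.sqrt(sum(i * i for _, i in scaled))
        return [(mz, i / norm) for mz, i in scaled] if norm > 0 else []

    a, b = normalize(spec_a), normalize(spec_b)
    shift = prec_a - prec_b

    # Candidate pairs: a direct match, or a match after the precursor-difference shift.
    candidates = []
    for i, (mz_a, int_a) in enumerate(a):
        for j, (mz_b, int_b) in enumerate(b):
            if abs(mz_a - mz_b) <= tol or abs(mz_a - mz_b - shift) <= tol:
                candidates.append((int_a * int_b, i, j))

    # Greedy assignment: best-scoring pairs first, each peak used at most once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    score, matched = 0.0, 0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += product
            matched += 1
    return score, matched
```

Identical spectra score 1 and spectra with no peaks in common score 0. The number of matched peaks is returned alongside the score because the networking parameters described later apply a threshold to both.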

Mass Spectrometry Data Format

All mass spectrometry data files must be in a compatible, universal format (mzXML, mzML or mgf). For a basic overview of data conversion, click here.

For now, mzXML files must be in 32-bit uncompressed format.
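
One way to check a file before uploading is to inspect the peaks elements of the mzXML. The sketch below assumes the file follows the mzXML 3.x schema, where each peaks element carries precision and compressionType attributes; the function name is hypothetical, and treating a missing compressionType as "none" is an assumption.

```python
import xml.etree.ElementTree as ET

def check_mzxml_peaks(source):
    """Return a list of encoding problems found in the <peaks> elements.

    source: a file path or a binary file-like object containing mzXML.
    An empty list means every <peaks> element is 32-bit and uncompressed.
    """
    problems = []
    for _, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace prefix from the tag name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "peaks":
            precision = elem.get("precision")
            # Assumption: a missing compressionType means no compression.
            compression = elem.get("compressionType", "none")
            if precision != "32":
                problems.append(f"precision={precision}")
            if compression != "none":
                problems.append(f"compressionType={compression}")
        # Free memory as we go; large mzXML files can be several gigabytes.
        elem.clear()
    return problems
```

If the returned list is not empty, reconvert the raw data with 32-bit precision and compression disabled in your conversion tool.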

Getting started with Molecular Networking

From the main GNPS page, select the Data Analysis portal.

Provide a detailed title for your molecular network. This title will be helpful when you retrieve your data after the workflow is completed.

Upload Data

Before you can select your data files within the molecular networking workflow, they must first be uploaded. Data files smaller than 20 MB may be uploaded directly to GNPS. Data files larger than 20 MB must first be uploaded to ProteoSAFe using an FTP client. Uploading through ProteoSAFe is the preferred method, as it also stores your data files for future use in other molecular networks.
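
When preparing many files, the size rule above can be applied in bulk. This is a small sketch with hypothetical function names; it takes 1 MB to mean 1024 * 1024 bytes and sends files of exactly 20 MB to FTP, both of which are assumptions, since the text does not define either case.

```python
import os

# 20 MB, taking 1 MB = 1024 * 1024 bytes (an assumption).
WEB_UPLOAD_LIMIT_BYTES = 20 * 1024 * 1024

def upload_method(size_bytes):
    """Return "web" for files under the limit, otherwise "ftp"."""
    return "web" if size_bytes < WEB_UPLOAD_LIMIT_BYTES else "ftp"

def plan_uploads(paths):
    """Map each local file path to the upload method its size requires."""
    return {path: upload_method(os.path.getsize(path)) for path in paths}
```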

Data upload method using ProteoSAFe

Register and login at http://proteomics.ucsd.edu/ProteoSAFe/. The same login and password can be used for both GNPS and ProteoSAFe.

Open an FTP client such as FileZilla and enter the following connection parameters: Host: ccms-ftp01.ucsd.edu, followed by your ProteoSAFe user name and password. Then press Enter to connect.

Choose the location of your files within the local site drop down menu. Highlight the files or folders to upload and select upload by right clicking. You will then see the files being queued and transferred to ProteoSAFe.

Once all the files are uploaded using this method, they will be available for you to use for your molecular networking workflows within GNPS.
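
The same upload can be scripted instead of using a graphical client. The sketch below uses Python's standard ftplib with the host named in the steps above. The function names are hypothetical, the credentials are your own ProteoSAFe login, and whether the server accepts a plain (unencrypted) FTP session from your network is an assumption to verify.

```python
import os
from ftplib import FTP

PROTEOSAFE_FTP_HOST = "ccms-ftp01.ucsd.edu"  # host given in the steps above

def upload_files(ftp, paths):
    """Upload local files to the current remote directory of an open FTP session.

    Returns the list of remote file names that were stored.
    """
    uploaded = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            # STOR writes the file under the same name on the server.
            ftp.storbinary(f"STOR {name}", handle)
        uploaded.append(name)
    return uploaded

def upload_to_proteosafe(username, password, paths, host=PROTEOSAFE_FTP_HOST):
    """Log in with ProteoSAFe credentials and upload the given files."""
    with FTP(host) as ftp:
        ftp.login(user=username, passwd=password)
        return upload_files(ftp, paths)
```

For example, upload_to_proteosafe("your_username", "your_password", ["sample1.mzXML"]) would transfer one file, where the user name, password, and file name are placeholders.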

Data upload method via the web

Click any of the Select Input Files buttons adjacent to the Spectrum Files fields and then choose the Upload Files tab from the pop-up window.

At this point, your data files can be selected by location within your file system or dragged and dropped into the pop-up window for uploading.

Importing Public Datasets to Analyze

In the Select Input Files pop-up, you will see a Share Files tab. There you will find a box called Import Data Share, in which you may enter the accession of a MassIVE dataset. After you click Import, the dataset will appear in your workspace, where you can select input files from it to analyze.

Selecting Data Input Files

Within the Workflow Selection header, provide your workflow selection with a detailed name.

Within the Basic Options header, to input your spectrum files, click Select Input Files next to the Spectrum Files (required) field. A pop-up window with three tabs will appear: Select Input Files, Upload Files, and Share Files. If you previously uploaded your data files to ProteoSAFe, choose the Select Input Files tab. If you need to upload your data via the web, choose the Upload Files tab.

Proceed by selecting the files you want to designate as spectrum input files and choosing the corresponding tab, such as Spectrum Files G1, Spectrum Files G2, etc. If you are building a detailed network with more than six groups, select the files you want to use for group mapping and then select the Group Mapping tab; next, select the files to use for attribute mapping and then select the Attribute Mapping tab. For more information on organizing your spectrum files, refer to the section below entitled Organizing your Spectrum Files.

Your selected files will then be listed under the corresponding folder; if they are correct, select Submit. For more information regarding advanced networking parameters, refer to the section below entitled Molecular Network Parameters.

Organizing your Spectrum Files

By default, files can be categorized into separate groups (G1, G2, etc.). For example, case and control or two different microbes can be separate groups. Using the basic options, only six groups can be created. Individual files or entire folders can be selected.

In addition to molecular networking the input data, users can choose to simultaneously search the input data against annotated reference spectra in a Spectral library. To select a Spectral library, proceed by selecting the files you want to designate as Spectral library input files and choosing the Library Files tab. Library files should only be libraries you have created or files that appear in the Selected Library Files Folder.

Click Finish Selection, which will close the pop-up window.

Metadata table in GNPS

The metadata file describes the properties of the samples and allows more flexibility for data analysis and visualization than selecting data input files within the workflow (G1, G2, etc.).

Metadata table. More information here [Recommended format].

Metadata for 3D molecular cartography with 'ili. More information [Optional].

Attribute and group mapping. More information here [legacy but supported version of metadata].

Molecular Network Parameters

Basic Parameters

Parameter: Precursor Ion Mass Tolerance (PIMT)
Description: Parameter used for MS-Cluster and spectral library search. Specifies the precursor ion mass tolerance, in Daltons. This value influences the clustering of nearly identical MS/MS spectra via MS-Cluster. Note that the value of this parameter should be consistent with the capabilities of the mass spectrometer and the specific instrument method used to generate the MS/MS data.
Default: 0.02

Parameter: Fragment Ion Mass Tolerance (FIMT)
Description: Parameter used for MS-Cluster, molecular networking, and MS/MS spectral library searches. For every group of MS/MS spectra being considered for clustering (consensus spectrum creation), this value specifies how far fragment ions can be shifted from their expected m/z values. Default value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 2.0 Da for low-resolution instruments (ion traps, QqQ).
Default: 0.02

Table showing the Da/ppm equivalence depending on the m/z range. Refer to it when adjusting the Precursor Ion Mass Tolerance (PIMT) or Fragment Ion Mass Tolerance (FIMT) based on your MS experimental conditions and the targeted molecules.

          2.0 Da     0.5 Da     0.1 Da     0.05 Da    0.03 Da    0.025 Da   0.02 Da    0.0175 Da  0.015 Da   0.01 Da    0.0075 Da
m/z 200   10000 ppm  2500 ppm   500 ppm    250 ppm    150 ppm    125 ppm    100 ppm    87.5 ppm   75 ppm     50 ppm     37.5 ppm
m/z 500   4000 ppm   1000 ppm   200 ppm    100 ppm    60 ppm     50 ppm     40 ppm     35 ppm     30 ppm     20 ppm     15 ppm
m/z 1000  2000 ppm   500 ppm    100 ppm    50 ppm     30 ppm     25 ppm     20 ppm     17.5 ppm   15 ppm     10 ppm     7.5 ppm
m/z 1500  1333 ppm   333 ppm    67 ppm     33 ppm     20 ppm     17 ppm     13 ppm     11.7 ppm   10 ppm     6.7 ppm    5.0 ppm
m/z 2000  1000 ppm   250 ppm    50 ppm     25 ppm     15 ppm     12.5 ppm   10 ppm     8.75 ppm   7.5 ppm    5.0 ppm    3.75 ppm
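
Each entry in the table follows from ppm = (tolerance in Da / m/z) × 1,000,000. For m/z values or tolerances not listed, the conversion can be computed directly; the function names below are illustrative.

```python
def da_to_ppm(tolerance_da, mz):
    """Convert an absolute mass tolerance in Da to ppm at a given m/z."""
    return tolerance_da / mz * 1e6

def ppm_to_da(tolerance_ppm, mz):
    """Convert a relative mass tolerance in ppm to Da at a given m/z."""
    return tolerance_ppm * mz / 1e6

# Print the ppm equivalent of a 0.02 Da tolerance across the m/z range.
for mz in (200, 500, 1000, 1500, 2000):
    print(f"m/z {mz}: 0.02 Da = {da_to_ppm(0.02, mz):.1f} ppm")
```

Because a fixed Da tolerance corresponds to a larger ppm window at low m/z, a tolerance chosen for high-mass ions is comparatively permissive for small molecules.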

Advanced Network Options

Parameter: Min Pairs Cos
Description: Minimum cosine score that must occur between a pair of consensus MS/MS spectra in order for an edge to be formed in the molecular network.
Default: 0.7
Notes: A lower value will increase the size of the clusters by inducing the clustering of less related MS/MS spectra; a higher value will do the opposite.

Parameter: Minimum Matched Fragment Ion (MMFI)
Description: Parameter used for molecular networking. The minimum number of common fragment ions that two separate consensus MS/MS spectra must share in order to be connected by an edge in the molecular network.
Default: 6
Notes: A low value will permit linkages between spectra of molecules with few similar fragment ions, but it will result in many more less-related spectra being connected to the network. A higher value will do the opposite. Default value is 6, but note that this parameter should be adjusted depending on the experimental conditions for mass spectra acquisition (such as mode of ionisation, fragmentation conditions, and the mobile phase) and the collision-induced fragmentation behavior of the molecules of interest within the samples. High molecular weight (MW) compounds, and compounds with more hetero-atoms, will generally tend to produce more fragment ions. However, this rule cannot be systematized. For example, some lipids with high MW generate only a few fragment ions.

Parameter: Node TopK
Description: Maximum number of neighbor nodes for one single node.
Default: 10
Notes: The edge between two nodes is kept only if both nodes are within each other's ‘TopK’ most similar nodes. For example, if this value is set to 20, then a single node may be connected to up to 20 other nodes. Keeping this value low makes very large networks (many nodes) much easier to visualize.

Parameter: Minimum Cluster Size
Description: Minimum number of MS/MS spectra in a consensus MS/MS spectrum for it to be considered for molecular networking.
Default: 2 (>1 is preferred)
Notes: Requires MS-Cluster to be on.

Parameter: Run MSCluster
Description: Cluster MS/MS spectra before networking.
Default: Yes (checked)
Notes: MSCluster will analyze every MS/MS spectrum resulting from ions that fall within the defined precursor ion mass tolerance (PIMT), and will merge the nearly identical MS/MS spectra (above the cosine score threshold) into a single consensus MS/MS spectrum. Each consensus MS/MS spectrum usually consists of multiple MS/MS spectra from across multiple LC-MS runs (or data files). For more details, see: Frank, A. M. et al. Spectral Archives: Extending Spectral Libraries to Analyze Both Identified and Unidentified Spectra. Nat. Methods 2011, 8 (7), 587-591.

Parameter: Maximum Number of Nodes in One Network
Description: Maximum number of nodes allowed in a single connected network.
Default: 100
Notes: Nodes within a single connected molecular network will be separated by increasing the cosine threshold for that specific connected molecular network. Use 0 to allow an unlimited number of nodes in a single network. Note that with large datasets, or when a great number of related molecules are in the dataset, this value should be higher (or set to 0) to retain as much information as possible. Downstream, these larger networks can be visualized using Cytoscape layout algorithms that can increase the intra-network clustering, making it possible to visualize spectral groups in the network despite the number of nodes in the network.

Parameter: Group Mapping
Description: Input text file organizing input files into groups.
Notes: Used as a more flexible alternative to assigning groups during data input selection. Discussed in Advanced Features.

Parameter: Attribute Mapping
Description: Attribute mapping eases visualization of different groups within Cytoscape.
Notes: Input text file organizing groups into attributes. These attributes are columns in the output. Discussed in Advanced Features.
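
The interaction of Min Pairs Cos, Minimum Matched Fragment Ion, and Node TopK can be sketched as a three-step edge filter. This is an illustration under stated assumptions, not the GNPS code: the function name and the edge tuple layout are hypothetical, and ties in cosine score are broken by input order.

```python
from collections import defaultdict

def filter_edges(edges, min_cosine=0.7, min_matched=6, top_k=10):
    """Filter candidate network edges.

    edges: list of (node_a, node_b, cosine, matched_peaks) tuples.
    Returns the edges that survive all three filters, in input order.
    """
    # Step 1: keep pairs that pass the cosine and matched-fragment thresholds.
    passing = [e for e in edges if e[2] >= min_cosine and e[3] >= min_matched]

    # Step 2: for each node, find its top_k most similar neighbors.
    neighbors = defaultdict(list)
    for a, b, cos, _ in passing:
        neighbors[a].append((cos, b))
        neighbors[b].append((cos, a))
    top = {
        node: {other for _, other in
               sorted(pairs, key=lambda p: p[0], reverse=True)[:top_k]}
        for node, pairs in neighbors.items()
    }

    # Step 3: keep an edge only if each node is in the other's top_k set.
    return [e for e in passing if e[1] in top[e[0]] and e[0] in top[e[1]]]
```

The mutual requirement in step 3 is why a hub node with many moderately similar neighbors ends up with at most top_k edges: a neighbor ranked outside the hub's top K loses its edge even when the hub is that neighbor's best match.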

Advanced Library Search Options

Parameter: Library Search Min Matched Peaks
Description: Minimum number of common fragment ions that MS/MS spectra should contain in order to be considered for spectral library annotation. Default value is 6, but note that this parameter should be tuned depending on the molecule of interest and the experimental conditions (such as the ionisation mode and the fragmentation conditions). For example, collision-induced fragmentation of some lipids produces only a few fragment ions. A lower value will allow clustering of MS/MS spectra containing fewer fragment ions; however, it will also cause MS/MS spectra from different molecule types to be connected in one network. A higher value will do the opposite.
Default: 6

Parameter: Score Threshold
Description: Minimum cosine score that MS/MS spectra should get in spectral matching with MS/MS spectral libraries in order to be considered an annotation.
Default: 0.7
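
The two library search thresholds act together: a candidate match must meet both to count as an annotation. A minimal sketch follows, with a hypothetical function name and record layout. Keeping only the highest-scoring match per spectrum is an assumption made for this example, not something the parameter descriptions state.

```python
def best_annotations(candidates, min_score=0.7, min_matched_peaks=6):
    """Select library annotations from candidate matches.

    candidates: list of dicts with keys "spectrum", "compound",
    "score" (cosine) and "matched_peaks".
    Returns a dict mapping each spectrum to its best passing candidate.
    """
    best = {}
    for cand in candidates:
        # Both thresholds must be met for a match to count.
        if cand["score"] < min_score or cand["matched_peaks"] < min_matched_peaks:
            continue
        current = best.get(cand["spectrum"])
        if current is None or cand["score"] > current["score"]:
            best[cand["spectrum"]] = cand
    return best
```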

Advanced Filtering Options

Parameter: Filter stdDev Intensity
Description: Applied before MS-Cluster. For each MS/MS spectrum, the 25% least intense fragment ions are collected and their mean and standard deviation (std-dev) are calculated. A minimum peak intensity is calculated as mean + k * std-dev, where k is user selectable. All peaks below this threshold are deleted. This option is not encouraged.
Default: 0
Notes: By default, no filter (value is set to 0).

Parameter: Minimum Fragment Ion Intensity
Description: All fragment ions in the MS/MS spectrum below this raw intensity will be deleted. By default, no filter.
Default: 0
Notes: Reduce to 0 if your data's raw intensities are very low.

Parameter: Filter Precursor Ion Window
Description: All peaks within +/- 17 Da of the precursor ion mass are deleted. This removes the residual precursor ion, which is frequently observed in MS/MS spectra acquired on qTOFs. By default, the filter is on.
Default: Yes

Parameter: Filter Library
Description: Apply peak filters to library.
Default: Yes

Parameter: Filter peaks in 50Da Window
Description: Filter out peaks that are not among the top 6 most intense peaks in a +/- 50 Da window.
Default: Yes
Notes: Turn off if your data contain very small molecules, as the filter might remove many peaks in the lower mass ranges that might be signal.
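
The peak filters can be sketched as small functions over a list of (m/z, intensity) pairs. These are illustrations under stated assumptions rather than the GNPS implementation: the function names are hypothetical, the 50 Da window is centered on each peak in turn, boundary peaks at exactly 17 Da are removed, and the population standard deviation is used for the std-dev threshold.

```python
import statistics

def remove_precursor_window(peaks, precursor_mz, window=17.0):
    """Drop peaks within +/- window Da of the precursor m/z."""
    return [(mz, i) for mz, i in peaks if abs(mz - precursor_mz) > window]

def keep_top_in_window(peaks, top_n=6, window=50.0):
    """Keep a peak only if fewer than top_n peaks within +/- window Da
    of it are strictly more intense."""
    kept = []
    for mz, intensity in peaks:
        stronger = sum(1 for other_mz, other_i in peaks
                       if abs(other_mz - mz) <= window and other_i > intensity)
        if stronger < top_n:
            kept.append((mz, intensity))
    return kept

def remove_low_intensity(peaks, min_intensity=0.0):
    """Drop peaks whose raw intensity is below min_intensity."""
    return [(mz, i) for mz, i in peaks if i >= min_intensity]

def stddev_threshold(peaks, k):
    """Minimum intensity computed as mean + k * std-dev of the 25% least
    intense peaks (at least one peak is always used)."""
    if not peaks:
        return 0.0
    intensities = sorted(i for _, i in peaks)
    lowest = intensities[: max(1, len(intensities) // 4)]
    return statistics.mean(lowest) + k * statistics.pstdev(lowest)
```

With k = 0 the std-dev threshold reduces to the mean of the weakest quarter of the peaks, so in this sketch the filter should be skipped entirely, rather than run with k = 0, to reproduce the "no filter" default.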

Submit Workflow

Click Submit to submit your workflow.  You will be emailed when the workflow is complete.

Download Data

To download your data, either click the link in the email or click the DONE link under Status on the Jobs menu page from GNPS.

Within the Status section, under Legacy Views, select  View All Clusters with IDs.

From the new page that opens, select the Download tab and then choose Download. The Tab-Delimited Result Only and All fields should be selected automatically. 

This will download your data as a zipped folder.

Further Analysis

Cytoscape

To learn how to create basic networks in Cytoscape, see here.

Analysis in GNPS

To learn how to perform analysis in GNPS see here.

Advanced Features

Clone

To load the same settings and files into the workflow, choose the Clone function.  This allows you to make iterative changes to the network settings. Remember to adjust the title of the workflow to reflect these changes.