- Introduction to Molecular Networking
- Data Analysis on GNPS
- Mass Spectrometry Data Format
- Getting started with Molecular Networking
- Upload Data
- Selecting Data Input Files
- Molecular Network Parameters
- Basic Parameters
- Advanced Network Options
- Advanced Library Search Options
- Advanced Filtering Options
- Submit Workflow
- Download Data
- Advanced Features
Molecular networks are visual displays of the chemical space present in tandem mass spectrometry (MS/MS) experiments. This visualization approach can detect sets of spectra from related molecules (so-called spectral networks), even when the spectra themselves are not matched to any known compounds.
The visualization of molecular networks in GNPS represents each spectrum as a node, and spectrum-to-spectrum alignments as edges (connections) between nodes. Nodes can be supplemented with metadata, including dereplication matches or information that is provided by the user, such as abundance, origin of product, biochemical activity or hydrophobicity, which can be reflected in a node’s size or color. It is possible to visualize the map of related molecules as a molecular network.
A single chemical species is ideally represented as a node, and the relatedness between spectra is represented as an edge. For more information, see the GNPS paper published in Nature Biotechnology by Wang et al.
This tutorial will cover the Molecular Networking workflow on GNPS. Once you have registered and logged into GNPS, the Molecular Networking workflow can be accessed at GNPS by clicking on Data Analysis.
GNPS: Wang, Mingxun, et al. "Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking" Nature Biotechnology 34.8 (2016): 828-837. PMID: 27504778. https://www.nature.com/nbt/journal/v34/n8/full/nbt.3597.html
If you use MS-Cluster in your molecular networks, also cite the following: Frank, Ari M., et al. "Clustering millions of tandem mass spectra." Journal of proteome research 7.01 (2007): 113-122. http://pubs.acs.org/doi/abs/10.1021/pr070361e
(a) Molecular networks are constructed from the alignment of MS/MS spectra to one another. Edges connecting nodes (MS/MS spectra) are defined by a modified cosine scoring scheme that determines the similarity of two MS/MS spectra with scores ranging from 0 (totally dissimilar) to 1 (completely identical). MS/MS spectra are also searched against GNPS spectral libraries, seeding putative node matches in the molecular networks. Networks are visualized online in-browser or exported for third-party visualization software such as Cytoscape. (b) An example alignment between three MS/MS spectra of compounds with structural modifications that are captured by modification-tolerant spectral matching used in variable dereplication and molecular networking. (c) In-browser molecular network visualization enables users to interactively explore molecular networks without requiring any external software. To date, >11,000 molecular networks have been analyzed using this feature. Within this interface, (i) users are able to define cohorts of input data, and nodes within the network are correspondingly represented as pie charts to visualize spectral count differences for each molecule across cohorts. (ii) Node labels indicate matches made to GNPS spectral libraries, with additional information displayed on mouseover. These matches provide users with a starting point to annotate unidentified MS/MS spectra within the network. (iii) To facilitate identification of unknowns, users can display MS/MS spectra in the right panels by clicking on the nodes in the network, giving direct interactive access to the underlying MS/MS peak data. Furthermore, alignments between spectra are shown in the top-right and bottom-right panels, giving insight into which underlying characteristics of the molecule could cause the observed fragmentation differences.
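To make the scoring idea in (a) concrete, here is a minimal sketch of a modification-tolerant cosine score. It is a simplified illustration, not the GNPS implementation: the function name, the greedy one-to-one peak matching, and the use of raw (not transformed) intensities are all assumptions made for this example. Two peaks are allowed to match either directly or after shifting by the precursor mass difference.

```python
import math


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tolerance=0.02):
    """Return (score, matched_peak_count) for two MS/MS spectra.

    peaks_a / peaks_b are lists of (m/z, intensity) tuples.
    Hypothetical helper for illustration only.
    """
    def unit_weights(peaks):
        # Scale intensities so each spectrum has Euclidean length 1.
        norm = math.sqrt(sum(inten * inten for _, inten in peaks))
        if norm == 0:
            return []
        return [(mz, inten / norm) for mz, inten in peaks]

    a = unit_weights(peaks_a)
    b = unit_weights(peaks_b)
    shift = precursor_a - precursor_b

    # Collect every peak pair that matches directly or after the mass shift.
    candidates = []
    for i, (mz_a, w_a) in enumerate(a):
        for j, (mz_b, w_b) in enumerate(b):
            diff = mz_a - mz_b
            if abs(diff) <= tolerance or abs(diff - shift) <= tolerance:
                candidates.append((w_a * w_b, i, j))

    # Greedy assignment: best products first, each peak used at most once.
    candidates.sort(key=lambda item: item[0], reverse=True)
    used_a, used_b = set(), set()
    score, matched = 0.0, 0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        score += product
        matched += 1
    return score, matched
```

Because each spectrum is scaled to unit length and each peak is used at most once, the score stays between 0 and 1. The matched peak count returned alongside it corresponds to the kind of quantity that a minimum matched fragment ion setting would be compared against.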
All mass spectrometry data files must be in a compatible, universal format (mzXML, mzML or mgf). For a basic overview of data conversion, click here.
For now, mzXML files must be in 32-bit uncompressed format.
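A quick local check before uploading can catch files that do not meet this requirement. The sketch below assumes that each `peaks` element in the mzXML file carries a `precision` attribute and, in newer schema versions, a `compressionType` attribute; a missing `compressionType` is treated here as uncompressed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_mzxml_peaks(source):
    """Return a list of problems found in <peaks> elements (empty if none).

    `source` is a file path or a binary file object.
    """
    problems = []
    count = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace so the check works across schema versions.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "peaks":
            count += 1
            precision = elem.get("precision")
            compression = elem.get("compressionType", "none")
            if precision != "32":
                problems.append(f"peaks element {count}: precision={precision}")
            if compression != "none":
                problems.append(f"peaks element {count}: compressionType={compression}")
        # Free memory as we go; large mzXML files hold many scans.
        elem.clear()
    return problems
```

An empty result suggests the file uses 32-bit uncompressed peak lists; any entries point to the scans that would need to be re-converted.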
From the main GNPS page, select the Data Analysis portal.
Provide a detailed title for your molecular network. This title will be helpful when you retrieve your data after the workflow is completed.
Before you can select your data files within the molecular networking workflow, they must first be uploaded. Data files smaller than 20 MB can be uploaded directly to GNPS. Data files larger than 20 MB must first be uploaded to ProteoSAFe using an FTP client. Uploading through ProteoSAFe is the preferred method, because it also stores your data files for future use in other molecular networks.
Register and login at http://proteomics.ucsd.edu/ProteoSAFe/. The same login and password can be used for both GNPS and ProteoSAFe.
Open an FTP client such as FileZilla and enter the following connection parameters: Host: ccms-ftp01.ucsd.edu, together with your ProteoSAFe user name and password. Then press Enter to connect.
Choose the location of your files within the local site drop down menu. Highlight the files or folders to upload and select upload by right clicking. You will then see the files being queued and transferred to ProteoSAFe.
Once all the files are uploaded using this method they will be available for you to use for your molecular networking workflows within GNPS.
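The same upload can be scripted instead of using a graphical client. The following is a minimal sketch using Python's standard `ftplib`; the host name comes from the steps above, while the function names and the assumption that the server accepts a plain FTP login with your ProteoSAFe credentials are illustrative and should be verified before relying on it.

```python
from ftplib import FTP
from pathlib import Path

HOST = "ccms-ftp01.ucsd.edu"  # host given in the upload instructions


def upload_files(ftp, paths):
    """Upload local files in binary mode over an already logged-in connection."""
    uploaded = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            # STOR writes the file into the current remote directory.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_proteosafe(username, password, paths, host=HOST):
    """Connect, log in, and upload (assumes plain FTP login is accepted)."""
    with FTP(host) as ftp:
        ftp.login(user=username, passwd=password)
        return upload_files(ftp, paths)
```

A call such as `upload_to_proteosafe("my_user", "my_password", ["1.mzXML", "2.mzXML"])` would then place both files in your ProteoSAFe user space, where the user name and password shown are placeholders.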
Click any of the Select Input Files buttons adjacent to the Spectrum Files fields and then choose the Upload Files tab from the pop-up window.
At this point, your data files can be selected by location within your file system or dragged and dropped into the pop-up window for uploading.
Under the Workflow Selection header, give your workflow a detailed name.
Under the Basic Options header, click Select Input Files next to the Spectrum Files (required) field to choose your spectrum files. A pop-up window with three tabs will appear: Select Input Files, Upload Files, and Share Files. If you previously uploaded your data files to ProteoSAFe, choose the Select Input Files tab. If you need to upload your data via the web, choose the Upload Files tab. Select the files you want to designate as spectrum input files and then choose the correct tab, such as Spectrum Files G1, Spectrum Files G2, etc. If you are building a detailed network with more than six groups, select the files you want to use for group mapping and choose the Group mapping tab, then select the files to use for attribute mapping and choose the Attribute mapping tab. For more information on organizing your spectrum files, refer to the section below entitled Organizing your Spectrum Files. Your selected files will then be listed under the corresponding folder; if they are correct, select Submit. For more information regarding advanced networking parameters, refer to the section below entitled Molecular Networking Parameters.
By default, files can be categorized into separate groups (G1, G2, etc.). For example, case and control or two different microbes can be separate groups. Using the basic options, only six groups can be created. Individual files or entire folders can be selected.
In addition to building a molecular network from the input data, users can choose to simultaneously search the input data against annotated reference spectra in a spectral library. To select a spectral library, select the files you want to designate as spectral library input files and choose the Library Files tab. Library files should only be libraries you have created or files that appear in the Selected Library Files folder.
Click Finish Selection, which will close the pop-up window.
|Precursor Ion Mass Tolerance (PIMT)||Parameter used for MS-Cluster and spectral library search. Specifies the precursor ion mass tolerance, in Daltons. This value influences the clustering of nearly identical MS/MS spectra via MS-Cluster. Note that the value of this parameter should be consistent with the capabilities of the mass spectrometer and the specific instrument method used to generate the MS/MS data.||0.02|
|Fragment Ion Mass Tolerance (FIMT)||Parameter used for MS-Cluster, molecular networking, and MS/MS spectral library searches. For every group of MS/MS spectra being considered for clustering (consensus spectrum creation), this value specifies how far fragment ions can be shifted from their expected m/z values. Default value is ± 0.02 Da for high-resolution instruments (q-TOF, q-Orbitrap) and ± 2.0 Da for low-resolution instruments (ion traps, QqQ).||0.02|
The table below shows the ppm equivalent of each Da tolerance at several m/z values. Refer to it when selecting or adjusting the Precursor Ion Mass Tolerance (PIMT) or Fragment Ion Mass Tolerance (FIMT) based on your MS experimental conditions and the targeted molecules.
|m/z||2.0 Da||0.5 Da||0.1 Da||0.05 Da||0.03 Da||0.025 Da||0.02 Da||0.0175 Da||0.015 Da||0.01 Da||0.0075 Da|
|m/z 200||10000 ppm||2500 ppm||500 ppm||250 ppm||150 ppm||125 ppm||100 ppm||87.5 ppm||75 ppm||50 ppm||37.5 ppm|
|m/z 500||4000 ppm||1000 ppm||200 ppm||100 ppm||60 ppm||50 ppm||40 ppm||35 ppm||30 ppm||20 ppm||15 ppm|
|m/z 1000||2000 ppm||500 ppm||100 ppm||50 ppm||30 ppm||25 ppm||20 ppm||17.5 ppm||15 ppm||10 ppm||7.5 ppm|
|m/z 1500||1333 ppm||333 ppm||66.7 ppm||33.3 ppm||20 ppm||16.7 ppm||13.3 ppm||11.7 ppm||10 ppm||6.7 ppm||5.0 ppm|
|m/z 2000||1000 ppm||250 ppm||50 ppm||25 ppm||15 ppm||12.5 ppm||10 ppm||8.75 ppm||7.5 ppm||5.0 ppm||3.75 ppm|
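For m/z values or tolerances that are not in the table, the conversion is a one-line calculation: the ppm value is the Da tolerance divided by the m/z, multiplied by one million. The helper names below are hypothetical and only illustrate that arithmetic.

```python
def da_to_ppm(tolerance_da, mz):
    """Convert an absolute tolerance in Da to ppm at a given m/z."""
    return tolerance_da / mz * 1e6


def ppm_to_da(tolerance_ppm, mz):
    """Convert a relative tolerance in ppm to Da at a given m/z."""
    return tolerance_ppm * mz / 1e6


if __name__ == "__main__":
    # A fixed 0.02 Da tolerance is much looser, in ppm terms, at low m/z.
    for mz in (200, 500, 1000, 1500, 2000):
        print(f"m/z {mz}: {da_to_ppm(0.02, mz):.1f} ppm")
```

This also shows why a single Da setting has to be chosen with the m/z range of the targeted molecules in mind: the same absolute tolerance corresponds to a relative tolerance ten times larger at m/z 200 than at m/z 2000.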
|Min Pairs Cos||Minimum cosine score that must occur between a pair of consensus MS/MS spectra in order for an edge to be formed in the molecular network.||0.7||A lower value will increase the size of the clusters by inducing the clustering of less related MS/MS spectra; a higher value will do the opposite.|
|Minimum Matched Fragment Ion (MMFI)||Parameter used for molecular networking. The minimum number of common fragment ions that two separate consensus MS/MS spectra must share in order to be connected by an edge in the molecular network.||6||A low value will permit linkages between spectra of molecules with few similar fragment ions, but it will result in many more less-related spectra being connected to the network. A higher value will do the opposite. Default value is 6, but note that this parameter should be adjusted depending on the experimental conditions for mass spectra acquisition (such as the mode of ionisation, the fragmentation conditions, and the mobile phase) and the collision-induced fragmentation behavior of the molecules of interest within the samples. High molecular weight (MW) compounds, and compounds with more hetero-atoms, will generally tend to produce more fragment ions. However, this is not a universal rule. For example, some lipids with high MW generate only a few fragment ions.|
|Node TopK||Maximum number of neighbor nodes for one single node.||10||The edge between two nodes is kept only if both nodes are within each other's TopK most similar nodes. For example, if this value is set to 20, then a single node may be connected to up to 20 other nodes. Keeping this value low makes very large networks (many nodes) much easier to visualize.|
|Minimum Cluster Size||Minimum number of MS/MS spectra in a consensus MS/MS spectrum for it to be considered for molecular networking.||2 (>1 is preferred)||Requires MS-Cluster to be on.|
|Run MSCluster||Cluster MS/MS spectra before networking.||Yes (checked)||MSCluster will analyze all MS/MS spectra resulting from ions that fall within the defined precursor ion mass tolerance (PIMT), and will merge nearly identical MS/MS spectra (above the cosine score) into a single consensus MS/MS spectrum. Each consensus MS/MS spectrum usually consists of multiple MS/MS spectra from across multiple LC-MS runs (or data files). For more details, see: Frank, A. M. et al. Spectral Archives: Extending Spectral Libraries to Analyze Both Identified and Unidentified Spectra. Nat Meth 2011, 8 (7), 587-591.|
|Maximum Number of Node in one Network||Maximum number of nodes allowed in a single connected network.||Networks that exceed this size are broken up by increasing the cosine threshold for that specific connected molecular network. Default value is 100. Use 0 to allow an unlimited number of nodes in a single network. Note that with large datasets, or when a great number of related molecules are in the dataset, this value should be higher (or set to 0) to retain as much information as possible. Downstream, these larger networks can be visualized using Cytoscape layout algorithms that increase intra-network clustering, making spectral groups visible despite the number of nodes in the network.|
|Group Mapping||Input text file organizing input files into groups.||Used as a more flexible alternative to assigning groups during data input selection. Discussed in Advanced Features.|
|Attribute Mapping||Attribute mapping eases visualization of different groups within Cytoscape.||Input text file organizing groups into attributes. These attributes are columns in the output. Discussed in Advanced Features.|
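The way Min Pairs Cos, Minimum Matched Fragment Ion, and Node TopK interact can be sketched in a few lines. This is an illustration of the rules as described in the table, not the GNPS code: the function name and the input layout (one tuple per scored pair of consensus spectra) are assumptions made for the example.

```python
def filter_edges(scored_pairs, min_cosine=0.7, min_matched=6, top_k=10):
    """Keep edges that pass the score thresholds and the mutual TopK rule.

    scored_pairs: iterable of (node_a, node_b, cosine, matched_peaks).
    Returns a list of (node_a, node_b, cosine).
    """
    # Step 1: apply the minimum cosine and minimum matched fragment ion cutoffs.
    passing = [
        (a, b, cosine)
        for a, b, cosine, matched in scored_pairs
        if cosine >= min_cosine and matched >= min_matched
    ]

    # Step 2: for every node, find its top_k most similar neighbors.
    neighbors = {}
    for a, b, cosine in passing:
        neighbors.setdefault(a, []).append((cosine, b))
        neighbors.setdefault(b, []).append((cosine, a))
    top = {
        node: {other for _, other in
               sorted(found, key=lambda item: item[0], reverse=True)[:top_k]}
        for node, found in neighbors.items()
    }

    # Step 3: keep an edge only if each node is in the other's top_k set.
    return [(a, b, cosine) for a, b, cosine in passing
            if b in top[a] and a in top[b]]
```

With `top_k=2`, a node that has three qualifying neighbors keeps only its two strongest edges, which is why a low TopK makes large networks easier to read.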
|Library Search Min Matched Peaks||Minimum number of common fragment ions that MS/MS spectra should contain in order to be considered for spectral library annotation. Default value is 6, but note that this parameter should be tuned depending on the molecule of interest and the experimental conditions (such as the ionisation mode and the fragmentation conditions). For example, collision-induced fragmentation of some lipids produces only a few fragment ions. A lower value will allow clustering of MS/MS spectra containing fewer fragment ions; however, it will also cause MS/MS spectra from different molecule types to be connected in one network. A higher value will do the opposite.||6|
|Score Threshold||Minimum cosine score that MS/MS spectra should get in spectral matching with MS/MS spectral libraries in order to be considered an annotation.||0.7|
|Filter stdDev Intensity||Applied before MS-Cluster. For each MS/MS spectrum, the 25% least intense fragment ions are collected, and their mean and standard deviation are calculated. A minimum peak intensity is then calculated as mean + k * std-dev, where k is user-selectable. All peaks below this threshold are deleted. Use of this option is not encouraged.||0||By default, no filter (value is set to 0).|
|Minimum Fragment Ion Intensity||All fragment ions in the MS/MS spectrum below this raw intensity will be deleted. By default, no filter.||0||Reduce to 0 if your data's raw intensities are very low.|
|Filter Precursor Ion Window||All peaks within +/- 17 Da of the precursor ion mass are deleted. This removes the residual precursor ion, which is frequently observed in MS/MS spectra acquired on qTOFs. Enabled by default.||Yes|
|Filter Library||Apply peak filters to library||Yes|
|Filter peaks in 50Da Window||Filter out peaks that are not among the top 6 most intense peaks in a +/- 50 Da window.||Yes||Turn off if your data consists of very small molecules, as it might filter out many peaks in the lower mass ranges that might be signal.|
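The peak filters described above can be sketched as small functions over a list of `(m/z, intensity)` tuples. This is a hedged illustration rather than the GNPS implementation: the function names are hypothetical, the standard deviation is computed as a population standard deviation, and the 50 Da window is assumed to be centered on each peak in turn.

```python
import statistics


def remove_precursor_window(peaks, precursor_mz, window_da=17.0):
    """Delete peaks within +/- window_da of the precursor m/z."""
    return [(mz, inten) for mz, inten in peaks
            if abs(mz - precursor_mz) > window_da]


def remove_low_intensity(peaks, min_intensity=0.0):
    """Delete peaks below a raw intensity cutoff (0 keeps everything)."""
    return [(mz, inten) for mz, inten in peaks if inten >= min_intensity]


def remove_noise(peaks, k=0.0):
    """Delete peaks below mean + k * std-dev of the 25% least intense peaks.

    k <= 0 is treated as "filter inactive", matching the default of 0.
    """
    if k <= 0 or not peaks:
        return list(peaks)
    intensities = sorted(inten for _, inten in peaks)
    lowest = intensities[: max(1, len(intensities) // 4)]
    threshold = statistics.mean(lowest) + k * statistics.pstdev(lowest)
    return [(mz, inten) for mz, inten in peaks if inten >= threshold]


def keep_top_peaks_per_window(peaks, window_da=50.0, top_n=6):
    """Keep a peak only if fewer than top_n peaks within +/- window_da are stronger."""
    kept = []
    for mz, inten in peaks:
        stronger = sum(
            1 for other_mz, other_inten in peaks
            if abs(other_mz - mz) <= window_da and other_inten > inten
        )
        if stronger < top_n:
            kept.append((mz, inten))
    return kept
```

Running the window filter on a spectrum of small molecules shows the caveat from the table: when many real fragment ions crowd into one 50 Da window, only the six most intense survive.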
Click Submit to submit your workflow. You will be emailed when the workflow is complete.
To download your data, either click the link in the email or click the DONE link under Status on the Jobs menu page from GNPS.
Within the Status section, under Legacy Views, select View All Clusters with IDs.
From the new page that opens, select the Download tab and then choose Download. The Tab-Delimited Result Only and All fields should be selected automatically.
This will download your data as a zipped folder.
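If you want to inspect the download programmatically, the tab-delimited results can be read straight from the archive. The sketch below makes no assumption about the file names or column headers inside the zipped folder; it simply reads every member as tab-delimited text, and the function name is hypothetical.

```python
import csv
import io
import zipfile


def read_tab_delimited_members(zip_path):
    """Return {member name: list of row dicts} for every file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8",
                                        errors="replace", newline="")
                # The first line of each member is taken as the header row.
                tables[name] = list(csv.DictReader(text, delimiter="\t"))
    return tables
```

Each row comes back as a dictionary keyed by whatever column headers the result file actually contains, so the headers can be checked before any further analysis.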
To learn how to create basic networks in Cytoscape see here
To learn how to perform analysis in GNPS see here.
To load the same settings and files into the workflow, choose the Clone function. This allows you to make iterative changes to the network settings. Remember to adjust the title of the workflow to reflect these changes.
As an alternative to assigning groups when selecting data input files within the GNPS workflow, a group mapping file and an attribute mapping file can be created. These files allow more flexibility for data analysis and visualization, but require a specific format. Users must create these files themselves using a text editor (e.g. Notepad++ for Windows, gedit for Linux, TextWrangler for Mac OS) and save them with the .group file extension (the extension itself is not significant, but the file must be plain text). Note: All .group files can be created, edited, saved and re-opened using a text editor.
Attribute and GROUP mapping can greatly ease the visualization and analysis of data within Cytoscape. The user must first group their files using a group mapping file. The GROUPS are then organized into attributes in an attribute mapping file. Attributes represent a higher level of organization than GROUPS. A GROUP may be organized into more than one Attribute.
Examples of Attributes (Source, Subject_ID, Location) and their corresponding GROUPS
Examples of corresponding GROUPS
Documentation for formatting can be found here.
You can download an example from the link below and edit it as needed. Finally, upload the file just as you would upload data input files, and select it in the Groupings file upload selection.
Each group in the group mapping must be prefixed with GROUP_. For example:
GROUP_microbe1 signifies that the 'microbe1' group contains the files 1.mzXML and 2.mzXML and the 'microbe2' group contains 3.mzXML and 4.mzXML. Data files can be assigned to more than one group. For a group mapping file template, click here.
If you use the group mapping feature, all files can be selected for the G1 group in the basic workflow and the group mapping file is uploaded and selected in the same manner as selecting spectrum files.
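For many files, the group mapping file can be generated with a short script rather than typed by hand. The sketch below uses the `GROUP_` prefix required above; the exact line syntax (`=` after the group name and `;` between file names) is an assumption here and should be checked against the group mapping template before use. The function name is hypothetical.

```python
def format_group_mapping(groups):
    """Build group mapping text from {group name: [file names]}.

    Assumed line format: GROUP_<name>=<file1>;<file2>
    """
    lines = []
    for name, files in groups.items():
        lines.append(f"GROUP_{name}=" + ";".join(files))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    mapping = format_group_mapping({
        "microbe1": ["1.mzXML", "2.mzXML"],
        "microbe2": ["3.mzXML", "4.mzXML"],
    })
    # Save as plain text; the .group extension follows the convention above.
    with open("example.group", "w", encoding="utf-8") as handle:
        handle.write(mapping)
```

Because a data file may belong to more than one group, the same file name can simply be listed under several group names in the input dictionary.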