Added by Mingxun Wang, last edited by Mingxun Wang on May 04, 2015

Public MassIVE datasets

The public MassIVE datasets are complete datasets published at massive.ucsd.edu, optionally alongside paper publications. The MassIVE repository provides a location for researchers to access datasets that have been made available by others. However, the MassIVE repository will not merely function as a file server and a data graveyard.

These datasets remain alive long after publication. At GNPS, users will be able to browse datasets, download datasets, and comment on datasets. These comments can be accompanied by new data or new analysis that enriches the MassIVE dataset. Additionally, users can subscribe to specific datasets: while the underlying data might not change, the understanding of the data will. To continuously learn more about each dataset, each is searched against the ever-growing public reference annotated spectral libraries, and new identifications are reported to subscribers. Beyond new identifications within a dataset, subscribers will also be made aware of other datasets that exhibit similarities to the subscribed dataset. This allows users to be connected via their interest in similar datasets.

Browsing MassIVE Datasets

Public MassIVE datasets specific to GNPS can be found here. By default, the list is filtered to GNPS datasets via the Title column. To view all MassIVE datasets and not just GNPS datasets, users can remove the GNPS filter under Title and hit the Filter button.

Interacting with MassIVE Data

Download MassIVE Data

These MassIVE datasets are available to download in their raw form by clicking on the MassIVE ID:

and the FTP link:

Dataset Information

Upon clicking on the dataset link, users are brought to a dataset page with information such as:

Users can see complete information about the particular dataset.

Computing on MassIVE Data

Beyond downloading the data, users can also leverage the tools available at GNPS and compute on the converted mzXML versions of the MassIVE data.

To import a MassIVE dataset to compute on, simply click the Import Dataset to Analyze button. To go directly to molecular networking with the dataset imported, users can click Import and Analyze Dataset with Networking Now.

NOTE: MassIVE datasets may not be available to compute on for up to 48 hours after submission, as background conversion and processing must occur first.
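
For anyone scripting around newly submitted datasets, the 48-hour window in the note above can be expressed as a simple time check. This is only a minimal sketch: the function names, and the idea of tracking the submission timestamp yourself, are assumptions made for illustration and are not part of any GNPS or MassIVE API.

```python
from datetime import datetime, timedelta

# Upper bound on background conversion time, taken from the note above.
PROCESSING_WINDOW = timedelta(hours=48)


def ready_by(submitted_at: datetime) -> datetime:
    """Latest time by which the dataset should be available to compute on."""
    return submitted_at + PROCESSING_WINDOW


def may_still_be_processing(submitted_at: datetime, now: datetime) -> bool:
    """True while the 48-hour window after submission has not yet elapsed."""
    return now < ready_by(submitted_at)
```

A dataset may well become available sooner; the check only says when it is safe to stop waiting.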

Dataset Comments

Viewing Comments

In this view, all comments are displayed in a table, and users can click View to see a particular comment's attachments:

Making Comments

To contribute comments, users can click on the Comment on Dataset link:

Metadata/Publication Updates

To add metadata to a dataset, click on Update/Add Metadata. This will redirect users to a massive.ucsd.edu page that allows them to update the appropriate metadata. Additionally, to associate publications with the dataset, users can click the “Add Publication” link.

Dataset Subscriptions

Beyond individual users being able to compute on MassIVE data, GNPS periodically computes on all MassIVE data. Thus the information associated with each MassIVE dataset is constantly changing. Users can subscribe to be notified of changes in the information known about each dataset. A subscription signs the user up to receive continuous identification digest emails regarding changes in identifications on that dataset. To subscribe or unsubscribe, users can simply toggle the Subscribe/Unsubscribe button.

Browsing Continuous Identification Results

With GNPS continually computing and identifying new spectra in MassIVE datasets, these results need to be presented to users in an easily viewable format. For each dataset, all previously run continuous identification jobs are listed, and users can view the results (including, at a glance, how many identifications were made on a specific day):

Users can click the “View” link for a continuous ID job on a dataset to reach a status page. From there, clicking the “All Identifications (Beta)” link shows all identifications. Some features of the beta view are still experimental but will soon be moved to the “All Identifications” link. The organization of the identification results is described in the Dereplication Documentation, as it is very similar to the results of the dereplication workflow. There is, however, one key feature that is present in continuous identification: identification ratings. Users can browse the results and rate the accuracy of each identification. The scale is as follows:

Rating      Description
4 stars     Correct match, as the context is right (i.e., molecule is known/expected to be in the sample)
3 stars     Compound class match: at least part of the structure makes sense to match
2 stars     Cannot tell: might be correct from the spectrum match and context, but there is not enough information to tell
1 star      Incorrect: molecule does not make sense in this context
No stars    No rating

Users can add their own rating, add a comment explaining that rating, and view the average rating of the identification.
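
To make the rating scale above concrete, the sketch below encodes it as a lookup table and computes an average rating. The source does not say how GNPS computes the average, so treating "no stars" as unrated and excluding it from the average is an assumption, as are the names `RATING_SCALE` and `average_rating`.

```python
from statistics import mean
from typing import Optional, Sequence

# Star values and their meanings, following the rating scale above.
RATING_SCALE = {
    4: "correct match as context is right",
    3: "compound class match",
    2: "cannot tell",
    1: "incorrect",
    0: "no rating",
}


def average_rating(stars: Sequence[int]) -> Optional[float]:
    """Average the star ratings, skipping unrated (0-star) entries.

    Skipping 0-star entries is an assumption, not documented GNPS behavior.
    Returns None when nobody has rated the identification yet.
    """
    for s in stars:
        if s not in RATING_SCALE:
            raise ValueError(f"unknown rating: {s}")
    rated = [s for s in stars if s != 0]
    return mean(rated) if rated else None
```

Under this assumption, three users rating an identification 4, 3, and 2 stars, plus one who left it unrated, yield an average of 3.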

Continuous ID Subscription Email

This example continuous identification email informs subscribers that the dataset of interest has more IDs in the most recent round of continuous identifications. It first lists the title of the dataset, then the changes in identification counts, and finally direct links to explore the data. Users can go directly to the search results to view the new, different, and deleted identifications, or go straight to the dataset page itself.
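
The new, different, and deleted categories mentioned above can be read as a comparison between two rounds of identifications. The sketch below is one way to picture that comparison. It assumes each round is a mapping from a spectrum identifier to a compound name; the keys, the sample data, and the function `diff_identifications` are hypothetical and do not reflect GNPS's actual implementation.

```python
from typing import Dict, List


def diff_identifications(
    previous: Dict[str, str], current: Dict[str, str]
) -> Dict[str, List[str]]:
    """Classify spectra by how their identification changed between rounds."""
    # Identified now, but not in the previous round.
    new = sorted(k for k in current if k not in previous)
    # Identified in the previous round, but no longer.
    deleted = sorted(k for k in previous if k not in current)
    # Identified in both rounds, but as a different compound.
    different = sorted(
        k for k in current if k in previous and previous[k] != current[k]
    )
    return {"new": new, "different": different, "deleted": deleted}


# Hypothetical rounds of identifications for illustration only.
previous_round = {"scan_1": "compound A", "scan_2": "compound B", "scan_3": "compound C"}
current_round = {"scan_1": "compound A", "scan_2": "compound D", "scan_4": "compound E"}
changes = diff_identifications(previous_round, current_round)
# changes == {"new": ["scan_4"], "different": ["scan_2"], "deleted": ["scan_3"]}
```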

Related Datasets

Users can find MassIVE datasets related to the current one. Currently, the relatedness of two datasets is determined by the number of identified compounds they share. Users can see a view like this:
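
The relatedness measure described above, counting identified compounds that two datasets have in common, amounts to a set intersection. The following is a minimal sketch under that reading. The function names and the dataset labels are hypothetical, and both dropping datasets with no shared compounds and breaking ties by name are assumptions rather than documented GNPS behavior.

```python
from typing import Dict, List, Set, Tuple


def shared_compound_count(a: Set[str], b: Set[str]) -> int:
    """Number of identified compounds present in both datasets."""
    return len(a & b)


def rank_related(
    target: Set[str], others: Dict[str, Set[str]]
) -> List[Tuple[str, int]]:
    """Rank other datasets by compounds shared with the target, most first."""
    scored = [
        (name, shared_compound_count(target, compounds))
        for name, compounds in others.items()
    ]
    # Assumption: datasets sharing no compounds are not considered related.
    related = [(name, count) for name, count in scored if count > 0]
    # Sort by descending count, then by name for a stable order.
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))
```

For example, a target dataset with compounds {A, B, C} compared against one dataset containing {A, B} and another containing only {C} would rank the first above the second, with 2 and 1 shared compounds respectively.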

Contributing New MassIVE Datasets

Please submit new MassIVE datasets here.