SPS docs - analysis

Analysis of MS/MS data with the SPS / Spectral Networks package

The SPS analysis of a dataset starts by executing "main_specnets sps.params" from the command line.

Example command lines, using "<sps_dir>" to denote the path to SPS/SpecNets binaries:

  • Run main_specnets on the current node: "<sps_dir>/bin/main_specnets sps.params"
  • Run main_specnets on an SGE compute node: "qsub -l h_vmem=1G <sps_dir>/bin/main_specnets sps.params -g"

Parameters

Parameter | Value | Description
-g | (none) | Run on SGE
-ll | Integer (0-9) | Log level (9 for less information)
-lf | File name | Log file name
-i | Stage name | Initial stage
Examples

# Run project logging only errors, using parameters file 'sps.params'
~/sps/main_specnets -ll 9 -lf log.txt sps.params

# Run project on sge grid logging only errors, using parameters file 'sps34.params'
~/sps/main_specnets -ll 9 -lf log.txt sps34.params -g

# Run project logging errors and warnings, using parameters file 'sps.params'
~/sps/main_specnets -ll 5 -lf log.txt sps.params -s
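
The invocations above can also be assembled from a script. The following is a minimal Python sketch, not part of SPS: `build_command` and its argument names are hypothetical, only the flags listed in the Parameters table are used, and placing `-i` before the parameters file is an assumption (the examples only show `-ll` and `-lf` before the file name and `-g` after it).

```python
from typing import List, Optional


def build_command(binary: str,
                  params_file: str,
                  log_level: Optional[int] = None,
                  log_file: Optional[str] = None,
                  initial_stage: Optional[str] = None,
                  use_sge: bool = False) -> List[str]:
    """Assemble a main_specnets argument list (hypothetical helper)."""
    # The Parameters table documents -ll as an integer from 0 to 9.
    if log_level is not None and not 0 <= log_level <= 9:
        raise ValueError("log level must be an integer between 0 and 9")

    cmd = [binary]
    if log_level is not None:
        cmd += ["-ll", str(log_level)]
    if log_file:
        cmd += ["-lf", log_file]
    if initial_stage:
        # Assumption: -i is accepted in the same position as -ll and -lf.
        cmd += ["-i", initial_stage]
    cmd.append(params_file)
    if use_sge:
        # The examples place -g after the parameters file name.
        cmd.append("-g")
    return cmd
```

The returned list can be handed to `subprocess.run`. A leading `~` in the binary path would first need `os.path.expanduser`, since no shell is involved to expand it.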

A more detailed description of how to run and visualize a project on both ccms and usa can be found [here].

Parameter files

All parameter values, including the name of the file(s) containing the MS/MS spectra, are specified in the parameters file sps.params. Of course, you can choose any file name for the parameters file and multiple parameters files can coexist in the same directory.
The parameters file is a text file where comment lines start with '#', empty lines are ignored and parameters are specified using the format PARAMETER_NAME=PARAMETER_VALUE. The valid parameter names and ranges of values are given below.
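
The file format described above is simple enough to read with a few lines of code. The following Python sketch is an illustration only, not the parser used by SPS: `parse_params` is a hypothetical name, and trimming whitespace around names and values, as well as letting a later definition override an earlier one, are assumptions.

```python
def parse_params(text: str) -> dict:
    """Parse PARAMETER_NAME=PARAMETER_VALUE lines into a dictionary."""
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Empty lines are ignored and comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"not a NAME=VALUE line: {line!r}")
        # Split on the first '=' only, so values such as URLs or
        # "-l h_vmem=1G" keep any further '=' characters.
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params
```

All values are kept as strings; converting them to numbers is left to the caller, because the valid types differ per parameter.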

Main parameters (required)

Parameter name | Valid values | Description
INPUT_SPECS_MS | Any valid file name | Names of the files containing the MS/MS spectra. Valid file formats are MGF, mzXML, ms2 and multi-spectra pkl. Multiple file names should be separated by ';'.
FASTA_DATABASE | Any valid file name | Database of protein sequences in FASTA format.
EXE_DIR | Any valid path | The directory containing the SPS / Spectral Networks binaries and configuration files, e.g. <install directory>/bin. Paths beginning with the tilde character (~) should be avoided, as gnuplot requires paths without ~ for specifying font file locations.
AMINO_ACID_MASSES | Any valid amino acid masses file | Used to select amino acid masses by fixed cysteine blocking group: no blocking (set to AA_standard.txt), blocked with IAA (set to AA_cys_iaa.txt) or blocked with NIPIA (set to AA_cys_nipia.txt).
CPUS | Any integer greater than 0 | Number of CPUs available to the SPS main process for parallel processing (not to be confused with GRID_NUMNODES, which specifies the number of nodes for distributed processing). Defaults to 1. Currently used by the reports section to generate static HTML report pages.

enable interactive HTML reports allowing for user visualization and annotation of contigs and spectra.

Parameter name | Valid values | Description
GRID_EXE_DIR | Any valid path | Path to SPS/SpecNets binaries on SGE compute nodes. Default value is "" (empty).
GRID_NUMNODES | Any integer >= 0 | Number of SGE jobs to launch per SPS job. Default value is zero (no SGE grid node available).
GRID_PARAMS | String | Parameters to be passed directly to SGE. Default is "-l h_vmem=1G", which sets a memory quota of 1 gigabyte per SPS/SpecNets SGE job.
GRID_SGE_EXE_DIR | Any valid path | Path to SGE binaries (e.g., qsub) on SGE compute nodes.
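
As noted for INPUT_SPECS_MS, multiple spectrum files are given as one ';'-separated value. The sketch below shows one way to split such a value and flag files whose extension does not look like one of the listed formats. The function names are hypothetical, and mapping the formats to the extensions .mgf, .mzxml, .ms2 and .pkl is an assumption; SPS may identify formats differently.

```python
import os

# Assumed extensions for the formats named in the table (MGF, mzXML, ms2, pkl).
KNOWN_EXTENSIONS = {".mgf", ".mzxml", ".ms2", ".pkl"}


def split_input_specs(value: str) -> list:
    """Split an INPUT_SPECS_MS value into individual file names."""
    names = [part.strip() for part in value.split(";")]
    # A trailing ';' yields an empty entry, which is dropped here.
    return [name for name in names if name]


def unknown_formats(names: list) -> list:
    """Return the file names whose extension is not in KNOWN_EXTENSIONS."""
    return [name for name in names
            if os.path.splitext(name)[1].lower() not in KNOWN_EXTENSIONS]
```

For example, `split_input_specs("run1.mgf;run2.mzXML;")` gives the two file names without an empty third entry.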

Report related parameters (optional)

Parameter name | Valid values | Default | Description
REPORT_JOB | String | (none) | Job description. Used to fill the report header information.
REPORT_USER | String | (none) | User name. Used to fill the report header information.
REPORT_DIR | Any absolute path | report | Output directory for report files (it will be created if it does not exist).
REPORT_DYNAMIC | 0/1 | 1 | Specifies whether dynamic or static reports are used. 0 means static reports are generated.
REPORT_SERVER | URL | (none) | Specifies the CGI directory location on the server. Ex: REPORT_SERVER=http://myserver.com/cgi-bin/. Mandatory if REPORT_DYNAMIC=1.
REPORT_CELLS_PER_LINE | Value > 0 | 20 | Specifies the number of amino acids displayed per line in report coverage pages.
REPORT_MSMS_IMAGES | 0/1 | 1 | Include MS/MS images in reports. 1 means MS/MS images are generated. Ignored by dynamic reports.
PROJECT_DIR | Any valid path | . | Location of the project data files in the file system when generating the report.
REPORT_DIR_SERVER | Any valid path | PROJECT_DIR | Location of the project data files on the report server. Used to specify the directory where the project is after relocation. Paths starting with ~ should be avoided, as they may default to the Apache process' home directory.
REPORT_PWD | Any string | (none) | Report password.

Optional parameters

Parameter name | Valid values | Default | Description
RELATIVE_DIR | 0/1 | 1 | Specifies whether paths should be treated as relative. If set to 0, the absolute path of the project directory is calculated and prepended to all relative paths to files used and generated internally. This is useful in environments where absolute paths are needed.
PEPNOVO_PTMS | String | Empty string | Specifies known PTMs that should be considered when generating scored PRM spectra from MS/MS spectra. C+57 and M+16 are always considered even if this parameter is not specified. PTMs do not need to be specified here to be reported in the resulting analysis, since SPS is fully configured to handle unknown PTMs for both de novo sequencing and database matching of de novo sequences. CAUTION: Specifying multiple PTMs beyond the most abundant can significantly degrade the quality of PRM spectra (and the resulting analysis); it is not recommended to specify more than six. PTM identifiers must be separated by ":". See here for a list of all current PTM identifiers and how to specify new ones.
TOLERANCE_PEAK | 0.0 - 0.4 | 0.4 | Peak mass tolerance (in Daltons).
TOLERANCE_PM | 0.0 - 3.0 | 1.5 | Parent mass tolerance (in Daltons).
TOLERANCE_PM_PPM | Any number > 0 | None if not specified | Parent mass ppm tolerance. This is only used for clustering with PrmClust.
RANK_FILTER | Any integer > 0 | -1 | Rank filters raw MS/MS spectra by removing any peaks not in the top k scoring peaks within +/- 56 Da.
CLUST_RANK_FILTER | Any integer > 0 | -1 | Rank filters clustered MS/MS spectra by removing any peaks not in the top k scoring peaks within +/- 56 Da.
PRM_RANK_FILTER | Any integer > 0 | -1 | Rank filters PRM spectra by removing any peaks not in the top k scoring peaks within +/- 56 Da.
CORRECT_PM | yes/no | no | Correct MS/MS spectra parent mass.
GUESS_CHARGE | yes/no | no | Guess MS/MS spectra precursor charge.
MIN_SPECTRUM_QUALITY | 0.0 - 1.0 | 0.15 | MS/MS spectra with quality scores below this value are discarded.
CLUSTER_MIN_SIZE | Any integer >= 0 | 1 | Minimum number of spectra per cluster to retain the cluster-consensus spectrum for further analysis. Set to zero to disable clustering.
CLUSTER_TOOL | PrmClust/MSCluster | MSCluster | Which clustering tool to execute if CLUSTER_MIN_SIZE > 0. PrmClust clusters PRM spectra after PepNovo PRM scoring, so it can work with any type of input spectra. MSCluster only works for unpaired CID or HCD spectra.
MERGE_SAME_PREC | 0/1 | 0 | If set to 1, try to merge consecutive scans with the exact same parent mass before invoking PrmClust. Corroborating peaks in CID/ETD and HCD/ETD will get boosted scores.
PRM_CLUSTER_RATIO | Any number between 0 and 1 | 0.72 | Minimum allowable matched intensity ratio of PRM spectra from the same precursor (used by PrmClust).
MAX_MOD_MASS | Any number > 0 | 100 | Maximum mass for a post-translational modification (in Daltons). Use absolute values for negative mass offsets (e.g. loss of water).
MIN_OVERLAP_AREA | 0.0 - 1.0 | 0.45 | Minimum percentage of overlapping mass between two spectra to compute spectral alignments. Lower values allow for the detection of small overlaps but lead to longer run times; usually not set to less than 0.4.
MIN_RATIO | 0.0 - 1.0 | 0.35 | Minimum percentage of matched peak scores in a spectral alignment.
MIN_PEAK_INT | Any float > 0 | 50.0 | Minimum peak intensity in MS2 spectrum before MSCluster.
MIN_MATCHED_PEAKS | Any integer > 0 | 4 | Minimum number of matched peaks in a spectral alignment.
MAX_PVALUE | 0.0 - 1.0 | 0.05 | Maximum p-value to accept a spectrum/spectrum alignment. The default may be too strict for datasets with a small number of spectra.
FILTER_PRECURSOR_WINDOW | 0/1 | 0 | Removes peaks within (-20,+15) Th of the theoretical precursor mass.
FILTER_TRIGS | yes/no | yes | Determines whether spectral alignments need to be confirmed by transitive closure. If set to "yes", a spectral alignment between spectra A,B is only accepted if there are at least two other alignments A,C and B,C with consistent alignment offsets. Should be set to "no" for spectral networks projects.
MIN_MATCHED_PEAKS_DB | Any integer >= 4 | 6 | Minimum number of matched peaks when aligning contig sequences against the FASTA database.
CLUSTALW_MINSCORE | Any number > 0 | 250 | Minimum ClustalW score to transfer contig/database alignments between database proteins using ClustalW protein/protein alignments (see Bandeira et al., Nature Biotechnology 2008 for details).
MIN_METACONTIG_SIZE | Any integer >= 0 | 0 | Minimum number of contigs per meta-contig after MetaSPS alignment/assembly (0 indicates not to run MetaSPS).
MIN_METACONTIG_SCORE | Any number > 0 | 3.3 | Minimum allowable score of overlaps between contigs during MetaSPS alignment/assembly (defined as the minimum overlapping ratio of scores multiplied by the number of matching peaks).
TAG_LEN | Any integer >= 3 | 6 | Length of the sequence tags used for matching spectra/contigs against the FASTA database.
DOUBLE_AA_JUMPS | Any integer >= 0 | 1 | Maximum number of gaps allowed in sequence tags used for matching spectra/contigs against the FASTA database.
TAG_MATCH_TOP_SCORING_ONLY | 0/1 | 1 | Specifies whether only the top scoring tag is used for matching the spectra/contig against the FASTA database.
MAX_NUM_TAGS | Any integer >= 0 | 0 | Maximum number of tags created per spectrum for matching spectra/contigs against the FASTA database. Only used if TAG_MATCH_TOP_SCORING_ONLY is 1.
MATCH_TAG_FLANKING_MASSES | 0/1/2 | 0 | Specifies whether tag matches must also match flanking masses on either or both sides of the tag match. 0 specifies that no flanking mass match is required. 1 means that one of the two flanking masses must match. 2 specifies that both flanking masses must match.
KNOWN_MODS_FILE | Any valid path | N/A | Specifies the path to the file containing the known modifications for penalty alignment.
BLOSUM_PENALTY_FILE | Any valid path | N/A | Specifies the path to the file containing the BLOSUM-style mutation penalties for penalty alignment.
MAX_PARSIMONY | 0/1 | 1 | Specifies whether or not to perform a maximum parsimony reduction on the FASTA database after tag search and before contig-protein alignment.
PENALTY_ALIGNMENT | 0/1 | 0 | Specifies whether or not to use the new penalty-based alignment for contig-protein and spectra-protein alignment.
ENFORCE_ENDPEAKS | 0/1 | 1 | Specifies whether or not to enforce the alignment of the end peaks of the peptide when performing contig and spectrum alignments.
MAX_MOD_MASS | Any integer >= 0 | 100 | Maximum allowable mass of a modification during contig and spectrum alignments.
MIN_MOD_MASS | Any integer <= 0 | -100 | Minimum allowable mass of a modification during contig and spectrum alignments.
MAX_ALIGN_GAP_SIZE | Any integer >= 1 | 8 | Maximum allowable gap in the database protein sequence that will be attempted to be aligned during penalty-based alignment. Requires PENALTY_ALIGNMENT = 1. This should not be modified by normal users.
MIN_PENALTY_FREQUENCY | 0.0 - 1.0 | 0.01 | Minimum frequency of automatically detected modifications to be considered when computing automatic known modification penalties. Requires PENALTY_ALIGNMENT = 1. This should not be modified by normal users.
PEAK_EQUIVALENTS | Any floating point > 0.0 | 2.0 | Requires PENALTY_ALIGNMENT = 1. This should not be modified by normal users.
PENALTY_ALIGNMENT_ALPHA | Any floating point > 0.0 | 2.0 | Requires PENALTY_ALIGNMENT = 1. This should not be modified by normal users.
PENALTY_ALIGNMENT_BETA=1000.0 | Any floating point > 0.0 | 2.0 | Requires PENALTY_ALIGNMENT = 1. This should not be modified by normal users.
INPUT_SPEC_IDS | File | (none) | Path to the peptide-spectrum-matches file containing IDs for INPUT_SPECS_MS. Used to generate sequencing statistics after reports; see [statprotseqs description].
SPEC_ID_FORMAT | msgfdb/inspect | msgfdb | See [statprotseqs description].
STATS_MIN_CONTIG_AA_TAG | Integer >= 0 | 0 | See [statprotseqs description].
STATS_MIN_CONTIG_DB_MP | Integer >= 0 | 0 | See [statprotseqs description].
STATS_ENDS_CHOP | Integer >= 0 | 0 | See [statprotseqs description].
ALIGNGF | 0/1 | 0 | If 1, AlignGF is applied to assess the significance of spectral alignments. This option works only when PARTIAL_OVERLAPS=0. Recommendation: use MIN_RATIO=0.15.
ALIGNGF_MAX_PVALUE | 0.0 - 1.0 | 1.0E-6 | Maximum AlignGF p-value to accept a spectrum/spectrum alignment. Recommendation: 1.0E-10.
MAX_PM_NUM_COMPONENT | Any integer >= 0 | 0 | If > 0, the number of unique precursor ion masses in a single component must be less than or equal to the specified value.
FILTER_PAIR_EDGE_FDR | 0.0 - 1.0 | 0.01 | Retain only pairs at the specified FDR. Requirement: an MSGF-DB PSM file in which the Scan# column matches the indices of specs_ms.mgf.
REMOVABLE_BAD_CLUSTER_SIZE | Any integer >= 0 | 0 | If > 0, try to remove repeated low-quality spectra of the same peptide even after ms-clustering. Only clusters whose sizes are less than or equal to the specified value are removed.
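
Because several optional parameters have bounded valid ranges, it can help to check a parameters file before launching a long run. The sketch below covers only a handful of the bounded numeric parameters from the table; `RANGES` and `out_of_range` are hypothetical names, the input is assumed to be a dictionary of string values read from the parameters file, and treating the bounds as inclusive is an assumption.

```python
# Valid ranges copied from the table above for a few bounded parameters.
RANGES = {
    "TOLERANCE_PEAK": (0.0, 0.4),
    "TOLERANCE_PM": (0.0, 3.0),
    "MIN_SPECTRUM_QUALITY": (0.0, 1.0),
    "MIN_OVERLAP_AREA": (0.0, 1.0),
    "MIN_RATIO": (0.0, 1.0),
    "MAX_PVALUE": (0.0, 1.0),
}


def out_of_range(params: dict) -> list:
    """Return one message per parameter whose value violates RANGES."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue  # unspecified parameters fall back to their defaults
        try:
            value = float(params[name])
        except ValueError:
            problems.append(f"{name}: {params[name]!r} is not a number")
            continue
        # Assumption: both ends of the documented range are allowed.
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low} - {high}")
    return problems
```

An empty list means none of the checked parameters is out of range; parameters not listed in `RANGES` are not checked at all.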

Optional FT CID, HCD or ETD Parameters

Parameter name | Valid values | Default
DECONV_MS2 | 0/1 | 0
TOLERANCE_PEAK_PPM | Any float | none
INSTRUMENT_TYPE | IT/FT | IT
CLUSTER_TOOL | MSCluster/PrmClust | MSCluster
TOLERANCE_PM_PPM | Any float | none

Optional SILAC Parameters

Parameter name | Valid values | Default | Description
BOOST_SILAC_PRMS | 0/1 | 0 | If 1, look for light/heavy pairs of spectra (<= 100 scans apart) with an 8 or 10 Da parent mass offset and boost the PRM scores of matched PRM/SRM peaks. Boosted peaks will have 3x the combined score of the pair.
SILAC_SCAN_RANGE | Any integer | 100 | When finding SILAC pairs, this is the number of consecutive scans to check after each unpaired light/heavy spectrum. If <= 0 or > the number of spectra, this will be set to the number of spectra.
FILTER_NONSILAC_PRMS | 0/1 | 0 | If BOOST_SILAC_PRMS=1, removes PRM spectra that do not have a light/heavy pair.
MIN_SILAC_COSINE | 0.0 - 1.0 | 0.4 | If BOOST_SILAC_PRMS=1, minimum allowable cosine score between light/heavy and same-spectrum pairs.
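
The pairing rule described for BOOST_SILAC_PRMS and SILAC_SCAN_RANGE can be illustrated with a short sketch. This is not the SPS implementation: `find_silac_pairs` is a hypothetical function, spectra are represented only by their parent masses in scan order, the 0.5 Da matching tolerance is an assumed value, and the cosine check (MIN_SILAC_COSINE) and score boosting are left out.

```python
def find_silac_pairs(parent_masses, scan_range=100, tolerance=0.5,
                     offsets=(8.0, 10.0)):
    """Pair spectra whose parent masses differ by 8 or 10 Da (sketch).

    parent_masses: parent masses in scan order.
    Returns a list of (earlier_index, later_index) pairs.
    """
    n = len(parent_masses)
    # As documented for SILAC_SCAN_RANGE: values <= 0 or larger than the
    # number of spectra are replaced by the number of spectra.
    if scan_range <= 0 or scan_range > n:
        scan_range = n

    paired = set()
    pairs = []
    for i in range(n):
        if i in paired:
            continue
        # Only look at the scan_range scans following an unpaired spectrum.
        for j in range(i + 1, min(n, i + 1 + scan_range)):
            if j in paired:
                continue
            diff = abs(parent_masses[j] - parent_masses[i])
            if any(abs(diff - offset) <= tolerance for offset in offsets):
                pairs.append((i, j))
                paired.update((i, j))
                break  # each spectrum joins at most one pair in this sketch
    return pairs
```

With a small scan range, spectra that would otherwise pair are missed if they are too far apart in scan order, which is the trade-off SILAC_SCAN_RANGE controls.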

Parameters for dynamic reports on ccms-fe.ucsd.edu

The simplest way to access your report through the ccms web server is to include the following parameters in your parameters file:

REPORT_DIR=./report
REPORT_SERVER=http://ccms-web01.ucsd.edu/cgi-bin/
REPORT_DYNAMIC=1

Then create a symbolic link in the directory /data/sps_data, using the command:

ln -s /data/home/user/<project_directory>/report /data/sps_data/<my_report_name>

This creates a symbolic link named <my_report_name> under /data/sps_data.
Then, in your browser, open:

http://ccms-web01.ucsd.edu/sps/<my_report_name>

Example parameters file

Below are some examples of parameters files, taken from the tests in the quality control (qc) directory.

If you copy these examples for your own project, change at least the lines that contain paths so that they reflect the location of your project and executables, as well as the target directory where the report will reside on the usa web server.

  • Params file from test_sps_small
# System parameters

# INITIAL_STAGE=alignment
# Path to required executable files, e.g. "/usr/local/specnets/bin" or "C:/specnets/bin" (without quotation marks)
EXE_DIR=svn/sps/trunk


# Report parameters
REPORT_JOB=test_sps_small
REPORT_USER=ccms
REPORT_DIR=./report
REPORT_SERVER=http://usa.ucsd.edu:8080/cgi-bin/
REPORT_DYNAMIC=1
REPORT_DIR_SERVER=/data/sites/projects/sps/jcanhita/test_sps_small
CPUS=4


# AMINO_ACID_MASSES=AA_cys_nipia.txt
AMINO_ACID_MASSES=svn/sps/trunk/AA_cys_nipia.txt

# Input files
INPUT_SPECS_MS=../data/test_data_aBTLA/aBTLA_hybrid_HC_DTT_IAA_chymotryp_30min_100407.mgf;
FASTA_DATABASE=../data/test_data_aBTLA/homolog_prots.fasta

CLUSTER_MIN_SIZE=1
CLUSTER_MODEL=LTQ_TRYP

MIN_SPECTRUM_QUALITY=0.15
CORRECT_PM=no
# Set GUESS_CHARGE=no for Orbitrap runs
GUESS_CHARGE=no

# Parameters
# GRID_NUMNODES=-1
GRID_NUMNODES=20
GRID_EXE_DIR=svn/sps/trunk/ExecFramework
GRID_SGE_EXE_DIR=/opt/sge62/bin/lx26-amd64
MIN_OVERLAP_AREA=0.6
TOLERANCE_PEAK=0.1
TOLERANCE_PM=1.0
FILTER_TRIGS=yes
MAX_MOD_MASS=100
MIN_RATIO=0.4
MAX_PVALUE=0.5
MIN_MATCHED_PEAKS=4
MAX_AA_JUMP=2

# Tag-based database search
TAG_LEN=3

ENFORCE_ENDPEAKS=0
MAX_PARSIMONY=0
PENALTY_ALIGNMENT=1
PENALTY_ALIGNMENT_ALPHA=3.0
PENALTY_ALIGNMENT_BETA=3.0
  • Params file from test_sps_small_GenoMS
# System parameters

# INITIAL_STAGE=alignment
# Path to required executable files, e.g. "/usr/local/specnets/bin" or "C:/specnets/bin" (without quotation marks)
EXE_DIR=svn/sps/trunk

# Report parameters
REPORT_JOB=test_small_genoMS
REPORT_USER=ccms
REPORT_DIR=./report
REPORT_SERVER=http://usa.ucsd.edu:8080/cgi-bin/
REPORT_DYNAMIC=1
REPORT_DIR_SERVER=/data/sites/projects/sps/jcanhita/test_small_genoMS
CPUS=4


# AMINO_ACID_MASSES=AA_cys_nipia.txt
AMINO_ACID_MASSES=svn/sps/trunk/AA_cys_nipia.txt


# Input files
INPUT_SPECS_MS=../data/aBTLA_hybrid_HC_DTT_IAA_chymotryp_30min_100407.mgf;
FASTA_DATABASE=../data/homolog_prots.fasta


CLUSTER_MIN_SIZE=1
CLUSTER_MODEL=LTQ_TRYP
MIN_SPECTRUM_QUALITY=0.15
CORRECT_PM=no
# Set GUESS_CHARGE=no for Orbitrap runs
GUESS_CHARGE=no


# Parameters
# GRID_NUMNODES=-1
GRID_NUMNODES=20
GRID_EXE_DIR=svn/sps/trunk/ExecFramework
GRID_SGE_EXE_DIR=/opt/sge62/bin/lx26-amd64
MIN_OVERLAP_AREA=0.6
TOLERANCE_PEAK=0.1
TOLERANCE_PM=1.0
FILTER_TRIGS=yes
MAX_MOD_MASS=100
MIN_RATIO=0.4
MAX_PVALUE=0.5
MIN_MATCHED_PEAKS=4
MAX_AA_JUMP=2


# Tag-based database search
TAG_LEN=3
ENFORCE_ENDPEAKS=0
MAX_PARSIMONY=0
PENALTY_ALIGNMENT=1
PENALTY_ALIGNMENT_ALPHA=3.0
PENALTY_ALIGNMENT_BETA=3.0


#genoMS params
DBCOMBINED=svn/sps/trunk/DBs_GenoMS/IMGT_20120213_HC_LC.fasta
TEMPLATECONSTRAINTFILE=svn/sps/trunk/DBs_GenoMS/IMGT_20120213_HC_LC.constraints
FIXEDMOD=C,+57
PEAK_PENALTY=1
  • Params file from test_sps_small_MetaSPS
# System parameters

# INITIAL_STAGE=alignment
# Path to required executable files, e.g. "/usr/local/specnets/bin" or "C:/specnets/bin" (without quotation marks)
EXE_DIR=svn/sps/trunk


REPORT_USER=test_sps_small_MetaSPS
REPORT_JOB=ccms
REPORT_DIR=./report
REPORT_SERVER=http://usa.ucsd.edu:8080/cgi-bin/
REPORT_DYNAMIC=1
REPORT_DIR_SERVER=/data/sites/projects/sps/jcanhita/test_sps_small_MetaSPS
CPUS=4


# AMINO_ACID_MASSES=AA_cys_nipia.txt
AMINO_ACID_MASSES=svn/sps/trunk/AA_cys_nipia.txt

# Input files
INPUT_SPECS_MS=../data/aBTLA_hybrid_HC_DTT_IAA_chymotryp_30min_100407.mgf;
FASTA_DATABASE=../data/homolog_prots.fasta

CLUSTER_MIN_SIZE=1
CLUSTER_MODEL=LTQ_TRYP

MIN_SPECTRUM_QUALITY=0.15
CORRECT_PM=no
# Set GUESS_CHARGE=no for Orbitrap runs
GUESS_CHARGE=no


# Parameters
# GRID_NUMNODES=-1
GRID_NUMNODES=20
GRID_EXE_DIR=svn/sps/trunk/ExecFramework
GRID_SGE_EXE_DIR=/opt/sge62/bin/lx26-amd64
MIN_OVERLAP_AREA=0.6
TOLERANCE_PEAK=0.1
TOLERANCE_PM=1.0
FILTER_TRIGS=yes
MAX_MOD_MASS=100
MIN_RATIO=0.4
MAX_PVALUE=0.5
MIN_MATCHED_PEAKS=4
MAX_AA_JUMP=2


# Tag-based database search
TAG_LEN=3

ENFORCE_ENDPEAKS=0
MAX_PARSIMONY=0
PENALTY_ALIGNMENT=1
PENALTY_ALIGNMENT_ALPHA=3.0
PENALTY_ALIGNMENT_BETA=3.0

MIN_METACONTIG_SCORE=3.3
MIN_METACONTIG_SIZE=1
  • Params file from test_sps_small_noclusters
# System parameters

# INITIAL_STAGE=alignment
# Path to required executable files, e.g. "/usr/local/specnets/bin" or "C:/specnets/bin" (without quotation marks)
EXE_DIR=svn/sps/trunk

# Report parameters
REPORT_USER=test_sps_small_noclusters
REPORT_JOB=ccms
REPORT_DIR=./report
REPORT_SERVER=http://usa.ucsd.edu:8080/cgi-bin/
REPORT_DYNAMIC=1
REPORT_DIR_SERVER=/data/sites/projects/sps/jcanhita/test_sps_small_noclusters
CPUS=4


# AMINO_ACID_MASSES=AA_cys_nipia.txt
AMINO_ACID_MASSES=svn/sps/trunk/AA_cys_nipia.txt

RESOLUTION=1

# Input files
INPUT_SPECS_MS=../data/aBTLA_hybrid_HC_DTT_IAA_chymotryp_30min_100407.mgf;
FASTA_DATABASE=../data/homolog_prots.fasta

CLUSTER_MIN_SIZE=1
CLUSTER_MODEL=LTQ_TRYP

MIN_SPECTRUM_QUALITY=0.15
CORRECT_PM=no
# Set GUESS_CHARGE=no for Orbitrap runs
GUESS_CHARGE=no


# Parameters
# GRID_NUMNODES=-1
GRID_NUMNODES=20
# GRID_NUMCPUS=2
GRID_EXE_DIR=svn/sps/trunk/ExecFramework
GRID_SGE_EXE_DIR=/opt/sge62/bin/lx26-amd64
MIN_OVERLAP_AREA=0.6
TOLERANCE_PEAK=0.1
TOLERANCE_PM=1.0
FILTER_TRIGS=yes
MAX_MOD_MASS=100
MIN_RATIO=0.4
MAX_PVALUE=0.5
MIN_MATCHED_PEAKS=4
MAX_AA_JUMP=2


# Tag-based database search
TAG_LEN=3

ENFORCE_ENDPEAKS=0
MAX_PARSIMONY=0
PENALTY_ALIGNMENT=1
PENALTY_ALIGNMENT_ALPHA=3.0
PENALTY_ALIGNMENT_BETA=3.0