MS-GF+

(Download MS-GF+)

(How to migrate from MS-GFDB to MS-GF+)

ChangeLog

Usage: java -Xmx3500M -jar MSGFPlus.jar
	-s SpectrumFile (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt)
	   Spectra should be centroided. Profile spectra will be ignored.
	-d DatabaseFile (*.fasta or *.fa)
	[-o OutputFile (*.mzid)] (Default: SpectrumFileName.mzid)
	[-t PrecursorMassTolerance] (e.g. 2.5Da, 20ppm or 0.5Da,2.5Da, Default: 20ppm)
	   Use a comma to set asymmetric values. E.g. "-t 0.5Da,2.5Da" will set 0.5Da to the minus (expMass<theoMass) and 2.5Da to the plus (expMass>theoMass) side.
	[-ti IsotopeErrorRange] (Range of allowed isotope peak errors, Default: 0,1)
	   Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation.
	   On Windows, put the range inside "" (e.g. "0,1").
	   The combination of -t and -ti determines the precursor mass tolerance.
	   E.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n*1.00335Da)<20ppm for n=-1, 0, 1, 2.
	[-thread NumThreads] (Number of concurrent threads to be executed, Default: Number of available cores)
	[-tda 0/1] (0: don't search decoy database (Default), 1: search decoy database)
	[-m FragmentMethodID] (0: As written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD)
	[-inst InstrumentID] (0: Low-res LCQ/LTQ (Default), 1: High-res LTQ, 2: TOF, 3: Q-Exactive)
	[-e EnzymeID] (0: unspecific cleavage, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage)
	[-protocol ProtocolID] (0: NoProtocol (Default), 1: Phosphorylation, 2: iTRAQ, 3: iTRAQPhospho)
	[-ntt 0/1/2] (Number of Tolerable Termini, Default: 2)
	   E.g. For trypsin, 0: non-tryptic, 1: semi-tryptic, 2: fully-tryptic peptides only.
	[-mod ModificationFileName] (Modification file, Default: standard amino acids with fixed C+57)
	[-minLength MinPepLength] (Minimum peptide length to consider, Default: 6)
	[-maxLength MaxPepLength] (Maximum peptide length to consider, Default: 40)
	[-minCharge MinCharge] (Minimum precursor charge to consider if charges are not specified in the spectrum file, Default: 2)
	[-maxCharge MaxCharge] (Maximum precursor charge to consider if charges are not specified in the spectrum file, Default: 3)
	[-n NumMatchesPerSpec] (Number of matches per spectrum to be reported, Default: 1)
	[-addFeatures 0/1] (0: output basic scores only (Default), 1: output additional features)
Example (high-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 20ppm -ti "-1,2" -ntt 0 -tda 1 -o testMSGFPlus.mzid
Example (low-precision): java -Xmx3500M -jar MSGFPlus.jar -s test.mzXML -d IPI_human_3.79.fasta -t 0.5Da,2.5Da -ntt 0 -tda 1 -o testMSGFPlus.mzid
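To make the interaction of -t and -ti concrete, here is a minimal Python sketch of the precursor check described in the usage text: an experimental mass matches a theoretical mass if exp - calc - n*1.00335 falls inside the (possibly asymmetric) tolerance for some n in the isotope error range. It is an illustration, not MS-GF+'s implementation. The function names are hypothetical, ppm tolerances are assumed to be relative to the theoretical mass, and the rule that -ti is ignored for tolerances of 0.5Da/500ppm or more is not modeled.

```python
ISOTOPE_SPACING_DA = 1.00335  # per-isotope mass offset used in the -ti formula


def parse_tolerance(spec):
    """Parse a -t value such as '20ppm', '2.5Da' or '0.5Da,2.5Da'.

    Returns (minus, plus), each a (value, unit) pair: minus applies when
    expMass < theoMass, plus when expMass > theoMass.
    """
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) == 1:
        parts = parts * 2  # a single value means a symmetric tolerance
    if len(parts) != 2:
        raise ValueError("expected one or two comma-separated values: " + spec)
    parsed = []
    for p in parts:
        if p.lower().endswith("ppm"):
            parsed.append((float(p[:-3]), "ppm"))
        elif p.lower().endswith("da"):
            parsed.append((float(p[:-2]), "Da"))
        else:
            raise ValueError("unknown unit (use Da or ppm): " + p)
    return parsed[0], parsed[1]


def to_da(tol, theo_mass):
    value, unit = tol
    # Assumption: ppm is interpreted relative to the theoretical mass.
    return value * theo_mass * 1e-6 if unit == "ppm" else value


def precursor_matches(exp_mass, theo_mass, tol_spec, isotope_range=(0, 1)):
    """True if exp - theo - n*1.00335 lies within tolerance for some n."""
    minus, plus = parse_tolerance(tol_spec)
    minus_da, plus_da = to_da(minus, theo_mass), to_da(plus, theo_mass)
    first, last = isotope_range  # e.g. (-1, 2) for "-ti -1,2"
    for n in range(first, last + 1):
        diff = exp_mass - theo_mass - n * ISOTOPE_SPACING_DA
        if -minus_da < diff < plus_da:
            return True
    return False
```

For instance, with -t 20ppm and the default isotope range 0,1, an experimental mass of 1001.01335 Da is accepted for a theoretical mass of 1000 Da through n=1, while with an isotope range of 0,0 it is rejected.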
Parameters:
  • -s SpectrumFile (*.mzML, *.mzXML, *.mgf, *.ms2, *.pkl or *_dta.txt) - Required
    • Spectrum file name. Currently, MS-GF+ supports the following file formats: mzML, mzXML, mgf, ms2, pkl and _dta.txt.
    • We recommend using mzML whenever possible.
  • -d DatabaseFile (*.fasta or *.fa) - Required
    • Path to the protein database file. If the database file does not have auxiliary index files (*.canno, *.cnlcp, *.csarr, and *.cseq), MS-GF+ will create them.
    • When "-tda 1" option is used, the database specified here must contain only target protein sequences.
If multiple MS-GF+ processes access the same database file, it is strongly recommended to index the database prior to the database search by running BuildSA (see below).
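As a sketch of the workflow recommended above, the Python snippet below indexes the database once with BuildSA and then starts one MS-GF+ search per spectrum file in parallel. Only options documented on this page are used; the jar path, file names, the -Xmx value and the choice of "-tda 2" for BuildSA (so that both the target and the revConcat databases are indexed before "-tda 1" searches) are assumptions to adapt. Each search also uses all available cores by default (see -thread), so launching many at once may oversubscribe the machine.

```python
import subprocess

JAR = "MSGFPlus.jar"     # assumed path to the MS-GF+ jar
JAVA_HEAP = "-Xmx3500M"  # heap size used in the examples on this page


def buildsa_command(db_path, tda=2):
    # tda=2: index both the target database and the *.revConcat.fasta database
    return ["java", JAVA_HEAP, "-cp", JAR,
            "edu.ucsd.msjava.msdbsearch.BuildSA", "-d", db_path, "-tda", str(tda)]


def search_command(spectrum_path, db_path, tda=1):
    return ["java", JAVA_HEAP, "-jar", JAR,
            "-s", spectrum_path, "-d", db_path, "-tda", str(tda)]


def index_then_search(spectrum_paths, db_path):
    # Index once up front so the parallel searches do not each try to
    # build the auxiliary index files for the same database.
    subprocess.run(buildsa_command(db_path), check=True)
    procs = [subprocess.Popen(search_command(s, db_path)) for s in spectrum_paths]
    for p in procs:
        if p.wait() != 0:
            raise RuntimeError("search failed: " + " ".join(p.args))
```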
  • -o OutputFile (*.mzid)
    • Filename where the output (mzIdentML 1.1 format) will be written.
    • File extension must be "mzid" (case sensitive).
    • By default, the output file name will be "[SpectrumFileName].mzid".
    • E.g. for the input spectrum file "test.mzML", the output will be written to "test.mzid" if this parameter is not specified.
  • -t ParentMassTolerance (Default: 20ppm)
    • Parent mass tolerance in Da or ppm. There must be no space between the number and the unit, e.g. 2.5Da, 20ppm.
    • To set asymmetric tolerances, use a comma to separate the left (experimental mass < theoretical mass) and right (experimental mass > theoretical mass) tolerances, e.g. 0.5Da,2.5Da.
    • It is recommended to use a tight tolerance rather than a loose tolerance (e.g. for Orbitrap data, 10 or 20ppm usually identifies more spectra than 50ppm).
  • -ti IsotopeErrorRange (Default: 0,1)
    • Takes into account the error introduced by choosing a non-monoisotopic peak for fragmentation.
    • If the parent mass tolerance is equal to or larger than 0.5Da or 500ppm, this parameter will be ignored.
    • The combination of -t and -ti determines the precursor mass tolerance.
    • E.g. "-t 20ppm -ti -1,2" tests abs(exp-calc-n*1.00335Da)<20ppm for n=-1, 0, 1, 2.
  • -thread NumOfThreads (Number of concurrent threads to be executed, Default: Number of available cores)
    • Number of concurrent threads to be executed together.
    • Default value is the number of available logical cores (e.g. 8 for a quad-core processor with hyper-threading support).
  • -tda 0/1 (0: don't search decoy database (default), 1: search decoy database to compute FDR)
    • Indicates whether to search the decoy database or not.
    • If 0, the decoy database is not searched.
    • If 1, FDRs are computed based on the target-decoy approach (i.e. the reversed database is appended to the target database and MS-GF+ searches the combined database).
      • FDR(t) = #(DecoyPSMs with score equal to or above t) / #(TargetPSMs with score equal to or above t).
      • PSM: Peptide-Spectrum Match
      • -log(SpecProb) is used as the score to compute FDR.
If -tda 1 is specified, MS-GF+ automatically creates a combined target/reversed database file (DBFileName.revConcat.fasta). Thus, the database file given with "-d" must contain only target proteins.
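A minimal Python sketch of the FDR(t) definition above, assuming each PSM is given as a (SpecProb, is_decoy) pair where is_decoy marks a match to a reversed protein. The helper names and the convention used when no target passes the threshold are this sketch's own; it shows the formula, not how MS-GF+ computes it internally.

```python
import math


def score(spec_prob):
    # The FDR score is -log(SpecProb). Base 10 is chosen here, so thresholds
    # t are on a base-10 scale; any base gives the same ordering of PSMs.
    return -math.log10(spec_prob)


def fdr_at(psms, t):
    """FDR(t) = #decoy PSMs with score >= t / #target PSMs with score >= t.

    psms: list of (spec_prob, is_decoy) pairs.
    """
    targets = sum(1 for p, decoy in psms if not decoy and score(p) >= t)
    decoys = sum(1 for p, decoy in psms if decoy and score(p) >= t)
    if targets == 0:
        # Convention chosen for this sketch when no target passes the threshold.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets
```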
  • -m FragmentationMethodID (0: as written in the spectrum or CID if no info (Default), 1: CID, 2: ETD, 3: HCD, 4: Merge spectra from the same precursor)
    • Fragmentation method identifier (used to determine the scoring model).
    • If the identifier is 0 and fragmentation method is written in the spectrum file (e.g. mzML files), MS-GF+ will recognize the fragmentation method and use a relevant scoring model.
    • If the identifier is 0 and there is no fragmentation method information in the spectrum (e.g. mgf files), CID model will be used by default.
    • If the identifier is non-zero and the spectrum has fragmentation method information, only the spectra that match the identifier will be processed.
    • If the identifier is non-zero and the spectrum has no fragmentation method information, MS-GF+ will process all spectra assuming the specified fragmentation method.
    • If the identifier is 4, MS/MS spectra from the same precursor ion (e.g. CID/ETD pairs, CID/HCD/ETD triplets) will be merged and the "merged" spectrum will be used for searching instead of individual spectra. See Kim et al., MCP 2010 for details.
  • -inst InstrumentID (0: Low-res LCQ/LTQ (Default for CID and ETD), 1: High-res LTQ (Default for HCD), 2: TOF, 3: Q-Exactive)
    • Identifier of the instrument that generated the MS/MS spectra (used to determine the scoring model).
    • For "hybrid" spectra with high-precision MS1 and low-precision MS2, use 0.
    • For usual low-precision instruments (e.g. Thermo LTQ), use 0.
    • If MS/MS fragment ion peaks are of high-precision (e.g. tolerance = 10ppm), use 2.
    • For TOF instruments, use 2.
    • For Q-Exactive HCD spectra, use 3.
    • For other HCD spectra, use 1.
  • -e EnzymeID (Default: 1)
    • Enzyme identifier. Trypsin (1) will be used by default.
    • 0: unspecific cleavage, 1: Trypsin (default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: glutamyl endopeptidase (Glu-C), 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: no cleavage
    • Use 9 for peptidomics studies.
  • -protocol ProtocolID (Default: 0)
    • Protocol identifier. Protocols are used to enable scoring parameters for enriched and/or labeled samples.
    • 0: No protocol (Default)
    • 1: Phosphorylation: for phosphopeptide enriched samples
    • 2: iTRAQ: for iTRAQ-labeled samples
    • 3: iTRAQPhospho: for phosphopeptide enriched and iTRAQ-labeled samples
  • -ntt 0/1/2 (Number of tolerable (tryptic) termini, Default: 2)
    • This parameter is used to apply the enzyme cleavage specificity rule when searching the database.
    • Specifies the minimum number of termini matching the enzyme specificity rule.
      • For example, for trypsin, K.ACDEFGHR.C (NTT=2), G.ACDEFGHR.C (NTT=1), K.ACDEFGHI.C (NTT=1) and G.ACDEFGHI.C (NTT=0).
      • '-ntt 2' will search for fully tryptic peptides only.
    • By default, -ntt 2 will be used. Using -ntt 1 (or 0) will make the search significantly slower.
      (The meaning and the default value of this parameter changed after version 8442.)
  • -mod ModificationFile (Default: standard amino acids with fixed C+57)
    • Modification file name. ModificationFile contains the modifications to be considered in the search.
    • If the -mod option is not specified, standard amino acids with fixed carbamidomethylation of C (C+57) will be used.
    • Download an example modification file.
  • -minLength MinPepLength (Default: 6)
    • Minimum length of the peptide to be considered.
  • -maxLength MaxPepLength (Default: 40)
    • Maximum length of the peptide to be considered.
  • -minCharge MinPrecursorCharge (Default: 2)
    • Minimum precursor charge to consider. This parameter is used only for spectra with no charge state specified.
  • -maxCharge MaxPrecursorCharge (Default: 3)
    • Maximum precursor charge to consider. This parameter is used only for spectra with no charge state specified.
  • -n NumMatchesPerSpec (Default: 1)
    • Number of peptide matches per spectrum to report.
    • Expected false discovery rates (EFDRs) will be reported only when this value is 1.
  • -addFeatures 0/1
    • If 0, only basic scores are reported.
    • If 1, the following features are reported:
      • MS2IonCurrent: Summed intensity of all product ions
      • ExplainedIonCurrentRatio: Summed intensity of all matched product ions (e.g. b, b-H2O, y, etc.) divided by MS2IonCurrent
      • NTermIonCurrentRatio: Summed intensity of all matched prefix ions (e.g. b, b-H2O, etc.) divided by MS2IonCurrent
      • CTermIonCurrentRatio: Summed intensity of all matched suffix ions (e.g. y, y-H2O, etc.) divided by MS2IonCurrent
  • -showQValue 0/1
    • If 0, QValue and PepQValue are not reported.
    • If 1, QValue (PSM-level Q-value) and PepQValue (peptide-level Q-value) are reported (Default).
    • This parameter is ignored when "-tda 0".
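The NTT rule under -ntt can be sketched in Python for trypsin. This is a simplified illustration, not MS-GF+'s code: it treats K and R as the only cleavage residues, ignores the proline rule and protein termini, and the function names are hypothetical. Peptides are written as pre.PEPTIDE.post, the same notation used in the -ntt examples.

```python
TRYPSIN_RESIDUES = {"K", "R"}  # simplified: trypsin cleaves after K or R


def count_tolerable_termini(annotated):
    """Number of peptide termini consistent with trypsin, e.g. for 'K.ACDEFGHR.C'."""
    pre, peptide, _post = annotated.split(".")
    ntt = 0
    if pre in TRYPSIN_RESIDUES:          # N-terminus follows a cleavage site
        ntt += 1
    if peptide[-1] in TRYPSIN_RESIDUES:  # C-terminus is itself a cleavage site
        ntt += 1
    return ntt


def passes_ntt(annotated, min_ntt=2):
    # -ntt gives the minimum number of termini that must match the enzyme rule
    return count_tolerable_termini(annotated) >= min_ntt
```

With the default minimum of 2, only fully tryptic peptides pass; lowering it to 1 or 0 admits many more candidate peptides, which is consistent with the slowdown noted for -ntt 1 and 0.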
MS-GF+ output

MS-GF+ outputs results as an mzIdentML (version 1.1) file. See http://www.psidev.info/mzidentml/ for details on the mzIdentML format. For every PSM, MS-GF+ reports the following scores:

  • MS-GF:RawScore: MS-GF+ raw score of the peptide-spectrum match 
  • MS-GF:DeNovoScore: the score of the optimal scoring peptide for the spectrum (not necessarily in the database) (MS-GF:RawScore <= MS-GF:DeNovoScore)
  • MS-GF:SpecEValue: spectral E-value (spectrum level E-value) of the peptide-spectrum match - the lower the better
  • MS-GF:EValue: database level E-value (expected number of peptides in a random database having equal or better scores than the PSM score) - the lower the better
  • MS-GF:QValue
    • PSM-level Q-value estimated using the target-decoy approach.
    • MS-GF:QValue is computed solely based on MS-GF:SpecEValue.
  • MS-GF:PepQValue
    • Peptide-level Q-value estimated using the target-decoy approach.
    • Reported only if "-tda 1" is specified.
    • If multiple spectra are matched to the same peptide, only the best-scoring PSM (lowest SpecProb) is retained. MS-GF:PepQValue is then calculated as #DecoyPSMs>s / #TargetPSMs>s among the retained PSMs, which approximates the Q-value of the set of unique peptides.
    • In the MS-GF+ output, the same PepQValue is given to all PSMs sharing the peptide. Thus even a low-quality PSM may get a low PepQValue if it has a high-quality "sibling" PSM sharing the peptide.
    • For this reason, PepQValue should not be used to count the number of identified PSMs.
  • Using MzIDToTsv (see below), one can convert MS-GF+ output (*.mzid) into the TSV format.
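The PSM-level and peptide-level Q-values described above can be sketched in Python as follows. The sketch assumes the standard definition of a q-value (the lowest FDR at which a PSM would still be accepted), ranks PSMs by SpecEValue, and does not handle tied scores specially; MS-GF+'s exact handling may differ. The PSM tuple and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal PSM record for this sketch.
PSM = namedtuple("PSM", ["peptide", "spec_evalue", "is_decoy"])


def q_values(psms):
    """PSM-level q-values (aligned with input order), ranking by SpecEValue."""
    order = sorted(range(len(psms)), key=lambda i: psms[i].spec_evalue)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # best (lowest SpecEValue) first
        if psms[i].is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)  # max() avoids division by zero
    # q-value: minimum FDR over this threshold and all looser ones.
    q = [0.0] * len(psms)
    best_so_far = float("inf")
    for i in reversed(order):
        best_so_far = min(best_so_far, fdr[i])
        q[i] = best_so_far
    return q


def pep_q_values(psms):
    """Keep the best PSM per peptide, compute q-values on that set,
    then give every PSM the value of its peptide."""
    best = {}
    for psm in psms:
        if psm.peptide not in best or psm.spec_evalue < best[psm.peptide].spec_evalue:
            best[psm.peptide] = psm
    peptides = list(best)
    reps = [best[p] for p in peptides]
    by_peptide = dict(zip(peptides, q_values(reps)))
    return [by_peptide[psm.peptide] for psm in psms]
```

In a toy input with a strong PEPTIDEK match (SpecEValue 1e-12), a weak second PEPTIDEK match (1e-3), two other targets (1e-10, 1e-8) and one decoy (1e-9), the weak match gets a PSM-level q-value of 0.25 but a peptide-level q-value of 0, illustrating why PepQValue should not be used to count PSMs.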
MS-GF+ output example

MzIdentML format 

TSV format (converted using MzIDToTsv)

MzIDToTsv

Converts MS-GF+ output (.mzid) into the tsv format (.tsv)

Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.ui.MzIDToTsv
	-i MzIDFile (MS-GF+ output file (*.mzid))
	[-o TSVFile] (TSV output file (*.tsv) (Default: MzIDFileName.tsv))
	[-showQValue 0/1] (0: do not show Q-values, 1: show Q-values (Default))
	[-showDecoy 0/1] (0: do not show decoy PSMs (Default), 1: show decoy PSMs)
	[-unroll 0/1] (0: merge shared peptides (Default), 1: unroll shared peptides)

Parameters:

  • -i MzIDFile
    • Path to the MS-GF+ result file (*.mzid)
  • -o TSVFile
    • Path to the tsv output file (*.tsv)
    • If not specified, for input MyFile.mzid, the output will be MyFile.tsv.
  • -showQValue 0/1
    • If 0, QValue and PepQValue are not reported.
    • If 1, QValue and PepQValue are reported (Default).
  • -showDecoy 0/1
    • If 0, decoy PSMs will not be reported (Default).
    • If 1, decoy PSMs will be reported.
  • -unroll 0/1
    • This parameter controls the output format for shared peptides (peptides matched to multiple proteins).
    • When "-unroll 0" (Default), a PSM matched to a shared peptide will be printed as a single line.
      • Peptide column does not show neighboring amino acids (e.g. IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK).
      • Protein column shows all proteins in a single line.
      • Example: MyProtein(pre=K,post=T);MyProteinIsoform(pre=K,post=T)
      • Download example file
    • When "-unroll 1", a PSM matched to a shared peptide will be printed in multiple lines.
      • Peptide column shows neighboring amino acids (e.g. K.IGAYLFVDMAHVAGLIAAGVYPNPVPHAHVVTSTTHK.T).
      • Different peptide-protein matches are printed in different lines.
      • Download example file
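To illustrate the difference between the two layouts, here is a Python sketch that renders the Peptide and Protein columns for one PSM matched to a shared peptide. Only those two columns are modeled; the real MzIDToTsv output has more columns, and the function names are hypothetical.

```python
def merged_columns(peptide, matches):
    """'-unroll 0' layout: one row; bare peptide, all proteins joined by ';'.

    matches: list of (protein, pre, post) tuples, where pre/post are the
    residues flanking the peptide in that protein.
    """
    proteins = ";".join(f"{prot}(pre={pre},post={post})" for prot, pre, post in matches)
    return [(peptide, proteins)]


def unrolled_columns(peptide, matches):
    """'-unroll 1' layout: one row per peptide-protein match, with flanking residues."""
    return [(f"{pre}.{peptide}.{post}", prot) for prot, pre, post in matches]
```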

BuildSA

Index a protein database for fast searching. 

Usage: java -Xmx3500M -cp MSGFPlus.jar edu.ucsd.msjava.msdbsearch.BuildSA 
	-d DatabaseFile (*.fasta or *.fa)
	[-tda 0/1/2] (0: target only, 1: target-decoy database only, 2: both)

Parameters:

  • -d DbPath
    • Name of a protein database (*.fasta or *.fa) 
    • Database file must end with ".fasta" or ".fa".
  • -tda 0/1/2
    • If 0, only "DatabaseFile" will be indexed.
    • If 1, a new database file (*.revConcat.fasta) will be generated by appending reversed proteins. This forward-reverse database will be indexed.
    • If 2, both the original database and the forward-reverse database file will be indexed.
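A minimal Python sketch of building a forward-reverse (target-decoy) database of the kind described for "-tda 1": every target protein is kept, and a reversed copy with a tagged header is appended. The decoy prefix "XXX_" and the function names are assumptions for illustration; the exact headers MS-GF+ writes to *.revConcat.fasta may differ. It also shows why the input must contain only target proteins: reversed decoys already present would be reversed back into target sequences under a decoy label.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def rev_concat(records, decoy_prefix="XXX_"):
    """Targets first, then each protein reversed under a prefixed header."""
    # decoy_prefix is an assumption for this sketch.
    decoys = [(decoy_prefix + header, seq[::-1]) for header, seq in records]
    return records + decoys


def to_fasta(records):
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```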

BuildSA creates a suffix array of the protein database. For an input database file DBFileName.fasta, BuildSA will generate 4 auxiliary files (DBFileName.canno, DBFileName.cnlcp, DBFileName.csarr, DBFileName.cseq). It needs to be executed only once per database file.
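To illustrate the data structure (not BuildSA's on-disk format), here is a toy Python suffix array over concatenated protein sequences, with a binary search that finds every protein containing a given peptide. The "$" separator, the naive O(n² log n) construction and the function names are assumptions chosen for brevity; a real index over a large FASTA file needs a far more efficient construction algorithm.

```python
import bisect


def concat_proteins(seqs, sep="$"):
    """Join sequences with a separator; also return each protein's start offset."""
    starts, pos = [], 0
    for s in seqs:
        starts.append(pos)
        pos += len(s) + len(sep)
    return sep.join(seqs) + sep, starts


def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Sorted start positions of pattern, via two binary searches on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def proteins_containing(seqs, peptide):
    text, starts = concat_proteins(seqs)
    sa = build_suffix_array(text)
    hits = find_occurrences(text, sa, peptide)
    # Map each text position back to the index of the protein it falls in.
    return [bisect.bisect_right(starts, pos) - 1 for pos in hits]
```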

  1. Nov 07, 2012

    Anonymous says:

    Hi, I've been running MS-GF+ with -tda 1 and -tda 0.

    I notice with -tda 0 that there is no FDR (EFDR) or QValue, even though the EFDR can be estimated from the P-Values without using the database. Being able to then filter results based on the EFDR would be useful, especially to compare between FDRs estimated from the target-decoy approach and calculated from P-values alone. Would you be able to add this functionality? Thanks. 

  2. Jan 17, 2013

    Anonymous says:

    Hello -

    I'm running MS-GF+ on an .mzML input file that specifies multiple possible charge states for each spectrum.  MS-GF+ seems to be ignoring the .mzML charge states, and is just using its own default charge state range of [+2, +3].  Is there a way to force it to use the charge states specified in the .mzML file?

    Please cc: responses to peaklist@u.washington.edu.  Thanks.

    Jeff Howbert

  3. Feb 26, 2013

    Anonymous says:

    Does anybody know if the CID or HCD scoring model (-m 1 or -m 3) is more appropriate for qTOF spectra?  Thanks.

  4. Mar 06, 2013

    Anonymous says:

    Use -m 1 (CID) and -inst 2 (TOF).

  5. Mar 10, 2013

    Anonymous says:

    MCP requires inclusion of fragment ion tolerance.  What fragment-ion tolerance is used by MS-GF+?

  6. Mar 11, 2013

    Anonymous says:

    "If the identifier is 4, MS/MS spectra from the same precursor ion (e.g. CID/ETD pairs, CID/HCD/ETD triplets) will be merged and the "merged" spectrum will be used for searching instead of individual spectra."

    Is there a bug in this option? Upon running MSGF+ I get

    [Error] Invalid value for parameter -m: 4 (must be in the range [0,4))

    cheers

  7. Mar 11, 2013

    Anonymous says:

    Hi person above me,

    MS-GF+ doesn't yet support merged spectral searching.  Sangtae is working on implementing that feature.  You can use MS-GFDB in the meantime.

    Best wishes

  8. Mar 12, 2013

    Anonymous says:

    Ok, I'll await the feature. Thanks for the info!

  9. May 09, 2013

    Anonymous says:

    Hi, I have been using MSGF+ for a while and thanks for writing this up. But I can't find the MSGF+ webpage anymore. Did you remove that site? Are you going to keep updating this software? Thanks.

  10. May 13, 2013

    Jeremy Carver says:

    Thanks for letting us know about your problems finding the MS-GF+ web page. There was a small server maintenance issue that caused that page to briefly disappear. However, it should be back up now - http://proteomics.ucsd.edu/Software/MSGFPlus.html. Please post here if you have any further problems.

  11. May 31, 2013

    Anonymous says:

    Hi, I am new to MS-GF+. Does it handle SILAC labeled PO4 data?

    Thanks!

  12. Jul 16, 2013

    Anonymous says:

    Hi, I was wondering if MS-GF+ has or will eventually feature the ability to have unrestricted searches like Inspect does using MS-Alignment to detect many more PTMs?

    Thanks.

  13. Aug 01, 2013

    Anonymous says:

    Any way to speed up the searches? It's taking several hours to search about 30K Q-Exactive spectra against a database with about 60K sequences in it. 

    Thanks!

  14. Aug 15, 2013

    Anonymous says:

    Is there an option to set the number of missed cleavages allowed? How do I set the fragment ion mass tolerance?

  17. Nov 28, 2013

    Anonymous says:

    Is there anyone who can tell me what fragment-ion tolerance is used by MS-GF+?

  18. Dec 04, 2013

    Anonymous says:

    Any way to set the minimum number of ions? Currently MSGF is ignoring MS/MS spectra with fewer than 20 peaks. Thanks!

  19. Feb 18, 2014

    Anonymous says:

    Hi

    Will MS-GF+ be updated at a later date to run on a cluster with different schedulers? I was informed that it only works with DRMAA. Unfortunately I only have PBS Pro and Slurm installed on the machines I have access to and I'm unable to change that. Is there a possible work around?

    The mass spectra and databases I'm working with are very large.

    Please contact me on bchapman@ccg.murdoch.edu.au .

    Thanks.

    Brett

  20. Feb 20, 2014

    Anonymous says:

    Brett,

    I don't know about running MSGF+ with other than DRMAA, but a slurm-drmaa plugin exists (if installing that is possible for you). http://apps.man.poznan.pl/trac/slurm-drmaa

    cheers, jorrit

  21. Mar 12, 2014

    Anonymous says:

    Hello,

    This has been asked a couple of times but I have not yet seen an answer. What mass tolerance is MS-GF+ using for fragment ions? I am specifically concerned with a high-low instrument (running with instrument ID 0).

    Thanks,

    Joel