Notes on ProteoSAFe extensions based on OpenMS functionality
Execution parameters per OpenMS module and recommended values
Tweaking this set of FeatureFinder parameters gave me the highest number of features for both labeled and unlabeled data sets. I tried different combinations of parameters. Some are not shown below: tweaking them made little difference, or even yielded fewer features than expected, so I left them at their defaults. The parameters below made a large difference in the number of features found, and the recommended values are based on those results.
|Parameter||Default||Recommended||Description|
|m/z_tolerance||0.03||0.09||Def: Tolerated m/z deviation of peaks belonging to the same mass trace. This number should be below 1/charge_high, which in our case is 1/8 = 0.125; 0.09 leaves more room to find features (a difference of a few hundred features compared with the default of 0.03).|
|min_spectra||10||3||Def: Number of spectra that have to show a similar peak mass in a mass trace. Here we have lowered the minimum number of spectra with similar peak masses required to form a feature; 10 is very restrictive. This increased the number of features by a few thousand compared with the default of 10.|
|slope_bound||0.1||0.3||See comment # 1 below.|
|charge_high||4||8||Def: Highest charge to search for.|
|min_score (seed)||0.8||0||Def: Minimum seed score a peak has to reach to be used as a seed.|
|min_score (feature)||0.7||0||Def: Feature score threshold for a feature to be reported.|
|min_isotopic_fit||0.8||0.5||Def: Minimum isotope fit of the feature before model fitting.|
|rt_shape||symmetric||symmetric||See comment # 2 below.|
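The constraint on m/z_tolerance in the table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `mz_tolerance_is_safe` is hypothetical and not part of OpenMS or ProteoSAFe, and the only rule it encodes is the one stated in the table (the tolerance should stay below 1/charge_high, roughly the spacing between isotope peaks at the highest charge searched).

```python
# Hypothetical helper, not an OpenMS or ProteoSAFe function.
# Rule taken from the table: m/z_tolerance should be below 1 / charge_high.

def mz_tolerance_is_safe(mz_tolerance, charge_high):
    """Return True if the tolerance is below the 1/charge_high limit."""
    limit = 1.0 / charge_high
    return mz_tolerance < limit

# Values from the table: recommended tolerance 0.09, charge_high 8.
limit = 1.0 / 8
print(limit)                              # 0.125
print(mz_tolerance_is_safe(0.09, 8))      # True: 0.09 < 0.125
print(mz_tolerance_is_safe(0.13, 8))      # False: 0.13 exceeds the limit
```

If charge_high is raised further, the limit shrinks (for example 1/10 = 0.1), so the tolerance may need to be lowered again.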
comment # 1:
This is the value I got from Stephan Aiche; according to him, a lower value is better. (I had been using a much higher number, which did not change the number of features much compared with the default.) He explains why some features were missing when I had set this parameter very high:
The slope_bound parameter was set too high. If you put it down to a value of 0.3 the feature will be found correctly. Let me (try to) explain this further. When the FeatureFinder algorithm tries to collect peaks for a mass trace it starts from the maximal peak and the collected intensities should only go down, since we are moving from the apex of the elution profile downwards. So the average slope should always be negative. In some cases the elution profiles are distorted therefore you can allow also a slight increase in intensity (e.g., 0.3). But if you choose a value of 2 (like in the attached ini file) the FeatureFinder will go beyond the boundaries of the elution profile and collect also peaks that do not belong to the feature. This causes further problems.
1. the shape of the elution profile is not gaussian (or asymmetric gaussian) any more
2. the elution profiles of the other isotopic peaks are in relation to the first too small (since you will not find such a long profile for all mass traces)
Both problems cause a bad fit and with it will hinder a correct identification of the feature.
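The effect described above can be illustrated with a toy model. This is not the FeatureFinder's actual implementation: the function `extend_trace`, the stopping rule (relative rise between consecutive points compared against the bound), and the intensity values are all assumptions made up for illustration. It only shows why a small bound stops at the edge of the elution profile while a large bound runs into a neighboring peak.

```python
# Toy illustration only; NOT the OpenMS algorithm. The stopping rule and
# the data are assumptions chosen to mirror the explanation in the text.

def extend_trace(intensities, apex, slope_bound):
    """Walk right from the apex and return the last index kept in the trace.

    Extension stops when the relative rise between two consecutive points
    exceeds slope_bound (or when a non-positive intensity is reached).
    """
    end = apex
    for i in range(apex, len(intensities) - 1):
        if intensities[i] <= 0:
            break
        rise = (intensities[i + 1] - intensities[i]) / intensities[i]
        if rise > slope_bound:
            break
        end = i + 1
    return end

# A decaying profile with one small bump (50 -> 55), followed by a
# second, unrelated peak starting at index 6.
profile = [100, 80, 50, 55, 30, 10, 25, 60, 90, 70, 40]

print(extend_trace(profile, apex=0, slope_bound=0.3))  # 5: stops before the second peak
print(extend_trace(profile, apex=0, slope_bound=2.0))  # 10: runs through the second peak
```

With a bound of 0.3 the small bump inside the profile is tolerated, but the steep rise into the second peak ends the trace. With a bound of 2 the trace swallows the neighboring peak, which is the situation that leads to the bad fit described above.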
comment # 2:
rt_shape has two options: symmetric and asymmetric. Intuitively, it makes sense to use the asymmetric shape to gather all possible features, but in fact symmetric gives 5-15% more features than asymmetric. For example, for the SILAC dataset 2to1.mzXML, keeping all the recommended values but changing rt_shape to asymmetric gave me 32,376 features, which is ~4000 fewer than I got with symmetric. (I think this is worth mentioning, which is why rt_shape is listed above.)
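As a quick sanity check, the example numbers can be compared against the quoted 5-15% range. The difference is only given approximately in the text (~4000), so the symmetric count computed here is an estimate, not a measured value.

```python
# Numbers from the text; the gap is approximate ("~4000"), so the
# symmetric count below is an estimate rather than a measured result.
asymmetric = 32376
approx_gap = 4000

symmetric_estimate = asymmetric + approx_gap
gain = approx_gap / asymmetric  # extra features relative to asymmetric

print(symmetric_estimate)   # 36376
print(f"{gain:.1%}")        # 12.4%
```

An estimated gain of about 12% falls inside the 5-15% range observed across data sets.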
If the MS1 peaks, when visualized, are very close together (as can happen with high-resolution data), it is recommended to run OpenMS's PeakPicker first to reduce the size of the dataset, and then run the FeatureFinder. Otherwise the FeatureFinder does not work as desired and takes a very long time. (ProteoSAFe's MSAssessment extension already includes the peak picking.)
If debug is set to True, you get all the intermediate files (in a newly created folder debug/), such as seed files, which are useful for debugging, for example to see which features were not recognized.
* "SILACFeatureFinder" (?)
* OpenMS v X.YY
* "Binal reports" (name?) v X.YY