cascade-search
Usage:
crux cascade-search [options] <tide spectra file>+ <database-series>
Description:
Cascade search is a general procedure for incorporating information about peptide groups into the database search and confidence estimation procedure. Peptides may be grouped according to, for example, their enzymatic properties (zero, one, or two enzymatic termini) or the presence of different types or numbers of variable modifications. The algorithm works on a series of databases, each corresponding to a different peptide group. The databases are searched in series, and after each search, any spectrum that is identified at a user-specified confidence threshold is sequestered from subsequent searches. The full cascade search procedure is described in this article:
Attila Kertesz-Farkas, Uri Keich and William Stafford Noble. "Tandem mass spectrum identification via cascaded search." Journal of Proteome Research. 14(8):3027-38, 2015.
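The sequestering loop described above can be sketched roughly as follows. This is a minimal illustration, not the Crux implementation: the `search` callback, the spectrum identifiers and the database names are all hypothetical, and the callback is assumed to return one q-value per spectrum.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: searches the given spectra against one database
# and returns a q-value for the best match of each spectrum.
SearchFn = Callable[[List[str], str], Dict[str, float]]


def cascade(
    spectra: List[str],
    databases: List[str],
    search: SearchFn,
    q_threshold: float = 0.01,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Search databases in order, removing confidently identified spectra."""
    remaining = list(spectra)
    accepted: List[Tuple[str, str]] = []
    for db in databases:
        if not remaining:
            break
        q_values = search(remaining, db)
        # Spectra with a q-value below the threshold are accepted here
        # and excluded from all subsequent searches.
        confident = {s for s in remaining if q_values.get(s, 1.0) < q_threshold}
        accepted.extend((s, db) for s in remaining if s in confident)
        remaining = [s for s in remaining if s not in confident]
    return accepted, remaining


def toy_search(spectra: List[str], db: str) -> Dict[str, float]:
    """Made-up q-values, used only to exercise the loop."""
    table = {
        "db1": {"s1": 0.001, "s2": 0.5},
        "db2": {"s2": 0.005, "s3": 0.2},
    }
    return {s: table[db].get(s, 1.0) for s in spectra}


accepted, unidentified = cascade(["s1", "s2", "s3"], ["db1", "db2"], toy_search)
```

In this toy run, `s1` is accepted in the first database and never reaches the second, which is the behaviour the description attributes to the cascade.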
Input:
tide spectra file+
– The name of one or more files from which to parse the fragmentation spectra, in any of the file formats supported by ProteoWizard. Alternatively, the argument may be one or more binary spectrum files produced by a previous run of crux tide-search using the store-spectra parameter.
database-series
– A comma-separated series of databases, each generated by tide-index. Cascade search searches the input spectra against these index files iteratively, in the given order.
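As an illustration of how the two arguments fit together, the following sketch assembles a command line in Python without running it. The file and index names are hypothetical. The only flags used are ones documented on this page, and placing the options before the positional arguments follows the usage line above.

```python
import shlex
from typing import List, Optional


def build_cascade_command(
    spectra_files: List[str],
    index_dirs: List[str],
    q_value_threshold: Optional[float] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `crux cascade-search`."""
    if not spectra_files or not index_dirs:
        raise ValueError("need at least one spectrum file and one database")
    cmd = ["crux", "cascade-search"]
    if q_value_threshold is not None:
        cmd += ["--q-value-threshold", str(q_value_threshold)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    cmd += list(spectra_files)
    # The database series is a single argument, joined by commas
    # in the order the databases should be searched.
    cmd.append(",".join(index_dirs))
    return cmd


# Hypothetical inputs: one MS2 file and two tide-index outputs.
command = build_cascade_command(
    ["run1.ms2"], ["idx_tryptic", "idx_semitryptic"], q_value_threshold=0.01
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where Crux is installed.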
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
cascade-search.target.txt
– a tab-delimited text file containing the target PSMs accepted at a pre-defined FDR.
cascade-search.log.txt
– a log file containing a copy of all messages that were printed to stderr.
cascade-search.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
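Because the target file is tab-delimited, it can be read with a standard CSV reader. The sketch below parses a small in-memory sample rather than a real output file. The "xcorr score" column name is taken from the option descriptions on this page, while the "scan" column and the sample values are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_psms(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited PSM table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up stand-in for the contents of cascade-search.target.txt.
sample = io.StringIO("scan\txcorr score\n10\t2.5\n11\t1.2\n")
psms = read_psms(sample)

# Pick the row with the highest XCorr; values are parsed as floats.
best = max(psms, key=lambda row: float(row["xcorr score"]))
```

For a real run, the same function could be given an open handle on the file in the output directory.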
Options:
-
cascade-search options
--q-value-threshold <float>
– The q-value threshold used by cascade search. Spectra identified with a q-value below this threshold in one search will be excluded from all subsequent searches. Default = 0.01.
--estimation-method mix-max|tdc|peptide-level
– Specify the method used to estimate q-values: the mix-max procedure or target-decoy competition. The peptide-level method is applied for spectrum-centric search; it eliminates any PSMs for which there exists a better-scoring PSM involving the same peptide. Default = tdc.
--score <string>
– Specify the column (for tab-delimited input) or tag (for XML input) used as input to the q-value estimation procedure. If this parameter is unspecified, then assign-confidence tries to search for the "xcorr score", "evalue" (comet), and "exact p-value" score fields, in this order, in the input file. Default = <empty>.
--sidak T|F
– Adjust the scores using the Sidak adjustment and report them in a new column in the output file. Note that this adjustment only makes sense if the given scores are p-values, and that it requires the presence of the "distinct matches/spectrum" feature for each PSM. Default = false.
--combine-charge-states T|F
– Set this parameter to T in order to combine charge states with peptide sequences in peptide-centric search. Works only if peptide-level=T. Default = false.
--combine-modified-peptides T|F
– Set this parameter to T in order to treat peptides carrying different or no modifications as being the same. Works only if peptide-level=T. Default = false.
--file-column T|F
– Include the file column in tab-delimited output. Default = true.
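For reference, the standard Sidak correction for a p-value p over n candidates is 1 - (1 - p)^n. The sketch below implements that textbook formula. It assumes, based on the --sidak description, that n corresponds to the "distinct matches/spectrum" value of each PSM, and it is not taken from the Crux source.

```python
import math


def sidak_adjust(p_value: float, num_candidates: int) -> float:
    """Return 1 - (1 - p)^n, the Sidak-adjusted p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    if p_value == 1.0:
        return 1.0  # log1p(-1.0) is undefined, so handle this edge directly
    # -expm1(n * log1p(-p)) equals 1 - (1 - p)**n but keeps precision
    # when p is very small.
    return -math.expm1(num_candidates * math.log1p(-p_value))


adjusted = sidak_adjust(0.5, 2)
```

The adjusted value is never smaller than the raw p-value, which is why the adjustment is only meaningful when the input scores are p-values.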
-
Search parameters
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The precursor window units depend upon precursor-window-type. Default = 3.
--precursor-window-type mass|mz|ppm
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass +/- precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the average amino acid masses plus 18 Da for the terminal OH group. The mz window-type calculates the window as spectrum precursor m/z +/- precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as the spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound is defined as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = mass.
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0.
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = 1e+09.
--min-peaks <integer>
– The minimum number of peaks a spectrum must have for it to be searched. Default = 20.
--spectrum-charge 1|2|3|all
– The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge state will be searched. Default = all.
--scan-number <string>
– A single scan number or a range of numbers to be searched. A range should be specified as 'first-last', which will include scans 'first' and 'last'. Default = <empty>.
--compute-sp T|F
– Compute the preliminary score Sp for all candidate peptides. Report this score in the output, along with the corresponding rank, the number of matched ions and the total number of ions. This option is recommended if results are to be analyzed by Percolator or Barista. If sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Note that the Sp computation requires re-processing each observed spectrum, so turning on this switch involves significant computational overhead. Default = false.
--remove-precursor-peak T|F
– If true, all peaks around the precursor m/z will be removed, within a range specified by the --remove-precursor-tolerance option. Default = false.
--remove-precursor-tolerance <float>
– This parameter specifies the tolerance (in Th) around each precursor m/z that is removed when the --remove-precursor-peak option is invoked. Default = 1.5.
--exact-p-value T|F
– Enable the calculation of exact p-values for the XCorr score as described in this article. Calculation of p-values increases the running time but also increases the number of identifications at a fixed confidence threshold. The p-values will be reported in a new column with header "exact p-value", and the "xcorr score" column will be replaced with a "refactored xcorr" column. Note that, currently, p-values can only be computed when the mz-bin-width parameter is set to its default value. Variable and static mods are allowed on non-terminal residues in conjunction with p-value computation, but currently only static mods are allowed on the N-terminus, and no mods on the C-terminus. Default = false.
--use-neutral-loss-peaks T|F
– Controls whether neutral loss ions are considered in the search. Two types of neutral losses are included and are applied only to singly charged b- and y-ions: loss of ammonia (NH3, 17.0086343 Da) and loss of water (H2O, 18.0091422 Da). Each neutral loss peak has intensity 1/5 of the primary peak. Default = true.
--use-flanking-peaks T|F
– Include flanking peaks around singly charged b and y theoretical ions. Each flanking peak occurs in the adjacent m/z bin and has half the intensity of the primary peak. Default = false.
--mz-bin-width <float>
– Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula for computing the discretized m/z value is floor((x/mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. A value of 1.0005079 is recommended for low-resolution ion trap MS/MS data, and 0.02 for high-resolution MS/MS data. Default = 1.0005079.
--mz-bin-offset <float>
– In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). Default = 0.4.
--max-precursor-charge <integer>
– The maximum charge state of a spectrum to consider in the search. Default = 5.
--peptide-centric-search T|F
– Carries out a peptide-centric search. For each peptide the top-scoring spectra are reported, in contrast to the standard spectrum-centric search where the top-scoring peptides are reported. Note that in this case the "xcorr rank" column will contain the rank of the given spectrum with respect to the given candidate peptide, rather than vice versa (which is the default). Default = false.
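Two of the formulas quoted in this section, the mass and ppm precursor windows and the m/z discretization, can be written out directly. The sketch below transcribes them as stated in the option text. The function names are hypothetical, the spectrum mass is taken as an already-computed input, and the mz window type is omitted because its m/z-to-mass conversion is not spelled out here.

```python
import math
from typing import Tuple


def mass_window(
    spectrum_mass: float, precursor_window: float, window_type: str = "mass"
) -> Tuple[float, float]:
    """Return (lower, upper) peptide mass bounds for a spectrum mass."""
    if window_type == "mass":
        return spectrum_mass - precursor_window, spectrum_mass + precursor_window
    if window_type == "ppm":
        fraction = precursor_window / 1_000_000
        # Bounds as given in the --precursor-window-type description.
        return spectrum_mass / (1.0 + fraction), spectrum_mass / (1.0 - fraction)
    raise ValueError(f"unsupported window type in this sketch: {window_type}")


def mz_bin(mz: float, bin_width: float = 1.0005079, bin_offset: float = 0.4) -> int:
    """Discretize an m/z value: floor((x / width) + 1.0 - offset)."""
    return math.floor(mz / bin_width + 1.0 - bin_offset)


low, high = mass_window(1000.0, 3.0)  # default window of 3 mass units
ppm_low, ppm_high = mass_window(1000.0, 10.0, "ppm")
bin_index = mz_bin(500.0)
```

Note that the ppm bounds are not symmetric around the spectrum mass, since both are obtained by division rather than by adding and subtracting a fixed amount.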
-
CPU threads
--num-threads <integer>
– Set to 0 to poll the CPU for the number of threads to use; otherwise, specify the number of threads directly. Default = 0.
-
Input and output
--decoy-prefix <string>
– Specifies the prefix of the protein names that indicate a decoy. Default = decoy_.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--list-of-files T|F
– Specify that the search results are provided as lists of files, rather than as individual files. Default = false.
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = <empty>.
--store-spectra <string>
– Specify the name of the file where the binarized fragmentation spectra will be stored. Subsequent runs of crux tide-search will execute more quickly if provided with the spectra in binary format. The filename is specified relative to the current working directory, not the Crux output directory (as specified by --output-dir). This option is not valid if multiple input spectrum files are given. Default = <empty>.
--store-index <string>
– When providing a FASTA file as the index, the generated binary index will be stored at the given path. This option has no effect if a binary index is provided as the index. Default = <empty>.
--concat T|F
– When set to T, target and decoy search results are reported in a single file, and only the top-scoring N matches (as specified via --top-match) are reported for each spectrum, irrespective of whether the matches involve target or decoy peptides. Default = false.
--print-search-progress <integer>
– Show search progress by printing every n spectra searched. Set to 0 to show no search progress. Default = 1000.
--spectrum-parser pwiz|mstoolkit
– Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser can read the MS/MS file formats listed here. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
--use-z-line T|F
– Specify whether, when parsing an MS2 spectrum file, Crux obtains the precursor mass information from the "S" line or the "Z" line. Default = true.
--txt-output T|F
– Output a tab-delimited results file to the output directory. Default = true.
--sqt-output T|F
– Output an SQT results file to the output directory. Note that if sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Default = false.
--pepxml-output T|F
– Output a pepXML results file to the output directory. Default = false.
--mzid-output T|F
– Output an mzIdentML results file to the output directory. Default = false.
--pin-output T|F
– Output a Percolator input (PIN) file to the output directory. Default = false.