search-for-xlinks
Usage:
crux search-for-xlinks [options] <ms2 file> <protein fasta file> <link sites> <link mass>
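As an illustration of the usage line above, the following sketch assembles an argument list for one invocation. The helper `build_command`, the file names, and the link mass value are all hypothetical placeholders; only the flag names and the argument order are taken from this page.

```python
import shlex

def build_command(ms2_file, fasta_file, link_sites, link_mass, **options):
    """Assemble an argv list for `crux search-for-xlinks` (illustrative helper).

    Keyword options become `--name value` pairs; underscores in the Python
    keyword are mapped to hyphens in the flag name.
    """
    argv = ["crux", "search-for-xlinks"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # boolean options are documented as T|F
        argv += ["--" + name.replace("_", "-"), str(value)]
    # Positional arguments follow the options, in the order of the usage line.
    argv += [ms2_file, fasta_file, link_sites, str(link_mass)]
    return argv

cmd = build_command(
    "sample.ms2",       # hypothetical spectrum file
    "proteins.fasta",   # hypothetical protein database
    "K:K",              # allow lysine-to-lysine cross-links
    138.0680796,        # illustrative linker mass in Da
    output_dir="xlink-results",
    overwrite=True,
    top_match=5,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the placeholder paths are replaced with real files.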
Description:
This command compares a set of spectra to cross-linked peptides derived from a protein database in FASTA format. For each spectrum, the program generates a list of candidate molecules, including linear peptides, dead-end products, self-loop products and cross-linked products, with masses that lie within a specified range of the spectrum's precursor mass. These candidate molecules are ranked using XCorr, and the XCorr scores are assigned statistical confidence estimates using an empirical curve fitting procedure.
The algorithm is described in more detail in the following article:
Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.
Input:
ms2 file
– File containing the spectra to be searched.
protein fasta file
– The name of the FASTA-format file from which to retrieve proteins.
link sites
– A comma-delimited list of the amino acid pairs that the cross-linker can join. For example, "A:K,A:D" means that the cross-linker can attach A to K or A to D. Cross-links involving the N-terminus of a protein can be specified by using "nterm" as a link site. For example, "nterm:K" means that a cross-link can attach a protein's N-terminus to a lysine.
link mass
– The mass modification of the linker when attached to a peptide.
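The link sites syntax can be made concrete with a small parser. This is a minimal sketch, not the Crux implementation: the function names are hypothetical, and treating each pair as unordered (so that "A:K" also permits K to A) is an assumption.

```python
def parse_link_sites(spec):
    """Split a link-sites string such as "A:K,A:D" into (site, site) pairs."""
    pairs = []
    for item in spec.split(","):
        left, sep, right = item.strip().partition(":")
        if not sep or not left or not right:
            raise ValueError(f"expected '<site>:<site>', got {item!r}")
        pairs.append((left, right))
    return pairs

def can_link(pairs, site_a, site_b):
    """Return True if the two sites match an allowed pair, in either order.

    Order-insensitivity is an assumption made for this sketch.
    """
    return (site_a, site_b) in pairs or (site_b, site_a) in pairs

pairs = parse_link_sites("A:K,A:D")
print(pairs)
print(can_link(pairs, "K", "A"))
print(parse_link_sites("nterm:K"))
```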
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
search-for-xlinks.target.txt
– a tab-delimited text file containing the peptide-spectrum matches (PSMs). See the txt file format for a list of the fields.
search-for-xlinks.decoy.txt
– a tab-delimited text file containing the decoy PSMs. See the txt file format for a list of the fields.
search-for-xlinks.qvalues.txt
– a tab-delimited text file containing the top ranked PSMs with calculated q-values. See the txt file format for a list of the fields.
search-for-xlinks.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
search-for-xlinks.log.txt
– a log file containing a copy of all messages that were printed to stderr.
Options:
-
search-for-xlinks options
--cmod <string>
– Specify a variable modification to apply to the C-terminus of peptides. Note that this parameter only takes effect when specified in the parameter file. Default = NO MODS
--nmod <string>
– Specify a variable modification to apply to the N-terminus of peptides. Note that this parameter only takes effect when specified in the parameter file. Default = NO MODS
-
Peptide properties
--min-mass <float>
– The minimum mass (in Da) of peptides to consider. Default = 200
--max-mass <float>
– The maximum mass (in Da) of peptides to consider. Default = 7200
--min-length <integer>
– The minimum length of peptides to consider. Default = 6
--max-length <integer>
– The maximum length of peptides to consider. Default = 50
--isotopic-mass average|mono
– Specify the type of isotopic masses to use when calculating the peptide mass. Default = mono
-
Amino acid modifications
--mod <string>
– Consider modifications on any amino acid in the aa list, with at most max-per-peptide modifications in one peptide. The parameter takes the form <mass change>:<aa list>:<max per peptide>:<prevents cleavage>:<prevents cross-link>. This parameter may be included multiple times with different values, so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The "prevents cleavage" and "prevents cross-link" arguments are optional T/F arguments that describe whether the modification prevents enzymatic cleavage or cross-linking, respectively. This option is only available when use-old-xlink=F. Note that this parameter only takes effect when specified in the parameter file. Default = NO MODS
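The colon-separated form of the --mod value can be illustrated with a short parser. This is a sketch under stated assumptions: the `ModSpec` class and function names are hypothetical, and defaulting the two optional T/F fields to F when they are omitted is an assumption, not documented behavior.

```python
from dataclasses import dataclass

@dataclass
class ModSpec:
    mass_change: float
    residues: str
    max_per_peptide: int
    prevents_cleavage: bool = False  # assumed default when the field is omitted
    prevents_xlink: bool = False     # assumed default when the field is omitted

def parse_flag(text):
    """Convert a T/F field to a bool, rejecting anything else."""
    if text not in ("T", "F"):
        raise ValueError(f"expected T or F, got {text!r}")
    return text == "T"

def parse_mod(spec):
    """Parse <mass change>:<aa list>:<max per peptide>[:<T|F>[:<T|F>]]."""
    fields = spec.split(":")
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"expected 3 to 5 colon-separated fields, got {spec!r}")
    flags = [parse_flag(f) for f in fields[3:]]
    return ModSpec(float(fields[0]), fields[1], int(fields[2]), *flags)

# Illustrative values: up to two +15.9949 Da modifications on M per peptide.
print(parse_mod("15.9949:M:2"))
# Illustrative values: a modification on S, T or Y that prevents cross-linking.
print(parse_mod("79.966:STY:3:F:T"))
```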
--max-mods <integer>
– The maximum number of modifications that can be applied to a single peptide. Default = 255
--A <float>
– Change the mass of all amino acids 'A' by the given amount. Default = 0
--C <float>
– Change the mass of all amino acids 'C' by the given amount. Default = 57.021464
--D <float>
– Change the mass of all amino acids 'D' by the given amount. Default = 0
--E <float>
– Change the mass of all amino acids 'E' by the given amount. Default = 0
--F <float>
– Change the mass of all amino acids 'F' by the given amount. Default = 0
--G <float>
– Change the mass of all amino acids 'G' by the given amount. Default = 0
--H <float>
– Change the mass of all amino acids 'H' by the given amount. Default = 0
--I <float>
– Change the mass of all amino acids 'I' by the given amount. Default = 0
--K <float>
– Change the mass of all amino acids 'K' by the given amount. Default = 0
--L <float>
– Change the mass of all amino acids 'L' by the given amount. Default = 0
--M <float>
– Change the mass of all amino acids 'M' by the given amount. Default = 0
--N <float>
– Change the mass of all amino acids 'N' by the given amount. Default = 0
--P <float>
– Change the mass of all amino acids 'P' by the given amount. Default = 0
--Q <float>
– Change the mass of all amino acids 'Q' by the given amount. Default = 0
--R <float>
– Change the mass of all amino acids 'R' by the given amount. Default = 0
--S <float>
– Change the mass of all amino acids 'S' by the given amount. Default = 0
--T <float>
– Change the mass of all amino acids 'T' by the given amount. Default = 0
--V <float>
– Change the mass of all amino acids 'V' by the given amount. Default = 0
--W <float>
– Change the mass of all amino acids 'W' by the given amount. Default = 0
--Y <float>
– Change the mass of all amino acids 'Y' by the given amount. Default = 0
-
Decoy database generation
--seed <string>
– When given an unsigned integer value, seeds the random number generator with that value. When given the string "time", seeds the random number generator with the system time. Default = 1
-
Enzymatic digestion
--enzyme no-enzyme|trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|custom-enzyme
– Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}). Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default = trypsin
--custom-enzyme <string>
– Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains residues required or prohibited before the cleavage site, and the second list contains residues required or prohibited after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D]. Default = <empty>
--digestion full-digest|partial-digest|non-specific-digest
– Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default = full-digest
--missed-cleavages <integer>
– Maximum number of missed cleavages per peptide to allow in enzymatic digestion. Default = 0
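The digestion rule syntax described for --enzyme and --custom-enzyme can be sketched as a small full-digest routine. This is an illustration rather than the Crux implementation: the function names are hypothetical, an empty list such as [] is assumed to mean "no constraint" (inferred from the built-in rules listed above), and length and mass filters are not applied.

```python
def parse_side(text):
    """Return (residues, required) for one side of a rule, e.g. "[RK]" or "{P}"."""
    if text.startswith("[") and text.endswith("]"):
        return text[1:-1], True    # square brackets: residues required
    if text.startswith("{") and text.endswith("}"):
        return text[1:-1], False   # curly braces: residues prevent cleavage
    raise ValueError(f"malformed rule side: {text!r}")

def side_allows(side, residue):
    """Check one residue against one side of the rule."""
    residues, required = side
    if not residues:
        return True  # assumption: an empty list imposes no constraint
    match = "X" in residues or residue in residues
    return match if required else not match

def digest(sequence, rule, missed_cleavages=0):
    """Fully digest a sequence, allowing up to `missed_cleavages` skipped sites."""
    before, after = (parse_side(s) for s in rule.split("|"))
    cuts = [0]
    for i in range(1, len(sequence)):
        # A cut at position i separates sequence[i - 1] from sequence[i].
        if side_allows(before, sequence[i - 1]) and side_allows(after, sequence[i]):
            cuts.append(i)
    cuts.append(len(sequence))
    peptides = []
    for start in range(len(cuts) - 1):
        last = min(start + 2 + missed_cleavages, len(cuts))
        for end in range(start + 1, last):
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides

print(digest("AKRPGKD", "[RK]|{P}"))                      # trypsin rule
print(digest("AKRPGKD", "[RK]|{P}", missed_cleavages=1))
print(digest("AKDGD", "[X]|[D]"))                         # AspN rule
```

In the first call, no cut is made between R and P because P appears in the prohibited list after the cleavage site.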
-
Search parameters
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = 1e+09
--spectrum-charge 1|2|3|all
– The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge state will be searched. Default = all
--compute-sp T|F
– Compute the preliminary score Sp for all candidate peptides. Report this score in the output, along with the corresponding rank, the number of matched ions and the total number of ions. This option is recommended if results are to be analyzed by Percolator or Barista. If sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Note that the Sp computation requires re-processing each observed spectrum, so turning on this switch involves significant computational overhead. Default = false
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The precursor window units depend upon precursor-window-type. Default = 3
--precursor-window-type mass|mz|ppm
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass +/- precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the average amino acid masses plus 18 Da for the terminal H and OH groups. The mz window-type calculates the window as spectrum precursor m/z +/- precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as the spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound is defined as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = mass
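The three window types described for --precursor-window-type can be compared with a short calculation. This is a sketch, not Crux code: the function names are hypothetical, and the m/z-to-mass conversion (subtract one proton mass per charge, then multiply by the charge) along with the proton mass constant are assumptions about a step the text does not spell out.

```python
PROTON_MASS = 1.00727646688  # Da; assumed constant for the m/z-to-mass conversion

def mz_to_mass(mz, charge):
    """Assumed conversion from precursor m/z to neutral mass."""
    return (mz - PROTON_MASS) * charge

def mass_window(precursor_mz, charge, window, window_type="mass"):
    """Return the (lower, upper) candidate mass bounds for one spectrum."""
    if window_type == "mass":
        mass = mz_to_mass(precursor_mz, charge)
        return mass - window, mass + window
    if window_type == "mz":
        # Apply the window on the m/z axis, then convert both ends to mass.
        return (mz_to_mass(precursor_mz - window, charge),
                mz_to_mass(precursor_mz + window, charge))
    if window_type == "ppm":
        mass = mz_to_mass(precursor_mz, charge)
        return (mass / (1.0 + window / 1000000),
                mass / (1.0 - window / 1000000))
    raise ValueError(f"unknown window type: {window_type!r}")

for kind, size in (("mass", 3.0), ("mz", 3.0), ("ppm", 10.0)):
    low, high = mass_window(500.0, 2, size, kind)
    print(f"{kind:>4}: {low:.4f} to {high:.4f} (width {high - low:.4f})")
```

Note that an mz window of a given size covers a mass range that grows with the precursor charge, whereas a mass window has the same width for every charge state.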
--precursor-window-weibull <float>
– Search decoy peptides within +/- precursor-window-weibull of the precursor mass. The resulting scores are used only for fitting the Weibull distribution. Default = 20
--precursor-window-type-weibull mass|mz|ppm
– Window type to use in conjunction with the precursor-window-weibull parameter. Default = mass
--min-weibull-points <integer>
– Keep shuffling and collecting XCorr scores until the minimum number of points for Weibull fitting (using targets and decoys) is achieved. Default = 4000
--max-ion-charge <string>
– Predict theoretical ions up to max charge state (1, 2, ... ,6) or up to the charge state of the peptide ("peptide"). If the max-ion-charge is greater than the charge state of the peptide, then the maximum is the peptide charge. Default = peptide
--scan-number <string>
– A single scan number or a range of numbers to be searched. A range should be specified as 'first-last', which will include scans 'first' and 'last'. Default = <empty>
--mz-bin-width <float>
– Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula for computing the discretized m/z value is floor((x/mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. A value of 1.0005079 is recommended for low-resolution ion trap MS/MS data, and 0.02 for high-resolution MS/MS data. Default = 1.0005079
--mz-bin-offset <float>
– In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). Default = 0.4
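The discretization formula given for --mz-bin-width, floor((x/mz-bin-width) + 1.0 - mz-bin-offset), can be evaluated directly. The function name `mz_bin` is a hypothetical label for this sketch; the default arguments are the documented defaults.

```python
import math

def mz_bin(mz, bin_width=1.0005079, bin_offset=0.4):
    """Discretize an m/z value using the formula quoted in the option text."""
    return math.floor((mz / bin_width) + 1.0 - bin_offset)

# Two nearby peaks fall into the same bin at the low-resolution default width.
print(mz_bin(500.0), mz_bin(500.2))
# With the width recommended for high-resolution data, they are separated.
print(mz_bin(500.0, bin_width=0.02), mz_bin(500.2, bin_width=0.02))
```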
--mod-mass-format mod-only|total|separate
– Specify how sequence modifications are reported in various output files. Each modification is reported as a number enclosed in square brackets following the modified residue; however, the number may correspond to one of three different masses: (1) 'mod-only' reports the value of the mass shift induced by the modification; (2) 'total' reports the mass of the residue with the modification (residue mass plus modification mass); (3) 'separate' is the same as 'mod-only', but multiple modifications to a single amino acid are reported as a comma-separated list of values. For example, suppose amino acid D has an unmodified mass of 115 as well as two modifications of masses +14 and +2. In this case, the amino acid would be reported as D[16] with 'mod-only', D[131] with 'total', and D[14,2] with 'separate'. Default = mod-only
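The worked example for --mod-mass-format (residue D with mass 115 and modifications of +14 and +2) can be reproduced with a few lines. The function `format_residue` is hypothetical, and the number formatting used here is an assumption; the precision written to real output files may differ.

```python
def format_residue(residue, residue_mass, mod_masses, mode="mod-only"):
    """Render one residue and its modifications in the chosen reporting mode."""
    if not mod_masses:
        return residue
    if mode == "mod-only":
        label = f"{sum(mod_masses):g}"                   # combined mass shift
    elif mode == "total":
        label = f"{residue_mass + sum(mod_masses):g}"    # residue plus shifts
    elif mode == "separate":
        label = ",".join(f"{m:g}" for m in mod_masses)   # one value per mod
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{residue}[{label}]"

for mode in ("mod-only", "total", "separate"):
    print(mode, format_residue("D", 115, [14, 2], mode))
```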
--use-flanking-peaks T|F
– Include flanking peaks around singly charged b and y theoretical ions. Each flanking peak occurs in the adjacent m/z bin and has half the intensity of the primary peak. Default = false
--fragment-mass average|mono
– Specify which isotopes to use in calculating fragment ion mass. Default = mono
--isotope-windows <string>
– Provides a list of isotopic windows to search. For example, -1,0,1 will search in three disjoint windows: (1) precursor_mass - neutron_mass +/- window, (2) precursor_mass +/- window, and (3) precursor_mass + neutron_mass +/- window. The window size is defined from the precursor-window and precursor-window-type parameters. This option is only available when use-old-xlink=F. Default = 0
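The windows described for --isotope-windows can be enumerated as follows. This sketch covers only a mass-type window; the function name is hypothetical and the neutron mass constant is an assumed value that may not match the one Crux uses.

```python
NEUTRON_MASS = 1.00866491588  # Da; assumed value for neutron_mass

def isotope_windows(precursor_mass, window, isotopes="0"):
    """Return one (lower, upper) mass range per listed isotope offset."""
    ranges = []
    for offset in isotopes.split(","):
        center = precursor_mass + int(offset) * NEUTRON_MASS
        ranges.append((center - window, center + window))
    return ranges

# A narrow window keeps the three ranges disjoint, as in the -1,0,1 example.
for low, high in isotope_windows(1000.0, 0.05, "-1,0,1"):
    print(f"{low:.4f} to {high:.4f}")
```

The ranges only stay disjoint when the window is narrower than half the spacing between adjacent offsets, which is why a small window is used in the demonstration.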
--compute-p-values T|F
– Estimate the parameters of the score distribution for each spectrum by fitting to a Weibull distribution, and compute a p-value for each xlink product. This option is only available when use-old-xlink=F. Default = false
-
Fragment ion parameters
--use-a-ions T|F
– Consider a-ions in the search? Note that an a-ion is equivalent to a neutral loss of CO from the b-ion. Peak height is 10 (in arbitrary units). Default = false
--use-b-ions T|F
– Consider b-ions in the search? Peak height is 50 (in arbitrary units). Default = true
--use-c-ions T|F
– Consider c-ions in the search? Peak height is 50 (in arbitrary units). Default = false
--use-x-ions T|F
– Consider x-ions in the search? Peak height is 10 (in arbitrary units). Default = false
--use-y-ions T|F
– Consider y-ions in the search? Peak height is 50 (in arbitrary units). Default = true
--use-z-ions T|F
– Consider z-ions in the search? Peak height is 50 (in arbitrary units). Default = false
-
Cross-linking parameters
--use-old-xlink T|F
– Use the old version of the xlink-searching algorithm. When false, a new version of the code is run. The new version supports variable modifications and can handle more complex databases. This new code is still in development and should be considered a beta release. Default = true
--xlink-include-linears T|F
– Include linear peptides in the search. Default = true
--xlink-include-deadends T|F
– Include dead-end peptides in the search. Default = true
--xlink-include-selfloops T|F
– Include self-loop peptides in the search. Default = true
--xlink-include-inter T|F
– Include inter-protein cross-link candidates within the search. Default = true
--xlink-include-intra T|F
– Include intra-protein cross-link candidates within the search. Default = true
--xlink-include-inter-intra T|F
– Include cross-link candidates that are both inter and intra. Default = true
--xlink-prevents-cleavage <string>
– List of amino acids for which the cross-linker can prevent cleavage. This option is only available when use-old-xlink=F. Default = K
--max-xlink-mods <integer>
– Specify the maximum number of modifications allowed on a cross-linked peptide. This option is only available when use-old-xlink=F. Default = 0
-
Input and output
--spectrum-parser pwiz|mstoolkit
– Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser can read the MS/MS file formats listed here. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz
--use-z-line T|F
– Specify whether, when parsing an MS2 spectrum file, Crux obtains the precursor mass information from the "S" line or the "Z" line. Default = true
--top-match <integer>
– Specify the number of matches to report for each spectrum. Default = 5
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30