Release notes for Crux

Changes since the last major release

Major changes

07 Sept 2016: Added isotope-error parameter to Tide.

30 Aug 2016: Updated Percolator to version 3.00.

Minor changes

8 Aug 2016: Fixed incorrect default value for the cpos parameter to percolator.

Version 3.0
August 1, 2016
Major changes

The tide-search command now supports threading. See the num-threads parameter for details.

Added the pipeline command to run a series of commands, and the cascade-search to run searches across a series of databases.

Renamed the command calibrate-scores to assign-confidence and completely revamped its functionality.

The search-for-xlinks command offers parameters to control categories of candidates: xlink-include-inter, xlink-include-intra, and xlink-include-inter-intra.

Added the peptide-centric-search option to tide-search.

Compiling the Mac OS version now requires OS X Yosemite. It can be built from source using the Clang compilers distributed with the latest version of XCode.

32-bit and 64-bit Windows versions are now available, built using Visual Studio 2013.

Updated Percolator to version 2.10.

Updated Hardklor to version 2.30.

Updated Comet to version 2016.01 rev. 1.

Minor changes

The Windows version of Crux can be built with support for vendor-specific file formats disabled.

Added the temp-dir option to tide-index.

The utilities create-docs, subtract-index, and psm-convert were added to Crux.

The generate-peptides application has been revamped.

Fixed a bug in barista that occurred when the number of peptides exceeded the number of PSMs.

Precursor mass selection is fixed in tide-search when units are m/z.

hardklor and bullseye now use standard logging.

Version number added to log files. Additionally, the version numbers of percolator, comet, and boost are shown in the output of the version command.

Short usage message is displayed on error (the full usage message is still displayed when a command is entered with no arguments).

feature-in-file parameter added to percolator.

The seed parameter for percolator has been renamed to percolator-seed.

comet exits with return code 1 on failure, rather than 0.

Default value for use_sparse_matrix changed to 1.

Default value for use-neutral-loss-peaks changed to true.

xlink-score-method parameter added to xlink-score-spectrum.

deltaLCn column added to tide-search tab-delimited output.

Bug fixed in reporting flanking amino acids in PostProcessProtein.cpp.

Mix-max procedure is updated to handle ties among target and decoy scores.

Exact p-value calculation takes into account the flanking and neutral loss peaks.

A bug was fixed in the naming of the Percolator input (.pin) file that is produced by Comet when the output_percolatorfile option is turned on. Previously, the file was named "comet.tsv"; now it is "comet.target.pin".

Assign-confidence can take multiple input files.

The comet command can take multiple input files.

The sidak option has been added to assign-confidence to perform a Sidak adjustment.

The peptide-level option has been added to assign-confidence.

The refactored XCorr score has been re-scaled (by dividing by 20) to put it into a range that is comparable to that of the original XCorr score.

The default value of mz-bin-offset has been changed from 0.68 to 0.40.

The percolator command outputs the pout XML format again (controlled by the pout-output parameter).

The percolator command can output its native output by using the original-output parameter.

Fixed a bug in tide-search causing it to fail when remove-precursor-peak removed all peaks in a spectrum.

The max-precursor-charge parameter for tide-search has been introduced.

Fixed a bug in candidate peptide selection when m/z tolerance is used.

Added the read-tide-index command, which reads an index produced by tide-index and prints a list of peptides it contains.

Added the stop-after option to print-processed-spectra, which controls the point at which to stop preprocessing.

The tide-search command now accepts multiple spectrum input files.

Added the use-z-line option to specify whether precursor information is taken from the S or Z line when parsing MS2 files using ProteoWizard.

Added missing options to the percolator command.

The tide-search command may now accept a FASTA database in place of the index, in which case tide-index will be run prior to the search. The store-index option allows the generated index to be saved.

The assign-confidence command will automatically detect score type if the score parameter is not specified, searching for xcorr, e-value, or exact p-values.

The cascade-search will report the source index database of the peptide for all identified PSMs.

The top-match parameter has been removed from cascade-search.

Assign-confidence has two new options: "combine-modified-peptide" and "combine-charge-states".

Assign-confidence now supports top-match>1 in peptide-level filtering mode.

Assign-confidence can use smoothed-p-value scores for PSM ranking.

Assign-confidence now does not carry out target-decoy competition in peptide-level filtering mode.

Added the xlink-assign-ions and xlink-score-spectrum commands.

Rearranged estimation methods in assign-confidence.

Added top-match option for estimation-method=peptide-level case in assign-confidence.

Modified assign-confidence so that ties during target-decoy competition are broken randomly.

Changed the Percolator output feature "file" to contain the name of the file containing the spectrum (if available), rather than the name of the file containing the PSM.

Added to search-for-xlinks, xlink-assign-ions and xlink-score-spectrum parameters of the form "use-a-ion" for a, b, c, x, y, and z-ions.

Various other minor bug/leak fixes and performance enhancements.

Added the allow-dups option to tide-index.

Version 2.1
October 8, 2014
Major enhancements

The tide-search command now supports calculation of exact p-values via dynamic programming, as described in this article. The p-value calculation is controlled with the exact-p-value parameter.

search-for-xlinks now supports variable modifications and larger protein databases. This new code is disabled by default, but can be turned on by setting the parameter use-old-xlink=F.

Minor changes

The default value of mz_bin_offset has been changed to 0.40 from 0.68.

The centroided parameter for the hardklor command is now implemented.

The spectrum-format parameter for the bullseye command works properly.

The tide-search command now uses information from the Z line rather than the S line when reading spectrum files in the MS2 file format.

The tide-search command now accepts the spectrum-parser parameter.

The protein-database parameter to spectral-counts now only accepts fasta files. Parsing of a protein index is no longer supported.

A bug was fixed in the spectral-counts command that prevented tide-search output files from being parsed correctly.

The default-direction parameter now takes a string value of the feature name rather than an integer.

The parameters mz-bin-width and mz-bin-offset have been added to tide-search.

Percolator output now includes, for each PSM, the name of the file in which the spectrum resides.

The parameters clip-nterm-methionine, keep-terminal-aminos, and min-mods have been added to tide-index.

The parameters use-flanking-peaks and use-neutral-loss-peaks have been added to tide-search.

The calculation of XCorr was simplified in Tide. In the new version, tide-index only generates peptides and sorts them, leaving the generation of theoretical peaks to be done entirely on the fly by tide-search.

The tide-search command now prints search progress reports both to the screen and to the log file. The interval at which progress is printed can be controlled using the print-search-progress parameter.

The calibrate-scores command now outputs files with the stem "calibrate-scores" rather than "qvalue".

The use-flanking-peaks parameter now has a default value of false.

The Comet search engine was updated to version 2014011.

The MSToolkit parser was updated to the latest version (r73).

The version command now outputs the revision number.

Various improvements in performance and error handling were implemented.

Version 2.0
June 6, 2014
Major enhancements

Two new search engines are now included in Crux: Comet and Tide. The old search engine, search-for-matches, has been retired.

Percolator has been updated to version 2.07.

Crux now compiles in native Windows, rather than requiring Cygwin. Consequently, Crux running under Windows can parse vendor proprietary formats using the appropriate Proteowizard libraries.

Minor changes

The "enzyme" parameters "lysc", "lysn", "arg_c", "glue_c", and "pepsin_a" were renamed to "lys-c", "lys-n", "arg-c", "glue-c", and "pepsin-a".

Percolator outputs decoy files in addition to target files.

Percolator tab-delimited peptides output now contains a posterior error probability (PEP) column.

The Percolator tab-delimited peptide output has been corrected to show the PSM with the best score, rather than the worst.

The Percolator tab-delimited PSMs output "matches/spectrum" value has been corrected.

A new utility command, make-pin, is provided to create input files for use by Percolator.

The spectrum-min-mass and spectrum-max-mass options have been renamed to spectrum-min-mz and spectrum-max-mz, respectively.

A spectrum-parser option was added to q-ranker and barista.

A bug was fixed in q-ranker and barista when reading peptides with modifications.

Fixed the q-ranker tab-delimited output so that, for a given peptide, flanking amino acids are reported for each protein that contains that peptide, rather than only reporting one set of flanking amino acids.

A bug was fixed in percolator, wherein the first and last two characters were being truncated from peptide sequences when the input was missing the flanking amino acids or when the charge state was not indicated in the PSM IDs.

The crux spectrum parser has been removed.

Q-ranker now outputs decoy files.

The Boost.Random library is now being used for random number generation.

calibrate-scores no longer requires a protein input.

calibrate-scores now only takes the top match per spectrum and charge.

get-ms2-spectrum now prints z-lines with Bullseye files.

Various performance and error handling improvements.

Version 1.40
May 22, 2013
Major enhancements

Crux Percolator was updated from version 1.05 to the latest release, version 2.04.

Minor changes

The formula that is used to convert fragment m/z values from real numbers into integers was modified. Previously, the conversion was
    floor( (x / mz-bin-size) + 0.5 + mz-bin-offset )
The new conversion formula is
    floor( (x / mz-bin-size) + 1.0 - mz-bin-offset )
The default values of the two parameters (mz-bin-size and mz-bin-offset) have not changed.
The allowed range of the mz-bin-offset parameter was previously [-1,1], but has been changed to [0,1]. To understand the motivation for the changed formula, note that the mz-bin-offset parameter controls the location of bin edges, relative to integer masses on the mass scale. In general, mz-bin-offset should be chosen so that bin edges fall between the expected clusters of fragment masses. For fragments with 1+ charge, masses will cluster near integer values (after dividing by mz-bin-width), and the ideal value of mz-bin-offset would be 0.5. For mixtures of fragments with 1+ and 2+ charges, masses will cluster near integer and half-integer values, so mz-bin-offset near 0.25 or 0.75 would produce bin edges that best avoid the clusters of fragment masses. The old conversion formula did not correctly locate bin edges, given a chosen mz-bin-offset. The new conversion formula produces a proper translation of mz-bin-offset into bin edge locations.

The seed parameter is now available to control the seeding of the random number generator. The default value is 1. seed is available as a parameter from all applications. It can be set as a command line option for crux search-for-matches and crux percolator.

The default value of the decoys parameter for crux search-for-matches was incorrectly documented. The default is actually peptide-shuffle, but the documentation said the default was protein-shuffle. The documentation has been corrected.

The incorrectly labeled column header xcorr rank in barista and q-ranker feature files has been corrected to xcorr score.

The q-ranker output file qranker_output.xml has been renamed to qranker.xml.

The deltaCn computation has been modified to be consistent with Percolator. Rather than using
    deltaCn_i = (xcorr_1 - xcorr_(i+1)) / xcorr_1
it is now computed using
    deltaCn_i = (xcorr_i - xcorr_(i+1)) / max(xcorr_i, 1.0)

Percolator now accepts inputs in SQT, PepXML, and tab-delimited formats in addition to the PinXML format.
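
As an illustration of the m/z binning change in version 1.40, the following Python sketch implements both conversion formulas exactly as they are written in these notes. The function names, the bin width of 1.0, and the offset of 0.4 are chosen for illustration only; they are not taken from the Crux source code and are not the Crux defaults.

```python
import math


def old_bin(mz, bin_width, bin_offset):
    """Pre-1.40 conversion: floor((x / mz-bin-size) + 0.5 + mz-bin-offset)."""
    return math.floor(mz / bin_width + 0.5 + bin_offset)


def new_bin(mz, bin_width, bin_offset):
    """1.40 conversion: floor((x / mz-bin-size) + 1.0 - mz-bin-offset)."""
    return math.floor(mz / bin_width + 1.0 - bin_offset)


if __name__ == "__main__":
    width, offset = 1.0, 0.4  # illustrative values only
    for mz in (100.05, 100.3, 100.5):
        print(mz, old_bin(mz, width, offset), new_bin(mz, width, offset))
```

With these illustrative values, the new formula puts 100.3 and 100.5 into different bins, so a bin edge lies between them, at the fractional position given by mz-bin-offset (0.4). The old formula instead separates 100.05 from 100.3, which places its bin edge near 100.1 rather than at the requested offset.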

Flags were added to several commands allowing the user to control whether various results files are created in the output directory:
    --sqt-output <T|F> (search-for-matches)
    --mzid-output <T|F> (search-for-matches, percolator)
    --pinxml-output <T|F> (search-for-matches)
    --pepxml-output <T|F> (search-for-matches, q-ranker, barista, percolator)
    --txt-output <T|F> (search-for-matches, q-ranker, barista, percolator)
By default, all of these flags are set to false except txt-output.

Version 1.39
October 6, 2012
Major enhancements

Crux is now released under an Apache license for all users.

Parsing MS/MS spectra is now supported using Proteowizard (v. 3.0.3950). This allows Crux to handle MS/MS spectra in mzML (1.0 and 1.1), mzXML, MGF, MS2 and CMS2 formats. Use spectrum-parser=pwiz to enable Proteowizard parsing.

Crux now uses cmake for the build process rather than autoconf/automake.

Minor changes

The Tide database search software is no longer distributed as part of Crux because its license is not compatible with the new Crux license.

Support for PIN XML output was added to crux search-for-matches.

Fixed a bug in crux create-index on Windows and Cygwin that caused the program to fail with the message:
    WARNING: Cannot rename directory
    FATAL: Failed to create index

The parameter protein database is now an option rather than a required argument for crux spectral-counts.

mzIdentML file support has been added for crux spectral-counts.

PepXML file support has been fixed for crux spectral-counts.

Fixed a bug in packedNorm() that sometimes generated incorrect values for the posterior error probability.

Fixed a bug in crux create-index that caused it to fail on Windows and Cygwin.

Version 1.38
July 20, 2012
Major enhancements

Hardklor and Bullseye, tools for analyzing high-resolution precursor spectra, are now incorporated into Crux.

Barista has been modified extensively so that it more closely resembles the other tools in Crux:
    Barista and Q-ranker accept search results in tab-delimited format, in addition to SQT format.
    Barista now reports posterior error probabilities, in addition to q-values, for PSMs, peptides and proteins.
    Barista's pep.xml output has been updated to be compatible with the Trans-Proteomic Pipeline (TPP).

The sequest-search command has been removed. The functionality of this command is still available using options for the search-for-matches command (see the frequently asked questions list for details).

Minor changes

crux predict-peptide-ions calculates the mass shift resulting from the nh3 or h2o neutral loss correctly. Previously, a neutral loss resulted in an addition of mass rather than subtraction.

crux predict-peptide-ions reports ion-type as 'b', 'y', 'a', or 'p' rather than a number (i.e. 0, 1, 2, 3).

crux predict-peptide-ions now only allows one type of neutral loss modification per ion. Previously, multiple different types of neutral loss could be applied simultaneously to a single ion.

The neutral-losses option was removed from crux predict-peptide-ions, due to its redundancy with the supported --nh3 and --h2o options.

crux predict-peptide-ions supports 'bya' in the --primary-ions parameter, which will generate the a-ion series in addition to the 'b' and 'y' ions.

The spectrum parsing is updated so that peaks with zero intensity are ignored.

crux spectral-counts now has an option to compute the raw spectral count.

crux spectral-counts now has an option to compute dNSAF values.

A bug in crux spectral-counts was fixed so that it now properly normalizes the NSAF value.

Multiple missed cleavages are now handled by crux search-for-xlinks using the missed-cleavages parameter. Previously, this was unsupported.

Ambiguous amino acids are handled differently. 'J' is interpreted to indicate 'I' or 'L'.
Peptides containing B, Z, and X are no longer allowed, and a warning is issued when these amino acid codes are encountered.

The stand-alone crux-predict-peptide-ions and crux-generate-peptides applications have been integrated into the main crux application.

A bug has been fixed that affected the intensity adjustment of observed spectra when the bin width was set much larger or smaller than 1.

Beginning with version 1.37, q-ranker erroneously included decoy PSMs along with target PSMs in the q-ranker.target.psms.txt file. All PSMs were sorted by q-ranker score so the targets and decoys were mixed together with no accompanying labels. This bug has been fixed so that Q-ranker once again returns only target PSMs.

The search option display-summed-masses has been replaced by mod-mass-format. With this option, there is a new format for reporting modifications: the mass in the square brackets is that of the preceding amino acid plus the modification mass. This format is compliant with TPP pep.xml output.

Modifications are reported differently in the pep.xml header to be compliant with the TPP.

The command crux compute-q-values now reports posterior error probabilities in addition to q-values. Consequently, the command was renamed crux calibrate-scores.

Crux now provides an explicit error message when an incorrectly formatted MS2 file is parsed. Previously, crux simply reported that no spectra were found.

Several additional scores were added to the q-ranker pep.xml files.

Fixed the build procedure to work on both Lion and pre-Lion versions of OS X.

Fixed a bug in the normalization of the observed spectra. Previously, after the 10-bin normalization of intensities, peaks with heights below 5% of the maximum were not being removed.

When a fragmentation spectra file is read, the assumed peptide charge state information can be missing for the scans. In cases where this information has not been provided, Crux will now estimate the possible charge states.

The feature files for Q-ranker and Barista now contain a header describing each feature.

Q-ranker and Barista now output all of the fields provided in the input search files.

Version 1.37
December 22, 2011
Major enhancements

Added crux barista, a tool for inferring protein identifications.

Reimplemented crux q-ranker with new outputs and command-line syntax.

Percolator and compute-q-values now compute and report posterior error probabilities, in addition to q-values.

crux create-index now generates and stores decoy peptides in the index. Decoys can still be generated on-the-fly when searching .fasta files.

Minor changes

A pair of options, cterm-fixed and nterm-fixed, allow fixed terminal peptide modifications, i.e., a mass shift applied to every peptide on either or both termini.

A bug in the spectrum processing code was fixed, which was erroneously eliminating a few of the highest m/z fragment peaks. Xcorrs have changed slightly as a result.

Simplified the --mz-bin-width and --mz-offset parameters so that --xcorr-var-bin is no longer needed.

Indexes now store information about any static mods used when they were created.

In the tab-delimited output files the 'matches/spectrum' column gives the number of target peptides compared to the spectrum and the decoy file has an additional column, 'decoy matches/spectrum', that gives the number of decoy peptides compared to the spectrum. These two numbers may differ when decoys are not generated on-the-fly.

The header lines of Barista and QRanker include all of the columns in the search results.

Version 1.36
August 4, 2011
Tide changes

Added support for user-specifiable variable and static amino acid modifications.

Tide's version of XCorr has been modified so that it is always identical to that of a recent version of SEQUEST®.

Added support for user-specifiable maximum number of missed enzyme cleavage points.

Tide now performs basic charge state inference when this information is missing from the spectrum input files.

User-specifiable output formatting, using the new tide-results utility, including:
    Text format with user-specifiable fields.
    SQT format.
    PepXML format.
    Choose whether to display all proteins in which each peptide occurs or just one representative.

Note that new file formats are created by tide-index. Index files created by previous versions of Tide should not be used with the new version.

Minor changes

Q-ranker now reports q-values instead of false discovery rates.

Corrected several errors in the pep.xml output, related to reporting the spectrum filename, charge, and neutral mass.

Changed the missed-cleavages option to allow specifying the maximum number of missed cleavages in a peptide rather than specifying none/any.

Fixed a bug that did not print the spectrum file name correctly in the pep.xml files.

Improved the error message produced by crux spectral-counts when the input PSM file does not contain the required q-values.

Previously, Crux's theoretical spectrum included two flanking peaks around each b- and y-ion. The new Boolean flag use-flanking-peaks allows these to be either on or off. Flanking peaks are used by default in crux sequest-search but are not used by default in crux search-for-matches.

Version 1.35
March 23, 2011
Major enhancements

Added the crux spectral-counts command for estimating protein quantification.

Minor changes

The spectrum neutral mass is now calculated from the m+h value provided by the Z-lines of an MS2 file. Also, multiple z-lines of a spectrum that have the same charge are now supported. With these changes, crux search-for-matches and crux sequest-search can now take advantage of accurate precursor masses in MS2 files generated by Bullseye (Hsieh et al. J. Proteome Res. 9(2):1138-43, 2010).

Made the pep.xml file header produced by Crux compatible with the Trans-Proteomic Pipeline.

Introduced several changes to improve performance on MacOS X.

Added an nterm option to the link argument of crux search-for-xlinks.
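
The Z-line change in version 1.35 relies on the standard relation between a singly protonated mass (M+H) and the neutral mass: the neutral mass is M+H minus the mass of one proton. The following is a minimal Python sketch of that relation, assuming a Z line laid out as "Z <charge> <M+H>" with whitespace-separated fields. The function names are hypothetical and do not come from the Crux source code.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da (approximate)


def parse_z_line(line):
    """Split an MS2 Z line of the assumed form 'Z <charge> <M+H>'."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "Z":
        raise ValueError("not a Z line: %r" % line)
    return int(fields[1]), float(fields[2])


def neutral_mass(m_plus_h):
    """Neutral mass is the singly protonated mass minus one proton."""
    return m_plus_h - PROTON_MASS


def precursor_mz(m_plus_h, charge):
    """m/z of the precursor at the charge state given on the Z line."""
    return (neutral_mass(m_plus_h) + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    charge, m_plus_h = parse_z_line("Z\t2\t1001.5")
    print(charge, neutral_mass(m_plus_h), precursor_mz(m_plus_h, charge))
```

Because each Z line carries its own M+H value, a spectrum with several Z lines, including several with the same charge, simply yields one candidate neutral mass per line.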

Version 1.34
December 22, 2010
Minor changes

The default value of the "pi-zero" parameter has been changed from 0.9 to 1.0.

A new option --peptide-list has been added to crux create-index. When set to T, this option causes an ASCII file of peptides to be included in the output directory. Each line of the file lists the peptide sequence and its neutral mass.

The default bin offset was returned to 0.68. It had erroneously been changed to 0 in version 1.33.

A new option --compute-sp has been added to crux search-for-matches. When set to true, all candidate peptides will be scored by Sp in addition to xcorr. This option is recommended for generating input to percolator or q-ranker.

Added max-ion-charge parameter to designate the maximum charge for theoretical ions in crux search-for-xlinks and crux xlink-assign-ions.

Modifications may optionally indicate if they prevent cleavage or prevent cross linking.

Results are printed to .pep.xml files in addition to .txt files.

Added a mass-precision option which sets how many digits are written for all mass and m/z values in .txt, .sqt, and .ms2 files.

The columns printed to the .txt files now vary with the crux command being run and the settings so that unused columns are not printed.

Fixed a bug that limited what files were used as input for post-search commands (e.g. crux percolator) based on fileroot. Now files with different fileroots can be analyzed together.

Fixed a bug that caused searches using fasta files to search peptides multiple times if they appeared in a file multiple times.

Version 1.33
July 7, 2010
Major enhancements

The command line syntax for compute-q-values, percolator and q-ranker was modified to allow the programs to read from and write to separate directories.

A bug in q-ranker and percolator was fixed. This bug was introduced several versions ago, and it led to many PSMs erroneously receiving very small q-values.
The problem relates to the scaling of the delta Cn feature, and the fix is a temporary stopgap: we remove the feature entirely. The next version should have the corrected feature in place.

Minor changes

A new option --min-peaks has been added to set a filter for the minimum number of peaks a spectrum must have in order for it to be searched. The default minimum is 20 peaks.

The hardcoded limit for the number of proteins allowed in a database has been removed.

The hardcoded limit for the number of spectra and the number of peaks/spectrum has been removed.

A new option --xcorr-var-bin has been added to toggle the new binning of the m/z axis. The default bin offset was returned to 0, but the default bin width is still 1.000508. (Note: this erroneous change was subsequently reverted in version 1.34.)

The option --max-ion-charge now places a limit on the maximum charge state of the ions generated for the xcorr theoretical spectra for crux search-for-xlinks. This change is also supported by xlink-assign-ions.

The option --no-xval was added to q-ranker, allowing faster execution by skipping the internal cross-validation to select hyperparameters.

The precision of modification masses written to the .txt files now matches the maximum precision given in the parameter file.

The sub-options "prevents cleavage" and "prevents cross-link" have been added to variable modifications, and support has been provided for the "prevents cleavage" sub-option.

Version 1.32
July 6, 2010
Major enhancement

Version 1.32 includes changes only to Tide. A new binary is introduced, a modified version of ProteoWizard's msconvert program, that is capable of converting spectrum data from a wide range of file formats into Tide-readable .spectrumrecords input files.

Minor changes

tide-import-spectra is removed from the distribution. Although it was faster than msconvert, it was specific to ms2 files, whereas msconvert now works with any ProteoWizard-supported file format.

A standalone utility for reading binary .spectrumrecords files is introduced, called read-spectrumrecords. This program is useful, for example, for visually checking the output of msconvert.

Demo input files worm-06-10000.spectrumrecords and yeast-02-10000.spectrumrecords now replace their counterparts worm-06-10000.ms2 and yeast-02-10000.ms2. With the introduction of Tide-compatible msconvert, ms2 files are in no way special to Tide anymore. Any ProteoWizard-supported file format can be converted to the .spectrumrecords format for use with Tide. Starting with data files in Tide's spectrumrecords format eliminates the conversion step in the Tide search demo.

The old demo scripts worm-demo.sh and yeast-demo.sh have each been replaced by a pair of new scripts that demonstrate the indexing and searching steps separately. This is in order to highlight the two-step process and the fact that indexing needs to be done just once for each fasta file, whereafter the index may be reused indefinitely to perform searches.

Version 1.31
June 2, 2010
Major enhancements

This version of Crux is released with a demo version of a new search engine, Tide. Tide is an independent reimplementation of the SEQUEST® algorithm. The immediate ancestor of Tide is the Crux search-for-matches command, but Tide has been completely re-engineered to achieve a thousandfold improvement in speed while exactly replicating Crux XCorr scores. Currently, Tide is not fully integrated with Crux, and is available only in binary executable format.

Minor changes

Two options have been added to crux search-for-matches (--mz-bin-width and --mz-bin-offset) to allow control of the binning of the m/z axis.

The ion-tolerance option was removed from the search tools.

The default location of the left edge of the first bin along the m/z axis is 0.68, rather than 0.0. This change makes bin edges less likely to fall near fragment peak locations.

The default bin width has also changed from 1.001141 to 1.000508.

The log file now includes information about the date and time that the command was issued, the name of the computer on which the command was run, and the elapsed wall clock time at the end of the run.

The command line option --version T has been replaced with a command crux version that prints the version number to standard output and then exits.

The header for the feature file produced optionally by crux percolator and crux q-ranker now includes names for the first two columns, scan number and label (i.e. target or decoy peptide).

Version 1.30
May 5, 2010
Major enhancements

A new command, xlink-search, has been added. This command searches a collection of spectra against a sequence database, finding cross-linked peptide matches. The algorithm was described in the following article: Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.

A new search command, sequest-search, was added. The original command, search-for-matches, behaves as before except that no .sqt files are printed and no Sp scoring is performed. The new sequest-search emulates SEQUEST® searching. It first scores all candidate peptides with the Sp score, then ranks and filters the results by that score, scoring the remaining candidates with xcorr. Results are printed to .txt and .sqt files.

The .mzXML file format is now supported for crux search-for-matches and crux sequest-search when the --use-mstoolkit option is set to TRUE.

The .csm output files from search-for-matches are no longer produced and post-search operations take .txt files as input.

Minor changes

Replaced mass-window parameter with precursor-window and added precursor-window-type for selecting windows of type mass, m/z, or ppm.

The feature-file option is now true/false instead of taking a file name.
The file is named '<fileroot>.<qranker|percolator>.features.txt'. A header with the column names was added to the file.

The new option, --scan-number, performs a search on a specified subset of the spectra in the given file.

When only one decoy file is produced, the name is now 'decoy.txt' instead of 'decoy-1.txt'.

In the .txt files, protein names are now followed by the start index of the peptide.

Modified sequences are reported differently in the .txt files. Instead of modifications being represented by symbols (*,@,#, etc.) they are indicated with the mass shift of the modification within square brackets. As before, the modification information follows the residue that is modified. If multiple modifications appear on one residue, the masses may either be summed together or printed separately in a comma-separated list. This behavior is controlled by the --display-summed-mod-masses option.

The modified decoy sequences are generated differently. Before, all modified peptides were generated and each one was shuffled to create a decoy peptide. Now a peptide is shuffled once for each peptide_mod and all modifications are applied to that same shuffled peptide.

The 'percolator rank' column is now based on percolator scores instead of percolator q-values. For cases where two PSMs have different percolator scores but the same q-value, the ranks will reflect the score differences.

The Weibull parameters used for computing p-values are now printed to the .txt files.

The reporting of DeltaCn has changed. For PSMs of rank i, deltaCn used to be computed as
    deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0
Thus, deltaCn for the top-ranked PSM was always 0. Now
    deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0
This change only affects search-for-matches, not sequest-search. It also means that q-ranker and percolator use the correct DeltaCn values.

A new command, print-processed-spectra, performs XCorr-style pre-processing on all spectra in a given file.
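
The version 1.30 DeltaCn change can be illustrated with a short Python sketch that applies both formulas from these notes to a list of xcorr scores sorted from best to worst (rank 0 first). The function names are hypothetical. The notes do not say what is reported for the last-ranked PSM, which has no xcorr_(i+1), so returning None there is an assumption. The sketch also assumes the top xcorr is nonzero.

```python
def delta_cn_before_1_30(xcorrs):
    """deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0; the top PSM always gets 0."""
    top = xcorrs[0]
    return [(top - x) / top for x in xcorrs]


def delta_cn_1_30(xcorrs):
    """deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0."""
    top = xcorrs[0]
    values = []
    for i in range(len(xcorrs)):
        if i + 1 < len(xcorrs):
            values.append((top - xcorrs[i + 1]) / top)
        else:
            # Assumption: no lower-ranked PSM exists, so no value is defined.
            values.append(None)
    return values


if __name__ == "__main__":
    scores = [4.0, 3.0, 2.0]  # made-up xcorr values, best first
    print(delta_cn_before_1_30(scores))
    print(delta_cn_1_30(scores))
```

For the made-up scores above, the older formula gives 0 for the top-ranked PSM, while the newer one gives the relative gap between the best and second-best scores, which is the quantity a post-processor can use to judge how well separated the top match is.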
A small bug was fixed, in which even if the user requested average mass, the search programs still used monoisotopic mass in the calculation of XCorr and Sp.
Protein fasta files are now parsed protein-by-protein so that parsing large files does not use excessive amounts of memory.
A limit was removed on how many peptides a protein can produce.
Crux is now built with g++ instead of gcc.

Version 1.22
September 16, 2009
Major enhancements

Crux is now distributed in two versions: a full version that is covered by the same type of license as before (free to non-profit users, and via a licensing fee to commercial users), and a stripped-down version that is released under an open source license. The stripped-down version does not include the database search functionality but does include all of the post-processing tools. We are unable to release the entire Crux package under an open source license due to intellectual property issues. Both versions of Crux are available via the Crux web page: http://noble.gs.washington.edu/proj/crux/.
A new tool, q-ranker, is available for estimating peptide-spectrum match q-values. This tool was described in the following article: Marina Spivak, Jason Weston, Leon Bottou, Lukas Käll and William Stafford Noble. "Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets." Journal of Proteome Research.
Version 1.05 of percolator has now been integrated into the Crux source tree. A separate installation of percolator is no longer needed for basic percolator functionality. Note, however, that percolator remains under active development. You may therefore wish to install the current, stand-alone version of percolator and run it separately to take advantage of new features.

Minor changes

The internal normalization of the observed spectra has been modified to drop those peaks whose intensity is less than 1/20 of the maximum intensity in the spectrum.
This brings the xcorr score for crux into closer agreement with the xcorr score for SEQUEST®.
Compute-q-values now generates three different q-values: (1) from p-values using an analytical null model, (2) from decoys and xcorr using an empirical null model, and (3) from decoys and p-values using an empirical null model. All three types of q-values are computed when p-values and decoys are present in the search results.
A parameter file is now automatically written to the output directory.
A log file recording messages sent to stderr has been added for search-for-matches, compute-q-values, and percolator.
The --use-mz-window parameter is now available for search-for-matches. When enabled, peptides must be within +/- 'm/z-window' of the spectrum m/z. The m/z-window value is taken from mass-window.
A numerical bug in the Weibull p-value calculation was fixed, which had previously caused occasional erroneous NaNs to be output.
The Weibull estimated p-values generated by search-for-matches are now returned as p-values instead of as -log(p-value). The corresponding q-values returned from compute-q-values are also now returned without the -log transform.
The --precision option has been changed to control the total number of significant digits printed instead of the number of digits after the decimal point. The default precision has changed from 6 to 8.
The parameters estimated for the Weibull distribution (used for computing p-values) now use the xcorrs from all PSMs for a spectrum instead of a random selection of 500.
The estimation of Weibull distribution parameters requires a minimum number of scored PSMs. In the previous version, spectra with fewer PSMs than the minimum were not given a p-value. Crux will now generate extra decoys until there are enough scores.
The p-values for decoy PSMs are now generated from the same Weibull distribution parameters as are used for the targets of the same spectrum.
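The change to the --precision option noted above is easiest to see side by side. The following Python sketch only illustrates the difference between "digits after the decimal point" and "significant digits"; it does not reproduce Crux's actual output code, and the helper names and the sample value are hypothetical.

```python
def format_fixed(value, precision):
    """Old behavior (as described): `precision` digits after the decimal point."""
    return f"{value:.{precision}f}"


def format_significant(value, precision):
    """New behavior (as described): `precision` significant digits in total."""
    return f"{value:.{precision}g}"


mass = 1234.56789  # hypothetical value, chosen only for illustration
print(format_fixed(mass, 6))        # old default of 6 decimal places
print(format_significant(mass, 8))  # new default of 8 significant digits
```

Note that with significant-digit formatting, large values are printed with fewer decimal places than small ones for the same precision setting.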
Version 1.21
May 14, 2009

The output for search-for-matches, compute-q-values, and percolator has been revised extensively. Crux will now create a directory, and all output files will be created in that directory. By default the directory will be named crux-output, but this can be changed using the new output-dir option. The output files for search-for-matches will be search.target.csm, search.decoy-?.csm, search.target.sqt, and search.target.txt. The output files for compute-q-values will be qvalues.target.sqt and qvalues.target.txt. The output files for percolator will be percolator.target.sqt and percolator.target.txt.
The fileroot option has been added. This option is used to specify a string which will be added as a prefix to all output files.
The option cleavages was replaced with two options: enzyme, which specifies the name of an enzyme (e.g., trypsin), and digestion, which indicates the degree of specificity, partial or full digest. The full list of available enzymes is in the html docs and in the usage statement. See also custom-enzyme below.
The option custom-enzyme allows users to define arbitrary digestion rules. This overrides the enzyme option. Syntax for the custom digestion rule is the same as the syntax used by X!Tandem and is described in the html docs.
The number of PSMs per spectrum printed to the output files is now controlled by one option, top-match. This makes max-sqt-result obsolete.
It is now possible to control how many decoy sequences are generated and in which file(s) they are returned. There is a new option, num-decoys-per-target, which can be used to generate more than one shuffled peptide per spectrum. This replaces number-decoy-set.
A new option, decoy-location, has been introduced.
The three possible values are 'target-file' where all PSMs (target and decoys) are sorted together for each spectrum and returned in one file, 'one-decoy-file' where target PSMs are printed to one file and all decoys are printed to another, and 'separate-decoy-files' where there are as many decoy files as there are decoys per target.
Protein names for decoy matches are now prepended with 'rand_' in the SQT files as in 'L rand_Y45678'.
The option unique-peptides only applies to crux-generate-matches. Each peptide is stored in the index exactly once with references to all protein sources. Searches with fasta files print each peptide only once.
The precision of the masses and scores printed to the sqt and text files can now be specified by the user. The default precision changed from 2 to 6.
Search progress is now reported by printing every 10th spectrum that is searched. The verbosity can be adjusted with the parameter print-search-progress.
Decoy (shuffled) sequences now keep the first and last residue the same as the target sequence that was shuffled to produce it. This is a reversion to previous behavior.
It is now possible to skip the Sp score and score all PSMs with xcorr. The default procedure is still to score all peptides for one spectrum with Sp, rank by Sp, and eliminate all but the best-ranking PSMs (by default, the top 500). The remaining PSMs are scored by xcorr, re-ranked by xcorr and the top results returned. By setting max-rank-preliminary=0, the Sp scoring is skipped and xcorr is computed for all PSMs.
A new parameter reverse-sequence can be used to generate decoy peptides by reversing them rather than shuffling. The first and last residues are left unmoved. If the sequence is a palindrome, then a decoy will be generated by shuffling and a note to that effect will be printed at the DETAILED INFO level of output (verbosity = 40).
P-values are now computed for decoy peptides.
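The decoy rules described above (shuffle or reverse, keeping the first and last residues fixed, with a shuffle fallback for palindromes) can be sketched in Python as follows. This is an illustrative sketch rather than Crux's implementation: the function names are hypothetical, and applying the palindrome check to the interior residues (the part that is actually reversed) is an assumption.

```python
import random


def shuffle_decoy(peptide, rng):
    """Shuffle the interior residues, keeping the first and last in place."""
    if len(peptide) <= 3:
        return peptide  # zero or one interior residue: nothing to shuffle
    interior = list(peptide[1:-1])
    rng.shuffle(interior)
    # A random shuffle may occasionally reproduce the original order;
    # this sketch does not retry in that case.
    return peptide[0] + "".join(interior) + peptide[-1]


def reverse_decoy(peptide, rng):
    """Reverse the interior residues; fall back to shuffling for palindromes."""
    interior = peptide[1:-1]
    reversed_interior = interior[::-1]
    if reversed_interior == interior:
        # Assumption: the palindrome test is applied to the interior, since
        # reversing it would otherwise return the target sequence unchanged.
        return shuffle_decoy(peptide, rng)
    return peptide[0] + reversed_interior + peptide[-1]


rng = random.Random(0)  # seeded so that runs are repeatable
print(reverse_decoy("ACDEFK", rng))   # interior CDEF is reversed
print(shuffle_decoy("ACDEFGHK", rng))  # interior is shuffled, A and K stay put
```

Keeping the terminal residues fixed preserves the enzymatic cleavage context of the target peptide, so a decoy derived from a tryptic target still looks tryptic.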
The algorithm used to calculate the xcorr score has been modified so that the xcorr score will be in better agreement with scores generated by SEQUEST®.

Version 1.20
January 6, 2009

Generating peptides and searching with up to eleven different dynamic modifications is now possible. New options associated with this feature are mod, cmod, nmod, max-mods, max-aas-modified.
The format of the .csm files has changed and files written by older versions of crux are not readable with crux version 1.2.
When the option cleavages is set to all, peptide generation ignores all tryptic cleavage sites, effectively setting the missed-cleavages option to TRUE regardless of user settings.
When one spectrum has identical xcorr scores for different sequences, the rank of all those matches will be the same. Matches with the next highest score will rank one below.
The options for setting the preliminary and primary score type have been removed and are fixed as Sp and xcorr, respectively.
A new option, compute-p-values=<T | F>, was added to control p-value computation.
The SQT file contains the spectrum calculated mass instead of observed mass/charge on the S line.
There is now a test for confirming that the file downloaded from the crux website was not corrupted. See installation instructions for details.
Calculating a p-value requires a minimum of 40 matches. Spectra with fewer than 40 matches will have p-value scores returned as NaN and a warning will be printed at the DETAILED_INFO level (40) of verbosity.
Fixed an error in generating neutral-loss peaks created as part of the theoretical spectrum.

Version 1.02
December 1, 2008

Three programs, crux-create-index, crux-search-for-matches, and crux-analyze-matches, were merged into one program named crux.
Percolator is now truly optional as all Crux programs will build without it.
Fragment masses can now be calculated as average or mono-isotopic. This is controlled by the fragment-mass option in the parameter file.
The name of the score-type option that calculates p-values was changed from xcorr-logp to xcorr-pvalue.
SQT files have two new lines in the header which describe the arrangement of values in the results.
HTML documentation was updated to reflect the above changes.

Version 1.01
October 15, 2008

A bug limiting the length of the name of an index file was fixed.
Modifications were made so that Crux will build with version 1.05 of Percolator. This is the only supported version of Percolator.
Memory leaks in crux-search-for-matches were patched.
The --version option was added.

Version 1.0
March 4, 2008

Initial release