Release notes for Crux
Changes since the last major release
Major changes
- 07 Sept 2016: Added isotope-error parameter to Tide.
- 30 Aug 2016: Updated Percolator to version 3.00.
Minor changes
- 8 Aug 2016: Fixed incorrect default value for the cpos parameter to percolator.
Version 3.0
August 1, 2016
Major changes
- The tide-search command now supports threading. See the num-threads parameter for details.
- Added the pipeline command to run a series of commands, and the cascade-search command to run searches across a series of databases.
- Renamed the command calibrate-scores to assign-confidence and completely revamped its functionality.
- The search-for-xlinks command offers parameters to control categories of candidates: xlink-include-inter, xlink-include-intra, and xlink-include-inter-intra.
- Added the peptide-centric-search option to tide-search.
- Compiling the Mac OS version now requires OS X Yosemite. It can be built from source using the Clang compilers distributed with the latest version of XCode.
- 32-bit and 64-bit Windows versions are now available, built using Visual Studio 2013.
- Updated Percolator to version 2.10.
- Updated Hardklor to version 2.30.
- Updated Comet to version 2016.01 rev. 1.
Minor changes
- The Windows version of Crux can be built with support for vendor specific file formats disabled.
- Added the temp-dir option to tide-index.
- The utilities create-docs, subtract-index, and psm-convert were added to Crux.
- The generate-peptides application has been revamped.
- Fixed a bug in barista that occurred when the number of peptides exceeded the number of PSMs.
- Precursor mass selection is fixed in tide-search when units are m/z.
- hardklor and bullseye now use standard logging.
- Version number added to log files. Additionally, the version numbers of percolator, comet, and boost are shown in the output of the version command.
- Short usage message is displayed on error (the full usage message is still displayed when a command is entered with no arguments).
- feature-in-file parameter added to percolator.
- The seed parameter for percolator has been renamed to percolator-seed.
- comet exits with return code 1 on failure, rather than 0.
- Default value for use_sparse_matrix changed to 1.
- Default value for use-neutral-loss-peaks changed to true.
- xlink-score-method parameter added to xlink-score-spectrum.
- deltaLCn column added to tide-search tab-delimited output.
- Bug fixed in reporting flanking amino acids in PostProcessProtein.cpp.
- Mix-max procedure is updated to handle ties among target and decoy scores.
- Default value of use-neutral-loss-peaks changed to true.
- Exact p-value calculation takes into account the flanking and neutral loss peaks.
- A bug was fixed in the naming of the Percolator input (.pin) file that is produced by Comet when the output_percolatorfile option is turned on. Previously, the file was named "comet.tsv"; now it is "comet.target.pin".
- Assign-confidence can take multiple input files.
- The comet command can take multiple input files.
- The sidak option has been added to assign-confidence to perform a Sidak adjustment.
- The peptide-level option has been added to assign-confidence.
- The refactored XCorr score has been re-scaled (by dividing by 20) to put it into a range that is comparable to that of the original XCorr score.
- The default value of mz-bin-offset has been changed from 0.68 to 0.40.
- The percolator command outputs the pout XML format again (controlled by the pout-output parameter).
- The percolator command can output its native output by using the original-output parameter.
- Fixed a bug in tide-search causing it to fail when remove-precursor-peak removed all peaks in a spectrum.
- The max-precursor-charge parameter for tide-search has been introduced.
- Fixed a bug in candidate peptide selection when m/z tolerance is used.
- Added the read-tide-index command, which reads an index produced by tide-index and prints a list of peptides it contains.
- Added the stop-after option to print-processed-spectra, which controls the point at which to stop preprocessing.
- The tide-search command now accepts multiple spectrum input files.
- Added the use-z-line option to specify whether precursor information is taken from the S or Z line when parsing MS2 files using ProteoWizard.
- Added missing options to the percolator command.
- The tide-search command may now accept a FASTA database in place of the index, in which case tide-index will be run prior to the search. The store-index option allows the generated index to be saved.
- The assign-confidence command will automatically detect score type if the score parameter is not specified, searching for xcorr, e-value, or exact p-values.
- The cascade-search command will report the source index database of the peptide for all identified PSMs.
- The top-match parameter has been removed from cascade-search.
- Assign-confidence has new options: "combine-modified-peptide" and "combine-charge-states".
- Assign-confidence now supports top-match>1 in peptide-level filtering mode.
- Assign-confidence can use smoothed-p-value scores for PSM ranking.
- Assign-confidence now does not carry out target-decoy competition in peptide-level filtering mode.
- Added the xlink-assign-ions and xlink-score-spectrum commands.
- Rearranged estimation methods in assign-confidence.
- Added top-match option for estimation-method=peptide-level case in assign-confidence.
- Modified assign-confidence so that ties during target-decoy competition are broken randomly.
- Changed the Percolator output feature "file" to contain the name of the file containing the spectrum (if available), rather than the name of the file containing the PSM.
- Added to search-for-xlinks, xlink-assign-ions and xlink-score-spectrum parameters of the form "use-a-ion" for a, b, c, x, y, and z-ions.
- Various other minor bug/leak fixes and performance enhancements.
- Added the allow-dups option to tide-index.
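The Sidak adjustment mentioned for the sidak option of assign-confidence can be illustrated with a minimal sketch. It assumes the standard Sidak formula, 1 - (1 - p)^n, where p is a PSM p-value and n is the number of candidate peptides scored against the spectrum. The function name sidak_adjust and its arguments are hypothetical and are not taken from the Crux source.

```python
def sidak_adjust(p_value, num_candidates):
    """Return the Sidak-adjusted p-value 1 - (1 - p)^n.

    Hypothetical helper for illustration only: p_value is the unadjusted
    p-value of the best match, and num_candidates is the number of
    candidate peptides that were compared to the spectrum.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    return 1.0 - (1.0 - p_value) ** num_candidates


if __name__ == "__main__":
    # The same raw p-value is less significant when more candidates were tried.
    for n in (1, 10, 1000):
        print(n, sidak_adjust(1e-4, n))
```

The adjustment only makes sense when the input scores are p-values, which is why the sketch rejects values outside [0, 1].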
Version 2.1
October 8, 2014
Major enhancements
- The tide-search command now supports calculation of exact p-values via dynamic programming, as described in this article. The p-value calculation is controlled with the exact-p-value parameter.
- search-for-xlinks now supports variable modifications and larger protein databases. This new code is disabled by default, but can be turned on by setting the parameter use-old-xlink=F.
Minor changes
- The default value of mz_bin_offset has been changed to 0.40 from 0.68.
- The centroided parameter for the hardklor command is now implemented.
- The spectrum-format parameter for the bullseye command works properly.
- The tide-search command now uses information from the Z line rather than the S line when reading spectrum files in the MS2 file format.
- The tide-search command now accepts the spectrum-parser parameter.
- The protein-database parameter to spectral-counts now only accepts fasta files. Parsing of a protein index is no longer supported.
- A bug was fixed in the spectral-counts command that prevented tide-search output files from being parsed correctly.
- The default-direction parameter now takes a string value of the feature name rather than an integer.
- The parameters mz-bin-width and mz-bin-offset have been added to tide-search.
- Percolator output now includes, for each PSM, the name of the file in which the spectrum resides.
- The parameters clip-nterm-methionine, keep-terminal-aminos, and min-mods have been added to tide-index.
- The parameters use-flanking-peaks and use-neutral-loss-peaks have been added to tide-search.
- The calculation of XCorr was simplified in Tide. In the new version, tide-index only generates peptides and sorts them, leaving the generation of theoretical peaks to be done entirely on the fly by tide-search.
- The tide-search command now prints search progress reports both to the screen and to the log file. The interval at which progress is printed can be controlled using the print-search-progress parameter.
- The calibrate-scores command now outputs files with the stem "calibrate-scores" rather than "qvalue".
- The use-flanking-peaks parameter now has a default value of false.
- The Comet search engine was updated to version 2014011.
- The MSToolkit parser was updated to the latest version (r73).
- The version command now outputs the revision number.
- Various improvements in performance and error handling were implemented.
Version 2.0
June 6, 2014
Major enhancements
- Two new search engines are now included in Crux: Comet and Tide. The old search engine, search-for-matches, has been retired.
- Percolator has been updated to version 2.07.
- Crux now compiles in native Windows, rather than requiring Cygwin. Consequently, Crux running under Windows can parse vendor proprietary formats using the appropriate Proteowizard libraries.
Minor changes
- The "enzyme" parameters "lysc", "lysn", "arg_c", "glue_c", and "pepsin_a" were renamed to "lys-c", "lys-n", "arg-c", "glue-c", and "pepsin-a".
- Percolator outputs decoy files in addition to target files.
- Percolator tab-delimited peptides output now contains a posterior error probability (PEP) column.
- The Percolator tab-delimited peptide output has been corrected to show the PSM with the best score, rather than the worst.
- The Percolator tab-delimited PSMs output "matches/spectrum" value has been corrected.
- A new utility command, make-pin, is provided to create input files for use by Percolator.
- The spectrum-min-mass and spectrum-max-mass options have been renamed to spectrum-min-mz and spectrum-max-mz, respectively.
- A spectrum-parser option was added to q-ranker and barista.
- A bug was fixed in q-ranker and barista when reading peptides with modifications.
- Fixed the q-ranker tab delimited output so that, for a given peptide, flanking amino acids are reported for each protein that contains that peptide, rather than only reporting one set of flanking amino acids.
- A bug was fixed in percolator, wherein the first and last two characters were being truncated from peptide sequences when the input was missing the flanking amino acids or when the charge state was not indicated in the PSM IDs.
- The crux spectrum parser has been removed.
- Q-ranker now outputs decoy files.
- The Boost.Random library is now being used for random number generation.
- calibrate-scores no longer requires a protein input.
- calibrate-scores now only takes the top match per spectrum and charge.
- get-ms2-spectrum now prints z-lines with Bullseye files.
- Various performance and error handling improvements.
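The percolator truncation bug listed above is easy to reproduce in miniature: blindly dropping the first and last two characters assumes every sequence has the form X.PEPTIDE.Y. The sketch below shows one defensive way to strip flanking residues only when they are present. The function name strip_flanking and the regular expression are illustrative assumptions, not the code used in Crux or Percolator.

```python
import re

# Matches "X.SEQUENCE.Y", where X and Y are a single residue or "-".
# This pattern is an assumption made for illustration.
_FLANKED = re.compile(r"^[A-Z\-]\.(.+)\.[A-Z\-]$")


def strip_flanking(sequence):
    """Remove flanking residues if present; otherwise return the input unchanged."""
    match = _FLANKED.match(sequence)
    return match.group(1) if match else sequence


if __name__ == "__main__":
    print(strip_flanking("K.PEPTIDER.A"))   # flanks present
    print(strip_flanking("PEPTIDER"))       # flanks missing
    print("PEPTIDER"[2:-2])                 # naive slicing loses residues
```

Checking for the flanking pattern first means a sequence that arrives without flanks passes through intact instead of losing four residues.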
Version 1.40
May 22, 2013
Major enhancements
- Crux Percolator was updated from version 1.05 to the latest release, version 2.04.
Minor changes
The formula that is used to convert fragment m/z values from real numbers into integers was modified. Previously, the conversion was
floor( (x / mz-bin-size) + 0.5 + mz-bin-offset )
The new conversion formula is
floor( (x / mz-bin-size) + 1.0 - mz-bin-offset )
The default values of the two parameters (mz-bin-size and mz-bin-offset) have not changed. The allowed range of the mz-bin-offset parameter was previously [-1,1], but has been changed to [0,1].
To understand the motivation for the changed formula, note that the mz-bin-offset parameter controls the location of bin edges, relative to integer masses on the mass scale. In general, mz-bin-offset should be chosen so that bin edges fall between the expected clusters of fragment masses. For fragments with 1+ charge, masses will cluster near integer values (after dividing by mz-bin-width), and the ideal value of mz-bin-offset would be 0.5. For mixtures of fragments with 1+ and 2+ charges, masses will cluster near integer and half-integer values, so mz-bin-offset near 0.25 or 0.75 would produce bin edges that best avoid the clusters of fragment masses. The old conversion formula did not correctly locate bin edges, given a chosen mz-bin-offset. The new conversion formula produces a proper translation of mz-bin-offset into bin edge locations.
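The two conversion formulas can be compared directly with a short sketch. The function names bin_index_old and bin_index_new are hypothetical, and the bin width of 1.0 with an offset of 0.5 is chosen only to make the effect easy to see; these are not the Crux defaults.

```python
import math


def bin_index_old(mz, bin_size, bin_offset):
    """Old conversion: floor(x / size + 0.5 + offset)."""
    return math.floor(mz / bin_size + 0.5 + bin_offset)


def bin_index_new(mz, bin_size, bin_offset):
    """New conversion: floor(x / size + 1.0 - offset)."""
    return math.floor(mz / bin_size + 1.0 - bin_offset)


if __name__ == "__main__":
    # Two fragment masses clustered around the integer value 100.
    low, high = 99.6, 100.4
    size, offset = 1.0, 0.5
    print("old:", bin_index_old(low, size, offset), bin_index_old(high, size, offset))
    print("new:", bin_index_new(low, size, offset), bin_index_new(high, size, offset))
```

With an offset of 0.5, the old formula places a bin edge exactly at the integer mass, so the two peaks on either side of 100 land in different bins. The new formula puts the edge halfway between integers, so both peaks share a bin.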
- The seed parameter is now available to control the seeding of random number generator. The default value is 1. seed is available as a parameter from all applications. It can be set as a command line option for crux search-for-matches and crux percolator.
- The default value of the decoys parameter for crux search-for-matches was incorrectly documented. The default is actually peptide-shuffle, but the documentation said the default was protein-shuffle. The documentation has been corrected.
- The incorrectly labeled column header xcorr rank in barista and q-ranker feature files has been corrected to xcorr score.
- The q-ranker output file qranker_output.xml has been renamed to qranker.xml.
- The deltaCn computation has been modified to be consistent with Percolator. Rather than using
deltaCn_i = (xcorr_1 - xcorr_(i+1)) / xcorr_1
it is now computed using
deltaCn_i = (xcorr_i - xcorr_(i+1)) / max(xcorr_i, 1.0)
- Percolator now accepts inputs in SQT, PepXML, and tab-delimited formats in addition to the PinXML format.
- Flags were added to several commands allowing the user to control whether various results files are created in the output directory:
  --sqt-output <T|F> (search-for-matches)
  --mzid-output <T|F> (search-for-matches, percolator)
  --pinxml-output <T|F> (search-for-matches)
  --pepxml-output <T|F> (search-for-matches, q-ranker, barista, percolator)
  --txt-output <T|F> (search-for-matches, q-ranker, barista, percolator)
  By default, all of these flags are set to false except txt-output.
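The deltaCn change in this release can be sketched as follows. The code takes a list of XCorr scores for one spectrum, sorted from best to worst, and applies both the old and the new formula. The function names are hypothetical, list indices are 0-based where the notes use 1-based ranks, and assigning 0.0 to the last-ranked match (which has no successor) is an assumption made here for illustration.

```python
def delta_cn_old(xcorrs):
    """Old formula: (xcorr_1 - xcorr_(i+1)) / xcorr_1, relative to the top score."""
    top = xcorrs[0]
    return [
        (top - xcorrs[i + 1]) / top if i + 1 < len(xcorrs) else 0.0
        for i in range(len(xcorrs))
    ]


def delta_cn_new(xcorrs):
    """New formula: (xcorr_i - xcorr_(i+1)) / max(xcorr_i, 1.0)."""
    return [
        (xcorrs[i] - xcorrs[i + 1]) / max(xcorrs[i], 1.0)
        if i + 1 < len(xcorrs)
        else 0.0
        for i in range(len(xcorrs))
    ]


if __name__ == "__main__":
    scores = [4.0, 2.0, 0.5]  # sorted from best to worst
    print(delta_cn_old(scores))
    print(delta_cn_new(scores))
```

The max(xcorr_i, 1.0) denominator in the new formula keeps the value bounded when a score is close to zero.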
Version 1.39
October 6, 2012
Major enhancements
- Crux is now released under an Apache license for all users.
- Parsing MS/MS spectra is now supported using Proteowizard (v. 3.0.3950). This allows Crux to handle MS/MS spectra in mzML (1.0 and 1.1), mzXML, MGF, MS2 and CMS2 formats. Use spectrum-parser=pwiz to enable Proteowizard parsing.
- Crux now uses cmake for the build process rather than autoconf/automake.
Minor changes
- The Tide database search software is no longer distributed as part of Crux because its license is not compatible with the new Crux license.
- Support for PIN XML output was added to crux search-for-matches.
- Fixed a bug in crux create-index on Windows and Cygwin that caused the program to fail with the message:
  WARNING: Cannot rename directory
  FATAL: Failed to create index
- The parameter protein database is now an option rather than a required argument for crux spectral-counts.
- mzIdentML file support has been added for crux spectral-counts.
- PepXML file support has been fixed for crux spectral-counts.
.- Fixed bug in packedNorm() that sometimes generated incorrect values for the posterior error probability.
- Fixed bug in crux create-index that caused it to fail on Windows and Cygwin.
Version 1.38
July 20, 2012
Major enhancements
- Hardklor and Bullseye, tools for analyzing high-resolution precursor spectra, are now incorporated into Crux.
- Barista has been modified extensively so that it more closely resembles the other tools in Crux:
- Barista and Q-ranker accept search results in tab-delimited format, in addition to SQT format.
- Barista now reports posterior error probabilities, in addition to q-values, for PSMs, peptides and proteins.
- Barista's pep.xml output has been updated to be compatible with the TransProteomic Pipeline (TPP).
- The sequest-search command has been removed. The functionality of this command is still available using options for the search-for-matches command (see the frequently asked questions list for details).
Minor changes
- crux predict-peptide-ions calculates the mass shift resulting from the nh3 or h2o neutral loss correctly. Previously, a neutral loss resulted in an addition of mass rather than subtraction.
- crux predict-peptide-ions reports ion-type as 'b','y','a', or 'p' rather than a number (i.e. 0, 1, 2, 3).
- crux predict-peptide-ions now only allows one type of neutral loss modification per ion. Previously, multiple different types of neutral loss could be applied simultaneously to a single ion.
- The neutral-losses option was removed from crux predict-peptide-ions, due to its redundancy with the supported --nh3 and --h2o options.
- crux predict-peptide-ions supports 'bya' in the --primary-ions parameter, which will generate the a-ion series in addition to the 'b' and 'y' ions.
- The spectrum parsing is updated so that peaks with zero intensity are ignored.
- crux spectral-counts now has an option to compute the raw spectral count.
- crux spectral-counts now has an option to compute dNSAF values.
- A bug in crux spectral-counts was fixed so that it now properly normalizes the NSAF value.
- Multiple missed cleavages are now handled by crux search-for-xlinks using the missed-cleavages parameter. Previously, this was unsupported.
- Ambiguous amino acids are handled differently. 'J' is interpreted to indicate 'I' or 'L'. Peptides containing B, Z, and X are no longer allowed, and a warning is issued when these amino acid codes are encountered.
- The stand-alone crux-predict-peptide-ions and crux-generate-peptides applications have been integrated into the main crux application.
- A bug has been fixed that affected the intensity adjustment of observed spectra when the bin width was set much larger or smaller than 1.
- Beginning with version 1.37, q-ranker erroneously included decoy PSMs along with target PSMs in the q-ranker.target.psms.txt file. All PSMs were sorted by q-ranker score so the targets and decoys were mixed together with no accompanying labels. This bug has been fixed so that Q-ranker once again returns only target PSMs.
- The search option display-summed-masses has been replaced by mod-mass-format. With this option, there is a new format for reporting modifications: the mass in the square braces is that of the preceding amino acid plus the modification mass. This format is compliant with TPP pep.xml output.
- Modifications are reported differently in the pep.xml header to be compliant with the TPP.
- The command crux compute-q-values now reports posterior error probabilities in addition to q-values. Consequently, the command was renamed crux calibrate-scores.
- Crux now provides an explicit error message when an incorrectly formatted MS2 file is parsed. Previously, crux simply reported that no spectra were found.
- Several additional scores were added to the q-ranker pep.xml files.
- Fixed the build procedure to work on both Lion and pre-Lion versions of OS X.
- Fixed a bug in the normalization of the observed spectra. Previously, after the 10-bin normalization of intensities, peaks with heights below 5% of the maximum were not being removed.
- When a fragmentation spectra file is read, the assumed peptide charge state information can be missing for the scans. In cases where this information has not been provided, Crux will now estimate the possible charge states.
- The feature files for Q-ranker and Barista now contain a header describing each feature.
- Q-ranker and Barista now output all of the fields provided in the input search files.
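For readers unfamiliar with the NSAF normalization mentioned in the spectral-counts fix above, here is a minimal sketch of the standard definition: each protein's spectral count is divided by its length, and the resulting values are divided by their sum across all proteins. The function name nsaf and the dictionary-based inputs are hypothetical, and the sketch does not reproduce the Crux implementation.

```python
def nsaf(spectral_counts, protein_lengths):
    """Return the normalized spectral abundance factor for each protein.

    spectral_counts maps protein id -> number of matched spectra.
    protein_lengths maps protein id -> sequence length in residues.
    """
    saf = {
        protein: spectral_counts[protein] / protein_lengths[protein]
        for protein in spectral_counts
    }
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    counts = {"P1": 10, "P2": 30}
    lengths = {"P1": 100, "P2": 300}
    print(nsaf(counts, lengths))
```

A properly normalized set of NSAF values sums to 1 across the proteins considered, which gives a quick sanity check on any output.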
Version 1.37
December 22, 2011
Major enhancements
- Added crux barista, a tool for inferring protein identifications.
- Reimplemented crux q-ranker with new outputs and command-line syntax.
- Percolator and compute-q-values now compute and report posterior error probabilities, in addition to q-values.
- crux create-index now generates and stores decoy peptides in the index. Decoys can still be generated on-the-fly when searching .fasta files.
Minor changes
- A pair of options, cterm-fixed and nterm-fixed, allow fixed terminal peptide modifications, i.e., a mass shift applied to every peptide on either or both termini.
- A bug in the spectrum processing code was fixed, which was erroneously eliminating a few of the highest m/z fragment peaks. Xcorrs have changed slightly as a result.
- Simplified the --mz-bin-width and --mz-offset parameters so that --xcorr-var-bin is no longer needed.
- Indexes now store information about any static mods used when they were created.
- In the tab-delimited output files the 'matches/spectrum' column gives the number of target peptides compared to the spectrum and the decoy file has an additional column, 'decoy matches/spectrum' that gives the number of decoy peptides compared to the spectrum. These two numbers may differ when decoys are not generated on-the-fly.
- The header lines of Barista and QRanker include all of the columns in the search results.
Version 1.36
August 4, 2011
Tide changes
- Added support for user-specifiable variable and static amino acid modifications.
- Tide's version of XCorr has been modified so that it is always identical to that of a recent version of SEQUEST®.
- Added support for user-specifiable maximum number of missed enzyme cleavage points.
- Tide now performs basic charge state inference when this information is missing from the spectrum input files.
- User-specifiable output formatting, using the new tide-results utility, including
- Text format with user-specifiable fields.
- SQT format.
- PepXML format.
- Choose whether to display all proteins in which each peptide occurs or just one representative.
- Note that new file formats are created by tide-index. Index files created by previous versions of Tide should not be used with the new version.
Minor changes
- Q-ranker now reports q-values instead of false discovery rates.
- Corrected several errors in the pep.xml output, related to reporting the spectrum filename, charge, and neutral mass.
- Changed the missed-cleavages option to allow specifying the maximum number of missed cleavages in a peptide rather than specifying none/any.
- Fixed a bug that did not print the spectrum file name correctly in the pep.xml files.
- Improved the error message produced by crux spectral-counts when the input PSM file does not contain the required q-values.
- Previously, Crux's theoretical spectrum included two flanking peaks around each b- and y-ion. The new Boolean flag use-flanking-peaks allows these to be either on or off. Flanking peaks are used by default in crux sequest-search but are not used by default in crux search-for-matches.
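The idea of a maximum number of missed cleavages, as introduced for the missed-cleavages option above, can be illustrated with a small in-silico digestion sketch. It assumes the common trypsin rule (cleave after K or R unless the next residue is P). The function name tryptic_peptides and its arguments are hypothetical, and the sketch ignores length and mass filters that a real search tool would apply.

```python
import re


def tryptic_peptides(protein, max_missed_cleavages=0):
    """List peptides with at most max_missed_cleavages internal cleavage sites.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    # Cleavage positions are the indices just after each K/R not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in sites if s < len(protein)] + [len(protein)]
    peptides = []
    last = len(bounds) - 1
    for start in range(last):
        # Each extra boundary spanned corresponds to one missed cleavage.
        furthest = min(start + 1 + max_missed_cleavages, last)
        for end in range(start + 1, furthest + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("AAKGGRCC", 0))
    print(tryptic_peptides("AAKGGRCC", 1))
```

Raising the limit from 0 to 1 adds only the peptides that span exactly one uncleaved site, which is the behavior a numeric limit provides over a none/any switch.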
Version 1.35
March 23, 2011
Major enhancements
- Added the crux spectral-counts command for estimating protein quantification.
Minor changes
- The spectrum neutral mass is now calculated from the m+h value provided by the Z-lines of an MS2 file. Also, multiple z-lines of a spectrum that have the same charge are now supported. With these changes, crux search-for-matches and crux sequest-search can now take advantage of accurate precursor masses in MS2 files generated by Bullseye (Hsieh et al. J. Proteome Res. 9(2):1138-43, 2010).
- Made the pep.xml file header produced by Crux compatible with the Transproteomic Pipeline.
- Introduced several changes to improve performance on MacOS X.
- Added an nterm option to the link argument of crux search-for-xlinks.
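The neutral mass calculation described above can be sketched in a few lines. The sketch assumes that a Z line in an MS2 file has three whitespace-separated fields (the letter Z, the charge, and the singly protonated mass M+H) and that the neutral mass is obtained by subtracting one proton mass. The function name and the constant are illustrative and are not taken from the Crux source.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da


def neutral_mass_from_z_line(line):
    """Parse an MS2 'Z <charge> <M+H>' line and return (charge, neutral mass)."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "Z":
        raise ValueError("not a Z line: %r" % line)
    charge = int(fields[1])
    m_plus_h = float(fields[2])
    return charge, m_plus_h - PROTON_MASS


if __name__ == "__main__":
    # A spectrum may carry several Z lines, one per candidate charge state.
    for z_line in ("Z\t2\t1001.0073", "Z\t3\t1001.0073"):
        print(neutral_mass_from_z_line(z_line))
```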
Version 1.34
December 22, 2010
Minor changes
- The default value of the "pi-zero" parameter has been changed from 0.9 to 1.0.
- A new option --peptide-list has been added to crux create-index. When set to T, this option causes an ASCII file of peptides to be included in the output directory. Each line of the file lists the peptide sequence and its neutral mass.
- The default bin offset was returned to 0.68. It had erroneously been changed to 0 in version 1.33.
- A new option --compute-sp has been added to crux search-for-matches. When set to true, all candidate peptides will be scored by Sp in addition to xcorr. This option is recommended for generating input to percolator or q-ranker.
- Added max-ion-charge parameter to designate the maximum charge for theoretical ions in crux search-for-xlinks and crux xlink-assign-ions.
- Modifications may optionally indicate if they prevent cleavage or prevent cross linking.
- Results are printed to .pep.xml files in addition to .txt files.
- Added a mass-precision option which sets how many digits are written for all mass and m/z values in .txt, .sqt, and .ms2 files.
- The columns printed to the .txt files now vary with the crux command being run and the settings so that unused columns are not printed.
- Fixed a bug that limited what files were used as input for post-search commands (e.g. crux percolator) based on fileroot. Now files with different fileroots can be analyzed together.
- Fixed a bug that caused searches using fasta files to search peptides multiple times if they appeared in a file multiple times.
Version 1.33
July 7, 2010
Major enhancements
- The command line syntax for compute-q-values, percolator and q-ranker was modified to allow the programs to read from and write to separate directories.
- A bug in q-ranker and percolator was fixed. This bug was introduced several versions ago, and it caused many PSMs to erroneously receive very small q-values. The problem relates to the scaling of the delta Cn feature, and the fix is a temporary stopgap: we remove the feature entirely. The next version should have the corrected feature in place.
Minor changes
- A new option --min-peaks has been added to set a filter for the minimum number of peaks a spectrum must have in order for it to be searched. The default minimum is 20 peaks.
- The hardcoded limit for the number of proteins allowed in a database has been removed.
- The hardcoded limit for the number of spectra and the number of peaks/spectrum has been removed.
- A new option --xcorr-var-bin has been added to toggle the new binning of the m/z axis. The default bin offset was returned to 0, but the default bin width is still 1.000508. (Note: this erroneous change was subsequently reverted in version 1.34.)
- The option --max-ion-charge now places a limit on the maximum charge state of the ions generated for the xcorr theoretical spectra for crux search-for-xlinks. This change is also supported by xlink-assign-ions.
- The option --no-xval was added to q-ranker, allowing faster execution by skipping the internal cross-validation to select hyperparameters.
- The precision of modification masses written to the .txt files now matches the maximum precision given in the parameter file.
- Sub-options prevents cleavage and prevents cross-link have been added to variable modifications and support has been provided for the "prevents cleavage" sub-option.
Version 1.32
July 6, 2010
Major enhancement
- Version 1.32 includes changes only to Tide. A new binary is introduced — a modified version of ProteoWizard's msconvert program — that is capable of converting spectrum data from a wide range of file formats into Tide-readable .spectrumrecords input files.
Minor changes
- tide-import-spectra is removed from the distribution. Although it was faster than msconvert, it was specific to ms2 files, whereas msconvert now works with any ProteoWizard-supported file format.
- A standalone utility for reading binary .spectrumrecords files is introduced, called read-spectrumrecords. This program is useful, for example, for visually checking the output of msconvert.
- Demo input files worm-06-10000.spectrumrecords and yeast-02-10000.spectrumrecords now replace their counterparts worm-06-10000.ms2 and yeast-02-10000.ms2. With the introduction of Tide-compatible msconvert, ms2 files are in no way special to Tide anymore. Any ProteoWizard-supported file format can be converted to the .spectrumrecords format for use with Tide. Starting with data files in Tide's spectrumrecords format eliminates the conversion step in the Tide search demo.
- The old demo scripts worm-demo.sh and yeast-demo.sh have each been replaced by a pair of new scripts that demonstrate the indexing and searching steps separately. This is in order to highlight the two-step process and the fact that indexing needs to be done just once for each fasta file, whereafter the index may be reused indefinitely to perform searches.
Version 1.31
June 2, 2010
Major enhancements
- This version of Crux is released with a demo version of a new search engine, Tide. Tide is an independent reimplementation of the SEQUEST® algorithm. The immediate ancestor of Tide is the Crux search-for-matches command, but Tide has been completely re-engineered to achieve a thousandfold improvement in speed while exactly replicating Crux XCorr scores. Currently, Tide is not fully integrated with Crux, and is available only in binary executable format.
Minor changes
- Two options have been added to crux search-for-matches (--mz-bin-width and --mz-bin-offset) to allow control of the binning of the m/z axis.
- The ion-tolerance option was removed from the search tools.
- The default location of the left edge of the first bin along the m/z axis is 0.68, rather than 0.0. This change makes bin edges less likely to fall near fragment peak locations. The default bin width has also changed from 1.001141 to 1.000508.
- The log file now includes information about the date and time that the command was issued, the name of the computer on which the command was run, and the elapsed wall clock time at the end of the run.
- The command line option --version T has been replaced with a command crux version that prints the version number to standard output and then exits.
- The header for the feature file produced optionally by crux percolator and crux q-ranker now includes names for the first two columns, scan number and label (i.e. target or decoy peptide).
Version 1.30
May 5, 2010
Major enhancements
- A new command, xlink-search, has been added. This command searches a collection of spectra against a sequence database, finding cross-linked peptide matches. The algorithm was described in the following article:
Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.
- A new search command, sequest-search, was added. The original command, search-for-matches, behaves as before except that no .sqt files are printed and no Sp scoring is performed. The new sequest-search emulates SEQUEST® searching. It first scores all candidate peptides with the Sp score, then ranks and filters the results by that score, scoring the remaining candidates with xcorr. Results are printed to .txt and .sqt files.
- The .mzXML file format is now supported for crux search-for-matches and crux sequest-search when the --use-mstoolkit option is set to TRUE.
- The .csm output files from search-for-matches are no longer produced and post-search operations take .txt files as input.
Minor changes
- Replaced mass-window parameter with precursor-window and added precursor-window-type for selecting windows of type mass, m/z, or ppm.
- The feature-file option is now true/false instead of taking a file name. The file is named '<fileroot>.<qranker| percolator>.features.txt'. A header with the column names was added to the file.
- The new option, --scan-number, performs a search on a specified subset of the spectra in the given file.
- When only one decoy file is produced, the name is now 'decoy.txt' instead of 'decoy-1.txt'.
- In the .txt files, protein names are now followed by the start index of the peptide.
- Modified sequences are reported differently in the .txt files. Instead of modifications being represented by symbols (*,@,#, etc.) they are indicated with the mass shift of the modification within square brackets. As before, the modification information follows the residue that is modified. If multiple modifications appear on one residue, the masses may either be summed together or printed separately in a comma separated list. This behavior is controlled by the --display-summed-mod-masses option.
- The modified decoy sequences are generated differently. Before, all modified peptides were generated and each one was shuffled to create a decoy peptide. Now a peptide is shuffled once for each peptide_mod and all modifications are applied to that same shuffled peptide.
- The 'percolator rank' column is now based on percolator scores instead of percolator q-values. For cases where two PSMs have different percolator scores but the same q-value, the ranks will reflect the score differences.
- The Weibull parameters used for computing p-values are now printed to the .txt files.
- The reporting of DeltaCn has changed. For PSMs of rank i, deltaCn used to be computed as deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0. Thus, deltaCn for the top-ranked PSM was always 0. Now deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0. This change only affects search-for-matches, not sequest-search. It also means that q-ranker and percolator use the correct DeltaCn values.
- A new command, print-processed-spectra, performs XCorr-style pre-processing on all spectra in a given file.
- Fixed a bug in which the search programs used monoisotopic mass in the calculation of XCorr and Sp even when the user requested average mass.
- Protein fasta files are now parsed protein-by-protein so that parsing large files does not use excessive amounts of memory.
- A limit was removed on how many peptides a protein can produce.
- Crux is now built with g++ instead of gcc.
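The DeltaCn change in the list above can be illustrated with a short sketch. This is not the Crux implementation; the function names are hypothetical, the xcorr list is assumed to be sorted from best to worst with the top-ranked PSM at index 0, and returning None for the last PSM (which has no successor) is an assumption made only for this example.

```python
def delta_cn_old(xcorrs):
    """Old definition: deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0."""
    top = xcorrs[0]
    return [(top - x) / top for x in xcorrs]


def delta_cn_new(xcorrs):
    """New definition: deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0."""
    top = xcorrs[0]
    values = [(top - nxt) / top for nxt in xcorrs[1:]]
    # Assumption for this sketch: the last PSM has no next-ranked PSM,
    # so its value is left undefined (None).
    values.append(None)
    return values


xcorrs = [4.0, 3.0, 2.0]  # hypothetical scores, best first
old_values = delta_cn_old(xcorrs)
new_values = delta_cn_new(xcorrs)
```

With the old definition the top-ranked PSM always receives 0; with the new one it receives the relative gap to the second-ranked PSM.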
Version 1.22
September 16, 2009
Major enhancements
- Crux is now distributed in two versions, a full version that is covered by the same type of license as before (free to non-profit users, and via a licensing fee to commercial users), as well as a stripped-down version that is released under an open source license. The stripped-down version does not include the database search functionality but does include all of the post-processing tools. We are unable to release the entire Crux package under an open source license due to intellectual property issues. Both versions of Crux are available via the Crux web page: http://noble.gs.washington.edu/proj/crux/.
- A new tool, q-ranker, is available for estimating peptide-spectrum match q-values. This tool was described in the following article:
Marina Spivak, Jason Weston, Leon Bottou, Lukas Käll and William Stafford Noble. "Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets." Journal of Proteome Research.
- Version 1.05 of percolator has now been integrated into the Crux source tree. A separate installation of percolator is no longer needed for basic percolator functionality. Note, however, that percolator remains under active development. You may therefore wish to install the current, stand-alone version of percolator and run it separately to take advantage of new features.
Minor changes
- The internal normalization of the observed spectra has been modified to drop those peaks whose intensity is less than 1/20 of the maximum intensity in the spectrum. This brings the xcorr score for crux into closer agreement with the xcorr score for SEQUEST®.
- Compute-q-values now generates three different q-values: (1) from p-values using an analytical null model, (2) from decoys and xcorr using an empirical null model, and (3) from decoys and p-values using an empirical null model. All three types of q-values are computed when p-values and decoys are present in the search results.
- A parameter file is now automatically written to the output directory.
- A log file recording messages sent to stderr has been added for search-for-matches, compute-q-values, and percolator.
- The --use-mz-window parameter is now available for search-for-matches. When enabled, peptides must be within +/- 'm/z-window' of the spectrum m/z. The m/z-window value is taken from mass-window.
.- A numerical bug in the Weibull p-value calculation was fixed, which had previously caused occasional erroneous NaNs to be output.
- The Weibull estimated p-values generated by search-for-matches are now returned as p-values instead of as -log(p-value). The corresponding q-values returned from compute-q-values are also now returned without the -log transform.
- The --precision option has been changed to control the total number of significant digits printed instead of the number of digits after the decimal point. The default precision has changed from 6 to 8.
- The parameters estimated for the Weibull distribution (used for computing p-values) now use the xcorrs from all PSMs for a spectrum instead of a random selection of 500.
- The estimation of Weibull distribution parameters requires a minimum number of scored PSMs. In the previous version, spectra with fewer PSMs than the minimum were not given a p-value. Crux will now generate extra decoys until there are enough scores.
- The p-values for decoy PSMs are now generated from the same Weibull distribution parameters as are used for the targets of the same spectrum.
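The peak-dropping rule from the normalization item above (peaks below 1/20 of the maximum intensity are removed) can be sketched as follows. This is a minimal illustration and not the Crux code: the function name and the (m/z, intensity) tuple layout are assumptions, and the other steps of spectrum pre-processing are omitted.

```python
def drop_low_peaks(peaks, divisor=20):
    """Keep only peaks with intensity >= (maximum intensity / divisor).

    `peaks` is assumed to be a list of (mz, intensity) tuples.
    """
    if not peaks:
        return []
    threshold = max(intensity for _, intensity in peaks) / divisor
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


# Hypothetical spectrum: the maximum intensity is 200, so the cutoff is 10.
spectrum = [(100.0, 200.0), (150.0, 9.0), (200.0, 12.0), (250.0, 50.0)]
filtered = drop_low_peaks(spectrum)
```

Here only the peak at m/z 150.0 falls below the cutoff and is removed.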
Version 1.21
May 14, 2009
- The output for search-for-matches, compute-q-values, and percolator has been revised extensively. crux will now create a directory, and all output files will be created in that directory. By default the directory will be named crux-output, but this can be changed using the new output-dir option.
The output files for search-for-matches will be:
search.target.csm
search.decoy-?.csm
search.target.sqt
search.target.txt
The output files for compute-q-values will be:
qvalues.target.sqt
qvalues.target.txt
The output files for percolator will be:
percolator.target.sqt
percolator.target.txt
- The fileroot option has been added. This option is used to specify a string which will be added as a prefix to all output files.
- The option cleavages was replaced with two options: enzyme, which specifies the name of an enzyme (e.g. trypsin), and digestion, which indicates the degree of specificity, partial or full digest. The full list of available enzymes is in the html docs and in the usage statement. See also custom-enzyme below.
- The option custom-enzyme allows users to define arbitrary digestion rules. This overrides the enzyme option. Syntax for the custom digestion rule is the same as the syntax used by X!Tandem and is described in the html docs.
- The number of PSMs per spectrum printed to the output files is now controlled by one option, top-match. This makes max-sqt-result obsolete.
- It is now possible to control how many decoy sequences are generated and in which file(s) they are returned. There is a new option, num-decoys-per-target, which can be used to generate more than one shuffled peptide per spectrum. This replaces number-decoy-set.
- A new option, decoy-location, has been introduced. The three possible values are 'target-file' where all PSMs (target and decoys) are sorted together for each spectrum and returned in one file, 'one-decoy-file' where target PSMs are printed to one file and all decoys are printed to another, and 'separate-decoy-files' where there are as many decoy files as there are decoys per target.
- Protein names for decoy matches are now prepended with 'rand_' in the SQT files as in 'L rand_Y45678'.
- The option unique-peptides only applies to crux-generate-matches. Each peptide is stored in the index exactly once with references to all protein sources. Searches with fasta files print each peptide only once.
- The precision of the masses and scores printed to the sqt and text files can now be specified by the user. The default precision changed from 2 to 6.
- Search progress is now reported by printing every 10th spectrum that is searched. The verbosity can be adjusted with the parameter print-search-progress.
- Decoy (shuffled) sequences now keep the same first and last residues as the target sequences that were shuffled to produce them. This is a reversion to previous behavior.
- It is now possible to skip the Sp score and score all PSMs with xcorr. The default procedure is still to score all peptides for one spectrum with Sp, rank by Sp, and eliminate all but the best-ranking PSMs (by default, the top 500). The remaining PSMs are scored by xcorr, re-ranked by xcorr and the top results returned. By setting max-rank-preliminary=0, the Sp scoring is skipped and xcorr is computed for all PSMs.
- A new parameter, reverse-sequence, can be used to generate decoy peptides by reversing them rather than shuffling. The first and last residues are left unmoved. If the sequence is a palindrome, then a decoy will be generated by shuffling and a note to that effect will be printed at the DETAILED INFO level of output (verbosity = 40).
- P-values are now computed for decoy peptides.
- The algorithm used to calculate the xcorr score has been modified so that the xcorr score will be in better agreement with scores generated by SEQUEST®.
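The reverse-sequence behavior described in the list above (reverse the peptide, keep the first and last residues fixed, fall back to shuffling when reversal does not change the sequence) might look roughly like the sketch below. The function name is hypothetical, and testing "reversal reproduces the target" is an assumption standing in for the palindrome check; Crux's own logic may differ in detail.

```python
import random


def reverse_decoy(peptide, rng=None):
    """Build a decoy by reversing the interior residues of `peptide`.

    The first and last residues stay in place. If reversal reproduces the
    target (e.g. a palindromic sequence), the interior is shuffled instead.
    """
    rng = rng or random.Random()
    if len(peptide) < 3:
        return peptide  # nothing to rearrange
    first, middle, last = peptide[0], peptide[1:-1], peptide[-1]
    decoy = first + middle[::-1] + last
    if decoy == peptide:
        # Fallback: shuffle the interior. A shuffle can still return the
        # original order (e.g. for "AAAA"); this sketch does not retry.
        residues = list(middle)
        rng.shuffle(residues)
        decoy = first + "".join(residues) + last
    return decoy


decoy = reverse_decoy("PEPTIDEK")  # interior "EPTIDE" becomes "EDITPE"
```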
Version 1.20
January 6, 2009
- Generating peptides and searching with up to eleven different dynamic modifications is now possible. New options associated with this feature are mod, cmod, nmod, max-mods, max-aas-modified.
- The format of the .csm files has changed and files written by older versions of crux are not readable with crux version 1.2.
- When the option cleavages is set to all, peptide generation ignores all tryptic cleavage sites, effectively setting the missed-cleavages option to TRUE regardless of user settings.
- When one spectrum has identical xcorr scores for different sequences, the rank of all those matches will be the same. Matches with the next highest score will rank one below.
- The options for setting the preliminary and primary score type have been removed; the score types are now fixed as Sp and xcorr, respectively. A new option, compute-p-values=<T | F>, was added to control p-value computation.
- The SQT file contains the spectrum calculated mass instead of observed mass/charge on the S line.
- There is now a test for confirming that the file downloaded from the crux website was not corrupted. See installation instructions for details.
- Calculating a p-value requires a minimum of 40 matches. Spectra with fewer than 40 matches will have p-value scores returned as NaN and a warning will be printed at the DETAILED_INFO level (40) of verbosity.
- Fixed error in generating neutral-loss peaks created as part of the theoretical spectrum.
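The tie-handling rule for ranks in the list above (identical xcorr scores share a rank, and the next highest score ranks one below) corresponds to what is often called dense ranking. A minimal sketch, with a hypothetical function name and the assumption that ranks start at 1:

```python
def rank_by_xcorr(xcorrs):
    """Return a dense rank for each score: ties share a rank, and the next
    distinct score is ranked exactly one below."""
    distinct = sorted(set(xcorrs), reverse=True)
    rank_of = {score: position + 1 for position, score in enumerate(distinct)}
    return [rank_of[score] for score in xcorrs]


# Hypothetical xcorr scores for four matches to one spectrum.
ranks = rank_by_xcorr([3.2, 2.5, 3.2, 1.0])
```

The two matches scoring 3.2 share rank 1, the match scoring 2.5 gets rank 2 (not 3), and the last match gets rank 3.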
Version 1.02
December 1, 2008
- Three programs, crux-create-index, crux-search-for-matches, and crux-analyze-matches, were merged into one program named crux.
- Percolator is now truly optional as all Crux programs will build without it.
- Fragment masses can now be calculated as average or mono-isotopic. This is controlled by the fragment-mass option in the parameter file.
- The name of the score-type option that calculates p-values was changed from xcorr-logp to xcorr-pvalue.
- SQT files have two new lines in the header which describe the arrangement of values in the results.
- HTML documentation was updated to reflect the above changes.
Version 1.01
October 15, 2008
- A bug limiting the length of the name of an index file was fixed.
- Modifications were made so that Crux will build with version 1.05 of Percolator. This is the only supported version of Percolator.
- Memory leaks in crux-search-for-matches were patched.
- The --version option was added.
Version 1.0
March 4, 2008
Initial release