comet
Usage:
crux comet [options] <input spectra>+ <database_name>
Description:
This command searches a protein database with a set of spectra, assigning peptide sequences to the observed spectra. This search engine was developed by Jimmy Eng at the University of Washington Proteomics Resource.
Although its history goes back two decades, the Comet search engine was first made publicly available in August 2012 on SourceForge. Comet is multithreaded and supports multiple input and output formats.
"Comet: an open source tandem mass spectrometry sequence database search tool." Eng JK, Jahan TA, Hoopmann MR. Proteomics. 2012 Nov 12. doi: 10.1002/pmic.201200439
Input:
input spectra+
– The name of the file from which to parse the spectra. Valid formats include mzXML, mzML, mz5, raw, ms2, and cms2. Files in mzML or mzXML may be compressed with gzip. RAW files can be parsed only under Windows and only if the appropriate libraries were included at compile time.
database_name
– A full or relative path to the sequence database, in FASTA format, to search. Example databases include RefSeq or UniProt. The database can contain amino acid sequences or nucleic acid sequences. If the sequences are amino acid sequences, set the parameter "nucleotide_reading_frame = 0". If the sequences are nucleic acid sequences, you must instruct Comet to translate these to amino acid sequences. Do this by setting "nucleotide_reading_frame" to a value between 1 and 9.
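As an illustration of the argument order in the usage line (options first, then one or more spectrum files, then the database), the following minimal Python sketch assembles an argument list. The helper name `build_comet_command` and the file names are hypothetical; only the option names come from this page.

```python
from typing import Dict, List, Optional, Sequence


def build_comet_command(
    spectra: Sequence[str],
    database: str,
    options: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Return an argv list shaped like: crux comet [options] <input spectra>+ <database_name>."""
    if not spectra:
        raise ValueError("at least one spectrum file is required")
    argv = ["crux", "comet"]
    # Options go before the positional arguments, as in the usage line.
    for name, value in (options or {}).items():
        argv += [f"--{name}", str(value)]
    argv += list(spectra)
    argv.append(database)
    return argv


# Hypothetical file names, used only for illustration.
cmd = build_comet_command(
    ["run1.mzML", "run2.mzML"],
    "proteins.fasta",
    {"decoy_search": 1, "output-dir": "comet-results"},
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where crux is installed.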
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
comet.target.txt
– a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
comet.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
comet.log.txt
– a log file containing a copy of all messages that were printed to standard error.
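Because comet.target.txt is tab-delimited, it can be read with a standard CSV reader. The sketch below is a minimal example that makes no assumption about which columns the file contains: rows are keyed by whatever header names are present. The helper name `read_psms` and the sample column names are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_psms(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per PSM row, keyed by the header names found in the file."""
    yield from csv.DictReader(handle, delimiter="\t")


# Stand-in content with made-up column names. A real file would be opened with
# open("crux-output/comet.target.txt", newline="") instead.
sample = io.StringIO("col_a\tcol_b\n1\tPEPTIDEK\n2\tSAMPLER\n")
rows = list(read_psms(sample))
print(len(rows), rows[0]["col_b"])
```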
Options:
-
Database
--decoy_search <integer>
– 0=no, 1=concatenated search, 2=separate search. Default = 0.
-
CPU threads
--num_threads <integer>
– 0=poll CPU to set num threads; else specify num threads directly. Default = 0.
-
Masses
--peptide_mass_tolerance <float>
– Controls the mass tolerance value. The tolerance is applied as +/- the specified number, i.e., an entered value of "1.0" applies a -1.0 to +1.0 tolerance. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
--peptide_mass_units <integer>
– 0=amu, 1=mmu, 2=ppm. Default = 0.
--mass_type_parent <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--mass_type_fragment <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--precursor_tolerance_type <integer>
– 0=singly charged peptide mass, 1=precursor m/z. Default = 0.
--isotope_error <integer>
– 0=off, 1=on -1/0/1/2/3 (standard C13 error), 2=-8/-4/0/4/8 (for +4/+8 labeling). Default = 0.
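To make the interaction between peptide_mass_tolerance and peptide_mass_units concrete, here is a minimal sketch that converts a tolerance to an absolute window. The function name is hypothetical, and the sketch assumes the ppm tolerance is taken relative to the mass passed in; this page does not state which mass Comet uses as the reference.

```python
from typing import Tuple

AMU, MMU, PPM = 0, 1, 2  # values of peptide_mass_units as listed on this page


def tolerance_window(mass: float, tolerance: float, units: int) -> Tuple[float, float]:
    """Return (low, high) for mass +/- tolerance, with the tolerance converted to amu."""
    if units == AMU:
        delta = tolerance
    elif units == MMU:
        delta = tolerance / 1000.0  # 1 mmu = 0.001 amu
    elif units == PPM:
        delta = mass * tolerance / 1_000_000.0  # assumption: relative to `mass`
    else:
        raise ValueError(f"unknown peptide_mass_units value: {units}")
    return mass - delta, mass + delta


# The defaults on this page (tolerance 3, units 0) applied to a 1000.0 amu peptide.
print(tolerance_window(1000.0, 3.0, AMU))
```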
-
Search enzyme
--search_enzyme_number <integer>
– Specify a search enzyme from the end of the parameter file. Default = 1.
--num_enzyme_termini <integer>
– valid values are 1 (semi-digested), 2 (fully digested), 8 N-term, 9 C-term. Default = 2.
--allowed_missed_cleavage <integer>
– Maximum value is 5; for enzyme search. Default = 2.
-
Fragment ions
--fragment_bin_tol <float>
– Binning to use on fragment ions. Default = 1.000507.
--fragment_bin_offset <float>
– Offset position to start the binning (0.0 to 1.0). Default = 0.4.
--theoretical_fragment_ions <integer>
– 0=default peak shape, 1=M peak only. Default = 1.
--use_A_ions <integer>
– Controls whether or not A-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_B_ions <integer>
– Controls whether or not B-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_C_ions <integer>
– Controls whether or not C-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_X_ions <integer>
– Controls whether or not X-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_Y_ions <integer>
– Controls whether or not Y-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_Z_ions <integer>
– Controls whether or not Z-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_NL_ions <integer>
– 0=no, 1= yes to consider NH3/H2O neutral loss peak. Default = 1.
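One way to picture how fragment_bin_tol and fragment_bin_offset work together is to map a fragment m/z to an integer bin. The sketch below assumes the bin index is the integer part of m/z divided by the bin width plus (1 - offset). This page does not state the exact formula, so treat both the formula and the function name as assumptions.

```python
def fragment_bin(mz: float, bin_tol: float = 1.000507, bin_offset: float = 0.4) -> int:
    """Map a fragment m/z to a bin index (assumed formula, see the text above)."""
    if bin_tol <= 0.0:
        raise ValueError("bin_tol must be positive")
    if not 0.0 <= bin_offset <= 1.0:
        raise ValueError("bin_offset must lie in [0.0, 1.0]")
    return int(mz / bin_tol + (1.0 - bin_offset))


# Under this assumption, nearby peaks share a bin while more distant ones do not.
for mz in (100.0, 100.3, 100.5):
    print(mz, fragment_bin(mz))
```

Under this reading, the offset shifts where bin boundaries fall relative to integer multiples of the bin width, so changing it can move a peak that sits near a boundary into a neighboring bin.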
-
mzXML/mzML parameters
--scan_range <string>
– Start and end scan range to search; 0 as first entry ignores parameter. Default = 0 0.
--precursor_charge <string>
– Precursor charge range to analyze; does not override mzXML charge; 0 as first entry ignores parameter. Default = 0 0.
--override_charge <integer>
– Specifies whether to override existing precursor charge state information, when present in the files, with the charge range specified by the "precursor_charge" parameter. Default = 0.
--ms_level <integer>
– MS level to analyze, valid are levels 2 or 3. Default = 2.
--activation_method ALL|CID|ECD|ETD|PQD|HCD|IRMPD
– Specifies which scan types are searched. Default = ALL.
-
Miscellaneous parameters
--digest_mass_range <string>
– MH+ peptide mass range to analyze. Default = 600.0 5000.0.
--num_results <integer>
– Number of search hits to store internally. Default = 50.
--skip_researching <integer>
– For '.out' file output only, 0=search everything again, 1=don't search if .out exists. Default = 1.
--max_fragment_charge <integer>
– Set maximum fragment charge state to analyze (allowed max 5). Default = 3.
--max_precursor_charge <integer>
– Set maximum precursor charge state to analyze (allowed max 9). Default = 6.
--nucleotide_reading_frame <integer>
– 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six. Default = 0.
--clip_nterm_methionine <integer>
– 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine. Default = 0.
--spectrum_batch_size <integer>
– Maximum number of spectra to search at a time; 0 to search the entire scan range in one loop. Default = 0.
--decoy_prefix <string>
– Specifies the prefix of the protein names that indicates a decoy. Default = decoy_.
--output_suffix <string>
– Specifies the suffix string that is appended to the base output name for the pep.xml, pin.xml, txt and sqt output files. Default = <empty>.
--mass_offsets <string>
– Specifies one or more mass offsets to apply. These values are effectively subtracted from each precursor mass such that peptides that are smaller than the precursor mass by the offset value can still be matched to the respective spectrum. Default = <empty>.
-
Spectral processing
--minimum_peaks <integer>
– Minimum number of peaks in spectrum to search. Default = 10.
--minimum_intensity <float>
– Minimum intensity value to read in. Default = 0.
--remove_precursor_peak <integer>
– 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD). Default = 0.
--remove_precursor_tolerance <float>
– +- Da tolerance for precursor removal. Default = 1.5.
--clear_mz_range <string>
– For iTRAQ/TMT type data; will clear out all peaks in the specified m/z range. Default = 0.0 0.0.
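The effect of minimum_intensity, clear_mz_range and minimum_peaks can be sketched as a simple peak filter. This is only an illustration under stated assumptions: the range bounds are treated as inclusive, an empty range (such as the default 0.0 0.0) clears nothing, and the peak count is checked after filtering. This page does not confirm those details, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_peaks(
    peaks: Sequence[Peak],
    minimum_intensity: float = 0.0,
    clear_mz_range: Tuple[float, float] = (0.0, 0.0),
    minimum_peaks: int = 10,
) -> Optional[List[Peak]]:
    """Return the surviving peaks, or None if too few remain to search the spectrum."""
    low, high = clear_mz_range
    clearing = low < high  # assumption: an empty range disables clearing
    kept = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity >= minimum_intensity and not (clearing and low <= mz <= high)
    ]
    return kept if len(kept) >= minimum_peaks else None


# Made-up peaks; the cleared window here is an arbitrary reporter-ion-like region.
peaks = [(100.0, 5.0), (114.1, 50.0), (200.0, 0.5), (300.0, 8.0)]
print(filter_peaks(peaks, minimum_intensity=1.0, clear_mz_range=(113.5, 119.5), minimum_peaks=2))
```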
-
Variable modifications
--variable_mod01 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod02 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod03 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod04 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod05 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod06 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod07 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod08 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--variable_mod09 <string>
– Up to 9 variable modifications are supported; format: "<mass> <residues> <0=variable/1=binary> <max mods per peptide>", e.g., 79.966331 STY 0 3. Default = 0.0 null 0 4 -1 0 0.
--max_variable_mods_in_peptide <integer>
– Specifies the total/maximum number of residues that can be modified in a peptide. Default = 5.
--require_variable_mod <integer>
– Controls whether the analyzed peptides must contain at least one variable modification. Default = 0.
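A variable modification string such as the example 79.966331 STY 0 3 is a whitespace-separated record. The sketch below splits it into named fields. Reading the first four fields as mass, residues, variable/binary flag and maximum modifications per peptide is an assumption based on that example; any further fields (the default value has seven) are kept uninterpreted. The type and function names are hypothetical.

```python
from typing import List, NamedTuple


class VariableMod(NamedTuple):
    mass: float
    residues: str
    binary: bool           # False for 0 (variable), True for 1 (binary)
    max_per_peptide: int   # assumed meaning of the fourth field
    extra: List[str]       # trailing fields, left uninterpreted


def parse_variable_mod(spec: str) -> VariableMod:
    """Split a variable_modNN value into its leading four fields plus any extras."""
    fields = spec.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {spec!r}")
    return VariableMod(
        mass=float(fields[0]),
        residues=fields[1],
        binary=int(fields[2]) != 0,
        max_per_peptide=int(fields[3]),
        extra=fields[4:],
    )


print(parse_variable_mod("79.966331 STY 0 3"))
print(parse_variable_mod("0.0 null 0 4 -1 0 0"))
```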
-
Static modifications
--add_Cterm_peptide <float>
– Specify a static modification to the c-terminus of all peptides. Default = 0.
--add_Nterm_peptide <float>
– Specify a static modification to the n-terminus of all peptides. Default = 0.
--add_Cterm_protein <float>
– Specify a static modification to the c-terminal peptide of each protein. Default = 0.
--add_Nterm_protein <float>
– Specify a static modification to the n-terminal peptide of each protein. Default = 0.
--add_A_alanine <float>
– Specify a static modification to the residue A. Default = 0.
--add_B_user_amino_acid <float>
– Specify a static modification to the residue B. Default = 0.
--add_C_cysteine <float>
– Specify a static modification to the residue C. Default = 57.021464.
--add_D_aspartic_acid <float>
– Specify a static modification to the residue D. Default = 0.
--add_E_glutamic_acid <float>
– Specify a static modification to the residue E. Default = 0.
--add_F_phenylalanine <float>
– Specify a static modification to the residue F. Default = 0.
--add_G_glycine <float>
– Specify a static modification to the residue G. Default = 0.
--add_H_histidine <float>
– Specify a static modification to the residue H. Default = 0.
--add_I_isoleucine <float>
– Specify a static modification to the residue I. Default = 0.
--add_J_user_amino_acid <float>
– Specify a static modification to the residue J. Default = 0.
--add_K_lysine <float>
– Specify a static modification to the residue K. Default = 0.
--add_L_leucine <float>
– Specify a static modification to the residue L. Default = 0.
--add_M_methionine <float>
– Specify a static modification to the residue M. Default = 0.
--add_N_asparagine <float>
– Specify a static modification to the residue N. Default = 0.
--add_O_ornithine <float>
– Specify a static modification to the residue O. Default = 0.
--add_P_proline <float>
– Specify a static modification to the residue P. Default = 0.
--add_Q_glutamine <float>
– Specify a static modification to the residue Q. Default = 0.
--add_R_arginine <float>
– Specify a static modification to the residue R. Default = 0.
--add_S_serine <float>
– Specify a static modification to the residue S. Default = 0.
--add_T_threonine <float>
– Specify a static modification to the residue T. Default = 0.
--add_U_selenocysteine <float>
– Specify a static modification to the residue U. Default = 0.
--add_V_valine <float>
– Specify a static modification to the residue V. Default = 0.
--add_W_tryptophan <float>
– Specify a static modification to the residue W. Default = 0.
--add_X_user_amino_acid <float>
– Specify a static modification to the residue X. Default = 0.
--add_Y_tyrosine <float>
– Specify a static modification to the residue Y. Default = 0.
--add_Z_user_amino_acid <float>
– Specify a static modification to the residue Z. Default = 0.
-
Input and output
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = <empty>.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
--output_sqtfile <integer>
– 0=no, 1=yes write sqt file. Default = 0.
--output_txtfile <integer>
– 0=no, 1=yes write tab-delimited text file. Default = 1.
--output_pepxmlfile <integer>
– 0=no, 1=yes write pep.xml file. Default = 1.
--output_percolatorfile <integer>
– 0=no, 1=yes write percolator file. Default = 0.
--output_outfiles <integer>
– 0=no, 1=yes write .out files. Default = 0.
--print_expect_score <integer>
– 0=no, 1=yes to replace Sp with expect in out & sqt. Default = 1.
--num_output_lines <integer>
– num peptide results to show. Default = 5.
--show_fragment_ions <integer>
– 0=no, 1=yes for out files only. Default = 0.
--sample_enzyme_number <integer>
– Sample enzyme which is possibly different than the one applied to the search. Used to calculate NTT & NMC in pepXML output. Default = 1.