Getting started with Crux
This page covers setting up your environment and choosing your input files. Complete the software installation before you begin.
Your environment
For this tutorial, we'll assume you are working in a Linux/Unix-type shell (Cygwin is a good choice for Windows users) and already know basic commands for changing directories, listing files, and similar tasks. To run the sample commands, you'll need to work from a directory for which you have write permission; anywhere in your home directory should work. Create a new directory and navigate to it.
$ mkdir crux-demo
$ cd crux-demo
We will refer to this directory, 'crux-demo', as the working directory.
Next, make sure that the computer knows where to look for the crux programs. Try this command.
$ which crux
If it returns a single line with a path ending in crux, then you are set. If not, review the installation instructions on setting your $PATH environment variable.
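If the command prints nothing, one way to fix it is to append the directory holding the crux executable to your $PATH. The following is only a sketch: CRUX_BIN is a hypothetical location, not one taken from the installation instructions, so substitute the directory where crux actually lives on your system.

```shell
# CRUX_BIN is a hypothetical install location; replace it with the
# directory that actually contains the crux executable on your system.
CRUX_BIN="$HOME/crux/bin"

# Append CRUX_BIN to PATH only if it is not already listed.
case ":$PATH:" in
  *":$CRUX_BIN:"*) ;;
  *) PATH="$PATH:$CRUX_BIN"; export PATH ;;
esac
```

The change lasts only for the current shell session. To keep it, add the same lines to your shell startup file, then run the which command again.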
Input file: mass spectra
The crux distribution includes some sample files containing mass spectra in the doc/user/data directory. We will use demo.ms2 for this tutorial. Locate the file and copy it to the current working directory.
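The copy step can be wrapped in a small helper, sketched here. The function name copy_sample is made up for this tutorial, and its first argument (the top-level directory of your crux distribution) is an assumption you must supply, because that location depends on where you unpacked crux.

```shell
# copy_sample DIST FILE
# DIST is the top-level directory of your crux distribution (its location
# is an assumption; use wherever you unpacked crux). FILE is a file name
# under doc/user/data. The file is copied into the current directory.
copy_sample() {
  cp "$1/doc/user/data/$2" .
}
```

For example, if the distribution were unpacked in ~/crux (a hypothetical location), you would run the following from the working directory and then confirm the file arrived.

$ copy_sample ~/crux demo.ms2
$ ls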
The beginning of the file looks like this.
H CreationDate 2/14/2007 6:19:18 PM
H Extractor MakeMS2
H ExtractorVersion 1.0
H Comments MakeMS2 written by Michael J. MacCoss, 2004
H ExtractorOptions MS2/MS1
S 10 10 636.34
Z 2 1271.67
187.4 12.5
193.1 19.5
194.3 13.7
The lines at the top of the file beginning with H are header lines and contain information about the program that generated the file, the date it was created, and so on. The line starting with S begins the information about the first spectrum. Following the S is the scan number (twice) and the m/z of the precursor ion. The lines beginning with Z list the possible charge states of the spectrum (in this case 2) and the mass of the peptide at that charge state. Following the Z lines is the list of peaks for the spectrum, one peak per line, given as an m/z value followed by an intensity. The subsequent spectra in the file repeat this pattern of S line, Z line(s), and peak list. demo.ms2 contains 150 spectra.
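Because each spectrum contributes exactly one S line, you can check the file against this description by counting those lines. The sketch below also reproduces the Z-line mass from the S-line m/z and the charge. It assumes the Z-line value is the singly protonated mass, (M+H)+ = m/z x z - (z - 1) x 1.00728, a reading that has been checked here only against the one spectrum shown above. The function names count_spectra and mh_mass are made up for this tutorial.

```shell
# Count spectra: each spectrum contributes exactly one line that starts
# with "S" followed by whitespace.
count_spectra() {
  grep -c '^S[[:space:]]' "$1"
}

# mh_mass MZ CHARGE
# Singly protonated mass (M+H)+ from a precursor m/z and a charge.
# 1.00728 is the proton mass in Da; that the Z line stores this quantity
# is an assumption, checked only against the excerpt above.
mh_mass() {
  awk -v mz="$1" -v z="$2" 'BEGIN { printf "%.2f\n", mz * z - (z - 1) * 1.00728 }'
}
```

If your copy of the file matches the description above, the first command should report 150. The second reproduces the Z-line value of the first spectrum, 1271.67.

$ count_spectra demo.ms2
$ mh_mass 636.34 2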
Input file: protein database (fasta file)
The second input file you will need is a protein database. This file is in fasta format and contains a list of proteins you expect to find in your sample, along with their sequences. A sample fasta file also comes with the distribution in doc/user/data/. We will use small-yeast.fasta. Copy it to your current working directory.
The beginning of the file looks like this.
>YBL030C PET9 SGDID:S000000126, Chr II from 164000-163044, reverse complement, Verified ORF, "Major ADP/ATP carrier of the mitochondrial inner membrane, exchanges cytosolic ADP for mitochondrially synthesized ATP; required for viability in many common lab strains carrying a mutation in the polymorphic SAL1 gene"
MSSNAQVKTPLPPAPAPKKESNFLIDFLMGGVSAAVAKTAASPIERVKLLIQNQDEML
KQGTLDRKYAGILDCFKRTATQEGVISFWRGNTANVIRYFPTQALNFAFKDKIKAMFG
FKKEEGYAKWFAGNLASGGAAGALSLLFVYSLDYARTRLAADSKSSKKGGARQFNGLI
DVYKKTLKSDGVAGLYRGFLPSVVGIVVYRGLYFGMYDSLKPLLLTGSLEGSFLASFL
LGWVVTTGASTCSYPLDTVRRRMMMTSGQAVKYDGAFDCLRKIVAAEGVGSLFKGCGA
NILRGVAGAGVISMYDQLQMILFGKKFK
Lines beginning with > give the name of a protein. The first word after the > is the protein name; it may be followed by an optional description of any length. The lines after that contain the protein sequence, up to the next line beginning with >. Proteins may or may not be separated by blank lines. small-yeast.fasta contains 56 proteins.
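Since every protein has exactly one line beginning with >, counting those lines is a quick check that the file is what you expect. The sketch below also lists just the protein names. The function names count_proteins and protein_names are made up for this tutorial.

```shell
# Count proteins: each protein has exactly one header line starting with ">".
count_proteins() {
  grep -c '^>' "$1"
}

# Print only the protein names: the first word of each header line,
# with the leading ">" removed.
protein_names() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}
```

If your copy of the file matches the description above, the first command should report 56, and the first name listed by the second should be YBL030C.

$ count_proteins small-yeast.fasta
$ protein_names small-yeast.fasta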