= Two-DB: Real and Decoy Database Search
=== 1. Create file: <i>to_sequest.sld</i>

To run Sequest, first create a <code>to_sequest.sld</code> file that points
Sequest to your raw data files. The same file is used both to run Sequest and
to load the results in the MultiConsensus view.

=== 2. Run Sequest with a Normal and an Inverse Database

Search the same raw data twice: once against your normal database and once
against an inverse database. If you don't already have an inverse database,
create one with:

    fasta_mod.rb invert <yourfile.fasta>

This creates a new file whose name ends with the tag '_INV.fasta'. Type
<code>fasta_mod.rb</code> without arguments for more details.
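
The <code>invert</code> command is the supported way to build the decoy database. To show the idea, here is a minimal Ruby sketch under one stated assumption: that inverting means reversing each protein sequence while keeping its FASTA header. The functions <code>parse_fasta</code>, <code>invert_fasta</code>, and <code>inverse_filename</code> are hypothetical and are not part of <code>fasta_mod.rb</code>, whose header handling, line wrapping, and output naming may differ.

```ruby
# Hypothetical sketch, not fasta_mod.rb itself.
# Assumption: "inverting" a database means reversing each protein sequence
# while leaving its FASTA header unchanged.

# Parse FASTA text into [header, sequence] pairs.
def parse_fasta(text)
  entries = []
  text.each_line do |line|
    line = line.strip                 # also drops Windows "\r" line endings
    next if line.empty?
    if line.start_with?('>')
      entries << [line, +'']          # start a new entry with an empty sequence
    elsif !entries.empty?
      entries.last[1] << line         # append sequence lines to the current entry
    end
  end
  entries
end

# Reverse every sequence and rewrap it at `width` residues per line.
def invert_fasta(text, width = 60)
  records = parse_fasta(text).map do |header, seq|
    [header, *seq.reverse.scan(/.{1,#{width}}/)].join("\n")
  end
  records.map { |r| r + "\n" }.join
end

# Output file name with the trailing '_INV.fasta' tag (assumed naming).
def inverse_filename(path)
  path.sub(/\.fasta\z/, '') + '_INV.fasta'
end
```

A whole file could then be converted with <code>File.write(inverse_filename(path), invert_fasta(File.read(path)))</code>.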

=== 3. Export a Bioworks XML File for each Database

1. Load your Sequest results into MultiConsensus (even if you have only one run):
    File -> 'Load MultiConsensus Results'
2. Click 'yes' to calculate peptide probabilities [optional]
3. Click 'yes' to view results without filtering.
4. Right-click on the data and 'Export' to XML (name the file <code>bioworks.xml</code>). This unfiltered file is the input for ProteinProphet.
5. Filter your data on the parameters you prefer and export the filtered results to XML (e.g. <code>bioworks_filtered.xml</code>).
6. Repeat steps 1, 2, and 5 for the inverse database, filtering on exactly the same parameters, and export those results as well (e.g. <code>bioworks_filtered_INV.xml</code>). To experiment with different parameters, open two Bioworks windows and filter the normal and inverse results side by side until satisfied.

=== 4. Convert to pepXML

    bioworks_to_pepxml.rb bioworks.xml -p /cygdrive/c/Xcalibur/params/myparams.params -m /cygdrive/c/Xcalibur/data/mydatafolder

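In this example, <code>-p</code> points to the Sequest params file and
<code>-m</code> to the folder holding the raw data. By default, the pepXML
files are written to a subdirectory called 'pepxml'. Type
<code>bioworks_to_pepxml.rb</code> for more details.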
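
The paths in the command use Cygwin's <code>/cygdrive</code> form rather than the Windows form shown by Xcalibur. The hypothetical helpers below sketch that translation and assemble the same command as an argument array; they are not part of <code>bioworks_to_pepxml.rb</code>, and the file names passed to them are placeholders.

```ruby
# Hypothetical helpers, not part of bioworks_to_pepxml.rb.
# Convert a Windows drive-letter path (e.g. C:\Xcalibur\params\x.params)
# into the /cygdrive form that the Cygwin shell expects.
def cygwin_path(win_path)
  m = win_path.match(/\A([A-Za-z]):[\\\/]?(.*)\z/)
  raise ArgumentError, "not a drive-letter path: #{win_path}" unless m
  drive = m[1].downcase
  rest = m[2].gsub('\\', '/')         # backslashes -> forward slashes
  rest.empty? ? "/cygdrive/#{drive}" : "/cygdrive/#{drive}/#{rest}"
end

# Assemble the conversion command as an argument array (no shell quoting needed).
def pepxml_command(bioworks_xml, params_file, data_dir)
  ['bioworks_to_pepxml.rb', bioworks_xml,
   '-p', cygwin_path(params_file),
   '-m', cygwin_path(data_dir)]
end
```

Passing the array to <code>system(*pepxml_command(...))</code> would run the conversion without a shell. From the shell itself, Cygwin's <code>cygpath -u</code> utility performs the same path translation.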

=== 5. Run Protein Prophet

ProteinProphet must be run from a particular directory (here <code>/cygdrive/c/Inetpub/wwwroot/ISB/data</code>). To simplify getting there, add an alias to your ~/.bashrc file if you don't already have one: <code>alias isb="cd /cygdrive/c/Inetpub/wwwroot/ISB/data"</code>. Then, to get to the ISB data folder, just type:

    isb     # -> takes you to /cygdrive/c/Inetpub/wwwroot/ISB/data

Then run ProteinProphet with <code>xinteract</code>:

    xinteract -N<my_run_name>.xml -Op sequest/myfolder/pepxml/*.xml

Type <code>xinteract</code> for more details. *NOTE:* the path to the pepXML
files must begin with the <code>sequest</code> soft link (as in the command
above), so that the web server treats the data as residing under its own
directory tree.
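
Because the soft-link rule is easy to forget, a script that drives this step could check it before calling <code>xinteract</code>. The minimal sketch below assumes it is run from the ISB data directory; <code>xinteract_command</code> is a hypothetical helper, and the only <code>xinteract</code> options it uses are the <code>-N</code> and <code>-Op</code> options from the command above.

```ruby
# Hypothetical helper, not part of xinteract: build the ProteinProphet command
# from the ISB data directory and enforce the 'sequest' soft-link rule.
def xinteract_command(run_name, pepxml_glob)
  unless pepxml_glob.start_with?('sequest/')
    raise ArgumentError, "pepXML path must start with the sequest soft link: #{pepxml_glob}"
  end
  # Expand the glob here, since an argument array bypasses the shell.
  files = Dir.glob(pepxml_glob).sort
  raise ArgumentError, "no pepXML files match #{pepxml_glob}" if files.empty?
  ['xinteract', "-N#{run_name}.xml", '-Op', *files]
end
```

For example, <code>system(*xinteract_command('my_run_name', 'sequest/myfolder/pepxml/*.xml'))</code> would run the same command with the file list already expanded.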

The full protein results are written to '<my_run_name>-prot.xml'.

=== 6. Classification Analysis

Even when run against only the normal database, ProteinProphet provides an estimate of false positive rates. To view a protein summary at a desired false positive rate cutoff (here 5%):

    protein_summary.rb -c 5.0 <my_run_name>-prot.xml

Proteins above the red cutoff line have an estimated false positive rate of at most 5%.
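
How a probability list turns into such a cutoff can be illustrated with a minimal sketch. It assumes that the false positive rate of an accepted protein list is estimated as the average of (1 - p) over the accepted proteins, where p is each protein's ProteinProphet probability. This is a standard way to convert probabilities into an expected error rate, but <code>protein_summary.rb</code> may compute its cutoff differently. <code>expected_fpr</code> and <code>proteins_within_cutoff</code> are hypothetical names.

```ruby
# Sketch: expected false positive rate of an accepted list, assuming each
# protein with probability p is wrong with probability (1 - p).
def expected_fpr(probs)
  return 0.0 if probs.empty?
  probs.sum { |p| 1.0 - p } / probs.size
end

# Accept proteins from most to least probable while the expected FPR
# (in percent) stays at or below the cutoff.
def proteins_within_cutoff(probs, cutoff_percent)
  sorted = probs.sort.reverse          # most confident proteins first
  expected_wrong = 0.0                 # running sum of (1 - p)
  accepted = 0
  sorted.each_with_index do |p, i|
    expected_wrong += 1.0 - p
    # The running mean of (1 - p) never decreases down a sorted list,
    # so the first violation ends the search.
    break if expected_wrong / (i + 1) * 100.0 > cutoff_percent
    accepted = i + 1
  end
  sorted.first(accepted)
end
```

In practice the probabilities would come from '<my_run_name>-prot.xml'; parsing that file is not shown here.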

We can check the Bioworks probability scores independently by counting the
true hits (from the normal database) against the false hits (from the inverse
database), using the same score filters for both. This command gives a protein
summary that also reports precision and false positive rates (two different
kinds):

    protein_summary.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g --fpr

Type <code>protein_summary.rb</code> for more details.
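
The decoy comparison can be sketched in a few lines of Ruby. This minimal sketch rests on several assumptions: every hit from the inverse database is false, the normal database attracts roughly as many false hits as the inverse one, and hits are counted as unique protein IDs already extracted from the two filtered XML files (that parsing is not shown). The ratio and precision below are common target/decoy estimates, not necessarily the two rates that <code>protein_summary.rb</code> and <code>false_positive_rate.rb</code> report. <code>decoy_estimate</code> is a hypothetical name.

```ruby
# Hypothetical sketch of a target/decoy estimate from two separate searches
# filtered with identical score cutoffs.
DecoyEstimate = Struct.new(:normal_hits, :inverse_hits, :fpr, :precision)

def decoy_estimate(normal_ids, inverse_ids)
  normal  = normal_ids.uniq.size    # hits from the normal database (true + false)
  inverse = inverse_ids.uniq.size   # hits from the inverse database (all false)
  raise ArgumentError, 'no hits in the normal database' if normal.zero?
  # Assume the normal database contains about as many false hits as the inverse one.
  fpr = inverse.to_f / normal
  # Estimated fraction of normal hits that are real, clamped at zero.
  precision = [1.0 - fpr, 0.0].max
  DecoyEstimate.new(normal, inverse, fpr, precision)
end
```

For example, 1 inverse hit against 4 normal hits gives an estimated false positive rate of 25% and a precision of 75%.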

The false positive rate information can also be calculated without the protein summary:

    false_positive_rate.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g 

Type <code>false_positive_rate.rb</code> for more details.