= Two-DB: Real and Decoy Database Search
=== 1. Create file: to_sequest.sld
To run Sequest, first create a to_sequest.sld file that points Sequest to your raw data files. You can use this file both to run Sequest and in the MultiConsensus view.
=== 2. Run Sequest with a Normal and an Inverse Database
If you don't already have one, here's how to make an inverse database:
fasta_mod.rb invert
This will create a file with the trailing tag '_INV.fasta'. Type fasta_mod.rb for more details.
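To make the idea of an inverse database concrete, here is a minimal Ruby sketch. It assumes that "inverting" means reversing each protein sequence while keeping its header line; the real fasta_mod.rb may behave differently (for example, in how it rewrites headers). The method names parse_fasta, invert_fasta and inverted_name are hypothetical and are not part of this package.

```ruby
# Sketch only: assumes an "inverse" database holds each protein sequence
# reversed. This is NOT the fasta_mod.rb implementation.

# Parse FASTA text into [header, sequence] pairs.
def parse_fasta(text)
  entries = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?('>')
      entries << [line[1..-1], String.new]
    elsif !entries.empty?
      entries.last[1] << line.strip
    end
  end
  entries
end

# Reverse every sequence and wrap the output at `width` residues per line.
def invert_fasta(text, width = 60)
  parse_fasta(text).map do |header, seq|
    lines = seq.reverse.scan(/.{1,#{width}}/)
    ([">#{header}"] + lines).join("\n")
  end.join("\n") + "\n"
end

# Hypothetical naming helper that mirrors the '_INV.fasta' trailing tag.
def inverted_name(path)
  path.sub(/\.fasta\z/, '') + '_INV.fasta'
end
```

With these helpers, File.write(inverted_name(path), invert_fasta(File.read(path))) would write the reversed database next to the original file.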
=== 3. Export a Bioworks XML File for each Database
1. Load your Sequest results in MultiConsensus results (even if you only have one run): File -> 'Load MultiConsensus Results'
2. Click 'yes' to calculate peptide probabilities [optional]
3. Click 'yes' to view results without filtering.
4. Right click on the data and 'Export' to XML (name the file bioworks.xml). This file is fed into ProteinProphet.
5. Filter your data on the parameters you prefer and export.
6. Do the same thing for the inverted database (only steps 1, 2 and 5 are needed). Make sure to filter on the same parameters and export these results, too. (To experiment with different parameters, open two Bioworks windows and filter the normal and inverse databases until satisfied.)
=== 4. Convert to pepXML
bioworks_to_pepxml.rb bioworks.xml -p /cygdrive/c/Xcalibur/params/myparams.params -m /cygdrive/c/Xcalibur/data/mydatafolder
By default, the pepXML files will be written to a subdirectory called 'pepxml'. Type bioworks_to_pepxml.rb for more details.
=== 5. Run ProteinProphet
ProteinProphet must be run in a particular directory. If you do not already have one, create an alias (in your ~/.bashrc file) to simplify getting there:
alias isb="cd /cygdrive/c/Inetpub/wwwroot/ISB/data"
Then, to get to the isb folder, just type:
isb # -> takes you to /cygdrive/c/Inetpub/wwwroot/ISB/data
Then run ProteinProphet:
xinteract -N.xml -Op sequest/myfolder/pepxml/*.xml
Type xinteract for more details. *NOTE:* It is very important that the path to the pepXML files be given starting with the sequest soft link, so that the server thinks the data is mounted under the webserver.
The full protein results are written to '-prot.xml'.
=== 6. Classification Analysis
ProteinProphet run with a normal database gives an estimate of false positive rates. We can view a protein summary with a desired cutoff:
protein_summary.rb -c 5.0 -prot.xml
Proteins above the red cutoff line have a false positive rate of less than or equal to 5%.
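As a rough illustration of how a probability cutoff relates to a false positive rate, the Ruby sketch below uses one common estimate: each protein with probability p contributes an expected (1 - p) false positives, and the rate for the top N proteins is the sum of those contributions divided by N. This is an assumption for illustration only, not necessarily how protein_summary.rb places its cutoff line, and proteins_within_cutoff is a hypothetical name.

```ruby
# Sketch only: estimates a false positive rate from protein probabilities.
# Not taken from protein_summary.rb.
#
# Returns how many of the top-ranked proteins can be kept while the
# estimated false positive rate (in percent) stays at or below
# max_fpr_percent.
def proteins_within_cutoff(probabilities, max_fpr_percent)
  sorted = probabilities.sort.reverse   # best probabilities first
  expected_false = 0.0
  kept = 0
  sorted.each_with_index do |p, i|
    expected_false += 1.0 - p           # expected false hits so far
    rate = 100.0 * expected_false / (i + 1)
    kept = i + 1 if rate <= max_fpr_percent
  end
  kept
end
```

For example, proteins_within_cutoff(probs, 5.0) would give the number of proteins sitting above a 5% cutoff under this estimate.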
We can verify Bioworks probability scores by counting the number of true hits (from the normal database) compared with false hits (from the inverted database), using the same score filters for both. This command will give a protein summary and include precision and false positive rates (two different kinds):
protein_summary.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g --fpr
Type protein_summary.rb for more details.
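The counting idea behind the normal versus inverted comparison can be sketched in a few lines of Ruby. The sketch assumes that the number of hits in the inverted database approximates the number of false hits among the normal database results; it shows only one possible estimate and does not reproduce the two kinds of false positive rate that protein_summary.rb reports. The method name decoy_estimates is hypothetical.

```ruby
# Sketch only: one simple estimate from normal/inverted hit counts.
# Not the protein_summary.rb calculation.
#
# normal_hits:  number of proteins passing the filters in the normal search
# inverse_hits: number passing the same filters in the inverted search
def decoy_estimates(normal_hits, inverse_hits)
  raise ArgumentError, 'normal_hits must be positive' unless normal_hits > 0
  # Assume each inverted hit stands for one false hit in the normal results,
  # capped so the rate never exceeds 1.0.
  false_hits = [inverse_hits, normal_hits].min
  fpr = false_hits.to_f / normal_hits
  { false_positive_rate: fpr, precision: 1.0 - fpr }
end
```

Because both counts come from results filtered on the same parameters, the ratio only makes sense when the filters used for the two exports are identical.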
The false positive rate information can also be calculated without the protein summary:
false_positive_rate.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g
Type false_positive_rate.rb for more details.