= Two-DB: Real and Decoy Database Search

=== 1. Create file: to_sequest.sld

To run sequest, first create a to_sequest.sld file that points sequest to your raw data files. (The same file can be used both to run sequest and in the multi-consensus view.)

=== 2. Run Sequest with a Normal and an Inverse Database

If you don't already have an inverse database, here's how to make one:

    fasta_mod.rb invert

This will create a file with the trailing tag '_INV.fasta'.  Just type fasta_mod.rb for more details.

=== 3. Export a Bioworks XML File for each Database

1. Load your sequest results in MultiConsensus results (even if you only have one run): File -> 'Load MultiConsensus Results'
2. Click 'yes' to calculate peptide probabilities [optional].
3. Click 'yes' to view results without filtering.
4. Right click on the data and 'Export' to XML (name the file bioworks.xml).  This file is fed into ProteinProphet.
5. Filter your data on the parameters you prefer and export.
6. Do the same thing for the inverted database (only steps 1, 2 and 5 are needed).  Make sure to filter on the same parameters and export these results, too.

(To experiment with different parameters, open two Bioworks windows and filter the normal and inverse databases until satisfied.)

=== 4. Convert to pepXML

    bioworks_to_pepxml.rb bioworks.xml -p /cygdrive/c/Xcalibur/params/myparams.params -m /cygdrive/c/Xcalibur/data/mydatafolder

By default, the pepxml files will be written to a subdirectory called 'pepxml'.  Type bioworks_to_pepxml.rb for more details.

=== 5. Run Protein Prophet

ProteinProphet must be run in a particular directory.  If you don't already have one, create an alias (in your ~/.bashrc file) to simplify getting there:

    alias isb="cd /cygdrive/c/Inetpub/wwwroot/ISB/data"

Then, to get to the isb folder, just type:

    isb   # -> takes you to /cygdrive/c/Inetpub/wwwroot/ISB/data

Then, run protein prophet:

    xinteract -N.xml -Op sequest/myfolder/pepxml/*.xml

Type xinteract for more details.

*NOTE:* It is very important that the path to the pepxml files be given starting with the sequest soft link, so that the server thinks the data is mounted under the webserver.

The full protein results are written to '-prot.xml'.

=== 6. Classification Analysis

ProteinProphet run with a normal database gives an estimate of false positive rates.  We can view a protein summary with a desired cutoff:

    protein_summary.rb -c 5.0 -prot.xml

Proteins above the red cutoff line have a false positive rate of less than or equal to 5%.

We can verify Bioworks probability scores by counting the number of true hits (from the normal database) compared with false hits (from the inverted database), using the same score filters for both.  This command will give a protein summary and include precision and false positive rates (two different kinds):

    protein_summary.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g --fpr

Type protein_summary.rb for more details.

The false positive rate information can also be calculated without the protein summary:

    false_positive_rate.rb bioworks_filtered.xml -f bioworks_filtered_INV.xml -p -g

Type false_positive_rate.rb for more details.
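
To make the normal-versus-inverted comparison concrete, here is a minimal Ruby sketch of one common way to turn the two hit counts into a precision and a false positive rate.  It is an illustration only, not the code used by protein_summary.rb or false_positive_rate.rb, and it may not match either of the two rates those scripts report.  The method name decoy_estimates and the example counts are hypothetical.  The sketch assumes that the number of false hits in the normal search is roughly equal to the number of hits that pass the same filters in the inverted search.

```ruby
# Hypothetical helper (not part of the scripts described above).
# normal_hits: hits passing the filters in the normal database search
# decoy_hits:  hits passing the SAME filters in the inverted database search
def decoy_estimates(normal_hits, decoy_hits)
  raise ArgumentError, "normal_hits must be positive" unless normal_hits > 0
  raise ArgumentError, "decoy_hits must be non-negative" if decoy_hits < 0

  # Assumption: the normal search contains about as many false hits as the
  # inverted search produced.  Cap the estimate at the number of normal hits.
  estimated_false = [decoy_hits, normal_hits].min

  # Fraction of the normal hits that are estimated to be false.
  fpr = estimated_false.to_f / normal_hits

  { estimated_false: estimated_false,
    false_positive_rate: fpr,
    precision: 1.0 - fpr }
end

# Made-up counts for illustration: 200 normal hits, 10 inverted hits.
est = decoy_estimates(200, 10)
puts format("precision: %.3f  false positive rate: %.3f",
            est[:precision], est[:false_positive_rate])
```

With the made-up counts above, 10 of the 200 normal hits are estimated to be false, so the sketch prints a precision of 0.950 and a false positive rate of 0.050.  The estimate only makes sense when both searches were filtered on identical parameters, as described in step 3.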