README.md in parse_fasta-1.9.2 vs README.md in parse_fasta-2.0.0

- old
+ new

@@ -1,16 +1,18 @@
-# parse_fasta #
+# ParseFasta #

 [![Gem Version](https://badge.fury.io/rb/parse_fasta.svg)](http://badge.fury.io/rb/parse_fasta) [![Build Status](https://travis-ci.org/mooreryan/parse_fasta.svg?branch=master)](https://travis-ci.org/mooreryan/parse_fasta) [![Coverage Status](https://coveralls.io/repos/mooreryan/parse_fasta/badge.svg)](https://coveralls.io/r/mooreryan/parse_fasta)

 So you want to parse a fasta file...

 ## Installation ##

 Add this line to your application's Gemfile:

-    gem 'parse_fasta'
+```ruby
+gem 'parse_fasta'
+```

 And then execute:

     $ bundle
@@ -18,227 +20,52 @@
     $ gem install parse_fasta

 ## Overview ##

-Provides nice, programmatic access to fasta and fastq files, as well
-as providing Sequence and Quality helper classes. It's more
-lightweight than BioRuby. And more fun! ;)
+Provides nice, programmatic access to fasta and fastq files. It's faster and more lightweight than BioRuby. And more fun!

 ## Documentation ##

 Checkout [parse_fasta docs](http://rubydoc.info/gems/parse_fasta) for the full api documentation.

 ## Usage ##

-Some examples...
+Here are some examples of using ParseFasta. Don't forget to `require "parse_fasta"` at the top of your program!

-A little script to print header and length of each record.
+Print the header and length of each record.

-    require 'parse_fasta'
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+  puts [rec.header, rec.seq.length].join "\t"
+end
+```

-    FastaFile.open(ARGV[0]).each_record do |header, sequence|
-      puts [header, sequence.length].join("\t")
-    end
+You can parse fastQ files in exactly the same way.
-And here, a script to calculate GC content:
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+  printf "Header: %s, Sequence: %s, Description: %s, Quality: %s\n",
+         rec.header,
+         rec.seq,
+         rec.desc,
+         rec.qual
+end
+```

-    FastaFile.open(ARGV[0]).each_record do |header, sequence|
-      puts [header, sequence.gc].join("\t")
-    end
+Both `Record#desc` and `Record#qual` will be `nil` if the file you are parsing is a fastA file.

-Now we can parse fastq files as well!
-
-    FastqFile.open(ARGV[0]).each_record do |head, seq, desc, qual|
-      puts [header, qual.qual_scores.join(',')].join("\t")
-    end
-
-What if you don't care if the input is a fastA or a fastQ? No problem!
-
-    SeqFile.open(ARGV[0]).each_record do |head, seq|
-      puts [header, seq].join "\t"
-    end
-
-Read fasta file into a hash.
-
-    seqs = FastaFile.open(ARGV[0]).to_hash
-
-## Versions ##
-
-### 1.9.2 ###
-
-Speed up fastA `each_record` and `each_record_fast`.
-
-### 1.9.1 ###
-
-Speed up fastQ `each_record` and `each_record_fast`. Courtesy of
-[Matthew Ralston](https://github.com/MatthewRalston).
-
-### 1.9.0 ###
-
-Added "fast" versions of `each_record` methods
-(`each_record_fast`). Basically, they return sequences and quality
-strings as Ruby `Sring` objects instead of aa `Sequence` or `Quality`
-objects. Also, if the sequence or quality string has spaces, they will
-be retained. If this is a problem, use the original `each_record`
-methods.
-
-### 1.8.2 ###
-
-Speed up `FastqFile#each_record`.
-
-### 1.8.1 ###
-
-An error will be raised if a fasta file has a `>` in the
-sequence. Sometimes files are not terminated with a newline
-character. If this is the case, then catting two fasta files will
-smush the first header of the second file right in with the last
-sequence of the first file. This is bad, raise an error! ;)
-
-Example
-
-    >seq1
-    ACTG>seq2
-    ACTG
-    >seq3
-    ACTG
-
-This will raise `ParseFasta::SequenceFormatError`.
-
-Also, headers with lots of `>` within are fine now.
-
-### 1.8 ###
-
-Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
-`parse_fasta` doesn't check whether the seq is AA or NA, if called on
-an amino acid string, things will get weird as it will complement the
-IUPAC characters in the AA string and leave others.
-
-### 1.7.2 ###
-
-Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
-
-Some alignment fastas have spaces for easier reading. Strip these
-out. For consistency, also strips spaces from `Quality` strings. If
-there are spaces that don't match in the quality and sequence in a
-fastQ file, then things will get messed up in the FastQ file. FastQ
-shouldn't have spaces though.
-
-### 1.7 ###
-
-Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
-
-### 1.6.2 ###
-
-`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
-that don't begin with a `>`.
-
-### 1.6.1 ###
-
-Better internal handling of empty sequences -- instead of raising
-errors, pass empty sequences.
-
-### 1.6 ###
-
-Added `SeqFile` class, which accepts either fastA or fastQ files. It
-uses FastaFile and FastqFile internally. You can use this class if you
-want your scripts to accept either fastA or fastQ files.
-
-If you need the description and quality string, you should use
-FastqFile instead.
-
-### 1.5 ###
-
-Now accepts gzipped files. Huzzah!
-
-### 1.4 ###
-
-Added methods:
-
-    Sequence.base_counts
-    Sequence.base_frequencies
-
-### 1.3 ###
-
-Add additional functionality to `each_record` method.
-
-#### Info ####
-
-I often like to use the fasta format for other things like so
-
-    >fruits
-    pineapple
-    pear
-    peach
-    >veggies
-    peppers
-    parsnip
-    peas
-
-rather than having this in a two column file like this
-
-    fruit,pineapple
-    fruit,pear
-    fruit,peach
-    veggie,peppers
-    veggie,parsnip
-    veggie,peas
-
-So I added functionality to `each_record` to keep each line a record
-separate in an array. Here's an example using the above file.
-
-    info = []
-    FastaFile.open(f, 'r').each_record(1) do |header, lines|
-      info << [header, lines]
-    end
-
-Then info will contain the following arrays
-
-    ['fruits', ['pineapple', 'pear', 'peach']],
-    ['veggies', ['peppers', 'parsnip', 'peas']]
-
-### 1.2 ###
-
-Added `mean_qual` method to the `Quality` class.
-
-### 1.1.2 ###
-
-Dropped Ruby requirement to 1.9.3
-
-(Note, if you want to build the docs with yard and you're using
-Ruby 1.9.3, you may have to install the redcarpet gem.)
-
-### 1.1 ###
-
-Added: Fastq and Quality classes
-
-### 1.0 ###
-
-Added: Fasta and Sequence classes
-
-Removed: File monkey patch
-
-### 0.0.5 ###
-
-Last version with File monkey patch.
-
-## Benchmark ##
-
-Some quick and dirty benchmarks against `BioRuby`.
-
-### FastaFile#each_record ###
-
-You can see the test script in `benchmark.rb`.
-
-                           user     system      total        real
-    parse_fasta        1.920000   0.160000   2.080000 (  2.145932)
-    parse_fasta fast   1.210000   0.160000   1.370000 (  1.377770)
-    bioruby            4.330000   0.290000   4.620000 (  4.655567)
-
-Hot dog! It's faster :)
-
-## Notes ##
-
-Only the `SeqFile` class actually checks to make sure that you passed
-in a "proper" fastA or fastQ file, so watch out.
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+  if rec.qual
+    puts "@#{rec.header}"
+    puts rec.seq
+    puts "+#{rec.desc}"
+    puts rec.qual
+  else
+    puts ">#{rec.header}"
+    puts rec.seq
+  end
+end
+```