README.md in parse_fasta-1.9.2 vs README.md in parse_fasta-2.0.0
- old
+ new
@@ -1,16 +1,18 @@
-# parse_fasta #
+# ParseFasta #
[![Gem Version](https://badge.fury.io/rb/parse_fasta.svg)](http://badge.fury.io/rb/parse_fasta) [![Build Status](https://travis-ci.org/mooreryan/parse_fasta.svg?branch=master)](https://travis-ci.org/mooreryan/parse_fasta) [![Coverage Status](https://coveralls.io/repos/mooreryan/parse_fasta/badge.svg)](https://coveralls.io/r/mooreryan/parse_fasta)
So you want to parse a fasta file...
## Installation ##
Add this line to your application's Gemfile:
- gem 'parse_fasta'
+```ruby
+gem 'parse_fasta'
+```
And then execute:
$ bundle
@@ -18,227 +20,52 @@
$ gem install parse_fasta
## Overview ##
-Provides nice, programmatic access to fasta and fastq files, as well
-as providing Sequence and Quality helper classes. It's more
-lightweight than BioRuby. And more fun! ;)
+Provides nice, programmatic access to fasta and fastq files. It's faster and more lightweight than BioRuby. And more fun!
## Documentation ##
Check out the
[parse_fasta docs](http://rubydoc.info/gems/parse_fasta)
for the full API documentation.
## Usage ##
-Some examples...
+Here are some examples of using ParseFasta. Don't forget to `require "parse_fasta"` at the top of your program!
-A little script to print header and length of each record.
+Print header and length of each record.
- require 'parse_fasta'
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+ puts [rec.header, rec.seq.length].join "\t"
+end
+```
- FastaFile.open(ARGV[0]).each_record do |header, sequence|
- puts [header, sequence.length].join("\t")
- end
+You can parse fastQ files in exactly the same way.
-And here, a script to calculate GC content:
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+ printf "Header: %s, Sequence: %s, Description: %s, Quality: %s\n",
+ rec.header,
+ rec.seq,
+ rec.desc,
+ rec.qual
+end
+```
- FastaFile.open(ARGV[0]).each_record do |header, sequence|
- puts [header, sequence.gc].join("\t")
- end
+`Record#desc` and `Record#qual` will be `nil` if the file you are parsing is a fastA file.
-Now we can parse fastq files as well!
-
- FastqFile.open(ARGV[0]).each_record do |header, seq, desc, qual|
- puts [header, qual.qual_scores.join(',')].join("\t")
- end
-
-What if you don't care if the input is a fastA or a fastQ? No problem!
-
- SeqFile.open(ARGV[0]).each_record do |header, seq|
- puts [header, seq].join "\t"
- end
-
-Read fasta file into a hash.
-
- seqs = FastaFile.open(ARGV[0]).to_hash
-
-## Versions ##
-
-### 1.9.2 ###
-
-Speed up fastA `each_record` and `each_record_fast`.
-
-### 1.9.1 ###
-
-Speed up fastQ `each_record` and `each_record_fast`. Courtesy of
-[Matthew Ralston](https://github.com/MatthewRalston).
-
-### 1.9.0 ###
-
-Added "fast" versions of `each_record` methods
-(`each_record_fast`). Basically, they return sequences and quality
-strings as Ruby `String` objects instead of as `Sequence` or `Quality`
-objects. Also, if the sequence or quality string has spaces, they will
-be retained. If this is a problem, use the original `each_record`
-methods.
-
-### 1.8.2 ###
-
-Speed up `FastqFile#each_record`.
-
-### 1.8.1 ###
-
-An error will be raised if a fasta file has a `>` in the
-sequence. Sometimes files are not terminated with a newline
-character. If this is the case, then catting two fasta files will
-smush the first header of the second file right in with the last
-sequence of the first file. This is bad, raise an error! ;)
-
-Example
-
- >seq1
- ACTG>seq2
- ACTG
- >seq3
- ACTG
-
-This will raise `ParseFasta::SequenceFormatError`.
-
-Also, headers with lots of `>` within are fine now.
-
-### 1.8 ###
-
-Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
-`parse_fasta` doesn't check whether the seq is AA or NA, if called on
-an amino acid string, things will get weird as it will complement the
-IUPAC characters in the AA string and leave others.
-
-### 1.7.2 ###
-
-Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
-
-Some alignment fastas have spaces for easier reading. Strip these
-out. For consistency, also strips spaces from `Quality` strings. If
-there are spaces that don't match in the quality and sequence in a
-fastQ file, then things will get messed up in the FastQ file. FastQ
-shouldn't have spaces though.
-
-### 1.7 ###
-
-Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
-
-### 1.6.2 ###
-
-`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
-that don't begin with a `>`.
-
-### 1.6.1 ###
-
-Better internal handling of empty sequences -- instead of raising
-errors, pass empty sequences.
-
-### 1.6 ###
-
-Added `SeqFile` class, which accepts either fastA or fastQ files. It
-uses FastaFile and FastqFile internally. You can use this class if you
-want your scripts to accept either fastA or fastQ files.
-
-If you need the description and quality string, you should use
-FastqFile instead.
-
-### 1.5 ###
-
-Now accepts gzipped files. Huzzah!
-
-### 1.4 ###
-
-Added methods:
-
- Sequence.base_counts
- Sequence.base_frequencies
-
-### 1.3 ###
-
-Add additional functionality to `each_record` method.
-
-#### Info ####
-
-I often like to use the fasta format for other things like so
-
- >fruits
- pineapple
- pear
- peach
- >veggies
- peppers
- parsnip
- peas
-
-rather than having this in a two column file like this
-
- fruit,pineapple
- fruit,pear
- fruit,peach
- veggie,peppers
- veggie,parsnip
- veggie,peas
-
-So I added functionality to `each_record` to keep each line a record
-separate in an array. Here's an example using the above file.
-
- info = []
- FastaFile.open(f, 'r').each_record(1) do |header, lines|
- info << [header, lines]
- end
-
-Then info will contain the following arrays
-
- ['fruits', ['pineapple', 'pear', 'peach']],
- ['veggies', ['peppers', 'parsnip', 'peas']]
-
-### 1.2 ###
-
-Added `mean_qual` method to the `Quality` class.
-
-### 1.1.2 ###
-
-Dropped Ruby requirement to 1.9.3
-
-(Note, if you want to build the docs with yard and you're using
-Ruby 1.9.3, you may have to install the redcarpet gem.)
-
-### 1.1 ###
-
-Added: Fastq and Quality classes
-
-### 1.0 ###
-
-Added: Fasta and Sequence classes
-
-Removed: File monkey patch
-
-### 0.0.5 ###
-
-Last version with File monkey patch.
-
-## Benchmark ##
-
-Some quick and dirty benchmarks against `BioRuby`.
-
-### FastaFile#each_record ###
-
-You can see the test script in `benchmark.rb`.
-
- user system total real
- parse_fasta 1.920000 0.160000 2.080000 ( 2.145932)
- parse_fasta fast 1.210000 0.160000 1.370000 ( 1.377770)
- bioruby 4.330000 0.290000 4.620000 ( 4.655567)
-
-Hot dog! It's faster :)
-
-## Notes ##
-
-Only the `SeqFile` class actually checks to make sure that you passed
-in a "proper" fastA or fastQ file, so watch out.
+```ruby
+ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
+ if rec.qual
+ puts "@#{rec.header}"
+ puts rec.seq
+ puts "+#{rec.desc}"
+ puts rec.qual
+ else
+ puts ">#{rec.header}"
+ puts rec.seq
+ end
+end
+```
\ No newline at end of file
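
Note on the removed GC-content example: the 2.0.0 overview no longer mentions the `Sequence` helper class, so the `sequence.gc` call from the 1.9.2 README may no longer be available. Below is a minimal sketch of one way to compute the same kind of value, assuming `rec.seq` behaves like a plain Ruby `String` (this diff does not confirm that). `gc_fraction` is a hypothetical helper name, not part of ParseFasta, and it is not guaranteed to match what the old `Sequence#gc` returned.

```ruby
# Hypothetical helper, not part of ParseFasta: the fraction of G and C
# characters in a sequence string, counted case-insensitively over the
# full length of the string. Returns 0.0 for an empty string.
def gc_fraction(seq)
  return 0.0 if seq.empty?

  # String#count with a character set counts every occurrence of any
  # character in that set.
  seq.count("GCgc").to_f / seq.length
end

puts gc_fraction("ACTG") # prints 0.5
```

Under the same assumption, this could be called as `gc_fraction(rec.seq)` inside an `each_record` block. Note that ambiguity codes and gap characters count toward the length here, which may or may not be what you want.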