README.md in parse_fasta-1.8.1 vs README.md in parse_fasta-1.8.2
- old
+ new
@@ -64,18 +64,15 @@
seqs = FastaFile.open(ARGV[0]).to_hash
## Versions ##
-### 1.8 ###
+### 1.8.2 ###
-Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
-`parse_fasta` doesn't check whether the seq is AA or NA, if called on
-an amino acid string, things will get weird as it will complement the
-IUPAC characters in the AA string and leave others.
+Speed up `FastqFile#each_record`.
-#### 1.8.1 ####
+### 1.8.1 ###
An error will be raised if a fasta file has a `>` in the
sequence. Sometimes files are not terminated with a newline
character. If this is the case, then catting two fasta files will
smush the first header of the second file right in with the last
@@ -91,44 +88,50 @@
This will raise `ParseFasta::SequenceFormatError`.
Also, headers with lots of `>` within are fine now.
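
To illustrate the failure mode described above, here is a minimal stand-alone sketch. It is not the gem's implementation: `check_records_sketch` and `SketchSequenceFormatError` are hypothetical names used only for this example, standing in for the check that raises `ParseFasta::SequenceFormatError`.

```ruby
# Hypothetical stand-in for the gem's error class, for illustration only.
class SketchSequenceFormatError < StandardError; end

# Split fasta text into [header, sequence] pairs. A ">" only starts a new
# record when it is the first character of a line, so ">" inside a header
# is kept as part of that header.
def check_records_sketch(text)
  text.split(/^>/).reject(&:empty?).map do |record|
    header, *seq_lines = record.split("\n")
    seq = seq_lines.join
    # A ">" in the sequence usually means two files were catted together
    # without a newline between them.
    if seq.include?(">")
      raise SketchSequenceFormatError, "'>' found in sequence of #{header}"
    end
    [header, seq]
  end
end

# Headers containing ">" are accepted.
good = ">seq1 a>b>c\nACTG\n>seq2\nGGCC\n"
records = check_records_sketch(good)

# The first "file" lacks a trailing newline, so the second header gets
# smushed into the first sequence.
smushed = ">seq1\nACTG" + ">seq2\nGGCC\n"
smushed_raises =
  begin
    check_records_sketch(smushed)
    false
  rescue SketchSequenceFormatError
    true
  end
```

The sketch reads the whole text into memory, which is fine for a demonstration but not how you would handle a large file.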
+### 1.8 ###
-### 1.7 ###
+Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
+`parse_fasta` doesn't check whether the seq is AA or NA, if called on
+an amino acid string, things will get weird as it will complement the
+IUPAC characters in the AA string and leave others.
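
A rough sketch of why amino acid strings "get weird". This is not the gem's code: `rev_comp_sketch` and the two lookup constants are hypothetical, and the sketch assumes a plain reverse followed by a character-for-character IUPAC complement.

```ruby
# Hypothetical lookup tables: each character in IUPAC_FROM is replaced by
# the character at the same position in IUPAC_TO.
IUPAC_FROM = "ACGTURYSWKMBDHVNacgturyswkmbdhvn"
IUPAC_TO   = "TGCAAYRSWMKVHDBNtgcaayrswmkvhdbn"

# Reverse the string, then complement every character that has an IUPAC
# nucleotide meaning. Anything else is left untouched.
def rev_comp_sketch(seq)
  seq.reverse.tr(IUPAC_FROM, IUPAC_TO)
end

nucleotide = rev_comp_sketch("ATGC") # plain bases
ambiguous  = rev_comp_sketch("AARN") # R (A or G) complements to Y
protein    = rev_comp_sketch("MLK")  # M and K are swapped, L is left alone
```

In the last call, `M` and `K` happen to be IUPAC nucleotide codes while `L` is not, so only part of the amino acid string is changed.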
-Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
+### 1.7.2 ###
-#### 1.7.2 ####
-
Strip spaces (not all whitespace) from `Sequence` and `Quality` strings.
Some alignment fastas have spaces for easier reading. Strip these
out. For consistency, also strips spaces from `Quality` strings. If
there are spaces that don't match in the quality and sequence in a
fastQ file, then things will get messed up in the FastQ file. FastQ
shouldn't have spaces though.
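
A small sketch of the space stripping described above, assuming it behaves like `String#delete(" ")`. The helper name `strip_spaces_sketch` is hypothetical and is not part of the gem.

```ruby
# Remove literal space characters only. Tabs and newlines are kept, which
# is the "not all whitespace" part.
def strip_spaces_sketch(str)
  str.delete(" ")
end

aligned = strip_spaces_sketch("ACTG ACTG\tAC")

# When the spaces in a fastQ sequence and quality string do not line up,
# the stripped strings end up with different lengths.
seq_len  = strip_spaces_sketch("AC GT").length
qual_len = strip_spaces_sketch("IIIII").length
```

Here `seq_len` and `qual_len` differ, which is the "things will get messed up" case for fastQ records.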
-### 1.6 ###
+### 1.7 ###
-Added `SeqFile` class, which accepts either fastA or fastQ files. It
-uses FastaFile and FastqFile internally. You can use this class if you
-want your scripts to accept either fastA or fastQ files.
+Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
-If you need the description and quality string, you should use
-FastqFile instead.
+### 1.6.2 ###
-#### 1.6.1 ####
+`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
+that don't begin with a `>`.
+### 1.6.1 ###
+
Better internal handling of empty sequences -- instead of raising
errors, pass empty sequences.
-#### 1.6.2 ####
+### 1.6 ###
-`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
-that don't begin with a `>`.
+Added `SeqFile` class, which accepts either fastA or fastQ files. It
+uses FastaFile and FastqFile internally. You can use this class if you
+want your scripts to accept either fastA or fastQ files.
+If you need the description and quality string, you should use
+FastqFile instead.
+
### 1.5 ###
Now accepts gzipped files. Huzzah!
### 1.4 ###
@@ -202,21 +205,20 @@
Last version with File monkey patch.
## Benchmark ##
-Perhaps this isn't exactly fair since `BioRuby` is a big module with
-lots of features and error checking, whereas `parse_fasta` is meant to
-be lightweight and easy to use for my own research. Oh well ;)
+**NOTE**: These benchmarks are against an older version of
+ `parse_fasta`.
+Some quick and dirty benchmarks against `BioRuby`.
+
### FastaFile#each_record ###
-You're probably wondering...How does it compare to BioRuby in some
-super accurate benchmarking tests? Lucky for you, I calculated
-sequence length for each fasta record with both the `each_record`
-method from this gem and using the `FastaFormat` class from
-BioRuby. You can see the test script in `benchmark.rb`.
+Calculating sequence length for each fasta record with both the
+`each_record` method from this gem and using the `FastaFormat` class
+from BioRuby. You can see the test script in `benchmark.rb`.
The test file contained 2,009,897 Illumina reads and the file size
was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:
user system total real
@@ -253,22 +255,12 @@
this_gc 3 0.120000 0.000000 0.120000 ( 0.185434)
bioruby_gc 3 8.060000 0.020000 8.080000 ( 8.659071)
Nice!
-Troll: "But Ryan, when will you find the GC of an 8,000,000 base
-sequence?"
+Troll: "When will you find the GC of an 8,000,000 base sequence?"
Me: "Step off, troll!"
-
-## Test suite & docs ##
-
-For a good time, you could clone this repo and run the test suite with
-rspec! Or if you just don't trust that it works like it should. The
-specs probably need a little clean up...so fork it and clean it up ;)
-
-Same with the docs. Clone the repo and build them yourself with `yard`
-if you are in need of some excitement.
## Notes ##
Only the `SeqFile` class actually checks to make sure that you passed
in a "proper" fastA or fastQ file, so watch out.