README.md in parse_fasta-1.8.1 vs README.md in parse_fasta-1.8.2

- old
+ new

@@ -64,18 +64,15 @@ seqs = FastaFile.open(ARGV[0]).to_hash

## Versions ##

-### 1.8 ###
+### 1.8.2 ###

-Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
-`parse_fasta` doesn't check whether the seq is AA or NA, if called on
-an amino acid string, things will get weird as it will complement the
-IUPAC characters in the AA string and leave others.
+Speed up `FastqFile#each_record`.

-#### 1.8.1 ####
+### 1.8.1 ###

An error will be raised if a fasta file has a `>` in the sequence. Sometimes files are not terminated with a newline character. If this is the case, then catting two fasta files will smush the first header of the second file right in with the last
@@ -91,44 +88,50 @@ This will raise `ParseFasta::SequenceFormatError`.

Also, headers with lots of `>` within are fine now.

+### 1.8 ###

-### 1.7 ###
+Add `Sequence#rev_comp`. It can handle IUPAC characters. Since
+`parse_fasta` doesn't check whether the seq is AA or NA, if called on
+an amino acid string, things will get weird as it will complement the
+IUPAC characters in the AA string and leave others.

-Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.
+### 1.7.2 ###

-#### 1.7.2 ####
-
Strip spaces (not all whitespace) from `Sequence` and `Quality` strings. Some alignment fastas have spaces for easier reading. Strip these out. For consistency, also strips spaces from `Quality` strings. If there are spaces that don't match in the quality and sequence in a fastQ file, then things will get messed up in the FastQ file. FastQ shouldn't have spaces though.

-### 1.6 ###
+### 1.7 ###

-Added `SeqFile` class, which accepts either fastA or fastQ files. It
-uses FastaFile and FastqFile internally. You can use this class if you
-want your scripts to accept either fastA or fastQ files.
+Add `SeqFile#to_hash`, `FastaFile#to_hash` and `FastqFile#to_hash`.

-If you need the description and quality string, you should use
-FastqFile instead.
+### 1.6.2 ###

-#### 1.6.1 ####
+`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
+that don't begin with a `>`.

+### 1.6.1 ###
+
Better internal handling of empty sequences -- instead of raising errors, pass empty sequences.

-#### 1.6.2 ####
+### 1.6 ###

-`FastaFile::open` now raises a `ParseFasta::DataFormatError` when passed files
-that don't begin with a `>`.
+Added `SeqFile` class, which accepts either fastA or fastQ files. It
+uses FastaFile and FastqFile internally. You can use this class if you
+want your scripts to accept either fastA or fastQ files.

+If you need the description and quality string, you should use
+FastqFile instead.
+

### 1.5 ###

Now accepts gzipped files. Huzzah!

### 1.4 ###
@@ -202,21 +205,20 @@ Last version with File monkey patch.

## Benchmark ##

-Perhaps this isn't exactly fair since `BioRuby` is a big module with
-lots of features and error checking, whereas `parse_fasta` is meant to
-be lightweight and easy to use for my own research. Oh well ;)
+**NOTE**: These benchmarks are against an older version of
+  `parse_fasta`.

+Some quick and dirty benchmarks against `BioRuby`.
+

### FastaFile#each_record ###

-You're probably wondering...How does it compare to BioRuby in some
-super accurate benchmarking tests? Lucky for you, I calculated
-sequence length for each fasta record with both the `each_record`
-method from this gem and using the `FastaFormat` class from
-BioRuby. You can see the test script in `benchmark.rb`.
+Calculating sequence length length for each fasta record with both the
+`each_record` method from this gem and using the `FastaFormat` class
+from BioRuby. You can see the test script in `benchmark.rb`.

The test file contained 2,009,897 illumina reads and the file size was 1.1 gigabytes. Here are the results from Ruby's `Benchmark` class:

    user system total real
@@ -253,22 +255,12 @@ this_gc 3 0.120000 0.000000 0.120000 ( 0.185434)

    bioruby_gc 3 8.060000 0.020000 8.080000 ( 8.659071)

Nice!
-Troll: "But Ryan, when will you find the GC of an 8,000,000 base
-sequence?"
+Troll: "When will you find the GC of an 8,000,000 base sequence?"

Me: "Step off, troll!"
-
-## Test suite & docs ##
-
-For a good time, you could clone this repo and run the test suite with
-rspec! Or if you just don't trust that it works like it should. The
-specs probably need a little clean up...so fork it and clean it up ;)
-
-Same with the docs. Clone the repo and build them yourself with `yard`
-if you are in need of some excitement.

## Notes ##

Only the `SeqFile` class actually checks to make sure that you passed in a "proper" fastA or fastQ file, so watch out.
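The 1.8 entry says `Sequence#rev_comp` handles IUPAC characters and, when called on an amino acid string, complements the IUPAC letters and leaves the rest alone. Here is a minimal plain-Ruby sketch of that behavior. It assumes the standard IUPAC complement pairs (R/Y, K/M, B/V, D/H swap; S, W and N map to themselves). The stand-alone `rev_comp` method is hypothetical and is not the gem's implementation.

```ruby
# Hypothetical stand-alone sketch, not parse_fasta's Sequence#rev_comp.
# IUPAC nucleotide codes (upper and lower case) and their complements.
IUPAC_FROM = "ACGTRYSWKMBDHVNacgtryswkmbdhvn"
IUPAC_TO   = "TGCAYRSWMKVHDBNtgcayrswmkvhdbn"

# Reverse the sequence, then swap every IUPAC character for its complement.
# Characters outside the table (amino acid letters like E or L, gaps, etc.)
# pass through unchanged, which is why AA input gives odd results.
def rev_comp(seq)
  seq.reverse.tr(IUPAC_FROM, IUPAC_TO)
end

puts rev_comp("ACTGN")
puts rev_comp("MELK")
```

On `"MELK"`, the sketch returns `"MLEK"`. `K` and `M` are IUPAC codes and get swapped, while `E` and `L` are left as they are. That matches the "things will get weird" case from the 1.8 note.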
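The benchmark rows `this_gc` and `bioruby_gc` time a GC calculation on an 8,000,000 base sequence, but the README does not show how GC is computed. One plausible plain-Ruby approach is a single `String#count` pass. `gc_fraction` is a hypothetical helper, not the gem's method, and it assumes the denominator is the full sequence length, so `N`s count as non-GC.

```ruby
# Hypothetical helper, not parse_fasta's own GC method.
# Fraction of G/C bases, counted case-insensitively. The denominator is the
# full sequence length, so N's and other characters count as non-GC.
def gc_fraction(seq)
  return 0.0 if seq.empty? # avoid dividing by zero
  seq.count("GCgc").fdiv(seq.length)
end

# Time it on an 8,000,000 base sequence, like the one in the benchmark.
seq = "ACGT" * 2_000_000
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
gc = gc_fraction(seq)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
printf("GC = %.2f (%.3f s)\n", gc, elapsed)
```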
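The 1.6.2 and 1.8.1 entries describe two format checks. Data that doesn't begin with `>` raises `ParseFasta::DataFormatError`. A `>` inside a sequence line, for example a header smushed onto the last line of the previous file by `cat`, raises `ParseFasta::SequenceFormatError`. Extra `>` characters inside a header are fine. Below is a rough sketch of those rules, which also strips spaces from sequence lines as described in 1.7.2. The `FastaSketch` module and its error classes are hypothetical stand-ins, not the gem's classes.

```ruby
# Hypothetical stand-ins; parse_fasta defines its own
# ParseFasta::DataFormatError and ParseFasta::SequenceFormatError.
module FastaSketch
  class DataFormatError < StandardError; end
  class SequenceFormatError < StandardError; end

  # Parses FASTA text into [header, sequence] pairs.
  # Raises DataFormatError if the first non-empty line is not a header, and
  # SequenceFormatError if a sequence line contains '>'. Header lines may
  # contain additional '>' characters.
  def self.parse(text)
    records = []
    text.each_line do |raw|
      line = raw.chomp
      next if line.empty?

      if line.start_with?(">")
        records << [line[1..], +""] # header without the leading '>'
      elsif records.empty?
        raise DataFormatError, "FASTA data must begin with '>'"
      elsif line.include?(">")
        raise SequenceFormatError, "'>' found in sequence: #{line}"
      else
        records.last[1] << line.delete(" ") # strip spaces, not all whitespace
      end
    end
    records
  end
end

p FastaSketch.parse(">s1 a>b\nAC GT\nNN\n>s2\nGG\n")
```

Treating any line that starts with `>` as a header is what lets headers contain extra `>` characters while the same character is still rejected inside sequence lines.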