README.md in iostreams-0.7.0 vs README.md in iostreams-0.8.0
- old
+ new
@@ -1,40 +1,146 @@
-# iostreams
+# iostreams [![Gem Version](https://badge.fury.io/rb/iostreams.svg)](http://badge.fury.io/rb/iostreams) [![Build Status](https://secure.travis-ci.org/rocketjob/iostreams.png?branch=master)](http://travis-ci.org/rocketjob/iostreams) ![](http://ruby-gem-downloads-badge.herokuapp.com/iostreams?type=total)
-Ruby Input and Output streaming with support for Zip, Gzip, and Encryption.
+Input and Output streaming for Ruby
-## Status
+## Project Status
-Alpha - Feedback on the API is welcome. API will change.
+Beta - Feedback on the API is welcome. API is subject to change.
+## Features
+
+Currently streaming classes are available for:
+
+* Zip
+* Gzip
+* Encryption using [Symmetric Encryption](https://github.com/reidmorrison/symmetric-encryption)
+
## Introduction
-`iostreams` allows files to be read and written in a streaming fashion to reduce
-memory overhead. It supports reading and writing of Zip, GZip and encrypted files.
+If all files were small, they could just be loaded into memory in their entirety. With the
+advent of very large files, often several gigabytes or even terabytes in size, loading
+them into memory is not feasible.
+
+On Linux it is common to use pipes to stream data between processes.
+For example:
-These streams can be chained together just like piped programs in linux.
-This allows one stream to read the file, another stream to decrypt the file and
-then a third stream to decompress the result.
+```
+# Count the number of lines in a file that has been compressed with gzip
+cat abc.gz | gunzip -c | wc -l
+```
+For large files it is critical to be able to read and write these files as streams. Ruby has support
+for reading and writing files using streams, but has no built-in way of passing one stream through
+another to support, for example, compressing the data, encrypting it, and then finally writing the result
+to a file. Several streaming implementations exist for languages such as `C++` and `Java` to chain
+together several streams; `iostreams` attempts to offer similar features for Ruby.
+
+```ruby
+# Read a compressed file:
+IOStreams.reader('hello.gz') do |reader|
+ data = reader.read(1024)
+ puts "Read: #{data}"
+end
+```
+
+The true power of streams is shown when many streams are chained together to achieve the end
+result, without holding the entire file in memory, or ideally without needing to create
+any temporary files to process the stream.
+
+```ruby
+# Create a file that is compressed with GZip and then encrypted with Symmetric Encryption:
+IOStreams.writer('hello.gz.enc') do |writer|
+ writer.write('Hello World')
+ writer.write('and some more')
+end
+```
+
+The power of the above example applies when the data being written starts to exceed hundreds of megabytes,
+or even gigabytes.
+
+By looking at the file name supplied above, `iostreams` is able to determine which streams to apply
+to the data being read or written. For example:
+* `hello.zip` => Compressed using Zip
+* `hello.zip.enc` => Compressed using Zip and then encrypted using Symmetric Encryption
+* `hello.gz.enc` => Compressed using GZip and then encrypted using Symmetric Encryption
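
The extension-based detection described above could be sketched roughly as follows. This is only an illustration of the idea, not the gem's actual implementation: the `KNOWN_STREAMS` table and the `streams_for` helper are hypothetical names that do not appear in `iostreams`.

```ruby
# Hypothetical lookup table (an assumption, not part of the iostreams API):
# maps a file extension to the name of the stream that handles it.
KNOWN_STREAMS = {
  'zip' => :zip,
  'gz'  => :gz,
  'enc' => :enc
}.freeze

# Hypothetical helper: returns the streams implied by a file name,
# in the order they would be applied when writing.
def streams_for(file_name)
  parts = File.basename(file_name).split('.')
  parts.shift # Drop the base name, keeping only the extensions

  streams = []
  # Walk the extensions from right to left and stop at the first
  # extension that does not correspond to a known stream.
  parts.reverse_each do |extension|
    stream = KNOWN_STREAMS[extension.downcase]
    break unless stream
    streams.unshift(stream)
  end
  streams
end

p streams_for('hello.gz.enc')
p streams_for('hello.zip')
```

When reading, the same list would be applied in reverse: decrypt first, then decompress.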
+
The objective is that all of these streaming processes are performed using streaming
-so that only portions of the file are loaded into memory at a time.
+so that only the current portion of the file is loaded into memory as it moves
+through the entire file.
Where possible each stream never goes to disk, which for example could expose
un-encrypted data.
+## Architecture
+
+Streams are chained together by passing the
+
+Every Reader or Writer is invoked by calling its `.open` method and passing the block
+that must be invoked for the duration of that stream.
+
+The above block is passed the stream that needs to be encoded/decoded using that
+Reader or Writer every time the `#read` or `#write` method is called on it.
+
+### Readers
+
+Each reader stream must implement: `#read`
+
+### Writers
+
+Each writer stream must implement: `#write`
+
+### Optional methods
+
+The following methods on the stream are useful for both Readers and Writers:
+
+#### close
+
+Close the stream, and clean up any buffers, etc.
+
+#### closed?
+
+Has the stream already been closed? Useful when child streams have already closed the stream,
+so that `#close` is not called more than once on a stream.
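
As a minimal sketch of the conventions in this section, a custom reader might look like the class below. `UpcaseReader` is a hypothetical class invented for illustration and is not part of `iostreams`; the exact arguments that the gem passes to `.open` are an assumption here.

```ruby
require 'stringio'

# Hypothetical reader (not part of iostreams): upper-cases everything
# read from the stream it wraps.
class UpcaseReader
  # Holds the reader open for the duration of the supplied block,
  # then closes it unless a child has already done so.
  def self.open(input_stream)
    reader = new(input_stream)
    yield reader
  ensure
    reader.close if reader && !reader.closed?
  end

  def initialize(input_stream)
    @input_stream = input_stream
  end

  # Required for readers: returns the next chunk, or nil at end of stream
  # when a size is given.
  def read(size = nil)
    data = @input_stream.read(size)
    data && data.upcase
  end

  # Optional: release the underlying stream.
  def close
    @input_stream.close
  end

  # Optional: lets callers avoid calling #close twice.
  def closed?
    @input_stream.closed?
  end
end

result = UpcaseReader.open(StringIO.new('hello world')) do |reader|
  reader.read(5)
end
puts result # HELLO
```

Because the wrapped stream only needs to respond to `#read`, `#close`, and `#closed?`, another reader following the same convention could be passed in place of the `StringIO`, which is how a chain would be built up.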
+
## Notes
* Due to the nature of Zip, both its Reader and Writer methods will create
a temp file when reading from or writing to a stream.
Gzip is recommended over Zip since it can be streamed.
+* Zip becomes exponentially slower with very large files, especially files
+ that exceed 4GB when uncompressed. GZip is highly recommended for large files.
-## Meta
+## Future
-* Code: `git clone git://github.com/rocketjob/iostreams.git`
-* Home: <https://github.com/rocketjob/iostreams>
-* Issues: <http://github.com/rocketjob/iostreams/issues>
-* Gems: <http://rubygems.org/gems/iostreams>
+Below are just some of the streams that are envisaged for `iostreams`:
+* PGP reader and writer
+ * Read and write PGP encrypted files
+* CSV
+ * Read and write CSV data, reading data back as Arrays and writing Arrays as CSV text
+* Delimited Text Stream
+ * Autodetect Windows/Linux line endings and return a line at a time
+* MongoFS
+ * Read and write file streams to and from MongoFS
+
+For example:
+```ruby
+# Read a CSV file, delimited with Windows line endings, compressed with GZip, and encrypted with PGP:
+IOStreams.reader('hello.csv.gz.pgp', [:csv, :delimited, :gz, :pgp]) do |reader|
+ # Returns an Array at a time
+ reader.each do |row|
+ puts "Read: #{row.inspect}"
+ end
+end
+```
-This project uses [Semantic Versioning](http://semver.org/).
+Completely implementing IO streaming for Ruby will take a lot more input and thought
+from the Ruby community. This gem represents a starting point to get the discussion going.
+
+By keeping this gem in Beta state and not going V1, we can change the interface as needed
+to implement community feedback.
+
+## Versioning
+
+This project adheres to [Semantic Versioning](http://semver.org/).
## Author
[Reid Morrison](https://github.com/reidmorrison)