README.md in ETL-0.0.1 vs README.md in ETL-1.0.0.rc

- old
+ new

@@ -1,8 +1,8 @@
# ETL

-TODO: Write a gem description
+Extract, transform, and load data with ruby!

## Installation

Add this line to your application's Gemfile:

@@ -14,16 +14,372 @@
Or install it yourself as:

    $ gem install ETL

-## Usage
+## ETL Dependencies

-TODO: Write usage instructions here
+ETL depends on having a database connection object that __must__ respond
+to `#query`. The [mysql2](https://github.com/brianmario/mysql2) gem is a good option.
+You can also proxy another library using Ruby's `SimpleDelegator` and add a `#query`
+method if need be.
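A minimal sketch of such a proxy, assuming a wrapped client that exposes an `#execute` method (the client and its method name are stand-ins for illustration):

```ruby
require 'delegate'

# Forwards every call to the wrapped client and adds the #query method that
# ETL expects. The wrapped client and its #execute method are hypothetical.
class QueryableConnection < SimpleDelegator
  def query(sql)
    __getobj__.execute(sql)
  end
end

connection = QueryableConnection.new(raw_client) # raw_client: your real client
```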
+The gem comes bundled with a default logger. If you'd like to write your own
+just make sure that it implements `#debug` and `#info`. For more information
+on what is logged and when, view the [logger details](#logger-details).

+### Basic ETL

+Assume that we have a database connection represented by `connection`.

+To run a basic ETL that is composed of sequential SQL statements, start by
+creating a new ETL instance:

+```ruby
+etl = ETL.new(description: "a description of what this ETL does",
+              connection: connection)
+```

+which can then be configured:

+```ruby
+etl.config do |etl|
+  etl.ensure_destination do |etl|
+    # For most ETLs you may want to ensure that the destination exists, so the
+    # #ensure_destination block is ideally suited to fulfill this requirement.
+    #
+    # By way of example:
+    #
+    etl.query %[
+      CREATE TABLE IF NOT EXISTS some_database.some_destination_table (
+        user_id INT UNSIGNED NOT NULL,
+        created_date DATE NOT NULL,
+        total_amount INT SIGNED NOT NULL,
+        message VARCHAR(100) DEFAULT NULL,
+        PRIMARY KEY (user_id),
+        KEY (user_id, created_date),
+        KEY (created_date)
+      )]
+  end

+  etl.before_etl do |etl|
+    # All pre-ETL work is performed in this block.
+    #
+    # This can be thought of as a before-ETL hook that will fire only once. When
+    # you are not leveraging the ETL iteration capabilities, the value of this
+    # block vs the #etl block is not very clear. We will see how and when to
+    # leverage this block effectively when we introduce iteration.
+    #
+    # As an example, let's say we want to get rid of all entries that have an
+    # amount less than zero before moving on to our actual etl:
+    #
+    etl.query %[DELETE FROM some_database.some_source_table WHERE amount < 0]
+  end

+  etl.etl do |etl|
+    # Here is where the magic happens! This block contains the main ETL
+    # operation.
+    #
+    # For example:
+    #
+    etl.query %[
+      REPLACE INTO some_database.some_destination_table
+      SELECT
+          user_id
+        , DATE(created_at) AS created_date
+        , SUM(amount) AS total_amount
+      FROM
+        some_database.some_source_table sst
+      GROUP BY
+          sst.user_id
+        , DATE(sst.created_at)]
+  end

+  etl.after_etl do |etl|
+    # All post-ETL work is performed in this block.
+    #
+    # Again, to finish up with an example:
+    #
+    etl.query %[
+      UPDATE some_database.some_destination_table
+      SET message = "WOW"
+      WHERE total_amount > 100]
+  end
+end
+```

+At this point it is possible to run the ETL instance via:

+```ruby
+etl.run
+```

+which executes `#ensure_destination`, `#before_etl`, `#etl`, and `#after_etl` in
+that order.

+### ETL with iteration

+To add in iteration, simply supply `#start`, `#step`, and `#stop` blocks. This
+is useful when dealing with large data sets or when executing queries that,
+while optimized, are still slow.

+Again, to kick things off:

+```ruby
+etl = ETL.new(description: "a description of what this ETL does",
+              connection: connection)
+```

+where `connection` is the same as described above.

+Next we can configure the ETL:

+```ruby
+# assuming we have the ETL instance from above
+etl.config do |etl|
+  etl.ensure_destination do |etl|
+    # For most ETLs you may want to ensure that the destination exists, so the
+    # #ensure_destination block is ideally suited to fulfill this requirement.
+    #
+    # By way of example:
+    #
+    etl.query %[
+      CREATE TABLE IF NOT EXISTS some_database.some_destination_table (
+        user_id INT UNSIGNED NOT NULL,
+        created_date DATE NOT NULL,
+        total_amount INT SIGNED NOT NULL,
+        message VARCHAR(100) DEFAULT NULL,
+        PRIMARY KEY (user_id),
+        KEY (user_id, created_date),
+        KEY (created_date)
+      )]
+  end

+  etl.before_etl do |etl|
+    # All pre-ETL work is performed in this block.
+    #
+    # Now that we are leveraging iteration the #before_etl block becomes
+    # more useful as a way to execute an operation once before we begin
+    # our iteration.
+    #
+    # As an example, let's say we want to get rid of all entries that have an
+    # amount less than zero before moving on to our actual etl:
+    #
+    etl.query %[
+      DELETE FROM some_database.some_source_table
+      WHERE amount < 0]
+  end

+  etl.start do |etl|
+    # This defines where the ETL should start. This can be a flat number
+    # or date, or even SQL / other code can be executed to produce a starting
+    # value.
+    #
+    # Usually, this is the last known entry for the destination table with
+    # some sensible default if the destination does not yet contain data.
+    #
+    # As an example:
+    #
+    res = etl.query %[
+      SELECT COALESCE(MAX(created_date), '1970-01-01') AS the_max
+      FROM some_database.some_destination_table]
+
+    res.to_a.first['the_max']
+  end

+  etl.step do |etl|
+    # The step block defines the size of the iteration block. To iterate by
+    # ten records, the step block should be set to return 10.
+    #
+    # As an alternative example, to set the iteration to go 10,000 units
+    # at a time, the following value should be provided:
+    #
+    #   10_000 (Note: an underscore is used for readability)
+    #
+    # As an example, to iterate 7 days at a time:
+    #
+    7.days
+  end

+  etl.stop do |etl|
+    # The stop block defines when the iteration should halt.
+    # Again, this can be a flat value or code. Either way, one value *must* be
+    # returned.
+    #
+    # As a flat value:
+    #
+    #   1_000_000
+    #
+    # Or a date value:
+    #
+    #   Time.now.to_date
+    #
+    # Or as a code example:
+    #
+    res = etl.query %[
+      SELECT DATE(MAX(created_at)) AS the_max
+      FROM some_database.some_source_table]
+
+    res.to_a.first['the_max']
+  end

+  etl.etl do |etl, lbound, ubound|
+    # The etl block is the main part of the framework. Note: there are
+    # two extra args with the iterator this time around: "lbound" and "ubound".
+    #
+    # "lbound" is the lower bound of the current iteration. When iterating
+    # from 0 to 10 and stepping by 2, the lbound would equal 2 on the
+    # second iteration.
+    #
+    # "ubound" is the upper bound of the current iteration. Continuing with the
+    # example above, when iterating from 0 to 10 and stepping by 2, the ubound
+    # would equal 4 on the second iteration.
+    #
+    # These args can be used to "window" SQL queries or other code operations.
+    #
+    # As a first example, to iterate over a set of ids:
+    #
+    #   etl.query %[
+    #     REPLACE INTO some_database.some_destination_table
+    #     SELECT
+    #         user_id
+    #       , SUM(amount) AS total_amount
+    #     FROM
+    #       some_database.some_source_table sst
+    #     WHERE
+    #       sst.user_id > #{lbound} AND sst.user_id <= #{ubound}
+    #     GROUP BY
+    #       sst.user_id]
+    #
+    # To "window" a SQL query using dates:
+    #
+    etl.query %[
+      REPLACE INTO some_database.some_destination_table
+      SELECT
+          user_id
+        , DATE(created_at) AS created_date
+        , SUM(amount) AS total_amount
+      FROM
+        some_database.some_source_table sst
+      WHERE
+        -- Note the usage of quotes surrounding the lbound and ubound vars.
+        -- This is required when dealing with dates / datetimes
+        sst.created_at >= '#{lbound}' AND sst.created_at < '#{ubound}'
+      GROUP BY
+          sst.user_id
+        , DATE(sst.created_at)]

+    # Note that there is no sql sanitization here so there is *potential* for SQL
+    # injection. That being said you'll likely be using this gem in an internal
+    # tool so hopefully your co-workers are not looking to sabotage your ETL
+    # pipeline. Just be aware of this and handle it as you see fit.
+  end

+  etl.after_etl do |etl|
+    # All post-ETL work is performed in this block.
+    #
+    # Again, to finish up with an example:
+    #
+    etl.query %[
+      UPDATE some_database.some_destination_table
+      SET message = "WOW"
+      WHERE total_amount > 100]
+  end
+end
+```

+At this point it is possible to run the ETL instance via:

+```ruby
+etl.run
+```

+which executes `#ensure_destination`, `#before_etl`, `#etl`, and `#after_etl` in
+that order.

+Note that `#etl` executes `#start` and `#stop` once and memoizes the result for
+each. It then begins to iterate from what `#start` evaluated to up until what `#stop`
+evaluated to by what `#step` evaluates to.
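The iteration contract can be pictured as a small driver loop. The sketch below is only a simplified illustration of the behaviour described above, not the gem's actual source; the block names and the exact boundary handling are invented for the example:

```ruby
# Simplified illustration of the iteration described above (not real gem code).
# start and stop are evaluated once and memoized; the [lbound, ubound) window
# then advances by the step value until it passes stop.
lbound = start_block.call(etl)   # memoized result of the #start block
stop   = stop_block.call(etl)    # memoized result of the #stop block
step   = step_block.call(etl)    # e.g. 10_000 or 7.days

while lbound < stop
  ubound = lbound + step
  etl_block.call(etl, lbound, ubound)  # the #etl block sees one window per pass
  lbound = ubound
end
```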
+## Logger Details

+A logger must support two methods: `#debug` and `#info`.

+Both methods should accept a single hash argument. The argument will contain:

+- `:emitter` => a reference to the ETL instance's `self`
+- `:event_type` => a symbol that indicates the type of event being logged. You
+  can use this value to derive which other data you'll have available

+When `:event_type` is equal to `:query_start`, you'll have the following
+available in the hash argument:

+- `:sql` => the sql that is going to be run

+These events are logged at the debug level.

+When `:event_type` is equal to `:query_complete`, you'll have the following
+available in the hash argument:

+- `:sql` => the sql that was run
+- `:runtime` => how long the query took to execute

+These events are logged at the info level.

+Following from this you could implement a simple logger as:

+```ruby
+class PutsLogger
+  def info data
+    @data = data
+    write!
+  end

+  def debug data
+    @data = data
+    write!
+  end

+private

+  def write!
+    case (event_type = @data.delete(:event_type))
+    when :query_start
+      output  = "#{@data[:emitter].description} is about to run\n"
+      output += "#{@data[:sql]}\n"
+    when :query_complete
+      output  = "#{@data[:emitter].description} executed:\n"
+      output += "#{@data[:sql]}\n"
+      output += "query completed at #{Time.now} and took #{@data[:runtime]}s\n"
+    else
+      output = "no special logging for #{event_type} event_type yet\n"
+    end
+    puts output
+    @data = nil
+  end
+end
+```
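Assuming the constructor accepts a custom logger (the `logger:` keyword below is an assumption for illustration, so check the gem's actual signature), the class above could be wired in like so:

```ruby
# Hypothetical wiring; the keyword used to hand ETL a custom logger is assumed
# here and may differ from the gem's actual API.
etl = ETL.new(description: "daily rollup of source amounts",
              connection: connection,
              logger: PutsLogger.new)
```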
## Contributing

-1. Fork it
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Added some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+If you would like to contribute code to ETL you can do so through GitHub by
+forking the repository and sending a pull request.

+When submitting code, please make every effort to follow existing conventions
+and style in order to keep the code as readable as possible.

+Before your code can be accepted into the project you must also sign the
+[Individual Contributor License Agreement (CLA)][1].

+  [1]: https://spreadsheets.google.com/spreadsheet/viewform?formkey=dDViT2xzUHAwRkI3X3k5Z0lQM091OGc6MQ&ndplr=1

+## License

+Copyright 2013 Square Inc.

+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at

+    http://www.apache.org/licenses/LICENSE-2.0

+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.