README.md in beso-0.1.0 vs README.md in beso-0.2.0

- old
+ new

@@ -1,8 +1,8 @@
 # Beso

-TODO: Write a gem description
+Sync your historical events to KISSmetrics via CSV.

 ## Installation

 Add this line to your application's Gemfile:

@@ -14,12 +14,116 @@
 Or install it yourself as:

     $ gem install beso

+Next, create an initializer for **beso**. There, you can set up your S3 bucket information and define your
+serialization jobs:
+
+``` rb
+# config/initializers/beso.rb
+Beso.configure do |config|
+
+  # First, set up your S3 credentials:
+
+  config.access_key = '[your AWS access key]'
+  config.secret_key = '[your AWS secret key]'
+  config.bucket_name = 'beso' # recommended, but you can really call this anything
+
+  # Then, define some jobs:
+
+  config.job :message_delivered, :table => :messages do
+    identity { |message| message.user.id }
+    timestamp :created_at
+    prop( :message_id ) { |message| message.id }
+  end
+
+  config.job :signed_up, :table => :users do
+    identity { |user| user.id }
+    timestamp :created_at
+    prop( :age ){ |user| user.age }
+  end
+end
+```
+
 ## Usage

-TODO: Write usage instructions here
+### Defining Jobs
+
+KISSmetrics events have three properties that *must* be defined:
+
+- Identity
+- Timestamp
+- Event
+
+The **Identity** field is some sort of identifier for your user. Even if your job
+is working on another table, you should probably have a way to tie the event back
+to the user who caused it. Here, you can provide one of three things:
+
+- A proc that should receive the record and return the identity value
+- A symbol that will get passed to `record.send`
+- A literal (You'll probably want to do one of the other two options)
+
+The **Timestamp** field is slightly different in that it should always be part of
+the table you are querying, not the user. This symbol will get sent to each record,
+but will also be used in determining the query for the job.
+
+The **Event** name is inferred by the name of your job. It will be provided and
+formatted for you.
+
+On top of this, you can specify up to **ten** custom properties. Like `identity`,
+you can pass either a proc, a symbol, or a literal:
+
+``` rb
+config.job :signed_up, :table => :users do
+  identity :id
+  timestamp :created_at
+  prop( :age ){ |user| user.age }
+  prop( :new_user, true )
+end
+```
+
+### Using the rake task
+
+By requiring `beso`, you get the `beso:run` rake task. This task will do the following:
+
+- Connect to your S3 bucket
+- Pull down 'beso.yml' if it exists
+
+> `beso.yml` contains the timestamp of the last record queried for each job.
+> If it doesn't exist, it will be created after the first run.
+
+- Iterate over the jobs defined in the initializer you set up
+- Create a CSV representation of all records newer than the timestamp found in `beso.yml`
+- Upload each CSV to your S3 bucket with the event name and timestamp
+- Update `beso.yml` with the latest timestamp for each job
+
+The rake task is designed to be used via cron. For the moment, KISSmetrics will only process
+one CSV file per hour, so it makes sense that this task should be run at an interval of hours
+equal to the number of jobs you have defined. For example, if you have defined 4 jobs, this
+task should run once every 4 hours.
+
+The rake task also accepts two options that you can set via environment variables.
+
+`BESO_PREFIX` will change the prefix of the CSV filenames that get uploaded to S3. The default
+is 'beso', so it is recommended you use that when telling KISSmetrics what your filename
+pattern is. You can then adjust the prefix if you would like to upload CSV's that you don't
+want KISSmetrics to recognize.
+
+`BESO_ORIGIN` will change the behavior of the task when there is no previous timestamp
+defined for a job in `beso.yml`.
+
+> By default, the task will use the last timestamp in your table (which effectively
+> means the first run of this task will do nothing). This is because KISSmetrics
+> charges you for every event you log through their system, so you probably don't
+> want to upload 8 months worth of events straight away.
+
+This option will accept two values to alter the behavior:
+
+- `now` will set the first run timestamp to now, which will obviously not create any events.
+- `first` will set the first run timestamp to the first timestamp in each table. Use this with
+  `BESO_PREFIX` if you want to dump an entire table's worth of events to S3 without having
+  KISSmetrics process them.

 ## Contributing

 1. Fork it
 2. Create your feature branch (`git checkout -b my-new-feature`)
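
Note on the scheduling rule in the new "Using the rake task" section: the README says to run `beso:run` at an interval of hours equal to the number of defined jobs. A minimal sketch of turning that rule into a crontab entry follows. The helper name `beso_cron_line`, the `/path/to/app` placeholder, and the `bundle exec` invocation are assumptions for illustration and are not part of beso.

```ruby
# Hypothetical helper (not part of beso): builds a crontab line that runs
# `rake beso:run` once every `job_count` hours, per the README's rule of
# one run per N hours for N defined jobs.
def beso_cron_line(job_count, app_path: "/path/to/app")
  unless job_count.is_a?(Integer) && job_count.between?(1, 23)
    raise ArgumentError, "job_count must be an integer between 1 and 23"
  end

  # In cron, "*/N" in the hour field fires at hours 0, N, 2N, ... of each day.
  hours = job_count == 1 ? "*" : "*/#{job_count}"

  # Assumes the app is run through Bundler; adjust the command to your setup.
  "0 #{hours} * * * cd #{app_path} && bundle exec rake beso:run"
end

puts beso_cron_line(4)
```

With 4 jobs this yields an entry that fires at minute 0 of every fourth hour. When the job count does not divide 24 evenly (for example 5), the `*/N` step restarts at midnight, so the last interval of the day is shorter than N hours.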