README.md in gush-0.4.1 vs README.md in gush-1.0.0
- old
+ new
@@ -1,41 +1,68 @@
# Gush [![Build Status](https://travis-ci.org/chaps-io/gush.svg?branch=master)](https://travis-ci.org/chaps-io/gush)
## [![](http://i.imgur.com/ya8Wnyl.png)](https://chaps.io) proudly made by [Chaps](https://chaps.io)
-Gush is a parallel workflow runner using only Redis as its message broker and Sidekiq for workers.
+Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
## Theory
-Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
+Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
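To make the idea concrete, here is a minimal sketch in plain Ruby, independent of Gush, that orders jobs in a dependency graph using the standard library's `TSort`. The `DependencyGraph` class and the hash layout are assumptions made for illustration; they are not Gush's internals.

```ruby
require 'tsort'

# Hypothetical stand-in for a workflow's dependency graph; not Gush's internals.
class DependencyGraph
  include TSort

  # dependencies: { job name => [names of the jobs it must run after] }
  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort hook: enumerate every node in the graph.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort hook: enumerate the jobs a given job depends on.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end
end

graph = DependencyGraph.new({
  "FetchJob1"   => [],
  "FetchJob2"   => [],
  "PersistJob1" => ["FetchJob1"],
  "PersistJob2" => ["FetchJob2"]
})

# Every job appears after all of the jobs it depends on.
order = graph.tsort
```

Jobs with no path between them (here the two fetch jobs) have no ordering constraint, which is what allows them to run in parallel. A graph with a cycle makes `tsort` raise `TSort::Cyclic`.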
+
+## **WARNING - version notice**
+
+This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
+
## Installation
-Add this line to your application's Gemfile:
+### 1. Add `gush` to Gemfile
- gem 'gush'
+```ruby
+gem 'gush', '~> 1.0.0'
+```
-And then execute:
+### 2. Create `Gushfile`
- $ bundle
+When using Gush and its CLI commands you need a `Gushfile` in the root directory.
+`Gushfile` should require all your workflows and jobs.
-Or install it yourself as:
+#### Ruby on Rails
- $ gem install gush
+For RoR it is enough to require the full environment:
-## Usage
+```ruby
+require_relative './config/environment.rb'
+```
-### Defining workflows
+and make sure your jobs and workflows are correctly loaded by adding their directories to `autoload_paths`, inside `config/application.rb`:
+```ruby
+config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
+```
+
+#### Ruby
+
+Simply require any jobs and workflows manually in `Gushfile`:
+
+```ruby
+require_relative 'lib/workflows/example_workflow.rb'
+require_relative 'lib/jobs/some_job.rb'
+require_relative 'lib/jobs/some_other_job.rb'
+```
+
+
+## Example
+
The DSL for defining jobs consists of a single `run` method.
Here is a complete example of a workflow you can create:
```ruby
-# workflows/sample_workflow.rb
+# app/workflows/sample_workflow.rb
class SampleWorkflow < Gush::Workflow
def configure(url_to_fetch_from)
run FetchJob1, params: { url: url_to_fetch_from }
- run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
+ run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
run PersistJob1, after: FetchJob1
run PersistJob2, after: FetchJob2
run Normalize,
@@ -45,170 +72,263 @@
run Index
end
end
```
-**Hint:** For debugging purposes you can vizualize the graph using `viz` command:
+and this is what the graph will look like:
-```
-bundle exec gush viz SampleWorkflow
-```
+![SampleWorkflow](https://i.imgur.com/DFh6j51.png)
-For the Workflow above, the graph will look like this:
-![SampleWorkflow](http://i.imgur.com/SmeRRVT.png)
+## Defining workflows
+Let's start with the simplest workflow possible, consisting of a single job:
-#### Passing parameters to jobs
+```ruby
+class SimpleWorkflow < Gush::Workflow
+ def configure
+ run DownloadJob
+ end
+end
+```
-You can pass any primitive arguments into jobs while defining your workflow:
+Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
```ruby
-# app/workflows/sample_workflow.rb
-class SampleWorkflow < Gush::Workflow
+class SimpleWorkflow < Gush::Workflow
def configure
- run FetchJob1, params: { url: "http://some.com/url" }
+ run DownloadJob
+ run SaveJob, after: DownloadJob
end
end
```
-See below to learn how to access those params inside your job.
+We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
-#### Defining jobs
+But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
-Jobs are classes inheriting from `Gush::Job`:
+```ruby
+class SimpleWorkflow < Gush::Workflow
+ def configure
+ run FirstDownloadJob
+ run SecondDownloadJob
+ run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
+ end
+end
+```
+
+Now `SaveJob` will only execute after both its parents finish without errors.
+
+With this simple syntax you can build any complex workflows you can imagine!
+
+#### Alternative way
+
+The `run` method also accepts a `before:` attribute to define the opposite association, so we can write the same workflow as above like this:
+
```ruby
-# app/jobs/fetch_job.rb
-class FetchJob < Gush::Job
- def work
- # do some fetching from remote APIs
+class SimpleWorkflow < Gush::Workflow
+ def configure
+ run FirstDownloadJob, before: SaveJob
+ run SecondDownloadJob, before: SaveJob
- params #=> {url: "http://some.com/url"}
+ run SaveJob
end
end
```
-`params` method is a hash containing your (optional) parameters passed to `run` method in the workflow.
+You can use whatever way you find more readable or even both at once :)
-#### Passing arguments to workflows
+### Passing arguments to workflows
Workflows can accept any primitive arguments in their constructor, which then will be available in your
`configure` method.
-Here's an example of a workflow responsible for publishing a book:
+Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
```ruby
-# app/workflows/sample_workflow.rb
class PublishBookWorkflow < Gush::Workflow
def configure(url, isbn)
run FetchBook, params: { url: url }
- run PublishBook, params: { book_isbn: isbn }
+ run PublishBook, params: { book_isbn: isbn }, after: FetchBook
end
end
```
and then create your workflow with those arguments:
```ruby
-PublishBookWorkflow.new("http://url.com/book.pdf", "978-0470081204")
+PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
```
+That's basically it for defining workflows. See below for how to define jobs:
-### Running workflows
+## Defining jobs
-Now that we have defined our workflow we can use it:
+The simplest job is a class inheriting from `Gush::Job` and responding to the `perform` method, much like any other ActiveJob class.
-#### 1. Initialize and save it
+```ruby
+class FetchBook < Gush::Job
+ def perform
+ # do some fetching from remote APIs
+ end
+end
+```
+But what about those params we passed in the previous step?
+
+## Passing parameters into jobs
+
+To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
+
+So, inside workflow:
+
```ruby
-flow = SampleWorkflow.new(optional, arguments)
-flow.save # saves workflow and its jobs to Redis
+(...)
+run FetchBook, params: {url: "http://url.com/book.pdf"}
+(...)
```
-**or:** you can also use a shortcut:
+and within the job we can access them like this:
```ruby
-flow = SampleWorkflow.create(optional, arguments)
+class FetchBook < Gush::Job
+ def perform
+ # you can access `params` method here, for example:
+
+ params #=> {url: "http://url.com/book.pdf"}
+ end
+end
```
-#### 2. Start workflow
+## Executing workflows
-First you need to start Sidekiq workers:
+Now that we have defined our workflow and its jobs, we can use it:
+### 1. Start background worker process
+
+**Important**: The command to start background workers depends on the backend you chose for ActiveJob.
+For example, in case of Sidekiq this would be:
+
```
-bundle exec gush workers
+bundle exec sidekiq -q gush
```
-and then start your workflow:
+**[See the backends section of the official ActiveJob documentation for how to configure a backend](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
+**Hint**: Gush uses the `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
+
+
+### 2. Create the workflow instance
+
```ruby
+flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
+```
+
+### 3. Start the workflow
+
+```ruby
flow.start!
```
-Now Gush will start processing jobs in background using Sidekiq
-in the order defined in `configure` method inside Workflow.
+Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
+### 4. Monitor its progress:
+
+```ruby
+flow.reload
+flow.status
+#=> :running|:finished|:failed
+```
+
+`reload` is needed to see the latest status, since workflows are updated asynchronously.
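If you want to block until the workflow is done, the reload-then-check step can be wrapped in a small polling loop. This is only a sketch: `wait_for_workflow` and its `interval` and `timeout` parameters are hypothetical helpers, not part of Gush, and it assumes only that `flow` responds to `reload` and `status` as shown above.

```ruby
# Polls any object that responds to `reload` and `status` until it reports
# :finished or :failed. `interval` and `timeout` (both in seconds) are
# illustrative parameters, not Gush options.
def wait_for_workflow(flow, interval: 1, timeout: 60)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    flow.reload # statuses are updated asynchronously, so refresh first
    status = flow.status
    return status if %i[finished failed].include?(status)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "workflow still #{status} after #{timeout}s"
    end

    sleep interval
  end
end
```

With a started workflow this could be called as `wait_for_workflow(flow, interval: 2, timeout: 300)`.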
+
+## Advanced features
+
### Pipelining
-Gush offers a useful feature which lets you pass results of a job to its dependencies, so they can act accordingly.
+Gush lets a job pass its results to the jobs that depend on it, so they can act on them.
**Example:**
Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
-The latter needs to know where the first one downloaded the file to be able to open it.
+The latter needs to know where the first one saved the file to be able to open it.
```ruby
class DownloadVideo < Gush::Job
- def work
+ def perform
downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
output(downloader.file_path)
end
end
```
-`output` method is Gush's way of saying: "I want to pass this down to my descendants".
+The `output` method is used to output data from the job to all dependent jobs.
-Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload down the (pipe)line:
+Now, once `DownloadVideo` has finished and its dependent job `EncodeVideo` has started, we can access that payload inside it:
```ruby
class EncodeVideo < Gush::Job
- def work
- video_path = payloads["DownloadVideo"]
+ def perform
+ video_path = payloads.first[:output]
end
end
```
-`payloads` is a hash containing outputs from all parent jobs, where job class names are the keys.
+`payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like:
-**Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
-the `payloads` hash will look like this:
```ruby
-{
- "B" => (...),
- "C" => (...)
-}
+[
+ {
+ id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
+ class: "DownloadVideo",
+ output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" # the payload returned by DownloadVideo job using `output()` method
+ }
+]
```
+**Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
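When a job has more than one ancestor, relying on `payloads.first` can be fragile. Below is a minimal sketch of looking a payload up by class name instead. It runs on a hand-written array shaped like the example above; `output_of`, `FetchSubtitles`, and the file paths are hypothetical and not part of Gush.

```ruby
# Hypothetical payloads array, shaped like the example above.
payloads = [
  { id: "DownloadVideo-1",  class: "DownloadVideo",  output: "/tmp/video.mp4" },
  { id: "FetchSubtitles-2", class: "FetchSubtitles", output: "/tmp/subs.srt" }
]

# Returns the output of the first payload produced by `class_name`,
# or nil when no ancestor of that class is present.
def output_of(payloads, class_name)
  entry = payloads.find { |payload| payload[:class] == class_name }
  entry && entry[:output]
end

video_path    = output_of(payloads, "DownloadVideo")
subtitle_path = output_of(payloads, "FetchSubtitles")
```

Inside a real job the same lookup would be applied to the `payloads` method rather than a local variable.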
-### Checking status:
+### Dynamic workflows
-#### In Ruby:
+There might be a case when you have to construct the workflow dynamically depending on the input.
+As an example, let's write a workflow which accepts an array of user ids and has to send an email to each user. Additionally, after it has emailed every user, it has to notify the admin that it has finished.
+
+
```ruby
-flow.reload
-flow.status
-#=> :running|:finished|:failed
+
+class NotifyWorkflow < Gush::Workflow
+ def configure(user_ids)
+ notification_jobs = user_ids.map do |user_id|
+ run NotificationJob, params: {user_id: user_id}
+ end
+
+ run AdminNotificationJob, after: notification_jobs
+ end
+end
```
-`reload` is needed to see the latest status, since workflows are updated asynchronously.
+We can achieve that because the `run` method returns the id of the created job, which we can use for chaining dependencies.
-#### Via CLI:
+Now, when we create the workflow like this:
+```ruby
+flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
+```
+
+it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
+
+![DynamicWorkflow](https://i.imgur.com/HOI3fjc.png)
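The id-chaining idea can be tried without Gush or Redis. The sketch below uses a hypothetical `MiniWorkflow` class that only records jobs and their dependencies; the id format and the `dependencies` accessor are assumptions for illustration, not Gush's API.

```ruby
require 'securerandom'

# Hypothetical stand-in for a workflow; it only records jobs and their
# dependencies so the id-chaining idea can be run in isolation.
class MiniWorkflow
  attr_reader :dependencies

  def initialize
    @dependencies = {} # job id => ids of the jobs it must run after
  end

  # Registers a job and returns its generated id, mirroring the behaviour
  # described above for `run`.
  def run(job_name, after: [])
    id = "#{job_name}-#{SecureRandom.uuid}"
    @dependencies[id] = Array(after)
    id
  end
end

flow = MiniWorkflow.new

# One notification job per user id; `map` collects the returned job ids.
notification_ids = [54, 21, 24, 154, 65].map { |_user_id| flow.run("NotificationJob") }

# The admin job depends on every notification job.
admin_id = flow.run("AdminNotificationJob", after: notification_ids)
```

Because each call returns a unique id, several jobs of the same class can be told apart when they are used as dependencies.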
+
+## Command line interface (CLI)
+
+### Checking status
+
- of a specific workflow:
```
bundle exec gush show <workflow_id>
```
@@ -217,21 +337,16 @@
```
bundle exec gush list
```
+### Visualizing workflows as an image
-### Requiring workflows inside your projects
+This requires that you have imagemagick installed on your computer:
-When using Gush and its CLI commands you need a Gushfile.rb in root directory.
-Gushfile should require all your Workflows and jobs, for example:
-```ruby
-require_relative './lib/your_project'
-
-Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
- require file
-end
+```
+bundle exec gush viz <NameOfTheWorkflow>
```
## Contributors
- [Mateusz Lenik](https://github.com/mlen)