README.md in tobox-0.5.2 vs README.md in tobox-0.6.0

- old
+ new

@@ -19,10 +19,12 @@
 - [Inbox](#inbox)
 - [Zeitwerk](#zeitwerk)
 - [Sentry](#sentry)
 - [Datadog](#datadog)
 - [Stats](#stats)
+- [Advanced](#advanced)
+  - [Batch Events Handling](#batch-events)
 - [Supported Rubies](#supported-rubies)
 - [Rails support](#rails-support)
 - [Why?](#why)
 - [Development](#development)
 - [Contributing](#contributing)
@@ -99,12 +101,12 @@
 database Sequel.connect("postgres://user:pass@dbhost/database")
 # table :outbox
 # concurrency 8
 on("user_created") do |event|
   puts "created user #{event[:after]["id"]}"
-  DataLakeService.user_created(user_data_hash)
-  BillingService.bill_user_account(user_data_hash)
+  DataLakeService.user_created(event)
+  BillingService.bill_user_account(event)
 end
 on("user_updated") do |event|
   # ...
 end
 on("user_created", "user_updated") do |event|
@@ -654,9 +656,59 @@
     # extend to hold the lock for the next loop
     lock_info = lock_manager.lock("outbox", 5000, extend: lock_info)
   rescue Redlock::LockError
     # some other server already has the lock, try later
+  end
+end
+```
+
+<a id="markdown-advanced" name="advanced"></a>
+## Advanced
+
+<a id="markdown-batch-events" name="batch-events"></a>
+### Batch Events Handling
+
+At some scale, the workload generated by `tobox` may start to overwhelm the primary database. With PostgreSQL in particular, which isn't optimized for write-heavy workloads, this can manifest as CPU usage spikes due to index bypasses, or as lock contention when accessing shared buffers.
+
+One way to alleviate this is to handle events in batches. By fetching N events at a time, the database can drain the outbox more efficiently, while your handler can either still process the events one by one or, where possible, process them as a batch. For instance, the AWS SDK provides batch variants of several APIs, including the SNS publish API.
+
+You can do so by setting a batch size in your configuration, and collecting the events with a splat (`*events`) in the event handler:
+
+```ruby
+# tobox.rb
+
+batch_size 10 # fetching 10 events at a time
+
+on("user_created", "user_updated") do |*events| # 10 events at most
+  if events.size == 1
+    DataLakeService.user_created(events.first)
+  else
+    DataLakeService.batch_users_created(events)
+  end
+end
+```
+
+If you're using a batch API which may fail for only a subset of events, you can report which events in the batch failed by using the `Tobox.raise_batch_errors` API:
+
+```ruby
+on("user_created", "user_updated") do |*events| # 10 events at most
+  if events.size == 1
+    DataLakeService.user_created(events.first)
+  else
+    success, failed_events_with_errors = DataLakeService.batch_users_created(events)
+
+    # handle success first
+
+    batch_errors = failed_events_with_errors.to_h do |event, exception|
+      [
+        events.index(event),
+        exception
+      ]
+    end
+
+    # events identified by the batch index will be retried.
+    Tobox.raise_batch_errors(batch_errors)
   end
 end
 ```
 
 <a id="markdown-supported-rubies" name="supported-rubies"></a>
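The error-mapping step from the last README example can be tried in isolation. The sketch below assumes a hypothetical batch API (`fake_publish_batch`, not part of `tobox` or any SDK) that accepts entries tagged with a caller-chosen `:id` and reports each failure by that id, as several real batch APIs do. Tagging each entry with its position in the batch maps failures straight back to batch indexes, which sidesteps searching the batch with `events.index(event)`, a lookup that returns the first element equal to the failed event. The result is a hash of batch index to exception, the shape the README passes to `Tobox.raise_batch_errors`; the event hashes and helper names are illustrative assumptions.

```ruby
# Hypothetical batch API, for illustration only (not tobox, not an SDK client):
# it receives entries tagged with an :id and reports each failure by that id.
# Here it simply rejects the users listed in failing_user_ids.
def fake_publish_batch(entries, failing_user_ids)
  entries.filter_map do |entry|
    next unless failing_user_ids.include?(entry[:payload][:user_id])

    { id: entry[:id], message: "rejected user #{entry[:payload][:user_id]}" }
  end
end

# Tags every event with its position in the batch, then turns the reported
# failures into a { batch_index => exception } hash.
def batch_errors_for(events, failing_user_ids)
  entries = events.each_with_index.map do |event, index|
    { id: index.to_s, payload: event }
  end

  failures = fake_publish_batch(entries, failing_user_ids)

  failures.to_h do |failure|
    [Integer(failure[:id]), RuntimeError.new(failure[:message])]
  end
end

# Hypothetical event payloads standing in for tobox events.
events = [{ user_id: 10 }, { user_id: 11 }, { user_id: 12 }]

batch_errors = batch_errors_for(events, [11, 12])
batch_errors.keys # → [1, 2]
batch_errors[1].message # → "rejected user 11"
```

Using the batch index as the entry id keeps the mapping independent of how the batch API echoes events back, so it does not matter whether the failures reference the original event objects or copies of them.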