README.md in batch-loader-0.3.0 vs README.md in batch-loader-1.0.0
- old
+ new
@@ -4,39 +4,43 @@
[![Coverage Status](https://coveralls.io/repos/github/exAspArk/batch-loader/badge.svg)](https://coveralls.io/github/exAspArk/batch-loader)
[![Code Climate](https://img.shields.io/codeclimate/github/exAspArk/batch-loader.svg)](https://codeclimate.com/github/exAspArk/batch-loader)
[![Downloads](https://img.shields.io/gem/dt/batch-loader.svg)](https://rubygems.org/gems/batch-loader)
[![Latest Version](https://img.shields.io/gem/v/batch-loader.svg)](https://rubygems.org/gems/batch-loader)
-Simple tool to avoid N+1 DB queries, HTTP requests, etc.
+This gem provides a generic lazy batching mechanism to avoid N+1 DB queries, HTTP requests, etc.
## Contents
* [Highlights](#highlights)
* [Usage](#usage)
* [Why?](#why)
* [Basic example](#basic-example)
* [How it works](#how-it-works)
- * [REST API example](#rest-api-example)
+ * [RESTful API example](#restful-api-example)
* [GraphQL example](#graphql-example)
* [Caching](#caching)
* [Installation](#installation)
* [Implementation details](#implementation-details)
* [Development](#development)
* [Contributing](#contributing)
+* [Alternatives](#alternatives)
* [License](#license)
* [Code of Conduct](#code-of-conduct)
+<a href="https://www.universe.com/" target="_blank" rel="noopener noreferrer">
+ <img src="images/universe.png" height="41" width="153" alt="Sponsored by Universe" style="max-width:100%;">
+</a>
+
## Highlights
* Generic utility to avoid N+1 DB queries, HTTP requests, etc.
* Adapted Ruby implementation of battle-tested tools like [Haskell Haxl](https://github.com/facebook/Haxl), [JS DataLoader](https://github.com/facebook/dataloader), etc.
-* Parent objects don't have to know about children's requirements, batching is isolated.
-* Automatically caches previous queries.
-* Doesn't require to create custom classes.
-* Thread-safe (`BatchLoader#load`).
-* Has zero dependencies.
-* Works with any Ruby code, including REST APIs and GraphQL.
+* Batching is isolated and lazy: load data in batches exactly where and when it's needed.
+* Automatically caches previous queries (identity map).
+* Thread-safe (`loader`).
+* No need to share batching through variables or custom-defined classes.
+* No dependencies, no monkey-patches, no extra primitives such as Promises.
## Usage
### Why?
@@ -45,23 +49,19 @@
```ruby
def load_posts(ids)
Post.where(id: ids)
end
-def load_users(posts)
- posts.map { |post| post.user }
-end
-
posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3)
# _ ↓ _
# ↙ ↓ ↘
- # U ↓ ↓ SELECT * FROM users WHERE id = 1
-users = load_users(post) # ↓ U ↓ SELECT * FROM users WHERE id = 2
- # ↓ ↓ U SELECT * FROM users WHERE id = 3
+users = posts.map do |post| # U ↓ ↓ SELECT * FROM users WHERE id = 1
+ post.user # ↓ U ↓ SELECT * FROM users WHERE id = 2
+end # ↓ ↓ U SELECT * FROM users WHERE id = 3
# ↘ ↓ ↙
# ¯ ↓ ¯
-users.map { |u| user.name } # Users
+puts users # Users
```
The naive approach would be to preload dependent objects on the top level:
```ruby
@@ -82,86 +82,80 @@
# map user to post
posts.each { |post| post.user = user_by_id[post.user_id] }
end
-def load_users(posts)
- posts.map { |post| post.user }
-end
-
posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3)
# _ ↓ _ SELECT * FROM users WHERE id IN (1, 2, 3)
# ↙ ↓ ↘
- # U ↓ ↓
-users = load_posts(post.user) # ↓ U ↓
- # ↓ ↓ U
+users = posts.map do |post| # U ↓ ↓
+ post.user # ↓ U ↓
+end # ↓ ↓ U
# ↘ ↓ ↙
# ¯ ↓ ¯
-users.map { |u| user.name } # Users
+puts users # Users
```
-But the problem here is that `load_posts` now depends on the child association and knows that it has to preload the data for `load_users`. And it'll do it every time, even if it's not necessary. Can we do better? Sure!
+But the problem here is that `load_posts` now depends on the child association and knows that it has to preload data for future use. And it'll do it every time, even if it's not necessary. Can we do better? Sure!
### Basic example
With `BatchLoader` we can rewrite the code above:
```ruby
def load_posts(ids)
Post.where(id: ids)
end
-def load_users(posts)
- posts.map do |post|
- BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
- User.where(id: user_ids).each { |u| batch_loader.load(u.id, user) }
- end
+def load_user(post)
+ BatchLoader.for(post.user_id).batch do |user_ids, loader|
+ User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end
end
-posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3)
- # _ ↓ _
- # ↙ ↓ ↘
- # BL ↓ ↓
-users = load_users(posts) # ↓ BL ↓
- # ↓ ↓ BL
- # ↘ ↓ ↙
- # ¯ ↓ ¯
-BatchLoader.sync!(users).map(&:name) # Users SELECT * FROM users WHERE id IN (1, 2, 3)
+posts = load_posts([1, 2, 3]) # Posts SELECT * FROM posts WHERE id IN (1, 2, 3)
+ # _ ↓ _
+ # ↙ ↓ ↘
+users = posts.map do |post| # BL ↓ ↓
+ load_user(post) # ↓ BL ↓
+end # ↓ ↓ BL
+ # ↘ ↓ ↙
+ # ¯ ↓ ¯
+puts users # Users SELECT * FROM users WHERE id IN (1, 2, 3)
```
As we can see, batching is isolated and described right in a place where it's needed.
### How it works
-In general, `BatchLoader` returns a lazy object. In other programming languages it usually called Promise, but I personally prefer to call it lazy, since Ruby already uses the name in standard library :) Each lazy object knows which data it needs to load and how to batch the query. When all the lazy objects are collected it's possible to resolve them once without N+1 queries.
+In general, `BatchLoader` returns a lazy object. Each lazy object knows which data it needs to load and how to batch the query. As soon as you need to use the lazy objects, they will be automatically loaded once without N+1 queries.
-So, when we call `BatchLoader.for` we pass an item (`user_id`) which should be batched. For the `batch` method, we pass a block which uses all the collected items (`user_ids`):
+So, when we call `BatchLoader.for` we pass an item (`user_id`) which should be collected and used for batching later. For the `batch` method, we pass a block which will use all the collected items (`user_ids`):
<pre>
-BatchLoader.for(post.<b>user_id</b>).batch do |<b>user_ids</b>, batch_loader|
+BatchLoader.for(post.<b>user_id</b>).batch do |<b>user_ids</b>, loader|
...
end
</pre>
-Inside the block we execute a batch query for our items (`User.where`). After that, all we have to do is to call `load` method and pass an item which was used in `BatchLoader.for` method (`user_id`) and the loaded object itself (`user`):
+Inside the block we execute a batch query for our items (`User.where`). After that, all we have to do is call `loader`, passing the item that was used in the `BatchLoader.for` method (`user_id`) and the loaded object itself (`user`):
<pre>
-BatchLoader.for(post.<b>user_id</b>).batch do |user_ids, batch_loader|
- User.where(id: user_ids).each { |u| batch_loader.load(<b>u.id</b>, <b>user</b>) }
+BatchLoader.for(post.<b>user_id</b>).batch do |user_ids, loader|
+ User.where(id: user_ids).each { |user| loader.call(<b>user.id</b>, <b>user</b>) }
end
</pre>
-Now we can resolve all the collected `BatchLoader` objects:
+When we call any method on the lazy object, it'll be automatically loaded through batching for all instantiated `BatchLoader`s:
<pre>
-BatchLoader.sync!(users) # => SELECT * FROM users WHERE id IN (1, 2, 3)
+puts users # => SELECT * FROM users WHERE id IN (1, 2, 3)
</pre>
For more information, see the [Implementation details](#implementation-details) section.
-### REST API example
+### RESTful API example
Now imagine we have a regular Rails app with N+1 HTTP requests:
```ruby
# app/models/post.rb
@@ -185,45 +179,45 @@
As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called [parallel](https://github.com/grosser/parallel):
```ruby
class Post < ApplicationRecord
def rating_lazy
- BatchLoader.for(post).batch do |posts, batch_loader|
- Parallel.each(posts, in_threads: 10) { |post| batch_loader.load(post, post.rating) }
+ BatchLoader.for(self).batch do |posts, loader|
+ Parallel.each(posts, in_threads: 10) { |post| loader.call(post, post.rating) }
end
end
# ...
end
```
-`BatchLoader#load` is thread-safe. So, if `HttpClient` is also thread-safe, then with `parallel` gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for [concurrent HTTP requests](https://github.com/exAspArk/concurrent_http_requests) in Ruby). Thanks to Matz, MRI releases GIL when thread hits blocking I/O – HTTP request in our case.
+`loader` is thread-safe. So, if `HttpClient` is also thread-safe, then with the `parallel` gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for [concurrent HTTP requests](https://github.com/exAspArk/concurrent_http_requests) in Ruby). Thanks to Matz, MRI releases the GIL when a thread hits blocking I/O – an HTTP request in our case.
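
For illustration, here is a minimal sketch of the same batch block using plain Ruby threads instead of the `parallel` gem (not from the README; a real app would likely cap the number of threads, as the `in_threads: 10` option does above):

```ruby
class Post < ApplicationRecord
  def rating_lazy
    BatchLoader.for(self).batch do |posts, loader|
      posts.map do |post|
        # post.rating performs blocking HTTP I/O, so MRI releases the GIL and
        # the requests run concurrently; loader itself is thread-safe.
        Thread.new { loader.call(post, post.rating) }
      end.each(&:join)
    end
  end
end
```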
-Now we can resolve all `BatchLoader` objects in the controller:
+In the controller, all we have to do is to replace `post.rating` with the lazy `post.rating_lazy`:
```ruby
class PostsController < ApplicationController
def index
posts = Post.limit(10)
serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }
- render json: BatchLoader.sync!(serialized_posts)
+
+ render json: serialized_posts
end
end
```
-`BatchLoader` caches the resolved values. To ensure that the cache is purged between requests in the app add the following middleware to your `config/application.rb`:
+`BatchLoader` caches the loaded values. To ensure that the cache is purged between requests in the app, add the following middleware to your `config/application.rb`:
```ruby
config.middleware.use BatchLoader::Middleware
```
See the [Caching](#caching) section for more information.
### GraphQL example
-With GraphQL using batching is particularly useful. You can't use usual techniques such as preloading associations in advance to avoid N+1 queries.
-Since you don't know which fields user is going to ask in a query.
+Batching is particularly useful with GraphQL. Techniques such as preloading data in advance to avoid N+1 queries become very complicated, since a user can ask for any of the available fields in a query.
Let's take a look at the simple [graphql-ruby](https://github.com/rmosolgo/graphql-ruby) schema example:
```ruby
Schema = GraphQL::Schema.define do
@@ -244,11 +238,11 @@
name "User"
field :name, !types.String
end
```
-If we want to execute a simple query like:
+If we want to execute a simple query like the following, we will get N+1 queries for each `post.user`:
```ruby
query = "
{
posts {
@@ -256,85 +250,83 @@
name
}
}
}
"
-Schema.execute(query, variables: {}, context: {})
+Schema.execute(query)
```
-We will get N+1 queries for each `post.user`. To avoid this problem, all we have to do is to change the resolver to use `BatchLoader`:
+To avoid this problem, all we have to do is to change the resolver to return `BatchLoader`:
```ruby
PostType = GraphQL::ObjectType.define do
name "Post"
field :user, !UserType, resolve: ->(post, args, ctx) do
- BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
- User.where(id: user_ids).each { |user| batch_loader.load(user.id, user) }
+ BatchLoader.for(post.user_id).batch do |user_ids, loader|
+ User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end
end
end
```
-And setup GraphQL with built-in `lazy_resolve` method:
+And set up GraphQL to use `BatchLoader::GraphQL`:
```ruby
Schema = GraphQL::Schema.define do
query QueryType
- lazy_resolve BatchLoader, :sync
+ use BatchLoader::GraphQL
end
```
+That's it.
+
### Caching
-By default `BatchLoader` caches the resolved values. You can test it by running something like:
+By default `BatchLoader` caches the loaded values. You can test it by running something like:
```ruby
def user_lazy(id)
- BatchLoader.for(id).batch do |ids, batch_loader|
- User.where(id: ids).each { |user| batch_loader.load(user.id, user) }
+ BatchLoader.for(id).batch do |ids, loader|
+ User.where(id: ids).each { |user| loader.call(user.id, user) }
end
end
-user_lazy(1) # no request
-# => <#BatchLoader>
+puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
+# => <#User:...>
-user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
-# => <#User>
+puts user_lazy(1) # no request
+# => <#User:...>
+```
-user_lazy(1).sync # no request
-# => <#User>
+Usually, it's just enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:
+
+```ruby
+use BatchLoader::Middleware
```
To drop the cache manually you can run:
```ruby
-user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
-user_lazy(1).sync # no request
+puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
+puts user_lazy(1) # no request
BatchLoader::Executor.clear_current
-user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
+puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
```
-Usually, it's just enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:
-
-```ruby
-# calls "BatchLoader::Executor.clear_current" after each request
-use BatchLoader::Middleware
-```
-
In some rare cases it's useful to disable caching for `BatchLoader`. For example, in tests or after data mutations:
```ruby
def user_lazy(id)
- BatchLoader.for(id).batch(cache: false) do |ids, batch_loader|
+ BatchLoader.for(id).batch(cache: false) do |ids, loader|
# ...
end
end
-user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
-user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
+puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
+puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
```
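
Alternatively, if you prefer to keep caching enabled and only reset it around each test, the documented `BatchLoader::Executor.clear_current` can be hooked into the test framework. A small sketch assuming RSpec:

```ruby
RSpec.configure do |config|
  # Reset the batch-loader cache before every example instead of disabling it.
  config.before(:each) { BatchLoader::Executor.clear_current }
end
```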
## Installation
Add this line to your application's Gemfile:
@@ -351,20 +343,36 @@
$ gem install batch-loader
## Implementation details
-Coming soon
+See slides 37-42 of [this talk](https://speakerdeck.com/exaspark/batching-a-powerful-way-to-solve-n-plus-1-queries).
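
For readers who prefer code to slides, below is a highly simplified, illustrative sketch of how a lazy batching mechanism like this can work. The class and method names (`ToyLazy`, `fulfill`, `ensure_loaded`) are made up for this sketch; it is not the gem's actual implementation.

```ruby
# A toy lazy value: it registers its item under a shared batch proc, and the
# first time any lazy value is used, all pending items for that proc are
# loaded in a single call.
class ToyLazy
  @pending = {} # batch_proc => { item => lazy object }

  class << self
    attr_reader :pending
  end

  def initialize(item, batch_proc)
    @item = item
    @batch_proc = batch_proc
    @loaded = false
    (self.class.pending[batch_proc] ||= {})[item] = self
  end

  def fulfill(value)
    @value = value
    @loaded = true
  end

  # Any method call on the lazy object triggers the batch load and is then
  # delegated to the loaded value.
  def method_missing(name, *args, &block)
    ensure_loaded.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    ensure_loaded.respond_to?(name, include_private)
  end

  private

  def ensure_loaded
    unless @loaded
      siblings = self.class.pending.delete(@batch_proc) || { @item => self }
      loader = ->(item, value) { siblings[item]&.fulfill(value) }
      @batch_proc.call(siblings.keys, loader) # one batched call for all items
    end
    @value
  end
end

# Usage, mirroring the README's pattern (the query is faked with strings here):
load_users = lambda do |ids, loader|
  # imagine: User.where(id: ids).each { |user| loader.call(user.id, user) }
  ids.each { |id| loader.call(id, "user ##{id}") }
end

lazy_users = [1, 2, 3].map { |id| ToyLazy.new(id, load_users) }
puts lazy_users.map(&:upcase) # the fake "query" runs once, for ids [1, 2, 3]
```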
## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+## Alternatives
+
+There are some other Ruby implementations for batching, such as:
+
+* [shopify/graphql-batch](https://github.com/shopify/graphql-batch)
+* [sheerun/dataloader](https://github.com/sheerun/dataloader)
+
+However, `batch-loader` has some differences:
+
+* It is implemented for general usage and can be used not only with GraphQL. In fact, we use it for RESTful APIs and GraphQL in production at the same time.
+* It doesn't try to mimic implementations from other programming languages which have an asynchronous nature. So, it doesn't load extra dependencies to bring in such primitives as Promises, which are not very popular in the Ruby community.
+Instead, it uses the idea of lazy objects, which are included in the [Ruby standard library](https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy). These lazy objects defer returning the data until the moment it's actually needed.
+* It doesn't force you to share batching through variables or custom-defined classes; just pass a block to the `batch` method.
+* It doesn't require you to return an array of the loaded objects in the same order as the passed items. Satisfying that constraint (sorting the loaded objects and adding `nil` values for the missing ones) can be tedious. Instead, it provides the `loader` lambda, which simply maps an item to its loaded object (see the contrast sketch after this list).
+* It doesn't have any external dependencies. For example, there is no need to pull in large external libraries for thread safety; the gem is thread-safe out of the box.
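
To make the last points concrete, here is a hedged contrast sketch. The ordered-array style is a generic illustration of how some other batching tools work, not a quote of any specific library's API; the second snippet repeats the README's own resolver pattern:

```ruby
# Ordered-array style: the batch function must return loaded objects in the
# same order as the passed items, padding misses with nil.
ordered_batch = lambda do |user_ids|
  users_by_id = User.where(id: user_ids).index_by(&:id)
  user_ids.map { |id| users_by_id[id] }
end

# batch-loader style: the block just maps each item to its object via loader,
# in any order; there is no need to sort results or pad missing ones with nil.
BatchLoader.for(post.user_id).batch do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end
```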
## License
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).