= DB Charmer - ActiveRecord Connection Magic Plugin
+DbCharmer+ is a simple yet powerful plugin for ActiveRecord that does a few things:
1. Allows you to easily manage AR models' connections (+switch_connection_to+ method)
2. Allows you to switch AR models' default connections to a separate servers/databases
3. Allows you to easily choose where your query should go (Model.on_* methods family)
4. Allows you to automatically send read queries to your slaves while masters would handle all the updates.
5. Adds multiple databases migrations to ActiveRecord
== Installation
There are two options when approaching db-charmer installation:
* using the gem (recommended)
* install as a Rails plugin
To install as a gem, add this to your environment.rb:
config.gem 'db-charmer', :lib => 'db_charmer'
And then run the command:
sudo rake gems:install
To install db-charmer as a Rails plugin use this:
script/plugin install git://github.com/kovyrin/db-charmer.git
_Notice_: If you use db-charmer in a non-rails project, you may need to set DbCharmer.env to a correct value
before using any of its connection management methods. Correct value here is a valid database.yml
first-level section name.
== Easy ActiveRecord Connection Management
As a part of this plugin we've added +switch_connection_to+ method that accepts many different kinds
of db connections specifications and uses them on a model. We support:
1. Strings and symbols as the names of connection configuration blocks in database.yml.
2. ActiveRecord models (we'd use connection currently set up on a model).
3. Database connections (Model.connection)
4. Nil values to reset model to default connection.
Sample code:
class Foo < ActiveRecord::Model; end
Foo.switch_connection_to(:blah)
Foo.switch_connection_to('foo')
Foo.switch_connection_to(Bar)
Foo.switch_connection_to(Baz.connection)
Foo.switch_connection_to(nil)
The +switch_connection_to+ method has an optional second parameter +should_exist+ which is true
by default. This parameter is used when the method is called with a string or a symbol connection
name and there is no such connection configuration in the database.yml file. If this parameter
is +true+, an exception would be raised, otherwise, the error would be ignored and no connection
change would happen.
This is really useful when in development mode or in a tests you do not want to create many different
databases on your local machine and just want to put all your tables in a single database.
*Warning*: All the connection switching calls would switch connection *only* for those classes the
method called on. You can't call the +switch_connection_to+ method and switch connection for a
base class in some hierarchy (for example, you can't switch AR::Base connection and see all your
models switched to the new connection, use the classic +establish_connection+ instead).
== Multiple DB Migrations
In every application that works with many databases, there is need in a convenient schema migrations mechanism.
All Rails users already have this mechanism - rails migrations. So in +DbCharmer+, we've made it possible
to seamlessly use multiple databases in Rails migrations.
There are two methods available in migrations to operate on more than one database:
1. Global connection change method - used to switch whole migration to a non-default database.
2. Block-level connection change method - could be used to do only a part of a migration on a non-default db.
Migration class example (global connection rewrite):
class MultiDbTest < ActiveRecord::Migration
db_magic :connection => :second_db
def self.up
create_table :test_table, :force => true do |t|
t.string :test_string
t.timestamps
end
end
def self.down
drop_table :test_table
end
end
Migration class example (block-level connection rewrite):
class MultiDbTest < ActiveRecord::Migration
def self.up
on_db :second_db do
create_table :test_table, :force => true do |t|
t.string :test_string
t.timestamps
end
end
end
def self.down
on_db :second_db { drop_table :test_table }
end
end
Migration class example (global connection rewrite, multiple connections with the same table):
(NOTE: both :connection and :connections can take an array of connections)
class MultiDbTest < ActiveRecord::Migration
db_magic :connections => [:second_db, :default]
def self.up
create_table :test_table, :force => true do |t|
t.string :test_string
t.timestamps
end
end
def self.down
drop_table :test_table
end
end
=== Default Migrations Connection
Starting with DbCharmer version 1.6.10 it is possible to call ActiveRecord::Migration.db_magic
and specify default migration connection that would be used by all migrations without
excplicitly switched connections. If you want to switch your migration to the default ActiveRecord
connection, just use db_magic :connection => :default.
=== Invalid Connection Names Handling
By default in all environments on_db and db_magic statments would fail if
specified connection does not exist in database.yml. It is possible to make +DbCharmer+
ignore such situations in non-production environments so that rails would create the tables
in your single database (especially useful in test databases).
This behaviour is controlled by the DbCharmer.connections_should_exist
configuration attribute which could be set from a rails initializer.
Warning: if in test environment you use separate connections and master-slave support
in DbCharmer, make sure you disable transactional fixtures support in Rails. Without
this change you're going to see all kinds of weird data visibility problems in your tests.
== Using Models in Master-Slave Environments
Master-slave replication is the most popular scale-out technique in a medium-sized and
large database-centric applications today. There are some rails plugins out there that help
developers to use slave servers in their models but none of them were flexible enough
for us to start using them in a huge application we work on.
So, we've been using ActsAsReadonlyable plugin for a long time and have made tons
of changes in its code over the time. But since that plugin has been abandoned
by its authors, we've decided to collect all of our master-slave code in one plugin
and release it for rails 2.2+. +DbCharmer+ adds the following features to Rails models:
=== Auto-Switching all Reads to the Slave(s)
When you create a model, you could use db_magic :slave => :blah or
db_magic :slaves => [ :foo, :bar ] commands in your model to set up reads
redirection mode when all your find/count/exist/etc methods will be reading data
from your slave (or a bunch of slaves in a round-robin manner). Here is an example:
class Foo < ActiveRecord::Base
db_magic :slave => :slave01
end
class Bar < ActiveRecord::Base
db_magic :slaves => [ :slave01, :slave02 ]
end
=== Default Connection Switching
If you have more than one master-slave cluster (or simply more than one database)
in your database environment, then you might want to change the default database
connection of some of your models. You could do that by using
db_magic :connection => :foo call from your models. Example:
class Foo < ActiveRecord::Base
db_magic :connection => :foo
end
Sample model on a separate master-slave cluster (so, separate main connection +
a slave connection):
class Bar < ActiveRecord::Base
db_magic :connection => :bar, :slave => :bar_slave
end
=== Per-Query Connection Management
Sometimes you have select queries that you know you want to run on the master.
This could happen for example when you have just added some data and need to read
it back and not sure if it made it all the way to the slave yet or no. For this
situation and a few others there is a set of methods we've added to ActiveRecord models:
1) +on_master+ - this method could be used in two forms: block form and proxy form.
In the block form you could force connection switch for a block of code:
User.on_master do
user = User.find_by_login('foo')
user.update_attributes!(:activated => true)
end
In the proxy form this method could be used to force one query to be performed on
the master database server:
Comment.on_master.last(:limit => 5)
User.on_master.find_by_activation_code(code)
User.on_master.exists?(:login => login, :password => password)
2) +on_slave+ - this method is used to force a query to be run on a slave even in
situations when it's been previously forced to use the master. If there is more
than one slave, one would be selected randomly. Tis method has two forms as
well: block and proxy.
3) on_db(connection) - this method is what makes two previous methods
possible. It is used to switch a model's connection to some db for a short block
of code or even for one statement (two forms). It accepts the same range of values
as the +switch_connection_to+ method does. Example:
Comment.on_db(:olap).count
Post.on_db(:foo).find(:first)
By default in development and test environments you could use non-existing connections in your
on_db calls and rails would send all your queries to a single default database. In
production on_db won't accept non-existing names.
This behaviour is controlled by the DbCharmer.connections_should_exist
configuration attribute which could be set from a rails initializer.
=== Associations Connection Management
ActiveRecord models can have an associations with each other and since every model has its
own database connections, it becomes pretty hard to manage connections in a chained calls
like User.posts.count. With a class-only connection switching methods this call
would look like the following if we'd want to count posts on a separate database:
Post.on_db(:olap) { User.posts.count }
Apparently this is not the best way to write the code and we've implemented an on_*
methods on associations as well so you could do things like this:
@user.posts.on_db(:olap).count
@user.posts.on_slave.find(:title => 'Hello, world!')
Notice: Since ActiveRecord associations implemented as proxies for resulting
objects/collections, it is possible to use our connection switching methods even without
chained methods:
@post.user.on_slave - would return post's author
@photo.owner.on_slave - would return photo's owner
Starting with +DbCharmer+ release 1.4 it is possible to use prefix notation for has_many
and HABTM associations connection switching:
@user.on_db(:foo).posts
@user.on_slave.posts
=== Named Scopes Support
To make it easier for +DbCharmer+ users to use connections switching methods with named scopes,
we've added on_* methods support on the scopes as well. All the following scope chains
would do exactly the same way (the query would be executed on the :foo database connection):
Post.on_db(:foo).published.with_comments.spam_marked.count
Post.published.on_db(:foo).with_comments.spam_marked.count
Post.published.with_comments.on_db(:foo).spam_marked.count
Post.published.with_comments.spam_marked.on_db(:foo).count
And now, add this feature to our associations support and here is what we could do:
@user.on_db(:archive).posts.published.all
@user.posts.on_db(:olap).published.count
@user.posts.published.on_db(:foo).first
=== Bulk Connection Management
Sometimes you want to run code where a large number of tables may be used, and you'd like
them all to use an alternate database. You can now do this:
DbCharmer.with_remapped_databases(:logs => :big_logs_slave) { ... }
Any model whose default database is +:logs+ (e.g., db_charmer :connection => :logs)
will now have its connection switched to +:big_logs_slave+ in that block. This is lower
precedence than any other +DbCharmer+ method, so Model.on_db(:foo).find(...) and
such things will still use the database they specify, not the one that model was remapped
to.
You can specify any number of remappings at once, and you can also use +:master+ as a database
name that matches any model that has not had its connection set by +DbCharmer+ at all.
*Note*: +DbCharmer+ works via +alias_method_chain+ in model classes. It is very careful
to only patch the models it needs to. However, if you use +with_remapped_databases+ and
remap the default database (+:master+), then it has no choice but to patch all subclasses
of +ActiveRecord::Base+. This should not cause any serious problems or any big performance
impact, but it is worth noting.
== Simple Sharding Support
Starting with the release 1.6.0 of +DbCharmer+ we have added support for simple database sharding
to our ActiveRecord extensions. The feature is still in alpha stage and should not be used in
production without complete understanding of the principles of its work.
At this point we support three sharding methods:
1) range - really simple sharding method that allows you to take a table, slice is to a set of
smaller tables with pre-defined ranges of primary keys and then put those smaller tables to
different databases/servers. This could be useful for situations where you have a huge table that
is slowly growing and you just want to keep it simple and split the table load into a few servers
without building any complex sharding schemes.
2) hash_map - pretty simple sharding method that allows you to take a table and slice it to a set
of smaller tables by some key that has a pre-defined key of values. For example, list of US mailing
addresses could be sharded by states, where you'd be able to define which states are stored in which
databases/servers.
3) db_block_map - this is a really complex sharding method that allows you to shard your table into a
set of small fixed-size blocks that then would be assigned to a set of shards (databases/servers).
Whenever you would need an additional blocks they would be allocated automatically and then balanced
across the shards you have defined in your database. This method could be used to scale out huge
tables with hundreds of millions to billions of rows and allows relatively easy re-sharding techniques
to be implemented on top.
=== How to enable sharding?
To enable sharding extensions you need to take a few things:
1) Create a Rails initializer (on run this code when you initialize your script/application) with a
set of sharded connections defined. Each connection would have a name, sharding method and an optional
set of parameters to initialize the sharding method of your choice.
2) Specify sharding connection you want to use in your models.
3) Specify the shard you want to use before doing any operations on your models.
For more details please check out the following documentation sections.
=== Sharded Connections
Sharded connection is a simple abstractions that allows you to specify all sharding parameters for a
cluster in one place and then use this centralized configuration in your models. Here are a few examples
of sharded connections initizlization calls:
1) Sample range-based sharded connection:
TEXTS_SHARDING_RANGES = {
0...100 => :shard1,
100..200 => :shard2,
:default => :shard3
}
DbCharmer::Sharding.register_connection(
:name => :texts,
:method => :range,
:ranges => TEXTS_SHARDING_RANGES
)
2) Sample hash map sharded connection:
SHARDING_MAP = {
'US' => :us_users,
'CA' => :ca_users,
:default => :other_users
}
DbCharmer::Sharding.register_connection(
:name => :users,
:method => :hash_map,
:map => SHARDING_MAP
)
3) Sample database block map sharded connection:
DbCharmer::Sharding.register_connection(
:name => :social,
:method => :db_block_map,
:block_size => 10000, # Number of keys per block
:map_table => :event_shards_map, # Table with blocks to shards mapping
:shards_table => :event_shards_info, # Shards connection information table
:connection => :social_shard_info # What connection to use to read the map
)
After your sharded connection is defined, you could use it in your models:
class Text < ActiveRecord::Base
db_magic :sharded => {
:key => :id,
:sharded_connection => :texts
}
end
class Event < ActiveRecord::Base
set_table_name :timeline_events
db_magic :sharded => {
:key => :to_uid,
:sharded_connection => :social
}
end
=== Switching connections in sharded models
Every time you need to perform an operation on a sharded model, you need to specify on which shard
you want to do it. We have a method for this which would look familiar for the people that use
+DbCharmer+ for non-sharded environments since it looks and works just like those per-query
connection management methods:
Event.shard_for(10).find(:conditions => { :to_uid => 123 }, :limit => 5)
Text.shard_for(123).find_by_id(123)
There is another method that could be used with range and hash_map sharding methods, this method
allows you to switch to the default shard:
Text.on_default_shard.create(:body => 'hello', :user_id => 123)
And finally, there is a method that allows you to run your code on each shard in the system (at this
point the method is supported in db_block_map method only):
Event.on_each_shard { |event| event.delete_all }
=== Defining your own sharding methods
It is possing with +DbCharmer+ for the users to define their own sharding methods. You need to do a
few things to implement your very own sharding scheme:
1) Create a class with a name DbCharmer::Sharding::Method::YourOwnName
2) Implement at least a constructor initialize(config) and a lookup instance
method shard_for_key(key) that would return either a connection name from database.yml
file or just a hash of connection parameters for rails connection adapters.
3) Register your sharded connection using the following call:
DbCharmer::Sharding.register_connection(
:name => :some_name,
:method => :your_own_name, # your sharder class name in lower case
... some additional parameters if needed ...
)
4) Use your sharded connection as any standard one.
=== Adding support for default shards in your custom sharding methods
If you want to be able to use +on_default_shard+ method on your custom-sharded models, you
need to do two things:
1) implement support_default_shard? instance method on your sharded class that
would return +true+ if you do support default shard specification and +false+ otherwise.
2) implement :default symbol support as a key in your +shard_for_key+ method.
=== Adding support for shards enumeration in your custom sharding methods
To add shards enumeration support to your custom-sharded models you need to implement
just one instance method +shard_connections+ on your sharded class. This method should
return an array of sharding connection names or connection configurations to be used to
establish shard connections in a loop.
== Documentation
For more information on the plugin internals, please check out the source code. All the plugin's
code is ~100% covered with a tests that were placed in a separate staging rails project located
at http://github.com/kovyrin/db-charmer-sandbox. The project has unit tests for all or at least the
most of the parts of plugin's code.
== What Ruby and Rails implementations does it work for?
We have a continuous integration setups for this plugin on MRI 1.8.6 with Rails 2.2 and 2.3.
We use the plugin in production on Scribd.com with MRI (rubyee) 1.8.6 and Rails 2.2.
== Who are the authors?
This plugin has been created in Scribd.com for our internal use and then the sources were opened for
other people to use. All the code in this package has been developed by Alexey Kovyrin for Scribd.com
and is released under the MIT license. For more details, see the LICENSE file.