# The `test_data` gem

`test_data` does what it says on the tin: it provides a fast & reliable system
for managing your Rails application's test data.

The gem serves as both an alternative to
[fixtures](https://guides.rubyonrails.org/testing.html#the-low-down-on-fixtures)
& [factory_bot](https://github.com/thoughtbot/factory_bot), as well as a broader
workflow for building test suites that will scale gracefully as your application
grows in size and complexity.

What it does:

* Establishes a fourth Rails environment (you can [define custom Rails
  environments](https://guides.rubyonrails.org/configuring.html#creating-rails-environments)!)
  named `test_data`, which you'll use to create a universe of data for your
  tests by simply running and using your application. No Ruby DSL, no YAML
  files, no precarious approximations of realism: **real data created by your
  app**

* Exposes a simple API for ensuring that your data will be pristine for each of
  your tests, whether the test depends on test_data, an empty database, or Rails
  fixtures

* Safeguards your tests from flaky failures and supercharges your build by
  providing a sophisticated transaction manager that isolates each test while
  ensuring your data is only loaded once

If you've despaired over the seeming inevitability that all Rails test suites
will eventually grow to become slow, flaky, and incomprehensible, then this gem
is for you! And even if you're [a factory_bot
fan](https://twitter.com/searls/status/1379491813099253762?s=20), we hope you'll
be open to the idea that [there might be a better way](
#but-we-use-and-like-factory_bot-and-so-i-am-inclined-to-dislike-everything-about-this-gem).

_[Full disclosure: because the gem is still brand new, it makes a number of
[assumptions](#assumptions)—chief among them being that **Postgres & Rails 6+
are required**—so it may not work for every project just yet.]_

## Documentation

This gem requires a lot of documentation—not because `test_data` does a lot of
things, but because managing one's test data is an inherently complex task. If
one reason Rails apps chronically suffer from slow tests is that other
approaches oversimplify test data management, it stands to reason that any
discomfort caused by `test_data`'s scope may not be _unnecessary complexity_ but
instead be an indication of how little of the problem's _essential complexity_
we have reckoned with to this point.

1. [Getting Started Guide](#getting-started-guide)
    1. [Install and initialize `test_data`](#step-1-install-and-initialize-test_data)
    2. [Create some test data](#step-2-create-some-test-data)
    3. [Dump your `test_data` database](#step-3-dump-your-test_data-database)
    4. [Load your data in your tests](#step-4-load-your-data-in-your-tests)
    5. [Keeping your test data up-to-date](#step-5-keeping-your-test-data-up-to-date)
2. [Factory & Fixture Interoperability Guide](#factory--fixture-interoperability-guide)
    * [Using `test_data` with `factory_bot`](#using-test_data-with-factory_bot)
    * [Using `test_data` with Rails fixtures](#using-test_data-with-rails-fixtures)
3. [Rake Task Reference](#rake-task-reference)
    * [test_data:install](#test_datainstall)
    * [test_data:configure](#test_dataconfigure)
    * [test_data:verify_config](#test_dataverify_config)
    * [test_data:initialize](#test_datainitialize)
    * [test_data:dump](#test_datadump)
    * [test_data:load](#test_dataload)
    * [test_data:create_database](#test_datacreate_database)
    * [test_data:drop_database](#test_datadrop_database)
4. [API Reference](#api-reference)
    * [TestData.uses_test_data](#testdatauses_test_data)
    * [TestData.uses_clean_slate](#testdatauses_clean_slate)
    * [TestData.uses_rails_fixtures(self)](#testdatauses_rails_fixtures)
        * [TestData.prevent_rails_fixtures_from_loading_automatically!](#testdataprevent_rails_fixtures_from_loading_automatically)
    * [TestData.config](#testdataconfig)
    * [TestData.insert_test_data_dump](#testdatainsert_test_data_dump)
5. [Assumptions](#assumptions)
6. [Fears, Uncertainties, and Doubts](#fears-uncertainties-and-doubts) (Q & A)
    * [But we're already happy with
      factory_bot!](#but-we-use-and-like-factory_bot-and-so-i-am-inclined-to-dislike-everything-about-this-gem)
    * [How will we handle merge conflicts in the schema
      dumps?](#how-will-i-handle-merge-conflicts-in-these-sql-files-if-i-have-lots-of-people-working-on-lots-of-feature-branches-all-adding-to-the-test_data-database-dumps)
    * [Why can't I manage different SQL dumps for different
      scenarios?](#why-cant-i-save-multiple-database-dumps-to-cover-different-scenarios)
    * [These SQL dumps are way too large to commit to
      git!](#are-you-sure-i-should-commit-these-sql-dumps-theyre-way-too-big)
    * [Tests shouldn't rely on shared test data if they don't need
      to](#tests-shouldnt-use-shared-test-data-they-should-instantiate-the-objects-they-need)
    * [My tests aren't as fast as they should
      be](#im-worried-my-tests-arent-as-fast-as-they-should-be)
7. [Code of Conduct](#code-of-conduct)
8. [Changelog](/CHANGELOG.md)
9. [MIT License](/LICENSE.txt)

## Getting Started Guide

This guide will walk you through setting up `test_data` in your application. You
might notice that it's more complicated than installing a gem and declaring some
default `Widget` attributes! The hard truth is that designing robust and
reliable test data is an inherently complex problem and takes some thoughtful
planning. There are plenty of shortcuts available, but experience has shown they
tend to collapse under their own weight as your app scales and your team
grows—exactly when having a suite of fast & reliable tests is most valuable.

And if you get stuck or need help as you're getting started, please feel free to
[ask us for help](https://github.com/testdouble/test_data/discussions/new)!

### Step 1: Install and initialize `test_data`

#### Adding the gem

First, add `test_data` to your Gemfile. Either include it in all groups or add
it to the `:development`, `:test`, and (the all new!) `:test_data` gem groups:

```ruby
group :development, :test, :test_data do
  gem "test_data"
  # … other gems available to development & test
end
```

Since the `test_data` environment is designed to be used similarly to
`development` (i.e. with a running server and interacting via a browser), any
gems in your `:development` gem group should likely be included in a
`:test_data` gem group as well.

#### Configuring the gem and initializing the database

The gem ships with a number of Rake tasks, including
[test_data:install](#test_datainstall), which will generate the necessary
configuration and initialize a `test_data` database:

```
$ bin/rake test_data:install
```

This should output something like:

```
      create  config/environments/test_data.rb
      create  config/initializers/test_data.rb
      insert  config/database.yml
      insert  config/webpacker.yml
      insert  config/webpacker.yml
Created database 'yourappname_test_data'
 set_config
------------

(1 row)

Your test_data environment and database are ready for use! You can now run
your server (or any command) to create some test data like so:

  $ RAILS_ENV=test_data bin/rails server

```

The purpose of the `test_data` database is to provide a sandbox in which you
will manually generate test data by playing around with your app. Rather than
try to imitate realistic data using factories and fixtures (a task which only
grows more difficult as your models and their associations increase in
complexity), your test data will always be realistic because your real
application will have created it!

### Step 2: Create some test data

Now comes the fun part! It's time to start up your server in the new environment
and create some records by interacting with your system.

#### Running the server (and other commands)

To run your server against the new `test_data` database, set the `RAILS_ENV`
environment variable:

```
$ RAILS_ENV=test_data bin/rails server
```

_[If you're using [webpacker](https://github.com/rails/webpacker), you may also
need to start its development server with `RAILS_ENV=test_data
bin/webpack-dev-server`]_

Because `test_data` creates a full-fledged Rails environment, you can run any
number of Rails commands or Rake tasks against its database by setting
`RAILS_ENV=test_data`, either in your shell environment or with each command
(e.g. `RAILS_ENV=test_data bin/rake db:migrate`).

_[Aside: If you experience any hiccups in getting your server to work, please
[open an issue](https://github.com/testdouble/test_data/issues/new) and let us
know—it may present an opportunity for us to improve the `test_data:configure`
task!]_

#### Create test data by using your app

Once the app is running, it's time to generate some test data. You'll know how
to accomplish this step better than anyone—it's your app, after all!

A few bits of advice as you click & type some test data into existence:

* Spend a little time thoughtfully navigating each feature of your app in order
  to generate enough data to be representative of what would be needed to test
  them (e.g. one `User` per role, one of each kind of `Order`, etc.)
* Less is more: the less test data you create, the more meaningful & memorable
  it will be to yourself and your teammates when writing tests. Don't keep
  adding test data unless it will allow you to exercise additional application
  code (e.g. enough `Project` models to require pagination, but not hundreds of
  them for the sake of looking "production-like")
* Memorable names can become memes for the team to quickly recall and reference
  later (if the admin user is named "Angela" and the manager is "Maria", that'll
  probably serve you better than generic names like "TestUser #1")

If you make a mistake when creating your initial set of test data, it's
perfectly okay to reset the database and start over! Your future tests will be
coupled to this data as your application grows and evolves, so it's worth taking
the time to ensure the foundation is solid. (But that's not to say everything
needs to be perfect; you can always change things or add more data later—you'll
just have to update your tests accordingly.)

### Step 3: Dump your `test_data` database

Once you've created a good sampling of test data by interacting with your app,
the next step is to flush it from the `test_data` database to SQL files. These
database dumps are meant to be committed to source control and versioned
alongside your tests over the life of the application. Additionally, they are
designed to be incrementally
[migrated](#step-5-keeping-your-test-data-up-to-date) over time, just like you
migrate your production database with every release.

Once you have your test data how you want it, dump the schema and data to SQL
files with the `test_data:dump` Rake task:

```
$ bin/rake test_data:dump
```

This will dump three files into `test/support/test_data`:

* `schema.sql` - Schema DDL used to (re-)initialize the `test_data` environment
  database for anyone looking to update your test data

* `data.sql` - The test data itself, exported as a bunch of SQL `INSERT`
  statements, which will be executed by your tests to load your test data

* `non_test_data.sql` - Data needed to run the `test_data` environment, but
  which shouldn't be inserted by your tests (the `ar_internal_metadata` and
  `schema_migrations` tables, by default; see `config.non_test_data_tables`)

You probably won't need to, but these paths can be overridden with
the [TestData.config](#testdataconfig) method. Additional details can also be found
in the [test_data:dump](#test_datadump) Rake task reference.

Once you've made your initial set of dumps, briefly inspect them and—if
everything looks good—commit them. (And if the files are gigantic or full of
noise, you might find [these ideas
helpful](#are-you-sure-i-should-commit-these-sql-dumps-theyre-way-too-big)).

Does it feel weird to dump and commit SQL files? That's okay! It's [healthy to
be skeptical](https://twitter.com/searls/status/860553435116187649?s=20)
whenever you're asked to commit a generated file! Remember that the `test_data`
environment exists only for creating your test data. Your tests will, in turn,
load the SQL dump of your data into the `test` database, and things will proceed
just as if you'd been loading [Rails' built-in
fixtures](https://guides.rubyonrails.org/testing.html#the-low-down-on-fixtures)
from a set of YAML files.

### Step 4: Load your data in your tests

Now that you've dumped the contents of your `test_data` database, you can start
writing tests that rely on this test data.

To accomplish this, you'll likely want to add hooks to run before each test to
put the database into whatever state the test needs.

For the simplest case (ensuring your test data is loaded into the `test`
database and available to your test), you'll want to call the
[TestData.uses_test_data](#testdatauses_test_data) method at the beginning of
the test. The first time `uses_test_data` is called, `test_data` will start a
transaction and insert your test data. On subsequent calls to `uses_test_data`
by later tests, the transaction will be rolled back to a save point taken just
after the data was initially loaded, so that each test gets a clean starting
point without repeatedly executing the expensive SQL operation.
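
To make the load-once, roll-back-many behavior concrete, here is a toy in-memory model. It is only a sketch: `ToyTestData` and its `dump_loads` counter are hypothetical names, and a Ruby hash stands in for both the database and its savepoint, whereas the gem itself relies on real Postgres transactions.

```ruby
# Toy model (hypothetical; not the gem's implementation): a hash of
# table name => rows stands in for the database, and a deep copy of that
# hash stands in for the savepoint taken right after the dump is loaded.
class ToyTestData
  attr_reader :db, :dump_loads

  def initialize(dump)
    @dump = dump
    @db = {}
    @savepoint = nil
    @dump_loads = 0
  end

  def uses_test_data
    if @savepoint.nil?
      # First call: pay for the expensive load once, then remember the state.
      @db = deep_copy(@dump)
      @dump_loads += 1
      @savepoint = deep_copy(@db)
    else
      # Later calls: a cheap "rollback" to the state right after the load.
      @db = deep_copy(@savepoint)
    end
  end

  private

  def deep_copy(tables)
    tables.transform_values { |rows| rows.map(&:dup) }
  end
end

toy = ToyTestData.new({"users" => [{name: "Angela"}, {name: "Maria"}]})

toy.uses_test_data                    # test 1 setup: loads the dump
toy.db["users"] << {name: "Mallory"}  # test 1 pollutes the data

toy.uses_test_data                    # test 2 setup: rolls back instead
puts toy.db["users"].map { |user| user[:name] }.inspect
puts "dump loaded #{toy.dump_loads} time(s)"
```

In this model the second test sees only Angela and Maria, and the dump is loaded exactly once no matter how many tests call `uses_test_data`.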

#### If you want every single test to have access to your test data

If, for the sake of consistency & simplicity, you want every single Rails-aware
test to have access to your test data, you can accomplish this with a single
global before-each hook.

If you're using Rails' default
[Minitest](https://github.com/seattlerb/minitest), you can load it in a `setup`
hook in `ActiveSupport::TestCase`:

```ruby
class ActiveSupport::TestCase
  setup do
    TestData.uses_test_data
  end
end
```

Likewise, if you use [RSpec](https://rspec.info), you can accomplish the same
thing with a global `before(:each)` hook in your `rails_helper.rb` file:

```ruby
RSpec.configure do |config|
  config.before(:each) do
    TestData.uses_test_data
  end
end
```

#### If some tests rely on test data and others need a clean slate

Of course, for simple units of code, it may be more prudent to manually create
the test data they need inline as opposed to relying on a shared source of test
data. For these tests, you can call
[TestData.uses_clean_slate](#testdatauses_clean_slate) in a `setup` hook.

For the best performance, you might consider a mode-switching method that's
invoked at the top of each test listing like this:

```ruby
class ActiveSupport::TestCase
  def self.uses(mode)
    case mode
    when :clean_slate
      setup { TestData.uses_clean_slate }
    when :test_data
      setup { TestData.uses_test_data }
    else
      raise "Invalid test data mode: #{mode}"
    end
  end
end

# A simple model that will `create` its own data
class WidgetTest < ActiveSupport::TestCase
  uses :clean_slate
  # …
end

# An integrated test that depends on a lot of data
class KitchenSinkTest < ActionDispatch::IntegrationTest
  uses :test_data
  # …
end
```

Or, with RSpec:

```ruby
module TestDataModes
  def uses(mode)
    case mode
    when :clean_slate
      before(:each) { TestData.uses_clean_slate }
    when :test_data
      before(:each) { TestData.uses_test_data }
    else
      raise "Invalid test data mode: #{mode}"
    end
  end
end

RSpec.configure do |config|
  config.extend(TestDataModes)
end

RSpec.describe Widget, type: :model do
  uses :clean_slate
  # …
end

RSpec.describe "Kitchen sink", type: :request do
  uses :test_data
  # …
end
```

But wait, there's more! If your test suite switches between multiple modes from
test-to-test, it's important to be aware of the marginal cost _between_ each of
those tests. For example, two tests in a row that call `TestData.uses_test_data`
only need a simple rollback as test setup, but a `TestData.uses_test_data`
followed by a `TestData.uses_clean_slate` requires a rollback, a truncation, and
another savepoint. These small costs add up, so consider [speeding up your
build](#im-worried-my-tests-arent-as-fast-as-they-should-be) by grouping your
tests into sub-suites based on their source of test data.
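
As a rough illustration of why grouping helps, the sketch below counts mode switches for an interleaved ordering versus a grouped one. The test names and the `mode_switches` helper are hypothetical, and the count is only a proxy for the extra truncations and savepoints described above, not a measurement of the gem.

```ruby
# Hypothetical suite: each test declares which data mode it needs.
SUITE = [
  ["WidgetTest", :clean_slate],
  ["KitchenSinkTest", :test_data],
  ["GadgetTest", :clean_slate],
  ["CheckoutTest", :test_data],
  ["InvoiceTest", :clean_slate],
  ["SignupTest", :test_data]
].freeze

# Counts how often two consecutive tests need a different mode. Each switch
# costs more than the plain rollback between two tests in the same mode.
def mode_switches(tests)
  tests.map { |_name, mode| mode }.each_cons(2).count { |a, b| a != b }
end

# Group tests by mode, keeping the original order within each group.
grouped = SUITE.group_by { |_name, mode| mode }.values.flatten(1)

puts "interleaved: #{mode_switches(SUITE)} switches"
puts "grouped:     #{mode_switches(grouped)} switches"
```

Here the interleaved ordering switches modes between every pair of tests, while the grouped ordering switches only once.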

#### If your situation is more complicated

If you're adding `test_data` to an existing application, it's likely that you
won't be able to easily adopt a one-size-fits-all approach to test setup across
your entire suite. Some points of reference, if that's the situation you're in:

* If your test suite is **already using fixtures or factories** and the above
  hooks just broke everything, check out our [interoperability
  guide](#factory--fixture-interoperability-guide) for help.
* If you need to make any changes to the data after it's loaded, truncated, or
  after Rails fixtures are loaded, you can configure [lifecycle
  hooks](#lifecycle-hooks) that will help you achieve a **very fast test suite**
  by including those changes inside the transaction savepoints
* If you **don't want `test_data` managing transactions** and cleanup for you
  and just want to load the SQL dump, you can call
  [TestData.insert_test_data_dump](#testdatainsert_test_data_dump)
* For more information on how all this works, see the [API
  reference](#api-reference).

### Step 5: Keeping your test data up-to-date

Your app relies on its tests and your tests rely on their test data. This
creates a bit of a paradox: creating & maintaining test data is _literally_ a
tertiary concern but simultaneously an inescapable responsibility that will live
with you for the life of your application. That's true whether you use this gem,
`factory_bot`, Rails fixtures, or something else as a source of shared test
data.

Fortunately, we already have a fantastic tool available for keeping our
`test_data` database up-to-date over the life of our application: [Rails
migrations](https://guides.rubyonrails.org/active_record_migrations.html). If
your migrations are resilient enough for your production database, they should
also be able to keep your `test_data` database up-to-date. (As a happy side
effect of running your migrations against your test data, this means your
`test_data` database may help you identify hard-to-catch migration bugs early,
before being deployed to production!)

Whenever you create a new migration or add a major feature, you'll probably need
to update your test data. Here's how to do it:

* If the current SQL dumps in `test/support/test_data` are newer than your local
  `test_data` database:

    1. Be sure there's nothing in your local `test_data` database that you added
       intentionally and forgot to dump, because it's about to be erased

    2. Run `rake test_data:drop_database`

    3. Run `rake test_data:load` to recreate the `test_data` database and load
       the latest SQL dumps into it

    4. Run any pending migrations with `RAILS_ENV=test_data bin/rake db:migrate`

    5. If you need to create any additional data, start up the server
       (`RAILS_ENV=test_data bin/rails s`), just like in [Step
       2](#step-2-create-some-test-data)

    6. Export your newly-updated `test_data` database with `rake test_data:dump`

    7. Ensure your tests are passing and then commit the resulting SQL files

* If the local `test_data` database is already up-to-date with the current SQL
  dumps, follow steps **4 through 7** above
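
If you find yourself repeating the first set of steps often, the scriptable ones (2, 3, 4, and 6) could be chained together. The following is a minimal sketch, assuming no additional data needs to be created by hand (step 5); the constant, lambda, and method names are hypothetical, and steps 1 and 7 remain manual. Because it drops your local `test_data` database, the example only performs a dry run.

```ruby
# Hypothetical helper (say, script/refresh_test_data.rb). The names are
# illustrative; only the commands come from the steps above.
REFRESH_COMMANDS = [
  [{}, "bin/rake", "test_data:drop_database"],              # step 2
  [{}, "bin/rake", "test_data:load"],                       # step 3
  [{"RAILS_ENV" => "test_data"}, "bin/rake", "db:migrate"], # step 4
  [{}, "bin/rake", "test_data:dump"]                        # step 6
].freeze

# Default runner: shells out and returns true when the command succeeds.
SHELL_RUNNER = ->(env, *command) { system(env, *command) }

# Dry-run runner: prints each command instead of executing it.
PRINT_RUNNER = lambda do |env, *command|
  prefix = env.map { |key, value| "#{key}=#{value}" }
  puts((prefix + command).join(" "))
  true
end

# Runs each command in order, raising at the first failure. The runner is
# injectable so the sequence can be rehearsed without touching a database.
def refresh_test_data(runner: SHELL_RUNNER)
  REFRESH_COMMANDS.each do |env, *command|
    ok = runner.call(env, *command)
    raise "Command failed: #{command.join(" ")}" unless ok
  end
end

refresh_test_data(runner: PRINT_RUNNER)
```

Calling `refresh_test_data` with no arguments would run the commands for real, so double-check step 1 before doing that.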

It's important to keep in mind that your test data SQL dumps are a shared,
generated resource among your team (just like a `structure.sql` or `schema.rb`
file). As a result, if your team doesn't integrate code frequently or if the
test data changes frequently, you'd be right to be concerned that [the resulting
merge conflicts could become
significant](#how-will-i-handle-merge-conflicts-in-these-sql-files-if-i-have-lots-of-people-working-on-lots-of-feature-branches-all-adding-to-the-test_data-database-dumps),
so sweeping changes should be made deliberately and in collaboration with other
contributors.

_[Aside: some Rails teams are averse to using migrations to migrate data as well
as schemas, instead preferring one-off scripts and tasks. You'll have an easier
time of things if you use migrations for both schema and data changes. Here are
some notes on [how to write data migrations
safely](https://blog.testdouble.com/posts/2014-11-04-healthy-migration-habits/#habit-4-dont-reference-models).
Otherwise, you'll need to remember to run any ad hoc deployment scripts against
your `test_data` Rails environment along with each of your other deployed
environments.]_

## Factory & Fixture Interoperability Guide

Let's be real, most Rails apps already have some tests, and most of those test
suites will already be relying on
[factory_bot](https://github.com/thoughtbot/factory_bot) or Rails' built-in
[test
fixtures](https://guides.rubyonrails.org/testing.html#the-low-down-on-fixtures).
While `test_data` is designed to be an alternative to both of these approaches
to managing your test data, it wouldn't be practical to ask a team to rewrite
all their existing tests in order to migrate to a different tool. That's why the
`test_data` gem goes to great lengths to play nicely with your existing tests,
while ensuring each test is wrapped in an isolated and fast always-rolled-back
transaction, regardless of whether the test depends on `test_data`, factories,
fixtures, all three, or none of the above.

This section will hopefully make it a little easier to incorporate new
`test_data` tests into a codebase that's already using `factory_bot` and/or
Rails fixtures, whether or not you choose to incrementally migrate to
`test_data` over time.

### Using `test_data` with `factory_bot`

This section will document some thoughts and strategies for introducing
`test_data` to a test suite that's already using `factory_bot`.

#### Getting your factory tests passing after adding `test_data`

Depending on the assumptions your tests make about the state of the database
before you've loaded any factories, it's possible that everything will "just
work" after adding [TestData.uses_test_data](#testdatauses_test_data) in a
before-each hook (as shown in the [setup
guide](#step-4-load-your-data-in-your-tests)). So by all means, try running your
suite after following the initial setup guide and see if the suite just passes.

If you find that your test suite is failing after adding
`TestData.uses_test_data` to your setup, don't panic! Test failures are most
likely caused by the combination of your `test_data` SQL dump with the records
inserted by your factories.

One approach would be to attempt to resolve each such failure one-by-one—usually
by updating the offending factories or editing your `test_data` database to
ensure they steer clear of one another. Care should be taken to preserve the
conceptual encapsulation of each test, however, as naively squashing errors
risks introducing inadvertent coupling between your factories and your
`test_data` data such that neither can be used independently of the other.

Another approach that the `test_data` gem provides is an additional mode with
`TestData.uses_clean_slate`, which—when called at the top of a factory-dependent
test—will ensure that the tables that `test_data` had written to will be
truncated, allowing the test to create whatever factories it needs without fear
of conflicts.

```ruby
class AnExistingFactoryUsingTest < ActiveSupport::TestCase
  setup do
    TestData.uses_clean_slate
    # pre-existing setup
  end
  # …
end
```

If you have a lot of tests, you can find more sophisticated approaches for
declaratively switching between types of test data in the [getting started
section](#if-some-tests-rely-on-test-data-and-others-need-a-clean-slate) above.

### Using `test_data` with Rails fixtures

While [Rails
fixtures](https://guides.rubyonrails.org/testing.html#the-low-down-on-fixtures)
are similar to factories, the fact that they're run globally by Rails and
permanently committed to the test database actually makes them a little trickier
to work with. This section will cover a couple approaches for integrating
`test_data` into suites that use fixtures.

It's more likely than not that all your tests will explode in dramatic fashion
as soon as you add `TestData.uses_test_data` to a `setup` or `before(:each)`
hook. Typically, your fixtures will be loaded and committed immediately with
your `test_data` dump inserted afterward, which makes it exceedingly likely that
your tests will fail with primary key and unique constraint conflicts. If that's
the case you find yourself in, `test_data` provides an API that **overrides
Rails' built-in fixtures behavior with a monkey patch**.

And if that bold text wasn't enough to scare you off, here's how to do
it:

1. Before your tests have loaded (e.g. near the top of your test helper), call
   [TestData.prevent_rails_fixtures_from_loading_automatically!](#testdataprevent_rails_fixtures_from_loading_automatically).
   This will patch Rails'
   [setup_fixtures](https://github.com/rails/rails/blob/main/activerecord/lib/active_record/test_fixtures.rb#L105)
   and effectively turn it into a no-op, which means that your test fixtures
   will not be automatically loaded into your test database

2. In tests that rely on your `test_data` dump, call
   [TestData.uses_test_data](#step-4-load-your-data-in-your-tests) as you
   normally would. Because your fixtures won't be loaded automatically, they
   won't be available to these tests

3. In tests that need fixtures, call
   [TestData.uses_rails_fixtures(self)](#testdatauses_rails_fixtures) in a
   before-each hook. This will first ensure that any tables written to by
   `test_data` are truncated (as with `TestData.uses_clean_slate`) before
   loading your Rails fixtures

For example, you might add the following to an existing fixtures-dependent
test to get it passing:

```ruby
class AnExistingFixtureUsingTest < ActiveSupport::TestCase
  setup do
    TestData.uses_rails_fixtures(self)
    # pre-existing setup
  end

  # …
end
```

If you've adopted a mode-switching helper method [like the one described
above](#if-some-tests-rely-on-test-data-and-others-need-a-clean-slate), you
could of course add a third mode to cover any tests that depend on Rails
fixtures.
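
As a minimal sketch of what that third mode might look like for Minitest, the module below maps each mode to one of the three `TestData` calls documented here. The module name `TestDataModes`, the `MODES` table, and the `:fixtures` mode name are hypothetical. It also assumes that `setup` runs its block in the context of the test instance, as the `setup` examples above do when they pass `self`.

```ruby
# Hypothetical helper: the module name and the :fixtures mode are
# illustrative, not part of the gem. Only the three TestData calls come
# from this README.
module TestDataModes
  MODES = {
    clean_slate: ->(_test) { TestData.uses_clean_slate },
    test_data: ->(_test) { TestData.uses_test_data },
    fixtures: ->(test) { TestData.uses_rails_fixtures(test) }
  }.freeze

  def uses(mode)
    prepare = MODES.fetch(mode) { raise ArgumentError, "Invalid test data mode: #{mode}" }
    # Assumes `setup` evaluates its block on the test instance, so `self`
    # below is the running test rather than the test class.
    setup { prepare.call(self) }
  end
end

# In your test helper, once Rails and test_data are loaded:
#   ActiveSupport::TestCase.extend(TestDataModes)
#
# Then, in a fixtures-dependent test:
#   class AnExistingFixtureUsingTest < ActiveSupport::TestCase
#     uses :fixtures
#   end
```

Keeping the modes in a lookup table means an unknown mode fails loudly when the test class is loaded, rather than silently running against the wrong data.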

## Rake Task Reference

### test_data:install

A meta-task that runs [test_data:configure](#test_dataconfigure) and [test_data:initialize](#test_datainitialize).

### test_data:configure

This task runs several generators:

* `config/environments/test_data.rb` - As you may know, Rails ships with
  `development`, `test`, and `production` environments defined by default. But
  you can [actually define custom
  environments](https://guides.rubyonrails.org/configuring.html#creating-rails-environments),
  too! This gem adds a new `test_data` environment and database that's intended
  to be used to create and dump your test data. This new environment file loads
  your `development` environment's configuration and disables migration schema
  dumps so that you can run migrations against your `test_data` database without
  affecting your app's `schema.rb` or `structure.sql`.

* `config/initializers/test_data.rb` - Creates an initializer for the gem that
  calls [TestData.config](#testdataconfig) with an empty block and comments
  documenting the currently-available options and their default values

* `config/database.yml` - This generator adds a new `test_data` section to your
  database configuration, named with the same scheme as your other databases
  (e.g. `your_app_test_data`). If your configuration resembles Rails' generated
  `database.yml` and has a working `&default` alias, then this should "just
  work"

* `config/webpacker.yml` - The gem has nothing to do with web assets, but
  [webpacker](https://github.com/rails/webpacker) will display some prominent
  warnings or errors if it is loaded without a configuration entry for the
  currently-running environment, so this generator defines an alias based on
  your `development` config and then defines `test_data` as extending it

* `config/secrets.yml` - If your app still uses (the now-deprecated)
  [secrets.yml](https://guides.rubyonrails.org/4_1_release_notes.html#config-secrets-yml)
  file introduced in Rails 4.1, this generator will ensure that the `test_data`
  environment is accounted for with a generated `secret_key_base` value. If you
  have numerous secrets in this file's `development:` stanza, you may want to
  alias and inherit it into `test_data:` like the `webpacker.yml` generator does

* `config/cable.yml` - Simply defines a `test_data:` entry that tells
  [ActionCable](https://guides.rubyonrails.org/action_cable_overview.html) to
  use the `async` adapter, since that's also the default for `development`

### test_data:verify_config

This task verifies that your configuration appears to be valid by asking each
of the gem's generators to inspect your configuration files, and it will error
whenever a configuration problem is detected.

### test_data:initialize

This task gets your local `test_data` database up-and-running, either from a set
of dump files (if they already exist), or by loading your schema and running
your seed file. Specifically:

1. Creates the `test_data` environment's database, if it doesn't already exist

2. Ensures the database is empty, to avoid overwriting existing data (run
   [test_data:drop_database](#test_datadrop_database) first if you intend to
   reinitialize it)

3. Checks to see if a dump of the database already exists (by default, stored in
   `test/support/test_data/`)

    * If dumps do exist, it invokes [test_data:load](#test_dataload) to load
      them into the database

    * Otherwise, it invokes the `db:schema:load` and `db:seed` tasks (similar to
      Rails' built-in `db:setup` task)

### test_data:dump

This task is designed to be run after you've created or updated your test data
in the `test_data` database and you're ready to run your tests against it. The
task creates several plain SQL dumps from your `test_data` environment's
database:

* A schema-only dump, by default in `test/support/test_data/schema.sql`

* A data-only dump of records you want to be loaded in your tests, by default in
  `test/support/test_data/data.sql`

* A data-only dump of records that you *don't* want loaded in your tests in
  `test/support/test_data/non_test_data.sql`. By default, this includes Rails'
  internal tables: `ar_internal_metadata` and `schema_migrations`, configurable
  with [TestData.config](#testdataconfig)'s `non_test_data_tables`

Each of these files is designed to be committed and versioned with the rest of
your application. [TestData.config](#testdataconfig) includes several
options to control this task.

### test_data:load

This task will load your SQL dumps into your `test_data` database by:

1. Verifying the `test_data` environment's database is empty (creating it if it
   doesn't exist and failing if it's not empty)

2. Verifying that your schema, test data, and non-test data SQL dumps can be
   found at the configured paths

3. Loading the dumps into the `test_data` database

4. Warning if there are pending migrations that haven't been run yet

If there are pending migrations, you'll probably want to run them and then
dump & commit your test data so that they're up-to-date:

```
$ RAILS_ENV=test_data bin/rake db:migrate
$ bin/rake test_data:dump
```

### test_data:create_database

This task will create the `test_data` environment's database if it does not
already exist. It also
[enhances](https://dev.to/molly/rake-task-enhance-method-explained-3bo0) Rails'
`db:create` task so that `test_data` is created along with `development` and
`test` whenever `rake db:create` is run.
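
If Rake's `enhance` is unfamiliar, the sketch below shows the general mechanism using stand-in tasks. It is an assumption-laden illustration: the task bodies and the `RUN_ORDER` constant are hypothetical, and this README doesn't state whether the gem enhances `db:create` with a block (as here) or with a prerequisite.

```ruby
require "rake"

# Records the order in which tasks run, for demonstration only.
RUN_ORDER = []

# Stand-ins for Rails' db:create and the gem's task (hypothetical bodies).
Rake::Task.define_task("db:create") { RUN_ORDER << "db:create" }
Rake::Task.define_task("test_data:create_database") do
  RUN_ORDER << "test_data:create_database"
end

# `enhance` appends behavior to an existing task without redefining it.
Rake::Task["db:create"].enhance do
  Rake::Task["test_data:create_database"].invoke
end

Rake::Task["db:create"].invoke
puts RUN_ORDER.inspect
```

Invoking `db:create` runs its original action first and then the appended block, so both stand-in tasks run from a single command.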

### test_data:drop_database

This task will drop the `test_data` environment's database if it exists. It also
enhances Rails' `db:drop` task so that `test_data` is dropped along with
`development` and `test` whenever `rake db:drop` is run.

## API Reference

### TestData.uses_test_data

This is the method designed to be used by your tests to load your test data
into your `test` database so that your tests can rely on it. Typically, you'll
want to call it at the beginning of each test that relies on the test data
managed by this gem—most often, in a before-each hook.

For the sake of speed and integrity, `TestData.uses_test_data` is designed to
take advantage of nested transactions ([Postgres
savepoints](https://www.postgresql.org/docs/current/sql-savepoint.html)). By
default, data is loaded in a transaction and intended to be rolled back to the
point _immediately after_ the data was imported between tests. This way, your
test suite only pays the cost of importing the SQL file once, but each of your
tests can enjoy a clean slate that's free of data pollution from other tests.
(This is similar to, but separate from, Rails fixtures'
[use_transactional_tests](https://edgeguides.rubyonrails.org/testing.html#testing-parallel-transactions)
option.)

_See configuration option:
[config.after_test_data_load](#configafter_test_data_load)_

### TestData.uses_clean_slate

If a test does not rely on your `test_data` data, you can instead ensure that it
runs against empty tables by calling `TestData.uses_clean_slate`. Like
`TestData.uses_test_data`, this would normally be called at the beginning of
each such test in a before-each hook.

This method works by first ensuring that your test data is loaded (and the
corresponding savepoint created), then truncating all affected tables and
creating another savepoint. It's a little counter-intuitive that you'd first
litter your database with data only to wipe it clean again, but it's much faster
to repeatedly truncate tables than to repeatedly import large SQL files.

_See configuration options:
[config.after_test_data_truncate](#configafter_test_data_truncate),
[config.truncate_these_test_data_tables](#configtruncate_these_test_data_tables)_

### TestData.uses_rails_fixtures

As described in this README's [fixture interop
guide](#using-test_data-with-rails-fixtures), `TestData.uses_rails_fixtures`
will load your app's [Rails
fixtures](https://guides.rubyonrails.org/testing.html#the-low-down-on-fixtures)
by intercepting Rails' built-in fixture-loading code. As with the other "uses"
methods, you'll likely want to call it in a before-each hook before any test
that needs access to your Rails fixtures.

There are two additional things to keep in mind if using this method:

1. Using this feature requires that you've first invoked
   [TestData.prevent_rails_fixtures_from_loading_automatically!](#testdataprevent_rails_fixtures_from_loading_automatically)
   to override Rails' default behavior before any of your tests have loaded or
   started running

2. Because the method depends on Rails' fixture caching mechanism, it must be
   passed an instance of the running test class (e.g.
   `TestData.uses_rails_fixtures(self)`)

Under the hood, this method effectively ensures a clean slate the same way
`TestData.uses_clean_slate` does, except that after creating the truncation
savepoint, it will then load your fixtures and finally create—wait for it—yet
another savepoint that subsequent calls to `uses_rails_fixtures` can roll back
to.

_See configuration option:
[config.after_rails_fixture_load](#configafter_rails_fixture_load)_

#### TestData.prevent_rails_fixtures_from_loading_automatically!

Call this method before any tests have been loaded or executed by your test
runner if you're planning to use
[TestData.uses_rails_fixtures](#testdatauses_rails_fixtures) to load Rails
fixtures into any of your tests. This method will disable the default behavior
of loading your Rails fixtures into the test database as soon as the first test
case with fixtures enabled is executed. (Inspect the [source for the
patch](/lib/test_data/active_record_ext.rb) to make sure you're comfortable with
what it's doing.)

### TestData.config

The generated `config/initializers/test_data.rb` initializer will include a call
to `TestData.config`, which takes a block that yields a mutable configuration
object (similar to `Rails.application.config`). If anything is unclear after
reading the documentation, feel free to review the
[initializer](lib/generators/test_data/initializer_generator.rb) and the [Config
class](/lib/test_data/config.rb) themselves.

#### Lifecycle hooks

Want to shift forward several timestamp fields after your `test_data` SQL dumps
are loaded into your test database? Need to refresh a materialized view after
your Rails fixtures are loaded? You _could_ do these things after calling
`TestData.uses_test_data` and `TestData.uses_rails_fixtures`, respectively, but
you'd take the corresponding performance hit in each and every test.

Instead, you can pass a callable or a block and `test_data` will execute it just
_after_ performing the associated data operation but just _before_ creating a
transaction savepoint. That way, whenever the gem rolls back between tests, your
hook won't need to be run again.
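
The ordering described above (data operation, then hook, then savepoint) can be
sketched with a toy in-memory model. `ToyLoader` and its methods are
hypothetical and not part of the gem; the sketch only shows that the hook's
effect lives inside the savepoint, so it survives rollbacks and the hook itself
runs once:

```ruby
# Toy model only: ToyLoader is hypothetical and not part of the test_data gem.
class ToyLoader
  attr_reader :rows, :hook_runs

  def initialize(after_load:)
    @after_load = after_load
    @rows = nil
    @savepoint = nil
    @hook_runs = 0
  end

  def uses_test_data
    if @savepoint
      # Later calls: restore the snapshot; the hook is not run again.
      @rows = @savepoint.map(&:dup)
    else
      @rows = [{name: "boop", booped_at: 100}] # 1. the data operation
      @after_load.call(@rows)                   # 2. the hook
      @hook_runs += 1
      @savepoint = @rows.map(&:dup)             # 3. the savepoint
    end
  end
end

shift_time = ->(rows) { rows.each { |row| row[:booped_at] += 50 } }
loader = ToyLoader.new(after_load: shift_time)

3.times do
  loader.uses_test_data
  loader.rows.first[:booped_at] = 0 # a test mutates the data
end

loader.uses_test_data
puts loader.rows.first[:booped_at] # 150: the hook's effect was kept
puts loader.hook_runs              # 1: the hook ran only once
```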

##### config.after_test_data_load

This hook is run immediately after `TestData.uses_test_data` has loaded your
SQL dumps into the `test` database, but before creating a savepoint. Takes a
block or anything that responds to `call`.


```ruby
TestData.config do |config|
  # Example: roll time forward
  config.after_test_data_load do
    Boop.connection.exec_update(<<~SQL, nil, [[nil, Time.zone.now - System.epoch]])
      update boops set booped_at = booped_at + $1
    SQL
  end
end
```

##### config.after_test_data_truncate

This hook is run immediately after `TestData.uses_clean_slate` has truncated
your test data, but before creating a savepoint. Takes a block or anything that
responds to `call`.

```ruby
TestData.config do |config|
  # Example: pass a callable instead of a block
  config.after_test_data_truncate(SomethingThatRespondsToCall.new)
end
```
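
`SomethingThatRespondsToCall` above is a placeholder name. As a rough sketch of
what could stand in for it, each of the plain Ruby objects below responds to
`call`. The `ResetsSequences` class and its behavior are hypothetical, and the
sketch assumes hooks are invoked without arguments:

```ruby
# Hypothetical callable: any object with a public #call method qualifies.
class ResetsSequences
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def call
    @calls += 1 # a real hook would do its database work here
  end
end

# Lambdas, procs, and Method objects respond to #call as well.
as_lambda = -> { :ran_lambda }
as_method = ResetsSequences.new.method(:call)

candidates = [ResetsSequences.new, as_lambda, as_method]
candidates.each do |candidate|
  raise ArgumentError, "not callable" unless candidate.respond_to?(:call)
  candidate.call
end
```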

##### config.after_rails_fixture_load

This hook is run immediately after `TestData.uses_rails_fixtures` has loaded
your Rails fixtures into the `test` database, but before creating a savepoint.
Takes a block or anything that responds to `call`.

```ruby
TestData.config do |config|
  # Example: refresh Postgres assets like materialized views
  config.after_rails_fixture_load do
    RefreshesMaterializedViews.new.call
  end
end
```

#### test_data:dump options

The gem provides several options governing the behavior of the
[test_data:dump](#test_datadump) Rake task. You probably won't need to set these
unless you run into a problem with the defaults.

##### config.non_test_data_tables

Your application may have some tables that are necessary for the operation of
the application, but irrelevant or incompatible with your tests. This data
is still dumped for the sake of being able to restore the database with [rake
test_data:load](#test_dataload), but will not be loaded when your tests are
running. Defaults to `[]` (but will always include `ar_internal_metadata` and
`schema_migrations`).

```ruby
TestData.config do |config|
  config.non_test_data_tables = []
end
```

##### config.dont_dump_these_tables

Some tables populated by your application may be neither necessary to its
proper functioning nor useful to your tests (e.g. audit logs), so you can save
time and storage by preventing those tables from being dumped entirely. Defaults
to `[]`.

```ruby
TestData.config do |config|
  config.dont_dump_these_tables = []
end
```

##### config.schema_dump_path

The path to which the schema DDL of your `test_data` database will be written.
This is only used by [rake test_data:load](#test_dataload) when initializing the
`test_data` database. Defaults to `"test/support/test_data/schema.sql"`.

```ruby
TestData.config do |config|
  config.schema_dump_path = "test/support/test_data/schema.sql"
end
```

##### config.data_dump_path

The path to which the SQL dump of your test data will be written. This is the dump
that will be executed by `TestData.uses_test_data` in your tests. Defaults to
`"test/support/test_data/data.sql"`.

```ruby
TestData.config do |config|
  config.data_dump_path = "test/support/test_data/data.sql"
end
```

##### config.non_test_data_dump_path

The path to which the [non_test_data_tables](#confignon_test_data_tables) in
your `test_data` database will be written. This is only used by [rake
test_data:load](#test_dataload) when initializing the `test_data` database.
Defaults to `"test/support/test_data/non_test_data.sql"`.

```ruby
TestData.config do |config|
  config.non_test_data_dump_path = "test/support/test_data/non_test_data.sql"
end
```

#### Other configuration options

##### config.truncate_these_test_data_tables

By default, when [TestData.uses_clean_slate](#testdatauses_clean_slate) is
called, it will truncate any tables for which an `INSERT` operation was
detected in your test data SQL dump. This may not be suitable for every case,
however, so this option allows you to specify which tables are truncated.
Defaults to `nil`.

```ruby
TestData.config do |config|
  config.truncate_these_test_data_tables = []
end
```
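
To make the default behavior concrete, here is a minimal sketch of how table
names could be pulled out of a data-only dump by scanning for `INSERT INTO`
statements. It is an illustration, not the gem's actual implementation: the
`tables_with_inserts` helper and the sample SQL are made up, and the regular
expression assumes one `INSERT INTO schema.table ...` statement per line.

```ruby
# Hypothetical helper: returns the distinct table names that receive at
# least one INSERT in the given SQL text, in order of first appearance.
def tables_with_inserts(sql)
  sql.scan(/^INSERT INTO (\S+)/).flatten.uniq
end

sample_dump = <<~SQL
  INSERT INTO public.users VALUES (1, 'alice');
  INSERT INTO public.users VALUES (2, 'bob');
  INSERT INTO public.orders VALUES (1, 1, false);
SQL

p tables_with_inserts(sample_dump) # ["public.users", "public.orders"]
```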

##### config.log_level

The gem outputs its messages to standard output and error by assigning a log
level to each message. Valid values are `:debug`, `:info`, `:warn`, `:error`,
`:quiet`. Defaults to `:info`.

```ruby
TestData.config do |config|
  config.log_level = :info
end
```

### TestData.insert_test_data_dump

If you just want to insert the test data in your application's SQL dumps without
any of the transaction management or test runner assumptions inherent in
[TestData.uses_test_data](#testdatauses_test_data), then you can call
`TestData.insert_test_data_dump` to load and execute the dump.

This might be necessary in a few different situations:

* Running tests in environments that can't be isolated to a single database
  transaction (e.g. orchestrating tests across multiple databases, processes,
  etc.)
* You might want to use your test data to seed pre-production environments with
  enough data for exploratory testing (as you might do in a `postdeploy` script with
  your [Heroku Review
  Apps](https://devcenter.heroku.com/articles/github-integration-review-apps))
* Your tests require complex heterogeneous sources of data that aren't a good
  fit for the assumptions and constraints of this library's default methods for
  preparing test data

In any case, since `TestData.insert_test_data_dump` is not wrapped in a
transaction, when used for automated tests, data cleanup becomes your
responsibility.

## Assumptions

The `test_data` gem is still brand new and doesn't cover every use case just
yet. Here are some existing assumptions and limitations:

* You're using Postgres

* You're using Rails 6 or higher

* Your app does not require Rails' [multi-database
  support](https://guides.rubyonrails.org/active_record_multiple_databases.html)
  in order to be tested

* Your app has the binstubs `bin/rake` and `bin/rails` that Rails generates and
  they work (protip: you can regenerate them with `rails app:update:bin`)

* Your `database.yml` defines a `&default` alias from which to extend the
  `test_data` database configuration (if your YAML file lacks one, you can
  always specify the `test_data` database configuration manually)

## Fears, Uncertainties, and Doubts

### But we use and like `factory_bot` and so I am inclined to dislike everything about this gem!

If you use `factory_bot` and all of these are true:

* Your integration tests are super fast and are not getting significantly slower
  over time

* Minor changes to existing factories rarely result in test failures that
  require unrelated tests to be read & updated to get them passing again

* The number of associated records generated between your most-used factories
  is representative of production data, as opposed to generating a sprawling
  hierarchy of models, as if your test just ordered "one of everything" off the
  menu

* Your default factories generate models that resemble real records created by
  your production application, as opposed to representing the
  sum-of-all-edge-cases with every boolean flag enabled and optional attribute
  set

* You've avoided mitigating the above problems with confusingly-named and
  confidence-eroding nested factories with names like `:user`, `:basic_user`,
  `:lite_user`, and `:plain_user_no_associations_allowed`

If all of these things are true, then congratulations! You are probably using
`factory_bot` to great effect! Unfortunately, in our experience, this outcome
is exceedingly rare, especially for large and long-lived applications.

However, if any of the above statements isn't true for you, just know that
these are the sorts of failure modes the `test_data` gem was designed to
avoid—and we hope you'll consider trying it with an open mind. At the same time,
we acknowledge that large test suites can't be rewritten and migrated to a
different source of test data overnight—nor should they be! See our notes on
[migrating to `test_data`
incrementally](#factory--fixture-interoperability-guide).

### How will I handle merge conflicts in these SQL files if I have lots of people working on lots of feature branches all adding to the `test_data` database dumps?

In a word: carefully!

First, in terms of expectations-setting, you should expect your test data SQL
dumps to churn at roughly the same rate as your schema: lots of changes up
front, but tapering off as the application stabilizes.

If your schema isn't changing frequently and you're not running data migrations
against production very often, it might make the most sense to let this concern
present itself as a real problem before attempting to solve it, as you're likely
to find that other best-practices around collaboration and deployment (frequent
merges, continuous integration, coordinating breaking changes) will also manage
this risk. The reason that the dumps are stored as plain SQL (aside from the
fact that git's text compression is very good) is to make merge conflicts with
other branches feasible, if not entirely painless.

However, if your app is in the very initial stages of development or you're
otherwise making breaking changes to your schema and data very frequently, our
best advice is to hold off a bit on writing _any_ integration tests that depend
on shared sources of test data (regardless of tool), as they'll be more likely
to frustrate your ability to rapidly iterate than detect bugs. Once you have
a reasonably stable feature working end-to-end, that's a good moment to start
adding integration tests—and perhaps pulling in a gem like this one to help you.

### Why can't I save multiple database dumps to cover different scenarios?

For the same reason you (probably) don't have multiple production databases: the
fact that Rails apps are monolithic and consolidated is a big reason why they're
so productive and comprehensible. This gem is not
[VCR](https://github.com/vcr/vcr) for databases. If you were to design separate
test data dumps for each feature, stakeholder, or concern, you'd also have more
moving parts to maintain, more complexity to communicate, and more pieces that
could someday fall into disrepair.

By having a single `test_data` database that grows up with your application just
like `production` does—with both having their schemas and data migrated
incrementally over time—your integration tests that depend on `test_data` will
have an early opportunity to catch bugs that otherwise wouldn't be found until
they were deployed into a long-lived staging or (gasp!) production environment.

### Are you sure I should commit these SQL dumps? They're way too big!

If the dump files generated by `test_data:dump` seem massive, consider the
cause:

1. If you inadvertently created more data than necessary, you might consider
   resetting (or rolling back) your changes and making another attempt at
   generating a more minimal set of test data

2. If some records persisted by your application aren't very relevant to your
   tests, you might consider either of these options:

    * If certain tables are necessary for running the app but aren't needed by
      your tests, you can add them to the `config.non_test_data_tables`
      configuration array. They'll still be committed to git, but won't be loaded
      by your tests

    * If certain tables are not needed by your application or by your tests
      (e.g. audit logs), add them to the `config.dont_dump_these_tables` array,
      and they won't be persisted by `rake test_data:dump`

3. If the dumps are _necessarily_ really big (some apps are complex!), consider
   looking into [git-lfs](https://git-lfs.github.com) for tracking them without
   impacting the size and performance of the git slug. (See [GitHub's
   documentation](https://docs.github.com/en/github/managing-large-files/working-with-large-files)
   on what their service supports)

_[Beyond these options, we'd also be interested in a solution that filtered data
in a more granular way than ignoring entire tables. If you have a proposal you'd
be interested in implementing, [suggest it in an issue](/issues/new)!]_

### Tests shouldn't use shared test data, they should instantiate the objects they need!

Agreed! Nothing is simpler than calling `new` to create an object.

If it's possible to write a test that looks like this, do it. Don't use shared
test data loaded from this gem or any other:

```ruby
def test_exclude_cancelled_orders
  good_order = Order.new
  bad_order = Order.new(cancelled: true)
  user = User.create!(orders: [good_order, bad_order])

  result = user.active_orders

  assert_includes result, good_order
  refute_includes result, bad_order
end
```

This test is simple, self-contained, clearly demarcates the
[arrange-act-assert](https://github.com/testdouble/contributing-tests/wiki/Arrange-Act-Assert)
phases, and (most importantly) will only fail if the functionality stops
working. Maximizing the number of tests that can be written expressively and
succinctly without the aid of shared test data is a laudable goal that more
teams should embrace.

However, what if the code you're writing doesn't need 3 records in the database,
but 30? Writing that much test setup would be painstaking, despite being
fully-encapsulated. Long test setup is harder for others to read and understand.
And because that setup depends on more of your system's code, it will have more
reasons to break as your codebase changes. At that point, you have two options:

1. Critically validate your design: why is it so hard to set up? Does it
   _really_ require so much persisted data to exercise this behavior? Would a
   [plain old Ruby
   object](https://steveklabnik.com/writing/the-secret-to-rails-oo-design) that
   defined a pure function have been feasible? Could a model instance or even a
   `Struct` be passed to the
   [subject](https://github.com/testdouble/contributing-tests/wiki/Subject)
   instead of loading everything from the database? When automated testing is
   saved for the very end of a feature's development, it can feel too costly to
   reexamine design decisions like this, but it can be valuable to consider all
   the same. *Easy to test code is easy to use code*

2. If the complex setup is a necessary reality of the situation that your app
   needs to handle (and it often will be!), then having _some_ kind of shared
   source of test data to use as a starting point can be hugely beneficial.
   That's why `factory_bot` is so popular, why this gem exists, etc.

As a result, there is no one-size-fits-all approach. Straightforward behavior
that can be invoked with a clear, concise test has no reason to be coupled to a
shared source of test data. Meanwhile, tests of more complex behaviors that
require lots of carefully-arranged data might be unmaintainable without a shared
source of test data to lean on. So both kinds of test clearly have their place.

But this is a pretty nuanced discussion that can be hard to keep in mind when
under deadline pressure or on a large team where building consensus around norms
is challenging. As a result, leaving the decision of which type of test to write
to spur-of-the-moment judgment is likely to result in inconsistent test design.
Instead, you might consider separating these two categories into separate test
types or suites, with simple heuristics to determine which types of code demand
which type of test.

For example, it would be completely reasonable to load this gem's test data for
integration tests, but not for basic tests of models, like so:

```ruby
class ActionDispatch::IntegrationTest
  setup do
    TestData.uses_test_data
  end
end

class ActiveSupport::TestCase
  setup do
    TestData.uses_clean_slate
  end
end
```

In short, this skepticism is generally healthy, and encapsulated tests that
forego reliance on shared sources of test data should be maximized. For
everything else, there's `test_data`.

### I'm worried my tests aren't as fast as they should be

The `test_data` gem was written to enable tests that are not only more
comprehensible and maintainable over the long-term, but also _much faster_ to
run. That said—and especially if you're adding `test_data` to an existing test
suite—care should be taken to audit everything the suite does between tests in
order to optimize its overall runtime.

#### Randomized test order leading to data churn

Generally speaking, randomizing the order in which tests run is an unmitigated
win: randomizing helps you catch any unintended dependency between two tests
early, when it's still cheap & easy to fix. However, if your tests use different
sources of test data (e.g. some call `TestData.uses_test_data` and some call
`TestData.uses_clean_slate`), it's very likely that randomizing your tests will
result in a significantly slower overall test suite. Instead, if you group tests
that use the same type of test data together (e.g. by separating them into
separate suites), you might find profound speed gains.

To illustrate why, suppose you have 5 tests that call `TestData.uses_test_data`
and 5 that call `TestData.uses_rails_fixtures`. If a test that calls
`TestData.uses_test_data` is followed by another that calls `uses_test_data`,
the only operation needed by the second call will be a rollback to the savepoint
taken after the test data was loaded. If, however, a `uses_test_data` test is
followed by a `uses_rails_fixtures` test, then a lot more work is required:
first a rollback, then the truncation of the test data, then a load of the
fixtures followed by creation of a new savepoint, which would in turn be undone
again if the _next_ test happened to call `uses_test_data`. Switching between
tests that use different sources of test data can cause significant unnecessary
thrashing.

To illustrate the above, if all of these tests ran in random order (the
default), you might see:

```
$ bin/rails test test/example_test.rb
Run options: --seed 63999

# Running:

   test_data -- loading test_data SQL dump
.  fixtures  -- truncating tables, loading Rails fixtures
.  fixtures  -- rolling back to Rails fixtures
.  test_data -- rolling back to clean test_data
.  fixtures  -- truncating tables, loading Rails fixtures
.  test_data -- rolling back to clean test_data
.  fixtures  -- truncating tables, loading Rails fixtures
.  test_data -- rolling back to clean test_data
.  fixtures  -- truncating tables, loading Rails fixtures
.  test_data -- rolling back to clean test_data
.

Finished in 2.449957s, 4.0817 runs/s, 4.0817 assertions/s.
10 runs, 10 assertions, 0 failures, 0 errors, 0 skips
```

So, what can you do to speed this up? The most effective strategy for avoiding
this churn is to group the tests that use each source of test data into
sub-suites that are run serially, one after the other.

* If you're using Rails' default Minitest, we wrote a gem called
  [minitest-suite](https://github.com/testdouble/minitest-suite) to accomplish
  exactly this. Just declare something like `suite :test_data` or `suite
  :fixtures` at the top of each test class
* If you're using RSpec, the
  [tag](https://relishapp.com/rspec/rspec-core/v/3-10/docs/command-line/tag-option)
  feature can help you organize your tests by type, but you'll likely have to
  run a separate CLI invocation for each to prevent the tests from being
  interleaved

Here's what the same example would do at run-time after adding
[minitest-suite](https://github.com/testdouble/minitest-suite):

```
$ bin/rails test test/example_test.rb
Run options: --seed 50105

# Running:

   test_data -- loading test_data SQL dump
.  test_data -- rolling back to clean test_data
.  test_data -- rolling back to clean test_data
.  test_data -- rolling back to clean test_data
.  test_data -- rolling back to clean test_data
.  fixtures -- truncating tables, loading Rails fixtures
.  fixtures -- rolling back to clean fixtures
.  fixtures -- rolling back to clean fixtures
.  fixtures -- rolling back to clean fixtures
.  fixtures -- rolling back to clean fixtures
.

Finished in 2.377050s, 4.2069 runs/s, 4.2069 assertions/s.
10 runs, 10 assertions, 0 failures, 0 errors, 0 skips
```

By grouping the execution in this way, the most expensive operations will
usually only be run once: at the beginning of the first test in each suite.
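
The difference between the two runs can be approximated with a toy cost model
that counts only the expensive steps (importing the SQL dump, and truncating
plus loading fixtures) and treats rollbacks as free. This is a simplification
based on the log output above, not the gem's transaction manager, and the
`expensive_operations` helper is hypothetical:

```ruby
# Toy cost model: counts expensive operations for a given test order.
def expensive_operations(test_order)
  test_data_loaded = false
  fixtures_loaded = false
  count = 0

  test_order.each do |source|
    case source
    when :test_data
      count += 1 unless test_data_loaded # import the SQL dump once
      test_data_loaded = true
      fixtures_loaded = false            # rolling back discards the fixtures
    when :fixtures
      unless test_data_loaded            # fixtures sit on top of loaded test data
        count += 1
        test_data_loaded = true
      end
      count += 1 unless fixtures_loaded  # truncate tables, load fixtures
      fixtures_loaded = true
    end
  end

  count
end

# The order from the first (randomized) log above.
randomized = %i[test_data fixtures fixtures test_data fixtures
                test_data fixtures test_data fixtures test_data]
# The order from the second (grouped) log above.
grouped = [:test_data] * 5 + [:fixtures] * 5

puts expensive_operations(randomized) # 5
puts expensive_operations(grouped)    # 2
```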

#### Expensive data manipulation

If you're doing anything repeatedly that's data-intensive in your test setup
after calling one of the `TestData.uses_*` methods, that operation is being
repeated once per test, which could be very slow. Instead, you might consider
moving that behavior into a [lifecycle hook](#lifecycle-hooks).

Any code passed to a lifecycle hook will only be executed when data is
_actually_ loaded or truncated and its effect will be included in the
transaction savepoint that the `test_data` gem rolls back between tests.
Seriously, appropriately moving data adjustments into these hooks can cut your
test suite's runtime by an order of magnitude.

#### Redundant test setup tasks

One of the most likely sources of unnecessary slowness is redundant test
cleanup. The speed gained from sandwiching every expensive operation between
transaction savepoints can be profound… but can also easily be erased by a
single before-each hook calling
[database_cleaner](https://github.com/DatabaseCleaner/database_cleaner) to
commit a truncation of the database. As a result, it's worth taking a little
time to take stock of everything that's called between tests during setup &
teardown to ensure multiple tools aren't attempting to clean up the state of the
database and potentially interfering with one another.

## Code of Conduct

This project follows Test Double's [code of
conduct](https://testdouble.com/code-of-conduct) for all community interactions,
including (but not limited to) one-on-one communications, public posts/comments,
code reviews, pull requests, and GitHub issues. If violations occur, Test Double
will take any action they deem appropriate for the infraction, up to and
including blocking a user from the organization's repositories.