PERFORMANCE.md in pupa-0.1.7 vs PERFORMANCE.md in pupa-0.1.8

- old
+ new

@@ -1,22 +1,20 @@
-# Pupa.rb: A Data Scraping Framework
+# Pupa.rb: Performance
 
-## Performance
-
 Pupa.rb offers several ways to significantly improve performance.
 
 In one example, reducing disk I/O and skipping validation, as described below, cut the time to scrape 10,000 documents out of 100 cached HTTP responses from 100 seconds to 5 seconds. Like fast tests, fast scrapers make development smoother.
 
 The `import` action's performance is currently limited by the database when a dependency graph is used to determine the evaluation order. If a dependency graph cannot be used because you don't know a related object's ID, [several optimizations](https://github.com/opennorth/pupa-ruby/issues/12) can be implemented to improve performance.
 
-### Reducing HTTP requests
+## Reducing HTTP requests
 
 HTTP requests consume the most time. To avoid repeated HTTP requests while developing a scraper, cache all HTTP responses. By default, Pupa.rb uses a `web_cache` directory in the same directory as your script. You can change the directory by setting the `--cache_dir` switch on the command line, for example:
 
     ruby cat.rb --cache_dir /tmp/my_cache_dir
 
-### Parallelizing HTTP requests
+## Parallelizing HTTP requests
 
 To enable parallel requests, use the `typhoeus` gem. With Typhoeus 0.5 or later, both Faraday and Typhoeus define a Faraday adapter; you must use the one defined by Typhoeus, like so:
 
 ```ruby
 require 'pupa'
@@ -56,15 +54,15 @@
     # If error...
   end
 end
 ```
 
-### Reducing disk I/O
+## Reducing disk I/O
 
 After HTTP requests, disk I/O is the slowest operation. Two types of files are written to disk: HTTP responses are written to the cache directory, and JSON documents are written to the output directory. Writing to memory is much faster than writing to disk.
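 
 To get a rough feel for the difference on your own machine, the following is a minimal sketch using only Ruby's standard library. It is not part of Pupa.rb: the sample documents are made up, and the temporary directory may itself live on a RAM-backed file system on some systems, in which case the two timings will be close.
 
 ```ruby
 require 'benchmark'
 require 'json'
 require 'stringio'
 require 'tmpdir'
 
 # Hypothetical sample data: 1,000 small JSON documents.
 documents = Array.new(1_000) do |i|
   JSON.generate('id' => i, 'name' => "Document #{i}")
 end
 
 timings = {}
 
 # Write each document to its own file in a temporary directory.
 Dir.mktmpdir do |dir|
   timings[:disk] = Benchmark.realtime do
     documents.each_with_index do |document, i|
       File.write(File.join(dir, "#{i}.json"), document)
     end
   end
 end
 
 # Write each document to an in-memory buffer instead.
 timings[:memory] = Benchmark.realtime do
   documents.each do |document|
     StringIO.new.write(document)
   end
 end
 
 timings.each do |target, seconds|
   puts format('%-6s %.4fs', target, seconds)
 end
 ```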
 
-#### RAM file systems
+### RAM file systems
 
 A simple solution is to create a file system in RAM, such as `tmpfs` on Linux, and to use it as your `output_dir` and `cache_dir`. On OS X, you must create a RAM disk. To create a 128MB RAM disk, for example, run:
 
     ramdisk=$(hdiutil attach -nomount ram://$((128 * 2048)) | tr -d ' \t')
     diskutil erasevolume HFS+ 'ramdisk' $ramdisk
@@ -78,19 +76,19 @@
     diskutil unmount $ramdisk
     hdiutil detach $ramdisk
 
 Using a RAM disk will significantly improve performance; however, the data will be lost between reboots unless you move the data to a hard disk. Using Memcached (for caching) and Redis (for storage) is moderately faster than using a RAM disk, and Redis will not lose your output data between reboots.
 
-#### Memcached
+### Memcached
 
 You may cache HTTP responses in [Memcached](http://memcached.org/). First, require the `dalli` gem. Then:
 
     ruby cat.rb --cache_dir memcached://localhost:11211
 
 The data in Memcached will be lost between reboots.
 
-#### Redis
+### Redis
 
 You may dump JSON documents in [Redis](http://redis.io/). First, require the `redis-store` gem. Then:
 
     ruby cat.rb --output_dir redis://localhost:6379/0
 
@@ -100,20 +98,20 @@
 
 Requiring the `hiredis` gem will slightly improve performance.
 
 Note that Pupa.rb flushes the Redis database before scraping. If you use Redis, **DO NOT** share a Redis database between Pupa.rb and other applications. You can select a database other than the default `0` for use with Pupa.rb by passing an argument like `redis://localhost:6379/15`, where `15` is the database number.
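 
 To make the URL structure concrete, here is a small sketch that reads the database number from the path of a Redis URL using Ruby's standard `uri` library. The `redis_database_number` helper is hypothetical and only illustrates the convention; it is not how Pupa.rb itself is known to parse the argument.
 
 ```ruby
 require 'uri'
 
 # Hypothetical helper, for illustration only: return the database number
 # encoded in the path of a Redis URL, or 0 when no path is given.
 def redis_database_number(url)
   path = URI.parse(url).path.to_s # for example, "/15"
   number = path.sub(%r{\A/}, '')
   number.empty? ? 0 : Integer(number, 10)
 end
 
 puts redis_database_number('redis://localhost:6379/15')
 puts redis_database_number('redis://localhost:6379')
 ```
 
 Checking the number this way before a run is one cheap guard against pointing a scraper at a database that another application uses.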
 
-### Skipping validation
+## Skipping validation
 
 The `json-schema` gem is slow compared to, for example, [JSV](https://github.com/garycourt/JSV). Setting the `--no-validate` switch and running JSON Schema validations separately can further reduce a scraper's running time.
 
 The [pupa-validate](https://npmjs.org/package/pupa-validate) npm package can be used to validate JSON documents using the faster JSV. In one example, using JSV instead of the `json-schema` gem halved the time to validate 10,000 documents.
 
-### Ruby version
+## Ruby version
 
 Pupa.rb requires Ruby 2.x. If you have already made all the above optimizations, you may notice a significant improvement by using Ruby 2.1, which has better garbage collection than Ruby 2.0.
 
-### Profiling
+## Profiling
 
 You can profile your code using [perftools.rb](https://github.com/tmm1/perftools.rb). First, install the gem:
 
     gem install perftools.rb