README.md in pupa-0.0.13 vs README.md in pupa-0.1.0
- old
+ new
@@ -5,10 +5,12 @@
[![Coverage Status](https://coveralls.io/repos/opennorth/pupa-ruby/badge.png?branch=master)](https://coveralls.io/r/opennorth/pupa-ruby)
[![Code Climate](https://codeclimate.com/github/opennorth/pupa-ruby.png)](https://codeclimate.com/github/opennorth/pupa-ruby)
Pupa.rb is a Ruby 2.x fork of Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.
+    gem install pupa
+
## What it tries to solve
Pupa.rb's goal is to make scraping less painful by solving common problems:
* If you are updating a database by scraping a website, you can either delete and recreate records, or you can merge the scraped records with the saved records. Pupa.rb offers a simple way to merge records, by using an object's stable properties for identification.
@@ -204,9 +206,24 @@
You may want to set the `CPUPROFILE_REALTIME=1` flag; however, it seems to interfere with HTTP requests, for reasons that are unclear.
[perftools.rb](https://github.com/tmm1/perftools.rb) has several output formats. If your code is straightforward, you can draw a graph (changing `/tmp/PROFILE_NAME` and `/tmp/PROFILE_NAME.pdf` as appropriate):
    pprof.rb --pdf /tmp/PROFILE_NAME > /tmp/PROFILE_NAME.pdf
+
+## Integration with ODMs
+
+### Mongoid
+
+`Pupa::Model` is incompatible with `Mongoid::Document`. Don't do this:
+
+```ruby
+class Cat
+  include Pupa::Model
+  include Mongoid::Document
+end
+```
+
+Instead, have a scraping model that includes `Pupa::Model` and an app model that includes `Mongoid::Document`.
## Testing
**DO NOT** run this gem's specs if you are using Redis database number 15 on `localhost`!