=data_miner
Mine remote data into your ActiveRecord models.
==Quick start
Put this in config/environment.rb:
  config.gem 'data_miner'
You need to define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:
  class Country < ActiveRecord::Base
    data_miner do |step|
      # import country names and country codes
      step.import :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do |attr|
        attr.key :iso_3166, :field_name => 'country code'
        attr.store :iso_3166, :field_name => 'country code'
        attr.store :name, :field_name => 'country'
      end
    end
  end
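As a rough illustration of what the block above describes, the following plain-Ruby sketch maps named columns onto attributes and uses the key column to decide whether to update an existing record or create a new one. It does not use data_miner or ActiveRecord. CountryRecord, FIELD_MAP, KEY and import_rows are hypothetical names, the sample rows are made up, and the find-or-create behavior is an assumption about what attr.key implies, not a description of the gem's internals.

```ruby
# Hypothetical stand-in for an ActiveRecord model (not part of data_miner).
CountryRecord = Struct.new(:iso_3166, :name)

# Attribute => source column, mirroring the :field_name options above.
FIELD_MAP = { :iso_3166 => 'country code', :name => 'country' }
KEY = :iso_3166

# Upsert each row into `table`, a hash indexed by the key attribute.
# Assumption: a row whose key already exists updates that record.
def import_rows(rows, table = {})
  rows.each do |row|
    key_value = row[FIELD_MAP[KEY]]
    record = (table[key_value] ||= CountryRecord.new)
    FIELD_MAP.each { |attribute, column| record[attribute] = row[column] }
  end
  table
end

# Made-up rows keyed by header name, as a CSV parser with headers would yield.
rows = [
  { 'country' => 'France', 'country code' => 'FR' },
  { 'country' => 'Papua New Guinea', 'country code' => 'PG' },
  { 'country' => 'French Republic', 'country code' => 'FR' } # same key as the first row
]
countries = import_rows(rows)
puts countries.size
```

Under this assumption, running the import a second time against the same table would update records in place rather than duplicate them.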
...and in app/models/airport.rb:
  class Airport < ActiveRecord::Base
    belongs_to :country
    data_miner do |step|
      # import airport iata_code, name, etc.
      step.import(:url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false) do |attr|
        attr.key :iata_code, :field_number => 3
        attr.store :name, :field_number => 0
        attr.store :city, :field_number => 1
        attr.store :country, :field_number => 2, :foreign_key => :name # will use Country.find_by_name(X)
        attr.store :iata_code, :field_number => 3
        attr.store :latitude, :field_number => 5
        attr.store :longitude, :field_number => 6
      end
    end
  end
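The two ideas specific to this block are positional columns (:field_number on a file without headers) and resolving an association by a column other than the id (:foreign_key => :name). The sketch below shows both in plain Ruby, without data_miner or ActiveRecord. CountryRow, AirportRow, AIRPORT_FIELDS, COUNTRY_FIELD and build_airport are hypothetical names, and the sample row is illustrative rather than copied from the real file.

```ruby
# Hypothetical stand-ins for the ActiveRecord models (not part of data_miner).
CountryRow = Struct.new(:name)
AirportRow = Struct.new(:iata_code, :name, :city, :country)

# Attribute => zero-based column index, mirroring the :field_number options above.
AIRPORT_FIELDS = { :name => 0, :city => 1, :iata_code => 3 }
COUNTRY_FIELD = 2

# Build an airport from one headerless row, resolving the country by name.
# The lookup plays the role of Country.find_by_name in the comment above.
def build_airport(row, countries)
  airport = AirportRow.new
  AIRPORT_FIELDS.each { |attribute, index| airport[attribute] = row[index] }
  airport.country = countries.find { |country| country.name == row[COUNTRY_FIELD] }
  airport
end

countries = [CountryRow.new('Papua New Guinea')]
row = ['Goroka', 'Goroka', 'Papua New Guinea', 'GKA'] # illustrative row
airport = build_airport(row, countries)
puts airport.country.name
```

If no country with a matching name exists yet, the lookup in this sketch returns nil, so the countries have to be present before the airports are built.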
Put this in lib/tasks/data_miner_tasks.rake (unfortunately, I don't know of a way to include gem tasks automatically, so for now you have to do this manually):
  namespace :data_miner do
    task :run => :environment do
      DataMiner.run :resource_names => ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact
    end
  end
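The expression passed as :resource_names turns the RESOURCES environment variable into an array of names. The sketch below wraps the same expression in a hypothetical helper, parse_resources, to show what it yields for a few inputs. It says nothing about what DataMiner.run does with the result.

```ruby
# Hypothetical helper wrapping the expression used in the rake task above.
def parse_resources(raw)
  raw.to_s.split(/\s*,\s*/).flatten.compact
end

parse_resources(nil)                 # => []
parse_resources('Country')           # => ["Country"]
parse_resources('Country , Airport') # => ["Country", "Airport"]
```

Because nil.to_s is an empty string, an unset RESOURCES produces an empty array instead of raising an error. Rake copies NAME=value arguments into ENV, so a value can be supplied as "rake data_miner:run RESOURCES=Country,Airport".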
You need to specify the order in which data is mined. For example, in config/initializers/data_miner_config.rb:
  DataMiner.enqueue do |queue|
    queue << Country # class whose data should be mined 1st
    queue << Airport # class whose data should be mined 2nd
    # etc
  end
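Order matters here because Airport resolves its country by name, so the Country rows have to exist first. The following sketch shows one way an ordered queue could combine with a list of requested names so that the enqueue order wins regardless of the order in which names are requested. MiningQueue and to_run are hypothetical, and this is an assumption for illustration, not how DataMiner.enqueue or DataMiner.run is implemented.

```ruby
# Hypothetical ordered queue of resource names (not part of data_miner).
class MiningQueue
  def initialize
    @names = []
  end

  # Append a name once; return self so calls can be chained.
  def <<(name)
    @names << name unless @names.include?(name)
    self
  end

  # Names to run, always in enqueue order.
  # Assumption: an empty filter means "run everything".
  def to_run(only = [])
    only.empty? ? @names.dup : @names.select { |name| only.include?(name) }
  end
end

queue = MiningQueue.new
queue << 'Country' << 'Airport'
puts queue.to_run(['Airport', 'Country']).inspect
```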
Once you have (1) set up the order of data mining and (2) defined data_miner blocks in your classes, you can run:
  $ rake data_miner:run
==Complete example
  ~ $ rails testapp
  ~ $ cd testapp/
  ~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_id:integer latitude:float longitude:float
  ~/testapp $ ./script/generate model Country iso_3166:string name:string
  ~/testapp $ rake db:migrate
  ~/testapp $ touch lib/tasks/data_miner_tasks.rake
  [...edit per quick start...]
  ~/testapp $ touch config/initializers/data_miner_config.rb
  [...edit per quick start...]
  ~/testapp $ rake data_miner:run
Now you should have the mined data available in the console:
  ~/testapp $ ./script/console
  Loading development environment (Rails 2.3.3)
  >> Airport.first.iata_code
  => "GKA"
  >> Airport.first.country.name
  => "Papua New Guinea"
==Authors
* Seamus Abshere
* Andy Rossmeissl
==Copyright
Copyright (c) 2009 Brighter Planet. See LICENSE for details.