=data_miner
Mine remote data into your ActiveRecord models.
==Quick start
Put this in config/environment.rb:
  config.gem 'data_miner'
You need to define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:
  class Country < ActiveRecord::Base
    set_primary_key :iso_3166
    data_miner do
      import 'The official ISO country list', :url => 'http://www.iso.org/iso/list-en1-semic-3.txt', :skip => 2, :headers => false, :delimiter => ';' do
        key 'iso_3166'
        store 'iso_3166', :field_number => 1
        store 'name', :field_number => 0
      end
      import 'A Princeton dataset with better capitalization for some countries', :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do
        key 'iso_3166'
        store 'iso_3166', :field_name => 'country code'
        store 'name', :field_name => 'country'
      end
    end
  end
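To make the options above concrete, here is a minimal plain-Ruby sketch of what :skip, :delimiter, :field_number and key describe. It is not data_miner's implementation. The helpers parse_rows and upsert are hypothetical, and the sample text is a made-up stand-in for the real ISO file.

```ruby
# Illustrative only: plain Ruby, not data_miner's implementation.
# Made-up sample shaped like a semicolon-delimited file with two
# leading lines that should be skipped.
ISO_SAMPLE = "Country names;Code\n\nAFGHANISTAN;AF\nFRANCE;FR\n"

# Hypothetical helper mimicking :skip => 2, :headers => false, :delimiter => ';'
def parse_rows(text, skip, delimiter)
  text.lines.drop(skip).map { |line| line.chomp.split(delimiter) }
end

# Hypothetical helper mimicking key 'iso_3166': rows that share a key
# update the same record instead of creating a second one.
def upsert(records, key, attributes)
  records[key] = (records[key] || {}).merge(attributes)
end

records = {}
parse_rows(ISO_SAMPLE, 2, ';').each do |row|
  # Field numbers count from zero: 0 is the name, 1 is the code.
  upsert(records, row[1], 'iso_3166' => row[1], 'name' => row[0])
end

# A second source keyed on the same column only overwrites 'name'.
upsert(records, 'AF', 'name' => 'Afghanistan')

puts records['AF']['name'] # prints Afghanistan
puts records['FR']['name'] # prints FRANCE
```

The second upsert stands in for the Princeton import: because both imports use the same key, the later one can improve a column that the earlier one already filled in.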
...and in app/models/airport.rb:
  class Airport < ActiveRecord::Base
    set_primary_key :iata_code
    data_miner do
      import :url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false, :select => lambda { |row| row[4].present? } do
        key 'iata_code'
        store 'name', :field_number => 1
        store 'city', :field_number => 2
        store 'country_name', :field_number => 3
        store 'iata_code', :field_number => 4
        store 'latitude', :field_number => 6
        store 'longitude', :field_number => 7
      end
    end
  end
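The :select lambda above keeps only rows whose fifth field (index 4, the IATA code) is present. A rough plain-Ruby sketch of that filter follows. It assumes string or nil fields and approximates ActiveSupport's present? by hand. The constant HAS_IATA_CODE and the sample rows are made up for illustration.

```ruby
# Illustrative only: a plain-Ruby stand-in for the :select option.
# Approximates ActiveSupport's present? for strings and nil.
HAS_IATA_CODE = lambda { |row| !row[4].to_s.strip.empty? }

# Made-up sample rows; index 4 holds the IATA code.
rows = [
  ['1', 'Goroka', 'Goroka', 'Papua New Guinea', 'GKA'],
  ['2', 'Example Field', 'Example City', 'Example Country', ''],
  ['3', 'Another Field', 'Another City', 'Another Country', nil]
]

kept = rows.select { |row| HAS_IATA_CODE.call(row) }

puts kept.length   # prints 1
puts kept.first[4] # prints GKA
```

Rows without a code would otherwise have no value for the iata_code key, which is presumably why the example filters them out before storing anything.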
Put this in lib/tasks/data_miner_tasks.rake (I don't know of a way to include gem tasks automatically, so for now you have to add this by hand):
  namespace :data_miner do
    task :run => :environment do
      DataMiner.run :resource_names => ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact
    end
  end
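To see what the task passes along, the sketch below wraps the same expression in a hypothetical helper, resource_names, and feeds it a few sample values. It only shows how the RESOURCES string is split. What DataMiner.run does with the resulting list is not covered here.

```ruby
# Same expression as in the rake task, wrapped in a hypothetical helper
# so it can be tried outside of rake.
def resource_names(raw)
  raw.to_s.split(/\s*,\s*/).flatten.compact
end

p resource_names('Airport,Country')   # prints ["Airport", "Country"]
p resource_names('Airport , Country') # prints ["Airport", "Country"]
p resource_names(nil)                 # prints []
```

Whitespace around the commas is dropped, and an unset RESOURCES yields an empty list. Whitespace before the first name or after the last one is kept, so it is safest to avoid it.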
Once you have (1) defined data_miner blocks in your classes and (2) added the rake task above, you can mine the resources you name in RESOURCES:
  $ rake data_miner:run RESOURCES=Airport,Country
==Complete example
  ~ $ rails testapp
  ~ $ cd testapp/
  ~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_name:string latitude:float longitude:float
  [...edit migration to make iata_code the primary key...]
  ~/testapp $ ./script/generate model Country iso_3166:string name:string
  [...edit migration to make iso_3166 the primary key...]
  ~/testapp $ rake db:migrate
  ~/testapp $ touch lib/tasks/data_miner_tasks.rake
  [...edit per quick start...]
  ~/testapp $ rake data_miner:run RESOURCES=Airport,Country
Now you should see something like this:
  ~/testapp $ ./script/console
  Loading development environment (Rails 2.3.3)
  >> Airport.first.iata_code
  => "GKA"
  >> Airport.first.country_name
  => "Papua New Guinea"
==Authors
* Seamus Abshere
* Andy Rossmeissl
==Copyright
Copyright (c) 2010 Brighter Planet. See LICENSE for details.