=data_miner

Mine remote data into your ActiveRecord models.

==Quick start

Put this in <tt>config/environment.rb</tt>:

  config.gem 'data_miner'

You need to define <tt>data_miner</tt> blocks in your ActiveRecord models. For example, in <tt>app/models/country.rb</tt>:

  class Country < ActiveRecord::Base
    set_primary_key :iso_3166

    data_miner do
      import 'The official ISO country list', :url => 'http://www.iso.org/iso/list-en1-semic-3.txt', :skip => 2, :headers => false, :delimiter => ';' do
        key 'iso_3166'
        store 'iso_3166', :field_number => 1
        store 'name', :field_number => 0
      end

      import 'A Princeton dataset with better capitalization for some countries', :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do
        key 'iso_3166'
        store 'iso_3166', :field_name => 'country code'
        store 'name', :field_name => 'country'
      end
    end
  end

...and in <tt>app/models/airport.rb</tt>:

  class Airport < ActiveRecord::Base
    set_primary_key :iata_code

    data_miner do
      # Only keep rows that have an IATA code (field 4)
      import :url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false, :select => lambda { |row| row[4].present? } do
        key 'iata_code'
        store 'name', :field_number => 1
        store 'city', :field_number => 2
        store 'country_name', :field_number => 3
        store 'iata_code', :field_number => 4
        store 'latitude', :field_number => 6
        store 'longitude', :field_number => 7
      end
    end
  end

Put this in <tt>lib/tasks/data_miner_tasks.rake</tt>. (Unfortunately, I don't know a way to include gem tasks automatically, so you have to do this manually for now.)

  namespace :data_miner do
    task :run => :environment do
      DataMiner.run :resource_names => ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact
    end
  end

Once you have (1) set up the order of data mining and (2) defined <tt>data_miner</tt> blocks in your classes, you can run:

  $ rake data_miner:run RESOURCES=Airport,Country

==Complete example

  ~ $ rails testapp
  ~ $ cd testapp/
  ~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_name:string latitude:float longitude:float
  [...edit migration to make iata_code the primary key...]
  ~/testapp $ ./script/generate model Country iso_3166:string name:string
  [...edit migration to make iso_3166 the primary key...]
  ~/testapp $ rake db:migrate
  ~/testapp $ touch lib/tasks/data_miner_tasks.rake
  [...edit per quick start...]
  ~/testapp $ rake data_miner:run RESOURCES=Airport,Country

Now you should have:

  ~/testapp $ ./script/console
  Loading development environment (Rails 2.3.3)
  >> Airport.first.iata_code
  => "GKA"
  >> Airport.first.country_name
  => "Papua New Guinea"

==Authors

* Seamus Abshere
* Andy Rossmeissl

==Copyright

Copyright (c) 2010 Brighter Planet. See LICENSE for details.
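
==Appendix: how RESOURCES is parsed

The rake task in the quick start turns the RESOURCES environment variable into a list of resource names before handing it to DataMiner.run. As a minimal sketch of just that parsing step, the following wraps the same expression in a helper. The name parse_resource_names is hypothetical and is not part of data_miner; it exists only to make the behavior easy to try in isolation.

```ruby
# Hypothetical helper (not part of data_miner): mirrors the expression used in
# the quick start rake task to turn ENV['RESOURCES'] into resource names.
def parse_resource_names(raw)
  # nil.to_s is "", and "".split returns [], so an unset variable yields [].
  # The regexp also swallows whitespace around each comma.
  raw.to_s.split(/\s*,\s*/).flatten.compact
end

puts parse_resource_names('Airport,Country').inspect
puts parse_resource_names('Airport , Country').inspect
puts parse_resource_names(nil).inspect
```

Spaces around the commas are tolerated, and an unset RESOURCES produces an empty array for :resource_names. How DataMiner.run treats an empty list is not covered here.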