README.rdoc in data_miner-0.3.13 vs README.rdoc in data_miner-0.4.0

- old
+ new

@@ -9,35 +9,41 @@
   config.gem 'data_miner'
 
 You need to define <tt>data_miner</tt> blocks in your ActiveRecord models. For example, in <tt>app/models/country.rb</tt>:
 
   class Country < ActiveRecord::Base
-    data_miner do |step|
-      # import country names and country codes
-      step.import :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do |attr|
-        attr.key :iso_3166, :field_name => 'country code'
-        attr.store :iso_3166, :field_name => 'country code'
-        attr.store :name, :field_name => 'country'
+    set_primary_key :iso_3166
+
+    data_miner do
+      import 'The official ISO country list', :url => 'http://www.iso.org/iso/list-en1-semic-3.txt', :skip => 2, :headers => false, :delimiter => ';' do
+        key 'iso_3166'
+        store 'iso_3166', :field_number => 1
+        store 'name', :field_number => 0
       end
+
+      import 'A Princeton dataset with better capitalization for some countries', :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do
+        key 'iso_3166'
+        store 'iso_3166', :field_name => 'country code'
+        store 'name', :field_name => 'country'
+      end
     end
   end
 
 ...and in <tt>app/models/airport.rb</tt>:
 
   class Airport < ActiveRecord::Base
-    belongs_to :country
+    set_primary_key :iata_code
 
-    data_miner do |step|
-      # import airport iata_code, name, etc.
-      step.import(:url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false) do |attr|
-        attr.key :iata_code, :field_number => 3
-        attr.store :name, :field_number => 0
-        attr.store :city, :field_number => 1
-        attr.store :country, :field_number => 2, :foreign_key => :name # will use Country.find_by_name(X)
-        attr.store :iata_code, :field_number => 3
-        attr.store :latitude, :field_number => 5
-        attr.store :longitude, :field_number => 6
+    data_miner do
+      import :url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false, :select => lambda { |row| row[4].present? } do
+        key 'iata_code'
+        store 'name', :field_number => 1
+        store 'city', :field_number => 2
+        store 'country_name', :field_number => 3
+        store 'iata_code', :field_number => 4
+        store 'latitude', :field_number => 6
+        store 'longitude', :field_number => 7
       end
     end
   end
 
 Put this in <tt>lib/tasks/data_miner_tasks.rake</tt>: (unfortunately I don't know a way to automatically include gem tasks, so you have to do this manually for now)
@@ -46,47 +52,39 @@
     task :run => :environment do
       DataMiner.run :resource_names => ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact
     end
   end
 
-You need to specify what order to mine data. For example, in <tt>config/initializers/data_miner_config.rb</tt>:
-
-  DataMiner.enqueue do |queue|
-    queue << Country # class whose data should be mined 1st
-    queue << Airport # class whose data should be mined 2nd
-    # etc
-  end
-
 Once you have (1) set up the order of data mining and (2) defined <tt>data_miner</tt> blocks in your classes, you can:
 
-  $ rake data_miner:run
+  $ rake data_miner:run RESOURCES=Airport,Country
 
 ==Complete example
 
   ~ $ rails testapp
   ~ $ cd testapp/
-  ~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_id:integer latitude:float longitude:float
+  ~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_name:string latitude:float longitude:float
+  [...edit migration to make iata_code the primary key...]
   ~/testapp $ ./script/generate model Country iso_3166:string name:string
+  [...edit migration to make iso_3166 the primary key...]
   ~/testapp $ rake db:migrate
   ~/testapp $ touch lib/tasks/data_miner_tasks.rb
   [...edit per quick start...]
-  ~/testapp $ touch config/initializers/data_miner_config.rake
-  [...edit per quick start...]
-  ~/testapp $ rake data_miner:run
+  ~/testapp $ rake data_miner:run RESOURCES=Airport,Country
 
 Now you should have
 
   ~/testapp $ ./script/console
   Loading development environment (Rails 2.3.3)
   >> Airport.first.iata_code
   => "GKA"
-  >> Airport.first.country.name
+  >> Airport.first.country_name
   => "Papua New Guinea"
 
 ==Authors
 
 * Seamus Abshere <seamus@abshere.net>
 * Andy Rossmeissl <andy@rossmeissl.net>
 
 ==Copyright
 
-Copyright (c) 2009 Brighter Planet. See LICENSE for details.
+Copyright (c) 2010 Brighter Planet. See LICENSE for details.
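The new Airport block filters rows with `:select => lambda { |row| row[4].present? }` and maps columns with `:field_number`. The plain-Ruby sketch below imitates only that row filtering and column mapping with the standard `csv` library, assuming `:field_number` is a zero-based column index (as the `row[4]` in the lambda suggests). It is not data_miner's implementation: the names `AIRPORT_FIELDS`, `SELECT_ROW`, `airports_from` and `SAMPLE_DAT` are hypothetical, the sample rows are invented rather than taken from airports.dat, and since `present?` comes from ActiveSupport, a blank-string check stands in for it.

```ruby
require 'csv'

# Hypothetical mapping of attribute names to zero-based column numbers,
# copied from the `store ... :field_number => N` lines in the new Airport block.
AIRPORT_FIELDS = {
  'name'         => 1,
  'city'         => 2,
  'country_name' => 3,
  'iata_code'    => 4,
  'latitude'     => 6,
  'longitude'    => 7
}.freeze

# Stand-in for `lambda { |row| row[4].present? }` without ActiveSupport:
# keep a row only if its IATA column is neither nil nor blank.
SELECT_ROW = lambda { |row| !row[4].to_s.strip.empty? }

# Parses headerless CSV text, drops rows rejected by SELECT_ROW, and
# returns one hash of attribute => raw string value per remaining row.
def airports_from(csv_text)
  CSV.parse(csv_text).select(&SELECT_ROW).map do |row|
    AIRPORT_FIELDS.each_with_object({}) do |(attr, index), record|
      record[attr] = row[index]
    end
  end
end

# Invented sample rows in an airports.dat-like layout (not real data).
# The second row has an empty IATA column, so it should be filtered out.
SAMPLE_DAT = <<~DAT
  1,"Example Field","Exampleton","Examplestan","EXA","EEXA",-6.08,145.39,5282,10,"U"
  2,"No Code Strip","Nowhere","Examplestan","","EEXB",-5.2,144.0,100,10,"U"
DAT

airports_from(SAMPLE_DAT).each do |airport|
  puts "#{airport['iata_code']}: #{airport['name']} (#{airport['country_name']})"
end
# prints: EXA: Example Field (Examplestan)
```

Values come back as raw strings here; any type conversion data_miner performs when storing them is not modelled.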
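The rake task builds its resource list from the `RESOURCES` environment variable with `ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact`. This standalone sketch wraps just that expression in a hypothetical `resource_names_from` helper to show what `RESOURCES=Airport,Country` turns into; how `DataMiner.run` then uses the names (including their order, or an empty list) is not shown here.

```ruby
# Hypothetical helper wrapping the expression used in the rake task:
#   ENV['RESOURCES'].to_s.split(/\s*,\s*/).flatten.compact
# nil.to_s is "", and "".split(...) returns [], so an unset variable
# yields an empty list rather than raising. flatten and compact do not
# change split's result here; they are kept only to mirror the task.
def resource_names_from(value)
  value.to_s.split(/\s*,\s*/).flatten.compact
end

p resource_names_from('Airport,Country')   # prints ["Airport", "Country"]
p resource_names_from('Airport , Country') # prints ["Airport", "Country"]
p resource_names_from(nil)                 # prints []
```

In the task itself the argument is `ENV['RESOURCES']`, so `rake data_miner:run RESOURCES=Airport,Country` passes `["Airport", "Country"]` as `:resource_names`.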