Gigantron: Processor of Data
Get Version
0.1.0→ ‘gigantron’
What
Gigantron is a simple framework for the creation and organization of data processing projects. Data-processing transforms are created as Rake tasks and data is handled through ActiveRecord models. (DataMapper was the original plan, but it has problems playing nicely with JRuby for now).
Ruby is great for exploratory data processing. Data processing projects tend to grow up and encompass large numbers of random scripts and input files. It is easy to get lost in coding and lose organization. Gigantron is an attempt to use code generation and random magic to make maintaining organized DP projects simple. Code is separated into data (models) and operations on the data (tasks). Code generators stub out these files and the associated tests for the user.
Gigantron was written for my own needs working with atmospheric data and will evolve through use to reduce the trivialities that can sometimes dominate the work of developers.
Installing
sudo gem install gigantron
The basics
# Generate new project shell> $ gigantron project create create tasks create db create models create lib create test create Rakefile create database.yml create initialize.rb create tasks/import.rake create test/test_helper.rb create test/models create test/tasks create test/tasks/test_import.rb dependency install_rubigen_scripts create script create script/generate create script/destroy shell> $ cd project # Create new model shell> $ script/generate model modis exists models/ create models/modis.rb exists test/ exists test/models/ create test/models/test_modis.rb shell> $ script/generate task modis_to_kml exists tasks/ create tasks/modis_to_kml.rake exists test/ exists test/tasks/ create test/tasks/test_modis_to_kml.rb
One can edit these files to add functionality. Gigantron by default includes ActiveSupport for convenience.
Hacking
Gigantron is super minimal now, so modifying it is pretty easy. The gigantron
application generator (that which is invoked by the gigantron
command) lives
in app_generators/gigantron/
. The template files and the template dir
structure are in app_generators/gigantron/templates/
. When adding new
templates, directories, or files, add the names first to the tests in
test/test_gigantron_generator.rb
, then the stubs to
app_generators/gigantron/templates
, and then finally describe them in the
manifest section of app_generators/gigantron/gigantron_generator.rb
. It
should look something like
def manifest record do |m| ... m.file "new_file", "new_file" #straight file copy m.directory "my_new_dir" #create directory m.template "new_thing", "new_thing" #runs file through ERB when copying end end
It might be handy to know that in gigantron_generator.rb
the name provided to the generator can be referenced as @name
. This can be used like
record do |m| m.file "renamed_file", "#{@name}_file" end
The same value is available to your templates as just name
. You can template
a file like
class Test<%= name.capitalize %> < Test::Unit::TestCase ... end
The same process applies to the model and task generator for gigantron
projects. These generators live in gigantron_generators/
. Modifying them
is exactly the same as modifying the application generator.
All of this is pretty vanilla RubiGen, so if in doubt, check out the docos on that fine piece of work.
The only other place for code in Gigantron is is lib/gigantron/tasks/
where a few boilerplate test and db tasks live. I think I ripped the test tasks off of rails.
If you have any questions, do contact me. I am interested in anything that will make Gigantron suck less and be useful to people.
How to submit patches
git clone git://github.com/schleyfox/gigantron.git
Build and test instructions
cd gigantron cp test/template_database.yml.example test/template_database.yml vim test/template_database.yml rake test rake install_gem
License
This code is free to use under the terms of the MIT license.
Contact
Comments are welcome. Send an email to Ben Hughes
Ben Hughes, 18th June 2008
Theme extended from Paul Battley