= SIP: SQL to Hive Importer SIP is a ETL[http://en.wikipedia.org/wiki/Extract,_transform,_load] tool for extracting SQL databases and importing them into Hive[http://hive.apache.org]. It was created because the ability to transform columns and partition data was an absolute requirement, and no other tool provided that functionality. Unique features include: * The ability to transform columns (using Ruby code) * A single, simple, human readable configuration file * The ability to parition tables in Hive based on the value of any (possibly transformed) columns Bug reports and pull requests welcome on github[https://github.com/livingsocial/sip]. == Requirements * A working Hadoop installation * Hive on the namenode * Ruby >= 1.8.6 on all datanodes * SQL Ruby libs == Installation, Configuration, and Use gem install sip note: if no primary key (default: id), must set incremental_index to blank == How it Works Per table to be imported, SIP determines the queries necessary to perform an export and the creates scripts (one per datanode) that are then run individually in parallel. Each script: 1. Copies a transformation / partition (transpart) script to the datanode 1. Performs the SQL extraction, piping output through the transpart script 1. Uploads all partitions to HDFS Then, all of the partitions are imported from HDFS into Hive. Easy squeezy.