Path: README.rdoc
Last Update: Fri Mar 25 16:22:16 -0400 2011
SIP is an ETL tool for extracting data from SQL databases and importing it into Hive. It was created because the ability to transform columns and partition data was an absolute requirement, and no other tool provided that functionality.
Unique features include:
Bug reports and pull requests are welcome on GitHub.
If a table has no primary key (default: id), you must set incremental_index to blank.
sip [--db <dbname>] [--table <tablename>] [-c <config location>]
For each table to be imported, SIP determines the queries necessary to perform an export and then creates scripts (one per datanode) that are run individually in parallel. Each script:
Then, all of the partitions are imported from HDFS into Hive. Easy squeezy.
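To make the "queries per table, one script per datanode" idea concrete, here is a rough sketch of how an export could be split into one query per node by slicing an incremental index range. This is not SIP's actual code: the function name `build_export_queries`, its parameters, and the `BETWEEN` query shape are all assumptions made for illustration.

```python
def build_export_queries(table, index_column, min_id, max_id, num_nodes):
    """Split [min_id, max_id] into at most num_nodes contiguous chunks.

    Hypothetical helper, not part of SIP. Returns one SELECT statement
    per chunk, so each datanode script could run its own query.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if max_id < min_id:
        return []  # nothing to export

    total = max_id - min_id + 1
    chunk = -(-total // num_nodes)  # ceiling division

    queries = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk - 1, max_id)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {index_column} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries


if __name__ == "__main__":
    # Example: ids 1..10 spread over 3 datanodes.
    for query in build_export_queries("users", "id", 1, 10, 3):
        print(query)
```

Slicing on a numeric index like this only works when the table has one, which may be why tables without a primary key need incremental_index set to blank.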