0.1.0 - Dec 6, 2006
* Initial release

0.2.0 - Dec 7, 2006
* Added an XML parser for source parsing
* Added support for compound key constraints in destinations via the :unique => [] option
* Added the ability to declare explicit columns in bulk import
* Added support for generators in destinations
* Added a SurrogateKeyGenerator for cases where the database doesn't support auto-generated surrogate keys

0.3.0 - Dec 19, 2006
* Added support for calculated values in virtual fields with a Proc

0.4.0 - Jan 11, 2007
* Added a :skip_lines option to file source configurations, which can be used to skip the first n lines in the source data file
* Added better error handling in the delimited parser - an error is now raised if the expected and actual field lengths do not match
* Added a :truncate option for the database destination. Set it to true to truncate the destination before importing data.
* Added support for the :unique => [] option and virtual fields in the database destination

0.5.0 - Feb 17, 2007
* Changed require_gem to gem and added an alias to allow for older versions of RubyGems
* Added support for a Hash in the source configuration, where :name => :parser_name defines the parser to use and :options => {} defines the options to pass to the parser
* Added support for passing a custom Parser class in the source configuration
* Removed the need to include Enumerable in each parser implementation
* Added new date_to_string and string_to_date transformers
* Implemented the foreign_key_lookup transform, including an ActiveRecordResolver
* Added real-time activity logging, which is used when the etl bin script is invoked
* Improved error handling
* The default logger level is now WARN

0.5.1 - Feb 18, 2007
* Fixed up the truncate processor
* Updated the HOW_TO_RELEASE doc

0.5.2 - Feb 19, 2007
* Added an error threshold
* Fixed a problem with transform error handling

0.6.0 - Mar 8, 2007
* Fixed a missing-method problem in validate in the Control class
* Removed control validation for now (the source could be code in the control file)
* The transform interface is now defined as taking 3 arguments: the field name, the field value and the row. This is not backwards compatible.
* Added HierarchyLookupTransform
* Added DefaultTransform, which returns a specified value if the initial value is blank
* Added row-level processing
* Added HierarchyExploderProcessor, which takes a single hierarchy row and explodes it into the multiple rows used in a hierarchy bridge
* Added ApacheCombinedLogParser, which parses the Apache Combined Log format, including the user agent string and the URI, and returns a Hash
* Fixed a bug in the SAX parser so that attributes are now set when the start_element event is received
* Added an HttpTools module, which provides parsing methods for the user agent string and the URI
* The database source now uses its own class for establishing an ActiveRecord connection
* Log files are now timestamped
* Source files are now archived automatically during the extraction process
* Added a :condition option to the destination configuration Hash; it accepts a Proc that is passed a single argument, the row (see the sketch below)
* Added an :append_rows option to the destination configuration Hash; it accepts either a Hash (to append a single row) or an Array of Hashes (to append multiple rows)
* The read and written row counts are now printed only if there is at least one source and one destination, respectively
* Added a depends_on directive that accepts a list of string or symbol arguments. Each symbol is converted to a string with .ctl appended; strings are passed through directly. The dependencies are executed in the order they are specified.
* The default field separator in the bulk loader is now a comma (was a tab)
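As a rough, hypothetical sketch of how the new 0.6.0 destination options and the depends_on directive might be combined in a control file (field names, file names, the :order mapping key and the exact source/destination syntax are illustrative only, and the :condition semantics shown are assumed):

  # Run these control files first, in the given order (symbols get .ctl appended)
  depends_on :extract_customers, 'load_dates.ctl'

  destination :out, {
    :file => 'customers.txt',
    # assumed semantics: the row is written only when the Proc returns true
    :condition => Proc.new { |row| row[:status] != 'inactive' },
    # append a single summary row after all source rows (an Array appends several)
    :append_rows => { :id => -1, :status => 'unknown' }
  },
  {
    :order => [:id, :status]
  }
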
0.6.1 - Mar 22, 2007
* Added support for absolute paths in file sources
* Added CopyFieldProcessor

0.7 - Apr 8, 2007
* Job execution is now tracked in a database. This means that ActiveRecord is required regardless of the sources used in the ETL scripts. An example database configuration for the ETL execution data store can be found in test/database.example.yml. This file is loaded from either a) the current working directory or b) the location specified with the -c command-line argument when running the etl command.
* The etl script now supports the following command-line arguments (see the example at the end of this changelog):
** -h or --help: Prints the usage
** -l or --limit: Specifies a limit for the number of source rows to read, useful for testing your control files before executing a full ETL process
** -o or --offset: Specifies a start offset for reading from the source, useful for testing your control files before executing a full ETL process
** -c or --config: Specifies the database.yml file used to configure the ETL execution data store
** -n or --newlog: Writes to the logfile rather than appending to it
* The database source now supports specifying the select, join and order parts of the query
* The database source understands the limit argument specified on the etl command line
* Added CheckExistProcessor
* Added CheckUniqueProcessor
* Added SurrogateKeyProcessor, which should be used in conjunction with the CheckExistProcessor and CheckUniqueProcessor to provide surrogate keys
* Added SequenceProcessor
* Added OrdinalizeTransform
* Fixed a bug in the trim transform
* Sources now provide a trigger file, which can be used to indicate that the original source data has been completely extracted to the local file system. This is useful if you need to recover from a failed ETL process.
* Updated README

0.7.1 - Apr 8, 2007
* Fixed source caching

0.7.2 - Apr 8, 2007
* Fixed a quoting bug in CheckExistProcessor
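As a rough illustration of the 0.7 command-line options described above (the control file name is hypothetical, passing the control file as the final argument is assumed, and the exact invocation may vary):

  etl -c database.yml -l 100 -o 50 my_control.ctl

This would configure the execution data store from database.yml and read at most 100 source rows starting at offset 50, which is useful for testing a control file before a full run.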