README.markdown in datashift-0.2.1 vs README.markdown in datashift-0.2.2

- old
+ new

@@ -1,306 +1,327 @@ -## DataShift - -Provides tools to shift data between Ruby, external applications, files and ActiveRecord. - -### Features - -Map Active Record models and associations to files, generate sample templates. - -Import and Export ActiveRecord models through .xls or CSV files, including -all associations and with configurable defaults. - -Create, parse and use Excel/OpenOffice (.xls) documents dynamically from Ruby. - -Easily extendable Loader functionality to deal with non trivial import cases, such -as complex association lookups. - -High level rake tasks for import/export provided. - -Specific loaders and rake tasks for Spree E-Commerce, enabling import/export of all data including Products with -complex associations such as Properties/Taxons/Options/Variants and Images. - -Specific loaders and rake tasks provided out the box for Spree E-Commerce, -enabling import/export of all data including Products with complex associations such as Options/Variants and Images. - -Many example Spreadsheets in spec/fixtures, fully documented with comments for each column. - -## Installation - -Add gem 'datashift' to your Gemfile/bundle, or install the latest gem as usual : - - `gem install datashift` - -To use : - - gem 'datashift' - require 'datashift' - -To pull the tasks in, add this call to your Rakefile : - - DataShift::load_tasks - -N.B - To use the Excel loader, OLE and Excel are NOT required, however -JRuby is required, since it uses Java's Apache POI under the hood to process .xls files. - -Guards are provided, and used internally, for mixed Ruby setups. Can be used like : - - if(DataShift::Guards::jruby? ) - ..do something with Excel - else - ..do something with CSV - end - -## Active Record - Import/Export - -Provides high level rake tasks for importing data via ActiveRecord models into a DB, - from various sources, currently csv or .xls files (Excel/Open Office) - - - jruby -S rake datashift:import:excel model=BlogPost input=BlogPostImport.xls verbose=true - -Provides high level rake tasks for exporting data to various sources, currently .xls files (Excel/Open Office) - - jruby -S rake datashift:export:excel model=BlogPost result=BlogExport.xls - - -The library can be easily extended with Loaders to deal with non trivial cases, - for example when multiple lookups required to find right association. - -Spree loaders are an example, these illustrate over riding processing for specific columns with -complicated lookup requirements. - -A core feature of DataShift is the MethodDictionary and MethodMapper, which provides features for collecting -reflection information from ActiveRecord models (all different associations, including join tables with many-to-many relationships). - -A full picture of all possible operations on a class can be created very easily, for example ona Blog model : - - MethodDictionary.find_operators( Blog ) - -This then allows Import/Export to be achieved, by mapping the file's header and column data to the found operators -i.e. the methods to set data on model's attributes and associations. - -Here we retrieve the method details for a column name from a file, "Blog Date" - - MethodDictionary.find_method_detail( Blog, "Blog Date" ) - -Loaders can use this method lookup functionality, to find the correct association for a column heading, -and populate AR object with row data. - -This means data can be mapped to any model without any further coding. Generators are also supplied to export -a models attributes and associations to files, thus providing template spreadsheets that any user can fill out. - -MethodMapper also stores column type information so the raw file data can be provided as is, -and whenever possible, under the bonnet the data will be cast to correct DB type. - -Here we show how a column name from a file, "Blog Date", can be mapped to Assign a stringified date, to the blog_date column, on a new Blog object : - - MethodDictionary.find_method_detail( Blog, "Blog Date" ).assign( Blog.new, "Sat Jul 23 2011" ) - -Because it's based on reflection against the model, can build complex relationships between tables during import/export, -and extending data files with new columns need not require any additional Ruby coding. - -New columns can simply be added to the Excel/Open Office spreadsheet, or CSV file, setting the new -attribute or association name in the header row. - - -The Loader attempts to handle various human read-able forms of column names. - -For example, given an association on the model called, product_properties, will successfully load -from columns with headings such as 'product_properties', 'Product Properties', 'ProductProperties' 'product properties' etc - -For has_many associations, either multiple columns can be used, - or multiple values can be specified in a single column using suitable delimiters. - -Modular - so complex associations/mappings that require non generic lookups, can be handled by extending the loader engine. - -Original focus was on support for the Open Source Spree e-commerce project, so includes specific loaders and rake tasks -for loading Spree Products, and associated data such as Product Variants, and Images. - -## Template Generation and Export - -Template generation tasks can be used to export a model's definition as column headings to CSV or .xls. -These can be provided to developers or business users, as a template for data collection and then loading. - -Export tasks can be used to export of a model's definition and any existing data stored in the database. - -This data can be exported directly to CSV or Excel/OpenOffice spreadsheets. - - -## Example Spreadsheets - - A number of example Spreadsheets with headers and comments, can be found in the spec/fixtures directory. - - Extensive Spree samples - including .xls and csv versions for simple Products or complex Products with multiple - taxons, variants properties etc - can be found in the spec/fixtures/spree subdirectory. - - Column headings contain comments with full descriptions and instructions on syntax. - - -## Features - -- *High level wrappers around applications including Excel and Word - - Quickly and easily access common enterprise applications through Ruby - - Wrapper around MS Excel File format, acheived via Apache POI under JRuby, so not restricted to Windows - and Excel does not need to be installed. - - The required POI jars are already included. - -- *Direct Excel export* - - Excel/OpenOffice spreadsheets are heavily used in many sectors, so direct support makes it - easier and quicker to migrate your client's data into a Rails/ActiveRecord project. - - No need to save to CSV or map to YAML. - -- *Semi-Smart Name Lookup* - - Includes helper classes that find and store details of all possible associations on an AR class. - Given a user supplied name, attempts to find the requested association. - - Example usage, load from a file or spreadsheet where the column names are only - an approximation of the actual associations, so given 'Product Properties' heading, - finds real association 'product_properties' to send or call on the AR object - -- *Associations* - - Can handle 'belongs_to, 'has_many' and 'has_one' associations, including assignment of multiple objects - via either multiple columns, or via specially delimited entry in a single (column). See Details section. - - Supports delegated attributes. - -- *Rake Tasks* - - High level Rake tasks are provided, only required to supply model class, and file location : - - jruby -S rake datashift:import:excel model=MusicTrack input=MyTrackListing.xls - - -- *Spree Rake Tasks* - - Spree's product associations are non trivial so specific Rake tasks are also provided for loading Spree Producta - with all associations and Image loading. - - jruby -S rake datashift:spree:products input=C:\MyProducts.xls - - -- *Seamless Spree Image loading can be achieved by ensuring SKU or class Name features in Image filename. - - Lookup is performed either via the SKU being prepended to the image name, or by the image name being equal to the **name attribute** of the klass in question. - - Images can be attached to any class defined with a suitable association. The class to use can be configured in rake task via - parameter klass=Xyz. - - In the Spree tasks, this defaults to Product, so attempts to attach Image to a Product via Product SKU or Name. - - Image loading **does not** specifically require JRuby - - A report is generated in the current working directory detailing any Images in the paths that could not be matched with a Product. - - rake datashift:spree:images input=C:\images\product_images skip_if_no_assoc=true - - rake datashift:spree:images input=C:\images\taxon_icons skip_if_no_assoc=true klass=Taxon - -## Import to Active Record - -### Associations - -To perform a lookup for an associated model, the primary column(s) must be supplied, along with required select values for those columns. - -A single association column can contain multiple name/value sets, in string form : - - column:lookup_key_1, lookup_key_2,... - -So if our Project model has many Categories, we can supply a Category list, which is keyed on the column Category.reference with : - - |Categories| - - reference:category_001,category_002 - -During loading, a call to find_all_by_reference will be made, picking up the 2 categories with matching references, - and our Project model will contain those two i.e project.categories = [category_002,category_003] - -## Spree Suppprt - -### OptionTypes & Variants - -When loaded with the Spree specific tasks, spree specific over rides are supported, such as direct s -support for OptionTypes with values - -Any 'Option Types' columns can contain the OptionType to associate with the Product, plus a selection of -appropriate OptionValues to go with that Type. - -For example, in a single column/row we could supply 2 OptionTypes (named, size & colour), with a selection values -(such as small, medium etc) - - 'Option Types' - size:small,medium,large|colour:red,white - -If no such OptionType exists, e.g size, then a new one is created with the supplied name. - -Next the OptionValues are also parsed, again if no such OptionValue exists, e.g small, then a new one is created with the supplied name. - -Lastly a Variant is created on each OptionValue, with price and availaable dates being copied from Master. -Currently a unique SKU is created by adding an index to the master's sku. - -TODO - Enable a hash of attributes to be supplied in association columns to enable more control over creation of associated objects. - -### Properties - -The properties to associate with this product. -Properties are for small snippets of text, shared across many products, -and are for display purposes only. - -An optional display value can be supplied to supplement the displayed text. - -As for all associations can contain multiple name/value sets in default form : - - Property:display_value|Property:display_value - -Example - No values : - manufacturer|standard - -Example - Display values : - manufacturer:somebody else plc|standard:ISOBlah21 - -## TODO - - - Add direct Image loading to Spree i.e should be able to specify just path in a column .. "/images/red-tshirt.jpg" - - - Smart sorting of column processing order .... - - - Does not currently ensure mandatory columns (for valid?) processed first. - - - Look at implementing import/export API using something like https://github.com/ianwhite/orm_adapter - rather than active record, so we can support additional ORMs - - - Create separate Spree extension to support import/export via the admin gui - -## License - -Copyright:: (c) Autotelik Media Ltd 2011 - -Author :: Tom Statter - -Date :: Dec 2011 - -The MIT License - -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: - -The above copyright notice and this permission notice shall be included in -all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +## DataShift + +Provides tools to shift data between Ruby, external applications, files and ActiveRecord. + +### Features + +Map Active Record models and associations to files, generate sample templates. + +Import and Export ActiveRecord models through .xls or CSV files, including +all associations and with configurable defaults. + +Create, parse and use Excel/OpenOffice (.xls) documents dynamically from Ruby. + +Easily extendable Loader functionality to deal with non trivial import cases, such +as complex association lookups. + +High level rake/thor command line tasks for import/export provided. + +Specific loaders and rake tasks for Spree E-Commerce, enabling import/export of all data including Products with +complex associations such as Properties/Taxons/Options/Variants and Images. + +Specific loaders and rake tasks provided out the box for Spree E-Commerce, +enabling import/export of all data including Products with complex associations such as Options/Variants and Images. + +Many example Spreadsheets in spec/fixtures, fully documented with comments for each column. + +## Installation + +Add gem 'datashift' to your Gemfile/bundle, or install the latest gem as usual : + + `gem install datashift` + +To use : + + gem 'datashift' + require 'datashift' + +To pull the tasks in, add this call to your Rakefile : + + DataShift::load_tasks + +To use the Thor command line applications : + + Create a high level .thor file - e.g mysite.thor - in your applications root directory + +Pull in the thor commands : + + require 'thor' + require 'datashift' + + DataShift::load_commands + +To check the available tasks run + + bundle exec rake -T datashift + +and/or + + bundle exc thor list datashift + +N.B - To use the Excel loader, OLE and Excel are NOT required, however +JRuby is required, since it uses Java's Apache POI under the hood to process .xls files. + +Guards are provided, and used internally, for mixed Ruby setups. Can be used like : + + if(DataShift::Guards::jruby? ) + ..do something with Excel + else + ..do something with CSV + end + +## Active Record - Import/Export + +Provides high level rake tasks for importing data via ActiveRecord models into a DB, + from various sources, currently csv or .xls files (Excel/Open Office) + + + bundle exec rake datashift:import:csv model=BlogPost input=BlogPostImport.csv verbose=true + + jruby -S rake datashift:import:excel model=BlogPost input=BlogPostImport.xls verbose=true + +Provides high level rake tasks for exporting data to various sources, currently .xls files (Excel/Open Office) + + jruby -S rake datashift:export:excel model=BlogPost result=BlogExport.xls + + +The library can be easily extended with Loaders to deal with non trivial cases, + for example when multiple lookups required to find right association. + +Spree loaders are an example, these illustrate over riding processing for specific columns with +complicated lookup requirements. + +A core feature of DataShift is the MethodDictionary and MethodMapper, which provides features for collecting +reflection information from ActiveRecord models (all different associations, including join tables with many-to-many relationships). + +A full picture of all possible operations on a class can be created very easily, for example ona Blog model : + + MethodDictionary.find_operators( Blog ) + +This then allows Import/Export to be achieved, by mapping the file's header and column data to the found operators +i.e. the methods to set data on model's attributes and associations. + +Here we retrieve the method details for a column name from a file, "Blog Date" + + MethodDictionary.find_method_detail( Blog, "Blog Date" ) + +Loaders can use this method lookup functionality, to find the correct association for a column heading, +and populate AR object with row data. + +This means data can be mapped to any model without any further coding. Generators are also supplied to export +a models attributes and associations to files, thus providing template spreadsheets that any user can fill out. + +MethodMapper also stores column type information so the raw file data can be provided as is, +and whenever possible, under the bonnet the data will be cast to correct DB type. + +Here we show how a column name from a file, "Blog Date", can be mapped to Assign a stringified date, to the blog_date column, on a new Blog object : + + MethodDictionary.find_method_detail( Blog, "Blog Date" ).assign( Blog.new, "Sat Jul 23 2011" ) + +Because it's based on reflection against the model, can build complex relationships between tables during import/export, +and extending data files with new columns need not require any additional Ruby coding. + +New columns can simply be added to the Excel/Open Office spreadsheet, or CSV file, setting the new +attribute or association name in the header row. + + +The Loader attempts to handle various human read-able forms of column names. + +For example, given an association on the model called, product_properties, will successfully load +from columns with headings such as 'product_properties', 'Product Properties', 'ProductProperties' 'product properties' etc + +For has_many associations, either multiple columns can be used, + or multiple values can be specified in a single column using suitable delimiters. + +Modular - so complex associations/mappings that require non generic lookups, can be handled by extending the loader engine. + +Original focus was on support for the Open Source Spree e-commerce project, so includes specific loaders and rake tasks +for loading Spree Products, and associated data such as Product Variants, and Images. + +## Template Generation and Export + +Template generation tasks can be used to export a model's definition as column headings to CSV or .xls. +These can be provided to developers or business users, as a template for data collection and then loading. + +Export tasks can be used to export of a model's definition and any existing data stored in the database. + +This data can be exported directly to CSV or Excel/OpenOffice spreadsheets. + + +## Example Spreadsheets + + A number of example Spreadsheets with headers and comments, can be found in the spec/fixtures directory. + + Extensive Spree samples - including .xls and csv versions for simple Products or complex Products with multiple + taxons, variants properties etc - can be found in the spec/fixtures/spree subdirectory. + + Column headings contain comments with full descriptions and instructions on syntax. + + +## Features + +- *High level wrappers around applications including Excel and Word + + Quickly and easily access common enterprise applications through Ruby + + Wrapper around MS Excel File format, acheived via Apache POI under JRuby, so not restricted to Windows + and Excel does not need to be installed. + + The required POI jars are already included. + +- *Direct Excel export* + + Excel/OpenOffice spreadsheets are heavily used in many sectors, so direct support makes it + easier and quicker to migrate your client's data into a Rails/ActiveRecord project. + + No need to save to CSV or map to YAML. + +- *Semi-Smart Name Lookup* + + Includes helper classes that find and store details of all possible associations on an AR class. + Given a user supplied name, attempts to find the requested association. + + Example usage, load from a file or spreadsheet where the column names are only + an approximation of the actual associations, so given 'Product Properties' heading, + finds real association 'product_properties' to send or call on the AR object + +- *Associations* + + Can handle 'belongs_to, 'has_many' and 'has_one' associations, including assignment of multiple objects + via either multiple columns, or via specially delimited entry in a single (column). See Details section. + + Supports delegated attributes. + +- *Rake Tasks* + + High level Rake tasks are provided, only required to supply model class, and file location : + + jruby -S rake datashift:import:excel model=MusicTrack input=MyTrackListing.xls + + +- *Spree Rake Tasks* + + Spree's product associations are non trivial so specific Rake tasks are also provided for loading Spree Producta + with all associations and Image loading. + + jruby -S rake datashift:spree:products input=C:\MyProducts.xls + + +- *Seamless Spree Image loading can be achieved by ensuring SKU or class Name features in Image filename. + + Lookup is performed either via the SKU being prepended to the image name, or by the image name being equal to the **name attribute** of the klass in question. + + Images can be attached to any class defined with a suitable association. The class to use can be configured in rake task via + parameter klass=Xyz. + + In the Spree tasks, this defaults to Product, so attempts to attach Image to a Product via Product SKU or Name. + + Image loading **does not** specifically require JRuby + + A report is generated in the current working directory detailing any Images in the paths that could not be matched with a Product. + + rake datashift:spree:images input=C:\images\product_images skip_if_no_assoc=true + + rake datashift:spree:images input=C:\images\taxon_icons skip_if_no_assoc=true klass=Taxon + +## Import to Active Record + +### Associations + +To perform a lookup for an associated model, the primary column(s) must be supplied, along with required select values for those columns. + +A single association column can contain multiple name/value sets, in string form : + + column:lookup_key_1, lookup_key_2,... + +So if our Project model has many Categories, we can supply a Category list, which is keyed on the column Category.reference with : + + |Categories| + + reference:category_001,category_002 + +During loading, a call to find_all_by_reference will be made, picking up the 2 categories with matching references, + and our Project model will contain those two i.e project.categories = [category_002,category_003] + +## Spree Suppprt + +### OptionTypes & Variants + +When loaded with the Spree specific tasks, spree specific over rides are supported, such as direct s +support for OptionTypes with values + +Any 'Option Types' columns can contain the OptionType to associate with the Product, plus a selection of +appropriate OptionValues to go with that Type. + +For example, in a single column/row we could supply 2 OptionTypes (named, size & colour), with a selection values +(such as small, medium etc) + + 'Option Types' + size:small,medium,large|colour:red,white + +If no such OptionType exists, e.g size, then a new one is created with the supplied name. + +Next the OptionValues are also parsed, again if no such OptionValue exists, e.g small, then a new one is created with the supplied name. + +Lastly a Variant is created on each OptionValue, with price and availaable dates being copied from Master. +Currently a unique SKU is created by adding an index to the master's sku. + +TODO - Enable a hash of attributes to be supplied in association columns to enable more control over creation of associated objects. + +### Properties + +The properties to associate with this product. +Properties are for small snippets of text, shared across many products, +and are for display purposes only. + +An optional display value can be supplied to supplement the displayed text. + +As for all associations can contain multiple name/value sets in default form : + + Property:display_value|Property:display_value + +Example - No values : + manufacturer|standard + +Example - Display values : + manufacturer:somebody else plc|standard:ISOBlah21 + +## TODO + + - Add direct Image loading to Spree i.e should be able to specify just path in a column .. "/images/red-tshirt.jpg" + + - Smart sorting of column processing order .... + + - Does not currently ensure mandatory columns (for valid?) processed first. + + - Look at implementing import/export API using something like https://github.com/ianwhite/orm_adapter + rather than active record, so we can support additional ORMs + + - Create separate Spree extension to support import/export via the admin gui + +## License + +Copyright:: (c) Autotelik Media Ltd 2011 + +Author :: Tom Statter + +Date :: Dec 2011 + +The MIT License + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. \ No newline at end of file