# Csv2Hash [![Code Climate](https://codeclimate.com/github/FinalCAD/csv2hash.png)](https://codeclimate.com/github/FinalCAD/csv2hash) [![Dependency Status](https://gemnasium.com/FinalCAD/csv2hash.png)](https://gemnasium.com/FinalCAD/csv2hash) [![Build Status](https://travis-ci.org/FinalCAD/csv2hash.png?branch=master)](https://travis-ci.org/FinalCAD/csv2hash) (Travis CI) [![Coverage Status](https://coveralls.io/repos/FinalCAD/csv2hash/badge.png)](https://coveralls.io/r/FinalCAD/csv2hash) It is a DSL to validate and map a CSV to a Ruby Hash. ## Installation Add this line to your application's Gemfile: gem 'csv2hash' And then execute: $ bundle Or install it yourself as: $ gem install csv2hash ## Usage Parsing is based on rules, you should defined rule for each cells ### DSL ``` Csv2hash::Main.generate_definition :name do set_type { Definition::MAPPING } set_header_size { 2 } # 0 by default set_structure_rules {{ max_columns: 2 }} mapping do cell position: [0,0], key: 'gender' cell position: [1,0], key: 'name' end end Csv2hash::Main[:name] # Access anywhere ``` ### Rules You should declared a definition of your CSV file, and then define for each cell what you would expect. Example : If you want the very first cell, located on the first line and on the first column to be a string with values are either 'yes' either 'no', you can write the following validation rule: ``` cell name: 'aswering', type: 'string', values: ['yes', 'no'], position: [0,0] ``` `:type` attribute has `String` for default value, therefore you can just write this: ``` cell name: 'aswering', values: ['yes', 'no'], position: [0,0] ``` You can define you own message but default message is 'undefined :key on :position' ``` cell name: 'aswering', values: ['yes', 'no'], position: [0,0], message: 'this value is not supported' ``` You can also define Range of values ``` cell name: 'score', values: 0..5, position: [0,0] ``` The message is parsed: ``` cell ..., message: 'value of :name is not supported, please you one of :values' ``` It produces : ``` value of answering is not supported, please use one of [yes, no] ``` ### Default values Only position is required: * :position All remaining keys are optionals: * message: 'undefined :key on :position' * mappable: true * type: 'string' * values: nil * nested: nil * allow_blank: false * extra_validator: nil ## Define where your data is expected **IMPORTANT!** Position means [Y, X], where Y is rows, X columns A definition should be provided. There are 2 types of definitions: * search for data at a precise position in the table: `y,x` * or search for data in a column of rows, where all the rows are the same: `x` (column index) ## Samples ### [MAPPING] Validation of cells with defined precision Consider the following CSV: ``` | Fields | Person Informations | Optional | |-------------|----------------------|----------| | Nickname | jo | no | | First Name | John | yes | | Last Name | Doe | yes | ``` Precise position validation sample: ``` class MyParser attr_accessor :file_path_or_data def initialize file_path_or_data @file_path_or_data = file_path_or_data end def data @data_wrapper ||= Csv2hash::Main.new(:, file_path_or_data).parse end private def definition Main.generate_definition :my_defintion do set_type { Definition::MAPPING } set_header_size { 1 } mapping do cell position: [2,1], key: 'first_name' cell position: [3,1], key: 'last_name' end end end end end ``` ### Auto discover position This is a special feature for finding the Y index of row where you data start. For instance you have this following data : ``` |---------------|---------------|------|-----| | Nickname | jo | | | | First Name | John | | | | Last Name | Doe | | | | | | | | | Employment | CEO | | | | Post | Doe | | | | | | | | | | Personal info | Age | 26 | | | Sex | Male | | | | | | | ``` You want extract `Employment` information and `Personal info` but we do not know if extra information will not come and break our index. This feature can be useful is this case. You must change Y position (rows) by the column index and regex, the parser will search on this column the index row of this regex, here our rule : ``` cell position: [4,1], key: 'employment' ``` became ``` cell position: [[0, /Employment/],1], key: 'employment' ``` ### [COLLECTION] Validation of a collection (Regular CSV) Consider the following CSV: ``` | Nickname | First Name | Last Name | |----------|------------|-----------| | jo | John | Doe | | ja | Jane | Doe | ``` Collection validation sample: ``` class MyParser attr_accessor :file_path_or_data def initialize file_path_or_data @file_path_or_data = file_path_or_data end def data @data_wrapper ||= Csv2hash::Main.new(:, file_path_or_data).parse end private def definition Csv2Hash::Definition.new(rules, type = Csv2Hash::Definition::COLLECTION, header_size: 1) end def definition Main.generate_definition :my_defintion do set_type { Definition::COLLECTION } set_header_size { 1 } mapping do cell position: 0, key: 'nickname' cell position: 1, key: 'first_name' cell position: 2, key: 'last_name' end end end end end ``` ### Structure validation rules You may want to validate some structure, like min or max number of columns, definition accepts structure_rules as a key for the third parameter. Current validations are: :min_columns, :max_columns ``` class MyParser attr_accessor :file_path_or_data def initialize file_path_or_data @file_path_or_data = file_path_or_data end def data @data_wrapper ||= Csv2hash::Main.new(:, file_path_or_data).parse end private def definition Main.generate_definition :my_defintion do set_type { Definition::COLLECTION } set_header_size { 1 } set_structure_rules {{ min_columns: 2, max_columns: 3 }} mapping do cell position: 0, key: 'nickname' cell position: 1, key: 'first_name' cell position: 2, key: 'last_name' end end end end end ``` ### CSV Headers You can define the number of rows to skip in the header of the CSV. ``` set_header_size { 1 } ``` ### Parser and configuration Pasrer can take several parameters like that: ``` definition, file_path_or_data, ignore_blank_line: false ``` in `file_path_or_data` attribute you can pass directly an `Array` of data (`Array` with 2 dimensions) really useful for testing, if you don't care about blank lines in your CSV you can ignore them. ### Response The parser return values wrapper into `DataWrapper Object`, you can call ```.valid?``` method on this Object and grab either data or errors like that : ``` response = parser.parse if response.valid? response.data else response.errors end ``` data or errors are Array, but errors can be formatted on csv format with .to_csv call ``` response.errors.to_csv ``` ## Exception or Not ! You can choose into 2 differents modes of parsing, either **break_on_failure mode** for throw an exception when rule fail or **csv mode** for get csv original data + errors throwing into added extra column. ### On **BREAK_ON_FAILURE MODE** You need call ```.parse()``` with bang ```!``` ### On **CSV MODE** You need call `.parse()` return `data_wrapper` if `.parse()` is invalid, you can code your own behavior: in your code ``` parser = Csv2hash::Main.new(definition, file_path_or_data, ignore_blank_line: false).new response = parser.parse return response if response.valid? # Whatever ``` In the same time Csv2hash call **notify(response)** method when CSV parsing fail, you can add your own Notifier: ``` module Csv2hash module Plugins class Notifier def initialize csv2hash csv2hash.notifier.extend NotifierWithEmail end module NotifierWithEmail def notify response filename = 'issues_errors.csv' tempfile = Tempfile.new [filename, File.extname(filename)] File.open(tempfile.path, 'wb') { |file| file.write response.errors.to_csv } # Send mail with csv file + errors and free resource tempfile.unlink end end end end end ``` Or other implementation ### Errors Format errors is a Array of Hash ``` { y: 1, x: 0, message: 'message', key: 'key', value: '' } ``` ## Sample ### Csv data ``` | Fields | Person Informations | |-------------|----------------------| | Nickname | nil | ``` ### Rule ``` cell position: [1,1], key: 'nickname', allow_blank: false ``` ### Error ``` { y: 1, x: 1, message: 'undefined nikcname on [0, 0]', key: 'nickname', value: nil } ``` ## Personal Validator Rule You can define your own Validator For downcase validation ``` class DowncaseValidator < Csv2hash::ExtraValidator def valid? value !!(value.match /^[a-z]+$/) end end ``` in your rule ``` cell position: [0,0], key: 'name', extra_validator: DowncaseValidator.new, message: 'your data should be written in lowercase only.' ``` Csv data ``` [ [ 'Foo' ] ] ``` # Config file You can defined rules into a yaml file ``` name: 'example' mapping: 'mapping' header_size: 2 structure_rules: { max_columns: 20 } rules: - { position: [1,1], key: 'first_name' } - { position: [2,1], key: 'first_name' } ``` or for complex and verbose rule ``` name: 'example' mapping: 'mapping' header_size: 2 structure_rules: { max_columns: 20 } rules: - !rule position: [1,1] key: 'first_name' - !rule position: [2,1] key: 'last_name' ``` Special attention, if you use `ExtraValidator` you must give only the String name of the classe, like that : ``` extra_validator: DowncaseValidator.new ``` should become ``` extra_validator: 'DowncaseValidator' ``` for autodiscover field you must give String without Regex like that : ``` position: [[0,/LastName/],1] ``` should become ``` position: [[0,'LastName'],1] ``` this change is due to Yaml conversion You can write ERB file, should be named with following convention ```.erb.yml``` ## YamlLoader You can load your definition with YamlLoader llike that : ``` loader = Csv2hash::YamlLoader.load!('config/rules.erb.yml') loader.definition ``` or ``` loader = Csv2hash::YamlLoader.new('config/rules.erb.yml') loader.load! loader.definition ``` # Type conversion By default Csv2hash doesn't convert basic value, but you can activate this on configuration, for Rails use : add file `config/initializers/csv2hash.rb` ``` Csv2hash.configure do |conf| conf.convert = true end ``` ``` rails generate csh2hash:install ``` If you want add your specific conversion # Changes please refere to [CHANGELOG.md](https://github.com/FinalCAD/csv2hash/blob/master/CHANGELOG.md) doc # Upgrading please refere to [UPGRADE.md](https://github.com/FinalCAD/csv2hash/blob/master/UPGRADE.md) doc # Yard doc ``` bundle exec yard --plugin tomdoc ``` ## Contributing 1. Fork it 2. Create your feature branch (`git checkout -b my-new-feature`) 3. Commit your changes (`git commit -am 'Added some feature'`) 4. Push to the branch (`git push origin my-new-feature`) 5. Create new Pull Request