### Contents

 * [Introduction](./_introduction.md)
 * [The Basic API](./basic_api.md)
 * [**Batch Processing**](./batch_processing.md)
 * [Configuration Options](./options.md)
 * [Row and Column Separators](./row_col_sep.md)
 * [Header Transformations](./header_transformations.md)
 * [Header Validations](./header_validations.md)
 * [Data Transformations](./data_transformations.md)
 * [Value Converters](./value_converters.md)

--------------

# Batch Processing

Processing CSV data in batches (chunks) allows you to parallelize the workload of importing data.
This can come in handy when you don't want to slow down the import of large CSV files.
The option `chunk_size` sets the maximum batch size.

## Example 1: How SmarterCSV processes CSV files as chunks, returning arrays of hashes:

Please note how the returned array contains two sub-arrays, one for each chunk that was read, with each chunk containing 2 hashes.
If the number of rows is not cleanly divisible by `:chunk_size`, the last chunk contains fewer hashes.

```ruby
> pets_by_owner = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}})
 => [ [ {:first=>"Dan", :last=>"McAllister", :dogs=>"2"},
        {:first=>"Lucy", :last=>"Laweless", :cats=>"5"}
      ],
      [ {:first=>"Miles", :last=>"O'Brian", :fish=>"21"},
        {:first=>"Nancy", :last=>"Homes", :dogs=>"2", :birds=>"1"}
      ]
    ]
```

## Example 2: How SmarterCSV processes CSV files as chunks, and passes arrays of hashes to a given block:

Please note how the given block is passed the data for each chunk as its parameter (an array of hashes),
and how the `process` method returns the number of chunks when called with a block.

```ruby
> total_chunks = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 2, :key_mapping => {:first_name => :first, :last_name => :last}}) do |chunk|
    chunk.each do |h|   # you can post-process the data from each row to your heart's content, and also create virtual attributes:
      h[:full_name] = [h[:first], h[:last]].join(' ')  # create a virtual attribute
      h.delete(:first) ; h.delete(:last)               # remove two keys
    end
    puts chunk.inspect  # we could at this point pass the chunk to a Resque worker..
  end

[{:dogs=>"2", :full_name=>"Dan McAllister"}, {:cats=>"5", :full_name=>"Lucy Laweless"}]
[{:fish=>"21", :full_name=>"Miles O'Brian"}, {:dogs=>"2", :birds=>"1", :full_name=>"Nancy Homes"}]
 => 2
```

## Example 3: Populate a Database in Chunks of 100 Records with SmarterCSV:

```ruby
# using chunks:
filename = '/tmp/some.csv'
options = {:chunk_size => 100, :key_mapping => {:unwanted_column => nil, :old_column_name => :new_name}}
n = SmarterCSV.process(filename, options) do |chunk|
  # we pass a block in, which receives one chunk at a time (an array of hashes)
  # when chunking is enabled, each chunk contains up to :chunk_size hashes
  MyModel.insert_all( chunk )  # insert up to 100 records at a time
end
# => n is the number of chunks we processed
```
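Because each chunk arrives as a plain array of hashes, it can also be handed off to a background job instead of being inserted inline, which is how the parallelization mentioned above is typically achieved. Below is a minimal sketch using Resque; `PetsImportWorker` is a hypothetical worker class and `MyModel` a placeholder model, so adapt the names (the same pattern works with Sidekiq or ActiveJob).

```ruby
require 'resque'

# Hypothetical worker class -- one job per chunk, processed by any
# number of Resque workers running in parallel.
class PetsImportWorker
  @queue = :csv_import

  def self.perform(chunk)
    # NOTE: Resque serializes job arguments as JSON, so the hash keys
    # arrive here as strings, not symbols.
    MyModel.insert_all(chunk)
  end
end

n = SmarterCSV.process('/tmp/pets.csv', {:chunk_size => 100}) do |chunk|
  Resque.enqueue(PetsImportWorker, chunk)  # enqueue one background job per chunk
end
puts "enqueued #{n} import jobs"
```

Enqueueing one job per chunk keeps each job's payload small and lets several workers import different parts of the file concurrently.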
----------------

PREVIOUS: [The Basic API](./basic_api.md) | NEXT: [Configuration Options](./options.md)