Sha256: 6583bbd3ccae122aaeefed2828e5296fff77f25c973dee40dab161f75d468803

Contents

--
-- This doesn't work yet; these are just notes on how the storefunc might look.
--


--
-- Right now the ElasticSearchOutputFormat gets all its options from the
-- Job object. We can use the call to setStoreLocation in the storefunc
-- to set the required parameters. Need to make sure the following are
-- set:
-- 
-- wonderdog.index.name  - should be set by the storefunc constructor
-- wonderdog.bulk.size   - should be set by the storefunc constructor
-- wonderdog.field.names - should be set by the call to checkSchema
-- wonderdog.id.field    - should be set by the storefunc constructor
-- wonderdog.object.type - should be set by the storefunc constructor
-- wonderdog.plugins.dir - should be set by the call to setStoreLocation
-- wonderdog.config      - should be set by the call to setStoreLocation
--
-- FIXME: options used in the ElasticSearchOutputFormat should NOT be
-- namespaced with 'wonderdog'

%default INDEX 'es_index'
%default OBJ   'text_obj'
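
-- Until the storefunc sets these itself, the constructor-level options listed
-- above could in principle be supplied by hand. This is a rough sketch only,
-- assuming Pig passes arbitrary SET keys through to the job configuration that
-- ElasticSearchOutputFormat reads. The bulk size and id field values simply
-- mirror the ones used in the STORE example further down.
SET wonderdog.index.name  '$INDEX';
SET wonderdog.object.type '$OBJ';
SET wonderdog.bulk.size   '1000';
SET wonderdog.id.field    '-1';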

        
records         = LOAD '$DATA'   AS (text_field:chararray);
records_with_id = LOAD '$IDDATA' AS (id_field:int, text_field:chararray);

-- Here we would use the Elasticsearch index name and object type as the URI,
-- pass a comma-separated list of field names as the first arg, the id field
-- as the second arg, and the bulk size as the third,
-- and so on for any further options.
STORE records INTO '$INDEX/$OBJ' USING ElasticSearchStorage('my_text_field', '-1', '1000');
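
-- A further sketch for the records_with_id relation loaded above, using the
-- same hypothetical ElasticSearchStorage signature. Assumption (not settled
-- anywhere in these notes): the second arg is the zero-based position of the
-- id field within the field list, with '-1' meaning "no id field".
STORE records_with_id INTO '$INDEX/$OBJ' USING ElasticSearchStorage('id_field,text_field', '0', '1000');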


-- It would be really nice, though, to duplicate what WonderDog.java does:
-- should a bulk request fail, the failed records are written to HDFS. The
-- user should have some control over this. It should also be possible to
-- generate the field names directly from the Pig schema. (We'd have to be VERY
-- explicit in the docs about this, as it would be a point of
-- headscratching/swearing...) In this case we might have something like:
named_records = FOREACH records GENERATE text_field AS text_field_name;
STORE named_records INTO '/path/to/failed_requests' USING ElasticSearchStorage('$INDEX/$OBJ', '-1', '1000');

Version data entries

5 entries across 5 versions & 1 rubygems

Version Path
wonderdog-0.2.0 notes/pigstorefunc.pig
wonderdog-0.1.1 notes/pigstorefunc.pig
wonderdog-0.1.0 notes/pigstorefunc.pig
wonderdog-0.0.2 notes/pigstorefunc.pig
wonderdog-0.0.1 notes/pigstorefunc.pig