in wukong-load-0.0.2 vs in wukong-load-0.1.0
- old
+ new
@@ -1,9 +1,9 @@
# Wukong-Load
This Wukong plugin makes it easy to load data from the command-line
-into various.
+into various data stores.
It is assumed that you will independently deploy and configure each
data store yourself (but see
[Ironfan]( Once you've
done that, and once you've written some dataflows with
@@ -17,11 +17,11 @@
## Installation & Setup
Wukong-Load can be installed as a RubyGem:
-$ sudo gem install wukong-hadoop
+$ sudo gem install wukong-load
## Usage
Wukong-Load provides a command-line program `wu-load` you can use to
@@ -37,58 +37,39 @@
$ wu-load store_name --help
Further details will depend on the data store you're writing to.
-### Elasticsearch Usage
+### Expected Input
+All input to `wu-load` should be newline-separated, JSON-formatted,
+hash-like records. For some data stores, keys in the record may be
+interpreted as metadata about the record or about how to route the
+record within the data store.
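As a rough illustration of this input format, the Ruby sketch below prints one compact JSON object per line. The records and field names are made up for the example, and the script name `make_records.rb` in the usage note is hypothetical too. Only Ruby's standard `json` library is assumed.

```ruby
require 'json'

# Hypothetical sample records; the field names are illustrative only.
records = [
  { 'ISBN' => '0553573403', 'title' => 'A Game of Thrones', 'author' => 'George R. R. Martin' },
  { 'title' => "A title with\nan embedded newline" },
]

# JSON.generate emits compact JSON and escapes newlines inside strings,
# so each record occupies exactly one line of output.
lines = records.map { |record| JSON.generate(record) }
puts lines
```

The output could then be piped to a loader, for example `ruby make_records.rb | wu-load elasticsearch`.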
+## Elasticsearch Usage
Lets you load JSON-formatted records into an
[Elasticsearch]( database. See full
options with
$ wu-load elasticsearch --help
-#### Expected Input
+### Connecting
-All input to `wu-load` should be newline-separated, JSON-formatted,
-hash-like record. Some keys in the record will be interpreted as
-metadata about the record or about how to route the record within the
-database but the entire record will be written to the database
+`wu-load` tries to connect to an Elasticsearch server at a default
+host (localhost) and port (9200). You can change these:
-A (pretty-printed for clarity -- the real record shouldn't contain
-newlines) record like
- "_index": "publications"
- "_type": "book",
- "ISBN": "0553573403",
- "title": "A Game of Thrones",
- "author": "George R. R. Martin",
- "description": "The first of half a hundred novels to come out since...",
- ...
-might use the `_index` and `_type` fields as metadata but the
-**whole** record will be written.
-#### Connecting
-`wu-load` has a default host (localhost) and port (9200) it tries to
-connect to but you can change these:
$ cat data.json | wu-load elasticsearch --host= --port=80
All queries will be sent to this address.
-#### Routing
+### Routing
Elasticsearch stores data in several *indices* which each contain
*documents* of various *types*.
`wu-load` loads each document into the default index (`wukong`) and type
@@ -96,16 +77,101 @@
$ cat data.json | wu-load elasticsearch --host= --index=publication --es_type=book
-##### Creates vs. Updates
+A record with an `_index` or `_es_type` field will override these
+default settings. You can change the names of the fields used.
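A minimal sketch of per-record routing, assuming the default field names `_index` and `_es_type` mentioned above: it tags one record with its own destination and leaves another to fall back on the command-line settings. The helper `with_routing` and the sample data are hypothetical, not part of Wukong-Load.

```ruby
require 'json'

# Hypothetical helper: returns a copy of the record with routing fields added.
# Arguments left as nil are omitted, so the loader's defaults would apply.
def with_routing(record, index: nil, es_type: nil)
  routing = { '_index' => index, '_es_type' => es_type }.compact
  record.merge(routing)
end

book = with_routing({ 'title' => 'A Game of Thrones' }, index: 'publication', es_type: 'book')
note = with_routing({ 'text' => 'no routing fields here' })

[book, note].each { |record| puts JSON.generate(record) }
```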
+### Creates vs. Updates
If an input document contains a value for the field `_id` then that
value will be used as the ID of the record when written, possibly
overwriting a record that already exists -- an update.
You can change the field you use for the Elasticsearch ID property:
$ cat data.json | wu-load elasticsearch --host= --index=media --es_type=books --id_field="ISBN"
+## Kafka Usage
+Lets you load JSON-formatted records into a
+[Kafka]( queue. See full options with
+$ wu-load kafka --help
+### Connecting
+`wu-load` tries to connect to a Kafka broker at a default host
+(localhost) and port (9092). You can change these:
+$ cat data.json | wu-load kafka --host= --port=1234
+All records will be sent to this address.
+### Routing
+Kafka stores data in several named *queues* (which Kafka itself calls
+*topics*). Each queue can have several numbered *partitions*.
+`wu-load` loads each record into the default queue (`test`) and
+partition (0), but you can change these:
+$ cat data.json | wu-load kafka --host= --topic=messages --partition=6
+A record with a `_topic` or `_partition` field will override these
+default settings. You can change the names of the fields used.
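One way to use the `_topic` and `_partition` fields is to pick a partition per record before loading. The sketch below is an example built on assumptions rather than anything Wukong-Load does itself: the helper `route_to_kafka`, the `user` key, and the partition count of 8 are all invented for illustration. It relies only on Ruby's standard `json` and `zlib` libraries.

```ruby
require 'json'
require 'zlib'

NUM_PARTITIONS = 8 # assumed partition count for the target queue

# Hypothetical helper: records sharing the same key always get the same
# partition, because the CRC32 of a given string is stable across runs.
def route_to_kafka(record, key_field, topic)
  partition = Zlib.crc32(record.fetch(key_field).to_s) % NUM_PARTITIONS
  record.merge('_topic' => topic, '_partition' => partition)
end

events = [
  { 'user' => 'alice', 'action' => 'login' },
  { 'user' => 'bob',   'action' => 'login' },
  { 'user' => 'alice', 'action' => 'logout' },
]

routed = events.map { |event| route_to_kafka(event, 'user', 'messages') }
routed.each { |record| puts JSON.generate(record) }
```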
+## MongoDB Usage
+Lets you load JSON-formatted records into a
+[MongoDB]( database. See full options with
+$ wu-load mongodb --help
+### Connecting
+`wu-load` tries to connect to a MongoDB server at a default host
+(localhost) and port (27017). You can change these:
+$ cat data.json | wu-load mongodb --host= --port=1234
+All queries will be sent to this address.
+### Routing
+MongoDB stores *documents* in several *databases* which each contain
+*collections*.
+`wu-load` loads each document into the default database (`wukong`) and
+collection (`streaming_record`), but you can change these:
+$ cat data.json | wu-load mongodb --host= --database=publication --collection=book
+A record with a `_database` or `_collection` field will override these
+default settings. You can change the names of the fields used.
+### Creates vs. Updates
+If an input document contains a value for the field `_id` then that
+value will be used as the ID of the record when written, possibly
+overwriting a record that already exists -- an update.
+You can change the field you use for the MongoDB ID property:
+$ cat data.json | wu-load mongodb --host= --database=media --collection=books --id_field="ISBN"
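Because a repeated ID turns a create into an update, it can be worth checking a file for duplicate IDs before loading it. The following is a small sketch of such a check. The helper `duplicate_ids` and the sample lines are hypothetical, and the `id_field` argument simply mirrors the `--id_field` option shown above.

```ruby
require 'json'

# Hypothetical pre-flight check: counts how often each ID occurs and
# returns only the IDs seen more than once. Records lacking the ID
# field are ignored.
def duplicate_ids(lines, id_field = '_id')
  counts = Hash.new(0)
  lines.each do |line|
    id = JSON.parse(line)[id_field]
    counts[id] += 1 unless id.nil?
  end
  counts.select { |_id, count| count > 1 }
end

sample = [
  '{"ISBN": "0553573403", "title": "A Game of Thrones"}',
  '{"ISBN": "0553573403", "title": "A Game of Thrones (reissue)"}',
  '{"title": "A record without an ISBN"}',
]

dupes = duplicate_ids(sample, 'ISBN')
puts JSON.generate(dupes)
```

Any ID reported here would be written more than once, so every write after the first would be an update rather than a create.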