NOTES in rflow-0.0.5 vs NOTES in rflow-1.0.0a1
- old
+ new
@@ -1,5 +1,18 @@
+RFlow starts
+read in DB
+create new shards
+ - Create a set of workers with the shard configuration
+ - each worker creates a set of components
+
+ - create components
+
+
+
+
+
+
RFlow Manager
Components
Input Ports
Output Ports
@@ -18,28 +31,28 @@
Initialize components
Start components running and make sure that they "daemonize" correctly
- place pid files in deployment's run directory
Configure components via zmq
Daemonize self
-
+
class Component
def self.input_port
- end
+ end
def self.output_port
end
attr_accessor :state
def initialize(config, run_directory)
-
+
end
def run
-
+
end
def configure
end
@@ -54,21 +67,21 @@
def initialize(config, run_directory)
# This will initialize the ports
super
# Do stuff to initialize component
- end
+ end
end
Computation Requirements:
Initial startup with:
- management bus connection information
- group and instance UUID
- beacon interval
- - run directory, containing
+ - run directory, containing
- PID files
- log dir + logs
- computation-specific configuration (conf dir)
Needs to process the following messages from mgmt bus:
- CONFIGURE (ports)
@@ -88,43 +101,43 @@
- publish BEACON + state to mgmt bus every (beacon interval) seconds (default to 1 sec)
External Computations:
- Given (out-of-band) startup info (mgmt bus, UUIDs, beacon interval)
- -
+ -
RFlow
- Will need a DB for config
- Initial startup will need to resolve all remaining outstanding items (ports, UUIDs, etc) and store in config DB
- MVC, Mongrel2-like?
Translate
- Need to add <associated type="objtype" name="myname"> where name attr can be used in later XML templates
-
+
----------------
Plugins:
an externally defined plugin needs access to all current data types, as well as being able to define its own and tell the system about that.
- necessary to tell system?
- need a protocol for defining schema transfer
- each message has attached schema
-
+
lib/rflow/message.rb
RFlow::Config
RFlow::Management
- Somewhere for external people to register new computations with running system
- computation says that its running and asks for Connection configuration
- how will it specify where in the workflow it wants to run????
-
+
RFlow::Message(complete on-the-wire Avro message format)
data_type, provenance, external_ids, empty, data (see below)
RFlow::Data::(various message data blocks)
@@ -140,25 +153,25 @@
RFlow::Connection::AMQP
will manage connections to an AMQP server
RFlow::Connection::ZMQ
-
+
computation_a.output_port -> (connection.incoming -> connection.outgoing) -> computation_b.input_port
AMQP::Topic - responsible for setting up a topic -> queue binding
r.incoming = amqp connection, channel, vhost, login, password, topic
r.outgoing = amqp connection, channel, vhost, login, password, queue name
behavior -> n x m, "round-robin" among the connected outgoing
incoming behavior will need to set topic/key, uses the data type in the RFlow::Message
-
+
ZMQ::PubSub - device-less, responsible for assigning ip/port and assigning one client to bind the port
r.incoming = zmq connection string (tcp://ip:port), type pub
r.outgoing = zmq connection string (tcp://ip:port), type sub
- behavior -> n x m, broadcast sending,
+ behavior -> n x m, broadcast sending,
ZMQ::PushPull - device-less, responsible for assigning ip/port and assigning one client to bind the port
r.incoming = zmq connection string (tcp://ip:port), type push
r.outgoing = zmq connection string (tcp://ip:port), type pull