NOTES in rflow-0.0.5 vs NOTES in rflow-1.0.0a1

- old
+ new

@@ -1,5 +1,18 @@ +RFlow starts +read in DB +create new shards + - Create a set of workers with the shard configuration + - each worker creates a set of components + + - create components + + + + + + RFlow Manager Components Input Ports Output Ports @@ -18,28 +31,28 @@ Initialize components Start components running and make sure that they "daemonize" correctly - place pid files in deployment's run directory Configure components via zmq Daemonize self - + class Component def self.input_port - end + end def self.output_port end attr_accessor :state def initialize(config, run_directory) - + end def run - + end def configure end @@ -54,21 +67,21 @@ def initialize(config, run_directory) # This will initialize the ports super # Do stuff to initialize component - end + end end Computation Requirements: Initial startup with: - management bus connection information - group and instance UUID - beacon interval - - run directory, containing + - run directory, containing - PID files - log dir + logs - computation-specific configuration (conf dir) Needs to process the following messages from mgmt bus: - CONFIGURE (ports) @@ -88,43 +101,43 @@ - publish BEACON + state to mgmt bus every (beacon interval) seconds (default to 1 sec) External Computations: - Given (out-of-band) startup info (mgmt bus, UUIDs, beacon interval) - - + - RFlow - Will need a DB for config - Initial startup will need to resolve all remaining outstanding items (ports, UUIDs, etc) and store in config DB - MVC, Mongrel2-like? Translate - Need to add <associated type="objtype" name="myname"> where name attr can be used in later XML templates - + ---------------- Plugins: an externally defined plugin needs access to all current data types, as well as being able to define its own and tell the system about that. - necessary to tell system? - need a protocol for defining schema transfer - each message has attached schema - + lib/rflow/message.rb RFlow::Config RFlow::Management - Somewhere for external people to register new computations with running system - computation says that its running and asks for Connection configuration - how will it specify where in the workflow it wants to run???? - + RFlow::Message(complete on-the-wire Avro message format) data_type, provenance, external_ids, empty, data (see below) RFlow::Data::(various message data blocks) @@ -140,25 +153,25 @@ RFlow::Connection::AMQP will manage connections to an AMQP server RFlow::Connection::ZMQ - + computation_a.output_port -> (connection.incoming -> connection.outgoing) -> computation_b.input_port AMQP::Topic - responsible for setting up a topic -> queue binding r.incoming = amqp connection, channel, vhost, login, password, topic r.outgoing = amqp connection, channel, vhost, login, password, queue name behavior -> n x m, "round-robin" among the connected outgoing incoming behavior will need to set topic/key, uses the data type in the RFlow::Message - + ZMQ::PubSub - device-less, responsible for assigning ip/port and assigning one client to bind the port r.incoming = zmq connection string (tcp://ip:port), type pub r.outgoing = zmq connection string (tcp://ip:port), type sub - behavior -> n x m, broadcast sending, + behavior -> n x m, broadcast sending, ZMQ::PushPull - device-less, responsible for assigning ip/port and assigning one client to bind the port r.incoming = zmq connection string (tcp://ip:port), type push r.outgoing = zmq connection string (tcp://ip:port), type pull