h1. Queues

There are a number of important queues that we employ to track data:

# queues - responsible for pushing/popping jobs. Only stores a unique id. Since ids are constantly pushed and popped, this is a fairly "volatile" dataset.
# data-store:[name] - hash responsible for the different types of job data, stored by job id. This data is persisted until a job succeeds or fails too many times; then the job will be dropped and the failure recorded.
# result-store:[name] - hash responsible for storing the results by job id. Data will remain here until removed, and can be fetched multiple times if desired, providing a cached retrieval.
# failure-store:[name] - hash responsible for storing any failed jobs. Stored until manually cleared.

In addition there is a unique id counter that gets incremented as the job queue grows. It is stored at "unique_id".

On top of these queues, there is also a set of stores set up to keep track of workers and of what is actively being worked:

# workers - set that tracks all the registered workers. Any worker not in this list should be destroyed.
# working - set of workers that are actively working a job.

h1. Jobs

I decided to make jobs serve a twofold purpose. One, a job is the storage mechanism through which work is placed on the queue for later execution by workers, and two, it actually performs the job execution, even though the worker is the one that calls it. This way all job-related tasks can be abstracted into the job class, while the worker busies itself with controlling the job execution rather than with the execution itself. Jobs will also be responsible for storing the state of the result and any failures that come from the job execution.

A job can only live on one queue. A job will be tried 3 times before being taken out of the work queue and placed in the failure queue.

h1. Workers

There was a big struggle in the beginning to decide how to work queued jobs. I originally wanted to do a more cooperative scheduling scheme, but found that it would not end up taking multiple cores into consideration. Moreover, it is wicked hard to preempt fibers in a meaningful way without digging really deep into other code as well in order to retrofit it for Fibers or EventMachine. So I simply didn't. Instead I went with Ruby's Process library and built the design around a Unix-style processing queue, so that in one fell swoop we could use multiple cores as well as gain true concurrency.

One more design decision is that we will not attempt to run multiple processes per worker. This way we can control the number of running processes by how many workers we choose to run, rather than by how many processes a worker is allowed to spawn. This presents a slight problem, however: each worker will involve two processes, one parent/control-loop and one child/job-processor. This means that in the end we have more processes running at once, possibly chewing up more resources than might ultimately be necessary. This will hopefully be overcome by suspending the parent worker while the child process runs, so that a core is not chewed up by an essentially idle process. Hopefully this can be benchmarked to figure out whether it's faster to run one-to-many parent/children or one-to-one.

A worker can work multiple queues. (A sketch of how the queue keys, the three-attempt rule and the fork-per-job loop might fit together appears at the end of this document.)

h1. Web Interface

Intended to have a web frontend to control/observe workers, jobs and queues.

h1. Job Observer Widget

Intended to have a JS widget that will indicate when a job has been computed and the results are ready, or a job has failed.
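However that widget ends up being delivered, "is my job done yet?" should reduce to a couple of hash reads against result-store:[name] and failure-store:[name]. A minimal sketch in Ruby, assuming the redis-rb gem and a hypothetical job_status helper that is not part of the project:

<pre>
require 'redis'   # assumes the redis-rb client gem

# Hypothetical helper: a job is "ready" once its id appears in the result store,
# "failed" once it appears in the failure store, and otherwise still pending.
def job_status(redis, queue, job_id)
  if (result = redis.hget("result-store:#{queue}", job_id))
    [:ready, result]
  elsif redis.hexists("failure-store:#{queue}", job_id)
    [:failed]
  else
    [:pending]
  end
end

redis = Redis.new
p job_status(redis, 'emails', 42)   #=> [:pending] until a worker finishes the job
</pre>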
This could be a good application for a Node.js implementation (using something like http://github.com/fictorial/redis-node-client), and should be easily embeddable in any page for simple notification.
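To tie the sections above together, here is a rough sketch of the enqueue/work cycle: the unique_id counter and data-store hash on the enqueue side, and a fork-per-job worker that registers in the workers/working sets, suspends in Process.wait while the child runs, and moves a job to failure-store after 3 failed attempts. This is only an illustration of the layout described in this document; the class and method names (Job, Worker#work, perform) and the per-queue list key "queues:[name]" are assumptions, not the project's actual API.

<pre>
require 'redis'    # assumes the redis-rb client gem
require 'socket'
require 'json'

# Hypothetical sketch -- only the data-store/result-store/failure-store hashes,
# the unique_id counter and the workers/working sets come from the design notes.
class Job
  ATTEMPTS = 3   # a job is tried 3 times before landing in the failure store

  def self.enqueue(redis, queue, payload)
    id = redis.incr('unique_id')                            # global job id counter
    redis.hset("data-store:#{queue}", id, payload.to_json)  # job data lives here...
    redis.rpush("queues:#{queue}", id)                      # ...the queue holds only the id
    id
  end
end

class Worker
  def initialize(redis, *queues)
    @redis, @queues = redis, queues      # a worker can work multiple queues
    @redis.sadd('workers', id)           # register so stray workers can be spotted
  end

  def id
    "#{Socket.gethostname}:#{Process.pid}"
  end

  def work
    loop do
      queue, job_id = reserve
      unless job_id
        sleep 1                          # nothing queued; back off briefly
        next
      end
      @redis.sadd('working', id)
      run(queue, job_id)
      @redis.srem('working', id)
    end
  end

  private

  # Pop the next job id off the first non-empty queue this worker watches.
  def reserve
    @queues.each do |q|
      job_id = @redis.lpop("queues:#{q}")   # exact key naming is a guess
      return [q, job_id] if job_id
    end
    nil
  end

  # Parent forks a child to do the actual work, then suspends in Process.wait
  # so the control loop stays off the CPU while the job runs.
  def run(queue, job_id, attempt = 1)
    payload = @redis.hget("data-store:#{queue}", job_id)
    pid = Process.fork do
      child = Redis.new                  # fresh connection; sharing a socket across fork is unsafe
      result = perform(JSON.parse(payload))
      child.hset("result-store:#{queue}", job_id, result.to_json)
      child.hdel("data-store:#{queue}", job_id)
      exit! 0
    end
    Process.wait(pid)
    return if $?.success?
    if attempt < Job::ATTEMPTS
      run(queue, job_id, attempt + 1)    # failed: try again, up to 3 attempts total
    else
      @redis.hset("failure-store:#{queue}", job_id, payload)  # give up and record the failure
      @redis.hdel("data-store:#{queue}", job_id)
    end
  end

  # Placeholder: in the real design the Job class itself carries the execution logic.
  def perform(args)
    args
  end
end

# Usage sketch:
#   Job.enqueue(Redis.new, 'emails', 'to' => 'someone@example.com')
#   Worker.new(Redis.new, 'emails').work
</pre>

Note how the parent does nothing but wait while the child holds the job, which is the behaviour the Workers section is counting on to keep the control loop off the CPU.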