{Previous tutorial}[link:files/doc/tutorials/04-EventPropagation_rdoc.html]
{Next tutorial}[link:files/doc/tutorials/06-Overview_rdoc.html]

= Representing and handling errors

In robotics, and in plan execution in particular, Murphy's law applies quite well. One of the main reasons is that the models used for planning (and therefore the plans built from them) are (i) too simple to completely reflect reality, (ii) badly parametrized and (iii) represent dynamic agents, which may themselves take decisions. The rule of thumb is therefore that a plan will fail during its execution.

Because Roby represents and executes all the activities of a given system, its representation of errors is very powerful: when an error appears somewhere, it is quite easy to determine what its consequences are.

This tutorial shows:
* how parts of the error conditions are encoded in the task structure.
* how exceptions that come from the code itself (like NoMethodError ...) are handled.

== Where do errors come from ?

=== Task structure as a constraint representation

Some (not all) task relations also define a set of constraints on the plan execution. For instance, the +realized_by+ relation defines a set of _desirable_ and a set of _forbidden_ events (the +success+ and +failure+ options of TaskStructure#realized_by). If none of the desirable events are reachable (i.e. none will _ever_ be emitted, see Roby::EventGenerator#unreachable?), or if one of the forbidden events is emitted, a ChildFailedError error is generated.

For instance, in the first tutorial, an error was raised because the +failed+ event of ComputePath had been emitted while ComputePath was a child of MoveTo:

  $ scripts/shell
  >> move_to! :x => 10, :y => 10
  => MoveTo{goal => Vector3D(x=10.000000,y=10.000000,z=0.000000)}:0x48350370[]
  >>
  !Roby::ChildFailedError
  !at [336040:01:45.419/186] in the failed event of ComputePath:0x483502e0
  !block not supplied (ArgumentError)
  ! /home/doudou/dev/roby/lib/roby/thread_task.rb:51:in `instance_eval',
  ! /home/doudou/dev/roby/lib/roby/thread_task.rb:61:in `value',
  ! /home/doudou/dev/roby/lib/roby/thread_task.rb:61:in the polling handler,
  ! /home/doudou/system/powerpc-linux/ruby-1.8.6/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require',
  ! /home/doudou/system/powerpc-linux/ruby-1.8.6/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require',
  ! scripts/run:3
  !
  !The failed relation is
  ! MoveTo:0x48350370
  !   owners: Roby::Distributed
  !   arguments: {:goal=>Vector3D(x=10.000000,y=10.000000,z=0.000000)}
  ! realized_by ComputePath:0x483502e0
  !   owners: Roby::Distributed
  !   arguments: {:max_speed=>1.0,
  !    :goal=>Vector3D(x=10.000000,y=10.000000,z=0.000000)}
  !The following tasks have been killed:
  ! ComputePath:0x483502e0
  ! MoveTo:0x48350370

In the case of the PlannedBy relation that we saw in the previous tutorial, the error is that no plan can be found. A PlanningFailedError is generated in that case.

Those two types of error have in common that the error can be associated with one of the plan objects (event or task). They are localized errors and are subclasses of Roby::LocalizedError. The nice aspect of this is that their impact on the plan execution can be assessed. It is therefore possible to handle the error at the plan level and continue executing what can be executed.

=== Errors generated by the code itself

Here, the problem is no longer about plan-specific errors. It is about handling errors that appear because of bugs in the code itself. Roby's code is split into two parts, the framework code and the user code.

The framework code is the really problematic one.
An error there means that there is a bug in the execution engine itself. In that case, Roby tries to shut down as cleanly as possible by killing all tasks that are being executed.

The user code is the part of the code which is tied to events and tasks: event commands, event handlers, polling blocks. For those, it is possible to generate a Roby::LocalizedError as in the previous case and to handle the error at the plan level. Failed event commands generate a Roby::CommandFailedError, failed event handlers a Roby::EventHandlerError. Polling blocks emit +failed+ with the poller exception as context (see the error message above).

Let's try. Add the following event handler in the definition of MoveTo (tasks/move_to.rb).

  on :start do
    raise
  end

Start (or restart) the controller and launch a move_to! action in the shell. The following should happen:

  !Roby::EventHandlerError: user code raised an exception at [336641:28:04.607/23] in the start event of MoveTo:0x2b4330b4fae8
  !
  !
  ! (RuntimeError)
  !./tasks/move_to.rb:10:in event handler for 'start',
  ! /home/joyeux/system/rubygems/lib/rubygems/custom_require.rb:27:in `gem_original_require',
  ! /home/joyeux/system/rubygems/lib/rubygems/custom_require.rb:27:in `require',
  ! scripts/run:3
  !The following tasks have been killed:
  ! MoveTo:0x2b4330b4fae8

An equivalent thing would happen with a task-level event handler (i.e. one defined on the task object instead of the task model). Remove the model-level handler we just added and try adding the following to the planning method in planners/PathPlan/main.rb. Execute, and see the result !

  move.on :start do
    raise
  end

Now, what happens during execution: how does Roby react to that error ? The relation display shows the following two successive steps (don't forget to uncheck View/Hide finalized).
link:../../images/replay_handler_error_0.png
link:../../images/replay_handler_error_1.png

From Roby's point of view, the event has already happened when the event handlers are called. Therefore, the event propagation should go on (the temporal structure is well-formed). However, an error occurred and has not been handled, so the MoveTo task cannot be kept running. Stopping it is the job of the garbage collection process, which queues the 'stop' event, to be executed during the next cycle. The MoveTo task is therefore stopped at the next cycle, and the tasks that are now useless are also stopped.

For event commands, it all depends on where the exception actually appears. If 'emit' has already been called, then the event will be emitted and propagated. Otherwise, the event command counts as cancelled.

== Handling errors

Now that we have seen how errors are detected and represented, we can tackle the problem of handling them. There are three ways to do that in Roby, presented in the following sections.

As we saw in the fifth tutorial, the forwarding relation represents an event generalization (the target represents a superset of the situations represented by the source). This makes it possible to represent fault modes, i.e. specific fault situations that are classified through the forwarding relation (see figure below). The target of these forwarding relations is, of course, the +failed+ event. This is used during error handling to generalize the event handlers: an event handler which applies to a given erroneous situation also applies to all the situations that are subsets of it.

link:../../images/task_event_generalization.png

*Example*: the +blocked+ event is a particular fault mode during the movement. More complex forwarding networks would make it possible to represent the relationships between the different types of faults recognized by the system.
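
The generalization rule described above can be sketched in plain Ruby. This is only an illustration and does not use Roby's API: the +FaultModel+ class, the +wheel_slip+ event and the handler names are hypothetical, introduced here to show how a handler attached to a general fault event also applies to the more specific events that forward to it.

```ruby
# Plain-Ruby model of event generalization. This is NOT Roby's API:
# FaultModel, :wheel_slip and the handler names are hypothetical.
class FaultModel
  def initialize
    @forwards = Hash.new { |h, k| h[k] = [] } # source event => target events
    @handlers = Hash.new { |h, k| h[k] = [] } # event => handler names
  end

  # Declares that +source+ is a special case of +target+.
  def forward(source, target)
    @forwards[source] << target
    self
  end

  # Attaches a (named) handler to +event+.
  def on(event, handler_name)
    @handlers[event] << handler_name
    self
  end

  # Returns +event+ followed by every event reachable through forwarding,
  # most specific first (breadth-first traversal).
  def generalizations(event)
    seen = [event]
    queue = [event]
    until queue.empty?
      current = queue.shift
      @forwards.fetch(current, []).each do |target|
        next if seen.include?(target)
        seen << target
        queue << target
      end
    end
    seen
  end

  # Handlers that apply to +event+: its own, then those of its generalizations.
  def handlers_for(event)
    generalizations(event).flat_map { |e| @handlers.fetch(e, []) }
  end
end

model = FaultModel.new
model.forward(:blocked, :failed)     # as in the example above
model.forward(:wheel_slip, :blocked) # hypothetical, more specific fault mode
model.on(:failed, :log_failure)
model.on(:blocked, :back_off)

p model.handlers_for(:wheel_slip) # => [:back_off, :log_failure]
```

In this sketch, a handler registered on +failed+ is found for every fault mode, while a handler registered on +blocked+ is only found for +blocked+ and the situations that are subsets of it.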
=== Repairing during events propagation

If a child fails, for instance because of a spurious problem, it is possible to restart the failing child directly in the event handler of 'failed' and to replace the failed task with the new one. This is as simple as:

  on(:failed) do
    plan.respawn(self)
  end

Let's try it. Add the following to the definition of +TrackPath+ to simulate an error:

  attr_accessor :should_pass
  event :start do
    if !should_pass
      forward :start, self, :failed, :delay => 0.2
    end
    emit :start
  end

What the event command does is schedule a delayed forwarding of 0.2 seconds if #should_pass is false (the default). +failed+ will therefore be emitted 0.2 seconds after the path tracking has been started, if +should_pass+ is false.

Then, the error handler itself:

  on :failed do
    if !should_pass
      Robot.info "respawning ..."
      new_task = plan.respawn(self)
      new_task.should_pass = true
    end
  end

This handler replaces the failed TrackPath with a copy of itself and schedules it for starting. Then, we set +should_pass+ to true on the copy to avoid further errors. Look at the relation display to see how it worked.

Note that, in general, doing such a thing on the +failed+ event is a bad idea, as +failed+ is also emitted when the task gets interrupted.

The next figure is an example of how it works on a real robot. As a workaround for a spurious error in the +TrackSpeedStart+ task, known to be harmless, an event handler is defined on this task model, which restarts the task online.

link:../../images/repair_event_propagation.png

=== Asynchronous repairs

Sometimes, repairing the plan requires a few actions. While those actions are performed, we do not know yet whether the plan *can* be repaired or not, only that the necessary measures are being taken to assess it and/or repair it.

In Roby's plans, asynchronous repairs are represented as plan repairs (Roby::Plan#add_repair). Plan repairs are tasks which are associated with a task's event.
While the plan repair is running, errors whose failure point is the associated event are simply ignored by the system. Once the task has finished, normal error detection and handling resume.

To automate the process of installing plan repairs, an ErrorHandling relation exists, which defines the set of possible plan repairs for a given task and event. Roby::TaskEventGenerator#handle_with makes it easy to add a new plan repair by associating the receiving task event with the (pending) task.

Here is a simple example: http://roby.rubyforge.org/videos/rflex_repaired.avi. In this video, the microcontroller which drives the robot's motors can give us spurious BRAKES_ON messages. Our problem is that the Roby controller must determine whether the message is spurious, or whether the brakes are actually set, by means of an emergency switch for instance. To do that, an error handler is set up, which waits for a few seconds and tests the BRAKES_ON state of the robot. If the brakes are reported as off, then the robot can start moving again. Otherwise, the error was a rightful one and should be handled by other means.

Let's simulate the same kind of problem in the PathPlan controller. What we will do is the following:
* add a 'blocked' fault event to the model of TrackPath, and make the polling block of TrackPath emit 'blocked' randomly.
* have a 'repair' task wait 2 seconds and either (randomly) respawn the path tracking after those two seconds, or emit +failed+.

The first point is done by adding the following to the definition of TrackPath:

  event :blocked
  forward :blocked => :failed

and then those three lines to the polling block:

  if rand < 0.05
    emit :blocked
  end

A new RepairTask model has to be added.
Open tasks/repair_task.rb and add the following

  class RepairTask < Roby::Task
    terminates
    event :start do
      Robot.info "repair will succeed in 2 seconds"
      forward :start, self, :success, :delay => 2
      emit :start
    end
    on :success do
      plan.respawn(failed_task)
    end
  end

Finally, the repair handler must be added to the plan. Edit the +move_to+ method in planners/PathPlan/main.rb and add the following line before the last line of the method:

  track.event(:blocked).handle_with(RepairTask.new)

Run as usual and see what happens ...

Another, more complex example is the "P3d repaired" video presented {here}[files/doc/videos_rdoc.html]

=== Exception propagation

This is the third error handling paradigm available in Roby. It is akin to classical exception propagation:
* task models can define per-type exception handlers using Roby::Task#on_exception
* when an error occurs and is not handled by a plan repair, the error is propagated up in the +realized_by+ relation, searching for a matching exception handler.
* if an exception handler is found, it is called with the error. If the exception handler raises, or if it calls #pass_exception, the propagation is resumed. Otherwise, the system stops propagating the exception. In addition to following the +realized_by+ relation, the +planned_by+ relation is used to check if planning activities can repair the error as well (see example below).
* if no handler accepted the error, it is passed to a global error handler defined by Roby.on_exception.

link:../../images/exception_propagation_5.png

== Unhandled errors

Once the exception propagation phase is finished, the plan analysis (i.e. constraint verification) is re-run once to verify that the exception handlers have actually repaired the errors. If errors are still found, they cannot be handled anymore. This set of errors, and the errors that have not been handled before, determine a set of tasks that can be dangerous for the whole system.
The garbage collection kicks in and takes the necessary actions to remove these tasks from the plan. One necessity is to kill all tasks which were actually depending on the faulty activities: all tasks that are parents of the faulty tasks in any relation are forcefully garbage collected. In the exception propagation example above, all tasks which have a number will be killed and removed from the plan.

= Next tutorial

This tutorial presented one of the two most singular features of Roby: an extensive way to represent and handle errors. Among the mechanisms presented, the error handling relation is the most powerful, as it makes it possible to represent error handling directly in the plan and would for instance work in a multi-robot context, even without communication between the two robots.

{The next tutorial}[link:files/doc/tutorials/06-Overview_rdoc.html] is not really a tutorial. It is an overview of important Roby features that these tutorials did not cover. Again, my PhD thesis should still be considered one of the most central design documents for understanding the system.