# @title All About Cancellation: How to Stop Concurrent Operations # All About Cancellation: How to Stop Concurrent Operations ## The Problem of Cancellation Being able to cancel an operation is an important aspect of concurrent programming. When you have multiple operations going on at the same time, you want to be able to stop an operation in certain circumstances. Imagine sending a an HTTP request to some server, and waiting for it to respond. We can wait forever, or we can use some kind of mechanism for stopping the operation and declaring it a failure. This mechanism, which is generally called cancellation, plays a crucial part in how Polyphony works. Let's examine how operations are cancelled in Polyphony. ## Cancellation in Polyphony In Polyphony, every operation can be cancelled in the same way, using the same APIs. Polyphony provides multiple APIs that can be used to stop an ongoing operation, but the underlying mechanism is always the same: the fiber running the ongoing operation is scheduled with an exception. Let's revisit how fibers are run in Polyphony (this is covered in more detail in the overview document). When a waiting fiber is ready to continue, it is scheduled with the result of the operation which it was waiting for. If the waiting fiber is scheduled with an exception *before* the operation it is waiting for is completed, the operation is stopped, and the exception is raised in the context of the fiber once it is switched to. What this means is that any fiber waiting for a long-running operation to complete can be stopped at any moment, with Polyphony taking care of actually stopping the operation, whether it is reading from a file, or from a socket, or waiting for a timer to elapse. On top of this general mechanism of cancellation, Polyphony provides cancellation APIs with differing semantics that can be employed by the developer. For example, `move_on_after` can be used to stop an operation after a timeout without raising an exception, while `cancel_after` can be used to raise an exception that must be handled. There's also the `Fiber#restart` API which, as its name suggests, allows one to restart any fiber, which might be very useful for retrying complex operations. Let's examine how a concurrent operation is stopped in Polyphony: ```ruby sleeper = spin { sleep 1 } sleep 0.5 sleeper.raise 'Foo' ``` In the example above, we spin up a fiber that sleeps for 1 second, we then sleep for half a second, and cancel `sleeper` by raising an exception in its context. This causes the sleep operation to be cancelled and the fiber to be stopped. The exception is further propagated to the context of the main fiber, and the program finally exits with an exception message. Another way to stop a concurrent operation is to use the `Fiber#move_on` method, which causes the fiber to stop, but without raising an exception: ```ruby sleeper = spin { sleep 1; :foo } sleep 0.5 sleeper.move_on :bar result = sleeper.await #=> :bar ``` Using `Fiber#move_on`, we avoid raising an exception which then needs to be rescued, and instead cause the fiber to stop, with its return value being the value given to `Fiber#move_on`. In the code above, the fiber's result will be set to `:bar` instead of `:foo`. ## Using Timeouts Timeouts are probably the most common reason for cancelling an operation. While different Ruby gems provide their own APIs and mechanisms for setting timeouts (core Ruby has also recently introduced timeout settings for IO operations), Polyphony provides a uniform interface for stopping *any* long-running operation based on a timeout, using either the core ruby `Timeout` class, or the `move_on_after` and `cancel_after` that Polyphony provides. Before we discuss the different timeout APIs, we can first explore how to create a timeout mechanism from scratch in Polyphony: ```ruby class MyTimeoutError < RuntimeError end def with_timeout(duration) timeout_fiber = spin do sleep duration raise MyTimeoutError end yield ensure timeout_fiber.stop # this is the same as timeout_fiber.move_on end # Usage example: with_timeout(5) { sleep 1; :foo } #=> :foo with_timeout(5) { sleep 10; :bar } #=> MyTimeoutError raised! ``` In the code above, we create a `with_timeout` method that takes a duration argument. It starts by spinning up a fiber that will sleep for the given duration, then raise a custom exception. It then runs the given block by calling `yield`. If the given block returns before the timeout, its return value is returned from the call to `with_timeout`, not before making sure to stop the timeout fiber. If the given block runs longer than the timeout, the exception raised by the timeout will interrupt the fiber running the block, and will propagate to the call site. Now that we have an idea of how we can construct timeouts, let's look at the different timeout APIs included in Polyphony: ```ruby # Timeout without raising an exception move_on_after(5) { ... } # Timeout without raising an exception, returning an arbitrary value move_on_after(5, with_value: :foo) { ... } #=> :foo (in case of a timeout) # Timeout raising an exception cancel_after(5) { ... } #=> raises a Polyphony::Cancel exception # Timeout raising a custom exception cancel_after(5, with_exception: MyExceptionClass) { ... } #=> raises the given exception # Timeout using the Timeout API Timeout.timeout(5) { ... } #=> raises Timeout::Error ``` ## Resetting Ongoing Operations In addition to offering a uniform API for cancelling operations and setting timeouts, Polyphony also allows you to reset, or restart, ongoing operations. Let's imagine an active search feature that shows the user search results while they're typing their search term. How we go about implementing this? We would like to show the user search results, but if the user hits another key before the results are received from the database, we'd like to cancel the operation and relaunch the search. Let's see how Polyphony let's us do this: ```ruby searcher = spin do peer, term = receive results = get_search_results_from_db(term) peer << results end def search_term_updated(term) spin do searcher.restart searcher << [Fiber.current, term] results = receive update_search_results(results) end end ``` In the example above we use fiber message passing in order to communicate between two concurrent operations. Each time `search_term_updated` is called, we *restart* the `searcher` fiber, send the term to it, wait for the results and them update them in the UI. ## Resettable Timeouts Here's another example of restarting: we have a TCP server that accepts connection but would like to close connections after one minute of inactivity. We can use a timeout for that, but each time we receive data from the client, we need to reset the timeout. Here's how we can do this: ```ruby def handle_connection(conn) timeout = spin do sleep 60 raise Polyphony::Cancel end conn.recv_loop do |msg| timeout.reset # same as timeout.restart handle_message(msg) end rescue Polyphony::Cancel puts 'Closing connection due to inactivity!' ensure timeout.stop end server.accept_loop { |conn| handle_connection(conn) } ``` In the code above, we create a timeout fiber that sleeps for one minute, then raises an exception. We then run a loop waiting for messages from the client, and each time a message arrives we reset the timeout. In fact, the standard `#move_on_after` and `#cancel_after` APIs also propose a way to reset timeouts. Let's examine how to do just that: ```ruby def handle_connection(conn) cancel_after(60) do |timeout| conn.recv_loop do |msg| timeout.reset handle_message(msg) end end rescue Polyphony::Cancel puts 'Closing connection due to inactivity!' end server.accept_loop { |conn| handle_connection(conn) } ``` Here, instead of hand-rolling our own timeout mechanism, we use `#cancel_after` but give it a block that takes an argument. When the block is called, this argument is actually the timeout fiber that `#cancel_after` spins up, which lets us reset it just like in the example before. Also notice how we don't need to cleanup the timeout in the ensure block, as `#cancel_after` takes care of it by itself.