docs/CONFIGURATION.md in pitchfork-0.4.1 vs docs/CONFIGURATION.md in pitchfork-0.5.0

- old
+ new

@@ -171,37 +171,59 @@ Default: `1`

### `timeout`

```ruby
-timeout 10
+timeout 10, cleanup: 3
```

-Sets the timeout of worker processes to a number of seconds.
-Workers handling the request/app.call/response cycle taking longer than
-this time period will be forcibly killed (via `SIGKILL`).
+Sets the timeout for worker processes to a number of seconds.

-This timeout mecanism shouldn't be routinely relying on, and should
+Note that Pitchfork has two layers of timeout.
+
+A first "soft" timeout will invoke the `after_worker_timeout` callback from
+within the worker (but from a background thread) and then call `exit`
+to terminate the worker cleanly.
+
+The second "hard" timeout is the sum of `timeout` and `cleanup`.
+Workers taking longer than this time period to be ready to handle a new
+request will be forcibly killed (via `SIGKILL`).
+
+Neither of these timeout mechanisms should be routinely relied on, and should
instead be considered as a last line of defense in case your application
is impacted by bugs causing unexpectedly slow response time, or fully
stuck processes.

+If some of the application endpoints require an unreasonably large timeout,
+rather than increasing the global application timeout, it is possible to
+adjust it on a per-request basis via the Rack request environment:
+
+```ruby
+class MyMiddleware
+  def call(env)
+    if slow_endpoint?(env)
+      # Give 10 more seconds
+      env["pitchfork.timeout"]&.extend_deadline(10)
+    end
+    @app.call(env)
+  end
+end
+```
+
Make sure to read the guide on [application timeouts](Application_Timeouts.md).

-This configuration defaults to a (too) generous 20 seconds, it is
-highly recommended to set a stricter one based on your application
-profile.
+This configuration defaults to a (too) generous 20 seconds for the soft timeout
+and an extra 2 seconds for the hard timeout. It is highly recommended to set a
+stricter one based on your application profile.

-This timeout is enforced by the master process itself and not subject
-to the scheduling limitations by the worker process.
Due to the low-complexity, low-overhead implementation, timeouts of less
than 3.0 seconds can be considered inaccurate and unsafe.

For running Pitchfork behind nginx, it is recommended to set
"fail_timeout=0" in your nginx configuration like this
to have nginx always retry backends that may have had workers
-SIGKILL-ed due to timeouts.
+exit or be SIGKILL-ed due to timeouts.

```
upstream pitchfork_backend {
  # for UNIX domain socket setups:
  server unix:/path/to/.pitchfork.sock fail_timeout=0;
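For reference, a self-contained version of the `MyMiddleware` example added in the hunk above could look like the sketch below; the `slow_endpoint?` predicate is a placeholder for whatever application-specific check decides which requests deserve more time.

```ruby
# Rack middleware that extends the Pitchfork soft timeout for known-slow endpoints.
class MyMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    if slow_endpoint?(env)
      # Give this request 10 extra seconds before the soft timeout fires.
      env["pitchfork.timeout"]&.extend_deadline(10)
    end
    @app.call(env)
  end

  private

  # Placeholder check: treat report-generation endpoints as slow.
  def slow_endpoint?(env)
    env["PATH_INFO"].start_with?("/reports/")
  end
end
```

It would typically be mounted early in the Rack stack, for example with `use MyMiddleware` in `config.ru`.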
@@ -282,18 +304,62 @@ after_worker_ready do |server, worker|
  server.logger.info("worker #{worker.nr} ready")
end
```

+### `after_worker_timeout`
+
+Called by the worker process when the request timeout has elapsed:
+
+```ruby
+after_worker_timeout do |server, worker, timeout_info|
+  timeout_info.copy_thread_variables!
+  timeout_info.thread.kill
+  server.logger.error("Request timed out: #{timeout_info.rack_env.inspect}")
+  $stderr.puts timeout_info.thread.backtrace
+end
+```
+
+Note that this callback is invoked from a different thread. You can access the
+main thread via `timeout_info.thread`, as well as the Rack environment via `timeout_info.rack_env`.
+
+If you need to invoke cleanup code that relies on thread-local state, you can copy
+that state with `timeout_info.copy_thread_variables!`, but it's best avoided as the
+thread-local state could contain thread-unsafe objects.
+
+Also note that at this stage the thread is still alive; if your callback does
+substantial work, you may want to kill the thread.
+
+After the callback is executed the worker will exit with status `0`.
+
+It is recommended not to do slow operations in this callback, but if you
+really have to, make sure to configure the `cleanup` timeout so that the
+callback has time to complete before the "hard" timeout triggers.
+By default the cleanup timeout is 2 seconds.
+
### `after_worker_exit`

Called in the master process after a worker exits.

```ruby
after_worker_exit do |server, worker, status|
  # status is a Process::Status instance for the exited worker process
  unless status.success?
    server.logger.error("worker process failure: #{status.inspect}")
+  end
+end
+```
+
+### `after_request_complete`
+
+Called in the worker process after a request has completed.
+
+Can be used for out-of-band work, or to exit unhealthy workers.
+
+```ruby
+after_request_complete do |server, worker|
+  if something_wrong?
+    exit
  end
end
```

## Reforking
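As a rough illustration of how these additions fit together, a `pitchfork.conf.rb` using the settings and hooks documented above might look like the following sketch; the `worker_unhealthy?` helper is a hypothetical application-provided check, and the timeout values are arbitrary.

```ruby
# Soft timeout after 10 seconds: `after_worker_timeout` runs, then the worker exits.
# Hard timeout (forcible SIGKILL) after 10 + 5 = 15 seconds.
timeout 10, cleanup: 5

after_worker_timeout do |server, worker, timeout_info|
  # Runs inside the worker, from a background thread, once the soft timeout elapses.
  server.logger.error("Request timed out: #{timeout_info.rack_env["PATH_INFO"]}")
end

after_worker_ready do |server, worker|
  server.logger.info("worker #{worker.nr} ready")
end

after_request_complete do |server, worker|
  # Hypothetical health check; exiting lets the worker be replaced after the
  # response has been sent.
  exit if worker_unhealthy?
end
```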