# Architecture ## Overview ![https://bit.ly/2iJuFky](images/puma-general-arch.png) Puma is a threaded Ruby HTTP application server processing requests across a TCP and/or UNIX socket. Puma processes (there can be one or many) accept connections from the socket via a thread (in the [`Reactor`](../lib/puma/reactor.rb) class). The connection, once fully buffered and read, moves into the `todo` list, where an available thread will pick it up (in the [`ThreadPool`](../lib/puma/thread_pool.rb) class). Puma works in two main modes: cluster and single. In single mode, only one Puma process boots. In cluster mode, a `master` process is booted, which prepares (and may boot) the application and then uses the `fork()` system call to create one or more `child` processes. These `child` processes all listen to the same socket. The `master` process does not listen to the socket or process requests - its purpose is primarily to manage and listen for UNIX signals and possibly kill or boot `child` processes. We sometimes call `child` processes (or Puma processes in `single` mode) _workers_, and we sometimes call the threads created by Puma's [`ThreadPool`](../lib/puma/thread_pool.rb) _worker threads_. ## How Requests Work ![https://bit.ly/2zwzhEK](images/puma-connection-flow.png) * Upon startup, Puma listens on a TCP or UNIX socket. * The backlog of this socket is configured with a default of 1024, but the actual backlog value is capped by the `net.core.somaxconn` sysctl value. The backlog determines the size of the queue for unaccepted connections. If the backlog is full, the operating system is not accepting new connections. * This socket backlog is distinct from the `backlog` of work as reported by `Puma.stats` or the control server. The backlog that `Puma.stats` refers to represents the number of connections in the process' `todo` set waiting for a thread from the [`ThreadPool`](../lib/puma/thread_pool.rb). * By default, a single, separate thread (created by the [`Reactor`](../lib/puma/reactor.rb) class) reads and buffers requests from the socket. * When at least one worker thread is available for work, the reactor thread listens to the socket and accepts a request (if one is waiting). * The reactor thread waits for the entire HTTP request to be received. * Puma exposes the time spent waiting for the HTTP request body to be received to the Rack app as `env['puma.request_body_wait']` (milliseconds). * Once fully buffered and received, the connection is pushed into the "todo" set. * Worker threads pop work off the "todo" set for processing. * The worker thread processes the request via `call`ing the configured Rack application. The Rack application generates the HTTP response. * The worker thread writes the response to the connection. While Puma buffers requests via a separate thread, it does not use a separate thread for responses. * Once done, the thread becomes available to process another connection in the "todo" set. ### `queue_requests` ![https://bit.ly/2zxCJ1Z](images/puma-connection-flow-no-reactor.png) The `queue_requests` option is `true` by default, enabling the separate reactor thread used to buffer requests as described above. If set to `false`, this buffer will not be used for connections while waiting for the request to arrive. In this mode, when a connection is accepted, it is added to the "todo" queue immediately, and a worker will synchronously do any waiting necessary to read the HTTP request from the socket.