Optimizing Phusion Passenger's performance involves two aspects.
The first aspect is settings tuning. Phusion Passenger's default settings are aimed at safety, not at maximum performance. The defaults are designed to conserve resources, to prevent server overload and to keep web apps up and running. To optimize for performance, you need to tweak some settings whose values depend on your hardware and your environment.
Besides Phusion Passenger settings, you may also want to tune kernel-level settings.
The second aspect is using performance-enhancing features. This requires small application-level changes.
If you are optimizing Phusion Passenger for the purpose of benchmarking then you should also follow the benchmarking recommendations.
By default, Phusion Passenger spawns and shuts down application processes according to traffic. This allows it to use more resources during busy times, while conserving resources during idle times. This is especially useful if you host more than 1 app on a single server: if not all apps are used at the same time, then you don't have to keep all apps running at the same time.
However, spawning a process takes a lot of time (on the order of 10-20 seconds for a Rails app), and CPU usage will be near 100% during spawning. Therefore, while spawning, your server will be slower at performing other activities, such as handling requests.
For consistent performance, it is thus recommended that you configure a static process pool: telling Phusion Passenger to use a fixed number of processes, instead of spawning and shutting them down dynamically.
If you use Phusion Passenger Standalone, run passenger start with --max-pool-size=N --min-instances=N, where N is the number of processes you want.
If you use Nginx, let N be the number of processes you want. Set the following configuration in your http block:
passenger_max_pool_size N;
Set the following configuration in your server block:
passenger_min_instances N;
You should also configure passenger_pre_start in the http block so that your app is started during web server launch:
# Refer to the Users Guide for more information about passenger_pre_start.
passenger_pre_start http://your-website-url.com;
If you use Apache, let N be the number of processes you want. Set the following configuration in the global context:
PassengerMaxPoolSize N
Set the following configuration in your virtual host block:
PassengerMinInstances N
You should also configure PassengerPreStart in the global context so that your app is started during web server launch:
# Refer to the Users Guide for more information about PassengerPreStart.
PassengerPreStart http://your-website-url.com
This section provides guidance on maximizing Phusion Passenger's throughput. The amount of throughput that Phusion Passenger handles is proportional to the number of processes or threads that you've configured. More processes/threads generally means more throughput, but there is an upper limit. Past a certain value, further increasing the number of processes/threads won't help. If you increase the number of processes/threads even further, then performance may even go down.
The optimal value depends on the hardware and the environment. This section provides formulas to calculate that optimal value. The following factors are involved in the calculation:
Number of CPUs. True (hardware) concurrency cannot be higher than the number of CPUs. In theory, if all processes/threads on your system use the CPUs constantly, then throughput stops increasing beyond NUMBER_OF_CPUS processes/threads. Having more processes than CPUs may decrease total throughput a little because of context switching overhead, but the difference is not big because OSes are good at context switching these days.
On the other hand, if your CPUs are not used constantly, e.g. because they’re often blocked on I/O, then the above does not apply and increasing the number of processes/threads does increase concurrency and throughput, at least until the CPUs are saturated.
Blocking I/O. This covers all blocking I/O, including hard disk access latencies, database call latencies, web API calls, etc. Handling input from the client and output to the client does not count as blocking I/O, because Phusion Passenger has buffering layers that relieve the application from worrying about this.
The more blocking I/O calls your application process/thread makes, the more time it spends waiting for external components. While it’s waiting it does not use the CPU, so that’s when another process/thread should get the chance to use the CPU. If no other process/thread needs CPU right now (e.g. all processes/threads are waiting for I/O) then CPU time is essentially wasted. Increasing the number of processes or threads decreases the chance of CPU time being wasted. It also increases concurrency, so that clients do not have to wait for a previous I/O call to be completed before being served.
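The effect of blocking I/O on concurrency can be illustrated with a small Python sketch. It does not involve Phusion Passenger: the names fake_request and serve are made up for this illustration, and time.sleep stands in for a database or web API call. Under those assumptions, it shows that requests which mostly wait finish sooner overall when more workers are available.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(io_seconds: float) -> None:
    """Stand-in for a request that spends its time waiting on blocking I/O."""
    time.sleep(io_seconds)  # simulates a database or web API call; uses no CPU


def serve(num_requests: int, workers: int, io_seconds: float = 0.05) -> float:
    """Handle num_requests with a fixed number of workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every request to finish before the timer stops.
        list(pool.map(fake_request, [io_seconds] * num_requests))
    return time.perf_counter() - start
```

Calling serve(8, 1) handles the eight simulated requests one after another, while serve(8, 4) overlaps their waiting time, so the second call should return a noticeably smaller elapsed time.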
The formulas in this section assume that your machine is dedicated to Phusion Passenger. If your machine also hosts other software (e.g. a database) then you'll need to tweak the formulas a little bit.
The amount of memory that your application uses per process is key to our calculation. You should first figure out how much memory your application typically needs. Every application has different memory usage patterns, so the typical memory usage is best determined by observation.
Run your app for a while, then run passenger-status at different points in time to examine memory usage. Then calculate the average of your data points. In the rest of this section, we'll refer to the amount of memory (in MB) that an application process needs as RAM_PER_PROCESS.
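As a minimal sketch of the averaging step, assuming you have noted down a few per-process memory readings by hand: the sample values below are hypothetical and the function name is chosen only for this illustration.

```python
from statistics import mean


def ram_per_process(samples_mb):
    """Average several per-process memory observations, given in MB."""
    if not samples_mb:
        raise ValueError("need at least one observation")
    return mean(samples_mb)


# Hypothetical readings (in MB) taken from passenger-status at different times.
observations = [142.0, 155.5, 149.0, 153.5]
RAM_PER_PROCESS = ram_per_process(observations)
```

With these made-up readings the average comes out at 150 MB.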
In our experience, a typical medium-sized single-threaded Rails application process can use 150 MB of RAM on a 64-bit machine, even when the spawning method is set to "smart".
First, let's define the maximum number of (single-threaded) processes, or the number of threads, that you can comfortably have given the amount of RAM you have. This is a reasonable upper limit that you can reach without degrading system performance. This number is not the final optimal number, but is merely used for further calculations in later steps.
There are two formulas that we can use, depending on what kind of concurrency model your application is using in production.
Purely single-threaded multi-process formula
If you didn't explicitly configure multithreading, then you are using this concurrency model. If you are not using Ruby (e.g. if you are using Python, Node.js or Meteor), then you are also using this concurrency model, because Phusion Passenger only supports multithreading for Ruby apps.
The formula is then as follows:
max_app_processes = (TOTAL_RAM * 0.75) / RAM_PER_PROCESS
It is derived as follows:
(TOTAL_RAM * 0.75): We can assume that there must be at least 25% of free RAM that the operating system can use for other things. The result of this calculation is the RAM that is freely available for applications. If your system runs a lot of services and thus has less memory available for Phusion Passenger and its apps, then you should lower 0.75 to some constant that you think is appropriate.
/ RAM_PER_PROCESS: Each process consumes a roughly constant amount of RAM, so the maximum number of processes is a single division of the aforementioned result by this constant.
Multithreaded formula
The formula for multithreaded concurrency is as follows:
max_app_threads_per_process =
((TOTAL_RAM * 0.75) - (CHOSEN_NUMBER_OF_PROCESSES * RAM_PER_PROCESS * 0.9)) /
(RAM_PER_PROCESS / 10)
Here, CHOSEN_NUMBER_OF_PROCESSES is the number of application processes you want to use. In case of Ruby, Python, Node.js and Meteor, this should be equal to NUMBER_OF_CPUS. This is because all these languages can only utilize a single CPU core per process. If you're using a language runtime that does not have a Global Interpreter Lock, e.g. JRuby or Rubinius, then CHOSEN_NUMBER_OF_PROCESSES can be 1.
The formula is derived as follows:
(TOTAL_RAM * 0.75): The same as explained earlier.
(CHOSEN_NUMBER_OF_PROCESSES * RAM_PER_PROCESS): In multithreaded scenarios, the application processes consume a constant amount of memory, so we deduct this from the RAM that is available to applications. The result is the amount of RAM available to application threads.
/ (RAM_PER_PROCESS / 10): A thread consumes about 10% of the amount of memory a process would, so we divide the amount of RAM available to threads by this number. What we get is the number of threads that the system can handle.
On 32-bit systems, max_app_threads_per_process should not be higher than about 200. Assuming an 8 MB stack size per thread, you will run out of virtual address space if you go much further. On 64-bit systems you don’t have to worry about this problem.
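The multithreaded formula can be written as a small Python function. This is a sketch that follows the formula as given above. The parameter names are chosen for this illustration, and process_base_fraction defaults to the 0.9 factor that appears in the formula; passing 1.0 instead deducts the full RAM_PER_PROCESS for every process.

```python
def max_app_threads_per_process(total_ram_mb, ram_per_process_mb, chosen_processes,
                                usable_fraction=0.75, process_base_fraction=0.9):
    """Evaluate the multithreaded formula; all RAM figures are in MB."""
    # RAM left for applications after reserving a share for the OS.
    ram_for_apps = total_ram_mb * usable_fraction
    # RAM taken by the application processes themselves.
    ram_for_processes = chosen_processes * ram_per_process_mb * process_base_fraction
    # A thread is assumed to cost about 10% of what a process costs.
    ram_per_thread = ram_per_process_mb / 10
    return (ram_for_apps - ram_for_processes) / ram_per_thread
```

For a machine with 32 GB of RAM, 8 processes and 150 MB per process, max_app_threads_per_process(1024 * 32, 150, 8) evaluates to about 1566.4, and to about 1558.4 when process_base_fraction is set to 1.0.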
The earlier two formulas were not for calculating the number of processes or threads that your application needs, but for calculating how much the system can handle without getting into trouble. Your application may not actually need that many processes or threads! If your application is CPU-bound, then you only need a small multiple of the number of CPUs you have. Only if your application performs a lot of blocking I/O (e.g. database calls that take tens of milliseconds to complete, or calls to the Twitter API) do you need a large number of processes or threads.
Armed with this knowledge, we derive the formulas for calculating how many processes or threads we actually need.
If your application performs a lot of blocking I/O then you should give it as many processes and threads as possible:
# Use this formula for purely single-threaded multi-process scenarios.
desired_app_processes = max_app_processes
# Use this formula for multithreaded scenarios.
desired_app_threads_per_process = max_app_threads_per_process
If your application doesn’t perform a lot of blocking I/O, then you should limit the number of processes or threads to a multiple of the number of CPUs to minimize context switching:
# Use this formula for purely single-threaded multi-process scenarios.
desired_app_processes = min(max_app_processes, NUMBER_OF_CPUS)
# Use this formula for multithreaded scenarios.
desired_app_threads_per_process = min(max_app_threads_per_process, 2 * NUMBER_OF_CPUS)
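The process-based formulas and the two cases above can be combined into a short Python sketch. The function names mirror the variables used in the formulas. The boolean lots_of_blocking_io is an assumption of this sketch: you decide it yourself from what you know about your application.

```python
def max_app_processes(total_ram_mb, ram_per_process_mb, usable_fraction=0.75):
    """Upper limit on single-threaded processes that fit in RAM (figures in MB)."""
    return (total_ram_mb * usable_fraction) / ram_per_process_mb


def desired_app_processes(max_processes, number_of_cpus, lots_of_blocking_io):
    """Number of processes to configure in the single-threaded multi-process model."""
    if lots_of_blocking_io:
        return max_processes
    return min(max_processes, number_of_cpus)


def desired_app_threads_per_process(max_threads, number_of_cpus, lots_of_blocking_io):
    """Number of threads per process to configure in the multithreaded model."""
    if lots_of_blocking_io:
        return max_threads
    return min(max_threads, 2 * number_of_cpus)
```

The results are generally fractional, so round them to a whole number before putting them into the configuration. Rounding down, for example with int(), is the more conservative choice with respect to RAM.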
Purely single-threaded multi-process scenarios
Standalone: run passenger start with --max-pool-size=<desired_app_processes> --min-instances=<desired_app_processes>.
Nginx: set the following configuration:
passenger_max_pool_size <desired_app_processes>;
passenger_min_instances <desired_app_processes>;
Also configure passenger_pre_start to have your app started automatically at web server boot.
Apache: set the following configuration:
PassengerMaxPoolSize <desired_app_processes>
PassengerMinInstances <desired_app_processes>
Also configure PassengerPreStart to have your app started automatically at web server boot.
Multithreaded scenarios
In order to use multithreading you must use Phusion Passenger Enterprise. The open source version of Phusion Passenger does not support multithreading.
Standalone: run passenger start with --max-pool-size=<CHOSEN_NUMBER_OF_PROCESSES> --min-instances=<CHOSEN_NUMBER_OF_PROCESSES> --concurrency-model=thread --thread-count=<desired_app_threads_per_process>
If CHOSEN_NUMBER_OF_PROCESSES is 1, then you should also add --spawn-method=direct. By using direct spawning instead of smart spawning, Phusion Passenger will not keep a Preloader process around, saving you some memory. This is because a Preloader process is useless when there's only 1 application process.
Nginx: set the following configuration:
passenger_max_pool_size <CHOSEN_NUMBER_OF_PROCESSES>;
passenger_min_instances <CHOSEN_NUMBER_OF_PROCESSES>;
passenger_concurrency_model thread;
passenger_thread_count <desired_app_threads_per_process>;
Also configure passenger_pre_start to have your app started automatically at web server boot.
If CHOSEN_NUMBER_OF_PROCESSES is 1, then you should set passenger_spawn_method direct, for the same reason as with Standalone.
Apache: set the following configuration:
PassengerMaxPoolSize <CHOSEN_NUMBER_OF_PROCESSES>
PassengerMinInstances <CHOSEN_NUMBER_OF_PROCESSES>
PassengerConcurrencyModel thread
PassengerThreadCount <desired_app_threads_per_process>
Also configure PassengerPreStart to have your app started automatically at web server boot.
If CHOSEN_NUMBER_OF_PROCESSES is 1, then you should set PassengerSpawnMethod direct, for the same reason as with Standalone.
Only if you're using the multithreaded concurrency model do you need to configure Rails. You need to enable thread-safety by setting config.thread_safe! in config/environments/production.rb. In Rails 4.0 this is on by default for the production environment, but in earlier versions you had to enable it manually.
You should also increase the ActiveRecord pool size because it limits concurrency. You can configure it in config/database.yml. Set the pool value to the number of threads per application process. But if you believe your database cannot handle that much concurrency, keep it at a low value.
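Example 1
Suppose you have 1 GB (1024 MB) of RAM, an application that uses 150 MB of memory per process, and a workload with a lot of blocking I/O.
Then the calculation is as follows: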
# Use this formula for purely single-threaded multi-process deployments.
max_app_processes = (1024 * 0.75) / 150 = 5.12
desired_app_processes = max_app_processes = 5.12
Conclusion: you should use 5 or 6 processes. Phusion Passenger should be configured as follows:
# Standalone
passenger start --max-pool-size=5 --min-instances=5
# Nginx
passenger_max_pool_size 5;
passenger_min_instances 5;
# Apache
PassengerMaxPoolSize 5
PassengerMinInstances 5
However a concurrency of 5 or 6 is way too low if your application performs a lot of blocking I/O. You should use a multithreaded deployment instead, or you need to get more RAM so you can run more processes.
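Example 2
Suppose you have 32 GB of RAM, 8 CPUs, an application that uses 150 MB of memory per process, and a workload with a lot of blocking I/O.
Then the calculation is as follows: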
# Use this formula for purely single-threaded multi-process deployments.
max_app_processes = (1024 * 32 * 0.75) / 150 = 163.84
desired_app_processes = max_app_processes = 163.84
Conclusion: you should use 163 or 164 processes. This number seems high, but the value is correct. Because your app performs a lot of blocking I/O, you need a lot of I/O concurrency. The more concurrency the better. The amount of concurrency scales linearly with the number of processes, which is why you end up with such a large number.
Phusion Passenger should be configured as follows:
# Standalone
passenger start --max-pool-size=163 --min-instances=163
# Nginx
passenger_max_pool_size 163;
passenger_min_instances 163;
# Apache
PassengerMaxPoolSize 163
PassengerMinInstances 163
Note that in this example, 163-164 processes is merely the maximum number of processes that you can run, without overloading your RAM. It does not mean that you have enough concurrency for your application! If you need more concurrency, you should use a multithreaded deployment instead.
Example 3
Consider the same machine as in example 2: 32 GB of RAM, 8 CPUs, 150 MB of memory usage per process, and a lot of blocking I/O.
But this time you're using multithreading with 8 application processes (because you have 8 CPUs). How many threads do you need per process?
# Use this formula for multithreaded deployments.
max_app_threads_per_process
= ((1024 * 32 * 0.75) - (8 * 150)) / (150 / 10)
= 1558.4
Conclusion: you should use 1558 threads per process.
# Standalone
passenger start --max-pool-size=8 --min-instances=8 --concurrency-model=thread --thread-count=1558
# Nginx
passenger_max_pool_size 8;
passenger_min_instances 8;
passenger_concurrency_model thread;
passenger_thread_count 1558;
# Apache
PassengerMaxPoolSize 8
PassengerMinInstances 8
PassengerConcurrencyModel thread
PassengerThreadCount 1558
Because of the huge number of threads, this only works on a 64-bit platform. If you're on a 32-bit platform, consider lowering the number of threads while raising the number of processes. For example, you can double the number of processes (to 16) and halve the number of threads (to 779).
If you're using Nginx then it does not need additional configuration. Nginx is evented and already supports a high concurrency out of the box.
If you're using Apache, then prefer the worker MPM (which uses a combination of processes and threads) or the event MPM (which is similar to the worker MPM, but better) over the prefork MPM (which only uses processes) whenever possible. PHP requires prefork, but if you don't use PHP then you can probably use one of the other MPMs. Make sure you set a low number of processes and a moderate to high number of threads.
Because Apache performs a lot of blocking I/O (namely HTTP handling), you should give it a lot of threads so that it has a lot of concurrency. Apache's concurrency must be somewhat larger than the total number of application processes or total number of application threads. When considering example 3, the Apache concurrency must be larger than 8 * 1558 = 12464.
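The comparison can be sketched in Python as follows. Which Apache setting represents "Apache's concurrency" is an assumption of this sketch: it takes the value of the MaxRequestWorkers directive of the worker or event MPM as the number to compare against. The function names are made up for this illustration.

```python
def required_apache_concurrency(app_processes, threads_per_process=1):
    """Total number of application processes or threads Apache has to feed."""
    return app_processes * threads_per_process


def apache_has_enough_concurrency(max_request_workers, app_processes,
                                  threads_per_process=1):
    """True if Apache's concurrency exceeds the application's total concurrency."""
    return max_request_workers > required_apache_concurrency(
        app_processes, threads_per_process)
```

For the numbers in example 3, required_apache_concurrency(8, 1558) returns 12464, so a configuration capped at a few hundred request workers would not be enough.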
If you cannot use the event MPM, consider putting Apache behind an Nginx reverse proxy, with response buffering turned on on the Nginx side. This relieves Apache of a lot of concurrency problems. If you can use the event MPM then adding Nginx to the mix does not provide many advantages.
Phusion Passenger supports turbocaching since version 4. Turbocaching is an HTTP cache built inside Phusion Passenger. When used correctly, the cache can accelerate your app tremendously.
To utilize turbocaching, you only need to set HTTP caching headers. Please refer to Google's HTTP caching tutorial. Phusion Passenger takes advantage of the HTTP headers automatically.
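As a minimal sketch of what setting HTTP caching headers looks like at the application level, here is a tiny Python WSGI app that marks its response as cacheable. The header value is only an example, and exactly which headers and responses the turbocache honors is not covered here, so treat this as an illustration of the application side rather than a statement about the cache's rules.

```python
def application(environ, start_response):
    """Tiny WSGI app whose response carries an HTTP caching header."""
    body = b"Hello, cached world!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Declares the response cacheable by shared caches for 60 seconds.
        ("Cache-Control", "public, max-age=60"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The same idea applies to other frameworks and languages: the app only declares how long a response may be reused, and an HTTP cache in front of it acts on that declaration.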
Phusion Passenger supports out-of-band garbage collection for Ruby apps. With this feature enabled, Phusion Passenger can run the garbage collector in between requests, so that the garbage collector doesn't delay the app as much. Please refer to the Users Guide for more information about this feature.
Avoid ab, because it's slow and buggy.
Avoid siege and httperf, because they cannot utilize multiple CPU cores.
Use the builtin engine. This is the default.