How dynamic scaling of Ruby application processes works on Passenger + Nginx

Passenger dynamically adjusts the number of application processes based on traffic. Learn how Passenger decides when a process should be added or removed.

Maximum process concurrency

Main article: Request load balancing

A core concept in dynamic process scaling is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.

For Ruby applications, the maximum process concurrency is assumed to be 1. This means that Passenger assumes each process can handle 1 request at a time.

This can be changed by setting passenger_concurrency_model to thread, and by setting passenger_thread_count. If you do that, the assumed maximum process concurrency will equal the number of configured threads. This reflects the fact that each thread can handle 1 request at a time.
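
The rule above can be written as a small function. This is only an illustrative sketch: the helper name assumed_max_concurrency and its keyword arguments are hypothetical and are not part of Passenger. They simply mirror the two configuration options mentioned above.

```ruby
# Hypothetical helper (not a Passenger API): returns the maximum process
# concurrency that Passenger would assume for a Ruby application.
def assumed_max_concurrency(concurrency_model: "process", thread_count: 1)
  # With the "thread" model, each configured thread handles 1 request at a time.
  concurrency_model == "thread" ? thread_count : 1
end

assumed_max_concurrency                                               # => 1
assumed_max_concurrency(concurrency_model: "thread", thread_count: 4) # => 4
```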

A new process is spawned when the concurrency limit is reached

Passenger keeps track of the number of requests each process is handling. When all processes have reached their maximum concurrency (that is, when each is handling exactly as many requests as its maximum concurrency allows), Passenger decides to spawn a new process.

This behavior is deeply coupled to the request load balancing logic, so you should read up on that too.
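
As a rough sketch, the spawning condition can be expressed like this. AppProcess and spawn_needed? are hypothetical names chosen for illustration; they do not reflect Passenger's internal data structures, and the sketch ignores process limits.

```ruby
# Hypothetical model of an application process (not Passenger's own type).
AppProcess = Struct.new(:name, :max_concurrency, :active_requests) do
  # A process is full when it handles as many requests as it can.
  def full?
    active_requests >= max_concurrency
  end
end

# A new process is wanted only when every existing process is full.
# Note: an empty pool also returns true, because there is nothing to route to.
def spawn_needed?(processes)
  processes.all?(&:full?)
end

pool = [AppProcess.new("A", 1, 1), AppProcess.new("B", 1, 0)]
spawn_needed?(pool) # => false, B can still accept a request
```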

Example: maximum concurrency of 1

Suppose that you have 3 application processes, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:

Process A [ ]
Process B [ ]
Process C [ ]

When a new request comes in, Passenger may decide to route the request to process A. Now A has reached its maximum concurrency:

Process A [*]
Process B [ ]
Process C [ ]

Suppose that, while that request is still in progress, two new requests come in. Because A has reached its maximum concurrency, Passenger routes these two new requests to B and C. Now all processes have reached their maximum concurrency:

Process A [*]
Process B [*]
Process C [*]

If a new request comes in while all previous 3 requests are still in progress, then Passenger will decide to spawn a new process. This new incoming request is put in a queue:

Request queue [*       ]

Process A [*]
Process B [*]
Process C [*]
Process D (spawning...)

When process D is done spawning, or when one of the existing processes finishes its request (and is therefore no longer at its maximum concurrency), Passenger routes the queued request to that process.

Suppose D finishes spawning immediately afterwards. The situation then looks like this:

Request queue [        ]
                |
Process A [*]   |
Process B [*]   |  queued request
Process C [*]   |  is routed to D
Process D [*] <-+
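
The walkthrough above can be replayed with a toy model. Everything here is hypothetical (the ToyPool class, its method names, and the "first process with spare capacity" routing rule are simplifications for illustration, not Passenger's actual load balancing logic).

```ruby
# Toy model of the example above; not Passenger code.
class ToyPool
  attr_reader :active, :queue, :spawning

  def initialize(process_names, max_concurrency)
    @max_concurrency = max_concurrency
    @active = process_names.to_h { |name| [name, 0] } # name => requests in progress
    @queue = []
    @spawning = nil # name of the process being spawned, if any
  end

  # Route to the first process with spare capacity (a simplification).
  # If every process is full, queue the request and start spawning.
  def request(id)
    name, = @active.find { |_, count| count < @max_concurrency }
    if name
      @active[name] += 1
      name
    else
      @queue << id
      @spawning ||= @active.keys.max.next # e.g. "C".next => "D"
      :queued
    end
  end

  # The new process becomes available and takes queued requests
  # until it is full or the queue is empty.
  def finish_spawning
    return nil unless @spawning
    name = @spawning
    @spawning = nil
    @active[name] = 0
    while @active[name] < @max_concurrency && !@queue.empty?
      @queue.shift
      @active[name] += 1
    end
    name
  end
end

pool = ToyPool.new(%w[A B C], 1)
3.times { |i| pool.request(i) } # A, B and C are now at maximum concurrency
pool.request(3)                 # => :queued, and "D" starts spawning
pool.finish_spawning            # => "D", which takes the queued request
```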

Example: maximum concurrency of 4

Suppose that you have 2 application processes, and you configured the number of threads to 4, causing each process's maximum concurrency to be 4. When the application is idle, none of the processes are handling any requests:

Process A [    ]
Process B [    ]

When a new request comes in, Passenger may decide to route the request to process A:

Process A [*   ]
Process B [    ]

Suppose that, while that request is still in progress, 7 more requests come in. All processes will reach their maximum concurrency:

Process A [****]
Process B [****]

If another request comes in, none of the existing processes has any spare concurrency to handle it, so Passenger queues the request and spawns a new process:

Request queue [*       ]

Process A [****]
Process B [****]
Process C (spawning...)

When process C is done spawning, or when one of the existing processes finishes one of its requests (and is therefore no longer at its maximum concurrency), Passenger routes the queued request to that process.

Suppose C finishes spawning immediately afterwards. The situation then looks like this:

Request queue [        ]
                   |
Process A [****]   | queued request
Process B [****]   | is routed to C
Process C [*   ] <-+
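
The arithmetic behind this example is simple enough to check in a few lines. The helper split_load is a hypothetical name used only for illustration: it assumes every process has the same maximum concurrency and returns how many requests are in progress and how many must wait in the queue.

```ruby
# Hypothetical helper (not a Passenger API).
# Returns [requests in progress, requests queued].
def split_load(process_count, max_concurrency, incoming)
  capacity = process_count * max_concurrency # total request slots in the pool
  in_progress = [incoming, capacity].min
  [in_progress, incoming - in_progress]
end

split_load(2, 4, 8) # => [8, 0]  both processes full, nothing queued
split_load(2, 4, 9) # => [8, 1]  the 9th request is queued and triggers a spawn
```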

A process is shut down when it becomes idle

When a process hasn't handled any requests for a while, it is said to be "idle". How long that is can be configured with passenger_pool_idle_time (300 seconds by default). Idle processes are shut down in order to conserve resources during periods of low traffic.
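
A minimal sketch of such an idle check follows. The function idle? and the constant IDLE_TIME are hypothetical, and the 300-second threshold is an assumed value for illustration; the sketch also ignores any configured minimum number of processes.

```ruby
# Assumed idle threshold in seconds (illustrative value).
IDLE_TIME = 300

# Hypothetical check (not a Passenger API): a process counts as idle once
# it has gone idle_time seconds without handling a request.
def idle?(last_request_at, now, idle_time = IDLE_TIME)
  now - last_request_at >= idle_time # Time - Time yields seconds as a Float
end

started = Time.at(0)
idle?(started, Time.at(120)) # => false
idle?(started, Time.at(300)) # => true
```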

Process limits

The minimum and maximum number of processes depend on various configuration options, such as passenger_max_pool_size, passenger_min_instances and passenger_max_instances. Passenger never scales the number of processes beyond the limits set by those configuration options.
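
One way to picture these limits is as a clamp on the number of processes Passenger would otherwise want. The sketch below is a simplification for a single application: the function name is hypothetical, all three values are assumed to be positive integers, and the assumption that the upper bound is the smaller of passenger_max_instances and passenger_max_pool_size is ours, not a statement of Passenger's exact algorithm.

```ruby
# Hypothetical helper (not a Passenger API): bounds a desired process count
# by the configured limits, for a single application.
def effective_process_count(desired, min_instances:, max_instances:, max_pool_size:)
  upper = [max_instances, max_pool_size].min # assumed: the stricter cap wins
  lower = [min_instances, upper].min         # keep the range valid for clamp
  desired.clamp(lower, upper)
end

effective_process_count(10, min_instances: 1, max_instances: 6, max_pool_size: 4) # => 4
effective_process_count(0,  min_instances: 1, max_instances: 6, max_pool_size: 4) # => 1
```

When the minimum and maximum are equal, the result is always that same number, whatever the traffic.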

Disabling dynamic process scaling

You can disable dynamic process scaling by setting passenger_min_instances and passenger_max_instances to the same number. The advantage is that your server becomes a bit faster, because it avoids spawning processes on demand, which is expensive. The disadvantage is that Passenger cannot shut down processes to conserve resources during times of low traffic.