How dynamic scaling of Python application processes works on Passenger Standalone

Passenger dynamically adjusts the number of application processes based on traffic. Learn how Passenger decides when a process should be added or removed.

Table of contents

Maximum process concurrency

Main article: Request load balancing

A core concept in dynamic process scaling is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.

For Python applications, the maximum process concurrency is always 1.

A new process is spawned when the concurrency limit is reached

Passenger keeps track of the number of requests a process is handling. When all processes have reached their maximum concurrency – that is, when they're handling exactly as many requests as their maximum concurrency indicate they can – then Passenger will decide to spawn a new process.

This behavior is deeply coupled to the request load balancing logic, so you should read up on that too.

Example: maximum concurrency of 1

Suppose that you have 3 application processes, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:

Process A [ ]
Process B [ ]
Process C [ ]

When a new request comes in, Passenger may decide to route the request to process A. Now A has reached its maximum concurrency:

Process A [*]
Process B [ ]
Process C [ ]

Suppose that, while that request is still in progress, two new requests come in. Because A has reached its maximum concurrency, Passenger routes these two new requests to B and C. Now all requests have reached their maximum concurrency:

Process A [*]
Process B [*]
Process C [*]

If a new request comes in while all previous 3 requests are still in progress, then Passenger will decide to spawn a new process. This new incoming request is put in a queue:

Request queue [*       ]

Process A [*]
Process B [*]
Process C [*]
Process D (spawning...)

When process D is done spawning, or when one of the existing processes is done with their request (and are no longer at their maximum concurrency), then Passenger will route the queued request to either of those processes.

Suppose D finishes spawning immediately after, then the situation looks like this:

Request queue [        ]
                |
Process A [*]   |
Process B [*]   |  queued request
Process C [*]   |  is routed to D
Process D [*] <-+

A process is shut down when it becomes idle

When a process hasn't processed any requests for a while, it is said to be "idle". Idle processes are shut down in order to conserve resources during periods of low traffic.

Process limits

The minimum and maximum amount of processes depend on various configuration options, such as --max-pool-size / "max_pool_size", and --min-instances / "min_instances". Passenger won't ever scale the number of processes past the limits set by those configuration options.

Disabling dynamic process scaling

You can disable dynamic process scaling by setting --min-instances / "min_instances" and --max-pool-size / "max_pool_size" to the same number. The advantage of this is that it will make your server a bit faster, because process spawning is expensive. The disadvantage is that Passenger will not be able to free up processes in order to conserve resources during times of low traffic.