How dynamic scaling of Node.js application processes works on Passenger + Nginx

Passenger dynamically adjusts the number of application processes based on traffic. Learn how Passenger decides when a process should be added or removed.

Maximum process concurrency

Main article: Request load balancing

A core concept in dynamic process scaling is that of the maximum process concurrency. This is the maximum number of concurrent requests that a particular process can handle.

For Node.js applications, Passenger assumes that the maximum process concurrency is unlimited. This is because Node.js uses evented I/O for concurrency, which is very lightweight, so a single process can handle a virtually unlimited number of requests concurrently.

Having said that, as a Node.js process handles more concurrent requests, it is normal for performance to degrade. The practical concurrency is not really unlimited, and the limit varies for every application and every workload.

For dynamic process scaling to work, Passenger needs to know approximately how many concurrent requests a single process can handle before its performance begins to degrade. The following sections explain why this is necessary and how to provide that number.

A new process is spawned when the concurrency limit is reached

Passenger keeps track of the number of requests each process is handling. When all processes have reached their maximum concurrency (that is, when each one is handling exactly as many requests as its maximum concurrency allows), Passenger decides to spawn a new process.

This behavior is deeply coupled to the request load balancing logic, so you should read up on that too.

Making dynamic scaling work by providing a hint to Passenger

Since Node.js applications are assumed to have unlimited concurrency by default, dynamic process scaling does not work out-of-the-box. Passenger will never think that your application process has reached its maximum concurrency. This is why you need to give Passenger a hint: you need to tell Passenger how many concurrent requests your application can handle without degrading performance. This way, Passenger will know when it should spawn more processes.

This is achieved with the configuration option passenger_force_max_concurrent_requests_per_process. For example, to tell Passenger that your application can handle at most 150 concurrent requests without degrading performance, set that option to 150. That way, Passenger will spawn more processes when all existing processes have reached 150 concurrent requests.
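As a minimal sketch, the hint could be placed in the server block of the application. The host name, paths and startup file below are placeholders and not part of the original text; only the last directive is the point of the example.

```nginx
server {
    listen 80;
    server_name www.example.com;        # placeholder host name
    root /var/www/example/public;       # placeholder path

    passenger_enabled on;
    passenger_app_type node;
    passenger_startup_file app.js;      # placeholder startup file

    # Treat each process as "full" at 150 concurrent requests, so that
    # Passenger spawns another process once all existing ones reach 150.
    passenger_force_max_concurrent_requests_per_process 150;
}
```

The value 150 is only the figure used in the text above. A suitable number depends on the application and its workload, and is best found by measuring.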

Example: maximum concurrency of 4

Suppose that you have 2 application processes, and the processes' maximum concurrency is configured to 4. When the application is idle, none of the processes are handling any requests:

Process A [    ]
Process B [    ]

When a new request comes in, Passenger may decide to route the request to process A.

Process A [*   ]
Process B [    ]

Suppose that, while that request is still in progress, 7 more requests come in. All processes will reach their maximum concurrency:

Process A [****]
Process B [****]

If another request comes in, none of the existing processes has spare capacity to handle it. So Passenger will queue the request and spawn a new process:

Request queue [*       ]

Process A [****]
Process B [****]
Process C (spawning...)

When process C is done spawning, or when one of the existing processes finishes a request (and is therefore no longer at its maximum concurrency), Passenger will route the queued request to whichever of those processes has capacity.

Suppose that process C finishes spawning immediately afterwards. The situation then looks like this:

Request queue [        ]
                   |
Process A [****]   | queued request
Process B [****]   | is routed to C
Process C [*   ] <-+
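The walk-through above can be replayed with a small toy model. This is only a sketch of the behavior described in this article, not Passenger's actual implementation. All names (ProcessPool, handleRequest, finishSpawning and so on) are hypothetical, and routing to the least busy process is an assumption made for illustration.

```javascript
// Toy model of the scaling decision; not Passenger's real implementation.
// All names here (ProcessPool, handleRequest, ...) are hypothetical.
class ProcessPool {
  constructor(initialProcesses, maxConcurrency) {
    this.maxConcurrency = maxConcurrency;
    this.processes = [];   // each entry: { name, active }
    this.queued = 0;       // requests waiting for a free slot
    this.spawning = null;  // name of the process being spawned, if any
    for (let i = 0; i < initialProcesses; i++) {
      this.processes.push({ name: this.nameFor(i), active: 0 });
    }
  }

  // Process 0 is "A", process 1 is "B", and so on.
  nameFor(index) {
    return String.fromCharCode("A".charCodeAt(0) + index);
  }

  // Least busy process that still has a free slot, or undefined if all
  // processes are full. Ties go to the earliest process (an assumption).
  pickProcess() {
    let best;
    for (const p of this.processes) {
      const hasSlot = p.active < this.maxConcurrency;
      if (hasSlot && (best === undefined || p.active < best.active)) {
        best = p;
      }
    }
    return best;
  }

  // Route a request if possible; otherwise queue it and start spawning.
  // Returns the name of the process that got the request, or null if queued.
  handleRequest() {
    const target = this.pickProcess();
    if (target !== undefined) {
      target.active += 1;
      return target.name;
    }
    this.queued += 1;
    if (this.spawning === null) {
      this.spawning = this.nameFor(this.processes.length);
    }
    return null;
  }

  // The spawning process is ready: add it to the pool and drain the queue.
  finishSpawning() {
    if (this.spawning === null) return;
    this.processes.push({ name: this.spawning, active: 0 });
    this.spawning = null;
    this.drainQueue();
  }

  // A process finished one request, which frees a slot for a queued request.
  finishRequest(name) {
    const p = this.processes.find((q) => q.name === name);
    if (p !== undefined && p.active > 0) {
      p.active -= 1;
      this.drainQueue();
    }
  }

  // Move queued requests onto processes that have free slots.
  drainQueue() {
    while (this.queued > 0) {
      const target = this.pickProcess();
      if (target === undefined) break;
      target.active += 1;
      this.queued -= 1;
    }
  }

  snapshot() {
    return this.processes.map((p) => `${p.name}:${p.active}`).join(" ");
  }
}

// Replay the walk-through: 2 processes, maximum concurrency 4.
const pool = new ProcessPool(2, 4);
for (let i = 0; i < 8; i++) pool.handleRequest(); // both processes fill up
const ninth = pool.handleRequest();               // queued; spawn of C starts
const whileSpawning = {
  queued: pool.queued,
  spawning: pool.spawning,
  state: pool.snapshot(),
};
pool.finishSpawning();                            // C takes the queued request
console.log(whileSpawning, pool.snapshot());
```

In this model the ninth request is queued while process C is spawning, and the final snapshot is "A:4 B:4 C:1", which matches the last diagram. Calling finishRequest("A") before finishSpawning() would instead hand the queued request to process A, which is the other outcome described above.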

A process is shut down when it becomes idle

When a process hasn't processed any requests for a while, it is said to be "idle". Idle processes are shut down in order to conserve resources during periods of low traffic.
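How long "a while" lasts is configurable. A minimal sketch, assuming the passenger_pool_idle_time option, which is not mentioned in the text above: it is set in the http block, takes a value in seconds, and is understood to default to 300.

```nginx
http {
    # Shut down application processes that have not handled a request
    # for 10 minutes. The value is in seconds.
    passenger_pool_idle_time 600;
}
```

A longer idle time keeps processes around between traffic bursts at the cost of memory. A shorter one frees resources sooner but may lead to more spawning.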

Process limits

The minimum and maximum number of processes depend on various configuration options, such as passenger_max_pool_size, passenger_min_instances and passenger_max_instances. Passenger will never scale the number of processes past the limits set by those configuration options.
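The sketch below shows where these three options might be set. The numbers and the host name are purely illustrative. Note that passenger_max_instances is, to the best of current knowledge, only available in Passenger Enterprise, so check the configuration reference for your edition.

```nginx
http {
    # Upper bound on application processes across all applications
    # served by this Passenger instance.
    passenger_max_pool_size 6;

    server {
        listen 80;
        server_name www.example.com;   # placeholder host name
        passenger_enabled on;

        # Keep at least 1 process for this application, even when it is idle.
        passenger_min_instances 1;

        # Never run more than 4 processes for this application
        # (assumed to require Passenger Enterprise).
        passenger_max_instances 4;
    }
}
```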

Disabling dynamic process scaling

You can disable dynamic process scaling by setting passenger_min_instances and passenger_max_instances to the same number. The advantage is that your server becomes a bit faster, because process spawning is expensive and Passenger no longer has to spawn processes in response to traffic. The disadvantage is that Passenger will not be able to shut down processes in order to conserve resources during times of low traffic.
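A minimal sketch of such a fixed-size setup, using an illustrative count of 4 processes:

```nginx
server {
    listen 80;
    server_name www.example.com;   # placeholder host name
    passenger_enabled on;

    # Same value for both options: the number of processes stays at 4.
    passenger_min_instances 4;
    passenger_max_instances 4;     # assumed to require Passenger Enterprise
}
```

The chosen number must also fit within passenger_max_pool_size, since that option limits the total number of processes.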