A Node.js Infrastructure for Scalability, Fault Resilience, and Zero Downtime

Over the past few years, consumer applications and enterprise solutions have been rapidly adopting the Node.js stack. In this paper we explore three key areas relevant to Node.js infrastructure – scalability, resilience, and mature DevOps.

Challenges with Node

Underutilizing Multi-Core CPUs

Node.js inherently operates in a single-process execution model. However, most modern production-grade hardware has multiple CPU cores.

So the execution of Node.js on modern hardware results in heavy utilization of one CPU core (due to CPU affinity), while leaving the other CPU cores underutilized.

No Isolation Between the Server Engine and Your Business Logic

Mature servers such as Apache Tomcat or Apache HTTPd bring some degree of process-isolation (or thread-isolation) between the server core and the developer’s business code. Faults in business code do not crash the core server engine itself.

Node.js does not inherently have such an isolation. Node developers typically initialize a web server and run business code within the same process. If the business code throws an exception, it causes the server itself to crash!

The Risks of A Weakly Typed Language

The weakly typed nature of JavaScript makes it easy for developers to leak defects: there is no compilation phase and no type checking.

This means JavaScript code is far more prone to bad references and null pointer exceptions that get discovered only at runtime. It is hard to anticipate and catch every possible exception in advance. So weakly typed languages bear greater risks in the runtime environment.

Try-Catch Doesn’t Suffice

A try-catch block will not catch exceptions thrown inside an asynchronous callback, even when the callback is registered within that try block. Such is the nature of async callbacks!

This means try-catch blocks are of only limited use for error handling and defensive programming in Node.js.

Slow Memory Leaks

The Node Package Manager (npm) has become the de facto approach to importing libraries (modules) into your Node application today. Developers are ‘trigger happy’ about using many public npm modules in their application code.

However, many of the npm modules in the wild have rarely been curated or intensively tested. Some of them misbehave, throw unexpected exceptions, or slowly leak memory at runtime. Moreover, such leaks may be difficult to catch in your profiling tests, since the rate of the leak can be slow and it only adds up over time.

Utilizing Multi-Core CPUs

An Approach

To effectively utilize multiple CPU cores and to achieve a higher application throughput, one can think of the following approach:

  • Spawn multiple worker processes to execute our Node.js code.
  • The kernel scheduler will allocate these worker processes across the available CPU cores on the system. Since Linux kernels often prefer CPU affinity, each worker process is likely to be allocated to a specific core for its entire lifetime.
  • Distribute inbound HTTP requests evenly across these worker processes (we will see how this really happens later).
  • As inbound requests arrive, each worker process services the requests allocated to it and thus starts utilizing its own CPU core.

Comparisons With Apache MPM Pre-fork

At a first glance, this approach seems very similar to Apache’s MPM Pre-fork module, which spawns multiple child processes at startup and delegates incoming requests to them. However there is one key difference!

Process-per-CPU model of traditional servers.

The Apache-Way of Doing Things

Apache’s I/O is blocking in nature. This means a child process will receive a request and will often block waiting for I/O to complete (say, a file read from the disk). To increase concurrency in this case, we spawn a pool of additional child processes. So while some processes are blocked, other processes can continue serving new inbound requests from the clients.

We can keep increasing our concurrency by increasing the number of child processes – but only to a certain limit!

Since each process has its own memory footprint, we soon reach a limit on the number of child processes that we can spawn without causing excessive thrashing of the system’s virtual memory. At some point, there will be too many child processes vying for attention from the OS scheduler, and the cost of swapping the process images in and out of the disk will be prohibitive.

The Node-Way of Doing Things

A Node process, on the other hand, does not block for I/O at all. This means the process can service more and more inbound requests until its own CPU core is nearly saturated.

Since each Node process has the ability to saturate its own CPU core, the number of Node processes required here to achieve high concurrency is equal to the number of CPU cores on that machine. Fewer processes mean an overall lower memory footprint.

The Node.js cluster module adopts this approach. Let us explore the workings of the cluster module further in the next section.

Figure 2: Node’s non-blocking I/O. One process per CPU core.

The Node.js Cluster Module

Spawning Workers

The primary Node.js process is called the master. Using the cluster module, the master can spawn additional worker processes and tell them which Node.js code to execute. This works much like the Unix fork() where a master process spawns child processes.

How the cluster module works in Node.js

IPC Channel Between Master and Workers

Whenever a new worker is spawned, the cluster module sets up an IPC (Inter-Process Communication) channel between the master and that worker process. Through this IPC mechanism, the master and worker can exchange brief messages and socket descriptors with each other.

Listening to Inbound Connections

Once spawned, the worker process is ready to service inbound connections and invokes a listen() call on a certain HTTP port. Node.js internally rewires this call as follows:

  • The worker sends a message to the master (via the IPC channel), asking the master to listen on the specified port.
  • The master starts to listen on that port (if it is not already listening).
  • The master notes that this specific worker has indicated interest in servicing inbound requests arriving on that port.

While it may seem that the worker is invoking the listen(), the actual job of listening to inbound requests is done by the master itself.

Load Balancing Between Worker Processes

When an inbound request arrives, the master accepts the inbound socket and adopts a round-robin mechanism to decide which worker this request should be delegated to.

The master then hands over the socket descriptor of this request to that worker over the IPC channel. This round-robin mechanism is part of the Node core and helps accomplish the load balancing of the inbound traffic between multiple workers.

Recommended Practices

Minimize Responsibilities of The Master

Let your master process do a minimal amount of work and be responsible only for:

  • Spawning worker processes at the start of your server.
  • Managing the lifecycle of your worker processes.
  • Delegating all inbound requests to workers.
  • Nothing else!

In particular, do not encapsulate your business logic in the master process. And do not load any unwanted npm modules in the master.

Most runtime errors are likely to occur due to buggy business code or the npm modules used by your code. By encapsulating business code only in the workers, such errors will impact (or crash) a worker process, but not your master process.

This gives you a stable, unhindered master process which can ‘baby-sit’ worker processes at all times. As we shall see later, the master is responsible for managing workers and ensuring that enough healthy workers are available to service your inbound traffic.

If the master itself crashes or is badly-behaved (because it ran buggy business code), there is no caretaker left for your workers anymore.

Replenishing Worker Processes

It is possible that worker processes die over time. This can happen for various reasons: running out of memory, receiving a Unix signal that forcefully kills the process, or a programming bug causing an abrupt crash in a certain path of execution.

When a worker dies, Node.js notifies your master process with an event. At that point, in the event handler, your master process should spawn a new worker process. This ensures that we have enough workers in our pool to service inbound traffic.

Gracefully Killing A Worker Process

As we shall see in later sections, there are several scenarios where you would like to gracefully shutdown (kill) a worker process in your cluster. Let us understand how we can accomplish a graceful worker shutdown:

  • Suppose the master decides to elegantly kill a specific worker from the present pool of workers in the cluster.
  • The master sends a signal or a message to that worker (via the IPC channel) asking the worker to gracefully kill itself.
  • At this point, that worker disconnects from the IPC channel, so it stops accepting new inbound HTTP requests from the master.
  • The worker attempts to gracefully finish any in-flight requests that it has already accepted in the past. (So that we don’t drop in-flight requests and we don’t send errors to our clients).
  • After giving itself time to gracefully complete in-flight requests, the worker attempts to close any resources it had acquired (DB connections, cache connections, sockets, file handles etc).
  • Then, the worker kills itself.
  • If the worker does not manage to kill itself elegantly within a certain window of time, say 10 seconds, the master decides to forcefully kill the worker by sending it a signal.

Dealing With Uncaught Exceptions

Even with meticulous programming and defensive tactics, unhandled exceptions are likely to occur at runtime, and your Node server has to deal with them. But how?

When an unhandled exception occurs, Node.js offers your worker process a way to ‘catch’ it. However, Node’s creator, Ryan Dahl, mentions that the event loop is likely to be in an indeterminate state at that point in time.

So it is best to kill that worker process as soon as you can, and spawn a new worker as a replacement. Here is what you should do:

  • Return an HTTP 500 for the request that resulted in the unhandled exception.
  • Perform the steps for a graceful worker shutdown (as we’ve seen before) and then let the worker process kill itself.
  • When the master notices that a worker has died, it spawns a new worker at that point.

Consider the worker to be ‘unhealthy’ anytime it catches an unhandled exception. The above steps are an elegant way to deal with the situation.
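The ‘catch’ mentioned above is the process-level uncaughtException event. A rough sketch (the demo exits with code 0 so it terminates cleanly; a real worker would exit non-zero after attempting the graceful steps above):

```javascript
process.on('uncaughtException', (err) => {
  // The event loop may be in an indeterminate state here, so do the
  // minimum: log, attempt an HTTP 500 for the offending request,
  // stop accepting new work, then exit so the master can respawn.
  console.error(`uncaught exception: ${err.message}; shutting down worker`);
  process.exit(0); // demo exits 0; a real worker would use exit(1)
});

setTimeout(() => {
  throw new Error('boom'); // thrown asynchronously; no try-catch reaches it
}, 10);
```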

Periodic Roll-Over of the Cluster

We’ve seen earlier how some npm modules could potentially result in slow memory leaks. Sometimes your own code could leak memory as well.

To keep the cluster healthy, it is recommended that your master periodically kill all worker processes and spawn new ones. But this needs to be done elegantly – If you kill all workers at once, there would be nobody left to do the work and the server’s throughput will drop momentarily.

Rolling worker processes.

We adopt a process called slow rolling of workers, as follows:

  • At regular intervals, say every 12 hours, the master initiates the rolling process.

  • The master chooses one worker from the present pool of workers in the cluster and decides to gracefully kill that worker process. (We’ve already seen the steps to gracefully kill a worker in an earlier section.)
  • At the same time, the master spawns a new worker process to replenish capacity.
  • Once the rollover of this worker is completed, the master picks the next worker from the initial pool to gracefully roll that one, and the process continues until all workers are ‘rolled over’.

Rolling workers is a great way to keep your Node cluster healthy over extended periods of time.

Preventing A ‘Self Inflicted’ Denial of Service Attack

So far we’ve looked at how our master can baby-sit worker processes and spawn new ones if a worker dies. But here is an interesting scenario to consider:

  • An inbound HTTP request results in execution of a specific (buggy) code that throws an uncaught exception.
  • The worker process kills itself, and then the master spawns a new worker right away!
  • Subsequent HTTP requests again result in uncaught exceptions in the new worker. The new worker decides to kill itself, and this self destructive cycle continues over and again.

The process of continuously spawning a new process over and again (in an indefinite loop) causes your OS to thrash. Anything else running on that machine will be impacted too. This represents a ‘self-inflicted’ denial of service attack.

The ultimate resolution to this problem would be to fix that buggy code and redeploy it. But until then, you need to add some safeguards to prevent such a runaway condition from occurring on your machine:

  • When the master spawns a new worker, it watches if the new worker process survives for a certain number of seconds (threshold).
  • If the worker dies within that threshold of time, the master infers that something is seriously wrong within the mainline code itself (and not just within some corner case).
  • At this point, the master should dispatch a panic message into your logs, or invoke a HTTP call to an alerting service – that gets your rapid response team into action!
  • Also, the master should throttle the rate of spawning new processes at this point, so as not to thrash the OS or impact other things running on that machine.

It is likely that this machine would soon be devoid of any useful workers to serve requests. But at least you have: (A) notified your rapid response team to swing into action, and (B) prevented a runaway condition in your OS.

In a subsequent section, we talk about how your master process can build on this further and safeguard your server against deploying such buggy (self-destructing) code.

Zero Downtime Restarts

The utopia for a mature DevOps practice is to have a zero-downtime restart capability on the production servers. This would mean the following things:

  • The development team can push new code snapshots to a live server without shutting down the server itself (even for a moment).
  • All in-flight and ongoing requests continue to be processed normally without clients noticing any errors.
  • The new code cuts-over seamlessly and soon new client requests get served by the newly deployed code.

With Node.js this is not a utopia anymore. We have already looked at most of the ingredients which can make zero downtime restart a reality:

  • Suppose you have a running Node.js cluster that is serving Version 1 of your code. All the modules from your source code are already loaded and cached by Node’s module cache.
  • Now you place Version 2 of your code on the file system. And you send a signal to the master to initiate a graceful rollover of the entire cluster (We’ve seen those details earlier).
  • At this point, the master will gracefully kill one worker at a time, and spawn a new worker as a replenishment. This new worker will read Version 2 of your code and start serving requests using the Version 2 code.
  • As an additional safeguard, if the new worker dies within a short threshold of time, the master infers that the new code may have a buggy mainline and hence it does not proceed with the graceful restart for other workers.

There will be a brief period during which some of your workers are serving Version 1 of the code, and some others are already serving Version 2. This may or may not be okay, depending on the circumstances.

Wrapping Requests in Domains

We explored earlier how a simple try-catch block does not suffice to catch exceptions that occur within asynchronous callbacks.

Node.js has now introduced the concept of domains to elegantly handle asynchronous errors. Your implementation hence needs to do the following:

  • Wrap every inbound request in a domain. Wrap all event emitters from that request in that same domain.
  • Write an error handler on that domain which can catch runtime errors that occur within that domain.
  • When you encounter an error in this domain, gracefully kill the present worker process and re-spawn a new one.

This approach is similar to handling uncaught exceptions, which we described earlier. The key difference is that we are wrapping individual requests within the scope of each domain. This helps isolate faults within a request (context) and deal with that specific request more elegantly.

Delegating Work to a Front-Proxy

Terminating HTTPS

Node.js does have the capability to accept inbound HTTPS traffic, but this task is best done by a front-proxy such as Nginx which can run on the same machine as your Node server. Configure Nginx as a reverse proxy and let it terminate inbound SSL connections. Alternatively, a front load balancer could also do that for you.

Compressing HTTP Streams

There are npm modules to achieve gzip compression of HTTP streams. But you would rather delegate this job to the front-proxy such as Nginx. This way your Node.js server can focus on serving your core business logic and delegate such tasks to Nginx.

Traffic Throttling

We’ve spoken earlier about how a single Node.js process can, in theory, saturate a CPU core by accepting more and more inbound requests. In reality, you would not like to reach the peak of your machine capacity in production. The front-proxy can play an important role in traffic throttling and making sure your Node.js machine does not fully saturate.

Other Recommended Practices

Running as a Non Privileged User

This may be an obvious consideration when building any server runtime: for reasons of security, you do not want your Node.js processes to run with root privileges! So make sure you create a non-privileged user and run the node processes under that user on your system. This has been a standard guideline, of course, when creating any daemon on Linux.

Keeping IPC Messages Lean

The IPC channel between the master and child processes is only intended to exchange short, control messages. Do not abuse this channel to send large business payloads.


Building Internet-Scale Web Platforms with the Amazon Elastic Load Balancer


The Elastic Load Balancer distributes your Application’s inbound traffic to multiple Web Servers running on EC2 instances. This offers the following key benefits to your architecture:

Increased Throughput: This increases the capacity of your Web infrastructure to handle additional traffic (i.e., scaling out horizontally).

Avoiding Single Points of Failure: An individual Web Server is no longer a single point of failure, since traffic is distributed across multiple server instances. This makes your application much more resilient.

Scale Out with a Load Balancer

Figure-1: ELB distributes inbound traffic across multiple EC2 instances.

Maintaining Healthier Servers: The risk of overloading or overwhelming a single Web server is now minimized due to distribution of traffic. This increases the chances of your individual Web servers staying healthier over much longer periods of time.

An architect needs to consider several critical aspects of a deployment such as:

  • How do I truly achieve ‘internet-scale’? What does my ‘scaled out’ architecture look like?
  • How does the ELB schedule incoming traffic?
  • What if my load balancer itself becomes a single point of failure?
  • How does my design guarantee fault-tolerance, resiliency, and high-availability?
  • What security features does the load balancer offer for my inflight traffic?

In this post, we answer these questions and also help you understand why the ELB is more effective than a home-brewed load balancing solution using Nginx or Apache.
