When we help NGINX users who are having problems, we see the same configuration mistakes over and over – sometimes even in configurations written by fellow NGINX engineers! In this blog we look at 10 of the most common mistakes, explaining what’s wrong in each case and how to fix it.

  1. Not enough file descriptors per worker
  2. The error_log off directive
  3. Not enabling keepalive connections to upstream servers
  4. Forgetting how directive inheritance works
  5. The proxy_buffering off directive
  6. Improper use of the if directive
  7. Excessive health checks
  8. Unsecured access to metrics
  9. Using ip_hash when all traffic comes from the same /24 CIDR block
  10. Not taking advantage of upstream groups

Mistake 1: Not Enough File Descriptors per Worker

The worker_connections directive sets the maximum number of simultaneous connections that an NGINX worker process can have open (the default is 512). All types of connections (for example, connections with proxied servers) count against the maximum, not just client connections. But it’s important to keep in mind that ultimately there is another limit on the number of simultaneous connections per worker: the operating system limit on the maximum number of file descriptors (FDs) allocated to each process. In modern UNIX distributions, the default limit is 1024.

For all but the smallest NGINX deployments, a limit of 512 connections per worker is probably too small. Indeed, the default nginx.conf file we distribute with NGINX Open Source binaries and NGINX Plus increases it to 1024.

The common configuration mistake is not increasing the limit on FDs to at least twice the value of worker_connections. The fix is to set that value with the worker_rlimit_nofile directive in the main configuration context.

Here’s why more FDs are needed: each connection from an NGINX worker process to a client or upstream server consumes an FD. When NGINX acts as a web server, it uses one FD for the client connection and one FD per served file, for a minimum of two FDs per client (but most web pages are built from many files). When it acts as a proxy server, NGINX uses one FD each for the connection to the client and upstream server, and potentially a third FD for the file used to store the server’s response temporarily. As a caching server, NGINX behaves like a web server for cached responses and like a proxy server if the cache is empty or expired.

NGINX also uses an FD per log file and a couple of FDs to communicate with the master process, but usually these numbers are small compared to the number of FDs used for connections and files.

UNIX offers several ways to set the number of FDs per process:

  • The ulimit command if you start NGINX from a shell
  • The init script or systemd service manifest variables if you start NGINX as a service
  • The /etc/security/limits.conf file

However, the method to use depends on how you start NGINX, whereas worker_rlimit_nofile works no matter how you start NGINX.

There is also a system‑wide limit on the number of FDs, which you can set with the OS’s sysctl fs.file-max command. It is usually large enough, but it is worth verifying that the maximum number of file descriptors all NGINX worker processes might use (worker_rlimit_nofile * worker_processes) is significantly less than fs.file‑max. If NGINX somehow uses all available FDs (for example, during a DoS attack), it becomes impossible even to log in to the machine to fix the issue.
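As a minimal sketch of the fix, the following fragment raises the FD limit to twice the connection limit. The specific numbers (4 workers, 1024 connections) are assumptions chosen only to make the arithmetic easy to follow, not recommendations for any particular deployment.

```nginx
# Main (top-level) context of nginx.conf
worker_processes     4;      # assumed value, used for the arithmetic below
worker_rlimit_nofile 2048;   # at least 2 x worker_connections

events {
    worker_connections 1024;
}

# Worst case under these assumptions: 4 workers x 2048 FDs = 8192 FDs,
# which should be significantly less than the system-wide fs.file-max.
```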

Mistake 2: The error_log off Directive

The common mistake is thinking that the error_log off directive disables logging. In fact, unlike the access_log directive, error_log does not take an off parameter. If you include the error_log off directive in the configuration, NGINX creates an error log file named off in the default directory for NGINX configuration files (usually /etc/nginx).

We don’t recommend disabling the error log, because it is a vital source of information when debugging any problems with NGINX. However, if storage is so limited that it might be possible to log enough data to exhaust the available disk space, it might make sense to disable error logging. Include this directive in the main configuration context:

error_log /dev/null emerg;

Note that this directive doesn’t apply until NGINX reads and validates the configuration. So each time NGINX starts up or the configuration is reloaded, it might log to the default error log location (usually /var/log/nginx/error.log) until the configuration is validated. To change the log directory, include the -e <error_log_location> parameter on the nginx command.

Mistake 3: Not Enabling Keepalive Connections to Upstream Servers

By default, NGINX opens a new connection to an upstream (backend) server for every new incoming request. This is safe but inefficient, because NGINX and the server must exchange three packets to establish a connection and three or four to terminate it.

At high traffic volumes, opening a new connection for every request can exhaust system resources and make it impossible to open connections at all. Here’s why: for each connection the 4-tuple of source address, source port, destination address, and destination port must be unique. For connections from NGINX to an upstream server, three of the elements (the first, third, and fourth) are fixed, leaving only the source port as a variable. When a connection is closed, the Linux socket sits in the TIME‑WAIT state for 60 seconds, which at high traffic volumes increases the possibility of exhausting the pool of available source ports. If that happens, NGINX cannot open new connections to upstream servers.

The fix is to enable keepalive connections between NGINX and upstream servers – instead of being closed when a request completes, the connection stays open to be used for additional requests. This both reduces the possibility of running out of source ports and improves performance.

To enable keepalive connections:

  • Include the keepalive directive in every upstream{} block, to set the number of idle keepalive connections to upstream servers preserved in the cache of each worker process.

    Note that the keepalive directive does not limit the total number of connections to upstream servers that an NGINX worker process can open – this is a common misconception. So the parameter to keepalive does not need to be as large as you might think.

    We recommend setting the parameter to twice the number of servers listed in the upstream{} block. This is large enough for NGINX to maintain keepalive connections with all the servers, but small enough that upstream servers can process new incoming connections as well.

    Note also that when you specify a load‑balancing algorithm in the upstream{} block – with the hash, ip_hash, least_conn, least_time, or random directive – the directive must appear above the keepalive directive. This is one of the rare exceptions to the general rule that the order of directives in the NGINX configuration doesn’t matter.

  • In the location{} block that forwards requests to an upstream group, include the following directives along with the proxy_pass directive:

    proxy_http_version 1.1;
    proxy_set_header   "Connection" "";

    By default NGINX uses HTTP/1.0 for connections to upstream servers and accordingly adds the Connection: close header to the requests that it forwards to the servers. The result is that each connection gets closed when the request completes, despite the presence of the keepalive directive in the upstream{} block.

    The proxy_http_version directive tells NGINX to use HTTP/1.1 instead, and the proxy_set_header directive removes the close value from the Connection header.
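Putting the two steps together, a configuration might look like the following sketch, which belongs in the http{} context. The group name backend, the server addresses, and the choice of least_conn are assumptions made for illustration.

```nginx
upstream backend {
    least_conn;               # a load-balancing method must appear above keepalive
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    keepalive 4;              # twice the number of servers in the group
}

server {
    listen 80;

    location / {
        proxy_http_version 1.1;              # HTTP/1.1 supports persistent connections
        proxy_set_header   "Connection" "";  # do not send "Connection: close" upstream
        proxy_pass         http://backend;
    }
}
```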

Mistake 4: Forgetting How Directive Inheritance Works

NGINX directives are inherited downwards, or “outside‑in”: a child context – one nested within another context (its parent) – inherits the settings of directives included at the parent level. For example, all server{} and location{} blocks in the http{} context inherit the value of directives included at the http level, and a directive in a server{} block is inherited by all the child location{} blocks in it. However, when the same directive is included in both a parent context and its child context, the values are not added together – instead, the value in the child context overrides the parent value.

The mistake is to forget this “override rule” for array directives, which can be included not only in multiple contexts but also multiple times within a given context. Examples include proxy_set_header and add_header – having “add” in the name of the second makes it particularly easy to forget about the override rule.

We can illustrate how inheritance works with this example for add_header:

http {
    add_header X-HTTP-LEVEL-HEADER 1;
    add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;

    server {
        listen 8080;
        location / {
            return 200 "OK";
        } 
    }

    server {
        listen 8081;
        add_header X-SERVER-LEVEL-HEADER 1;

        location / {
            return 200 "OK";
        }

        location /test {
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        }

        location /correct {
            add_header X-HTTP-LEVEL-HEADER 1;
            add_header X-ANOTHER-HTTP-LEVEL-HEADER 1;

            add_header X-SERVER-LEVEL-HEADER 1;
            add_header X-LOCATION-LEVEL-HEADER 1;
            return 200 "OK";
        } 
    }
}

For the server listening on port 8080, there are no add_header directives in either the server{} or location{} blocks. So inheritance is straightforward and we see the two headers defined in the http{} context:

% curl -is localhost:8080
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:15 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-HTTP-LEVEL-HEADER: 1
X-ANOTHER-HTTP-LEVEL-HEADER: 1
OK

For the server listening on port 8081, there is an add_header directive in the server{} block but not in its child location / block. The header defined in the server{} block overrides the two headers defined in the http{} context:

% curl -is localhost:8081
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:20 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-SERVER-LEVEL-HEADER: 1
OK

In the child location /test block, there is an add_header directive and it overrides both the header from its parent server{} block and the two headers from the http{} context:

% curl -is localhost:8081/test
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:25 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-LOCATION-LEVEL-HEADER: 1
OK

If we want a location{} block to preserve the headers defined in its parent contexts along with any headers defined locally, we must redefine the parent headers within the location{} block. That’s what we’ve done in the location /correct block:

% curl -is localhost:8081/correct
HTTP/1.1 200 OK
Server: nginx/1.21.5
Date: Mon, 21 Feb 2022 10:12:30 GMT
Content-Type: text/plain
Content-Length: 2
Connection: keep-alive
X-HTTP-LEVEL-HEADER: 1
X-ANOTHER-HTTP-LEVEL-HEADER: 1
X-SERVER-LEVEL-HEADER: 1
X-LOCATION-LEVEL-HEADER: 1
OK
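The same override rule applies to proxy_set_header. The sketch below assumes a hypothetical backend at 127.0.0.1:3000 and a made-up X-Request-Source header. It shows a location that inherits the http‑level headers unchanged, and one that must repeat them because it adds a header of its own.

```nginx
http {
    proxy_set_header Host      $host;
    proxy_set_header X-Real-IP $remote_addr;

    server {
        listen 8082;

        location / {
            # No proxy_set_header at this level: both http-level headers are inherited
            proxy_pass http://127.0.0.1:3000;
        }

        location /api/ {
            # Declaring any proxy_set_header here discards the inherited ones,
            # so the http-level headers are repeated alongside the new one
            proxy_set_header Host             $host;
            proxy_set_header X-Real-IP        $remote_addr;
            proxy_set_header X-Request-Source api;
            proxy_pass http://127.0.0.1:3000;
        }
    }
}
```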

Mistake 5: The proxy_buffering off Directive

Proxy buffering is enabled by default in NGINX (the proxy_buffering directive is set to on). Proxy buffering means that NGINX stores the response from a server in internal buffers as it comes in, and doesn’t start sending data to the client until the entire response is buffered. Buffering helps to optimize performance with slow clients – because NGINX buffers the response for as long as it takes for the client to retrieve all of it, the proxied server can return its response as quickly as possible and return to being available to serve other requests.

When proxy buffering is disabled, NGINX buffers only the first part of a server’s response before starting to send it to the client, in a buffer that by default is one memory page in size (4 KB or 8 KB depending on the operating system). This is usually just enough space for the response header. NGINX then sends the response to the client synchronously as it receives it, forcing the server to sit idle as it waits until NGINX can accept the next response segment.

So we’re surprised by how often we see proxy_buffering off in configurations. Perhaps it is intended to reduce the latency experienced by clients, but the effect is negligible while the side effects are numerous: with proxy buffering disabled, rate limiting and caching don’t work even if configured, performance suffers, and so on.

There are only a small number of use cases where disabling proxy buffering might make sense (such as long polling), so we strongly discourage changing the default. For more information, see the NGINX Plus Admin Guide.
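If you do have such a use case, one option is to confine the change to the location that needs it instead of disabling buffering globally. In this sketch the /poll/ path and the upstream group name backend are assumptions for illustration.

```nginx
location / {
    # Buffering stays enabled here (the default)
    proxy_pass http://backend;
}

location /poll/ {
    # Disable buffering only for the long-polling endpoint
    proxy_buffering off;
    proxy_pass http://backend;
}
```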

Mistake 6: Improper Use of the if Directive

The if directive is tricky to use, especially in location{} blocks. It often doesn’t do what you expect and can even cause segfaults. In fact, it’s so tricky that there’s an article titled If is Evil in the NGINX Wiki, and we direct you there for a detailed discussion of the problems and how to avoid them.

In general, the only directives you can always use safely within an if{} block are return and rewrite. The following example uses if to detect requests that include the X‑Test header (but this can be any condition you want to test for). NGINX returns the 430 (Request Header Fields Too Large) error, intercepts it at the named location @error_430 and proxies the request to the upstream group named b.

location / {
    error_page 430 = @error_430;
    if ($http_x_test) {
        return 430; 
    }

    proxy_pass http://a;
}

location @error_430 {
    proxy_pass http://b;
}

For this and many other uses of if, it’s often possible to avoid the directive altogether. In the following example, when the request includes the X‑Test header the map{} block sets the $upstream_name variable to b and the request is proxied to the upstream group with that name.

map $http_x_test $upstream_name {
    default "b";
    ""      "a";
}

# ...

location / {
    proxy_pass http://$upstream_name;
}
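When proxy_pass contains a variable, NGINX looks for the resulting name among the defined upstream groups before falling back to a resolver, so the groups a and b need to exist in the http{} context. A minimal sketch, with server addresses that are purely hypothetical:

```nginx
upstream a {
    server 10.0.0.21:8080;   # receives requests without the X-Test header
}

upstream b {
    server 10.0.0.22:8080;   # receives requests that include the X-Test header
}
```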

Mistake 7: Excessive Health Checks

It is quite common to configure multiple virtual servers to proxy requests to the same upstream group (in other words, to include the identical proxy_pass directive in multiple server{} blocks). The mistake in this situation is to include a health_check directive in every server{} block. This just creates more load on the upstream servers without yielding any additional information.

At the risk of being obvious, the fix is to define just one health check per upstream{} block. Here we define the health check for the upstream group named b in a special named location, complete with appropriate timeouts and header settings.

location / {
    proxy_set_header Host $host;
    proxy_set_header "Connection" "";
    proxy_http_version 1.1;
    proxy_pass http://b;
}

location @health_check {
    health_check;
    proxy_connect_timeout 2s;
    proxy_read_timeout 3s;
    proxy_set_header Host example.com;
    proxy_pass http://b;
}

In complex configurations, it can further simplify management to group all health‑check locations in a single virtual server along with the NGINX Plus API and dashboard, as in this example.

server {
    listen 8080;

    location / {
        # …
    }

    location @health_check_b {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host example.com;
        proxy_pass http://b;
    }

    location @health_check_c {
        health_check;
        proxy_connect_timeout 2s;
        proxy_read_timeout 3s;
        proxy_set_header Host api.example.com;
        proxy_pass http://c;
    }

    location /api {
        api write=on;
        # directives limiting access to the API (see 'Mistake 8' below)
    }

    location = /dashboard.html {
        root   /usr/share/nginx/html;
    }
}

For more information about health checks for HTTP, TCP, UDP, and gRPC servers, see the NGINX Plus Admin Guide.

Mistake 8: Unsecured Access to Metrics

Basic metrics about NGINX operation are available from the Stub Status module. For NGINX Plus, you can also gather a much more extensive set of metrics with the NGINX Plus API. Enable metrics collection by including the stub_status or api directive, respectively, in a server{} or location{} block, which becomes the URL you then access to view the metrics. (For the NGINX Plus API, you also need to configure shared memory zones for the NGINX entities – virtual servers, upstream groups, caches, and so on – for which you want to collect metrics; see the instructions in the NGINX Plus Admin Guide.)

Some of the metrics are sensitive information that can be used to attack your website or the apps proxied by NGINX, and the mistake we sometimes see in user configurations is failure to restrict access to the corresponding URL. Here we look at some of the ways you can secure the metrics. We’ll use stub_status in the first examples.

With the following configuration, anyone on the Internet can access the metrics at http://example.com/basic_status.

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        stub_status;
    }
}

Protect Metrics with HTTP Basic Authentication

To password‑protect the metrics with HTTP Basic Authentication, include the auth_basic and auth_basic_user_file directives. The file (here, .htpasswd) lists the usernames and passwords of clients who can log in to see the metrics:

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        stub_status;
    }
}

Protect Metrics with the allow and deny Directives

If you don’t want authorized users to have to log in, and you know the IP addresses from which they will access the metrics, another option is the allow directive. You can specify individual IPv4 and IPv6 addresses and CIDR ranges. The deny all directive prevents access from any other addresses.

server {
    listen 80;
    server_name example.com;

    location = /basic_status {
        allow 192.168.1.0/24;
        allow 10.1.0.0/16;
        allow 2001:0db8::/32;
        allow 96.1.2.23/32;
        deny  all;
        stub_status;
    }
}

Combine the Two Methods

What if we want to combine both methods? We can allow clients to access the metrics from specific addresses without a password and still require login for clients coming from different addresses. For this we use the satisfy any directive. It tells NGINX to allow access to clients who either log in with HTTP Basic auth credentials or are using a preapproved IP address. For extra security, you can set satisfy to all to require even people who come from specific addresses to log in.

server {
    listen 80;
    server_name monitor.example.com;

    location = /basic_status {
        satisfy any;

        auth_basic "closed site";
        auth_basic_user_file conf.d/.htpasswd;
        allow 192.168.1.0/24;
        allow 10.1.0.0/16;
        allow 2001:0db8::/32;
        allow 96.1.2.23/32;
        deny  all;
        stub_status;
    }
}
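For comparison, here is a sketch of the stricter variant using satisfy all, in which a client must both come from an approved address and log in. The single allow range is an assumption kept short for illustration.

```nginx
location = /basic_status {
    satisfy all;   # the address check AND the login must both succeed

    auth_basic "closed site";
    auth_basic_user_file conf.d/.htpasswd;
    allow 192.168.1.0/24;
    deny  all;
    stub_status;
}
```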

With NGINX Plus, you use the same techniques to limit access to the NGINX Plus API endpoint (http://monitor.example.com:8080/api/ in the following example) as well as the live activity monitoring dashboard at http://monitor.example.com:8080/dashboard.html.

This configuration permits access without a password only to clients coming from the 96.1.2.23/32 network or localhost. Because the directives are defined at the server{} level, the same restrictions apply to both the API and the dashboard. As a side note, the write=on parameter to api means these clients can also use the API to make configuration changes.

For more information about configuring the API and dashboard, see the NGINX Plus Admin Guide.

server {
    listen 8080;
    server_name monitor.example.com;
 
    satisfy any;
    auth_basic "closed site";
    auth_basic_user_file conf.d/.htpasswd;
    allow 127.0.0.1/32;
    allow 96.1.2.23/32;
    deny  all;

    location /api/ {
        api write=on;
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}

Mistake 9: Using ip_hash When All Traffic Comes from the Same /24 CIDR Block

The ip_hash algorithm load balances traffic across the servers in an upstream{} block, based on a hash of the client IP address. The hashing key is the first three octets of an IPv4 address or the entire IPv6 address. The method establishes session persistence, which means that requests from a client are always passed to the same server except when the server is unavailable.

Suppose that we have deployed NGINX as a reverse proxy in a virtual private network configured for high availability. We put various firewalls, routers, Layer 4 load balancers, and gateways in front of NGINX to accept traffic from different sources (the internal network, partner networks, the Internet, and so on) and pass it to NGINX for reverse proxying to upstream servers. Here’s the initial NGINX configuration:

http {

    upstream backend {    # upstream{} requires a name; "backend" is a placeholder
        ip_hash;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }
 
    server {# …}
}

But it turns out there’s a problem: all of the “intercepting” devices are on the same 10.10.0.0/24 network, so to NGINX it looks like all traffic comes from addresses in that CIDR range. Remember that the ip_hash algorithm hashes the first three octets of an IPv4 address. In our deployment, the first three octets are the same – 10.10.0 – for every client, so the hash is the same for all of them and there’s no basis for distributing traffic to different servers.

The fix is to use the hash algorithm instead with the $binary_remote_addr variable as the hash key. That variable captures the complete client address, converting it into a binary representation that is 4 bytes for an IPv4 address and 16 bytes for an IPv6 address. Now the hash is different for each intercepting device and load balancing works as expected.

We also include the consistent parameter to use the ketama hashing method instead of the default. This greatly reduces the number of keys that get remapped to a different upstream server when the set of servers changes, which yields a higher cache hit ratio for caching servers.

http {
    upstream backend {    # upstream{} requires a name; "backend" is a placeholder
        hash $binary_remote_addr consistent;
        server 10.10.20.105:8080;
        server 10.10.20.106:8080;
        server 10.10.20.108:8080;
    }

    server {# …}
}

Mistake 10: Not Taking Advantage of Upstream Groups

Suppose you are employing NGINX for one of the simplest use cases, as a reverse proxy for a single Node.js‑based backend application listening on port 3000. A common configuration might look like this:

http {

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_set_header Host $host;
            proxy_pass http://localhost:3000/;
        }
    }
}

Straightforward, right? The proxy_pass directive tells NGINX where to send requests from clients. All NGINX needs to do is resolve the hostname to an IPv4 or IPv6 address. Once the connection is established NGINX forwards requests to that server.

The mistake here is to assume that because there’s only one server – and thus no reason to configure load balancing – it’s pointless to create an upstream{} block. In fact, an upstream{} block unlocks several features that improve performance, as illustrated by this configuration:

http {

    upstream node_backend {
        zone upstreams 64K;
        server 127.0.0.1:3000 max_fails=1 fail_timeout=2s;
        keepalive 2;
    }

    server {
        listen 80;
        server_name example.com;

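        location / {
            proxy_set_header Host $host;
            # Needed for the keepalive directive to take effect (see Mistake 3)
            proxy_http_version 1.1;
            proxy_set_header "Connection" "";
            proxy_pass http://node_backend/;
            proxy_next_upstream error timeout http_500;
        }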
    }
}

The zone directive establishes a shared memory zone where all NGINX worker processes on the host can access configuration and state information about the upstream servers. Several upstream groups can share the zone. With NGINX Plus, the zone also enables you to use the NGINX Plus API to change the servers in an upstream group and the settings for individual servers without restarting NGINX. For details, see the NGINX Plus Admin Guide.

The server directive has several parameters you can use to tune server behavior. In this example we have changed the conditions NGINX uses to determine that a server is unhealthy and thus ineligible to accept requests. Here it considers a server unhealthy if a communication attempt fails even once within each 2-second period (instead of the default of once in a 10-second period).

We’re combining this setting with the proxy_next_upstream directive to configure what NGINX considers a failed communication attempt, in which case it passes requests to the next server in the upstream group. To the default error and timeout conditions we add http_500 so that NGINX considers an HTTP 500 (Internal Server Error) code from an upstream server to represent a failed attempt.

The keepalive directive sets the number of idle keepalive connections to upstream servers preserved in the cache of each worker process. We already discussed the benefits in Mistake 3: Not Enabling Keepalive Connections to Upstream Servers.

With NGINX Plus you can configure additional features with upstream groups:

  • NGINX Open Source resolves server hostnames to IP addresses only once, during startup. The resolve parameter to the server directive enables NGINX Plus to monitor changes to the IP addresses that correspond to an upstream server’s domain name, and automatically modify the upstream configuration without the need to restart.

    The service parameter further enables NGINX Plus to use DNS SRV records, which include information about port numbers, weights, and priorities. This is critical in microservices environments where the port numbers of services are often dynamically assigned.

    For more information about resolving server addresses, see Using DNS for Service Discovery with NGINX and NGINX Plus on our blog.

  • The slow_start parameter to the server directive enables NGINX Plus to gradually increase the volume of requests it sends to a server that is newly considered healthy and available to accept requests. This prevents a sudden flood of requests that might overwhelm the server and cause it to fail again.

  • The queue directive enables NGINX Plus to place requests in a queue when it’s not possible to select an upstream server to service the request, instead of returning an error to the client immediately.
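The following sketch shows how these NGINX Plus features might be combined in one upstream group. The hostnames, the resolver address, and all numeric values are assumptions for illustration. Two servers are listed because slow_start is ignored when a group contains only one server, and resolve requires both a resolver directive and a shared memory zone.

```nginx
http {
    # Assumed DNS server address; re-resolve names every 10 seconds
    resolver 10.0.0.53 valid=10s;

    upstream node_backend {
        zone upstreams 64K;   # shared memory zone, required by resolve

        # resolve: track DNS changes; slow_start: ramp up traffic over 30 seconds
        server app1.example.com:3000 resolve slow_start=30s;
        server app2.example.com:3000 resolve slow_start=30s;

        # Hold up to 100 requests for up to 30 seconds when no server can be selected
        queue 100 timeout=30s;

        keepalive 4;
    }
}
```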

Resources

To try NGINX Plus, start your free 30-day trial today or contact us to discuss your use cases.


About The Author

Timo Stark

Product Management Engineer

About The Author

Sergey Budnevich

Head of Professional Services for NGINX
