This blog post includes contributions from several NGINX team members, including Valentin Bartenev and Nick Shadrin.
Over the past few years, I’ve worked with a handful of partners where NGINX Plus performance was their primary concern. The conversation typically starts with challenges on their end hitting our published performance benchmarks. The challenge usually comes from the partner jumping straight to a fixed use case, such as using their existing SSL keys or targeting very large file payloads, and then seeing sub-par performance from NGINX Plus.
To a certain degree, this is expected behavior. I always like to explain to partners that as a software component, NGINX Plus can run at near line-rate speeds on any hardware that’s available to us when dealing with the most basic HTTP use case. To hit our published numbers with specific use cases, though, NGINX Plus often benefits from tweaks to the NGINX configuration, low-level OS settings, and hardware settings.
In every case to date, our partners have been able to achieve the theoretical performance numbers with highly specific use cases simply by focusing on the OS components and hardware settings that need to be configured to match their use case, and on how NGINX Plus interacts with those components.
Over the years, I’ve compiled this list of NGINX configuration, OS, and hardware tips, tricks, and tweaks. It’s intended to help NGINX partners and customers achieve higher performance with the open source NGINX software and NGINX Plus, with their specific use cases.
This document should only be used as a guide to a subset of configuration settings that can impact performance; it’s not an exhaustive list, nor should every setting below necessarily be changed in your environment.
Note: We revised this blog post Tuesday, December 19th, and we are reviewing it further to make it as strong as possible. Please add your own suggested changes and additions in the Comments below; we’ll incorporate what we can into the blog post.
Starting on Tuning
I generally recommend the following workflow when tackling performance-tuning issues:
- Start with performance testing NGINX Plus in the most generic HTTP use case possible. This will allow you to set your own benchmarking baseline in your environment first.
- Next, identify your specific use case. If, for instance, your application requires large file uploads, or if you’ll be dealing with high-security large SSL key sizes, define the end-goal use case first.
- Configure NGINX Plus for your use case and re-test to determine the delta between theoretical performance in your environment and real-world performance with your use case.
- Begin tweaking one setting at a time by focusing on the settings that most apply to your use case. In other words, don’t change a bunch of sysctl settings while also adding new NGINX directives at the same time. Start small, and start with the features that are most applicable to your use case. For example, change SSL key types and sizes first, if high security is critical for your environment.
- If the change doesn’t impact performance, revert the setting back to the default. As you progress through each individual change, you’ll start to see a pattern where like settings tend to affect performance together. This will allow you to home in on the groups of settings that you can later tweak together as needed.
It’s important to note that every deployment environment is unique and comes with its own networking and application performance requirements. It may not be advisable to change some of these values in production, and any of the configuration tweaks outlined below can produce dramatically different results depending on the application type and networking topology.
With NGINX having such strong roots in the open source community, many people over the years have contributed back to the performance conversations. Where applicable, I’ve included links to external resources for specific performance-tuning suggestions from people who have already battle-tested many of these solutions in production.
NGINX Config Tuning
Please refer to the NGINX documentation for details about configuring any of the values below, their default settings, and the scope within which each setting is supported.
This section describes how to remove slow and unnecessary ciphers from OpenSSL and NGINX.
When SSL performance is paramount, it’s always a good idea to try different key sizes and types in your environment – finding the balance between longer keys for increased security and shorter keys for faster performance, based on your specific security needs. An easy test is to move from more traditional RSA keys to Elliptic Curve Cryptography (ECC), which uses smaller key sizes (and is therefore computationally faster) for the same level of security.
To generate quick, self-signed ECC keys for testing:
openssl ecparam -out ./nginx-ecc-p256.key -name prime256v1 -genkey
openssl req -new -key ./nginx-ecc-p256.key -out ./nginx-ecc-p256-csr.pem -subj '/CN=localhost'
openssl req -x509 -nodes -days 30 -key ./nginx-ecc-p256.key -in ./nginx-ecc-p256-csr.pem -out ./nginx-ecc-p256.pem
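Once generated, the ECC certificate and key can be referenced in a server block. A minimal sketch, assuming the files above were moved to /etc/nginx/ (the path and server_name here are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name localhost;

    # ECC certificate and key generated with the openssl commands above
    ssl_certificate     /etc/nginx/nginx-ecc-p256.pem;
    ssl_certificate_key /etc/nginx/nginx-ecc-p256.key;

    # Restrict protocols to modern versions; adjust to your security policy
    ssl_protocols TLSv1.2 TLSv1.3;
}
```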
Caching and Compression
The gzip parameters give you granular control over how NGINX delivers content, so setting them incorrectly can decrease NGINX performance. Enabling gzip can save bandwidth, improving page load time on slow connections. (In local, synthetic benchmarks, enabling gzip might not show the same benefits that you will see in the real world.) Try these settings for optimum performance:
- Do not increase the compression level, as this costs CPU effort without a commensurate increase in throughput
- Evaluate the effect of enabling compression by enabling and disabling gzip for different types and sizes of content.
More information on granular gzip control can be found in the NGINX documentation for the gzip module.
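As a starting point, the advice above might translate into something like the following sketch (the content types and level shown are illustrative, not recommendations for every workload):

```nginx
http {
    gzip on;

    # Compress only content types that benefit; images and video are already compressed
    gzip_types text/plain text/css application/json application/javascript text/xml;

    # Keep the compression level low; higher levels cost CPU without much throughput gain
    gzip_comp_level 1;

    # Skip very small responses, where the gzip overhead outweighs the savings
    gzip_min_length 1024;
}
```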
Please refer to the NGINX documentation for details on each one of these configuration options, proper syntax, scope of application (HTTP, server, location), etc. These are options that don’t fit in any specific category:
When multi_accept is disabled, a worker process accepts one new connection at a time. When it is enabled, a worker process accepts all new connections at once. It’s generally best to leave this setting at the default value (off, in recent versions), unless you’re sure there’s a benefit to changing it. Start performance testing with this setting disabled to better measure predictable scale.
When accept_mutex is enabled, worker processes accept new connections in turn. Otherwise, all worker processes are notified about new connections.
Under some high loads, it may be better to change the accept_mutex setting. Leave this setting to the default unless you have extensive knowledge of your app’s performance and the opportunity to test under a variety of conditions.
More information on these settings can be found in the NGINX core functionality documentation.
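Both directives live in the events context. A minimal sketch showing the defaults discussed above:

```nginx
events {
    worker_connections 1024;

    # Accept one new connection at a time per worker (the default in recent versions)
    multi_accept off;

    # Leave accept_mutex at its default unless testing shows a benefit
    accept_mutex off;
}
```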
When buffering is disabled, the response is passed to the client synchronously, as soon as it is received, which increases the load on NGINX. Disabling buffering is only required for applications that need immediate access to the data stream.
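Buffering is controlled per context with the proxy_buffering directive; for example, to disable it only for a location that streams data (the location path and upstream name are placeholders):

```nginx
location /stream/ {
    proxy_pass http://backend;

    # Pass the response to the client as it arrives, without buffering
    proxy_buffering off;
}
```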
Logging is an important tool for managing and auditing your system. However, logging large amounts of data, and storing large logs, can strain system resources. We only advise disabling logging in very specific cases or for performance troubleshooting.
To add buffering to access logs: access_log /path/to/access.log main buffer=16k;
To disable access logging: access_log off;
You may benefit from a centralized logging system based on the syslog protocol, available from many open source projects and commercial vendors. If you need metrics (which aggregate information initially recorded in logs) for NGINX servers, you can use NGINX Amplify.
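NGINX can also ship access logs directly to a syslog collector, avoiding local disk I/O entirely. A sketch, assuming a collector listening at 192.168.1.1:514 (the address is a placeholder):

```nginx
access_log syslog:server=192.168.1.1:514,tag=nginx main;
```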
Use the keepalive directive to enable keepalive connections from NGINX Plus to upstream servers, defining the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed. Without keepalives, you add more overhead and are inefficient with both connections and ephemeral ports.
To use keepalive connections to your upstream servers, you must also use the proxy_http_version directive to tell NGINX Plus to use HTTP version 1.1, and the proxy_set_header directive to remove any headers named Connection. Both directives can be placed in the http, server, or location configuration blocks.
proxy_set_header Connection "";
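Putting these directives together, a minimal upstream keepalive sketch (the upstream name, server addresses, and keepalive count are placeholders):

```nginx
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;

    # Keep up to 32 idle connections to the upstream open per worker process
    keepalive 32;
}

server {
    location / {
        proxy_pass http://backend;

        # Both directives are required for upstream keepalive connections
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```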
reuseport parameter enables the SO_REUSEPORT function in NGINX, enabling port sharding.
listen 80 reuseport;
For more information, please refer to our blog post on socket sharding.
Thread pooling consists of a task queue and a number of threads that handle the queue. When a worker process needs to do a potentially long operation, instead of processing the operation by itself, it puts a task in the pool’s queue, from which it can be taken and processed by any free thread.
Enabling and disabling thread pooling in NGINX is relatively straightforward in .conf files:
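A sketch of enabling thread pools (the thread_pool parameters shown are the documented defaults, so the directive itself is optional):

```nginx
# Optional: tune the built-in "default" pool; these values are its defaults
thread_pool default threads=32 max_queue=65536;

http {
    server {
        location / {
            root /data;

            # Offload potentially blocking file reads to the thread pool
            aio threads;
        }
    }
}
```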
The way thread pools are managed, however, can be affected by other buffer-related configuration settings. For complete information on tweaking other settings to support thread pooling, please refer to our blog post on thread pools.
CPU affinity is used to control which CPUs NGINX utilizes for individual worker processes (find background on CPU affinity in the NGINX documentation):
worker_processes auto; is the setting in the default nginx.conf file, and the most commonly needed value. It sets the number of worker processes equal to the number of available CPU cores.
However, when NGINX is running in a containerized environment, such as a Docker container, the number of cores assigned to that container by the sysadmin might be less than the number available on the host machine. In this case, NGINX detects the host machine’s core count and rotates workers among the cores that are actually available within the container. To avoid this, reduce the number of workers by setting worker_processes to the number of cores available within the container.
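A sketch combining these directives for a non-containerized host:

```nginx
# Match the worker count to the available CPU cores
worker_processes auto;

# Optionally let NGINX bind each worker to its own CPU automatically
worker_cpu_affinity auto;
```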
It’s always best to load NGINX with traffic similar to your production traffic. However, for basic testing, you can use a load generator such as wrk, as described here.
Load NGINX with a quick wrk session:
# wrk -t 1 -c 50 -d 20s http://localhost/1k.bin
If necessary, you can create a simple 1k.bin file for testing with:
# dd if=/dev/zero of=1k.bin bs=1024 count=1
Run top in CPU view mode (by pressing 1 after top starts).
You can repeat the test with different numbers of worker processes and affinity bindings to see linear scaling. That’s an effective way to determine the appropriate subset of available cores to dedicate to NGINX.
General Sizing and Testing
Here’s a very rough sizing approximation for general web server and load balancing functionality (may not be as applicable for VOD streaming or CDN):
- 1 CPU core per 1-2 Gbps of unencrypted traffic.
- Small (1-2KB) responses and one response per connection will increase CPU load.
- 1GB for OS and other general needs.
- The rest is divided among NGINX buffers, socket buffers, and virtual memory cache. A rough estimate is 1MB per connection.
The proxy_buffers size should be chosen to avoid disk I/O: if the response is larger than the configured buffer space (proxy_buffers plus proxy_buffer_size), the response may be written to disk, increasing I/O, response time, and so on.
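A sketch of buffer sizing for responses up to roughly 128KB (the sizes are illustrative placeholders, not recommendations):

```nginx
location / {
    proxy_pass http://backend;

    # Buffer for the first part of the response (typically the headers)
    proxy_buffer_size 16k;

    # 8 buffers of 16k each per connection for the response body
    proxy_buffers 8 16k;
}
```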
Sizing shared memory zones:
- On the surface, NGINX zones are used to store data shared by multiple upstream servers, such as status, metrics, cookies, healthchecks, etc.
- A zone can also affect how NGINX distributes load among components such as worker processes, however. For full documentation on what a zone stores and affects, please refer to this section of the load balancing Admin Guide.
There are no exact settings, because usage patterns differ widely. Each feature, such as sticky cookie/route/learn load balancing, health checks, or re-resolving, will affect the zone size. For example, a 256KB zone with the sticky_route session persistence method and a single health check can hold up to:
- 128 servers (each added as a single peer by specifying an IP-address:port pair)
- 88 servers (each added as a single peer by specifying hostname:port, where the hostname resolves to a single IP)
- 12 servers (each added as multiple peers by specifying hostname:port, where the hostname resolves to many IPs)
- When creating zones, it’s important to note that the shared memory area is controlled by the name of the zone. If you use the same name for all zones, then all data from all upstreams will be stored in that zone. In this case, the size may be exceeded.
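A sketch of named shared memory zones (the zone names, sizes, and addresses are placeholders; note the distinct name per upstream, to avoid the overflow problem above):

```nginx
upstream backend_a {
    # Distinct zone name per upstream, so data is not mixed across upstreams
    zone backend_a 256k;
    server 10.0.0.1:8080;
}

upstream backend_b {
    zone backend_b 256k;
    server 10.0.0.2:8080;
}
```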
- The limiting factor for disk I/O is the number of I/O operations per second (IOPS).
- NGINX depends on disk I/O and IOPS for a number of functions, including logging and caching.
See notes above for specific settings with regard to logging and caching.