NGINX.COM
Web Server Load Balancing with NGINX Plus

We all know that performance is critical to a website’s success. No one wants to use a slow website.

Since its initial open source release in 2004, NGINX has been synonymous with high‑performance websites. More of the world’s websites use NGINX than any other web server, and more than 350 million sites worldwide are now powered by NGINX. But how well does NGINX actually perform? What hardware configurations will yield the best performance at a reasonable price?

We previously published performance statistics for NGINX and NGINX Plus as a reverse proxy in our NGINX Plus Sizing Guide for Bare Metal Servers and detailed our testing methodolgy in NGINX Plus Sizing Guide: How We Tested.

Since publishing those pieces, we’ve received many requests for information about the performance of NGINX as a web server, so we went back into the lab. In this blog post, we present detailed performance numbers from our testing: requests per second (RPS) and connections per second (CPS) on live HTTP and HTTPS connections, and HTTP throughput on 50 dedicated channels. The results apply to both NGINX Open Source and NGINX Plus (none of the testing relies on features that are exclusive to NGINX Plus).

Our goal is that this information helps you determine the hardware specs you need to handle current and future traffic for your web application, taking into account your budget and performance needs.

Testing Overview

The testing setup we used is almost identical to the setup in NGINX Plus Sizing Guide: How We Tested, except there is no reverse proxy between the client and the web server. All tests were done using two separate machines connected together with dual 40‑GbE links in a simple, flat Layer 2 network.

We used 50 end-to-end connections between a client and a web server
to test NGINX performance

To simulate different numbers of CPUs in the tests, we varied the number of NGINX worker processes. By default, the number of NGINX worker processes that start executing equates to the number of CPUs available in the machine where NGINX is running. You can change the number of NGINX worker processes executing by changing the value of the worker_processes directive in the /etc/nginx/nginx.conf file and restarting the NGINX service.

For tests where client traffic was secured with HTTPS, we used the following encryption parameters:

  • ECDHE-RSA-AES256-GCM-SHA384 cipher
  • 2,048‑bit RSA key
  • Perfect forward secrecy (as indicated by ECDHE in the cipher name)
  • OpenSSL 1.0.1f

Hardware Used

We used the following hardware for the testing on both the client and web server machine:

  • CPU: 2x Intel(R) Xeon(R) CPU E5‑2699 v3 @ 2.30 GHz, 36 real (or 72 HT) cores
  • Network: 2x Intel XL710 40 GbE QSFP+ (rev 01)
  • Memory: 16 GB

Software Used

We used the following software for the testing:

  • Version 4.0.0 of wrk running on the client machine generated the traffic that NGINX proxied. We installed it according to these instructions.
  • Version 1.9.7 of NGINX Open Source ran on the web server machines. We installed it from the official repository at nginx.org according to these instructions.
  • Ubuntu Linux 14.04.1 ran on both client and web server machines.

Performance Metrics and Analysis

We obtained the following performance numbers from the tests. For the commands used, see NGINX Plus Sizing Guide: How We Tested.

Requests per Second

Requests per second (RPS) measures the ability to process HTTP requests. Each request is sent from the client machine to the NGINX web server. The tests were done for both unencrypted HTTP and encrypted HTTPS traffic.

Following common practice for performance testing, we used four standard file sizes:

  • 0 KB simulates an “empty” HTTP request or response with no accompanying data, such as a 302 error code.
  • 1 KB is roughly the size of small CSS or JavaScript files or a very small image, such as a small icon.
  • 10 KB approximates larger code files, larger icons, and small image files.
  • 100 KB represents large code files and other larger files.

Issuing small HTTP requests gives you more requests per second, with less total throughput. Issuing large HTTP requests gives you fewer requests per second and more throughput, as a single request initiates a large file transfer that takes an appreciable amount of time to complete.

RPS for HTTP Requests

The table and graph below show the number of HTTP requests for varying numbers of CPUs and varying request sizes, in kilobytes (KB).

CPUs 0 KB 1 KB 10 KB 100 KB
1 145,551 74,091 54,684 33,125
2 249,293 131,466 102,069 62,554
4 543,061 261,269 207,848 88,691
8 1,048,421 524,745 392,151 91,640
16 2,001,846 972,382 663,921 91,623
32 3,019,182 1,316,362 774,567 91,640
36 3,298,511 1,309,358 764,744 91,655

Large HTTP requests (such as the 10 and 100 KB sizes in the test) are fragmented and take longer to process. As a result, the lines in the graph for larger requests have flatter slopes.

When balancing budget versus performance, one interesting thing to note is that the slope of the lines changes as you pass 16 CPUs. Servers with 32 CPUs performed the same or better than those with 36 CPUs for 1 KB and 10 KB request sizes. Resource contention eventually outweighs the positive effect of adding more CPUs. This suggests that common server configurations for HTTP traffic of 4 to 8 cores might benefit strongly from adding CPUs up to a total of 16, less so from using 32, and with little or no benefit from moving to 36. However, as is always true with regard to testing, your mileage may vary.

RPS for HTTPS Requests

HTTPS RPS is lower than HTTP RPS for the same provisioned bare‑metal hardware because the data encryption and decryption necessary to secure data transmitted between machines is computationally expensive.

Nonetheless, continued advances in Intel architecture – resulting in servers with faster processors and better memory management – mean that the performance of software for CPU‑bound encryption tasks continually improves compared to dedicated hardware encryption devices.

Though RPS for HTTPS are roughly one‑quarter less than for HTTP at the 16‑CPU mark, “throwing hardware at the problem” – in the form of additional CPUs – is more effective than for HTTP, for the more commonly used file sizes and all the way up to 36 CPUs.

CPUs 0 KB 1 KB 10 KB 100 KB
1 71,561 40,207 23,308 4,830
2 151,325 85,139 48,654 9,871
4 324,654 178,395 96,808 19,355
8 647,213 359,576 198,818 38,900
16 1,262,999 690,329 383,860 77,427
32 2,197,336 1,207,959 692,804 90,430
36 2,175,945 1,239,624 733,745 89,842

Connections per Second

Connections per second (CPS) measures the ability of NGINX to create new TCP connections back to clients that have made requests. Clients send a series of HTTP or HTTPS requests, each on a new connection. NGINX parses the requests and sends back a 0‑byte response for each request. The connection is closed after the request is satisfied.

Note: The HTTPS variant of this test is often called SSL transactions per second (SSL TPS).

CPS for HTTP Requests

The table and graph show CPS for HTTP requests across different numbers of CPUs.

CPUs CPS
1 34,344
2 54,368
4 123,164
8 194,967
16 255,032
32 261,033
36 257,277

The graph resembles f(x) = √x, where x is the number of CPUs running. As for RPS, CPS growth flattens at around 16 CPUs, and there is a slight decrease in performance (here, in CPS) when we increase the number of CPUs from 32 to 36.

CPS for HTTPS Requests

The table and graph show CPS for HTTPS requests. Because of timing constraints, we did not run the tests with 32 CPUs.

CPUs CPS
1 428
2 869
4 1,735
8 3,399
16 6,676
24 10,274
36 10,067

We observe a higher rate of CPS increase the more we add CPUs. The graphic line flattens out at 24 CPUs. For SSL, throwing hardware at the problem works well.

Throughput

These tests measure the throughput of HTTP requests (in Gbps) that NGINX is able to sustain over a period of 180 seconds.

CPUs 100 KB 1 MB 10 MB
1 13 48 68
2 20 69 71
4 45 67 71
8 50 68 72
16 48 66 71
32 48 66 71
36 48 66 71

The throughput is proportional to the size of HTTP requests issued by the client machine. NGINX gets higher throughput when the file size is larger, as a given request results in the transmission of more data. However, performance peaks at around 8 CPUs; more is not necessarily beneficial for throughput‑heavy tasks.

Miscellaneous Notes

A few other notes on the testing and the results:

  • Hyper‑threading was available on the CPUs we tested, which means additional NGINX worker processes can run to use the full capacity of the hyper‑threading CPUs. We didn’t enable hyper‑threading for the tests reported here, but we did see improved performance with hyper‑threading in separate tests. Most notably, hyper‑threading improved SSL TPS by about 50%.
  • The numbers presented here are with OpenSSL 1.0.1. We also tested with OpenSSL 1.0.2 and saw about a 2x performance improvement. OpenSSL 1.0.1 is still far more widely used, but we recommend moving to OpenSSL 1.0.2, for better security as well as better performance.
  • We also tested elliptic curve cryptography (ECC), but the results presented here use RSA. For encryption, RSA is still far more widely used than ECC, though ECC is often deployed for mobile, where efficiency in power consumption is necessary. We saw 2x to 3x performance improvement with ECC compared to standard RSA certificates, and we recommend you consider implementing ECC.

The combination of moving to OpenSSL 1.0.2 and moving to ECC may bring very strong performance improvements. In addition, our results imply that if you currently use 4‑CPU or 8‑CPU servers, moving to 16 CPUs – or even 32 CPUs, for SSL – might result in really dramatic improvement.

Conclusion

We’ve analyzed performance test results for RPS and CPS on live HTTP and HTTPS connections, plus HTTP throughput on 50 dedicated channels. Use the information in this blog to help you decide what hardware specs you need to handle current and future traffic at your site given your budget and performance needs.

Hero image
Free O'Reilly eBook: The Complete NGINX Cookbook

Updated for 2024 – Your guide to everything NGINX



About The Author

Amir Rawdat

Solutions Engineer

Amir Rawdat is a technical marketing engineer at NGINX, where he specializes in content creation of various technical topics. He has a strong background in computer networking, computer programming, troubleshooting, and content creation. Previously, Amir was a customer application engineer at Nokia.

About F5 NGINX

F5, Inc. is the company behind NGINX, the popular open source project. We offer a suite of technologies for developing and delivering modern applications. Together with F5, our combined solution bridges the gap between NetOps and DevOps, with multi-cloud application services that span from code to customer.

Learn more at nginx.com or join the conversation by following @nginx on Twitter.