Detecting Homepage Defacement With Active Health Checks

Running the public‑facing website for any organization is no easy task. Considered by the business to be mission critical, a website requires 100% uptime and flawless stability. And yet, the same website is also expected to be updated frequently to meet the needs of the organization. This dichotomy of requirements between stability and constant change is one of the key challenges facing operations teams. But perhaps a bigger challenge is defending against bad actors, hackers, and “script kiddies”.

Homepage defacement is the digital equivalent of graffiti across the front of your storefront or headquarters. Homepage defacement is a constant threat, and when it happens, it can make for unwanted attention and much embarrassment. Those responsible for running the website may also find that homepage defacement can result in a difficult discussion with the CEO, or even become a career‑limiting event.

Security requires a multi‑layered approach. In this blog post, we’ll discuss how NGINX Plus can be used as part of a deep security stack that might also include firewalls, intrusion detection systems, and robust account management. We use the active health check capabilities of NGINX Plus to proactively monitor the website homepage for unexpected changes and to automatically serve the previous version if any such changes are detected.

Note: Active health checks with NGINX Plus are typically used for testing the general availability and end‑to‑end application health of backend systems. In this blog post, we’ll look at a security use case which can be used to supplement existing health checks. For more information on configuring health checks, see the Admin Guide.

Detecting Changes with ETag

To explore how NGINX Plus can be used to detect homepage defacement, we’ll use a simple test environment that represents the key components of a public‑facing website.

This simple solution uses the HTTP ETag of the homepage resource on the backend web server. Typically used for detecting changes to cached resources, by definition, the ETag value will change whenever the resource changes. We can obtain the ETag value by examining the headers for the home page.

$ curl -I
HTTP/1.1 200 OK
Date: Fri, 21 Jul 2017 15:05:25 GMT
Content-Type: text/html
Content-Length: 612
ETag: "58ad6e69-264"

We then define a health check that passes, provided that the same ETag is returned by the backend web server.

The match block defines the conditions that must be met by the >health_check directive on line 14, in order to pass. Here we require that the HTTP header ETag exactly matches the value obtained from the backend web server. By default, the health_check directive tests the root URI (/) so we need no further configuration to check the homepage.

This configuration effectively prevents any unauthorized changes to the homepage from being served. If the ETag changes, the health check will fail, and the backend web server will be considered unavailable. Clearly, this level of protection can have undesirable consequences. A site that is offline is arguably just as bad as one that has been defaced. Furthermore, routine and planned changes will cause an outage until the ETag value is updated in the NGINX configuration.

To address these issues, we enable content caching within NGINX and configure it so that when the health check fails, we continue to serve the last known good version. This configuration has the additional benefit of improving overall website performance by having NGINX serve cached content directly, instead of continually fetching it from the backend web server.

The proxy_cache_path directive on line 10 defines the location for the cached content. The keys_zone parameter defines a shared memory zone for the cache index (called hc_cache) and allocates 1 MB. The max_size parameter defines the point at which NGINX Plus starts to remove the least recently requested resources from the cache to make room for new cached items. The inactive parameter defines how long NGINX will keep the cached content on disk since it was last requested, regardless of Cache-Control attributes received from the backend web server.

Within the location block, the proxy_cache directive enables caching for this location by specifying the name of the shared memory zone used for storing the cache keys (hc_cache). The proxy_cache_valid directive tells NGINX how long it can serve a resource from cache before it’s considered to have expired. This value is overridden by any Cache-Control attributes received from the backend web server.

To meet the requirement of serving the last known good version of the homepage while the backend web server is considered unhealthy, we use the proxy_cache_use_stale directive to allow expired content to be served even when the backend is unavailable.

Scaling Out Change Detection with nginScript Hashing

Detecting changes to the ETag is a simple and effective solution, but it doesn’t scale from reverse proxying to load balancing. In a load balancing environment, a single instance of NGINX Plus might receive different ETag values from each of the backend web servers even though the content is the same.

Instead of relying on the ETag, a more robust way of detecting unexpected changes is by using a cryptographic hash of the actual content served from each backend web server. To do this, we provide an adjacent web service to each backend web server that provides the SHA‑256 hash value for any given URL. This adjacent web service, our hash service, will be used only by the NGINX Plus health check to obtain a hash value for the current homepage content on each backend web server.

The backend web server doesn’t need to run NGINX or NGINX Plus in order to provide the hash service. NGINX can be installed alongside the existing web server software for this purpose, and requires very little configuration.

The hash service is implemented as a simple TCP proxy within the Stream context and uses nginScript to perform the hash operation on the HTTP response received from the local web server on port 80. For more information about nginScript, see Introduction to nginScript. The code for the hash service appears below.

With this configuration in place, we obtain the SHA‑256 hash of the homepage by querying the hash service or by sending the homepage output to an external hashing library. Here we use OpenSSL to check if the results of the hash service are correct.

$ curl -I
HTTP/1.1 204 No Content
Hash: 38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521

$ curl -s | openssl dgst -sha256
(stdin)= 38ffd4972ae513a0c79a8be4573403edcd709f0f572105362b08ff50cf6de521

The hash service returns a minimal HTTP response, with only the hash value as a single HTTP header. The NGINX Plus configuration can now be modified to use the hash service for its health check.

The match block now defines a healthy response as one which includes a status code of 204 and a header of hash, with the value obtained from the hash service. In order for NGINX Plus to use the hash service on each backend web server, the health_check directive specifies the port number used by the hash service, which overrides the port specified in each server directive in the upstream group (lines 7–9).

We can expect the homepage content to be identical for each backend web server, so NGINX Plus removes a backend server from the upstream group if the homepage content changes. However, NGINX Plus continues to serve the last known good content from the cache so that users are unaffected. The NGINX Plus dashboard can then be used to highlight any backend servers that have failed health checks – another way in which NGINX Plus helps you deliver your site with performance, security, reliability, and scale.

Cover image
Microservices: From Design to Deployment
The complete guide to microservices development