High Availability Support Based on Keepalived

NGINX Plus Release 6 (R6) and later includes a solution for fast and easy configuration of NGINX Plus in an active-passive high-availability (HA) setup. It is based on Keepalived.

The Keepalived open source project provides three components: the keepalived daemon for Linux servers, an implementation of the Virtual Router Redundancy Protocol (VRRP) to manage virtual routers (virtual IP addresses), and a health check facility to determine whether a service (for example, a web server, PHP backend, or database server) is up and operational. If a service on a node fails the configured number of health checks, Keepalived reassigns the virtual IP address from the master (active) node to the backup (passive) node.

VRRP ensures that there is a master node at all times. The backup node listens for VRRP advertisement packets from the master node. If it does not receive an advertisement packet for a period longer than three times the configured advertisement interval, the backup node takes over as master and assigns the configured virtual IP addresses to itself.
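As a rough worked example of that timeout, the sketch below follows the Master_Down_Interval formula from the VRRPv3 specification (RFC 5798), which adds a small priority-dependent skew to the three advertisement intervals. The function name master_down_interval_cs is made up for this illustration, and Keepalived's internal timers may differ in detail.

```shell
#!/bin/sh
# Illustrative only: master_down_interval_cs is a hypothetical helper,
# not part of Keepalived or the nginx-ha-keepalived package.
# RFC 5798 timers, in centiseconds:
#   Skew_Time            = ((256 - priority) * advert_interval) / 256
#   Master_Down_Interval = 3 * advert_interval + Skew_Time
master_down_interval_cs() {
    advert_cs=$1    # advertisement interval in centiseconds (advert_int 1 = 100)
    priority=$2     # priority of the backup node that is waiting
    skew_cs=$(( (256 - priority) * advert_cs / 256 ))
    echo $(( 3 * advert_cs + skew_cs ))
}
```

For a backup node with priority 150 and an advertisement interval of 1 second, master_down_interval_cs 100 150 prints 341, so the backup waits roughly 3.4 seconds before taking over.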

Configuring High Availability

Note: This solution is designed for environments where IP addresses can be controlled through standard operating system calls. It often does not work in cloud environments, where IP addresses are controlled through the cloud infrastructure's own interfaces.

Run the nginx-ha-setup script (available in the nginx-ha-keepalived package, which must be installed in addition to the base NGINX Plus package) on both nodes as the root user. The script configures a high-availability NGINX Plus environment with an active-passive pair of nodes acting as master and backup. It prompts for the following data:

  • IP addresses of the local and remote nodes (one of which will be configured as a master, the other as a backup)
  • One free IP address to be used as the cluster endpoint’s (floating) virtual IP address

The configuration of the Keepalived daemon is recorded in the file /etc/keepalived/keepalived.conf. The configuration blocks in the file control notification settings, the virtual IP addresses to manage, and the health checks to use to test the services that rely on virtual IP addresses. Following is the configuration file created by the nginx-ha-setup script on a CentOS 7 machine. Note that this is not an NGINX Plus configuration file, so the syntax is different (semicolons are not used to delimit directives, for example).

global_defs {
    vrrp_version 3
}

vrrp_script chk_manual_failover {
    script   "/usr/libexec/keepalived/nginx-ha-manual-failover"
    interval 10
    weight   50
}

vrrp_script chk_nginx_service {
    script   "/usr/libexec/keepalived/nginx-ha-check"
    interval 3
    weight   50
}

vrrp_instance VI_1 {
    interface                  eth0
    priority                   101
    virtual_router_id          51
    advert_int                 1
    accept
    garp_master_refresh        5
    garp_master_refresh_repeat 1
    unicast_src_ip             192.168.100.100

    unicast_peer {
        192.168.100.101
    }

    virtual_ipaddress {
        192.168.100.150
    }

    track_script {
        chk_nginx_service
        chk_manual_failover
    }

    notify "/usr/libexec/keepalived/nginx-ha-notify"
}

Describing the entire configuration is beyond the scope of this article, but a few items are worth noting:

  • Each node in the HA setup needs its own copy of the configuration file, with values for the priority, unicast_src_ip, and unicast_peer directives that are appropriate to the node’s role (master or backup).
  • The priority directive controls which host becomes the master, as explained in the next section.
  • The notify directive names the notification script included in the distribution, which can be used to generate syslog messages (or other notifications) when a state transition or fault occurs.
  • The value 51 for the virtual_router_id directive in the vrrp_instance VI_1 block is a sample value; change it as necessary to be unique in your environment.
  • If you have multiple pairs of Keepalived instances (or other VRRP instances) running in your local network, create a vrrp_instance block for each one, with a unique name (like VI_1 in the sample) and virtual_router_id number.

Using a Health-Checking Script to Control Mastership

There is no fencing mechanism in Keepalived. If the two nodes in a pair are not aware of each other, each assumes it is the master and assigns the virtual IP address to itself. To prevent this situation, the configuration file defines a script-execution mechanism called chk_nginx_service that runs a script regularly to check whether NGINX Plus is operational, and adjusts the local node’s priority based on the script’s return code. Code 0 (zero) indicates correct operation, and code 1 (or any nonzero code) indicates an error.

In the sample configuration of the script, the weight directive is set to 50, which means that when the check script succeeds (returning code 0):

  • The priority of the first node (which has a base priority of 101) is set to 151.
  • The priority of the second node (which has a base priority of 100) is set to 150.

The first node has higher priority (151 in this case) and becomes master.
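The arithmetic above can be sketched as a small shell function. It assumes a positive weight, a single tracked check script, and the base priorities from the sample configuration; effective_priority is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Illustrative only: effective_priority is a hypothetical helper.
# With a positive weight, the weight is added to the base priority
# while the check script returns 0; otherwise the base priority applies.
effective_priority() {
    base=$1      # value of the priority directive
    weight=$2    # value of the weight directive (assumed positive)
    rc=$3        # return code of the check script
    if [ "$rc" -eq 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}
```

If NGINX Plus fails on the first node, effective_priority 101 50 1 gives 101, which is lower than the second node's 150 (effective_priority 100 50 0), so the second node becomes master.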

The interval directive specifies how often the check script executes, in seconds (3 seconds in the sample configuration file). Note that the check fails if the timeout is reached (by default, the timeout is the same as the check interval).

The rise and fall directives (not used in the sample configuration file) specify how many times the script must succeed or fail before action is taken.
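As a sketch of how these directives might look in the chk_nginx_service block, the fragment below adds rise and fall to the sample configuration. The values 2 and 3 are illustrative assumptions, not recommendations from the package.

```
vrrp_script chk_nginx_service {
    script   "/usr/libexec/keepalived/nginx-ha-check"
    interval 3
    weight   50
    rise     2    # two consecutive successes before the check counts as passing
    fall     3    # three consecutive failures before the check counts as failing
}
```

With these values and a 3-second interval, a failed NGINX Plus instance would be detected after roughly 9 seconds, in exchange for tolerating a single transient failure.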

The nginx-ha-check script provided with the nginx-ha-keepalived package checks if NGINX is up. We recommend creating additional scripts as appropriate for your local setup.
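An additional check could, for instance, verify that NGINX Plus actually answers HTTP requests. The following is a minimal sketch, not a script from the package: the URL http://127.0.0.1/, the presence of curl, the function names, and the rule that 2xx and 3xx responses count as healthy are all assumptions to adapt locally.

```shell
#!/bin/sh
# Hypothetical extra health check for use with a vrrp_script block.
# Assumptions: curl is installed and NGINX Plus serves CHECK_URL locally.
CHECK_URL="${CHECK_URL:-http://127.0.0.1/}"

# Return 0 when the HTTP status code counts as healthy (2xx or 3xx).
status_is_healthy() {
    case "$1" in
        2??|3??) return 0 ;;
        *)       return 1 ;;
    esac
}

# Print the HTTP status code for a URL; curl prints 000 if no response arrives.
fetch_status() {
    curl --silent --output /dev/null --max-time 2 \
         --write-out '%{http_code}' "$1"
}

# Combined check: its return code is what Keepalived would evaluate.
check_nginx_http() {
    status_is_healthy "$(fetch_status "$CHECK_URL")"
}
```

To use it as a standalone check script, end the file with a call to check_nginx_http so that the script's exit code reflects the result. Then reference the script from a new vrrp_script block and add that block's name to track_script.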

Displaying Node State

To see which node is currently the master for a given virtual IP address, run the ip addr show command for the interface on which the VRRP instance is defined (in the following commands, interface eth0 on nodes centos7-1 and centos7-2):

centos7-1 # ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
     UP qlen 1000
    link/ether 52:54:00:33:a5:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.100/24 brd 192.168.122.255 scope global dynamic eth0
       valid_lft 3071sec preferred_lft 3071sec
    inet 192.168.100.150/32 scope global eth0
       valid_lft forever preferred_lft forever
centos7-2 # ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
     UP qlen 1000
    link/ether 52:54:00:33:a5:87 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.101/24 brd 192.168.122.255 scope global eth0
       valid_lft forever preferred_lft forever

In this output, the second inet line for centos7-1 indicates that it is master – the defined virtual IP address (192.168.100.150) is assigned to it. The other inet lines show its real IP address (192.168.100.100) and the backup node’s IP address (192.168.100.101).

A node’s current state is recorded in the local /var/run/nginx-ha-keepalived.state file. You can use the cat command to display it:

centos7-1 # cat /var/run/nginx-ha-keepalived.state
STATE=MASTER
centos7-2 # cat /var/run/nginx-ha-keepalived.state
STATE=BACKUP
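For scripted monitoring, the two checks above can be combined. The sketch below assumes the state file format shown (a single STATE= line) and the ip addr show output format from the earlier example; node_state and holds_vip are hypothetical helper names.

```shell
#!/bin/sh
# Illustrative helpers; names and defaults are assumptions.

# Print the recorded state (for example MASTER or BACKUP) from the state file.
node_state() {
    state_file=${1:-/var/run/nginx-ha-keepalived.state}
    if [ ! -r "$state_file" ]; then
        echo UNKNOWN
        return 1
    fi
    sed -n 's/^STATE=//p' "$state_file"
}

# Read `ip addr show` output on stdin; succeed if the given address is listed.
# The leading space and trailing slash keep 192.168.100.15 from matching
# 192.168.100.150.
holds_vip() {
    grep -qF " $1/"
}
```

On the master, ip addr show eth0 | holds_vip 192.168.100.150 succeeds and node_state prints MASTER. A mismatch between the two is a hint that the state file or the interface is stale.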

Since version 1.1 of the nginx-ha-keepalived package, it is possible to dump VRRP extended statistics and data to the filesystem using the following command:

centos7-1 # service keepalived dump

This command sends signals to the running keepalived process, which then writes its current state to /tmp/keepalived.stats and /tmp/keepalived.data.

Forcing a State Change

To force the master node to become the backup, run the following command on it:

# service keepalived stop

As it shuts down, Keepalived sends a VRRP packet with priority 0 to the backup node, which causes the backup node to take over the virtual IP address.

If your cluster was configured with version 1.1 of the nginx-ha-keepalived package, you can force the state change more simply by running the following command:

# touch /var/run/keepalived-manual-failover

This command creates a file that is checked by the script defined in the vrrp_script chk_manual_failover block. If the file exists, Keepalived lowers the priority of the master node, which causes the backup node to take over the virtual IP address.

Adding More Virtual IP Addresses

The configuration created by the nginx-ha-setup script is very basic, and makes a single IP address highly available.

To make more than one IP address highly available:

  1. Add each new IP address to the virtual_ipaddress block in the /etc/keepalived/keepalived.conf configuration file:

    virtual_ipaddress {
        192.168.100.150
        192.168.100.200
    }

    The syntax in the virtual_ipaddress block replicates the syntax of the ip utility.

  2. Run the service keepalived reload command on both nodes to reload the keepalived service:

    centos7-1 # service keepalived reload
    centos7-2 # service keepalived reload

IPv6 and Dual-Stack Configurations

Since version 1.2.20 (nginx-ha-keepalived 1.1), Keepalived no longer supports mixing IPv4 and IPv6 addresses in one VRRP instance or virtual_ipaddress block, because doing so contradicts the VRRP standard.

There are two ways to configure dual-stack VRRP HA:

  1. Add the virtual_ipaddress_excluded block with the addresses of one family:

    vrrp_instance VI_1 {
        ...
        unicast_src_ip 192.168.100.100
    
        unicast_peer {
            192.168.100.101
        }
    
        virtual_ipaddress {
            192.168.100.150
        }
        ...
    
        virtual_ipaddress_excluded {
            1234:5678:9abc:def::1
        }
        ...
    }

    Those addresses will be excluded from the VRRP advertisements, but will still be managed by Keepalived and added or removed when the state changes.

  2. Add another VRRP instance for IPv6 addresses.

    VRRP configuration for IPv6 addresses on a master node follows:

    vrrp_instance VI_2 {
        interface         eth0
        priority          101
        virtual_router_id 51
        advert_int        1
        accept
        unicast_src_ip    1234:5678:9abc:def::3
    
        unicast_peer {
            1234:5678:9abc:def::2
        }
    
        virtual_ipaddress {
            1234:5678:9abc:def::1
        }
    
        track_script {
            chk_nginx_service
            chk_manual_failover
        }
    
        notify "/usr/libexec/keepalived/nginx-ha-notify"
    }

    Note that VRRP instances can both use the same virtual_router_id since VRRP IPv4 and IPv6 instances are completely independent of each other.

Troubleshooting Keepalived and VRRP

The Keepalived daemon uses the syslog utility for logging. On CentOS, RHEL, and SLES-based systems, the output is typically written to /var/log/messages, whereas on Ubuntu and Debian-based systems it is written to /var/log/syslog. Log entries record events such as startup of the Keepalived daemon and state transitions.

Here are a few sample entries that show the Keepalived daemon starting up, and the node transitioning a VRRP instance to the master state (to reduce wrapping, the hostname has been removed from each line after the first):

Feb 27 14:42:04 centos7-1 systemd: Starting LVS and VRRP High Availability Monitor...
Feb 27 14:42:04 Keepalived[19242]: Starting Keepalived v1.2.15 (02/26,2015)
Feb 27 14:42:04 Keepalived[19243]: Starting VRRP child process, pid=19244
Feb 27 14:42:04 Keepalived_vrrp[19244]: Registering Kernel netlink reflector
Feb 27 14:42:04 Keepalived_vrrp[19244]: Registering Kernel netlink command channel
Feb 27 14:42:04 Keepalived_vrrp[19244]: Registering gratuitous ARP shared channel
Feb 27 14:42:05 systemd: Started LVS and VRRP High Availability Monitor.
Feb 27 14:42:05 Keepalived_vrrp[19244]: Opening file '/etc/keepalived/keepalived.conf'.
Feb 27 14:42:05 Keepalived_vrrp[19244]: Truncating auth_pass to 8 characters
Feb 27 14:42:05 Keepalived_vrrp[19244]: Configuration is using: 64631 Bytes
Feb 27 14:42:05 Keepalived_vrrp[19244]: Using LinkWatch kernel netlink reflector...
Feb 27 14:42:05 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 27 14:42:05 Keepalived_vrrp[19244]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(14,15)]
Feb 27 14:42:05 nginx-ha-keepalived: Transition to state 'BACKUP' on VRRP instance 'VI_1'.
Feb 27 14:42:05 Keepalived_vrrp[19244]: VRRP_Script(chk_nginx_service) succeeded
Feb 27 14:42:06 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) forcing a new MASTER election
Feb 27 14:42:06 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) forcing a new MASTER election
Feb 27 14:42:07 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 27 14:42:08 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 27 14:42:08 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) setting protocol VIPs.
Feb 27 14:42:08 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.100.150
Feb 27 14:42:08 nginx-ha-keepalived: Transition to state 'MASTER' on VRRP instance 'VI_1'.
Feb 27 14:42:13 Keepalived_vrrp[19244]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.100.150
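On a busy system it can help to reduce the log to state transitions only. The filter below is a sketch that matches the VRRP_Instance message wording shown in the sample entries; other Keepalived versions may word these messages differently, and vrrp_transitions is a hypothetical name.

```shell
#!/bin/sh
# Illustrative filter: reads syslog lines on stdin and keeps only the
# VRRP_Instance lines that report entering or moving to a state.
vrrp_transitions() {
    grep -E 'VRRP_Instance\([^)]+\) (Entering|Transition to) [A-Z]+ STATE'
}
```

For example, on CentOS the filter could be run as vrrp_transitions < /var/log/messages to list the BACKUP and MASTER transitions in order.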

If the system log does not explain the source of a problem, run the tcpdump command with the following parameters to display the VRRP advertisements that are sent on the local network:

# tcpdump -vvv -ni eth0 proto vrrp

If you have multiple VRRP instances on the local network and want to filter the output to include only traffic between the node and its peer for a given service, include the host parameter and specify the peer’s IP address as defined by the unicast_peer block in the Keepalived configuration file, as in the following example:

centos7-1 # tcpdump -vvv -ni eth0 proto vrrp and host 192.168.100.101
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:48:27.188100 IP (tos 0xc0, ttl 255, id 382, offset 0, flags [none],
    proto VRRP (112), length 40)
    192.168.100.100 > 192.168.100.101: vrrp 192.168.100.100 >
        192.168.100.101: VRRPv2, Advertisement, vrid 51, prio 151,
        authtype simple, intvl 1s, length 20, addrs: 192.168.100.150 auth
        "f8f0e511"

Several fields in the output are useful for debugging:

  • authtype – Type of authentication in use (set by the authentication directive)
  • vrid – Virtual router ID (set by the virtual_router_id directive)
  • prio – Node’s priority (set by the priority directive)
  • intvl – Frequency at which advertisements are sent (set by the advert_int directive)
  • auth – Authentication token sent (set by the auth_pass directive)

Keeping NGINX Plus Configuration Files in Sync

The NGINX Plus configuration files on both nodes must define the services that are being made highly available. Keeping the configuration files in sync is outside the scope of the provided clustering software.

Configuration Examples

The nginx-ha-keepalived package comes with configuration examples. The examples can be found in the /usr/share/doc/nginx-ha-keepalived/ directory.

Upgrade Notes

Version 1.3.6 of the nginx-ha-keepalived package introduces a new VRRP checksum generation algorithm, which fixes issues with RFC compliance and interoperability in certain scenarios. The new checksum algorithm is used by default when the cluster is set up from scratch (for example, with the nginx-ha-setup script).

When upgrading from older versions (< 1.3.6) of the nginx-ha-keepalived package, keepalived automatically falls back to the older algorithm if it detects that the other node is using it. The following message appears in the keepalived syslog messages when this happens:

"Keepalived_vrrp[7556]: (VI_1): Setting unicast VRRPv3 checksum to old version"

We nevertheless recommend migrating to the new algorithm, because it is unknown how long the Keepalived authors will continue to support this fallback.
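To see whether a node has fallen back to the old algorithm, you can search its system log for the message quoted above. The helper below is a minimal sketch; uses_old_checksum is a hypothetical name, and the log file to feed it depends on the distribution.

```shell
#!/bin/sh
# Illustrative helper: reads syslog lines on stdin and succeeds if the
# fallback message from keepalived appears anywhere in them.
uses_old_checksum() {
    grep -qF 'Setting unicast VRRPv3 checksum to old version'
}
```

For example, uses_old_checksum < /var/log/messages && echo "old checksum fallback in use" reports the fallback on a CentOS node.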

The procedure to upgrade to the new algorithm is as follows:

  1. Upgrade the keepalived instance that is in backup mode. When it starts, it sees old checksums and so starts using old checksums itself.
  2. Upgrade the other keepalived instance. It also sees old checksums when it starts up.
  3. Temporarily add the following line to the configuration of each VRRP instance that is in master state:

    old_unicast_checksum never

  4. Reload the MASTER keepalived instance with:

    # service keepalived reload

  5. Reload the BACKUP keepalived instance with:

    # service keepalived reload

  6. Both nodes now use new checksums, and the temporary "old_unicast_checksum never" lines can be removed from the configuration.