
[Editor – The solution described in this blog relies on the NGINX Plus Status and Upstream Conf modules (enabled by the status and upstream_conf directives). Those modules are replaced and deprecated by the NGINX Plus API in NGINX Plus Release 13 (R13) and later, and will not be available after NGINX Plus R15. For the solution to continue working, update the roles and scripts that refer to the two deprecated modules.]

Introduction

There are many solutions for handling autoscaling in cloud environments, but they’re usually dependent on the specific infrastructure of a given cloud provider. Leveraging the flexibility of NGINX Plus with the functionality of Chef, we can build an autoscaling system that works on most cloud providers.

Chef has a command-line tool, knife, for acting on objects such as cookbooks, nodes, and data bags. Plug-ins extend knife with cloud-specific functionality, abstracting away the details of any one provider so that knife commands work the same way across clouds.
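
For example, once the knife-ec2 and knife-openstack plug-ins are installed, bootstrapping a backend node looks nearly the same on both clouds; only the subcommand and a few provider-specific flags change. The image, flavor, and key names below are illustrative:

knife ec2 server create -r "role[test-upstream]" -I ami-93d80ff3 -f m1.medium --ssh-user ubuntu -i ~/.ssh/demo_key.pem
knife openstack server create -r "role[test-upstream]" -f demo_flavor --ssh-user ubuntu -i ~/.ssh/demo_key.pem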

Requirements

For this setup, we’re leveraging the NGINX Chef cookbook from our GitHub repository. We’re also using Hosted Chef to make switching between clouds more straightforward.

This setup is currently configured to work with AWS, Azure, and OpenStack. It’s possible to extend it to cover all of the knife cloud plug‑ins, but the others haven’t been tested.

Basic Setup

This configuration relies heavily on role membership to look up information about the different nodes in the cluster. There are three basic roles: NGINX Plus server, upstream application server, and autoscaler server. The last is a node that monitors the NGINX Plus status page and makes API calls to scale servers up or down based on NGINX Plus statistics.

NGINX Plus Server Role

name "nginx_plus_autoscale"
description "Sample role to install NGINX Plus"
run_list "recipe[nginx]","recipe[nginx::autoscale]"
default_attributes "nginx" => { "install_source" => "plus",
                                "plus_status_enable" => true,
                                "enable_upstream_conf" => true,
                                "plus_status_allowed_ips" => ['104.245.19.144', '172.31.0.0/16', '127.0.0.1'],
                                "server_name" => "test.local",
                                "upstream" => "test",
                                "nginx_repo_key" => "-----BEGIN PRIVATE KEY-----nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCbYwum24BwEYDf4P4x0/KZjkKN7/EE/gg0qAU3ebG5kY8gWb8NpQ2itj/DfmwPAEnvI6In86c6YFokAZxeo6HbkKkeQKBgQDGQEHp2lCON9FLgRNMjtcp4S2VYjxdAMVinDLkIgVb9qgh6BvTDt5hRY/Vhcx8xV70+BCnoMSzbvLWhZbpSrdmD9nOj1KkPcWn4ArSv6prlYItUwWbNtFLw/E=n-----END PRIVATE KEY-----",
  "nginx_repo_crt" => "-----BEGIN CERTIFICATE-----nMIIDrDCCApSgAwIBAgICBs8wDQYJKoZIhvcNAQEFBQAwXjELMAkGA1UI2pLoSbonYiEvivb4Cg7POn+cQBwurcYUH/jB9zLPPSwlqcUiG2hScuEeaBiEoK/ixHIRuMV9nyp3xTi3b0ZKvOFjEZpBHB8WIdQVneTNRvaFLbiwznhiAe7D4uMaAEYqF96GTgX2XnbovinLlYPfdi7BhlXTI9u78+tqbo0YVsSBiDV49hcIA=n-----END CERTIFICATE-----" }

Upstream Application Server Role

name "test-upstream"
description "Sample role to install the NGINX Plus hello demo"
run_list "recipe[nginx::hello-demo]"
default_attributes "nginx" => { "application_port" => "80"}

Autoscaler Server Role

name "autoscaler"
description "Sample role to install autoscaler script"
run_list "recipe[nginx::autoscale_script]"
default_attributes "nginx" => { "server_name" => "test.local",
				"upstream" => "test",
				"cloud_provider" => "ec2" }

Here’s a quick breakdown of the different attributes leveraged in the roles (a sketch of the configuration they produce follows the list):

  • install_source – Tells the NGINX cookbook to install NGINX Plus instead of NGINX Open Source
  • plus_status_enable – Enables the NGINX Plus status page
  • enable_upstream_conf – Enables the dynamic reconfiguration API
  • plus_status_allowed_ips – List of IP addresses or ranges that are allowed to access the status page and reconfiguration API
  • server_name – Defines a server directive in the NGINX Plus configuration
  • upstream – Defines an upstream group to be used with the aforementioned server_name configuration
  • nginx_repo_key – Defines the certificate key for accessing the NGINX Plus repository
  • nginx_repo_crt – Defines the certificate for accessing the NGINX Plus repository
  • application_port – Defines the port on which the upstream application servers listen
  • cloud_provider – Defines the cloud provider (AWS/Azure/OpenStack) to be used for the autoscale_nginx script
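
These attributes drive the NGINX Plus configuration that the cookbook renders. As a rough sketch (assuming the deprecated Status and Upstream Conf modules mentioned in the editor’s note, not the cookbook’s exact output), the status-related attributes above translate into a server block along these lines:

# Illustrative sketch only; the file the cookbook generates may differ
server {
    listen 8080;
    allow 104.245.19.144;
    allow 172.31.0.0/16;
    allow 127.0.0.1;
    deny all;

    location /status {
        status;
    }

    location /upstream_conf {
        upstream_conf;
    }
}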

We also need to configure our knife.rb file with the credentials for accessing the different cloud providers we’re leveraging. Here’s a sample knife.rb with the details for the supported cloud providers:

current_dir = File.dirname(__FILE__)
log_level                :info
log_location             STDOUT
node_name                "damiancurry"
client_key               "#{current_dir}/damiancurry.pem"
chef_server_url          "https://api.chef.io/organizations/nginx"
cookbook_path            ["#{current_dir}/../cookbooks"]
#AWS variables
knife[:aws_access_key_id] = 
knife[:aws_secret_access_key] = 
#Azure variables
knife[:azure_tenant_id] = 
knife[:azure_subscription_id] = 
knife[:azure_client_id] = 
knife[:azure_client_secret] = 
#OpenStack variables
knife[:openstack_auth_url] = 
knife[:openstack_username] = 
knife[:openstack_password] = 
knife[:openstack_tenant] = 
knife[:openstack_image] = 
knife[:openstack_ssh_key_id] = "demo_key"

Now let’s look at the scripts that enable autoscaling. First is the script that runs on the NGINX Plus nodes to watch for changes in the set of online nodes. It’s a little hard to read because it’s still in Chef template format; it compares the running configuration reported by the extended status page against the upstream config file managed by Chef.

#!/bin/bash
NGINX_NODES="$(mktemp)"
/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?upstream=<%= node['nginx']['upstream'] %>"| /usr/bin/awk '{print $2}' | /bin/sed -r 's/;//g' | /usr/bin/sort > $NGINX_NODES
CONFIG_NODES="$(mktemp)"
/bin/grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /etc/nginx/conf.d/<%= node['nginx']['upstream'] %>-upstream.conf | /usr/bin/awk '{print $2}' | /bin/sed -r 's/;//g' | /usr/bin/sort > $CONFIG_NODES
DIFF_OUT="$(mktemp)"
/usr/bin/diff $CONFIG_NODES $NGINX_NODES > $DIFF_OUT
ADD_NODE=`/usr/bin/diff ${CONFIG_NODES} ${NGINX_NODES} | /bin/grep "<" | /usr/bin/awk '{print $2}'`
DEL_NODE=`/usr/bin/diff ${CONFIG_NODES} ${NGINX_NODES} | /bin/grep ">" | /usr/bin/awk '{print $2}'`

for i in $ADD_NODE; do
    echo "adding node ${i}";
    /usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?add=&upstream=<%= node['nginx']['upstream'] %>&server=${i}&max_fails=0"
done
for i in $DEL_NODE; do
    echo "removing node ${i}";
    #NODE_ID=`/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?upstream=<%= node['nginx']['upstream'] %>" | /bin/grep ${i} | /usr/bin/awk '{print $4}' | /bin/sed -r 's/id=//g'`
    NODE_ID=`/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?upstream=<%= node['nginx']['upstream'] %>" | /bin/grep ${i} | /bin/grep -oP 'id=\K\d+'`
    NODE_COUNT=`/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?upstream=<%= node['nginx']['upstream'] %>" | /bin/grep -n ${i} | /bin/grep -oP '\d+:server' | /bin/sed -r 's/:server//g'`
    JSON_NODE_NUM=$(expr $NODE_COUNT - 1)
    NODE_CONNS=`/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/status" | /usr/bin/jq ".upstreams.<%= node['nginx']['upstream'] %>.peers[${JSON_NODE_NUM}].active"`
    NODE_STATE=`/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/status" | /usr/bin/jq ".upstreams.<%= node['nginx']['upstream'] %>.peers[${JSON_NODE_NUM}].state"`
    if [[ ${NODE_STATE} == '"up"' ]] && [[ ${NODE_CONNS} == 0 ]]; then
	echo "nodes is up with no active connections, removing ${i}"
	/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?remove=&upstream=<%= node['nginx']['upstream'] %>&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"draining"' ]] && [[ ${NODE_CONNS} == 0 ]]; then
    echo "nodes is draining with no active connections, removing ${i}"
    /usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?remove=&upstream=<%= node['nginx']['upstream'] %>&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"down"' ]]; then
	echo "node state is down, removing ${i}":
	/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?remove=&upstream=<%= node['nginx']['upstream'] %>&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"unhealthy"' ]]; then
	echo "node state is down, removing ${i}":
	/usr/bin/curl -s "http://localhost:<%= node['nginx']['plus_status_port'] %>/upstream_conf?remove=&upstream=<%= node['nginx']['upstream'] %>&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"up"' ]] && [[ ${NODE_CONNS} != 0 ]]; then
	echo "node has active connections, draining connections on ${i}"
    fi
done

rm $NGINX_NODES $CONFIG_NODES $DIFF_OUT

Here’s the logic used to generate the upstream configs:

upstream_node_ips = []
upstream_role = (node[:nginx][:upstream]).to_s
search(:node, "role:#{node[:nginx][:upstream]}-upstream") do |nodes|
  host_ip = nodes['ipaddress']
  unless host_ip.to_s.strip.empty?
    host_port = nodes['nginx']['application_port']
    upstream_node_ips << "#{host_ip}:#{host_port}" # if value.has_key?("broadcast")
  end
end

template "/etc/nginx/conf.d/#{node[:nginx][:upstream]}-upstream.conf" do
  source 'upstreams.conf.erb'
  owner 'root'
  group node['root_group']
  mode 0644
  variables(
    hosts: upstream_node_ips
  )
  # notifies :reload, 'service[nginx]', :delayed
  notifies :run, 'execute[run_api_update_script]', :delayed
end

You can see that we’re using the Chef search functionality to find the nodes currently assigned to the upstream role we defined for this application. The recipe extracts each node’s IP address and application port, and passes that information to the template as an array. Here’s the templated version of the upstream configuration:

upstream <%= node['nginx']['upstream'] %> {
    zone <%= node['nginx']['upstream'] %> 64k;
    <% @hosts.each do |node| -%>
    server <%= node %>;
    <% end %>
}

Finally, here’s the actual script that handles the autoscaling:

require 'chef/api_client'
require 'chef/config'
require 'chef/knife'
require 'chef/node'
require 'chef/search/query'
require 'net/http'
require 'json'
require 'mixlib/cli'
class MyCLI
  include Mixlib::CLI
end

Chef::Config.from_file(File.expand_path("~/.chef/knife.rb"))
nginx_node = "<%= @nginx_host %>"
cloud_provider = "<%= node['nginx']['cloud_provider'] %>"
nginx_upstream = "<%= node['nginx']['upstream'] %>"
nginx_server_zone = "<%= node['nginx']['server_name'] %>"
if cloud_provider == "ec2"
  create_args = ["#{cloud_provider}", 'server', 'create', '-r', "role[#{nginx_upstream}-upstream]", '-S', 'chef-demo', '-I', 'ami-93d80ff3', '--region', 'us-west-2', '-f', 'm1.medium', '-g', 'chef-demo', '--ssh-user', 'ubuntu', '-i', '~/.ssh/chef-demo.pem']
elsif cloud_provider == "openstack"
  create_args = ["#{cloud_provider}", 'server', 'create', '-i', '~/.ssh/demo_key.pem', '--ssh-user', 'ubuntu', '-f', 'demo_flavor', '--openstack-private-network', '-Z', 'nova', '-r', "role[#{nginx_upstream}-upstream]"]
else
  puts "Please specify a valid cloud provider"
  exit
end
sleep_interval_in_seconds = 10
min_server_count = 1
max_server_count = 10
min_conns = 10
max_conns = 20
nginx_status_url = "http://#{nginx_node}:8080/status"

def get_nginx_active_servers(nginx_status_data, nginx_upstream)
  active_nodes = Array.new
  peers = nginx_status_data["upstreams"]["#{nginx_upstream}"]["peers"]
  peers.each do |node|
    if node["state"] == "up"
      active_nodes.push node["server"]
    end
  end
  return active_nodes
end

def get_nginx_server_conns(nginx_status_data, nginx_server_zone)
  return nginx_status_data["server_zones"]["#{nginx_server_zone}"]["processing"]
end

def add_backend_node(create_args)
  #search for existing hostnames to pick a new one
  query = Chef::Search::Query.new
  #nodes = query.search('node', 'role:#{nginx_upstream}-upstream').first rescue []
  nodes = query.search('node', 'role:<%= node['nginx']['upstream'] %>-upstream').first rescue []
  hosts = Array.new
  used_num = Array.new
  nodes.each do |node|
    node_name = node.name
    hosts.push node_name
    num = node_name.scan(/\d+/)
    used_num.push num
  end
  used_num.sort!
  fixed1 = used_num.flatten.collect do |num| num.to_i end
  fixed_num = fixed1.sort!
  firstnum = fixed_num.first
  lastnum = fixed_num.last
  firsthost = hosts.sort[0].to_i
  lasthost = hosts.sort[-1].to_i

  unless firstnum.nil? && lastnum.nil?
    total = (1..lastnum).to_a
    missingnum = total-fixed_num
  end
  newhostname = ""
  if missingnum.nil?
    puts "No existing hosts"
    fixnum = "1"
    newnum = fixnum.to_i
    newhostname = "<%= node['nginx']['upstream'] %>-app-#{newnum}"
  elsif missingnum.any?
    puts "Missing numbers are #{missingnum}"
    newnum = missingnum.first
    newhostname = "<%= node['nginx']['upstream'] %>-app-#{newnum}"
  else
    newnum = lastnum + 1
    puts "new number is n"
    newhostname = "<%= node['nginx']['upstream'] %>-app-#{newnum}"
  end
  new_create_args = create_args + ['--node-name', newhostname]
  knife = Chef::Knife.new
  knife.options=MyCLI.options
  Chef::Knife.run(new_create_args, MyCLI.options)
  #sleep to wait for chef run
  1.upto(10) do |n|
    puts "."
    sleep 1 # second
  end
end

def del_backend_node(nginx_status_data, nginx_node, active_nodes, cloud_provider, nginx_upstream)
  #lookup hostnames/ips and pick a backend at random
  query = Chef::Search::Query.new
  #nodes = query.search('node', 'role:#{nginx_upstream}-upstream').first rescue []
  nodes = query.search('node', 'role:<%= node['nginx']['upstream'] %>-upstream').first rescue []
  hosts = Array.new
  nodes.each do |node|
    node_name = node.name
    node_ip = node['ipaddress']
    if active_nodes.any? { |val| /#{node_ip}/ =~ val }
      hosts.push "#{node_name}:#{node_ip}"
    end
  end
  del_node = hosts.sample
  node_name = del_node.rpartition(":").first
  node_ip = del_node.rpartition(":").last
  puts "Removing #{node_name}"
  nginx_url = "http://#{nginx_node}:8080/upstream_conf?upstream=#{nginx_upstream}"
  response = Net::HTTP.get(URI(nginx_url))
  node_id = response.lines.grep(/#{node_ip}/).first.split('id=').last.chomp
  drain_url = "http://#{nginx_node}:8080/upstream_conf?upstream=#{nginx_upstream}&id=#{node_id}&drain=1"
  Net::HTTP.get(URI(drain_url))
  sleep(5)
  knife = Chef::Knife.new
  knife.options=MyCLI.options
  #delete_args = ["#{cloud_provider}", 'server', 'delete', "#{node_name}", '--purge', '-y']
  #Chef::Knife.run(delete_args, MyCLI.options)
  delete_args = "#{cloud_provider} server delete -N #{node_name} -P -y"
  `knife #{delete_args}`
end


last_conns_count = -1

while true
  response = Net::HTTP.get(URI(nginx_status_url))
  nginx_status_data = JSON.parse(response)

  active_nodes = get_nginx_active_servers(nginx_status_data, nginx_upstream)
  server_count = active_nodes.length
  current_conns = get_nginx_server_conns(nginx_status_data, nginx_server_zone)

  conns_per_server = current_conns / server_count.to_f

  puts "Current connections = #{current_conns}"
  puts "connections per server = #{conns_per_server}"

  if server_count < min_server_count
    puts "Creating new #{cloud_provider} Instance"
    add_backend_node(create_args)
  elsif conns_per_server > max_conns
    if server_count < max_server_count
      puts "Creating new #{cloud_provider} Instance"
      add_backend_node(create_args)
    end
  elsif conns_per_server < min_conns
    if server_count > min_server_count
      del_backend_node(nginx_status_data, nginx_node, active_nodes, cloud_provider, nginx_upstream)
    end

  end

  last_conns_count = current_conns
  sleep(sleep_interval_in_seconds)
end

The primary functions of this script are to monitor the NGINX Plus status page and to add and remove backend nodes based on the statistics. In its current state, the script makes scaling decisions based on the number of active connections divided by the number of active servers in the load‑balanced pool. You can easily modify it to use any of the other statistics available from the NGINX Plus status page, as in the sketch below.
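
For example, here’s a minimal sketch of a request-rate variant; the requests counter is real status API data from server_zones, but the function and the thresholds you’d pair it with are hypothetical, not part of the cookbook:

# Hypothetical variant: scale on request rate instead of active connections.
# "requests" is a cumulative counter in the status API's server_zones data,
# so we diff successive samples to get a per-second rate.
def get_nginx_request_rate(nginx_status_data, nginx_server_zone, last_total, interval)
  total = nginx_status_data["server_zones"][nginx_server_zone]["requests"]
  rate = last_total < 0 ? 0.0 : (total - last_total) / interval.to_f
  return rate, total
end

# In the main loop, replace the connection math with something like:
#   rate, last_total = get_nginx_request_rate(nginx_status_data, nginx_server_zone, last_total, sleep_interval_in_seconds)
#   requests_per_server = rate / server_count.to_f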

Deploying an Autoscaling Stack

First we use the knife-ec2 plug‑in to start an autoscaler instance:

chef-repo$ knife ec2 server create -r "role[autoscaler]" -g sg-1f285866 -I ami-93d80ff3 -f m1.medium -S chef-demo --region us-west-2  --node-name autoscaler-test --ssh-user ubuntu -i ~/.ssh/chef-demo.pem
Instance ID: i-0c359f3a443d18d64
Flavor: m1.medium
Image: ami-93d80ff3
Region: us-west-2
Availability Zone: us-west-2a
Security Group Ids: sg-1f285866
Tags: Name: autoscaler-test
SSH Key: chef-demo

Waiting for EC2 to create the instance......
Public DNS Name: ec2-35-164-35-19.us-west-2.compute.amazonaws.com
Public IP Address: 35.164.35.19
Private DNS Name: ip-172-31-27-162.us-west-2.compute.internal
Private IP Address: 172.31.27.162

Waiting for sshd access to become available
SSH Target Address: ec2-35-164-35-19.us-west-2.compute.amazonaws.com(dns_name)
done

SSH Target Address: ec2-35-164-35-19.us-west-2.compute.amazonaws.com()
Creating new client for autoscaler-test
Creating new node for autoscaler-test
Connecting to ec2-35-164-35-19.us-west-2.compute.amazonaws.com
ec2-35-164-35-19.us-west-2.compute.amazonaws.com -----> Installing Chef Omnibus (-v 12)
…
ec2-35-164-35-19.us-west-2.compute.amazonaws.com Chef Client finished, 6/6 resources updated in 13 seconds

Here’s the script that actually handles the autoscaling on this node, /usr/bin/autoscale_nginx.rb. Notice that at this point there is no IP address assigned to the nginx_node variable (on the second line of the second stanza). This is because we haven’t created an NGINX Plus server yet, but Chef will update the script with that information once the server has been created.

require 'chef/api_client'
require 'chef/config'
require 'chef/knife'
require 'chef/node'
require 'chef/search/query'
require 'net/http'
require 'json'
require 'mixlib/cli'
class MyCLI
  include Mixlib::CLI
end

Chef::Config.from_file(File.expand_path("~/.chef/knife.rb"))
nginx_node = "[]"
cloud_provider = "ec2"
nginx_upstream = "test"
nginx_server_zone = "test.local"
if cloud_provider == "ec2"
  create_args = ["#{cloud_provider}", 'server', 'create', '-r', "role[#{nginx_upstream}-upstream]", '-S', 'damiancurry', '-I', 'ami-93d80ff3', '--region', 'us-west-2', '-f', 'm1.medium', '--ssh-user', 'ubuntu', '-i', '~/.ssh/damiancurry.pem']
elsif cloud_provider == "openstack"
  create_args = ["#{cloud_provider}", 'server', 'create', '-i', '~/.ssh/demo_key.pem', '--ssh-user', 'ubuntu', '-f', 'demo_flavor', '--openstack-private-network', '-Z', 'nova', '-r', "role[#{nginx_upstream}-upstream]"]
else
  puts "Please specify a valid cloud provider"
  exit
end
sleep_interval_in_seconds = 10
min_server_count = 1
max_server_count = 10
min_conns = 10
max_conns = 20
nginx_status_url = "http://#{nginx_node}:8080/status"

def get_nginx_active_servers(nginx_status_data, nginx_upstream)
  active_nodes = Array.new
  peers = nginx_status_data["upstreams"]["#{nginx_upstream}"]["peers"]
  peers.each do |node|
    if node["state"] == "up"
      active_nodes.push node["server"]
    end
  end
  return active_nodes
end

def get_nginx_server_conns(nginx_status_data, nginx_server_zone)
  return nginx_status_data["server_zones"]["#{nginx_server_zone}"]["processing"]
end

def add_backend_node(create_args)
  knife = Chef::Knife.new
  knife.options=MyCLI.options
  Chef::Knife.run(create_args, MyCLI.options)
  #sleep to wait for chef run
  1.upto(10) do |n|
    puts "."
    sleep 1 # second
  end
end

def del_backend_node(nginx_status_data, nginx_node, active_nodes, cloud_provider, nginx_upstream)
  #lookup hostnames/ips and pick a backend at random
  query = Chef::Search::Query.new
  #nodes = query.search('node', 'role:#{nginx_upstream}-upstream').first rescue []
  nodes = query.search('node', 'role:test-upstream').first rescue []
  hosts = Array.new
  nodes.each do |node|
    node_name = node.name
    node_ip = node['ipaddress']
    if active_nodes.any? { |val| /#{node_ip}/ =~ val }
      hosts.push "#{node_name}:#{node_ip}"
    end
  end
  del_node = hosts.sample
  node_name = del_node.rpartition(":").first
  node_ip = del_node.rpartition(":").last
  puts "Removing #{node_name}"
  nginx_url = "http://#{nginx_node}:8080/upstream_conf?upstream=#{nginx_upstream}"
  response = Net::HTTP.get(URI(nginx_url))
  node_id = response.lines.grep(/#{node_ip}/).first.split('id=').last.chomp
  drain_url = "http://#{nginx_node}:8080/upstream_conf?upstream=#{nginx_upstream}&id=#{node_id}&drain=1"
  Net::HTTP.get(URI(drain_url))
  sleep(5)
  knife = Chef::Knife.new
  knife.options=MyCLI.options
  #delete_args = ["#{cloud_provider}", 'server', 'delete', "#{node_name}", '--purge', '-y']
  #Chef::Knife.run(delete_args, MyCLI.options)
  delete_args = "#{cloud_provider} server delete #{node_name} -P -y"
  `knife #{delete_args}`
end


last_conns_count = -1

while true
  response = Net::HTTP.get(URI(nginx_status_url))
  nginx_status_data = JSON.parse(response)

  active_nodes = get_nginx_active_servers(nginx_status_data, nginx_upstream)
  server_count = active_nodes.length
  current_conns = get_nginx_server_conns(nginx_status_data, nginx_server_zone)

  conns_per_server = current_conns / server_count.to_f

  puts "Current connections = #{current_conns}"
  puts "connections per server = #{conns_per_server}"

  if server_count < min_server_count
    puts "Creating new #{cloud_provider} Instance"
    add_backend_node(create_args)
  elsif conns_per_server > max_conns
    if server_count < max_server_count
      puts "Creating new #{cloud_provider} Instance"
      add_backend_node(create_args)
    end
  elsif conns_per_server < min_conns
    if server_count > min_server_count
      del_backend_node(nginx_status_data, nginx_node, active_nodes, cloud_provider, nginx_upstream)
    end

  end

  last_conns_count = current_conns
  sleep(sleep_interval_in_seconds)
end
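
As noted above, nginx_node starts out unset because the Chef search that feeds this template found no NGINX Plus node when the autoscaler converged. A minimal sketch of the kind of recipe logic that fills it in later (illustrative only; the actual autoscale_script recipe may differ) looks like this:

# Illustrative sketch: find the NGINX Plus node via Chef search and render
# the autoscale script template with its IP address as @nginx_host.
nginx_host = ''
search(:node, 'role:nginx_plus_autoscale') do |n|
  nginx_host = n['ipaddress']
end

template '/usr/bin/autoscale_nginx.rb' do
  source 'autoscale_nginx.rb.erb'
  owner 'root'
  mode 0755
  variables(nginx_host: nginx_host)
end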

Now we start the NGINX Plus server:

default$ knife ec2 server create -r "role[nginx_plus_autoscale]" -g sg-1f285866 -I ami-93d80ff3 -f m1.medium -S chef-demo --region us-west-2 --ssh-user ubuntu -i ~/.ssh/chef-demo.pem --node-name nginx-autoscale
Instance ID: i-0856ee80f54c8f3e6
Flavor: m1.medium
Image: ami-93d80ff3
Region: us-west-2
Availability Zone: us-west-2b
Security Group Ids: sg-1f285866
Tags: Name: nginx-autoscale
SSH Key: chef-demo

Waiting for EC2 to create the instance.......
Public DNS Name: ec2-35-165-171-46.us-west-2.compute.amazonaws.com
Public IP Address: 35.165.171.46
Private DNS Name: ip-172-31-38-163.us-west-2.compute.internal
Private IP Address: 172.31.38.163

Waiting for sshd access to become available
SSH Target Address: ec2-35-165-171-46.us-west-2.compute.amazonaws.com(dns_name)
done

SSH Target Address: ec2-35-165-171-46.us-west-2.compute.amazonaws.com()
Creating new client for nginx-autoscale
Creating new node for nginx-autoscale
Connecting to ec2-35-165-171-46.us-west-2.compute.amazonaws.com
ec2-35-165-171-46.us-west-2.compute.amazonaws.com -----> Installing Chef Omnibus (-v 12)
…
ec2-35-165-171-46.us-west-2.compute.amazonaws.com Chef Client finished, 24/34 resources updated in 43 seconds

Then we verify on the autoscaler instance that the new node’s IP address is now assigned to the nginx_node variable in autoscale_nginx.rb:

root# grep 'nginx_node =' /usr/bin/autoscale_nginx.rb
nginx_node = "172.31.38.163"

When we access the NGINX Plus dashboard on the NGINX Plus node, it shows that no upstream servers are active yet.

If we hit the NGINX Plus server on port 80, we get a 502 Bad Gateway error page because we haven’t started any backend application servers yet.

Before we fire up the autoscaler script and start the application nodes, let’s look at the script that adds these new nodes to the running NGINX Plus config, /tmp/api_update.sh:

#!/bin/bash
NGINX_NODES="$(mktemp)"
/usr/bin/curl -s "http://localhost:8080/upstream_conf?upstream=test"| /usr/bin/awk '{print $2}' | /bin/sed -r 's/;//g' | /usr/bin/sort > $NGINX_NODES
CONFIG_NODES="$(mktemp)"
/bin/grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /etc/nginx/conf.d/test-upstream.conf | /usr/bin/awk '{print $2}' | /bin/sed -r 's/;//g' | /usr/bin/sort > $CONFIG_NODES
DIFF_OUT="$(mktemp)"
/usr/bin/diff $CONFIG_NODES $NGINX_NODES > $DIFF_OUT
ADD_NODE=`/usr/bin/diff ${CONFIG_NODES} ${NGINX_NODES} | /bin/grep "<" | /usr/bin/awk '{print $2}'`
DEL_NODE=`/usr/bin/diff ${CONFIG_NODES} ${NGINX_NODES} | /bin/grep ">" | /usr/bin/awk '{print $2}'`

for i in $ADD_NODE; do
    echo "adding node ${i}";
    /usr/bin/curl -s "http://localhost:8080/upstream_conf?add=&upstream=test&server=${i}&max_fails=0"
done
for i in $DEL_NODE; do
    echo "removing node ${i}";
    #NODE_ID=`/usr/bin/curl -s "http://localhost:8080/upstream_conf?upstream=test" | /bin/grep ${i} | /usr/bin/awk '{print $4}' | /bin/sed -r 's/id=//g'`
    NODE_ID=`/usr/bin/curl -s "http://localhost:8080/upstream_conf?upstream=test" | /bin/grep ${i} | /bin/grep -oP 'id=\K\d+'`
    NODE_COUNT=`/usr/bin/curl -s "http://localhost:8080/upstream_conf?upstream=test" | /bin/grep -n ${i} | /bin/grep -oP '\d+:server' | /bin/sed -r 's/:server//g'`
    JSON_NODE_NUM=$(expr $NODE_COUNT - 1)
    NODE_CONNS=`/usr/bin/curl -s "http://localhost:8080/status" | /usr/bin/jq ".upstreams.test.peers[${JSON_NODE_NUM}].active"`
    NODE_STATE=`/usr/bin/curl -s "http://localhost:8080/status" | /usr/bin/jq ".upstreams.test.peers[${JSON_NODE_NUM}].state"`
    if [[ ${NODE_STATE} == '"up"' ]] && [[ ${NODE_CONNS} == 0 ]]; then
	echo "nodes is up with no active connections, removing ${i}"
	/usr/bin/curl -s "http://localhost:8080/upstream_conf?remove=&upstream=test&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"draining"' ]] && [[ ${NODE_CONNS} == 0 ]]; then
    echo "nodes is draining with no active connections, removing ${i}"
    /usr/bin/curl -s "http://localhost:8080/upstream_conf?remove=&upstream=test&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"down"' ]]; then
	echo "node state is down, removing ${i}":
	/usr/bin/curl -s "http://localhost:8080/upstream_conf?remove=&upstream=test&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"unhealthy"' ]]; then
	echo "node state is down, removing ${i}":
	/usr/bin/curl -s "http://localhost:8080/upstream_conf?remove=&upstream=test&id=${NODE_ID}"
    elif [[ ${NODE_STATE} == '"up"' ]] && [[ ${NODE_CONNS} != 0 ]]; then
	echo "node has active connections, draining connections on ${i}"
    fi
done

rm $NGINX_NODES $CONFIG_NODES $DIFF_OUT

This script is called every time Chef runs, and compares the current running config to the upstream config file defined for the autoscaling group. As you can see from the recipe snippet below, Chef manages the config file but doesn’t reload NGINX Plus when the file is updated. Instead, it runs the api_update script:

template "/etc/nginx/conf.d/#{node[:nginx][:upstream]}-upstream.conf" do
  source 'upstreams.conf.erb'
  owner 'root'
  group node['root_group']
  mode 0644
  variables(
    hosts: upstream_node_ips
  )
  # notifies :reload, 'service[nginx]', :delayed
  notifies :run, 'execute[run_api_update_script]', :delayed
end
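
The execute resource that receives this notification isn’t shown in the snippet; a minimal sketch of it, assuming the script path shown above, would be:

# Illustrative sketch: the notified resource runs the API update script.
# action :nothing means it runs only when the template notifies it.
execute 'run_api_update_script' do
  command '/bin/bash /tmp/api_update.sh'
  action :nothing
end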

Now we start the autoscaler script and bring some application servers online. We use the fully qualified path to the Ruby binary because we need the one shipped with the Chef client:

ubuntu$ /opt/chef/embedded/bin/ruby /usr/bin/autoscale_nginx.rb
Current connections = 0
connections per server = NaN
Creating new ec2 Instance
No existing hosts
test-app-1
Instance ID: i-0c671d851a1c5e6d0
Flavor: m1.medium
Image: ami-93d80ff3
Region: us-west-2
Availability Zone: us-west-2b
Security Group Ids: chef-demo
Tags: Name: test-app-1
SSH Key: chef-demo

Waiting for EC2 to create the instance...
…
ec2-35-165-4-158.us-west-2.compute.amazonaws.com Chef Client finished, 16/26 resources updated in 34 seconds
…
Private IP Address: 172.31.40.186
Environment: _default
Run List: role[test-upstream]
.
.
.
Current connections = 0
connections per server = 0.0
Current connections = 0
connections per server = 0.0

Now that we have one application node up, when we go back to the NGINX Plus node we see the demo page instead of the 502 error page, and there is now an upstream server defined on the NGINX Plus dashboard.

Next, we use a tool like wrk to generate some load against the site:

wrk$ ./wrk -c 25 -t 2 -d 10m http://ec2-35-165-171-46.us-west-2.compute.amazonaws.com/
Running 10m test @ http://ec2-35-165-171-46.us-west-2.compute.amazonaws.com/
  2 threads and 25 connections

On the autoscaler node, we see the script catch the increase in connections and start a new instance:

Current connections = 0
connections per server = 0.0
Current connections = 24
connections per server = 24.0
Creating new ec2 Instance
new number is
2
test-app-2
Instance ID: i-07186f5451c7d9e77
Flavor: m1.medium
Image: ami-93d80ff3
Region: us-west-2
Availability Zone: us-west-2b
Security Group Ids: chef-demo
Tags: Name: test-app-2
SSH Key: chef-demo

Waiting for EC2 to create the instance......
...
ec2-35-166-214-136.us-west-2.compute.amazonaws.com Chef Client finished, 16/26 resources updated in 35 seconds
Current connections = 24
connections per server = 12.0
Current connections = 24
connections per server = 12.0

There are now two upstream nodes on the dashboard. The script holds steady at this point, because it’s configured to scale up only when the nodes average more than 20 active connections. When we refresh the browser pointed at port 80 on the NGINX Plus server, we see the data change as NGINX Plus switches between the backend nodes. If we stop generating traffic, the script takes one of the nodes offline, as it’s configured to always keep at least one server running:

Current connections = 24
connections per server = 12.0
Current connections = 0
connections per server = 0.0
Removing test-app-2
no instance id is specific, trying to retrieve it from node name
WARNING: Deleted server i-0dcf4740c1b34417f
WARNING: Deleted node test-app-2
WARNING: Deleted client test-app-2
Current connections = 0
connections per server = 0.0

Conclusion

This is a rather basic setup, intended as a starting point for building a customized autoscaling solution that fits your environment. And if you ever want to migrate to a different cloud provider, it’s as simple as changing one attribute, ['nginx']['cloud_provider'], in the Chef configuration.
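
For example, repointing this demo’s autoscaler role at OpenStack would be a one-attribute edit (a sketch based on the role shown earlier):

default_attributes "nginx" => { "server_name" => "test.local",
                                "upstream" => "test",
                                "cloud_provider" => "openstack" }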

Try out autoscaling with NGINX Plus for yourself – start your free 30-day trial today or contact us to discuss your use cases.
