AWS autoscaling: avoid ELB routing to a provisioning instance

I was doing load testing on our instances earlier this week to see how our autoscaling policies held up, and noticed something strange. Whenever traffic got too high, a CloudWatch alarm kicked off the launch of a new instance (rightly so), and then all of a sudden nearly ALL of the requests were being routed to this new instance, which wasn’t done provisioning yet. Nginx threw a flurry of 502s while our gateway was setting up, but the ELB, for some strange reason, sent nearly all of our requests to that specific instance instead of the 10+ others that were up. Why?

After some digging, I realized it was a mix of two things, but first it’s useful to understand how AWS performs its health checks. The TCP health check is described as:

TCP is the default, specified as a TCP: port pair, for example “TCP:5000”. In this case a healthcheck simply attempts to open a TCP connection to the instance on the specified port.

Which means that if you have an API that’s public-facing on the internet, this is no good. Suppose Nginx or Apache is done setting up, but Rails or Django is not ready yet? According to that description, the instance would still pass the check. First thing to fix: create a dedicated health check URL (in nginx in my case, I created a /test/health path) to make sure we don’t put instances that aren’t ready yet behind the load balancer.

For HTTP or HTTPS, the situation is different: you have to include a ping path in the string. HTTP is specified as an HTTP:port/PathToPing grouping, for example “HTTP:80/weather/us/wa/seattle”. In this case, an HTTP GET request is issued to the instance on the given port and path. Any answer other than “200 OK” within the timeout period is considered unhealthy.

So the HTTP(S) health check actually needs to get a 200 OK status code back, which gives us much more control: even if nginx is up but our gateway isn’t, the 502 it returns will cause the health check to fail.
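Here’s roughly what that looks like, as a sketch (the upstream name is made up; in practice this location just goes inside the existing nginx server block that already proxies to the app):

# Health check path for the ELB, proxied through to the application so
# nginx alone can never answer it. While the gateway is still
# provisioning, this returns a 502 and the health check fails.
location /test/health {
    proxy_pass http://app_backend;   # app_backend is a placeholder upstream
    access_log off;                  # optional: keep health check noise out of the logs
}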

So that explained why some requests failed, but how come nearly all of them did? First, here are a few interesting things about ELBs:

  • They scale as your number of requests grows over time. Which means that for load testing, you need to increase the number of requests slowly, with a long ramp-up time.
  • They can only accommodate instances in the availability zones they were initially configured for. This means that you need to add an instance from both us-east-1a and us-east-1b if you plan on using both of those zones later.
  • ELBs will attempt to distribute requests evenly, unless certain instances are busier than others. If an instance can process requests faster than the others, more requests will be sent to that particular instance.

Ah ha! Since returning a 502 is surely faster than anything else my other instances were doing, the ELB probably decided that this machine was way faster than all the others and sent nearly all the requests its way. Very bad.
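With a proper health check path in place, the last step is pointing the ELB at it, so a provisioning instance never gets marked healthy in the first place. Roughly, with the AWS CLI (the load balancer name, interval and thresholds below are made up):

aws elb configure-health-check \
    --load-balancer-name my-load-balancer \
    --health-check Target=HTTP:80/test/health,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2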

Servers of Hacker News

I did a little experiment:

  • For about a month, I scraped the front page of HN
  • For every link on the front page, I looked at the “server” header value and stored it for every unique URL (the lookup itself is sketched right after this list)
  • Repeat, every hour.
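That header lookup is the easy part; per link it boils down to something like this (a sketch, against a made-up URL):

# Fetch only the response headers and keep the Server header
curl -sI "http://example.com/some-article" | grep -i '^Server:'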

Goals of the experiment, in order of importance:

  • Have fun
  • Satisfy my curiosity
  • Try to find servers I have never heard of (achievement unlocked!)

Results

Full breakdown of server count (total count = 2415)

Apache total : 995

Apache : 558
Apache-Coyote/1.1 : 66
Apache/1.3.41 : 2
Apache/1.3.42 : 2
Apache/2 : 4
Apache/2.0.52 (Red Hat) : 5
Apache/2.2 : 18
Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.8e-fips-rhel5 PHP/5.2.14 : 17
Apache/2.2.12 (Ubuntu) : 3
Apache/2.2.14 (Ubuntu) : 27
Apache/2.2.15 (CentOS) : 51
Apache/2.2.15 (Red Hat) : 9
Apache/2.2.15 (Red Hat) mod_ssl/2.2.15 OpenSSL/1.0.0-fips PHP/5.3. : 28
Apache/2.2.15 (Scientific Linux) : 2
Apache/2.2.16 (Debian) : 32
Apache/2.2.17 (Ubuntu) : 6
Apache/2.2.22 : 16
Apache/2.2.22 (Debian) : 12
Apache/2.2.22 (Ubuntu) : 50
Apache/2.2.22 (Unix) FrontPage/5.0.2.2635 : 7
Apache/2.2.23 (Amazon) : 6
Apache/2.2.24 : 6
Apache/2.2.24 (Amazon) : 4
Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.0-fips mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 : 5
Apache/2.2.3 (CentOS) : 33
Apache/2.2.3 (Red Hat) : 26

Nginx total : 703

nginx : 435
nginx/0.7.65 : 6
nginx/0.7.67 : 9
nginx/0.8.53 : 7
nginx/0.8.54 : 10
nginx/0.8.55 : 5
nginx/1.0.11 + Phusion Passenger 3.0.11 (mod_rails/mod_rack) : 10
nginx/1.0.15 : 5
nginx/1.1.19 : 66
nginx/1.2.1 : 21
nginx/1.2.3 : 7
nginx/1.2.4 : 6
nginx/1.2.6 : 64
nginx/1.2.7 : 12
nginx/1.2.8 : 4
nginx/1.2.9 : 3
nginx/1.3.11 : 3
nginx/1.4.1 : 30

GitHub.com total : 207

GSE total : 90

Microsoft-IIS/ (6.0, 7.0, 7.5, 8.0) total : 72

Others total : 348

cloudflare-nginx : 62
AmazonS3 : 30
Sun-Java-System-Web-Server/7.0 : 25
HTTP server (unknown) : 22
Google Frontend : 19
WP Engine/4.0 : 19
lighttpd/ (1.4.18, 1.4.31-devel-783a962) : 19
tfe : 17
YTS (1.19.11, 1.20.10, 1.20.27, 1.20.28) : 16
Economist Web Server : 12
WP Engine/1.2.0 : 11
TheAtlantic Web Server : 11
gunicorn/0.14.3 : 11
SSWS : 11
WEBrick/1.3.1 : 10
LiteSpeed : 10
Resin/4.0.34 : 8
thin 1.5.1 codename Straight Razor : 6
gwiseguy/2.0 : 6
ECD (dca/24FD) : 4
gws : 3
NPG Webhosting/1.0 : 3
Varnish : 3
IBM_HTTP_Server : 2
Oracle-Application-Server-11g Oracle-Web-Cache-11g/11.1.1.6.0 : 1
TornadoServer/3.0.2 : 1
Oracle-iPlanet-Web-Server/7.0 : 1
PCX : 1
publicfile : 1
PWS/8.0.15 : 1
QRATOR : 1
R2D2 : 1

Critical analysis: what does this mean?

Not much! Because… (read on)

Notes

  • I’m aware that the “server” header is not a 100% reliable way to determine the actual server type.
  • If 5 articles from the same organization’s website (github.com, cnn.com, google.com, etc.) made the front page in a day, that’s 5 “counts” for a single server.
  • I didn’t count all the one-off, weird server values, including empty header values.
  • I didn’t collect much other metadata yet, so it’s hard to put these numbers in full context.

Nginx and Apache on the same server

Need to run multiple projects on the same server, with Ruby, Python, PHP and Node.js all at once? It’s possible to have Nginx and Apache running side by side. The preferred way to do this is to put one server in front of the other: you choose which server accepts the initial requests, and it proxies each request to the correct application, or to the second server if needed.

Since nginx is a little simpler to configure and more flexible (I find), we’ll put nginx in front of apache. To keep it simple, let’s say we want to have:

  • a Django project (Python) running at mydjangourl.com
  • a WordPress (PHP) site at my-php-project-url.com

Both servers want to listen on port 80 by default, so that’s not going to work. Since we’re putting nginx in front of apache, nginx keeps port 80 and apache will be moved to another port.

Configuring nginx for your django app

    • Go to /etc/nginx/conf.d
    • Add (touch) a new file called django_project.conf
    • Assuming your Django app is already running on port 8000 (in a screen session, or preferably under a process supervisor), add something like this to the file:
upstream my_django_app {
    server 127.0.0.1:8000;
}
 
server {
   listen       80;
   server_name  mydjangourl.com; #put the domain here
   location / {
       proxy_pass_header Server;
       proxy_set_header Host $http_host;
       proxy_redirect off;
       proxy_set_header X-Real-IP $remote_addr;
       proxy_set_header X-Scheme $scheme;
       proxy_pass http://my_django_app;
   }
}
        • An upstream server is defined as “my_django_app”
        • All requests with the “Host” header set to mydjangourl.com are handled by this server block
        • They are proxy-passed to my_django_app, i.e. 127.0.0.1:8000

Configuring nginx for the wordpress site

Since nginx is listening on port 80, we need to grab the requests meant for our PHP app and route them to apache on a different port. You should still be in /etc/nginx/conf.d.

        • Create a new file called php_project.conf (name doesn’t matter, except for the .conf extension)
        • Add this to it (make sure to change the domain and note the port number, in this case 8050):
server {
   listen 80;
   server_name  my-php-project-url.com;
   location / {
       proxy_set_header Host $host;
       proxy_pass http://127.0.0.1:8050;  # apache will listen on this port
   }
}
        • This means that all requests asking for my-php-project-url.com will be routed to port 8050 on localhost
        • Make sure to reload nginx (sudo service nginx reload)

Configuring apache for the wordpress site

Now that nginx is proxying requests to port 8050, let’s make sure apache is listening on that port.
First, edit /etc/apache2/ports.conf and add the following:

NameVirtualHost *:8050
Listen 8050

Then go to wherever you keep the virtualhost configuration (it should be in /etc/apache2/sites-available or /etc/apache2/conf.d) and change the VirtualHost, ServerName and DocumentRoot lines like this:

<VirtualHost *:8050>
        ServerName my-php-project-url.com
        DocumentRoot /var/www/mysite
 
        <Directory />
                Options FollowSymLinks
                AllowOverride All
        </Directory>
 
        <Directory /var/www/mysite/>
                Options Indexes FollowSymLinks MultiViews
                AllowOverride All
                Order allow,deny
                allow from all
        </Directory>
 
        ...
</VirtualHost>

Don’t forget to enable your site (sudo a2ensite mysite.conf) and reload apache (sudo service apache2 reload)!
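To check that the routing works before touching DNS, you can fake the Host header against the server’s IP directly; something like this (the IP is a placeholder) should return the Django app and the WordPress site respectively:

curl -H "Host: mydjangourl.com" http://203.0.113.10/
curl -H "Host: my-php-project-url.com" http://203.0.113.10/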