AWS autoscaling: avoid ELB routing to a provisioning instance

I was doing load testing on instances earlier this week to see how our autoscaling policies worked out and noticed something strange. Whenever traffic got too high, a CloudWatch alarm kicked off the launch of a new instance (rightly so), and then, all of a sudden, nearly ALL of the requests were being routed to this new instance, which wasn’t completely done provisioning yet. Nginx threw a flurry of 502s while our gateway was starting up, but the ELB, for some strange reason, threw nearly all our requests at that specific instance – instead of the 10+ others that were up – why?

After some digging, I realized it was a mix of two things – but first it’s useful to understand how AWS performs its health checks. The TCP health check is:

TCP is the default, specified as a TCP:port pair, for example “TCP:5000”. In this case a health check simply attempts to open a TCP connection to the instance on the specified port.

Which means that if you have a public-facing API, this is no good. Suppose Nginx or Apache is done setting up, but Rails or Django is not ready yet? According to the description, this would still pass the check. First thing to fix: create a dedicated health-check URL (in Nginx in my case – I created a /test/health path) to make sure we don’t put instances that aren’t ready yet behind the load balancer.
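To see why the TCP check is not enough, here is a toy stand-in for it in Python (a sketch, not the ELB’s actual code): the check passes as soon as something accepts the connection, even when nothing useful is being served behind the port.

```python
import socket

def tcp_health_check(host, port, timeout=5.0):
    """Mimic the ELB TCP check: healthy if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A bare listener: the port is open but nothing meaningful answers,
# like Nginx being up while the gateway behind it is still provisioning.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(5)
port = server.getsockname()[1]

ready = tcp_health_check("127.0.0.1", port)
print(ready)  # True: the TCP check passes anyway
server.close()
```

The kernel completes the TCP handshake for any listening socket, so the check succeeds even though no application would ever serve a real response.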

For the HTTP or HTTPS protocol, the situation is different: you have to include a ping path in the string. HTTP is specified as an HTTP:port/PathToPing grouping, for example “HTTP:80/weather/us/wa/seattle”. In this case, an HTTP GET request is issued to the instance on the given port and path. Any answer other than “200 OK” within the timeout period is considered unhealthy.

So the HTTP(S) health check actually needs to see a 200 OK status code, which gives us much more control: even if Nginx is ready but our gateway isn’t, the 502 will cause the health check to fail.
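The difference shows up if you mimic the HTTP check against a server that, like Nginx in front of a gateway that is still starting, answers 502 (a sketch; the handler and the /test/health path are illustrative):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def http_health_check(host, port, path, timeout=5.0):
    """Mimic the ELB HTTP check: healthy only on a 200 response."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
        return status == 200
    except OSError:
        return False

# A stand-in for Nginx whose upstream is still provisioning:
# the web server is up, but every request gets a 502 Bad Gateway.
class ProvisioningHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(502)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ProvisioningHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

healthy = http_health_check("127.0.0.1", port, "/test/health")
server.shutdown()
print(healthy)  # False: a 502 fails the check, unlike the TCP check
```

The same instance that passed the TCP check now correctly fails, because the 502 is not a 200 OK.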

So that explained why some requests failed, but how come nearly all of them did? Here are a few interesting things about ELBs:

  • They scale as your number of requests grows over time, which means that for load testing you need to increase the number of requests gradually, using a long ramp-up time.
  • They can only accommodate instances in the availability zones for which they were initially configured. This means that you need to add an instance from both us-east-1a and us-east-1b if you plan on using both of those zones later.
  • ELBs attempt to distribute requests evenly, unless certain instances are busier than others. If an instance can process requests faster than the others, more of them will be sent to that particular instance.
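That last point explains the skew. AWS doesn’t publish the exact routing algorithm, but assuming each request simply goes to whichever instance frees up soonest, a toy simulation shows how an instance that answers instantly with 502s soaks up almost all the traffic (the instance counts and timings below are made up):

```python
import heapq

def simulate(service_times_ms, n_requests):
    """Greedy dispatch: each request goes to the instance that is free
    soonest, a rough stand-in for the ELB favouring less-busy instances."""
    counts = [0] * len(service_times_ms)
    # Heap of (time the instance becomes free, instance index).
    free_at = [(0.0, i) for i in range(len(service_times_ms))]
    heapq.heapify(free_at)
    for _ in range(n_requests):
        t, i = heapq.heappop(free_at)
        counts[i] += 1
        heapq.heappush(free_at, (t + service_times_ms[i], i))
    return counts

# Ten healthy instances doing real work (100 ms per request) and one
# provisioning instance whose Nginx returns a 502 almost instantly (1 ms).
counts = simulate([100.0] * 10 + [1.0], 10_000)
print(counts)  # the 502 instance handles the overwhelming majority
```

Because the provisioning instance is “free” again a hundred times faster than any healthy one, it keeps winning the dispatch race, which matches what I saw in the load test.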

Aha! Since returning a 502 is surely faster than anything else my other instances were doing, the ELB probably decided this machine was way faster than all the others and sent most of the requests its way – very bad.