I just read a post on Slashdot about a poor guy getting a huge chunk of Netflix traffic to his server.
The problem seemed to have been caused by the nature of IP address in EC2 which are quite fluid and gets reassigned when you spin up and down a machine. The same goes for Elastic Load Balancers (ELB) which are managed by Amazon and may switch the IP address as well (that’s why they ask to map to their CNAME record for the ELB instead of the IP).
In the Slashdot post, there is a link to this article, which describes the problem and lists some possible implications and possible ways of avoiding leaking data such as passwords and session ids when such a problem occurs.
The article mostly talks about what happend if someone hijacks your ELB, but the original problem reported was that you accidentally got someone elses traffic. This can lead to some other severe consequences:
Your servers crashing (in which case you should probably notice that rather quickly. Duh!)
If you are running some kind of a content site that depends on SEO and crawlers picked on the wrong IP, you might end up with a HUGE SEO penalty because another site’s content will be crawled on your domain
There is a very simple and quick solution for the problem I am describing above. Make sure you configure your web server to answer only to YOUR host names. Your servers will return response ONLY for a preconfigured set of hostnames, so if you get Netflix traffic, which probably has netflix.com hostname, your server will reject it immediately.
You can easily configured that in Nginx, Apache or if you have a caching proxy such as Varnish or squid.
A better solution for this problem is to add hostname checks support to ELB itself. I’ve posted a feature request on the AWS EC2 forum with the hopes that it will get implemented.
CloudFront is a great cost effective Content Delivery Network (CDN). When it first started it only supported files located on Amazon’s Simple Storage Service (S3) and on November 2010 Amazon releasedthe “Origin Pull” feature. Origin Pull allows defining a CDN distribution that pull content directing from a preconfigured site (preconfigured hostname) instead of pulling the content from S3.
The benefits of using the Origin Pull feature includes:
You can serve via the CDN dynamically generated content (like modified images or text fiels) without pre-generating it and putting it inside an S3 bucket.
One of the problems that may occur when intorudcing any caching mechanism is the need to invalidate all or parts of the data. CloudFront provides an invalidation API, however, it has various limitations such as:
You need to call it on each object
First 1,000 requests are free, each additional one will cost $0.005.
It may take up to 15 minutes for the cache to actually clear from all edge locations
There are some techniques to avoid calling the invalidation API but using versioned URLs.
What are versioned URLs?
A versioned URL contain a version part in it, i.e. “http://cdn.example.com/1.0/myimage.jpg”. The version part doesn’t affect the content of the URL, but since the URL to the resource is different, systems using the URL as a key for caching will think of URLs with 2 different version as 2 different resources.
It’s a nice trick to use when you want to quickly invalidate URLs and make a client pull a different/modified version of a resource.
Versioned URLs granularity
You can determine the granularity of the version value to suite your needs. The granularity will allow you to invalidate as little as one file, or every file served via the origin pull in your application.
Common granualirty levels are:
A value determined by the build version (i.e. invalidate all static CSS, JS and images one every new build deployed)
A value in the configuration, updated automatically or manually to invalidte parts or all of the objects
An automatically generated value per file determined by the file content by utilizing a hash function
An automatically generated value per file determined by its last modification date
CloudFront will disregard URL query string versioning
Amazon CloudFront (and quite a few other CDN providers) disregard the query string value of a URL (the part after the question mark), whether it is served from an S3 bucket or via an origin pull. This means you will have to rewrite your URLs to contain the version part inside the URL itself. For example:
CloudFront will disregard a versioned URL of the following format and consider both URLs the same resource:
CloudFront will consider these 2 URLs 2 different resources:
You can easily use Apache Rewrite module or Nginx URL rewriting to quickly rewrite the URL http://cdn.example.com/css/v123/myfile.css to http://cdn.example.com/css/myfile.css.
Some common web frameworks put the versioning part in the query string. Be minded about that and change the code appropriately to place the version part somewhere in the URL path (before the question mark).
I would recommend using CloudFront or any other CDN supporting origin pull in any project as it will significantly reduce the loading time of your pages with minimal cost and reduce the load on your servers. It’s a great, quick and easy way to make your site (or even API) work much better.