URL Considerations When Using Amazon CloudFront Origin Pull

CloudFront is a great, cost-effective Content Delivery Network (CDN). When it first launched, it only supported files stored on Amazon’s Simple Storage Service (S3); in November 2010 Amazon released the “Origin Pull” feature. Origin Pull lets you define a CDN distribution that pulls content directly from a preconfigured site (a preconfigured hostname) instead of pulling it from S3.


The benefits of using the Origin Pull feature include:

  • No need to sync an S3 bucket with your static resources (CSS, images, JavaScript)
  • You can serve dynamically generated content (like modified images or text files) via the CDN without pre-generating it and putting it inside an S3 bucket.
One of the problems that may occur when introducing any caching mechanism is the need to invalidate all or parts of the data. CloudFront provides an invalidation API; however, it has various limitations, such as:
  • You need to call it on each object
  • First 1,000 requests are free, each additional one will cost $0.005.
  • It may take up to 15 minutes for the cache to actually clear from all edge locations
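To get a feel for the pricing above, here is a rough Python sketch of the arithmetic. The function name and constants are made up for illustration, and the numbers are simply the ones quoted in this post, so check Amazon's current pricing before relying on them.

```python
# Figures quoted in this post; treat them as assumptions, not current pricing.
FREE_REQUESTS = 1000      # invalidation requests that cost nothing
PRICE_PER_EXTRA = 0.005   # USD for each request beyond the free ones


def invalidation_cost(num_objects: int) -> float:
    """Estimate the cost in USD of invalidating num_objects objects."""
    billable = max(0, num_objects - FREE_REQUESTS)
    return billable * PRICE_PER_EXTRA
```

Invalidating a whole site's worth of static files on every deploy adds up quickly, which is one more reason to look at the alternative.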
You can avoid calling the invalidation API altogether by using versioned URLs.

What are versioned URLs?

A versioned URL contains a version part, e.g. “http://cdn.example.com/1.0/myimage.jpg”. The version part doesn’t affect the content served at the URL, but since the URL to the resource is different, systems using the URL as a cache key will treat URLs with two different versions as two different resources.
It’s a nice trick to use when you want to quickly invalidate URLs and make a client pull a different/modified version of a resource.
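As a minimal sketch of the idea, a small helper can put the version into the path when your templates generate links. The function name `versioned_url` is hypothetical; the host and file name are the ones from the example above.

```python
def versioned_url(base: str, version: str, path: str) -> str:
    """Build a URL with the version as a path segment right after the host."""
    return f"{base.rstrip('/')}/{version.strip('/')}/{path.lstrip('/')}"


# Changing the version changes the cache key, so the CDN fetches a fresh copy.
old_url = versioned_url("http://cdn.example.com", "1.0", "myimage.jpg")
new_url = versioned_url("http://cdn.example.com", "1.1", "myimage.jpg")
```

Bumping the version from 1.0 to 1.1 gives every client a URL the cache has never seen, without any call to the invalidation API.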

Versioned URLs granularity

You can choose the granularity of the version value to suit your needs. The granularity determines whether you invalidate as little as one file or every file served via the origin pull in your application.

Common granularity levels are:
  • A value determined by the build version (i.e. invalidate all static CSS, JS and images on every new build deployed)
  • A value in the configuration, updated automatically or manually to invalidate parts or all of the objects
  • An automatically generated value per file determined by the file content by utilizing a hash function
  • An automatically generated value per file determined by its last modification date
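The two per-file options could look roughly like this in Python. This is only a sketch: the function names, the choice of SHA-256 and the 8-character truncation are my assumptions here, not something CloudFront requires.

```python
import hashlib
import os


def content_version(path: str, length: int = 8) -> str:
    """Version derived from the file content: changes only when the bytes change."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]


def mtime_version(path: str) -> str:
    """Version derived from the last modification time (seconds since the epoch)."""
    return str(int(os.path.getmtime(path)))
```

The content hash has the nice property that redeploying an unchanged file keeps the same URL, while the modification date is cheaper to compute but changes whenever the file is touched.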

CloudFront will disregard URL query string versioning

Amazon CloudFront (like quite a few other CDN providers) disregards the query string of a URL (the part after the question mark), whether the content is served from an S3 bucket or via an origin pull. This means you will have to rewrite your URLs to contain the version part inside the URL path itself. For example:
  • CloudFront will disregard the version in URLs of the following format and consider both URLs the same resource:
    • http://cdn.example.com/css/myfile.css?v123
    • http://cdn.example.com/css/myfile.css?v333
  • CloudFront will consider these 2 URLs 2 different resources:
    • http://cdn.example.com/css/v123/myfile.css
    • http://cdn.example.com/css/v333/myfile.css
You can use Apache’s mod_rewrite module or Nginx URL rewriting on your origin server to rewrite the URL http://cdn.example.com/css/v123/myfile.css to http://cdn.example.com/css/myfile.css.
Some common web frameworks put the versioning part in the query string. Be mindful of that and change the code accordingly to place the version part somewhere in the URL path (before the question mark).
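To make the rewrite concrete, here is a Python sketch of the mapping the origin server has to perform. In practice this logic would live in your Apache or Nginx configuration rather than in application code; the function name `strip_version` is hypothetical, and the pattern assumes versions look like `v` followed by letters and digits under `/css/`, as in the example URLs above.

```python
import re

# Matches /css/v<version>/<rest of path>, e.g. /css/v123/myfile.css
VERSIONED_PATH = re.compile(r"^/css/v([0-9a-zA-Z]+)/(.*)$")


def strip_version(path: str) -> str:
    """Map a versioned request path back to the real file path on the origin."""
    match = VERSIONED_PATH.match(path)
    if match is None:
        # Not a versioned path: serve it unchanged.
        return path
    return "/css/" + match.group(2)


real_path = strip_version("/css/v123/myfile.css")
```

Because the version is dropped on the origin side, you do not need a separate folder per version: every versioned URL resolves to the same file on disk, and only the cache key seen by CloudFront differs.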

I would recommend using CloudFront or any other CDN supporting origin pull in any project as it will significantly reduce the loading time of your pages with minimal cost and reduce the load on your servers. It’s a great, quick and easy way to make your site (or even API) work much better.

8 thoughts on “URL Considerations When Using Amazon CloudFront Origin Pull”

    1. DNS records in general (and CNAME records in particular) do not relate to a specific folder. You can map them to an IP address which can identify a specific server.

      CNAME records are basically pointers to other names, so if you have your site on mysite.com and want myothersite.com to point to the same server you can make a CNAME record named http://www.myothersite.com which points to mysite.com

      On your server you will have to run an HTTP server like Apache, Nginx or Lighttpd in which you can map a folder on your server that will be accessible via a browser.

      If you are running Windows Server you can install Internet Information Server (IIS) which does the same as Apache/Nginx/Lighttpd.

      1. Thanks Eran. My question is: how or where do I do this mapping? I understand the CNAME setting can only point to another domain. Then I need some functionality on the CloudFront end to be able to map this, right? This is how Internap does it via SoftLayer. I can specify the origin pull “folder” and it relates it to the CNAME. Everything happens automatically after that.

        1. This is true only if you are using Origin Pull, i.e. when CloudFront pulls the data from your servers.

          If you are using S3 bucket mapping all of this is irrelevant and I propose you create a folder for each version of the files and place it there.

          Eran

  1. Hi i tried to leave a message before but for some reason it didnt apply. Anyway i wanted to ask how you write the rewrite rule to do this?

    also are these changes to be made on my web server or on cloudfront i.e. if we rewrite the urls in this way dont i need a new folder for each version? e.g.

    http://cdn.example.com/css/v123/

    Sorry if its a stupid question.

    1. It’s not a stupid question 🙂

      Depending on the server you are using (Nginx, Apache, etc.), it’s roughly:

      /css/v([0-9a-zA-Z]+)/(.*)

      so you have 1 capture group to catch the version and another to catch the rest of the URL. You can then reconstruct it and move the version part to a query string:

      /css/$2?v$1

      something like that.

      Hope that helps.

      Eran

      1. Cool thanks Eran. My server is Apache. So that rule will rewrite folder names to query strings, or query strings to folder names? – it seems like the former.

        shouldnt it be something like:

        http://cdn.example.com/css/(.*)?v([0-9a-zA-Z]+)
        ->
        http://cdn.example.com/css/v$2/$1
        where $2 is the version and $1 is the filename.
        Alternatively we could just add the version to the filenames? but i assume that means we manually have to add version numbers to our actual filenames first so cloudfront can find them.
