The good guys at Amazon Web Services just announced a new feature in Simple Notifications Service (SNS) which allows settings a retry policy for SNS notifications.
Up until now SNS had an unknown publish retry policy (and maybe non existing). I always suspected it had some logic for different subscription types (Email, HTTP, Text, SQS) but it was never mentioned anywhere.
The new retry policy feature allows you to define the number of retries and the wait period between them (even if its a linear or exponential wait!) as well as set a throttling policy so that if your server is currently down it won’t get flooded with notifications once its back up.
This allows for some very interesting patterns. Most notably is push based messaging mechanism in which instead of writing a dedicated process to poll an SQS queue you can use SNS as sort of an ad-hoc push queue that will post the messages to an HTTP/S URL. Setting a reasonable retry policy and throttling policy will also ensure that if your server is down, messages won’t get lost.
I posted a while back a suggestion for a hack which utilized SNS as a notification mechanism to start polling SQS, however now that SNS has a retry policy its a good candidate for allowing you to handle your async tasks using your regular HTTP servers with all the goodies of logging, multi-threading, debugging, etc.
Before you run to start implementing a push based messaging (or re-implement Google AppEngine’s TaskQueue Push Queue API), there are certain things which are yet unknown and/or require further consideration:
SQS has a 15 days storage policy, so you have up to 15 days to fix a bug or setup a system that will empty a queue. In the new SNS retry policy you may reach a similar long period of time however, the maximum values to set in the policy are not yet known and may pose a limit.
I am no aware and couldn’t find any documentation which related to the HTTP status code of a message pushed to an HTTP/S subscriber (other than answering the subscription request). How can you tell SNS that if a message was pushed to an HTTP subscriber, the subscriber failed due to an HTTP error? In that case will SNS consider a non HTTP Status 200 request a failed request and will do the retry policy?
What happens if a message pushed to an HTTP/S subscriber takes a long time to process due to load or any other reason? When will SNS decide if the request failed due to timeout?
I hope some of these questions will get cleared up and SNS can become a viable push based messaging mechanism.
I previously recommended to always push via SNS (even if its just to an SNS queue) just to get the added benefits of easier debugging (subscribe to an SQS queue AND send email or HTTP request to debug, etc). The new features only proves that its becoming a very interesting building block to use inside and/or outside of AWS.
In a recent post on the AWS blog, Jeff Barr and Matt Wood, showed the architecture and code they wrote which lists the most interesting AWS related jobs from the Amazon Jobs site.
It serves as a rather good example of how service components such as the ones AWS provides (SNS, SQS, S3 to name a few that are AWS agnostic) a great set of building blocks that can easily help you focus on writing the code you really need to write.
I found the auto scaling policy for spinning up & down machines just to tweet a bit of an over kill at first (and Jeff could have easily added the code on the same instance running the cron), however thinking about it a bit more and considering the various pricing strategies it actually makes a lot sense.
Want to have an ElastiCache like service in all regions, not just US-EAST?
Want to utilize the new reserved instances utilization model to lower your costs?
Want to have your cache persistent and easily backup and restore it?
Want to add (or remove) servers from your cache cluster with no down time?
We have just the solution for you. A simple CloudFormation template to create a Membase cluster which gives you all of the ElasticCache benefits and a lot more including:
Support for ALL regions (not just US-EAST Apparently Amazon beat me to the punch with support at US West (N. California), EU West (Dublin), Asia Pacific (Singapore), and Asia Pacific (Tokyo))