The good guys at Amazon Web Services just announced a new feature in Simple Notification Service (SNS) that allows setting a retry policy for SNS notifications.
Up until now SNS had an undocumented (and possibly nonexistent) publish retry policy. I always suspected it had some logic for the different subscription types (Email, HTTP, Text, SQS), but it was never mentioned anywhere.
The new retry policy feature allows you to define the number of retries and the wait period between them (including whether the wait grows linearly or exponentially!), as well as set a throttling policy so that if your server is currently down it won’t get flooded with notifications once it’s back up.
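To get a feel for how a linear wait differs from an exponential one, here is a minimal sketch that computes the wait before each retry. The function name `retry_delays`, its parameters, and the interpolation formulas are my own assumptions for illustration; they are not SNS's documented algorithm or its policy field names.

```python
def retry_delays(num_retries, min_delay, max_delay, backoff="linear"):
    """Return the wait in seconds before each retry attempt.

    Hypothetical helper: waits grow from min_delay to max_delay across
    the retries, either linearly or exponentially. min_delay must be
    greater than zero for the exponential case.
    """
    if num_retries <= 0:
        return []
    if num_retries == 1:
        return [float(min_delay)]

    delays = []
    for i in range(num_retries):
        # Position of this retry between the first (0.0) and last (1.0).
        fraction = i / (num_retries - 1)
        if backoff == "linear":
            # Equal steps between min_delay and max_delay.
            delay = min_delay + (max_delay - min_delay) * fraction
        elif backoff == "exponential":
            # Equal ratios between min_delay and max_delay.
            delay = min_delay * (max_delay / min_delay) ** fraction
        else:
            raise ValueError("unknown backoff: %r" % backoff)
        delays.append(float(delay))
    return delays


def total_retry_window(delays):
    """Total time in seconds covered by a retry schedule."""
    return sum(delays)
```

Calling `retry_delays(5, 10, 50, "linear")` and `retry_delays(5, 1, 16, "exponential")` side by side shows the difference: the linear schedule adds the same number of seconds each time, while the exponential one multiplies the wait by a constant factor. `total_retry_window` gives a rough idea of how long a server could stay down before the last retry is spent.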
This allows for some very interesting patterns. Most notable is a push-based messaging mechanism: instead of writing a dedicated process to poll an SQS queue, you can use SNS as a sort of ad-hoc push queue that posts the messages to an HTTP/S URL. Setting a reasonable retry policy and throttling policy will also help ensure that messages won’t get lost if your server is down.
A while back I posted a suggestion for a hack that used SNS as a notification mechanism to start polling SQS. Now that SNS has a retry policy, however, it’s a good candidate for handling your async tasks on your regular HTTP servers, with all the goodies of logging, multi-threading, debugging, etc.
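As a rough idea of what the receiving side might look like, here is a minimal sketch of an HTTP endpoint for SNS pushes, using only the Python standard library. The names `process_task`, `handle_sns_post`, `SnsHandler` and `serve` are hypothetical, and the sketch assumes that answering with a non-200 status is what makes SNS retry, which is not something I can confirm.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_task(message):
    """Hypothetical task handler; raise an exception to signal failure."""
    print("processing:", message)


def handle_sns_post(message_type, body, task=process_task):
    """Return the HTTP status code to answer one SNS POST with."""
    try:
        payload = json.loads(body)
    except ValueError:
        return 400
    if not isinstance(payload, dict):
        return 400

    if message_type == "SubscriptionConfirmation":
        # A real endpoint would confirm the subscription here, for
        # example by visiting payload["SubscribeURL"].
        return 200

    if message_type == "Notification":
        try:
            task(payload.get("Message"))
        except Exception:
            # Assumption: a non-200 answer makes SNS apply the retry policy.
            return 500
        return 200

    return 400


class SnsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        # SNS sends the message type in this request header.
        message_type = self.headers.get("x-amz-sns-message-type")
        self.send_response(handle_sns_post(message_type, body))
        self.end_headers()


def serve(port=8080):
    """Run the sketch server; the port is an arbitrary choice."""
    HTTPServer(("", port), SnsHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Keeping the decision logic in `handle_sns_post` means it can be exercised without a running server, and the same function could be dropped into whatever web framework you already use.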
Before you rush to implement push-based messaging (or re-implement Google App Engine’s TaskQueue Push Queue API), there are certain things that are still unknown and/or require further consideration:
- SQS retains messages for up to 14 days, so you have up to 14 days to fix a bug or set up a system that will empty a queue. With the new SNS retry policy you may be able to reach a similarly long period; however, the maximum values allowed in the policy are not yet known and may impose a limit.
- I am not aware of, and couldn’t find, any documentation about how SNS treats the HTTP status code returned by an HTTP/S subscriber (other than when answering the subscription request). How do you tell SNS that the subscriber failed to handle a pushed message? Will SNS consider any response other than HTTP status 200 a failed delivery and apply the retry policy?
- What happens if a message pushed to an HTTP/S subscriber takes a long time to process, due to load or any other reason? How long will SNS wait before deciding that the request has timed out and failed?