Using Amazon’s Simple Notification Service (SNS) & Simple Queue Service (SQS) For a Reliable Push Based Processing of Messages

I recently encountered a use case where I needed to reliably send a message to a worker process while meeting the following requirements:

  • The message should persist until the action it specifies is done.
  • The message should still be processed (or wait to be processed) when the worker processes have a bug or are down.
  • The message should be processed as soon as possible after it is sent.

Using Amazon’s Simple Queue Service (SQS)

SQS can provide the persistence needed. A message stays available until it is deleted or its retention period expires (4 days by default, configurable up to 14 days), even if the worker processes are down or have a bug.

The problem with using SQS alone is that it requires polling, which introduces a delay between the time a message is published and the time it is processed. That delay can be as small as a couple of seconds, but can easily reach 30 seconds or more, depending on the limitations of SQS polling and the polling interval used.

Using Amazon’s Simple Notification Service (SNS)

SNS can provide a push mechanism that notifies a subscriber in near real time that a message has been published.
However, SNS only guarantees a single delivery to each subscriber of a given topic, and it does not store the message for later retrieval. This means that if there is a bug or a problem while processing the message, and no specific code saved it somewhere, the message is lost.

The Solution

SQS and SNS can be combined to produce a PUSH mechanism to reliably handle messages.
The general configuration steps are:
  • Create a topic
  • Subscribe an SQS queue to that topic
  • Subscribe a worker process that works via HTTP/S to that topic (for increased reliability this can be an Elastic Load Balancer (ELB) that hides a set of machines)
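
The configuration steps above could be scripted roughly as follows. This is only a sketch: `wire_topic`, the topic name, and the worker URL are names made up for illustration, and `sns` is assumed to be a boto3 SNS client (`boto3.client("sns")`). It also assumes the queue already exists and that its access policy allows the topic to send messages to it. The HTTP/S endpoint still has to confirm its subscription before SNS starts pushing to it.

```python
def wire_topic(sns, topic_name, queue_arn, worker_url):
    """Create the topic and attach both subscribers; returns the topic ARN.

    `sns` is assumed to be a boto3 SNS client. All names passed in are
    illustrative and not taken from the original post.
    """
    # Step 1: create the topic (returns the existing ARN if it already exists).
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]

    # Step 2: the queue subscription keeps a durable copy of every message.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

    # Step 3: the HTTP/S subscription is the push signal to the worker (or ELB).
    protocol = "https" if worker_url.startswith("https://") else "http"
    sns.subscribe(TopicArn=topic_arn, Protocol=protocol, Endpoint=worker_url)

    return topic_arn
```
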
Then the flow goes like this:
  • Submit a message to the topic
  • SNS will publish the message to the SQS queue
  • SNS will then notify via HTTP/S POST to the handler to start processing the message
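
As a rough illustration of the first two steps of this flow, here is how a publisher might submit a message and how a worker might read it back from the queue. `publish_job` and `unwrap_sns_body` are hypothetical helper names. The sketch assumes a boto3 SNS client, a JSON payload, and the default (non-raw) delivery mode, in which the SQS message body is a JSON envelope whose `Message` field holds the published text.

```python
import json


def publish_job(sns, topic_arn, payload):
    """Publish a JSON-serializable payload to the topic; returns the MessageId."""
    response = sns.publish(TopicArn=topic_arn, Message=json.dumps(payload))
    return response["MessageId"]


def unwrap_sns_body(sqs_body):
    """Extract the original payload from an SQS body delivered via SNS.

    Assumes raw message delivery is NOT enabled on the subscription, so the
    body is an SNS envelope with the published text under "Message".
    """
    envelope = json.loads(sqs_body)
    return json.loads(envelope["Message"])
```
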
When the worker process gets the HTTP/S POST from SNS, it starts polling the queue, keeps going until the queue has no more items, and then finishes the HTTP request.
To handle failures (the worker process has a bug, is down, or did not get the SNS notification), a regular worker process can also poll the queue at regular, longer intervals to make sure all messages are processed and none is left behind.
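
A minimal sketch of the draining step, assuming `sqs` is a boto3 SQS client (`boto3.client("sqs")`). `drain_queue` and `handle` are hypothetical names. The same function could be called both from the HTTP/S handler that SNS hits and from the slower periodic safety-net worker.

```python
def drain_queue(sqs, queue_url, handle):
    """Process messages until a receive call comes back empty.

    `handle` is a caller-supplied function taking the message body.
    Returns the number of messages processed.
    """
    processed = 0
    while True:
        response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            return processed
        for message in messages:
            handle(message["Body"])
            # Delete only after the handler succeeded. If it raises, the
            # message stays in the queue and becomes visible again later.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            processed += 1
```

Deleting only after `handle` returns is what keeps the message around when the worker has a bug. Note that a single empty response is not proof that the queue is empty, which is one more reason to keep the periodic sweep.
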

This solution covers the original 3 requirements of message reliability, handling cases where workers are down or have bugs and handling messages as soon as they are sent.

11 thoughts on “Using Amazon’s Simple Notification Service (SNS) & Simple Queue Service (SQS) For a Reliable Push Based Processing of Messages”

  1. I like the approach and recommendation here. However, could there be a race condition if the worker process is notified before the message is posted to SQS? There is no guarantee of the order in which SNS notifications are sent, right?

    1. Sorry for the late response. Disqus decided not to alert me on new comments :-(

      SNS doesn’t guarantee the order in which notifications are sent, and SQS doesn’t guarantee the order in which messages are received, so none of this is guaranteed anyway.

      This method is mostly good for systems that don’t have a very high rate of postings to SQS, so you can save time, money and resources by polling only when needed.

  2. SNS does not guarantee that it will deliver the message to the SQS subscription before it delivers it to the HTTP subscription.

    In other words, your worker process might get a notification, read from the SQS queue and find that it is empty. How could you handle that case?

    1. I might not have been too clear on this. When you start polling from the HTTP request you should take that possibility into account. Don’t poll once, get no message, and stop; instead, keep polling for a specific interval, say 60 seconds.

      That way you can cover this case as well.
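
      One way to sketch that time-boxed polling, assuming a boto3 SQS client and using SQS long polling (`WaitTimeSeconds`, at most 20 seconds per call). `poll_for`, `handle`, and the `clock` parameter are hypothetical names; `clock` exists only so the loop can be tested without waiting. The window is approximate and may overrun by about a second.

```python
import time


def poll_for(sqs, queue_url, handle, window_seconds=60, wait_seconds=20,
             clock=time.monotonic):
    """Keep polling for roughly `window_seconds`, even if early polls are empty.

    Returns the number of messages processed.
    """
    deadline = clock() + window_seconds
    processed = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return processed
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            # Long poll: wait up to 20 s for a message, at least 1 s so the
            # last iteration does not spin in a tight loop.
            WaitTimeSeconds=min(wait_seconds, max(1, int(remaining))),
        )
        for message in response.get("Messages", []):
            handle(message["Body"])
            # Delete only after the handler returned without raising.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            processed += 1
```

      With this, a notification that arrives a few seconds before the queue copy still results in the message being picked up within the same request.
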
