Ever since Amazon introduced tags in EC2, I’ve felt relieved that I can finally name my instances and actually remember which one is which.
It took a while for Name tags to find their way into various parts of the AWS Console. However, connecting to machines still requires looking up the IP address in the console or via the command line using the EC2 API Tools.
I figured it would be much easier for me, and others, to use the Name tags to connect to the machines.
Initially, the script was a simple bash script that wrapped the ec2-describe-instances command with a filter matching the Name tag. However, it had to manage various other command-line parameters, such as the user name to connect with (Ubuntu images, for example, use the ‘ubuntu’ user instead of ‘root’, the Amazon Linux AMI uses ‘ec2-user’, etc.), so I decided to rewrite it in Python using the magnificent Boto library.
This allowed better argument handling and removed the need to install Java and the EC2 API Tools just to access the EC2 API.
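The core of the idea is easy to sketch. This is not the actual script, just a minimal illustration using boto3 (the modern successor to Boto); the AMI-prefix mapping and function names here are mine:

```python
# Illustrative prefix-to-user conventions, as described above; extend as needed.
USER_BY_IMAGE = {"ubuntu": "ubuntu", "amzn": "ec2-user"}

def ssh_user_for(image_name, default="root"):
    """Guess the SSH login user from the AMI name prefix."""
    for prefix, user in USER_BY_IMAGE.items():
        if image_name.lower().startswith(prefix):
            return user
    return default

def public_ip_by_name(name):
    """Return the public IP of the first running instance whose Name tag matches."""
    import boto3  # imported lazily; the helper above needs no AWS access
    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": [name]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            return instance.get("PublicIpAddress")
    return None
```

From there, the script only has to exec something like `ssh <user>@<ip>`.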
Grab the code here
Don’t forget to send some feedback!
Thinking a bit more about my last post, I’ve come to the conclusion that, unless there is a really good excuse, one should always submit items to Amazon’s Simple Queue Service (SQS) by publishing to a topic in Simple Notification Service (SNS).
In the simplest case, you’ll publish to a topic and that topic will have a single subscriber that will post the message to the queue.
However, since SNS allows multiple subscribers, you get a couple of features free of charge without changing a single line of code. For example, you can:
- Temporarily add an Email address to receive the messages going to the queue via Email for easy debugging (you can quickly unsubscribe via the link at the end of each Email)
- Add extra logging by adding an HTTP/S subscriber that receives each message and logs it
- Notify other monitoring systems that a certain process has started
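A minimal boto3 sketch of the pattern above (the topic, queue ARN, and Email address are made up). Taking the SNS client as a parameter is my own choice here, since it keeps the helper easy to test against a stub:

```python
def subscribe_fanout(sns, topic_arn, queue_arn, debug_email=None):
    """Subscribe the queue (and optionally a debugging Email address) to the topic."""
    subs = [sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)]
    if debug_email:
        # Email subscriptions must be confirmed from the inbox, and the
        # unsubscribe link at the bottom of each message removes them again.
        subs.append(sns.subscribe(TopicArn=topic_arn, Protocol="email",
                                  Endpoint=debug_email))
    return [s["SubscriptionArn"] for s in subs]
```

In real use, `sns` would be `boto3.client("sns")`, with the topic ARN coming from `create_topic`; publishing then goes through `sns.publish` instead of `sqs.send_message`, and every subscriber gets the message.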
I know that from now on I’ll think really hard about whether I really need to publish directly to a queue instead of going through SNS.
I’ve recently encountered a use case where I need to reliably send a message to a worker process, with the following requirements:
- The message should persist until the action it specifies is done.
- The message should be processed (or wait to be processed) even when the worker processes are down or have a bug
- Message processing should be as fast as possible – process the message ASAP.
Using Amazon’s Simple Queue Service (SQS)
SQS provides the needed persistence. A message remains available until it is deleted (for up to ~3 days) even if the worker processes are down or have a bug.
The problem with using SQS is that it requires polling, which introduces a delay between the time a message is published and the time it is processed. That delay can be small, a couple of seconds, but can easily reach 30 seconds or more (depending on the limitations of SQS polling and the polling interval used).
Using Amazon’s Simple Notification Service (SNS)
SNS provides a push mechanism that notifies a subscriber in near-real-time that a message has been published.
However, SNS only guarantees a single delivery to each subscriber of a given topic. This means that if there is a bug or a problem processing the message, and there is no specific code to save it somewhere, the message is lost.
SQS and SNS can be combined to produce a PUSH mechanism to reliably handle messages.
The general configuration steps are:
- Create a topic
- Subscribe an SQS queue to that topic
- Subscribe a worker process that works via HTTP/S to that topic (for increased reliability this can be an Elastic Load Balancer (ELB) fronting a set of machines)
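One detail the steps above gloss over: the queue needs a policy that allows the topic to send messages to it, otherwise the SQS subscription delivers nothing. A sketch of building such a policy document (the ARNs are placeholders, and this reflects the standard pattern rather than anything from the original setup):

```python
import json

def sns_to_sqs_policy(queue_arn, topic_arn):
    """Policy document letting the given SNS topic send messages to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })
```

The resulting string is then set on the queue as its `Policy` attribute (with boto3, via `set_queue_attributes`).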
Then the flow goes like this:
- Submit a message to the topic
- SNS will publish the message to the SQS queue
- SNS will then notify via HTTP/S POST to the handler to start processing the message
When the worker process gets the HTTP/S POST from SNS, it starts polling the queue until the queue is empty, then finishes the HTTP request.
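That drain loop can be sketched like this (a minimal illustration, with the SQS client injected so it can run against a stub; the function and parameter names are mine, not from the actual code):

```python
def drain_queue(sqs, queue_url, handle):
    """Poll the queue until it is empty, processing and deleting each message."""
    processed = 0
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            return processed  # queue drained; the HTTP request can finish
        for msg in messages:
            handle(msg["Body"])
            # Delete only after successful handling, so a crash here means
            # the message reappears after its visibility timeout.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
            processed += 1
```

Deleting after, not before, handling is what keeps the persistence guarantee: a failed worker simply leaves the message in the queue.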
To handle failures, when a worker process has a bug, is down, or did not get the SNS notification, a regular worker process can poll the queue at longer, regular intervals to make sure all messages are processed and none fall behind.
This solution covers the original three requirements: message persistence, handling cases where workers are down or have bugs, and processing messages as soon as they are sent.
UPDATE (2011-07-16): I just got a newsletter Email from Amazon stating that they have added SQS and SNS to CloudWatch, which allows monitoring SQS queues not just for queue length but for other metrics as well, so there is no real need for my script. Unless you really really want to use it 🙂
All you have to do is select SQS in the metrics type drop-down and you will see a set of metrics to select from for all of your existing queues.
Amazon’s CloudWatch is a great tool for monitoring various aspects of your service. Last May Amazon introduced custom metrics to CloudWatch, which allow you to send any metric data you wish to CloudWatch. You can then store it, plot it, and create CloudWatch Alerts based on it.
One of the things missing from CloudWatch is Simple Queue Service (SQS) monitoring, so I’ve written a small script that updates a queue’s message count in a CloudWatch custom metric.
Having the queue’s count in CloudWatch allows adding alerts and actions based on the queue’s length.
For example, if the queue’s length stays above a certain threshold for a certain period of time, one of two things has happened:
- There is a bug in the code causing the worker processes that handle the queue’s messages to fail
- There is a higher-than-usual load on the system, causing the queue to fill up with more and more messages while there aren’t enough worker processes to handle them in a reasonable time
If the load is higher than usual, a CloudWatch alert can easily trigger adding another machine instance running more worker processes, or simply send an Email saying that something is wrong.
The script is very easy to use and can be run from a cron job. I’m running it as a cron job at 1-minute intervals and have set up various CloudWatch alerts to better monitor my queue.
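A hedged sketch of what such a cron script looks like with boto3 (the namespace, metric name, and queue name here are illustrative, not the actual script’s):

```python
def queue_length_metric(queue_name, count, namespace="Custom/SQS"):
    """Arguments for CloudWatch put_metric_data reporting a queue's length."""
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": "QueueLength",
            "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
            "Value": float(count),
            "Unit": "Count",
        }],
    }

# The cron job itself would do something along these lines (not run here):
#   count = int(sqs.get_queue_attributes(
#       QueueUrl=url, AttributeNames=["ApproximateNumberOfMessages"]
#   )["Attributes"]["ApproximateNumberOfMessages"])
#   cloudwatch.put_metric_data(**queue_length_metric("work-queue", count))
```

Once the data points are flowing in, CloudWatch alarms on the custom metric work exactly like alarms on the built-in ones.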
Grab the script on Github at: SQS Cloud Watch Queue Count.
My name is Eran Sandler and I love to write software. I’ve been fortunate enough to work on web sites both small and big, serving anywhere from hundreds of unique users per month to millions of unique users per day.
In the last couple of years I’ve found myself, like others, drifting towards this hurricane called “Cloud Computing”.
I wasn’t too intrigued by the virtualization side of things; after all, I’ve been using virtualization on both clients and servers for years now. What I did find very compelling is that cloud service providers have given us, the developers, full access to build, start, stop and configure hardware.
On top of these APIs they have also built architectural building blocks that provide black-box functionality for everything from storing and serving files, reliable message delivery via queues, and pub/sub mechanisms to sending Emails, handling file conversions and even configuring and updating DNS entries.
As someone who helped build a startup in the days before the cloud, I can say that having this power at your disposal changes everything!
It changes the way we deploy software, the way we architect systems and the way we operate sites.
This blog is my way of sharing the knowledge I have accumulated in the last couple of years (and continue to accumulate every day as I work on new projects) in regards to architecting, developing, deploying and operating web sites and services on cloud infrastructures.
I will try to share stories, tips & tricks and code I’ve managed to stumble upon, handle or write, and I hope you will all enjoy it as much as I do.
Stay tuned for more posts. It’s about to get cloudier 🙂