Category Archives: Amazon Web Services (AWS)

Quick & Dirty API for Accessing Amazon Web Services (AWS) EC2 Pricing Data

Continuing my post about the JSON files used on the Amazon EC2 page, I’ve created a small Python library that also acts as a command line interface to get the data.
The data in the JSON files does not use the same values as the EC2 API for things like region names and instance types, so the library/CLI translates these values to their corresponding EC2 API values.
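For illustration, the translation boils down to lookup tables along these lines (the values below are examples based on the JSON files at the time of writing; the tables in the actual code are more complete):

    # Illustrative mappings from the pricing-JSON names to EC2 API names
    JSON_TO_API_REGIONS = {
        'us-east': 'us-east-1',
        'us-west': 'us-west-1',
        'eu-ireland': 'eu-west-1',
        'apac-sin': 'ap-southeast-1',
        'apac-tokyo': 'ap-northeast-1',
    }

    JSON_TO_API_SIZES = {
        'sm': 'm1.small',   # appears as "size": "sm" in the JSON
        'lg': 'm1.large',
        'xl': 'm1.xlarge',
    }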

You can filter the output by region, instance type and OS type.

The command line output supports a human readable table format, JSON and CSV.

To use the command line you’ll need to install the following Python libraries:

  • argparse – only if you are using Python < 2.7 (argparse is included in Python 2.7 and 3.x)
  • prettytable – if you want to print the human readable (and pretty good looking) ASCII table output

Both libraries can be installed using pip.
Grab the code from GitHub

Don’t forget to send feedback! Enjoy!

 

Amazon Web Services (AWS) EC2 Pricing Data

Have you ever wanted a way to access the Amazon Web Services EC2 pricing data from code?

It seems Amazon uses predefined JSON files on the EC2 page to display the pricing per region, per instance type and per utilization type.

You can easily access these JSON files, load them and use them in your own apps (at least until Amazon changes these URLs).

The naming in these JSON files is sometimes different from the naming used in the API. For example, a small instance (m1.small) appears as “size” : “sm”, and its type is “stdODI” (or “stdResI” for reserved instances).

Below are the links to the relevant files:

  • On Demand Instances
  • Reserved Instances
  • Data Transfer Pricing
  • CloudWatch Pricing
  • Elastic IPs Pricing
  • Elastic Load Balancer (ELB) Pricing
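To give a feel for the structure, here is a quick Python sketch that loads the On Demand file and prints the Linux prices (the URL and field names reflect the files at the time of writing and may change without notice):

    # Python 2 sketch; structure based on the on-demand pricing JSON
    import json
    import urllib2

    URL = 'http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json'
    pricing = json.load(urllib2.urlopen(URL))

    # Walk regions -> instance types -> sizes and print the Linux price
    for region in pricing['config']['regions']:
        for instance_type in region['instanceTypes']:
            for size in instance_type['sizes']:
                for column in size['valueColumns']:
                    if column['name'] == 'linux':
                        print region['region'], size['size'], column['prices']['USD']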

 

Crunch these files and enjoy them while they’re there 🙂

A cool example of SaaS for developers by @mza and @jeffbarr

In a recent post on the AWS blog, Jeff Barr and Matt Wood showed the architecture and code they wrote to list the most interesting AWS related jobs from the Amazon Jobs site.

It serves as a rather good example of how service components such as the ones AWS provides (SNS, SQS and S3, to name a few) make a great set of building blocks that can easily help you focus on writing the code you really need to write.

I found the auto scaling policy for spinning machines up and down just to tweet a bit of an overkill at first (Jeff could have easily run the code on the same instance running the cron job); however, thinking about it a bit more and considering the various pricing strategies, it actually makes a lot of sense.

AWS Elastic (accidental) Load Balancer Man-in-the-middle Attack

I just read a post on Slashdot about a poor guy getting a huge chunk of Netflix traffic to his server.

The problem seems to have been caused by the nature of IP addresses in EC2, which are quite fluid and get reassigned when you spin machines up and down. The same goes for Elastic Load Balancers (ELB), which are managed by Amazon and may switch IP addresses as well (that’s why Amazon asks you to map to the ELB’s CNAME record instead of its IP).

In the Slashdot post there is a link to this article, which describes the problem and lists some possible implications and ways of avoiding leaking data such as passwords and session IDs when such a problem occurs.

The article mostly talks about what happens if someone hijacks your ELB, but the original problem reported was accidentally getting someone else’s traffic. This can lead to some other severe consequences:

  • Your servers crashing (in which case you should probably notice that rather quickly. Duh!)
  • If you are running a content site that depends on SEO and crawlers pick up the wrong IP, you might end up with a HUGE SEO penalty because another site’s content will be crawled on your domain

There is a very simple and quick solution to the problem I am describing above: configure your web server to answer only to YOUR hostnames. Your servers will return a response ONLY for a preconfigured set of hostnames, so if you get Netflix traffic, which probably carries a netflix.com hostname, your server will reject it immediately.

You can easily configure that in Nginx or Apache, or in a caching proxy such as Varnish or Squid.
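In Nginx, for example, a catch-all default server can drop anything that is not addressed to your own hostnames (example.com is a placeholder, of course):

    # Catch-all server: any request whose Host header doesn't match one
    # of our server_name entries below lands here and gets dropped.
    server {
        listen 80 default_server;
        server_name _;
        return 444;  # Nginx-specific: close the connection without a response
    }

    # The real site only answers for our own hostnames
    server {
        listen 80;
        server_name example.com www.example.com;
        # ... the usual site configuration ...
    }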

A better solution to this problem is to add hostname check support to ELB itself. I’ve posted a feature request on the AWS EC2 forum in the hope that it will get implemented.

The cloud is more than just the hardware – it’s also about services!

When talking about the cloud, most people talk about running the servers on the cloud. They talk about the fact that they can start or stop virtual servers with a simple API call.

[Image: Cloud Services Toolbox – photo by keepthebyte]

A lot of cloud providers do give you virtual servers, virtual load balancers and infinite (and/or scalable) storage, but these are all building blocks of servers. You still need to do the heavy lifting of taking a bunch of servers and making them do the work for you. You need to install the software (not just the software you wrote), manage it, handle disaster recovery, backups, logging, etc.

What makes Amazon’s cloud unique, on top of the servers and storage it provides, is the set of cloud services that can be used as black boxes for your application/service, reducing the code you need to write and maintain as well as the number of servers you need to administer. These services are also available for use outside of Amazon’s cloud, but there are benefits to using them from within your EC2 instances.

The Amazon non-hardware cloud services offering is split into two categories:

  • Off the shelf services – preconfigured services for which Amazon takes on most of the management hassle. Such services are:
    • Amazon Relational Database Service (RDS) – hosted MySQL which Amazon manages, patches and helps you back up and restore, as well as deploy in a smart way across multiple availability zones.
    • Amazon ElastiCache – hosted Memcached which Amazon manages, lets you resize (add or remove servers) and keeps patched with the latest software.
  • Development Building Blocks (Black Boxes) – web services that provide functionality you can mix and match to create your service, removing the need for you to handle the load, machines and configuration of these services. Such services are:
    • Amazon SimpleDB – a key/value datastore written, hosted and operated by Amazon. Scalable and simple. It just works.
    • Amazon Simple Queue Service (SQS) – a reliable, scalable, persistent queue service. Great for talking with distributed async components.
    • Amazon Simple Email Service (SES) – a scalable email service which allows you to send email (for humans and machines)
    • Amazon Simple Notification Service (SNS) – a web service to send notifications to people/machines/services/etc. It can send a notification out as an email or as an HTTP request, as well as post it to an SQS queue

What I love most about Amazon Web Services is the fact that when they do provide a certain Development Building Block, such as the Simple Email Service (SES), they do so without killing or harming the competition. There is still enough value and enough features in other email services such as Mailgun, SendGrid and MailChimp for them to co-exist with SES.

Not stepping (too much) on web service developers’ toes is not something to dismiss, and I would love to see the innovation that comes out of cloud based web services in the future.

Go cloud services! 🙂

ec2ssh – Quicker SSH-ing Into Your EC2 Instances

Ever since Amazon introduced tags in EC2, it has been such a relief to be able to name my instances and actually remember which one is which.

It took a while for Name tags to find their way into various aspects of the AWS Console; however, connecting to machines still requires finding the IP address from the console or via the command line using the EC2 API Tools.

I thought it would be much easier for me and others to utilize the Name tags when connecting to machines.

Initially, this was a simple bash script that used the ec2-describe-instances command with a filter matching the Name tag. However, handling various other command line parameters, such as the user name to connect with (Ubuntu images, for example, use the ‘ubuntu’ user instead of ‘root’; the Amazon Linux AMI uses ‘ec2-user’, etc.), got messy, so I decided to rewrite it in Python using the magnificent Boto library.

This allowed better argument handling and removed the need to install Java and the EC2 API Tools to access the EC2 API.
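The core idea boils down to something like this minimal sketch (not the actual script; it assumes boto, default credentials and a hard-coded region):

    import os
    import sys

    import boto.ec2  # pip install boto

    def ssh_by_name(name, user='root', region='us-east-1'):
        """Find a running instance by its Name tag and exec ssh to it."""
        conn = boto.ec2.connect_to_region(region)
        for reservation in conn.get_all_instances(filters={'tag:Name': name}):
            for instance in reservation.instances:
                if instance.state == 'running':
                    # Replace the current process with ssh to the instance
                    os.execvp('ssh', ['ssh', '%s@%s' % (user, instance.public_dns_name)])
        sys.exit("No running instance named '%s' found" % name)

    if __name__ == '__main__':
        ssh_by_name(*sys.argv[1:])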

Grab the code here

Don’t forget to send some feedback!

 

Using Amazon’s Simple Notification Service (SNS) & Simple Queue Service (SQS) For a Reliable Push Based Processing of Messages

I’ve recently encountered a use case where I need to reliably send messages to a worker process, with the following requirements:

  • A message should persist until the action it specifies is done.
  • Messages should be processed (or wait to be processed) even when the worker processes have a bug or are down.
  • Message processing should be as fast as possible – process each message ASAP.

Using Amazon’s Simple Queue Service (SQS)

SQS can provide the persistence needed. A message remains available until it is deleted (for at least ~3 days), even if the worker processes are down or have a bug.

The problem with using SQS is that it requires polling, which introduces a certain delay between the time a message is published and the time it is processed. That delay can be small, a couple of seconds, but can easily grow to 30 seconds and more (depending on the limitations of SQS polling and the polling interval used).

Using Amazon’s Simple Notification Service (SNS)

SNS can provide a push mechanism that notifies a subscriber in near-real-time that a message has been published.
However, SNS only attempts a single delivery to each subscriber of a given topic. This means that if there was a bug or a problem processing the message, and there was no specific code to save it somewhere, the message is lost.

The Solution

SQS and SNS can be combined to produce a PUSH mechanism to reliably handle messages.
The general configuration steps are:
  • Create a topic
  • Subscribe an SQS queue to that topic
  • Subscribe a worker process that works via HTTP/S to that topic (for increased reliability this can be an Elastic Load Balancer (ELB) that hides a set of machines)
Then the flow goes like this:
  • Submit a message to the topic
  • SNS will publish the message to the SQS queue
  • SNS will then notify via HTTP/S POST to the handler to start processing the message
When the worker process gets the HTTP/S POST from SNS, it polls the queue until there are no more items in the queue, and then finishes the HTTP request.
To handle failures, when the worker process has a bug, is down or did not get the SNS message, a regular worker process can poll the queue at regular, longer intervals to make sure all messages are processed and none gets left behind.
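Sketched with boto (queue/topic names, region and endpoint are placeholders, not from any real setup), the configuration and the worker’s drain loop look roughly like this:

    import boto.sns
    import boto.sqs

    REGION = 'us-east-1'
    sns = boto.sns.connect_to_region(REGION)
    sqs = boto.sqs.connect_to_region(REGION)

    # One-time setup: create the queue and topic, wire them together
    queue = sqs.create_queue('work-queue')
    topic_arn = (sns.create_topic('work-topic')
                 ['CreateTopicResponse']['CreateTopicResult']['TopicArn'])
    sns.subscribe_sqs_queue(topic_arn, queue)  # also sets the queue policy
    sns.subscribe(topic_arn, 'http', 'http://workers.example.com/sns')

    # Publishing now persists the message in SQS *and* pings the workers
    sns.publish(topic_arn, 'process order 42')

    # On the worker side, called when the SNS HTTP POST arrives:
    def handle(body):
        print 'processing:', body  # placeholder for real processing logic

    def drain(queue):
        """Read the queue until it is empty, then finish the request."""
        while True:
            message = queue.read()
            if message is None:
                break
            handle(message.get_body())
            queue.delete_message(message)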

This solution covers the original three requirements: message reliability, handling cases where workers are down or have bugs, and handling messages as soon as they are sent.

Monitor Your Amazon Web Services Simple Queue Service (SQS) Queue Length in Amazon CloudWatch

UPDATE (2011-07-16): I just got a newsletter email from Amazon stating that they have added SQS and SNS to CloudWatch, which allows monitoring SQS queues not just for queue length but for other metrics as well, so there is no real need for my script. Unless you really really want to use it 🙂

All you have to do is select SQS in the metrics type drop-down and you will see a set of metrics to select from for all of your existing queues.

Amazon’s CloudWatch is a great tool for monitoring various aspects of your service. Last May, Amazon introduced custom metrics to CloudWatch, which allow you to send any metric data you wish to CloudWatch. You can then store it, plot it and also create CloudWatch alarms based on it.

One of the things missing from CloudWatch is Simple Queue Service (SQS) monitoring, so I’ve written a small script to push a queue’s message count into a CloudWatch custom metric.
Having the queue’s count in CloudWatch allows adding alerts and actions based on the queue’s length.
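In essence, the script does something along these lines (a simplified sketch, not the script itself; region, queue and metric names are placeholders):

    import boto.ec2.cloudwatch
    import boto.sqs

    REGION = 'us-east-1'
    QUEUE_NAME = 'work-queue'

    sqs = boto.sqs.connect_to_region(REGION)
    cloudwatch = boto.ec2.cloudwatch.connect_to_region(REGION)

    # ApproximateNumberOfMessages for the queue
    count = sqs.get_queue(QUEUE_NAME).count()

    # Push the count as a custom metric; alarms can then act on it
    cloudwatch.put_metric_data(namespace='MyApp/SQS',
                               name='QueueLength',
                               value=count,
                               unit='Count',
                               dimensions={'QueueName': QUEUE_NAME})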

For example, if the queue’s length is above a certain threshold for a certain period of time, one of two things has happened:

  1. There is a bug in the code, causing the worker processes that process the queue’s messages to fail
  2. There is a higher than usual load on the system, causing the queue to fill up with more and more messages while there aren’t enough worker processes to process them in a reasonable time

If the load is higher than usual, you can easily use a CloudWatch alarm to add an additional machine instance running more worker processes, or simply send an email alert saying something is wrong.

The script is very easy to use and can be run from a cron job. I’m running it as a cron job at 1 minute intervals and have set up various CloudWatch alarms to better monitor my queue.

Grab the script on GitHub at: SQS CloudWatch Queue Count.