MongoDB ReplicaSet Backup in the Cloud

MongoDB replica sets are a great way to handle scalability and redundancy. In the age of the cloud, nodes are added to and removed from a replica set easily and quickly, and in most cases all of them are created from the same image.

So how can we make sure that we always run the backup from a non-master replica set node?

Below is a small script that runs the backup only on a non-master replica set node.

It will also archive and compress the backup and upload it to a Google Cloud Storage bucket. You can easily modify the last part to upload the file to an AWS S3 bucket using s3cp or s3cmd.

This is a template that works best for a typical small replica set – 2 nodes and an arbiter. Install it on both nodes and schedule it using cron, and it will only run on the non-master one. Even if you flip the master role between the servers, the script will keep working without changing a thing.

A simple and elegant solution if I may say so myself :-)
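The original script is not reproduced here, so what follows is only a rough Python sketch of the same idea, not the script itself. It assumes the pymongo driver and the mongodump and gsutil command line tools are installed on each node. The bucket name, the working directory and all function names are placeholders I made up for illustration.

```python
import subprocess
import tarfile
import time


def is_backup_candidate(reply):
    """Return True only when the node reports itself as a secondary.

    `reply` is the document returned by the isMaster command (newer
    servers call it hello and use isWritablePrimary instead of ismaster).
    """
    is_primary = bool(reply.get("ismaster") or reply.get("isWritablePrimary"))
    return bool(reply.get("secondary")) and not is_primary


def local_node_status(host="localhost", port=27017):
    # Imported lazily so the decision logic above can be used without the driver.
    from pymongo import MongoClient

    # directConnection=True asks the local node itself instead of letting the
    # driver discover the replica set and route the command to the primary.
    client = MongoClient(host, port, directConnection=True)
    return client.admin.command("isMaster")


def backup_and_upload(bucket, work_dir="/tmp"):
    """Dump the local node, compress the dump and copy it to a GCS bucket."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dump_dir = "%s/mongodump-%s" % (work_dir, stamp)
    archive = dump_dir + ".tar.gz"
    subprocess.check_call(["mongodump", "--host", "localhost", "--out", dump_dir])
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(dump_dir, arcname="mongodump-" + stamp)
    subprocess.check_call(["gsutil", "cp", archive, "gs://%s/" % bucket])
    return archive


def main():
    # "my-backup-bucket" is a placeholder bucket name.
    if is_backup_candidate(local_node_status()):
        backup_and_upload("my-backup-bucket")
```

Scheduling a call to main() from cron on both data nodes gives the behavior described above: the master skips the run and the secondary does the work. An arbiter reports neither role, so it would skip the run as well.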

SSL Termination for Google Compute Engine (GCE) Load Balancer

I’ve recently been working on moving some apps that I have from Amazon Web Services (AWS) to Google Compute Engine (GCE) to test the service as well as learn the differences.

One of the things that I had to use was SSL termination in the load balancer. AWS’s Elastic Load Balancer (ELB) has supported SSL termination on the load balancer side for quite a while now.

Out of the box, GCE’s load balancer does not support SSL termination at the load balancer level. However, you can forward TCP port 443 (the port used by HTTPS) to the instances and have each instance do the SSL termination.

While decrypting the traffic adds some extra CPU load on each instance, it’s a reasonable solution that is relatively easy to deploy with any of the popular web servers (Nginx, Apache, etc.).

AWS Reserved Instances Marketplace Should Also Give Back AWS Credits Not Just Cash

Amazon recently introduced the AWS Reserved Instances Marketplace. The idea is great – allow people to sell their reserved instances which they don’t need for whatever reason instead of losing the reservation money (or if you are in heavy utilization – the complete run cost of the instance 24 x 7 x Number of years you reserved).

Before you can sell a reserved instance you need to set up the details of the account to which Amazon will wire the money. However, if you are not located in the US and don’t have a US bank account, you are out of luck. Unfortunately for me – I’m located in Israel with no US bank account.

Instead of messing with various tax issues, I would like to suggest that AWS simply give back AWS credits. That is – if I sell my reserved instance for $100 I should have the option of directly crediting my AWS account with $100, which I can then use on various AWS services.

I know AWS has some mechanism to work with such a thing since they do give out gift/trial credits all the time. I also know that the Amazon Associates program for referring customers to Amazon can give you back Amazon gift certificates instead of actual money.

Just a thought that would keep the money inside the AWS ecosystem while making non-US customers happy.

Quick & Dirty API for Accessing Amazon Web Services (AWS) EC2 Pricing Data

Continuing my post about the JSON files used in the Amazon EC2 Page, I’ve created a small Python library that also acts as a command line interface to get the data.
The data in the JSON files does not contain the same values as the EC2 API for things like region name and instance types so the library/cli translates these values to their corresponding values in the EC2 API.

You can filter the output by region, instance type and OS type.

The command line output supports a human readable table format, JSON and CSV.
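To give a feel for the filtering and the output formats, here is a minimal sketch. It is not the library’s actual code: the row layout, the field names and the function names are assumptions made for illustration, and the prices are made-up placeholders rather than real AWS prices.

```python
import csv
import io
import json

# Hypothetical flattened row layout; the real library may use other names.
FIELDS = ["region", "instance_type", "os", "price"]

SAMPLE_ROWS = [
    {"region": "us-east-1", "instance_type": "m1.small", "os": "linux", "price": 1.0},
    {"region": "eu-west-1", "instance_type": "m1.small", "os": "mswin", "price": 2.0},
]


def filter_rows(rows, region=None, instance_type=None, os_type=None):
    """Keep rows matching every filter that was given (None means 'any')."""
    wanted = {"region": region, "instance_type": instance_type, "os": os_type}
    return [
        row for row in rows
        if all(value is None or row[key] == value for key, value in wanted.items())
    ]


def to_json(rows):
    return json.dumps(rows, indent=2, sort_keys=True)


def to_csv(rows):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, to_csv(filter_rows(SAMPLE_ROWS, region="us-east-1")) produces a header line followed by the single matching row.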

To use the command line you’ll need to install the following Python libraries:

  • argparse – only if you are using Python < 2.7 (argparse is included in Python 2.7 and 3.x)
  • prettytable – if you want to print the human readable and pretty good looking ASCII table output

Both libraries can be installed using pip.
Grab the code from GitHub.

Don’t forget to send feedback! Enjoy!

 

Amazon Web Services (AWS) EC2 Pricing Data

Have you ever wanted a way to access the Amazon Web Services EC2 pricing data from code?

It seems Amazon uses predefined JSON files in the EC2 page to display the pricing per region per instance type and type of utilization.

You can easily access these JSON files, load them and use them in your own apps (at least until Amazon changes these URLs).

The naming in these JSON files is sometimes different from the naming used in the API. For example, a small instance (m1.small) has “size” : “sm”, and its type is “stdODI” for on demand instances or “stdResI” for reserved instances.
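A translation layer for these names can be as simple as a lookup table. The sketch below only contains the one pairing mentioned above; every other entry would have to be verified against the JSON files themselves. The function names are mine and purely illustrative.

```python
# (type, size) as found in the JSON files -> EC2 API instance type.
# Only the m1.small pairing comes from the text above; extend the table
# with values you have checked yourself.
JSON_TO_API_INSTANCE_TYPE = {
    ("stdODI", "sm"): "m1.small",
    ("stdResI", "sm"): "m1.small",
}


def api_instance_type(json_type, json_size):
    """Translate the JSON naming into the EC2 API instance type name."""
    try:
        return JSON_TO_API_INSTANCE_TYPE[(json_type, json_size)]
    except KeyError:
        raise KeyError(
            "no known API name for type=%r size=%r" % (json_type, json_size)
        )


def is_reserved(json_type):
    # Based only on the two type values quoted above.
    return json_type == "stdResI"
```

Failing loudly on unknown combinations is deliberate: since Amazon can change these files at any time, a silent fallback would hide new or renamed entries.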

Below are the links to the relevant files:

On Demand Instances

Reserved Instances

Data Transfer Pricing

Cloud Watch Pricing

Elastic IPs Pricing

Elastic Load Balancer (ELB) Pricing

 

Crunch these files and enjoy them while they’re there :-)

My ideal monitoring system

The server / application monitoring field is crowded with options these days. The solutions vary greatly in feature set and in how they are managed.

There are various parameters distinguishing between these systems:

  • Hosted (CloudKick, ServerDensity, CloudWatch, RevelCloud and others) vs Installed (Nagios, Munin, Ganglia, Cacti)
  • Hosted solutions pricing plans use varied parameters such as price/server, price/metric, retention policy, # of metrics tracked, realtime-ness, etc.
  • Poll based method – where a collecting server polls the other servers/services vs. Push – where you have a client on the server that pushes locally collected data to a logging/monitoring server
  • Allowing custom metrics – not all systems allow monitoring, plotting and sending an alert on custom data (at least not in an easy manner)

Some of these systems are better suited to some tasks than others, but in general none of them provides a (good) solution for today’s monitoring needs, which span from operational to applicative.

My ideal monitoring system for any application that has servers running in the background should have the following features:

  • Hosted – for when it doesn’t make sense for me to run the operations of this
  • Open Source – for when the sweet spot leans towards me taking control of the operations and management of collecting my own statistics with a CLEAR path of migration between the hosted solution and my installed one
  • Suitable for a cloud / virtual server environment – where servers go up and down and each server simply reports its data to the monitoring system without the need to pre-register it with the system. This suggests a small client running on each machine, collecting local stats and relaying them to a central system
  • Supports custom metrics – allowing me to add whatever stats I want be it operational (CPU, network bandwidth, disk space) or application related (such as number of sign ups or a specific function run time in milliseconds)
  • Understands more than numbers – not all stats are equal. Some are counters which I just want to “increment” or “decrement”. Others are single data points that I need to simply store. Others are special data points with a unit of measure (such as “milliseconds”, “Bytes/second”, etc)
  • Locally installed client must handle network failures – if there is a network failure or the collecting server is down, stats will be stored locally and relayed to the collecting server when it’s available again
  • Locally installed client should collect custom metrics – if I want to send some custom metrics from my app – say when a user signs up – my code would talk with the locally installed client and that client will relay the data to the collecting server. This ensures minimum configuration and my app code can assume that there is always a locally installed client which can communicate with the collecting server be it via UNIX sockets, UDP datagram, shared memory or anything else that is suitable for the job
  • Data should be query-able – that is, I really want to query and filter more than just the timeframes of the data and general statistics on it (i.e. group by server, show specific servers, show values higher than X, etc)
  • Reporting Console – somewhere to plot all these statistics which has embeddable graphs (for those who like building their own dashboards)
  • Built-in near real-time alerts – I want to be able to set alerts that go out near real time when collecting the data to a diverse set of outlets be it Email, Text Messages, Push Notifications, WebHook (for automating some auto handling of failures or problems), etc.
  • API – because everything needs it :-)
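To make the “locally installed client” idea a bit more concrete, here is a minimal sketch of the relaying side. It is an illustration only: the JSON wire format, the class and method names, and the UDP port are all assumptions, not an existing product’s API. It distinguishes counters from plain values with a unit, and it keeps metrics in a local buffer when sending fails.

```python
import collections
import json
import socket
import time


class LocalStatsClient(object):
    """Buffers metrics locally and relays them through a send function."""

    def __init__(self, send, max_buffered=10000):
        self._send = send
        # Oldest entries are dropped first if the collector stays down too long.
        self._pending = collections.deque(maxlen=max_buffered)

    def increment(self, name, by=1):
        self._record({"type": "counter", "name": name, "delta": by})

    def decrement(self, name, by=1):
        self.increment(name, -by)

    def value(self, name, value, unit=None):
        self._record({"type": "value", "name": name, "value": value, "unit": unit})

    def pending_count(self):
        return len(self._pending)

    def _record(self, metric):
        metric["ts"] = time.time()
        self._pending.append(json.dumps(metric))
        self.flush()

    def flush(self):
        """Try to relay everything buffered; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                return False  # still buffered, retried on the next flush
            self._pending.popleft()
        return True


def udp_sender(host="127.0.0.1", port=8125):
    """Build a send function that ships payloads as UDP datagrams.

    The port is a placeholder. UDP rarely reports delivery failures, so a
    real agent would need a transport that does in order to buffer reliably.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(payload):
        sock.sendto(payload.encode("utf-8"), (host, port))

    return send
```

Application code would then call something like client.increment("signups") or client.value("render_time", 12.5, unit="milliseconds") and never worry about whether the collecting server is reachable.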

It is very important to me in almost any hosted SaaS (Software-as-a-Service) solution I use that I will have a clear migration path if (or when) the time comes and I need to host a certain sub-system on my own. Sometimes I do have to compromise and use a system that I may not have the ability to migrate (or at least not easily) but the decision is made consciously.

From an architecture point of view, I would like to see these main building blocks:

  • Storage – Reliable, scalable, can handle lots of writes fast and handle any size of dataset for any reasonable retention period
  • Collectors – clients push data to these collectors, which receive it and pass it on to the processors
  • Processors – Handle incoming data to be written. Aggregate data for quicker reporting.
  • Reporting – something that will enable easy querying and filtering of the data
  • Real time alerts monitoring – handles preconfigured alerts and determines in near real time whether certain conditions are met in order to issue the relevant alerts/actions
  • Web Console – for configuration, querying and real-time plotting of data
  • API for querying
  • API for real time plotting – to be used for integration with other apps, embeddable chunks of code, etc.
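As a toy illustration of how the processor and alerting blocks could fit together, here is an in-memory sketch. Everything in it (class names, the fixed time buckets, the simple “average above threshold” rule) is an assumption made for the example; a real system would sit on top of the storage layer described above.

```python
import collections


class Processor(object):
    """Aggregates incoming data points into fixed-size time buckets."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self._buckets = collections.defaultdict(list)

    def ingest(self, name, value, timestamp):
        # Round the timestamp down to the start of its bucket.
        bucket_start = int(timestamp) - int(timestamp) % self.bucket_seconds
        self._buckets[(name, bucket_start)].append(value)
        return bucket_start

    def average(self, name, bucket_start):
        values = self._buckets.get((name, bucket_start))
        if not values:
            return None
        return sum(values) / float(len(values))


class ThresholdAlert(object):
    """Calls notify(name, average) when a bucket average exceeds a threshold."""

    def __init__(self, name, threshold, notify):
        self.name = name
        self.threshold = threshold
        self.notify = notify

    def check(self, processor, bucket_start):
        avg = processor.average(self.name, bucket_start)
        if avg is not None and avg > self.threshold:
            self.notify(self.name, avg)
            return True
        return False
```

The notify callable is where the different outlets (email, text message, WebHook) would plug in.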

While I’m sure with a little more thought more requirements can be added or some of these requirements can be merged and minimized, this set of features will create a system a lot of people would love to use and feel comfortable using.

Would you use such a system? Do you have anything else to add to the feature set?

AWS SNS Retry Policy – New Feature – Push Based Messaging Instead of SQS Polling

The good guys at Amazon Web Services just announced a new feature in Simple Notification Service (SNS) which allows setting a retry policy for SNS notifications.

Up until now SNS had an unknown (and maybe nonexistent) publish retry policy. I always suspected it had some logic for different subscription types (Email, HTTP, Text, SQS) but it was never mentioned anywhere.

The new retry policy feature allows you to define the number of retries and the wait period between them (whether it’s a linear or an exponential wait!) as well as set a throttling policy so that if your server is currently down it won’t get flooded with notifications once it’s back up.
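To illustrate the difference between a linear and an exponential wait, here is a small sketch that spreads a number of retries between a minimum and a maximum delay. It is a generic illustration of the two backoff shapes, not SNS’s actual algorithm, and the function and parameter names are my own.

```python
def retry_delays(num_retries, min_delay, max_delay, backoff="linear"):
    """Return the wait (in seconds) before each retry, from min to max delay."""
    if backoff not in ("linear", "exponential"):
        raise ValueError("unknown backoff: %r" % backoff)
    if backoff == "exponential" and min_delay <= 0:
        raise ValueError("exponential backoff needs a positive min_delay")
    if num_retries <= 0:
        return []
    if num_retries == 1:
        return [float(min_delay)]

    delays = []
    for attempt in range(num_retries):
        # 0.0 for the first retry, 1.0 for the last one.
        fraction = float(attempt) / (num_retries - 1)
        if backoff == "linear":
            delay = min_delay + (max_delay - min_delay) * fraction
        else:
            delay = min_delay * (float(max_delay) / min_delay) ** fraction
        delays.append(delay)
    return delays
```

With three retries between 10 and 30 seconds the linear variant grows in equal steps, while the exponential variant multiplies the wait by a constant factor each time, keeping early retries short and pushing most of the waiting towards the end.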

This allows for some very interesting patterns. The most notable is a push based messaging mechanism in which, instead of writing a dedicated process to poll an SQS queue, you use SNS as a sort of ad-hoc push queue that posts the messages to an HTTP/S URL. Setting a reasonable retry policy and throttling policy will also ensure that if your server is down, messages won’t get lost.

A while back I posted a suggestion for a hack which utilized SNS as a notification mechanism to start polling SQS. Now that SNS has a retry policy, it’s a good candidate for handling your async tasks using your regular HTTP servers with all the goodies of logging, multi-threading, debugging, etc.

Before you run to start implementing a push based messaging (or re-implement Google AppEngine’s TaskQueue Push Queue API), there are certain things which are yet unknown and/or require further consideration:

  • SQS retains messages for up to 14 days, so you have up to 14 days to fix a bug or set up a system that will empty a queue. With the new SNS retry policy you may be able to reach a similarly long period; however, the maximum values that can be set in the policy are not yet known and may pose a limit.
  • I am not aware of, and couldn’t find, any documentation on how SNS treats the HTTP status code returned by an HTTP/S subscriber (other than when answering the subscription request). How can a subscriber tell SNS that it failed to handle a pushed message? Will SNS consider any response other than HTTP status 200 a failed request and apply the retry policy?
  • What happens if a message pushed to an HTTP/S subscriber takes a long time to process due to load or any other reason? When will SNS decide if the request failed due to timeout?

I hope some of these questions will get cleared up and SNS can become a viable push based messaging mechanism.

I previously recommended always publishing via SNS (even if it’s just to an SQS queue) to get the added benefit of easier debugging (subscribe an SQS queue AND send an email or HTTP request to debug, etc). The new feature only proves that SNS is becoming a very interesting building block to use inside and/or outside of AWS.

The cool example of SaaS for developers by @mza and @jeffbarr

In a recent post on the AWS blog, Jeff Barr and Matt Wood showed the architecture and code they wrote which lists the most interesting AWS related jobs from the Amazon Jobs site.

It serves as a rather good example of how service components such as the ones AWS provides (SNS, SQS and S3, to name a few that are AWS agnostic) are a great set of building blocks that help you focus on writing the code you really need to write.

At first I found the auto scaling policy for spinning machines up and down just to tweet a bit of an overkill (Jeff could have easily added the code to the same instance that runs the cron job). However, after thinking about it a bit more and considering the various pricing strategies, it actually makes a lot of sense.

Membase Cluster instead of ElastiCache in 5 minutes

Want to have an ElastiCache like service in all regions, not just US-EAST?
Want to utilize the new reserved instances utilization model to lower your costs?
Want to have your cache persistent and easily backup and restore it?
Want to add (or remove) servers from your cache cluster with no down time?

We have just the solution for you. A simple CloudFormation template to create a Membase cluster which gives you all of the ElastiCache benefits and a lot more, including:

  • Support for ALL regions, not just US-EAST (update: apparently Amazon beat me to the punch with support in US West (N. California), EU West (Dublin), Asia Pacific (Singapore), and Asia Pacific (Tokyo))
  • Support for reserved instances including the new utilization based model
  • Supports adding and removing servers to the cluster with no downtime and automatic rebalancing of keys among the cluster’s servers
  • Supports persistence (if you wish)
  • Supports multi-tenancy and SASL authentication
  • Supports micro instances (not recommended, but good for low volume environments and testing environments)
  • Easily backup and restore
  • Install the cluster in multiple availability zones and regions for maximum survivability and have all of them talk to each other
  • If you use a vBucket aware memcache client or run through a Moxi proxy, changes in topology are communicated automatically with no need for code or configuration changes!
  • No need for a single address (the CNAME you get from ElastiCache) because if you are using a vBucket aware client (or going through a Moxi proxy) topology changes are communicated to your client.

Based on the CloudFormation scripts created by Couchbase Labs, this script is more generic and uses 80% of the available RAM for each type of instance.

Notes:

  • All instances run the latest Amazon Linux (2011-09)
  • 64bit instances use instance-store for storage
  • t1.micro uses 32bit to utilize the maximum amount of RAM (we don’t want those 64bit pointers eating our RAM)
  • m1.small is 32bit and while it does have instance-store support, we wanted to have one formation script to rule them all, so it uses an EBS backed AMI.
  • The CloudFormation script changes the names of the instance to their public DNS names so they are available from anywhere in the world and any Availability Zone and Region in AWS so you can have repl

Security (VERY IMPORTANT!):

  • The default script will create a security group for you which allows access to ALL of the servers.
  • If you created a default bucket via the script – that bucket, which uses port 11211 is open to everyone. Make sure to protect it or delete it and create a bucket with SASL protection.
  • In general, if you are creating a non SASL protected bucket, make sure to protect the port by updating the security group!

Download:

Instructions:

  1. Grab one of the existing templates from the GitHub repository or run gen-pack.py to generate a template with any number of instances in it.
  2. Go to the CloudFormation Console
  3. Click on “Create New Stack”
  4. Name your stack
  5. Select “Upload a Template File”
  6. Click “Browse” and select the template file (use one of the defaults named “membase-pack-X”)
  7. Click “Continue”
  8. In the “Specify Parameters” step, the minimum parameters you are required to fill in are:
    • RESTPassword – the password for the REST API and management console
    • KeyName – the name of the KeyPair you are using when creating instances (usually the name of the .pem file without the “.pem” suffix)
    • Instance Type – choose the size of the instance (t1.micro by default)
  9. Click “Continue”
  10. Sit back and relax. CloudFormation will do the rest of the work for you
  11. Select the “Outputs” tab (click refresh if it doesn’t show anything). It should contain the URL of the Membase management console
  12. Clap your hands with joy!

Script to update Beanstalkd work queue statistics in CloudWatch

I’ve written a small Python script which uses the excellent boto library to monitor Beanstalkd server statistics as well as the statistics of a specific beanstalkd tube (queue).

You can get the code here: https://github.com/erans/beanstalkdcloudwatch

The easiest way to run it is via a cron job. I run it every 1 minute to monitor the “reserved” and “buried” state of a few of the tubes I use (if you want to read more about how beanstalkd works I suggest reading the protocol document).
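For readers who want to see roughly what such a script has to do, here is a simplified sketch. It is not the code from the repository: it talks to beanstalkd over a plain socket using the stats-tube command from the protocol document, parses the simple key/value reply, and builds a list of data points. The metric naming is made up, and actually publishing the values to CloudWatch (which the real script does through boto) is left out.

```python
import socket


def parse_stats(body):
    """Parse the simple 'key: value' YAML body beanstalkd returns for stats."""
    stats = {}
    for line in body.splitlines():
        if ":" not in line:
            continue  # skips the leading '---' document marker and blanks
        key, _, value = line.partition(":")
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def fetch_tube_stats(tube, host="localhost", port=11300):
    """Send 'stats-tube <tube>' and return the parsed reply."""
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(("stats-tube %s\r\n" % tube).encode("ascii"))
        reader = sock.makefile("rb")
        # A successful reply starts with 'OK <bytes>' followed by the body.
        header = reader.readline().decode("ascii").split()
        if len(header) != 2 or header[0] != "OK":
            raise RuntimeError("unexpected reply: %r" % (header,))
        body = reader.read(int(header[1]))
        return parse_stats(body.decode("ascii"))
    finally:
        sock.close()


def tube_metrics(tube, stats, states=("reserved", "buried")):
    """Build (metric_name, value) pairs; the naming scheme is hypothetical."""
    return [("%s-%s" % (tube, state), stats["current-jobs-" + state])
            for state in states]
```

A cron job would call fetch_tube_stats() for each tube of interest and hand the result of tube_metrics() to whatever publishes the data points.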

I highly recommend checking beanstalkd out if you need a queue for offloading tasks to other processes/machines. It’s really geared towards that task and has a couple of dedicated features, such as the buried state, time-to-live and time-to-run, which make managing this really painless.