Friday, November 13, 2009

Amazon Web Services Start-up Challenge Finalists

We are excited to be one of nine finalists in the AWS Start-up Challenge!

Monday, November 2, 2009

Using Hudson to manage crons

We've been using Hudson for several months now to manage our builds -- we probably have 80-90 different projects that it's responsible for. It's an awesome system for continuous integration and testing.


It's also an awesome system for scheduling and managing generic jobs. We've only just begun to use it as a cron server, but it's clear that it has numerous advantages over the more traditional way of using the Unix cron service directly.


  • Notification plugins -- Hudson can be easily configured to send email and Jabber notifications when cron jobs start, succeed, or fail. You can also track your scheduled jobs via RSS.

  • Stdout/Stderr logging -- Hudson saves the stdout and stderr from each run automatically.

  • SCM integration -- if you need to update a job, just check the changes into SVN (or whatever SCM system you use). Hudson will automatically pick up the changes the next time your job is run.

  • Nice web interface -- never underestimate the productivity gains from having a good UI. It can be surprisingly tricky to determine exactly which crons are running on a generic Unix box. Not so with Hudson.


At Bizo, we believe that developers should be getting their hands dirty in the operational aspects of their projects -- Hudson gives us an easy interface for managing our scheduled jobs using the same tools that we're familiar with for managing our build processes. Hudson is such a great tool for continuous integration that it's easy to overlook how good it is at the simpler task of managing generic scheduled jobs.

Tuesday, October 20, 2009

Clearing Linux Filesystem Cache

I was doing some performance tuning of our MySQL db and was having some trouble consistently reproducing query performance due to IO caching that was occurring in Linux. In case you're wondering, you can clear this cache by executing the following command as root:

echo 1 > /proc/sys/vm/drop_caches
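If you end up doing this between every benchmark run, it's worth wrapping up. Here's a minimal sketch; the drop_caches function and its second argument are made up for illustration, not a standard tool. It assumes a Linux kernel that exposes /proc/sys/vm/drop_caches, where writing 1 drops the page cache, 2 drops dentries and inodes, and 3 drops both. It runs sync first, since dirty pages can't be dropped until they have been written out.

```shell
# Hypothetical helper: flush dirty pages, then ask the kernel to drop caches.
# mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
# The second argument exists only so the function can be tried against an
# ordinary file; on a real box leave it at the default and run as root.
drop_caches() {
  local mode="${1:-1}"
  local target="${2:-/proc/sys/vm/drop_caches}"
  case "$mode" in
    1|2|3) ;;
    *) echo "mode must be 1, 2, or 3" >&2; return 2 ;;
  esac
  sync                      # write dirty pages out so they become droppable
  echo "$mode" > "$target"
}
```

Calling drop_caches with no arguments as root is equivalent to the one-liner above, with the sync added.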

Friday, October 16, 2009

bash, errors, and pipes

Our typical pattern for writing bash scripts has been to start off each script with:

#!/bin/bash -e

The -e option will cause the script to exit immediately if a command has exited with a non-zero status. This way your script will fail as early as possible, and you never get into a case where on the surface, it looks like the script completed, but you're left with an empty file, or missing lines, etc.

Of course, this is only for "simple" commands, so in practice, you can think of it as terminating immediately if the entire line fails. So a script like:

#!/bin/bash -e
/usr/bin/false || true
echo "i am still running"

will still print "i am still running," and the script will exit with a zero exit status.

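Of course, if you wrote it that way, that's probably what you're expecting. And it's easy enough to change: just drop the "|| true". (Changing "||" to "&&" would not do it. With -e, bash ignores a failure in any command of a "&&" or "||" list except the last one, so "/usr/bin/false && true" would not stop the script either.)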
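If you want to see how -e treats these lists for yourself, here's a small sketch. The run helper is made up for this example; it executes each snippet in a separate "bash -e" process and reports the exit status. Bash only lets -e act on the last command of a "&&" or "||" list, so a failure earlier in the list does not stop the script.

```shell
# Hypothetical helper: run a snippet under "bash -e" and report how it exited.
run() {
  local rc=0
  bash -e -c "$1" || rc=$?
  echo "exit status: $rc"
}

# Failure is handled by ||: prints "still running", then "exit status: 0".
run 'false || true; echo "still running"'

# Failure is not the last command of the && list, so -e ignores it:
# prints "still running", then "exit status: 0".
run 'false && true; echo "still running"'

# Plain failing command: -e stops the script, so only "exit status: 1" prints.
run 'false; echo "still running"'
```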

The thing that was slightly surprising to me was how a script would behave using pipes.

#!/bin/bash -e
/usr/bin/false | sort > sorted.txt
echo "i am still running"

If your script is piping its output to another command, it turns out that the return status of a pipeline is the exit status of its last command. So, the script above will also print "i am still running" and exit with a 0 exit status.

Bash provides a PIPESTATUS variable, which is an array containing a list of the exit status values from the pipeline. So, if we checked ${PIPESTATUS[0]} it would contain 1 (the exit value of /usr/bin/false), and ${PIPESTATUS[1]} would contain 0 (exit value of sort). Of course, PIPESTATUS is volatile, so, you must check it immediately. Any other command you run will affect its value.
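Here's a minimal sketch of checking PIPESTATUS. The statuses variable name is just for illustration. The important part is copying the array on the very next line, before anything else has a chance to overwrite it.

```shell
false | sort > /dev/null
# Copy the array right away; the next command would replace its contents.
statuses=("${PIPESTATUS[@]}")

# Prints: false exited with 1, sort exited with 0
echo "false exited with ${statuses[0]}, sort exited with ${statuses[1]}"
```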

This is great, but not exactly what I wanted. Luckily, there's another bash option, -o pipefail, which changes the way the pipeline exit code is derived. Instead of being the status of the last command, it becomes the status of the last (rightmost) command that exited with a non-zero status, or zero if every command in the pipeline succeeded. So:

#!/bin/bash
# Options go in "set": a Linux shebang line passes "-e -o pipefail" to bash as one argument.
set -e -o pipefail
/usr/bin/false | sort > sorted.txt
echo "this line will never execute"

So, thanks to pipefail, the above script will work as we expect. Since /usr/bin/false returns a non-zero exit status, the entire pipeline will return a non-zero exit status, the script will die immediately because of -e, and the echo will never execute.
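To compare the two behaviors without writing several scripts, here's a quick sketch. status_of is a hypothetical helper that runs a snippet in a fresh bash process and prints the resulting exit status.

```shell
# Hypothetical helper: run a snippet in a new bash and print its exit status.
status_of() {
  local rc=0
  bash -c "$1" > /dev/null || rc=$?
  echo "$rc"
}

status_of 'false | sort'                                  # prints 0 (status of sort)
status_of 'set -o pipefail; false | sort'                 # prints 1 (status of false)
status_of 'set -o pipefail; (exit 3) | (exit 5) | true'   # prints 5 (rightmost failure)
```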

Of course, all of this information is contained in the bash man page, but I had never really run into it / looked into it before, and I thought it was interesting enough to write up.

Monday, October 12, 2009

s3fsr 1.4 released

s3fsr is a tool we built at Bizo to help quickly get files into/out of S3. It's had a few 1.x releases, but by 1.4 we figured it was worth getting around to posting about.

Overview

While there are a lot of great S3 tools out there, s3fsr's niche is that it's a FUSE/Ruby user land file system.

For a command line user, this is handy, because it means you can do:

# mount yourbucket in ~/s3
s3fsr yourbucketname ~/s3

# see the directories/files
ls ~/s3/

# upload
mv ~/local.txt ~/s3/remotecopy.txt

# download
cp ~/s3/remote.txt ~/localcopy.txt

Behind the scenes, s3fsr is talking to the Amazon S3 REST API and getting/putting directory and file content. It will cache directory listings (not file content), so ls/tab completion will be quick after the initial short delay.

S3 And Directory Conventions

A unique aspect of s3fsr, and a specific annoyance it was written to address, is that it understands several different directory conventions used by various S3 tools.

This directory convention problem stems from Amazon's decision to forgo any explicit notion of directories in the API, and instead force everyone to realize that S3 is not a file system but a giant hash table of string key -> huge byte array.

Let's take an example--you want to store two files, "/dir1/foo.txt" and "/dir1/bar.txt" in S3. In a traditional file system, you'd have 3 file system entries: "/dir1", "/dir1/foo.txt", and "/dir1/bar.txt". Note that "/dir1" gets its own entry.

In S3, without tool-specific conventions, storing "/dir1/foo.txt" and "/dir1/bar.txt" really means only 2 entries. "/dir1" does not exist of its own accord. The S3 API, when reading and writing, never splits keys apart on "/"; it just treats the whole path as one big key to get/set in its hash table.

For Amazon, this "no /dir1" approach makes sense due to the scale of their system. If they let you have a "/dir1" entry, pretty soon API users would want the equivalent of a "rm -fr /dir1", which, for Amazon, means instead of a relatively simple "remove the key from the hash table" operation, they have to start walking a hierarchical structure and deleting child files/directories as they go.

When the keys are strewn across a distributed hash table like Dynamo, this increases the complexity and makes the runtime nondeterministic.

Which Amazon, being a bit OCD about their SLAs and 99th percentiles, doesn't care for.

So, no S3 native directories.

There is one caveat--the S3 API lets you progressively infer the existence of directories by probing the hash table keys with prefixes and delimiters.

In our example, if you probe with "prefix=/" and "delimiter=/", S3 will then, and only then, split & group the "/dir1/foo.txt" and "/dir1/bar.txt" keys on "/" and return you just "/dir1/" as what the S3 API calls a "common prefix".

Which is kind of like a directory. Except that you have to create the children first, and then the directory pops into existence. Delete the children, and the directory pops out of existence.
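To make the prefix/delimiter behavior concrete, here's a rough local emulation in bash. It never talks to S3: list_keys is a made-up function that applies the same grouping rule to keys read from stdin, and the PREFIX/KEY labels are just for display. Note that a common prefix comes back in full, including the prefix you probed with.

```shell
# Hypothetical helper: emulate a listing with a prefix and a delimiter.
# Reads keys on stdin (one per line) and prints either the key itself or
# the common prefix it rolls up into.
list_keys() {
  local prefix="$1" delimiter="$2" key rest
  while IFS= read -r key; do
    # Only keys that start with the prefix take part in the listing.
    [[ "$key" == "$prefix"* ]] || continue
    rest="${key#"$prefix"}"
    if [[ "$rest" == *"$delimiter"* ]]; then
      # Roll the key up to the first delimiter after the prefix.
      echo "PREFIX ${prefix}${rest%%"$delimiter"*}${delimiter}"
    else
      echo "KEY    $key"
    fi
  done | LC_ALL=C sort -u
}

# Prints:
#   KEY    /top.txt
#   PREFIX /dir1/
printf '%s\n' /dir1/foo.txt /dir1/bar.txt /top.txt | list_keys / /
```

Probing again with "/dir1/" as the prefix lists the two files themselves, which is how a tool walks "into" a directory that S3 never actually stored.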

This brings us to the authors of tools like s3sync and S3 Organizer--their users want the familiar "make a new directory, double click it, make a new file in it" idiom, not a backwards "make the children files first" idiom. It is, understandably, different from what users expect.

So, the tool authors got creative and basically added their own "/dir1" marker entries to S3 when users perform a "new directory" operation to get back to the "directory first" idiom.

Note this is a hack, because issuing a "REMOVE /dir1" to S3 will not recursively delete the child files (to S3, "/dir1" is just a meaningless key with no relation to any other key in the hash table). So now the burden is on the tool to do its own recursive iteration/deletion of the directories.

Which is cool, and actually works pretty well, except that the two primary tools implemented marker entries differently:

  • s3sync created marker entries (e.g. a "/dir1" entry) with hard-coded content that etags (hashes) to a specific value. This known hash is nice because it makes it easy to distinguish directory entries from file entries when listing a bucket: S3 knows nothing about directories, so the tool has to infer on its own which keys represent files and which represent directories.
  • S3 Organizer created marker entries as well, but instead of a known etag/hash, they suffixed the directory name, so the key of "/dir1" is actually "/dir1_$folder$". It's then the job of the tool to recognize the suffix as a marker directory entry, strip off the suffix before showing the name to the user, and use a directory icon instead of a file icon.

So, if you use an S3 tool that does not understand these 3rd party conventions, browsing a well-used bucket will likely end up looking odd with obscure/duplicate entries:

/dir1 # s3sync marker entry file
/dir1 # common prefix directory
/dir1/foo.txt # actual file entry
/dir2_$folder$ # s3 organizer marker entry file
/dir2 # common prefix directory
/dir2/foo.txt # actual file entry

This quickly becomes annoying.

And so s3fsr understands all three conventions, s3sync, S3 Organizer, and common prefixes, and just generally tries to do the right thing.
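As a rough illustration of what "doing the right thing" might look like, here's a sketch in bash and awk. This is not s3fsr's code. normalize_listing is a hypothetical function, and it makes one loud simplification: real s3sync markers are recognized by their etag, which we don't reproduce here, so the sketch simply treats any key that is also the parent of another key as a directory marker.

```shell
# Hypothetical sketch: read raw S3 keys on stdin and print one clean listing.
normalize_listing() {
  awk '
    {
      k = $0
      # S3 Organizer convention: strip the "_$folder$" suffix, record a directory.
      if (sub("_[$]folder[$]$", "", k)) { dir[k] = 1; next }
      # Common prefix convention: a child key implies its parent directory.
      parent = k
      sub("/[^/]*$", "", parent)
      if (parent != "") dir[parent] = 1
      file[k] = 1
    }
    END {
      # Assumption: a plain key that is also a directory is a marker entry
      # (standing in for the s3sync etag check), so it is not shown as a file.
      for (k in file) if (!(k in dir)) print "file " k
      for (k in dir) print "dir  " k
    }' | LC_ALL=C sort
}

# Prints:
#   dir  /dir1
#   dir  /dir2
#   file /dir1/foo.txt
#   file /dir2/foo.txt
printf '%s\n' /dir1 /dir1/foo.txt '/dir2_$folder$' /dir2/foo.txt | normalize_listing
```

The sketch only infers one level of parent directory per key and would miss an empty s3sync marker with no children, which is exactly the case the etag check exists for.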

FUSE Rocks

One final note is that the FUSE project is awesome. Implementing mountable file systems that users can "ls" around in usually involves messy, error-prone kernel integration that is hard to write and, if the file system code misbehaves, can screw up your machine.

FUSE takes a different approach and does the messy kernel code just once, in the FUSE project itself, and then it acts as a proxy out to your user-land, process-isolated, won't-blow-up-the-box process to handle the file system calls.

This proxy/user land indirection does degrade performance, so you wouldn't use it for your main file system, but for scenarios like s3fsr, it works quite well.

And FUSE language bindings like fusefs for Ruby make it a cinch to develop too--s3fsr is all of 280 LOC.

Wrapping up

Let us know if you find s3fsr useful--hop over to the github site, install the gem, kick the tires, and submit any feedback you might have.


Want to be challenged at work?

We've got a few challenges and are looking to grow our (kick ass) engineering team. Check out the opportunities below and reach out if you think you've got what it takes...

Thursday, October 8, 2009

Efficiently selecting random sub-collections.

Here's a handy algorithm for randomly choosing k elements from a collection of n elements (assume k < n):


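import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

// Picks k elements from source in a single pass, preserving iteration order.
public static <T> List<T> pickRandomSubset(Collection<T> source, int k, Random r) {
  List<T> toReturn = new ArrayList<T>(k);
  double remaining = source.size(); // items not yet visited
  for (T item : source) {
    // Chance of taking this item: picks still needed / items still unvisited.
    double nextChance = (k - toReturn.size()) / remaining;
    if (r.nextDouble() < nextChance) {
      toReturn.add(item);
      if (toReturn.size() == k) {
        break;
      }
    }
    --remaining;
  }
  return toReturn;
}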

The basic idea is to iterate through the source collection only once. For each element, we can compute the probability that it should be selected, which simply equals the number of items left to pick divided by the total number of items left.

Another nice thing about this algorithm is that it also works efficiently if the source is too large to fit in memory, provided you know (or can count) how many elements are in the source.
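For the too-large-for-memory case, the same idea fits in a few lines of shell. This is a sketch under assumptions: pick_random_lines is a hypothetical helper, the source is a file with one element per line, and one extra pass over the file is spent counting the lines.

```shell
# Hypothetical helper: print k randomly chosen lines of a file, in file order,
# without ever holding the file in memory.
pick_random_lines() {
  local k="$1" file="$2" n
  n=$(awk 'END { print NR }' "$file")       # first pass: count the elements
  awk -v k="$k" -v n="$n" -v seed="$RANDOM" '
    BEGIN { srand(seed) }
    # (k - picked) picks are still needed; (n - NR + 1) lines are still
    # unvisited, counting the current one.
    picked < k && rand() < (k - picked) / (n - NR + 1) { print; picked++ }
  ' "$file"
}
```

For example, "pick_random_lines 10 access.log" (the file name is a placeholder) prints 10 lines. As in the Java version, once the number of picks still needed equals the number of lines left, the chance reaches 1, so you always get exactly k lines when k is at most n.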

This isn't exactly anything groundbreaking, but it's far better than my first inclination to use library functions to randomly sort my list before taking a leading sublist.