Google Storage Available to All

Google made its Google Storage API available to all today. This is the service that is likely to make Google AppEngine useful, as the existing BigTable storage system was just too painful for words and not designed for large blobs. It certainly feels a bit slicker in execution than AWS S3 and has a more polished user experience from a getting-started perspective. It was several months after S3 launched before somebody built a third-party browser that would allow you to look at your buckets online.

They use a similar model to S3 of globally unique bucket names, so get in quick if you want a bucket called “test” or “src” 🙂  The naming conventions for objects are restricted, so you cannot upload an arbitrary directory of files and expect it to succeed. The uploading process must perform some kind of name mapping that converts illegal names to legal ones, something like the toy mapper sketched below.
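To make that concrete, here is a toy name-mapper in Python. The character whitelist is purely my own placeholder, not Google’s published object-naming rules, so treat it as a sketch of the kind of mapping an uploader would have to do rather than anything definitive.

```python
import os
import re

def legal_object_name(path):
    """Flatten a local path into something resembling a legal object name.

    The character whitelist below is an assumption for illustration only;
    the real Google Storage restrictions are different.
    """
    name = path.replace(os.sep, '/')
    return re.sub(r'[^A-Za-z0-9._/-]', '_', name)

# Walk a local directory and show what each file would be renamed to.
for root, _dirs, files in os.walk('my_local_dir'):
    for filename in files:
        local = os.path.join(root, filename)
        print('%s -> %s' % (local, legal_object_name(local)))
```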

They do make a big deal about allowing developers to specify whether buckets and their contents are located in Europe (where in Europe?) or not, but read the T&Cs carefully. Section 2.2 makes it clear that Google can process your data just about anywhere, including the US. It’s only “data at rest” that can be specified as stored in Europe. So all your data is essentially available to US agencies should they choose to take a peek. The lamest restriction in the T&Cs is the one on using Google Storage to create a “Google Storage like” system. Let’s put a layer in front of Google Storage that is like Google Storage but slower, less resilient and more expensive; oh, I definitely want to sign up for that service 🙂

They provide a version of the excellent Boto library that has been repurposed for use against Google Storage; this is the best indication yet that the Google APIs must be pretty close to the S3 APIs in structure. The main difference is the use of OAuth to give fine-grained access to the storage objects. This is the biggest win for Google and I hope to see Microsoft Azure, RackSpace CloudFiles and AWS S3 following suit fairly quickly.
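For anyone who has used Boto against S3, the repurposed library should feel very familiar. The snippet below is a minimal sketch under that assumption: `boto.connect_gs` and the bucket/key calls mirror Boto’s S3 interface, and the bucket name and credentials are placeholders, so check the library you actually have installed for the exact entry points.

```python
import boto

# Placeholder credentials: substitute your own Google Storage developer keys.
conn = boto.connect_gs(gs_access_key_id='GOOG-EXAMPLE-KEY',
                       gs_secret_access_key='EXAMPLE-SECRET')

# Bucket names are globally unique, just as on S3.
bucket = conn.create_bucket('my-example-bucket')

# Upload a small object and then list what the bucket contains.
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello, Google Storage')

for key in bucket.list():
    print('%s (%d bytes)' % (key.name, key.size))
```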

It would also be great to see the Boto changes for Google Storage rolled back into the Boto mainline.

Cloud Computing Cost Models : Don’t Sweat It


I get asked reasonably often to help companies and individuals come up with a cost model for their cloud computing. People get really exercised about the cost of hundreds of compute nodes and terabytes of storage. I know what these models should look like because I built an insanely complex model for PutPlace in 2006 when we founded the company and decided to deploy it on Amazon. I had the same concern most people have: was I building a business that was going to explode in my face because I had made some fundamentally flawed economic assumptions?

Once we launched PutPlace we rapidly discovered a number of interesting facts about our cost structures. The first was that in a small online business such as PutPlace the compute costs dominate to the point that storage, bandwidth and transaction costs are essentially rounding errors. The second was that attracting enough users to move the needle from a compute perspective is, ahem, challenging for most companies. With consistent upload rates of over 10k files per day (with occasional peaks exceeding 100k files daily) our grid wasn’t even breaking a sweat. We had absolutely no red-line events on compute, and of course AWS happily absorbed everything we threw at it without blinking.

Even at the end of 2008, when the service had been up and running for six months, over 75% of our costs were compute nodes.

So if you want to understand your cost basis, a very simple model is to work out the number of compute nodes you want to run, price those nodes in AWS (or Slicehost or Rackspace) and use that as the monthly cost model for your whole environment. Once you have a few months of billing data, you can subtract your compute costs to find your variable costs in storage, transactions and bandwidth (which I’m betting will be marginal). Once you have the marginal costs, you can compute your variable cost per active user, and then you know what your cost-plus pricing basis is for each user. A sketch of that arithmetic follows below.
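As a worked example, here is the whole model in a few lines of Python. The node types, hourly prices, bill total and user count are all invented placeholders; plug in your own provider pricing and your real monthly bill.

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical fleet: node name -> (count, price per node-hour in USD).
COMPUTE_NODES = {
    'web':    (2, 0.085),
    'worker': (2, 0.085),
    'db':     (1, 0.34),
}

def monthly_compute_cost(nodes):
    """Fixed cost of running the compute fleet for one month."""
    return sum(count * hourly * HOURS_PER_MONTH
               for count, hourly in nodes.values())

def variable_cost_per_user(total_bill, compute_cost, active_users):
    """Storage/bandwidth/transaction cost spread across active users."""
    return (total_bill - compute_cost) / float(active_users)

compute = monthly_compute_cost(COMPUTE_NODES)
print('Fixed compute cost per month: $%.2f' % compute)

# e.g. a real bill of $520 in a month with 4,000 active users (made-up numbers):
print('Variable cost per active user: $%.4f'
      % variable_cost_per_user(520.0, compute, 4000))
```

The point of the exercise is how small the second number comes out relative to the first: the fixed compute fleet dominates, and the per-user variable cost is noise until you have real scale.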

You should still analyse your bill once a month to prevent surprises (like when we discovered we had 12 months of database snapshots, taken at 5-minute intervals, that no one was cleaning up) and to understand how the dynamics of your system are changing, but your key focus should be on your overall business model and your customer acquisition strategy.

Think of cloud computing like any other variable cost in your business: when you are small it is marginal (have you ever priced up electricity usage in your startup financials?), and if you get big it just becomes a cost of doing business. So don’t sweat it!

(Most of the above applies equally well to building a “scalable” system; NoSQL boosters should take note!)