This week, SpectraLogic announced DS3 and “BlackPearl”, an innovative product for tape storage using a cloud API. Although BlackPearl sounds like an Amazon Glacier clone, it’s really nothing of the sort. BlackPearl extends the S3 API for tape storage, but this “DS3” API requires well-behaved clients and disciplined access. BlackPearl is exciting, it’s novel, and it’s useful. But it’s not S3 or Glacier, despite what some initial coverage may say.
There’s a lot of great detail on DS3 and BlackPearl in this post by Ray Lucchesi.
Amazon designed S3 to be massive, flexible, and always online. Typical users keep multiple buckets “open” all the time, reading and writing to a bucket per application or app profile and always assuming they’ll be available. This is one of the core values of S3, and in fact I use it to host all the images you see on this very blog!
SpectraLogic’s S3 protocol extension, known as “DS3” (“Deep Simple Storage Service”), extends the basic S3 protocol with a bunch of commands intended to make it work with tape. In addition to the standard S3 CRUD operations (“create, read, update, delete”), DS3 includes commands to load and unload buckets (tape sets) and to put and get (write and read) bulk objects. This last bit is critical, since it’s important to keep tape drives streaming data.
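Why bulk commands matter is easier to see with a toy planner. This is a hypothetical sketch and does not use Spectra’s actual DS3 request format: it groups objects by bucket (tape set), sorts them by key, and splits each bucket’s objects into jobs capped at a byte budget, so each job can be handed to a drive as one sequential stream. `StorageObject`, `plan_bulk_jobs`, and `max_job_bytes` are illustrative names, not part of DS3.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class StorageObject:
    bucket: str  # in DS3 terms, a bucket maps to a set of tapes
    key: str
    size: int    # bytes


def plan_bulk_jobs(objects, max_job_bytes):
    """Group objects into per-bucket bulk jobs of at most max_job_bytes.

    Hypothetical planner: each job touches exactly one bucket, so it can be
    streamed to a single tape set in one pass. An object larger than
    max_job_bytes gets a job of its own rather than being split.
    """
    jobs = []
    ordered = sorted(objects, key=lambda o: (o.bucket, o.key))
    for bucket, group in groupby(ordered, key=lambda o: o.bucket):
        keys, job_bytes = [], 0
        for obj in group:
            # Close the current job if this object would push it over budget.
            if keys and job_bytes + obj.size > max_job_bytes:
                jobs.append((bucket, keys))
                keys, job_bytes = [], 0
            keys.append(obj.key)
            job_bytes += obj.size
        if keys:
            jobs.append((bucket, keys))
    return jobs


if __name__ == "__main__":
    objs = [
        StorageObject("logs", "b", 40),
        StorageObject("logs", "a", 70),
        StorageObject("video", "x", 200),
        StorageObject("logs", "c", 50),
    ]
    print(plan_bulk_jobs(objs, max_job_bytes=100))
    # [('logs', ['a']), ('logs', ['b', 'c']), ('video', ['x'])]
```

Sorting by key is only a stand-in for whatever on-tape ordering the appliance actually prefers; the point is that the client hands over large, single-bucket batches instead of a trickle of unrelated PUTs.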
Spectra also introduced an appliance instantiation of DS3. Known as “BlackPearl” (CamelCase, no space), this front-end for a Spectra library allows a DS3 (or even plain S3) client to read and write to tape.
BlackPearl has some flash storage inside, but this is a buffer rather than a cache or tier. Let me explain:
- A storage tier is the final landing spot for data. It can be permanent or actively managed.
- A cache is an alternate location for data that is stored elsewhere.
- A buffer is really ephemeral – it just serves to hold data momentarily while the real storage is unavailable.
In BlackPearl, the flash buffer allows data to be ordered for streaming to tape. It never holds data longer than necessary, and mainly serves to “coalesce” bulk writes. No writes are “committed” until the data is safely written to tape, and nothing is stored in the buffer after this, so this definitely isn’t a cache.
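The buffer-versus-cache distinction can be made concrete with a toy model. This is a hypothetical sketch, not BlackPearl’s firmware: `FlashBuffer`, `put`, `flush`, and the `write_to_tape` callback are invented names. Writes are staged, flushed to tape in sorted order, marked committed only after the tape write returns, and dropped from the buffer immediately, so nothing is left over to serve reads from.

```python
class FlashBuffer:
    """Toy write buffer: stages data, streams it to tape, then forgets it.

    It never answers reads, which is what separates it from a cache.
    """

    def __init__(self):
        self._staged = {}       # (bucket, key) -> bytes not yet on tape
        self.committed = set()  # (bucket, key) pairs confirmed on tape

    def put(self, bucket, key, data):
        # Staging is not committing: the write is not acknowledged yet.
        self._staged[(bucket, key)] = data

    def flush(self, write_to_tape):
        """Write staged objects to tape in (bucket, key) order."""
        for item in sorted(self._staged):  # sorted() copies the keys
            bucket, key = item
            write_to_tape(bucket, key, self._staged[item])
            # Commit only after the tape write returns, then drop the copy.
            self.committed.add(item)
            del self._staged[item]

    def staged_count(self):
        return len(self._staged)


if __name__ == "__main__":
    tape = []
    buf = FlashBuffer()
    buf.put("b2", "z", b"3")
    buf.put("b1", "y", b"2")
    buf.put("b1", "x", b"1")
    buf.flush(lambda bucket, key, data: tape.append((bucket, key, data)))
    print(tape)                # [('b1', 'x', b'1'), ('b1', 'y', b'2'), ('b2', 'z', b'3')]
    print(buf.staged_count())  # 0
```

Deleting each staged object right after its own commit (rather than clearing everything at the end) means a tape error partway through leaves only the uncommitted objects staged for a retry.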
As a new product, BlackPearl is necessarily limited. It can’t handle more than a few open buckets at once: each bucket maps to one or more tapes, and since the appliance can’t address more than 4 drives, the current maximum number of open buckets is 4. Nor can it handle too many active reads or writes at once, since they have to be buffered and written immediately. In fact, BlackPearl has a clever mechanism to keep unserviceable PUT and GET operations alive while waiting for tape loads: it just issues a 300 redirect to itself every 30 seconds or so!
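On the client side, the redirect trick only helps if the client treats a redirect back to the same URL as “still waiting” rather than as a loop to abort. A minimal sketch, assuming a hypothetical `fetch(url)` callable that returns a `Response`, and assuming the appliance holds each request for about 30 seconds before redirecting, so the client needn’t sleep on its own. The exact 3xx code is also an assumption, so the loop accepts any of them. Many stock HTTP clients cap redirects much lower than this; Python’s requests library, for example, gives up after 30 by default, which at 30 seconds apiece is only about 15 minutes.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def get_with_tape_wait(fetch, url, max_redirects=240):
    """GET url, treating 3xx redirects as "tape not ready yet, ask again".

    fetch is a hypothetical callable (url -> Response). If the appliance
    holds each request ~30 s before redirecting, 240 follow-ups is roughly
    two hours of patience, and no client-side sleep is needed.
    """
    for _ in range(max_redirects + 1):
        resp = fetch(url)
        if not 300 <= resp.status < 400:
            return resp  # the data (or a genuine error) is ready
        # A redirect to itself means "keep waiting"; follow any other target.
        url = resp.headers.get("Location", url)
    raise TimeoutError(f"{url} still redirecting after {max_redirects} follow-ups")
```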
Although BlackPearl implements DS3, and DS3 is a superset of S3, one shouldn’t just go out and start hitting Spectra’s new baby with S3 applications. They will probably function (after all, the S3 protocol “just works”), but not in a satisfactory manner. Nearly every S3 application I know of would choke on the limited availability of buckets and concurrent I/O operations, though most would likely manage to stay alive.
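One way a DS3-aware client could respect the open-bucket limit is to schedule its requests in waves, finishing all the work for up to four buckets before touching any others. This is a hypothetical sketch: `schedule_waves` and its `max_open_buckets` parameter are illustrative, and the default of 4 simply mirrors the current drive limit described above.

```python
def schedule_waves(requests, max_open_buckets=4):
    """Split (bucket, key) requests into waves of at most max_open_buckets buckets.

    All requests for a given bucket land in the same wave, so a client that
    finishes each wave before starting the next never asks for a fifth bucket
    while four are open. Buckets keep the order of their first request.
    """
    if max_open_buckets < 1:
        raise ValueError("max_open_buckets must be at least 1")
    by_bucket = {}  # dicts preserve insertion order (Python 3.7+)
    for bucket, key in requests:
        by_bucket.setdefault(bucket, []).append(key)
    buckets = list(by_bucket.items())
    return [
        dict(buckets[i:i + max_open_buckets])
        for i in range(0, len(buckets), max_open_buckets)
    ]


if __name__ == "__main__":
    reqs = [("a", "1"), ("b", "1"), ("c", "1"), ("a", "2"),
            ("d", "1"), ("e", "1"), ("b", "2")]
    print(schedule_waves(reqs, max_open_buckets=2))
    # [{'a': ['1', '2'], 'b': ['1', '2']}, {'c': ['1'], 'd': ['1']}, {'e': ['1']}]
```

The trade-off is latency: a request for bucket `e` waits until the whole first wave is done, which is exactly the behavior a typical always-online S3 application doesn’t expect.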
Stephen’s Stance
DS3 and BlackPearl are intended as an entirely new frontier for web application storage. It shouldn’t take much time or effort to develop applications with real DS3 integration, assuming the developer can get over the concept of buckets as tape sets. Used properly, BlackPearl will offer massive (sequential) performance, exabyte-level scalability, and eye-popping cost savings over disk.
I would love to see an open-source Glacier alternative built around BlackPearl. Heck, Amazon should just rewrite Glacier internally to use a Spectra library rather than the disks we presume they’re using today! This is exciting stuff!