I’m a big fan of “sparse bundle” disk images in Mac OS X. They allow me to create encrypted repositories for valuable data that can efficiently be rsync-ed between disks and don’t waste a lot of space. So I thought I’d write up a bit on what they are and how they can be used.
What’s a Disk Image?
I suppose I should start with a bit of background info on why I love sparse bundles so much. Here’s the low-down!
We’re all used to dealing with hard disk drives and thumb drives: They offer raw “block” storage that is formatted with a file system and used by the operating system, applications, and us users. Buy a 4 TB hard disk drive and you can format it and store (about) 4 TB of stuff on it.
Disk images are a little less familiar to average folks, but they work pretty much the same way. A disk image is a file on a disk that acts like a separate disk. It could be a virtual hard drive for a virtual machine, a copy of a DVD or Blu-Ray disc, or an archive for an application that wants to use an entire disk.
My favorite use of disk images is as a secure, encrypted drive for important data. Create a disk image with encryption and you can move it around from drive to drive or machine to machine without having to worry that someone else will get their hands on the content. Sure, you could encrypt your whole drive but this isn’t always desirable for removable media (portable hard disk drives and thumb drives) since you often want to have some “wide open” space, too.
When you create a disk image, you must specify the size of the virtual disk drive, and this space is typically consumed immediately regardless of how much data you actually write to it. So a 1 TB disk image will take up 1 TB of actual capacity on whatever drive you write it to. And the entire image is treated as a single huge file, so it’s not efficient to update a copy of it after you’ve changed something.
What’s a Sparse Bundle?
Happily, Mac OS X supports a “sparse bundle” disk image that solves these issues. Sparse bundles are thin provisioned, meaning they grow as you add data. And they consist of many (many!) “bands” of data, each stored in a separate file. So it’s very efficient to keep sparse bundle images in sync between media using a command like rsync, since only the bands that changed need to be copied.
Create a 1 TB sparse bundle disk image and it will take up only a few megabytes of physical space until you add some data to it. Then it will grow as you write to it, creating new few-MB files one after another to handle your data. The size of the bands is determined by the size of the total image, and might be 1, 2, 4, or 8 MB.
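To get a feel for what “many (many!)” means, here’s a rough back-of-the-envelope calculation in shell. The helper name is made up for illustration, and the 8 MB band size is simply the largest of the sizes mentioned above, not something read from a real image.

```shell
# max_bands IMAGE_BYTES BAND_BYTES
# Prints the most band files an image could need: size / band size, rounded up.
# (Hypothetical helper for illustration; not part of any Mac OS X tool.)
max_bands() {
  local image_bytes band_bytes
  image_bytes=$1
  band_bytes=$2
  echo $(( (image_bytes + band_bytes - 1) / band_bytes ))
}

one_tb=$(( 1024 * 1024 * 1024 * 1024 ))   # 1 TB, counted in binary units
eight_mb=$(( 8 * 1024 * 1024 ))           # one 8 MB band

max_bands "$one_tb" "$eight_mb"           # prints 131072
```

So a completely full 1 TB image with 8 MB bands would top out at roughly 131,000 band files, while a nearly empty one has only a handful.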
You can create a sparse bundle using Disk Utility in Mac OS X: create a new blank image, pick “sparse bundle disk image” as the image format, and choose an encryption option.
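If you’d rather skip the GUI, hdiutil can create the same kind of image from Terminal. Treat this as a sketch: the function name is my own, and the size, path, and volume name in the example are placeholders. hdiutil will ask you for a passphrase when it runs.

```shell
# create_bundle PATH SIZE VOLNAME
# Creates an encrypted, journaled HFS+ sparse bundle with hdiutil (Mac only).
# The function name is illustrative; hdiutil prompts for the passphrase.
create_bundle() {
  local path size volname
  path=$1
  size=$2
  volname=$3
  hdiutil create \
    -type SPARSEBUNDLE \
    -size "$size" \
    -fs HFS+J \
    -encryption AES-256 \
    -volname "$volname" \
    "$path"
}

# Example (run on a Mac):
#   create_bundle /Volumes/Winterfell/Theon.sparsebundle 1t Theon
```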
If you’re curious about the sparse bundle format, you can examine it from the command line. It is a “bundle” in Mac OS X parlance: a directory that Finder treats as a single file. Inside this directory, you’ll find a few reference files as well as a subdirectory called “bands” with the files of actual data. As you add data to the sparse bundle, it will create more bands to store it.
You can also examine a sparse bundle by right-clicking in Finder and selecting “Show Package Contents”. But in everyday use you will see the bundle as a single large file. Double-click on it in Finder and it will mount as a new drive that you can use just like any other.
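Here’s a small sketch of that kind of poking around. The count_bands helper is my own invention; it relies only on the “bands” subdirectory described above.

```shell
# count_bands BUNDLE
# Prints how many band files a sparse bundle currently contains.
# (Illustrative helper; it only looks inside the "bands" subdirectory.)
count_bands() {
  local bundle
  bundle=$1
  if [ ! -d "$bundle/bands" ]; then
    echo "not a sparse bundle: $bundle" >&2
    return 1
  fi
  # One line per band file; tr strips the padding some versions of wc add.
  find "$bundle/bands" -type f | wc -l | tr -d ' '
}

# Example (on a Mac):
#   ls /Volumes/Winterfell/Theon.sparsebundle       # reference files plus bands/
#   count_bands /Volumes/Winterfell/Theon.sparsebundle
#   du -sh /Volumes/Winterfell/Theon.sparsebundle   # space actually used on disk
```

Run count_bands before and after copying some files into the mounted image and you can watch the number climb.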
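Mounting works from Terminal too, which is handy in scripts. A minimal sketch, with a function name I made up; for an encrypted image you’ll be asked for the passphrase.

```shell
# mount_bundle BUNDLE
# Mounts a disk image with hdiutil (Mac only), much like double-clicking it.
# The function name is illustrative.
mount_bundle() {
  local bundle
  bundle=$1
  if [ ! -d "$bundle" ]; then
    echo "no such bundle: $bundle" >&2
    return 1
  fi
  hdiutil attach "$bundle"
}

# Example (on a Mac):
#   mount_bundle /Volumes/Winterfell/Theon.sparsebundle
```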
Efficiently Moving Sparse Bundle Disk Images
Let’s say you wanted to create a sparse bundle on a thumb drive to hold some important data. I recommend formatting smaller flash drives using the exFAT filesystem so they can be read on both Windows and Mac OS X machines. exFAT is better than FAT32 and Mac OS X is happy to write a sparse bundle there, but it’ll work on FAT32 or regular HFS+ too.
The quickest way to move data in Mac OS X is usually dragging-and-dropping in Finder. And since Finder treats a sparse bundle as a single file, you can easily drag and drop your newly-created bundle to another drive.
But things get more complicated once you’ve started using the drive. As you add files, you have to be careful to keep both the original and copy in perfect sync or you will corrupt the bundle and lose the data. That’s bad.
I like to use rsync to keep bundles in sync across drives or machines. It’s purpose-built to do this, included in Mac OS X by default, and works wonderfully with sparse bundle bands!
For example, let’s say “Theon.sparsebundle” was an encrypted sparse bundle on a thumb drive called “Winterfell” and you wanted to keep it in sync with another drive called “Pike”. Here’s the rsync command you would use:
rsync -hav --progress --delete /Volumes/Winterfell/Theon.sparsebundle /Volumes/Pike
Enter that on the command line in Terminal and rsync will examine each file and synchronize everything that changed. The “--delete” part is important since it will remove any band files in Pike’s copy that no longer exist on Winterfell: a disk image must have only one master; as soon as it is split it will become corrupted!
Let me reiterate that last point very clearly: Decide which image you will treat as the master and only ever write to that one! Do not try to modify the same image in multiple locations or you will lose track of which is which and you will lose data! In this example, you could use the Theon image on Winterfell for regular reading and writing and keep the copy on Pike as a backup in case you lose Winterfell, but don’t try using Theon on Pike, too, or he’ll get very confused!
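Since “--delete” is the scary part, it can be worth previewing a sync before running it for real. This sketch adds rsync’s “--dry-run” option to the same set of flags; the function name is mine, and the paths in the example are the ones from above.

```shell
# preview_sync SRC_BUNDLE DEST_DIR
# Same options as the real sync, plus --dry-run so nothing is copied or deleted.
# (Illustrative wrapper; rsync just lists what it would do.)
preview_sync() {
  local src dest
  # Drop any trailing slash so rsync copies the bundle directory itself
  # into DEST_DIR rather than spilling its contents there.
  src=${1%/}
  dest=$2
  rsync -hav --delete --dry-run "$src" "$dest"
}

# Example (on a Mac):
#   preview_sync /Volumes/Winterfell/Theon.sparsebundle /Volumes/Pike
```

If the list of files to be deleted looks far bigger than you expected, stop and check that you have the master and the copy the right way around.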
I like to create a simple shell script to store my rsync operation so it’s executed the same way every time. Just write the rsync command to a file ending in “.sh” and “chmod +x” the file to make it executable.
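Here’s roughly what such a script might look like. It’s a sketch rather than my exact script: the function name is invented, and the safety checks (including refusing to sync while the image is mounted) are extra precautions I’m assuming you’d want, not something rsync requires.

```shell
#!/bin/bash
# sync-bundle.sh: copy the master sparse bundle to a backup drive with rsync.
# The function name and the example paths are placeholders.

# sync_bundle SRC_BUNDLE DEST_DIR [MOUNT_POINT]
sync_bundle() {
  local src dest mount_point
  src=${1%/}            # no trailing slash: copy the bundle directory itself
  dest=$2
  mount_point=${3:-}
  if [ ! -d "$src/bands" ]; then
    echo "master bundle not found: $src" >&2
    return 1
  fi
  if [ ! -d "$dest" ]; then
    echo "destination not found: $dest" >&2
    return 1
  fi
  # Assumption: a mounted image may still be writing bands, so refuse to sync.
  if [ -n "$mount_point" ] && [ -d "$mount_point" ]; then
    echo "eject $mount_point before syncing" >&2
    return 1
  fi
  rsync -hav --progress --delete "$src" "$dest"
}

# Example last line for the script (on a Mac):
#   sync_bundle /Volumes/Winterfell/Theon.sparsebundle /Volumes/Pike /Volumes/Theon
```

Save it as something like sync-theon.sh, uncomment the last line with your own paths, and “chmod +x” it.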
Taking Out the Trash: Compress Your Sparse Bundle
One aspect of sparse bundles that is not handled well is trash collection. Just like a normal Mac disk drive, deleted files will be stored in the “trash can” rather than deleted for good. But when you empty the trash, the sparse bundle won’t get any smaller!
Although sparse bundles are thin provisioned, they don’t have any built-in un-provisioning mechanism. Once a band is created, it continues to exist. It can be re-used by new data, but your “sparse” bundle will eventually grow to the total size you created at the start, and this can be problematic.
Thankfully, there is a command-line way to “compact” a sparse bundle! The Mac OS X command “hdiutil” can do lots of great stuff with disk images, including mounting, un-mounting, and compacting them. It can even create them if you’re so inclined.
First, delete what you don’t want anymore and empty the trash. This tells the file system on the sparse bundle what is and is not being used.
Next, un-mount the sparse bundle but don’t pull out the drive:
hdiutil eject /Volumes/Theon
Now you’re ready to reclaim space and remove any un-used bands:
hdiutil compact /Volumes/Winterfell/Theon.sparsebundle
And we’re good! The hdiutil command will remove any un-needed bands and return the sparse bundle to the minimum size possible. Then you can re-mount it and use it as normal. Just compact it again once you’ve deleted a bunch of data and feel it’s gotten too big.
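If you compact often, the two steps above can be wrapped up in one function that also reports how much space came back. This is a sketch with an invented function name; it uses only the “hdiutil eject” and “hdiutil compact” commands shown above plus du.

```shell
# compact_bundle BUNDLE MOUNT_POINT
# Ejects the image if it is mounted, compacts it, and reports the size change.
# (Illustrative helper; hdiutil is Mac only.)
compact_bundle() {
  local bundle mount_point before after
  bundle=$1
  mount_point=$2
  # The mount point only exists while the image is mounted.
  if [ -d "$mount_point" ]; then
    hdiutil eject "$mount_point" || return 1
  fi
  before=$(du -sk "$bundle" | awk '{print $1}')
  hdiutil compact "$bundle" || return 1
  after=$(du -sk "$bundle" | awk '{print $1}')
  echo "before: ${before} KB, after: ${after} KB"
}

# Example (on a Mac):
#   compact_bundle /Volumes/Winterfell/Theon.sparsebundle /Volumes/Theon
```

Remember to run your rsync afterwards so the copy on the other drive shrinks too.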
If you’re careful about keeping them in sync, sparse bundle disk images can be a wonderfully useful tool in Mac OS X. You can have a secure repository for sensitive data on thumb drives or even in the cloud! But make sure to treat only one as the “master” and use rsync carefully to avoid losing data.