ZFS should have been great, but I kind of hate it: ZFS seems trapped in the past, before it was sidelined as the cool storage project of choice; it’s inflexible; it lacks modern flash integration; and it’s not directly supported by most operating systems. But I put all my valuable data on ZFS because it simply offers the best level of data protection in a small office/home office (SOHO) environment. Here’s why.
The ZFS Revolution, Circa 2006
In my posts on FreeNAS, I emphatically state that “ZFS is the best filesystem”, but if you follow me on social media, it’s clear that I don’t really love it. I figured this needs some explanation and context, so at the risk of agitating the ZFS fanatics, let’s do it.
When ZFS first appeared in 2005, it was absolutely with the times, but it’s remained stuck there ever since. The ZFS engineers did a lot right when they combined the best features of a volume manager with a “zettabyte-scale” filesystem in Solaris 10:
- ZFS achieves the kind of scalability every modern filesystem should have, with few limits in terms of data or metadata count and volume or file size.
- ZFS includes checksumming of all data and metadata to detect corruption, an absolutely essential feature for long-term large-scale storage.
- When ZFS detects an error, it can automatically reconstruct data from mirrors, parity, or alternate locations.
- Mirroring and multiple-parity “RAID Z” are built in, combining multiple physical media devices seamlessly into a logical volume.
- ZFS includes robust snapshot and replication capabilities, including the ability to update the data on other volumes incrementally.
- Data can be compressed on the fly and deduplication is supported as well.
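The features above are all driven from two commands, `zpool` and `zfs`. As a rough sketch (the pool name and device paths here are illustrative assumptions, not from the original post), creating a checksummed, compressed, self-healing mirror looks like this:

```shell
# Sketch only: assumes a ZFS-capable OS and the device names shown.
# Create a mirrored pool with lz4 compression enabled.
zpool create tank mirror /dev/sda /dev/sdb
zfs set compression=lz4 tank

# Checksums are verified on every read. A scrub walks the entire pool,
# verifying every block and repairing any corruption from the mirror copy.
zpool scrub tank
zpool status -v tank
```

The point is that redundancy, checksumming, and repair are one integrated layer, not separate volume-manager and filesystem tools bolted together.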
When ZFS appeared, it was a revolution compared to older volume managers and filesystems. And Sun open-sourced most of ZFS, allowing it to be ported to other operating systems. The darling of the industry, ZFS quickly appeared on Linux and FreeBSD, and Apple even began work to incorporate it as the next-generation filesystem for Mac OS X! The future seemed bright indeed!
Checksums for user data are essential or you will lose data: Why Big Disk Drives Require Data Integrity Checking and The Prime Directive of Storage: Do Not Lose Data
2007 to 2010: ZFS is Derailed
But something terrible happened to ZFS on the way to its coronation: Lawsuits, licensing issues, and FUD.
The skies first darkened in 2007, as NetApp sued Sun, claiming that their WAFL patents were infringed by ZFS. Sun counter-sued later that year, and the legal issues dragged on. Although ZFS definitely did not copy code from NetApp, the copy-on-write approach to snapshots was similar to WAFL, and those of us in the industry grew concerned that the NetApp suit could impact the future availability of open-source ZFS. And this appears to have been concerning enough to Apple that they dropped ZFS support from Mac OS X 10.6 “Snow Leopard” just before it was released.
Here’s a great blog about ZFS and Apple from Adam Leventhal, who worked on it: ZFS: Apple’s New Filesystem That Wasn’t
By then, Sun was hitting hard times and Oracle swooped in to purchase the company. This sowed further doubt about the future of ZFS, since Oracle did not enjoy wide support from open source advocates. And the CDDL license Sun applied to the ZFS code was judged incompatible with the GPLv2 that covers Linux, making it a non-starter for inclusion in the world’s dominant server operating system.
Although OpenSolaris continued after the Oracle acquisition, and FreeBSD embraced ZFS, this was pretty much the extent of its impact outside the enterprise. Sure, NexentaStor and GreenBytes helped push ZFS forward in the enterprise, but Oracle’s lackluster commitment to Sun in the datacenter started having an impact.
What’s Wrong With ZFS Today
OpenZFS remains little-changed from what we had a decade ago.
Many remain skeptical of deduplication, which hogs expensive RAM in the best-case scenario. And I do mean expensive: Pretty much every ZFS FAQ flatly declares that ECC RAM is a must-have and 8 GB is the bare minimum. In my own experience with FreeNAS, 32 GB is a nice amount for an active small ZFS server, and this costs $200-$300 even at today’s prices.
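Before paying that RAM tax, it is at least possible to estimate what deduplication would buy you. A sketch, assuming a pool named `tank` and a dataset name I made up for illustration:

```shell
# Simulate deduplication on an existing pool without changing anything.
# zdb -S reads the pool and prints a block histogram plus an estimated
# dedup ratio. Each unique block costs roughly 320 bytes of dedup-table
# RAM, which is why the FAQs demand so much memory.
zdb -S tank

# If (and only if) the simulated ratio justifies the RAM cost, enable
# dedup per-dataset rather than pool-wide:
zfs set dedup=on tank/vmimages
```

In practice the simulated ratio is usually underwhelming for general SOHO data, which is why compression, not dedup, is the feature most people actually turn on.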
And ZFS never really adapted to today’s world of widely-available flash storage: Although flash can be used to support the ZIL and L2ARC caches, these are of dubious value in a system with sufficient RAM, and ZFS has no true hybrid storage capability. It’s laughable that the ZFS documentation obsesses over a few GB of SLC flash when multi-TB 3D NAND drives are on the market. And no one is talking about NVMe even though it’s everywhere in performance PCs.
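For what it’s worth, wiring flash into an existing pool is easy; it’s just that the result is a cache, not real tiering. A sketch, with assumed device names:

```shell
# Sketch: attach flash devices to an existing pool (names are assumptions).
# A small, fast SSD partition as a SLOG accelerates synchronous writes
# only -- async writes and reads see no benefit:
zpool add tank log /dev/nvme0n1p1

# A larger SSD as L2ARC extends the read cache beyond RAM, though its
# index itself consumes RAM -- hence the dubious value on a
# well-provisioned system:
zpool add tank cache /dev/nvme0n1p2
```

Neither device holds the authoritative copy of any data, which is exactly the difference between caching and the hybrid/tiered storage ZFS lacks.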
Then there’s the question of flexibility, or lack thereof. Once you build a ZFS volume, it’s pretty much fixed for life. There are only three ways to expand a storage pool:
- Replace each and every drive in the pool with a larger one (which is great but limiting and expensive)
- Add a stripe on another set of drives (which can lead to imbalanced performance and redundancy and a whole world of potential stupid stuff)
- Build a new pool and “zfs send” your datasets to it (which is what I do, even though it’s kind of tricky)
Apart from option 3 above, you can’t shrink a ZFS pool. Worse, you can’t change the data protection type without rebuilding the pool, and this includes adding a second or third parity drive. The FreeNAS faithful spend an inordinate amount of time trying to talk new users out of using RAID-Z1¹ and moaning when they choose to use it anyway.
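Option 3, the one I use, boils down to recursive snapshots and replication. A sketch, assuming an old pool named `tank` and a new one named `tank2`:

```shell
# Sketch of the "build a new pool and zfs send" expansion path.
# Snapshot every dataset recursively, then replicate the whole
# hierarchy (properties, children, and snapshots) to the new pool:
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -F tank2

# After quiescing writes, a final incremental pass catches anything
# that changed during the first copy; then the old pool can go:
zfs snapshot -r tank@final
zfs send -R -i tank@migrate tank@final | zfs receive -F tank2
```

The tricky parts are the cutover (mountpoints, share configs, and the final incremental window), not the commands themselves.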
These may sound like little, niggling concerns but they combine to make ZFS feel like something from the dark ages after using Drobo, Synology, or today’s cloud storage systems. With ZFS, it’s “buy some disks and a lot of RAM, build a RAID set, and never touch it again”, which is not exactly in line with how storage is used these days.²
Where Are the Options?
I’ve probably made ZFS sound pretty unappealing right about now. It was revolutionary but now it’s startlingly limiting and out of touch with the present solid-state-dominated storage world. So what are your other choices?
Linux has a few decent volume managers and filesystems, and most folks use a combination of LVM or MD and ext4. Btrfs really got storage nerds excited, appearing to be a ZFS-like combination of volume manager and filesystem with added flexibility, picking up where ReiserFS flopped. And Btrfs might just become “the ZFS of Linux” but development has faltered lately, with a scary data loss bug derailing RAID 5 and 6 last year and not much heard since. Still, I suspect that I’ll be recommending Btrfs for Linux users five years from now, especially with strong potential in containerized systems.³
On the Windows side, Microsoft is busy rolling out their own next-generation filesystem. ReFS uses B+ trees (similar to Btrfs), scales like crazy, and has built-in resilience and data protection features⁴. When combined with Storage Spaces, Microsoft has a viable next-generation storage layer for Windows Server that can even use SSD and 3D XPoint as a tier or cache.
Then there’s Apple, which reportedly rebooted their next-generation storage layer a few times before coming up with APFS, launched this year in macOS High Sierra. APFS looks a lot like Btrfs and ReFS, though implemented completely differently with more of a client focus. Although lacking in a few areas (user data is not checksummed and compression is not supported), APFS is the filesystem iOS and macOS need. And APFS is the final nail in the coffin for the “ZFS on Mac OS X” crowd.
Each major operating system now has a next-generation filesystem (and volume manager): Linux has Btrfs, Windows has ReFS and Storage Spaces, and macOS has APFS. FreeBSD seems content with ZFS, but that’s a small corner of the datacenter. And every enterprise system has already moved way past what ZFS can do, including enterprise-class offerings based on ZFS from Sun, Nexenta, and iXsystems.
Still, ZFS is way better than legacy SOHO filesystems. The lack of integrity checking, redundancy, and error recovery makes NTFS (Windows), HFS+ (macOS), and ext3/4 (Linux) wholly inappropriate for use as a long-term storage platform. And even ReFS and APFS, lacking data integrity checking, aren’t appropriate where data loss cannot be tolerated.
Stephen’s Stance: Use ZFS (For Now)
Sad as it makes me, as of 2017, ZFS is the best filesystem for long-term, large-scale data storage. Although it can be a pain to use (except in FreeBSD, Solaris, and purpose-built appliances), the robust and proven ZFS filesystem is the only trustworthy place for data outside enterprise storage systems. After all, reliably storing data is the only thing a storage system really has to do. All my important data goes on ZFS, from photos to music and movies to office files. It’s going to be a long time before I trust anything other than ZFS!
1. RAID-Z2 and RAID-Z3, with more redundancy, are preferred for today’s large disks to avoid data loss during a rebuild.
2. Strangely, although multiple pools and removable drives work perfectly well with ZFS, almost no one talks about using it that way. It’s always a single pool named “tank” that includes every drive in the system.
3. One thing really lacking in Btrfs is support for flash, and especially hybrid storage. But I’d rather that they got RAID-6 right first.
4. Though data checksums are still turned off by default in ReFS.