Stephen Foskett, Pack Rat

Understanding the accumulation of data

Bizarre HFS+ Tricks in Mac OS X 10.6 Snow Leopard

September 11, 2009 By Stephen

I don’t usually excerpt large amounts of text from other blogs. But this is just too cool. UNIX nerds and Mac OS X weenies alike will either shake their heads and jump out a window or laugh out loud at one of the under-reported changes in Snow Leopard.

See, Snow Leopard’s version of HFS+ allows per-file compression using three very creative filesystem hacks. I’ll let John Siracusa from Ars Technica take the story from here, and I urge you to read John’s complete (and very, very long) Snow Leopard review!

In Snow Leopard, other kinds of files climb on board the compression bandwagon. To give just one example, ninety-seven percent of the executable files in Snow Leopard are compressed. How compressed? Let’s look:

% cd Applications/Mail.app/Contents/MacOS

% ls -l Mail

-rwxr-xr-x@ 1 root wheel 0 Jun 18 19:35 Mail

Boy, that’s, uh, pretty small, huh? Is this really an executable or what? Let’s check our assumptions.

% file Applications/Mail.app/Contents/MacOS/Mail

Applications/Mail.app/Contents/MacOS/Mail: empty

Yikes! What’s going on here? Well, what I didn’t tell you is that the commands shown above were run from a Leopard system looking at a Snow Leopard disk. In fact, all compressed Snow Leopard files appear to contain zero bytes when viewed from a pre-Snow Leopard version of Mac OS X. (They look and act perfectly normal when booted into Snow Leopard, of course.)

So, where’s the data? The little “@” at the end of the permissions string in the ls output above (a feature introduced in Leopard) provides a clue. Though the Mail executable has a zero file size, it does have some extended attributes:

% xattr -l Applications/Mail.app/Contents/MacOS/Mail

com.apple.ResourceFork:

0000 00 00 01 00 00 2C F5 F2 00 2C F4 F2 00 00 00 32 .....,...,.....2

0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

(184,159 lines snipped)

2CF610 63 6D 70 66 00 00 00 0A 00 01 FF FF 00 00 00 00 cmpf............

2CF620 00 00 00 00 ....

com.apple.decmpfs:

0000 66 70 6D 63 04 00 00 00 A0 82 72 00 00 00 00 00 fpmc......r.....

Ah, there’s all the data. But wait, it’s in the resource fork? Weren’t those deprecated about eight years ago? Indeed they were. What you’re witnessing here is yet another addition to Apple’s favorite file system hobbyhorse, HFS+.

At the dawn of Mac OS X, Apple added journaling, symbolic links, and hard links. In Tiger, extended attributes and access control lists were incorporated. In Leopard, HFS+ gained support for hard links to directories. In Snow Leopard, HFS+ learns another new trick: per-file compression.

The presence of the com.apple.decmpfs attribute is the first hint that this file is compressed. This attribute is actually hidden from the xattr command when booted into Snow Leopard. But from a Leopard system, which has no knowledge of its special significance, it shows up as plain as day.

Even more information is revealed with the help of Mac OS X Internals guru Amit Singh’s hfsdebug program, which has quietly been updated for Snow Leopard.

% hfsdebug /Applications/Mail.app/Contents/MacOS/Mail

…

compression magic = cmpf

compression type = 4 (resource fork has compressed data)

uncompressed size = 7500336 bytes

And sure enough, as we saw, the resource fork does indeed contain the compressed data. Still, why the resource fork? It’s all part of Apple’s usual, clever backward-compatibility gymnastics. A recent example is the way that hard links to directories show up–and function–as aliases when viewed from a pre-Leopard version of Mac OS X.

In the case of HFS+ compression, Apple was (understandably) unable to make pre-Snow Leopard systems read and interpret the compressed data, which is stored in ways that did not exist at the time those earlier operating systems were written. But rather than letting applications (and users) running on pre-10.6 systems choke on–or worse, corrupt through modification–the unexpectedly compressed file contents, Apple has chosen to hide the compressed data instead.

And where can the complete contents of a potentially large file be hidden in such a way that pre-Snow Leopard systems can still copy that file without the loss of data? Why, in the resource fork, of course. The Finder has always correctly preserved Mac-specific metadata and both the resource and data forks when moving or duplicating files. In Leopard, even the lowly cp and rsync commands will do the same. So while it may be a little bit spooky to see all those “empty” 0 KB files when looking at a Snow Leopard disk from a pre-Snow Leopard OS, the chance of data loss is small, even if you move or copy one of the files.

The resource fork isn’t the only place where Apple has decided to smuggle compressed data. For smaller files, hfsdebug shows the following:

% hfsdebug /etc/asl.conf

…

compression magic = cmpf

compression type = 3 (xattr has compressed data)

uncompressed size = 860 bytes

Here, the data is small enough to be stored entirely within an extended attribute, albeit in compressed form. And then, the final frontier:

% hfsdebug "/Volumes/Snow Time/Applications/Mail.app/Contents/PkgInfo"

…

compression magic = cmpf

compression type = 3 (xattr has inline data)

uncompressed size = 8 bytes

That’s right, an entire file’s contents stored uncompressed in an extended attribute. In the case of a standard PkgInfo file like this one, those contents are the four-byte classic Mac OS type and creator codes.

% xattr -l Applications/Mail.app/Contents/PkgInfo

com.apple.decmpfs:

0000 66 70 6D 63 03 00 00 00 08 00 00 00 00 00 00 00 fpmc............

0010 FF 41 50 50 4C 65 6D 61 6C .APPLemal

There’s still the same “fpmc…” preamble seen in all the earlier examples of the com.apple.decmpfs attribute, but at the end of the value, the expected data appears as plain as day: type code “APPL” (application) and creator code “emal” (for the Mail application–cute, as per classic Mac OS tradition).
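To make that dump a little more concrete, here is a minimal Python sketch that decodes the 25 bytes shown above. The field layout is an assumption inferred from the dumps and the hfsdebug output (a 4-byte magic, a 4-byte little-endian type, an 8-byte little-endian uncompressed size, then the payload), and treating a leading 0xFF payload byte as an "uncompressed data follows" marker is likewise read off this one example. The function name parse_decmpfs is made up for illustration.

```python
import struct

# Bytes copied from the PkgInfo xattr dump above (16-byte header + 9-byte payload).
RAW = bytes.fromhex(
    "66 70 6D 63 03 00 00 00 08 00 00 00 00 00 00 00 "
    "FF 41 50 50 4C 65 6D 61 6C"
)


def parse_decmpfs(value: bytes) -> dict:
    """Decode a com.apple.decmpfs value under the assumed layout."""
    # "<4sIQ": 4 raw bytes, uint32, uint64, little-endian, no padding (16 bytes).
    magic, ctype, size = struct.unpack_from("<4sIQ", value, 0)
    payload = value[16:]
    # Assumption: a first payload byte of 0xFF means the rest is stored as-is.
    inline = payload[:1] == b"\xff"
    return {
        "magic": magic[::-1].decode("ascii"),  # on disk "fpmc", reported as "cmpf"
        "type": ctype,
        "uncompressed_size": size,
        "inline": inline,
        "data": payload[1:] if inline else payload,
    }


info = parse_decmpfs(RAW)
print(info)
```

For this input the sketch reports magic "cmpf", type 3, an uncompressed size of 8, and the data b"APPLemal", which lines up with what hfsdebug printed for the same file.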

You may be wondering, if this is all about data compression, how does storing eight uncompressed bytes plus a 17-byte preamble in an extended attribute save any disk space? The answer to that lies in how HFS+ allocates disk space. When storing information in a data or resource fork, HFS+ allocates space in multiples of the file system’s allocation block size (4 KB, by default). So those eight bytes will take up a minimum of 4,096 bytes if stored in the traditional way. When allocating disk space for extended attributes, however, the allocation block size is not a factor; the data is packed in much more tightly. In the end, the actual space saved by storing those 25 bytes of data in an extended attribute is over 4,000 bytes.
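The arithmetic in that paragraph is easy to check. The sketch below rounds a fork's size up to whole allocation blocks and compares that with the 25 bytes held in the extended attribute. It assumes the 4 KB default block size mentioned above and ignores any per-record overhead inside the Attributes File, so treat the result as a rough figure rather than an exact accounting. The helper name allocated_bytes is hypothetical.

```python
def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Space a fork of `size` bytes occupies when rounded up to whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


payload = 8    # the type and creator codes
preamble = 17  # 16-byte header plus the one marker byte

fork_cost = allocated_bytes(payload)  # stored the traditional way
xattr_cost = preamble + payload       # stored in the extended attribute
saved = fork_cost - xattr_cost

print(fork_cost, xattr_cost, saved)
```

With these inputs the fork costs 4,096 bytes, the attribute 25, and the difference is 4,071 bytes, which is the "over 4,000 bytes" quoted above.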

But compression isn’t just about saving disk space. It’s also a classic example of trading CPU cycles for decreased I/O latency and bandwidth. Over the past few decades, CPU performance has gotten better (and computing resources more plentiful–more on that later) at a much faster rate than disk performance has increased. Modern hard disk seek times and rotational delays are still measured in milliseconds. In one millisecond, a 2 GHz CPU goes through two million cycles. And then, of course, there’s still the actual data transfer time to consider.
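As a quick sanity check on that figure, the following sketch converts a delay into CPU cycles. Only the 2 GHz clock and the one-millisecond delay come from the text; the function name cycles_elapsed is invented for illustration.

```python
def cycles_elapsed(delay_ms: float, clock_hz: int) -> int:
    """Number of clock cycles that tick by during a delay given in milliseconds."""
    return round(delay_ms * clock_hz / 1000)


CLOCK_HZ = 2_000_000_000  # the 2 GHz CPU from the text

one_ms = cycles_elapsed(1, CLOCK_HZ)
print(f"{one_ms:,} cycles in one millisecond")
```

Every additional millisecond the disk spends seeking or rotating costs another two million cycles the CPU could have spent decompressing.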

Granted, several levels of caching throughout the OS and hardware work mightily to hide these delays. But those bits have to come off the disk at some point to fill those caches. Compression means that fewer bits have to be transferred. Given the almost comical glut of CPU resources on a modern multi-core Mac under normal use, the total time needed to transfer a compressed payload from the disk and use the CPU to decompress its contents into memory will still usually be far less than the time it’d take to transfer the data in uncompressed form.
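A toy model shows the shape of that trade-off. The two sizes below are taken from the Mail example earlier in the excerpt: 7,500,336 bytes uncompressed, and roughly 2.9 MB of resource fork (read off the final offset in the dump, 2CF620 plus four bytes). The disk and decompression throughputs are purely hypothetical placeholders, not measurements, and the model ignores seeks, caching, and any overlap between I/O and CPU work.

```python
def read_time(uncompressed_bytes, on_disk_bytes, disk_bps, decompress_bps=None):
    """Seconds to get a file into memory: transfer time plus optional decompression."""
    transfer = on_disk_bytes / disk_bps
    cpu = uncompressed_bytes / decompress_bps if decompress_bps else 0.0
    return transfer + cpu


UNCOMPRESSED = 7_500_336  # from the hfsdebug output for Mail
ON_DISK = 0x2CF624        # approximate resource fork size from the xattr dump

DISK_BPS = 80_000_000         # hypothetical: 80 MB/s sustained disk transfer
DECOMPRESS_BPS = 300_000_000  # hypothetical: 300 MB/s decompression output

plain_read = read_time(UNCOMPRESSED, UNCOMPRESSED, DISK_BPS)
compressed_read = read_time(UNCOMPRESSED, ON_DISK, DISK_BPS, DECOMPRESS_BPS)

print(f"uncompressed: {plain_read * 1000:.1f} ms")
print(f"compressed:   {compressed_read * 1000:.1f} ms")
```

Under these made-up throughputs the compressed read comes out ahead; with a slow enough CPU or a fast enough disk the comparison would flip, which is why the text says "usually."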

That explains the potential performance benefits of transferring less data, but the use of extended attributes to store file contents can actually make things faster, as well. It all has to do with data locality.

If there’s one thing that slows down a hard disk more than transferring a large amount of data, it’s moving its heads from one part of the disk to another. Every move means time for the head to start moving, then stop, then ensure that it’s correctly positioned over the desired location, then wait for the spinning disk to put the desired bits beneath it. These are all real, physical, moving parts, and it’s amazing that they do their dance as quickly and efficiently as they do, but physics has its limits. These motions are the real performance killers for rotational storage like hard disks.
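The "wait for the spinning disk" part can be put into numbers. On average the desired sector is half a revolution away, so the mean rotational delay is half the time of one rotation. The sketch below computes that; the spindle speeds are just common examples chosen for illustration, not figures from the review, and seek time would come on top.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (5400, 7200):  # example spindle speeds
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Even before the head has moved at all, a typical drive spends several milliseconds per access just waiting on rotation, which is why avoiding an extra trip across the disk matters so much.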

The HFS+ volume format stores all its information about files–metadata–in two primary locations on disk: the Catalog File, which stores file dates, permissions, ownership, and a host of other things, and the Attributes File, which stores “named forks.”

Extended attributes in HFS+ are implemented as named forks in the Attributes File. But unlike resource forks, which can be very large (up to the maximum file size supported by the file system), extended attributes in HFS+ are stored “inline” in the Attributes File. In practice, this means a limit of about 128 bytes per attribute. But it also means that the disk head doesn’t need to take a trip to another part of the disk to get the actual data.

As you can imagine, the disk blocks that make up the Catalog and Attributes files are frequently accessed, and therefore more likely than most to be in a cache somewhere. All of this conspires to make the complete storage of a file, including both its metadata and its data, within the B-tree-structured Catalog and Attributes files an overall performance win. Even an eight-byte payload that balloons to 25 bytes is not a concern, as long as it’s still less than the allocation block size for normal data storage, and as long as it all fits within a B-tree node in the Attributes File that the OS has to read in its entirety anyway.

There are other significant contributions to Snow Leopard’s reduced disk footprint (e.g., the removal of unnecessary localizations and “designable.nib” files) but HFS+ compression is by far the most technically interesting.

via Mac OS X 10.6 Snow Leopard: the Ars Technica review – Ars Technica.

Filed Under: Apple, Computer History Tagged With: Ars Technica, compression, filesystem, HFS, John Siracusa, Mac OS X, Snow Leopard
