• Skip to main content
  • Skip to primary sidebar
  • Skip to footer
  • Home
  • About
    • Stephen Foskett
      • My Publications
        • Urban Forms in Suburbia: The Rise of the Edge City
      • Storage Magazine Columns
      • Whitepapers
      • Multimedia
      • Speaking Engagements
    • Services
    • Disclosures
  • Categories
    • Apple
    • Ask a Pack Rat
    • Computer History
    • Deals
    • Enterprise storage
    • Events
    • Personal
    • Photography
    • Terabyte home
    • Virtual Storage
  • Guides
    • The iPhone Exchange ActiveSync Guide
      • The iPhone Exchange ActiveSync Troubleshooting Guide
    • The iPad Exchange ActiveSync Guide
      • iPad Exchange ActiveSync Troubleshooting Guide
    • Toolbox
      • Power Over Ethernet Calculator
      • EMC Symmetrix WWN Calculator
      • EMC Symmetrix TimeFinder DOS Batch File
    • Linux Logical Volume Manager Walkthrough
  • Calendar

Stephen Foskett, Pack Rat

Understanding the accumulation of data

You are here: Home / Everything / Enterprise storage / Microsoft Adds Data Deduplication to NTFS in Windows 8

Microsoft Adds Data Deduplication to NTFS in Windows 8

January 3, 2012 By Stephen 16 Comments

Windows 8 server editions will include a filter driver for NTFS for data deduplication

The next version of Microsoft Windows Server includes integrated data deduplication technology. Microsoft is positioning this as a boon for server virtualization and claims it has very little performance impact. But how exactly does Microsoft’s de-duplication technology work?

Introducing Windows 8 Deduplication

Let’s make one thing clear right from the start: Microsoft started from a clean sheet and invented their own deduplication technology. This is not a licensed, cloned, or copied feature as far as I can tell. There are some clever aspects to it, along with a few head scratchers for folks like me who’ve seen lots of different deduplication approaches.

Microsoft’s deduplication is layered onto NTFS in Windows 8, and will be a feature add-on for Server users. It is implemented as a filter driver on a per volume basis, with each volume a complete, self describing unit. It is cluster aware, and fully crash consistent on all operations. This is a pretty neat trick: As is typical for Microsoft, deduplication will be a simple, transparent feature.

Now let’s talk for a moment about what Windows 8 deduplication is not.

  • It is a server-only feature, like so many of Microsoft’s storage developments. But perhaps we might see it deployed in low-end or home servers in the future.
  • It is not supported on boot or system volumes.
  • Although it should work just fine on removable drives, deduplication requires NTFS so you can forget about FAT or exFAT. And of course the connected system must be running a server edition of Windows 8.
  • Although deduplication does not work with clustered shared volumes, it is supported in Hyper-V configurations that do not use CSV.
  • Finally, deduplication does not function on encrypted files, files with extended attributes, tiny (less than 64 kB) files, or re-parse points.

Some Technical Details on Deduplication in Windows 8

Microsoft Research spent 2 years experimenting with algorithms to find the “cheapest” in terms of overhead. They select a chunk size  for each data set. This is typically between 32 KB and 128 KB, but smaller chunks can be created as well. Microsoft claims that most real-world use cases are about 80 KB. The system processes all the data looking for “fingerprints” of split points and selects the “best” on the fly for each file.

After data is de-duplicated, Microsoft compresses the chunks and stores them in a special “chunk store” within NTFS. This is actually  part of the System Volume store in the root of the volume, so dedupe is volume-level. The entire setup is self describing, so a deduplication NTFS volume can be read by another server without any external data.

There is some redundancy in the system as well. Any chunk that is referenced more than x times (100 by default) will be kept in a second location. All data in the filesystem is checksummed and will be proactively repaired. The same is done for the metadata. The deduplication service includes a scrubbing job as well as a file system optimization task to keep everything running smoothly.

Windows 8 deduplication cooperates with other elements of the operating system. The Windows caching layer is dedupe-aware, and this will greatly accelerate overall performance. Windows 8 also includes a new “express” library that makes compression “20 times faster”. Compressed files are not re-compressed based on filetype, so zip files, Office 2007+ files, etc will be skipped and just deduped.

New writes are not deduped – this is a post-process technology. The data deduplication service can be scheduled or can run in “background mode” and wait for idle time. Therefore, I/O impact is between “none and 2x” depending on type. Opening a file is less than 3% greater I/O and can be faster if it’s cached. Copying a large file can make some difference (e.g. 10 GB VHD) since it adds additional disk seeks, but multiple concurrent copies that share data can actually improve performance.

Stephen’s Stance

Although I am intrigued by Microsoft’s new deduplication technology in Windows 8 server, I still have many questions about its usefulness and impact on performance. Concentrating duplicate data in the system volume makes sense from a technical perspective, but could lead to an I/O hotspot on the disk. This is especially true for external caching storage systems, since there is no integration between Microsoft deduplication and storage array features. I am particularly concerned about the use of deduplication with VHD files in Hyper-V, since it could eat up valuable system RAM and impact I/O performance.

If you would like to try Microsoft deduplication for yourself, I am happy to report that it is included in the developer preview of Windows 8 that is available on Dev Center. Here are a few commands to get you started, and read Rick Vanover’s post too!

Import-Module ServerManager
Add-WindowsFeature -name FS-Data-Deduplication
Import-Module Deduplication
Enable-DedupVolume E:
get-dedupvolume

Note: I am a Microsoft MVP and Microsoft briefs me on upcoming technologies under NDA. This post is based on a Microsoft briefing from November which was said at the time not to be covered by any NDA. All of this information could be gleaned by experimenting with the Windows 8 developer preview, but it’s much easier to just go to the source.

You might also want to read these other posts...

  • Electric Car Over the Internet: My Experience Buying From…
  • Tortoise or Hare? Nvidia Jetson TK1
  • How To Connect Everything From Everywhere with ZeroTier
  • Liberate Wi-Fi Smart Bulbs and Switches with Tasmota!
  • Scam Alert: Fake DMCA Takedown for Link Insertion

Filed Under: Enterprise storage, Everything, Gestalt IT, Personal, Terabyte home, Virtual Storage Tagged With: CSV, data deduplication, deduplication, file system, Hyper-V, I/O, Microsoft, NTFS, performance, Rick Vanover, server, Windows 8, Windows Server 2012

Primary Sidebar

If you wish to make an apple pie from scratch, you must first invent the universe

Carl Sagan

Subscribe via Email

Subscribe via email and you will receive my latest blog posts in your inbox. No ads or spam, just the same great content you find on my site!
 New posts (daily)
 Where's Stephen? (weekly)

Download My Book


Download my free e-book:
Essential Enterprise Storage Concepts!

Recent Posts

How To Install ZeroTier on TrueNAS 12

February 3, 2022

Scam Alert: Fake DMCA Takedown for Link Insertion

January 24, 2022

How To Connect Everything From Everywhere with ZeroTier

January 14, 2022

Electric Car Over the Internet: My Experience Buying From Vroom

November 28, 2020

Powering Rabbits: The Mean Well LRS-350-12 Power Supply

October 18, 2020

Tortoise or Hare? Nvidia Jetson TK1

September 22, 2020

Running Rabbits: More About My Cloud NUCs

September 21, 2020

Introducing Rabbit: I Bought a Cloud!

September 10, 2020

Remove ROM To Use LSI SAS Cards in HPE Servers

August 23, 2020

Test Your Wi-Fi with iPerf for iOS

July 9, 2020

Symbolic Links

    Featured Posts

    The Myths of Standardization

    December 15, 2011

    Storage Changes in VMware vSphere 5

    July 16, 2011

    Why I Am Biased Against FCoE

    October 21, 2011

    Generation 3 drobo: Fall In Love All Over Again

    April 9, 2015

    Are You a Hypervisor Hugger or a Storage Stalwart?

    November 14, 2011

    The Rack Endgame: Converged Infrastructure and Disaggregation

    September 19, 2014

    Storage Changes in VMware vSphere 5.1

    September 4, 2012

    Edward Snowden Is Right: We Must Protect The Internet

    March 19, 2014

    Defining Failure: What Is MTTR, MTTF, and MTBF?

    July 6, 2011

    Here’s Something Your Raspberry Pi Can’t Do: Gigabit Ethernet and SATA in the Olimex A20-OLinuXIno-LIME2

    May 25, 2016

    Footer

    Legalese

    Copyright © 2022 · Log in