The latest beta of the server version of Microsoft’s forthcoming Windows 8 operating system includes a handy tool related to the new data deduplication feature. DDPEVAL will test a given dataset using the new deduplication and compression engine and report the savings to be expected. And it works even on non-Windows 8 systems!
Windows Server 8 Data Deduplication Review
For more details, see my post, "Microsoft Adds Data Deduplication to NTFS in Windows 8."
As I covered previously, the as-yet unnamed Server version of Windows 8 will include an optional data deduplication and compression engine. This new feature is sophisticated, using variable-size chunks and intelligent handling of different file types.
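Microsoft hasn't documented the exact chunking algorithm, but variable-size ("content-defined") chunking is the standard approach: chunk boundaries are chosen by a rolling hash over the data itself, so identical runs of data tend to produce identical chunks wherever they occur. Here is a minimal Python sketch of the idea; the function name, toy hash, and size constants are all my own illustrative choices, not Microsoft's:

```python
def cdc_chunks(data: bytes, mask: int = 0x1F,
               min_size: int = 16, max_size: int = 256) -> list:
    """Split data into variable-size chunks at content-defined boundaries.

    A boundary is declared when the low bits of a simple running hash
    are all zero (roughly 1-in-32 odds with mask 0x1F), subject to
    minimum and maximum chunk sizes.
    """
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # toy rolling hash, reset per chunk
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries depend on content rather than fixed offsets, a deduplicator built on top of this can store each unique chunk only once, even when duplicate data sits at different positions in different files.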
But all is not rosy in Windows deduplication land. First, it's a server-only feature, so Windows 8 desktop users are out of luck. Microsoft doesn't support deduplication with system or boot drives, either. And the company has clarified that it won't be supported for production Hyper-V VHD files or SQL Server data. This is disappointing, since VHD files are one of the most-duplicated data sets most people will ever see!
Data Deduplication in Windows Server 8 will be a huge boon for corporate file servers and other low-I/O environments. In my testing with typical corporate file server datasets, it was highly effective: my files were reduced in size between 40% and 60%, depending on the dataset used. It wasn't useful at all on a small set of PDF files (1% reduction), and it ignored a set of media files I tested.
How did I know this? Microsoft made it extremely easy to test data deduplication and compression rate thanks to a brand-new utility, DDPEVAL.EXE. Found in the Windows\System32 folder (once the deduplication feature is installed), this simple utility parses a directory and reports the expected capacity optimization success rate.
DDPEVAL is part of the Data Deduplication feature. Once you have installed the Windows Server 8 Beta, launch PowerShell and type the following to install it:
Import-Module ServerManager
Add-WindowsFeature -name FS-Data-Deduplication
Experimenting With DDPEVAL
Here’s an example of DDPEVAL’s output:
PS C:\> C:\Windows\System32\ddpeval.exe f:\
Data Deduplication Savings Evaluation Tool
Copyright (c) 2011 Microsoft Corporation. All Rights Reserved.

Evaluated folder: f:\
Processed files: 1170
Processed files size: 1.73 GB
Optimized files size: 909.80 MB
Space savings: 862.93 MB
Space savings percent: 48

Optimized files size (no compression): 1002.43 MB
Space savings (no compression): 770.30 MB
Space savings percent (no compression): 43

Files with duplication: 400
Files excluded by policy: 152
Files excluded by error: 0
As illustrated in this sample output, DDPEVAL processes files in a given directory and reports the estimated space savings with deduplication. The “no compression” values reflect pure deduplication success, while the total “Space savings” figures include both compression and deduplication.
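The reported percentages look like straightforward saved-over-processed ratios, truncated to a whole percent. A quick Python check (the function name is mine) reproduces the figures above from the reported sizes:

```python
def savings_percent(processed_mb: float, optimized_mb: float) -> int:
    """Space savings as a whole-number percentage: saved / processed."""
    return int((processed_mb - optimized_mb) / processed_mb * 100)

# Processed size = optimized size + reported savings, in MB
print(savings_percent(909.80 + 862.93, 909.80))    # with compression -> 48
print(savings_percent(1002.43 + 770.30, 1002.43))  # dedup only -> 43
```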
Note that these savings are not necessarily a promise: By default, Windows Server 8 deduplication will not do anything for 30 days after it is enabled. This can be confusing after enabling the feature for the first time!
I ran DDPEVAL on a fresh installation of Windows Server 8 Beta, and was surprised to see that it could reduce these files by over 60%, if only deduplication were supported on system drives. Interestingly, the tool shows 28% duplicate data, with the remainder saved due to compression.
PS C:\> C:\Windows\System32\ddpeval.exe c:\
Data Deduplication Savings Evaluation Tool
Copyright (c) 2011 Microsoft Corporation. All Rights Reserved.

Evaluated folder: c:\
Processed files: 14183
Processed files size: 7.20 GB
Optimized files size: 2.65 GB
Space savings: 4.39 GB
Space savings percent: 62

Optimized files size (no compression): 5.04 GB
Space savings (no compression): 2.00 GB
Space savings percent (no compression): 28

Files with duplication: 4571
Files excluded by policy: 58816
Files excluded by error: 111
I also tested deduplication on heavily duplicated files. Unsurprisingly, it was able to reduce VHD images and duplicate media files by almost 100%, leaving just a 4 KB stub in the filesystem.
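Conceptually, that near-100% result is what single instancing plus compression should deliver: identical chunks are stored once, and the unique chunks are then compressed as a separate, stacked step. Here is a minimal Python sketch of that two-stage pipeline; it uses fixed 4 KB chunks for simplicity (Windows uses variable-size chunks), and every name in it is my own illustration, not Microsoft's implementation:

```python
import hashlib
import zlib

def dedupe_then_compress(files: dict, chunk_size: int = 4096):
    """Single-instance identical chunks across files, then compress
    each unique chunk. Returns the chunk store and per-file recipes."""
    store = {}    # chunk digest -> compressed unique chunk
    recipes = {}  # file name -> ordered list of chunk digests
    for name, data in files.items():
        hashes = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                store[digest] = zlib.compress(chunk)  # compress once, on first sight
            hashes.append(digest)
        recipes[name] = hashes
    return store, recipes
```

With two identical VHD-sized blobs, the store ends up holding each unique chunk exactly once, and each file is reduced to a small recipe of chunk references, much like the 4 KB stub DDPEVAL's parent feature leaves behind.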
DDPEVAL is Self-Contained for Cross-Platform Use!
Interestingly, the DDPEVAL.EXE file is entirely self-contained, so it will run on many prior versions of Windows. I copied the EXE file from my x64 Server 8 Beta to a variety of virtual machines, and successfully ran it on Windows 7 (x64) and Windows Server 2008 R2 (x64). It failed to run on 32-bit Windows 7, however.
I’m sure Microsoft doesn’t support moving operating system components from machine to machine, but this is a useful tool for systems administrators as they evaluate moving to Windows Server 8. They will be able to run DDPEVAL on an existing Server 2008 R2 machine without moving the data. And the 64-bit executable is only 1.8 MB, so it fits nicely on any USB drive.
Capacity optimization technologies like data deduplication and compression are often tricky to evaluate, leaving many disappointed once these tools are put into production. By creating a simple, portable utility, Microsoft enables future Windows Server 8 customers to get a taste of this technology and decide if it will be worthwhile to roll out. I look forward to playing with DDPEVAL and the other new Server 8 technologies, and will be reporting more here in the future!
Disclaimer: As a Microsoft MVP, I have an NDA which covers my discussions with the team. I asked and was told that the Windows Server 8 Beta, released today, is “fair game” NDA-wise since it is publicly available. This post is entirely based on information from that beta release, not discussions with Microsoft employees.
Why on earth can't they just call it compression and get it over with? It's functionally *NO* different than the compression they spent the last 20 years telling people not to use.
Oh well – for archival storage I don’t see the harm actually…the funny part is always when people try to compress…excuse me…DE-DUPLICATE their boot volumes.
It is *entirely* different and unrelated to NTFS compression.
Deduplication (single instancing identical chunks of data that are present in more than one file) is entirely different to compression. The de-duplication feature can *additionally* apply standard compression algorithms to the identified unique chunks of data.