  • Compression, encryption, deduplication, and replication can all coexist; you just need to do them in the right order.

    You first deduplicate, then you compress, then you encrypt, and last of all, you replicate.

    And, of course, if you really care about your data, you take a hash of it at the beginning so you can verify that, after all those machinations, you're still getting your original data back at the end of the day.
  • That's the traditional way of doing it. But I'm excited by the idea of doing deduplication AFTER compression and encryption using gzip --rsyncable and rsyncrypto!
  • Hi Stephen,
    Encryption seems to be evolving as a multi-level requirement. Encryption of data at the end of the line with self-encrypting drives covers data at rest and avoids the dedupe issue.

    It's trickier, as you say, for encrypting data on the move. When do you expect a viable solution to be standardized?
  • As David says, the right way to do this is to deduplicate (or, at least, segment for deduplication), compress, encrypt, and replicate (wash, rinse, repeat). --rsyncable totally works, but it's a bit of a hack: it does non-optimal segmentation for deduplication, and of course it doesn't help unless you also reset your encryption cipher on the same boundaries. As you say, this makes the encryption somewhat less secure; again, you ought to be doing your replication over an encrypted channel anyhow.

    At Permabit (http://www.permabit.com), we incorporate all of these technologies into our Enterprise Archive product, in this order, for maximum benefit. As data is written, an in-line process breaks files up into variable-sized segments for optimal deduplication. These segments are then (optionally) compressed, (optionally) encrypted, deduplicated, and written to disk. The compressed, encrypted chunks can be replicated, and replication is also done over an encrypted channel to eliminate traffic analysis. This provides the best of all worlds.

    Regards,
    Jered Floyd
    CTO, Permabit
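A minimal Python sketch of the ordering David describes: hash the original data first, deduplicate fixed-size chunks by their SHA-256 fingerprints, compress each unique chunk with zlib, encrypt it, replicate the encrypted chunks, and finally verify the restored data against the original hash. The 4096-byte chunk size and the HMAC-based stream cipher are assumptions made for illustration; the cipher is a toy stand-in for a real one such as AES-GCM and should not be used to protect real data.

```python
import hashlib
import hmac
import os
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration
NONCE_SIZE = 16


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with HMAC-SHA256(key, nonce || offset) blocks.
    A stand-in for a real cipher such as AES-GCM; NOT secure for real use.
    XOR is its own inverse, so the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def store(data: bytes, key: bytes):
    """Hash first, then deduplicate, compress, and encrypt each chunk."""
    digest = hashlib.sha256(data).hexdigest()  # kept for end-to-end verification
    chunks = {}  # fingerprint -> nonce || encrypted compressed chunk
    recipe = []  # ordered fingerprints needed to rebuild the data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).digest()
        recipe.append(fingerprint)
        if fingerprint in chunks:
            continue  # 1. deduplicate: a repeated chunk is stored only once
        compressed = zlib.compress(chunk)  # 2. compress the unique chunk
        nonce = os.urandom(NONCE_SIZE)
        chunks[fingerprint] = nonce + toy_xor_cipher(key, nonce, compressed)  # 3. encrypt
    return recipe, chunks, digest


def replicate(chunks: dict) -> dict:
    """4. Replicate: copy the already deduplicated, compressed, encrypted chunks."""
    return dict(chunks)


def restore(recipe, chunks: dict, key: bytes) -> bytes:
    """Reverse the pipeline: decrypt, decompress, and reassemble in order."""
    parts = []
    for fingerprint in recipe:
        blob = chunks[fingerprint]
        nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
        parts.append(zlib.decompress(toy_xor_cipher(key, nonce, ciphertext)))
    return b"".join(parts)


if __name__ == "__main__":
    key = os.urandom(32)
    data = bytes(range(256)) * 64 + b"tail"  # four identical 4 KiB chunks plus a short tail
    recipe, primary, digest = store(data, key)
    replica = replicate(primary)
    restored = restore(recipe, replica, key)
    assert hashlib.sha256(restored).hexdigest() == digest, "round trip failed"
    print(f"{len(recipe)} chunks referenced, {len(replica)} stored")  # 5 referenced, 2 stored
```

Deduplicating before compressing and encrypting matters here because each unique chunk is compressed and encrypted only once; the final hash comparison is the "still getting your original data back" check.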
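Jered's variable-sized segments and gzip's --rsyncable both depend on picking chunk boundaries from the content itself rather than from fixed offsets, so inserting a byte near the start of a file doesn't shift every later boundary. The sketch below assumes a simple rolling byte-sum over a 48-byte window and cuts wherever that sum is divisible by 256 (both values hypothetical); it is similar in spirit to --rsyncable's rolling sum, while real deduplication systems typically use stronger rolling hashes such as Rabin fingerprints. It compares how many chunks survive a one-byte insertion under fixed-size and content-defined chunking.

```python
import random

WINDOW = 48       # hypothetical rolling-window size in bytes
DIVISOR = 256     # hypothetical; roughly one candidate cut per DIVISOR positions
FIXED_SIZE = 4096


def fixed_chunks(data: bytes, size: int = FIXED_SIZE) -> list:
    """Split at fixed offsets; every boundary shifts if bytes are inserted."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, window: int = WINDOW, divisor: int = DIVISOR) -> list:
    """Content-defined chunking with a rolling sum of the last `window` bytes.
    A boundary falls after byte i when the rolling sum is divisible by
    `divisor` and the current chunk is at least `window` bytes long."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward by one byte
        if i + 1 - start >= window and rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def shared_fraction(old: list, new: list) -> float:
    """Fraction of the new version's chunks that already exist in the old one."""
    seen = set(old)
    return sum(chunk in seen for chunk in new) / len(new)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(65536))
    edited = b"X" + original  # one-byte insertion at the very start
    print("fixed-size:", shared_fraction(fixed_chunks(original), fixed_chunks(edited)))
    print("content-defined:", shared_fraction(cdc_chunks(original), cdc_chunks(edited)))
```

With fixed offsets, the insertion changes essentially every chunk; with content-defined boundaries, only the first chunk or two differ and the rest deduplicate against the original.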
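Why deduplicating after encryption needs the cipher to be "reset" on chunk boundaries, in sketch form: with a fresh random nonce per chunk, identical plaintext chunks encrypt to different ciphertexts and deduplication finds nothing. A deterministic scheme that derives the nonce from the chunk itself (a swapped-in, synthetic-IV style technique used here only to model resetting the cipher per chunk) lets identical chunks deduplicate, but it also reveals which chunks are equal, which is the weakening Jered mentions. The HMAC-based stream cipher is again a toy assumption rather than a real library cipher, and this is not a description of how rsyncrypto works internally.

```python
import hashlib
import hmac
import os

NONCE_SIZE = 16


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 keystream); NOT secure, illustration only.
    XOR is its own inverse, so this both encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_random(key: bytes, chunk: bytes) -> bytes:
    """Fresh random nonce per call: equal chunks give unrelated ciphertexts."""
    nonce = os.urandom(NONCE_SIZE)
    return nonce + toy_xor_cipher(key, nonce, chunk)


def encrypt_deterministic(key: bytes, chunk: bytes) -> bytes:
    """Nonce derived from the chunk itself: equal chunks under the same key
    give equal ciphertexts, so they can still be deduplicated (and recognized)."""
    nonce = hmac.new(key, chunk, hashlib.sha256).digest()[:NONCE_SIZE]
    return nonce + toy_xor_cipher(key, nonce, chunk)


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the stored nonce and undo the XOR keystream."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return toy_xor_cipher(key, nonce, ciphertext)


if __name__ == "__main__":
    key = os.urandom(32)
    chunks = [b"same block " * 100] * 5 + [b"other block " * 100]
    random_ct = [encrypt_random(key, c) for c in chunks]
    det_ct = [encrypt_deterministic(key, c) for c in chunks]
    # Deduplicating after encryption amounts to counting distinct ciphertext blobs.
    print("random nonces, unique blobs:", len(set(random_ct)))
    print("deterministic, unique blobs:", len(set(det_ct)))
    assert all(decrypt(key, b) == c for b, c in zip(det_ct, chunks))
```

The deterministic variant stores two blobs for the six chunks instead of six, and anyone who can see the stored blobs can tell that five of the chunks are identical, hence the advice to replicate over an encrypted channel regardless.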