Understanding Data Deduplication

Thursday, November 12th, 2009

As we look at the many ways to improve storage utilization, data deduplication often pops up as a potential technique. Data deduplication, sometimes referred to as “intelligent compression” or “single-instance storage”, is a method of reducing storage needs by eliminating redundant data. Deduplication is quite similar to data compression, but it looks for repeating sequences of large chunks of data across very large comparison windows. Long sequences are compared to the history of previously stored sequences, and where a match is found, only one unique instance of the data sequence is actually retained on the storage media. Redundant data is replaced with a pointer to that first unique copy.

For example, a typical email system might contain 300 instances of the same two megabyte (2 MB) file attachment. If the email platform is backed up or archived, all 300 instances are saved, requiring 600 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just a reference back to the one saved copy. In this example, a 600 MB storage demand could be reduced to just 2 MB, plus a small amount of pointer metadata. Imagine the huge economic benefits! Of course, in a storage system all of this is hidden from users and applications, so the whole file can be read back intact after it has been written.
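The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: it splits data into fixed-size 4 KB chunks and uses the SHA-256 digest of each chunk as its fingerprint (many real systems use variable-size, content-defined chunking instead). The class `DedupStore` and its methods are hypothetical names chosen for this sketch. Each unique chunk is kept once; a file is stored as a list of fingerprints that act as the pointers, and reading follows those pointers to rebuild the whole file. The usage at the bottom replays the email example with 300 copies of a 2 MB attachment (counting 1 MB as 1,048,576 bytes).

```python
import hashlib


class DedupStore:
    """Hypothetical single-instance store (illustrative sketch only).

    Assumptions: fixed-size chunking and SHA-256 fingerprints; real
    deduplication products may chunk and fingerprint data differently.
    """

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> the one retained copy of a chunk
        self.files = {}   # file name -> list of fingerprints (the "pointers")

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only the first time this content is seen;
            # every later copy just records a pointer to it.
            self.chunks.setdefault(fingerprint, chunk)
            pointers.append(fingerprint)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers to rebuild the whole file transparently.
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def logical_bytes(self):
        # Space the files would need without deduplication.
        return sum(len(self.chunks[fp])
                   for pointers in self.files.values() for fp in pointers)

    def physical_bytes(self):
        # Space used by the unique chunks (pointer metadata ignored).
        return sum(len(chunk) for chunk in self.chunks.values())


MB = 1024 * 1024

# A deterministic 2 MB "attachment" whose 4 KB chunks are all distinct.
attachment = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2 * MB // 32)
)

store = DedupStore()
for n in range(300):
    store.write(f"mailbox{n}/attachment.bin", attachment)

print(store.logical_bytes() // MB)   # 600
print(store.physical_bytes() // MB)  # 2
```

Comparing short fingerprints rather than raw bytes is what keeps lookups cheap across a very large history of stored data. A production system would also need to handle fingerprint collisions (for example by verifying bytes on a match), track reference counts so chunks can be freed when files are deleted, and persist the fingerprint index.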