How it works
Most de-duplication solutions work by:
- Dividing the input into individual chunks
- Calculating a hash value for the chunk of data and storing the hash in an index
- Comparing the hash value of each new chunk of data with the hash values already in the index to decide whether to store the new data or ignore (de-dupe) it
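The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: it assumes fixed-size chunks, SHA-256 as the hash, and an in-memory dictionary as the index. The names `dedupe_chunks`, `rebuild`, and `chunk_size` are hypothetical.

```python
import hashlib


def dedupe_chunks(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep each unique chunk once."""
    index = {}   # hash -> chunk bytes (the de-duplicated store)
    recipe = []  # ordered hashes needed to rebuild the original input
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:
            # New hash: this chunk has not been seen before, so store it.
            index[digest] = chunk
        # Known or new, record which chunk belongs at this position.
        recipe.append(digest)
    return index, recipe


def rebuild(index, recipe):
    """Reassemble the original data from the stored chunks."""
    return b"".join(index[digest] for digest in recipe)


data = b"ABCDABCDXYZ1ABCD"
index, recipe = dedupe_chunks(data)
print(len(recipe), "chunks read,", len(index), "chunks stored")
```

Here the input holds four chunks, but only two distinct ones, so only two are stored. The tiny chunk size is chosen just to make the duplicates easy to see.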
Encryption and Hashing Standards Used In Data De-Duplication
- Data Encryption Standard (DES): This standard, which uses a 56-bit key, is a Federal Information Processing Standard (FIPS) that the US government once used to encrypt data.
- Advanced Encryption Standard (AES): AES is a block cipher that encrypts data in 128-bit blocks. Its key can be 128, 192, or 256 bits long, which makes it much more secure than DES, which it has replaced. AES is itself a FIPS standard (FIPS 197).
- Secure Hash Algorithm (SHA): The SHA family (SHA-1 to SHA-3) of hashing algorithms is also specified in FIPS standards. The hash lengths start at 160 bits (SHA-1) and go all the way to 512 bits. SHA-1 is no longer considered collision-resistant, so SHA-2 or SHA-3 is the safer choice.
- Message Digest Algorithm 5 (MD5): This algorithm produces a 128-bit hash. It is widely used and fairly simple to implement, although it is no longer considered collision-resistant. The hash is usually written as a string of 32 hexadecimal characters that looks something like this: A4BEC893BEE67418975BEAAC53298721. Because each hex character represents 4 bits (A=1010, for example), the 32-character string adds up to 128 bits.
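The bit arithmetic for MD5, and the hash lengths quoted for the SHA family, can be checked with Python's standard `hashlib` module. This is only a sketch: the sample input is arbitrary, and `hashlib.md5` may be unavailable on Python builds restricted to FIPS-approved algorithms.

```python
import hashlib

data = b"example chunk of data"  # arbitrary sample input

# MD5 digest written as hexadecimal: 32 characters, 4 bits each.
md5_hex = hashlib.md5(data).hexdigest()
md5_bits = len(md5_hex) * 4
print(len(md5_hex), "hex characters =", md5_bits, "bits")

# Digest sizes in bits for a few of the algorithms mentioned above.
digest_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha512")
}
print(digest_bits)
```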
How Data Gets De-Duplicated
- File-based compare and compression
- File-level hashing
- Block-level hashing
- Sub-block level hashing
- Delta versioning (storing only the changes between versions, so data is not duplicated in the first place)
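To show why the granularity of hashing matters, the sketch below compares file-level and block-level hashing on three small "files", two of which differ only in their last block. It is an illustration under simplifying assumptions (fixed 8-byte blocks, SHA-256, in-memory data), and the function names are hypothetical.

```python
import hashlib


def file_level_stored_bytes(files):
    """Store a whole file only if its hash has not been seen before."""
    seen, stored = set(), 0
    for content in files:
        digest = hashlib.sha256(content).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(content)
    return stored


def block_level_stored_bytes(files, block_size=8):
    """Store each fixed-size block only if its hash has not been seen before."""
    seen, stored = set(), 0
    for content in files:
        for start in range(0, len(content), block_size):
            block = content[start:start + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return stored


v1 = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC"
v2 = b"AAAAAAAA" + b"BBBBBBBB" + b"DDDDDDDD"  # only the last block differs
files = [v1, v2, v1]                          # the third file is an exact copy

file_bytes = file_level_stored_bytes(files)
block_bytes = block_level_stored_bytes(files)
print("file-level:", file_bytes, "bytes; block-level:", block_bytes, "bytes")
```

File-level hashing removes only the exact copy, while block-level hashing also avoids storing the two blocks that `v1` and `v2` share.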