Now that we can use disk as a backup medium, deduplication (dedupe) has appeared.
We think disk is more readily available than tape and better suited to random read and write I/O, and there is no need to mount a tape and seek to the record position. All of those physical operations are replaced by the movement of the disk head.
It’s funny.
Dedupe depends on the safety of the first backup data, because later backups store only references to blocks that already exist. If that first data is lost for any reason, the damage cannot be estimated.
To make sure the first data is available at any time, let’s look at what we can do.
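To make that dependency concrete, here is a minimal sketch of a block-level dedupe store. It is only an illustration, not how any particular product works: the `ChunkStore` class, the 4-byte chunk size, and the backup names are all assumptions made up for this example. It shows that once a chunk written by the first backup disappears, every later backup that references it becomes unrestorable too.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


class ChunkStore:
    """Toy content-addressed store (hypothetical): each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.backups = {}  # backup name -> ordered list of chunk digests

    def backup(self, name, data):
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has never been seen before
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.backups[name] = recipe

    def restore(self, name):
        # Raises KeyError if any referenced chunk is missing
        return b"".join(self.chunks[d] for d in self.backups[name])


store = ChunkStore()
store.backup("monday", b"AAAABBBBCCCC")   # first backup: three new chunks
store.backup("tuesday", b"AAAABBBBDDDD")  # second backup: only DDDD is new
unique_chunks = len(store.chunks)
restored_ok = store.restore("tuesday") == b"AAAABBBBDDDD"

# Simulate losing a single chunk that the first backup wrote
del store.chunks[hashlib.sha256(b"AAAA").hexdigest()]

broken = []
for name in store.backups:
    try:
        store.restore(name)
    except KeyError:
        broken.append(name)

print(unique_chunks, restored_ok, broken)
```

One lost chunk breaks both backups here, which is why the protection of the first copy matters so much.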
First, we use RAID, mostly RAID 5 or even RAID 6. RAID 6 keeps the data safe when two disks fail. RAID 5 survives one failed disk, and with a hot spare it can survive a second failure only if the rebuild onto the spare has finished first.
Second, volume mirroring.
Third, a data file can be split into blocks and stored in distributed storage. Maybe this is like the concept of cloud storage, which EMC is promoting now.
Fourth, multiple copies over the network.
Fifth, tape backup. Haaaa, we are back at the starting point. In most circumstances tape backup is not the best choice compared with disk backup, but no one can discard it. If we use this method, we assume disk dedupe will not lose data in most situations, and if it does, whether the data can be recovered depends on our luck or our planning. If you have a carefully designed tape backup plan, with scheduled testing and recovery drills and dual copies at different sites, tape backup is reliable. So why do we still use dedupe?
Dedupe trades time for space. We have plenty of computing power but limited disk space (disk is more expensive than tape). Saving space costs CPU, and recovery costs more CPU as well.
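The trade can be put into rough numbers with a small sketch. The `dedupe_stats` function, the 4096-byte chunk size, and the sample data below are assumptions for illustration only. It counts how many bytes the backups contain logically, how many would really be stored after dedupe, and how many hash computations (the CPU cost) were needed to get there.

```python
import hashlib


def dedupe_stats(backups, chunk_size=4096):
    """Return (logical_bytes, stored_bytes, hashes_computed) for a list of backups."""
    seen = set()
    logical = stored = hashes = 0
    for data in backups:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).digest()  # CPU is spent on every chunk
            hashes += 1
            logical += len(chunk)
            if digest not in seen:                   # space is spent only on new chunks
                seen.add(digest)
                stored += len(chunk)
    return logical, stored, hashes


# Four distinct 4096-byte chunks make up the first full backup (made-up data)
base = b"".join(bytes([i]) * 4096 for i in range(4))
# A later backup where only the last chunk has changed
changed = base[:3 * 4096] + bytes([9]) * 4096

logical, stored, hashes = dedupe_stats([base, base, changed])
print(logical, stored, hashes)
```

Three backups of 16 KiB each need only 20 KiB of disk in this toy case, but every one of the 12 chunks still had to be hashed.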
Another question is how long dedupe as a method can last. Tape can be preserved for decades, but what about a dedupe device? In the end, the deduped data must be restored or backed up again to tape, until we find some other method to replace tape.
by IT1999
at 1:00am, Feb. 19, 2009