I need a backup solution for the several machines at home. The standard idea is tape, but this needs re-examining:
Yikes! Disks are cheaper than tape, and that's before you spend thousands on the tape drive itself.
But let's take a step back a bit. Why back up?
- Disaster: your current data (hardware failure/loss, data corruption)
- Archival: your old data (unnoticed accidental loss, compliance)
What do you need for a backup? Simply, the ability to recreate the original data. Backups can be compressed, incremental, or otherwise encoded. They're often not considered an online system, and recovery can take a bit of effort (tape systems can require feeding through tapes from several backup runs). So why can't I just fill a drive with copies of all my files? Better, only copies of what's changed (like Apple's Time Machine)? Even better, only unique files... or file parts?
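As a strawman, the "only unique files" idea is just content-addressed storage. Here's a minimal sketch (the function name and the digest-named store layout are my own assumptions, not any particular product's design): each distinct file content is copied into the store exactly once, keyed by its SHA-256 digest.

```python
import hashlib
import os
import shutil

def backup_unique(src_dir, store_dir):
    """Copy each distinct file content into the store exactly once,
    keyed by its SHA-256 digest (illustrative layout)."""
    os.makedirs(store_dir, exist_ok=True)
    index = {}  # digest -> first path seen with that content
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest not in index:
                index[digest] = path
                shutil.copyfile(path, os.path.join(store_dir, digest))
    return index
```

A real tool would also record a manifest mapping every original path to its digest, so the tree can be recreated later; this sketch only shows the dedup step.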
I've tried Acronis True Image and its incremental backup option. For my notebook, this creates an initial 40GB image, then ~1GB incrementals every day after that. After a month, when I'm no longer interested in retaining the old files, I move the original large image and all its associated incrementals to another directory and start a fresh chain. Towards the end of the month, I've got both last month's and this month's initial and incremental images, so holding 30 days of state consumes 2x(40GB+30x1GB)=140GB.
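For reference, the arithmetic on that two-generation scheme works out as follows (the comparison with naïve daily full images is my own addition, not something Acronis reports):

```python
# Storage cost of the rolling two-generation full+incremental scheme
full_gb, daily_inc_gb, days = 40, 1, 30

one_generation = full_gb + days * daily_inc_gb  # 40 + 30 = 70 GB per month-long chain
rolling_store = 2 * one_generation              # 140 GB while both chains overlap

# Versus keeping 30 daily full images instead:
naive_fulls = days * full_gb                    # 1200 GB
```

So incrementals already buy roughly an order of magnitude over daily fulls; dedup is about squeezing out what's left.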
Duplicates? Windows File Protection (WFP) alone gives me over 200MB of duplicate files on each XP machine. With multiple machines... why do I need to keep seven backup copies of "notepad.exe"?
Out of interest, I analysed my notebook & desktop. These are very different XP-based machines, and don't share any applications. The notebook has all the standard productivity applications, and the desktop has the power-hungry video, 3D modelling and gaming applications.
Notebook: 93% unique
Desktop: 94% unique
Combined: 87% unique
Copying unique files alone gives a reasonable saving, and even between two very different machines the saving holds up. An interesting find was that my notebook had 362 files (~1GB) which had the same 64KB at the start, middle and end. I manually inspected a couple of these, and found 14MB files with only a couple of hundred bytes different three quarters of the way through. You'd also find large similarities between aged versions of files, and it'd be nice to take advantage of that.
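That start/middle/end probe is cheap to reproduce. A sketch, with the sample size and hash choice as my own assumptions: digest three 64KB samples per file, and treat files sharing a signature as *candidates* for near-duplicates (a full comparison would still be needed before actually deduplicating).

```python
import hashlib
import os

SAMPLE = 64 * 1024  # probe 64KB at the start, middle and end of the file

def sample_signature(path):
    """Cheap similarity probe: one digest over three 64KB samples.
    Files sharing a signature may still differ in the unsampled regions."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for offset in (0, max(0, size // 2 - SAMPLE // 2), max(0, size - SAMPLE)):
            f.seek(offset)
            h.update(f.read(SAMPLE))
    return h.hexdigest()
```

Grouping files by this signature is how you'd find the "same 64KB at the start, middle and end" clusters without reading every byte of every file.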
This is what I'd like: a disk-based solution that can be fed files, find similarities and deduplicate, and then recreate the files when requested. It should also be able to delete files that are no longer needed from its store and reclaim the space.
- Set Store
- Set Management: Manage backups of "sets".
- Timeline Recording: Files & directories have timelined events (creation/updates/removal)
- Blob Store
- Smart Chunking: Break blobs into variable length chunks based on data (rolling hash matches).
- Single Instance Chunk Storage: Store chunks referenced by size & hash.
- Single Instance Blob Storage: Store blobs referenced by size & hash as references to multiple chunks.
- Compression: Compress chunks. Maybe 7z? (compressed: PPMd/BCJ2, non-compressed: LZMA/BCJ2).
- Cross Volume Storage: Use multiple store locations to support large stores.
- Open files: Back up open files using the Volume Snapshot Service.
- Bypass ACLs: Back up all files using SeBackupPrivilege.
- Efficient change discovery: Read changes from NTFS Change Journal.
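The "Smart Chunking" item above can be sketched with a toy Rabin-style rolling hash: a boundary is declared wherever the low bits of a hash over the last few dozen bytes are all zero, so boundaries depend only on content and resynchronize after an insertion. The WINDOW, MASK and PRIME values here are illustrative assumptions; a real chunker would use tuned parameters plus minimum and maximum chunk sizes.

```python
# Toy content-defined chunker: a rolling hash over the last WINDOW bytes
# declares a chunk boundary whenever its low bits are all zero.
WINDOW = 48
MASK = (1 << 12) - 1   # boundary test: ~4KB average chunk size
PRIME = 31
MOD = 1 << 64          # keep the hash in 64 bits

def chunk_offsets(data):
    """Return a list of (start, end) offsets of content-defined chunks."""
    pw = pow(PRIME, WINDOW - 1, MOD)  # weight of the outgoing byte
    chunks, start, h = [], 0, 0
    window = []
    for i, b in enumerate(data):
        if len(window) == WINDOW:
            h = (h - window.pop(0) * pw) % MOD  # drop the oldest byte
        h = (h * PRIME + b) % MOD               # roll in the new byte
        window.append(b)
        if len(window) == WINDOW and (h & MASK) == 0:
            chunks.append((start, i + 1))       # content-defined boundary
            start, h = i + 1, 0
            window.clear()
    if start < len(data):
        chunks.append((start, len(data)))       # final partial chunk
    return chunks
```

Each chunk would then be stored once, keyed by size & hash, per the Single Instance Chunk Storage item; because boundaries track content rather than fixed offsets, a few hundred changed bytes in a 14MB file would cost only a chunk or two of new storage.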
How hard could it be?