It’s all under (version) control…

After a lot of looking around, I have currently settled on Subversion as a core part of my backup strategy. For the most part it is going to operate like this:

important “controlled” data <-> repository server <-> versioning “hot” backup server <-> offline storage

important “non controlled” data <-> versioning “hot” backup server <-> offline storage

This covers >all< of the important data. There are really two different types of versioning servers in my worldview, and each has its place.

Repository servers: these are the traditional “source code control” servers. When you “check in” or “commit” a change to a repository server, you are asked to supply a comment explaining the change. It can also group multiple files, as they stood at a particular point in time, into a “release”. This lets me batch up all the files from a website or programming project and say “this is release 1.0”. Very useful stuff for the right kind of data. Currently I am using Subversion for this.
Versioning hot backup servers: these are systems that monitor specific files and folders for activity and back up each change in close to real time. No specific action is needed to “check in” a file; as soon as you save it, it is backed up. You can “roll back” to any previous version of the file you might need. At the moment I am probably going to wait for “Microsoft Data Protection Manager” to handle this part.
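To make the “hot backup” idea concrete, here is a minimal polling sketch in Python. It is not how Data Protection Manager works; it just scans a folder every few seconds and keeps a timestamped copy of each file whose modification time has changed. The function names and the backup folder layout are hypothetical, and the backup folder is assumed to live outside the watched folder.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def snapshot_changes(watch_dir: Path, backup_dir: Path, seen: dict[Path, float]) -> list[Path]:
    """Copy every file whose modification time changed since the last scan.

    Hypothetical sketch, not DPM's mechanism. Each version is stored as
    backup_dir/<relative path>/<timestamp>, so older versions are kept
    and can be rolled back to. backup_dir must be outside watch_dir.
    """
    copied = []
    for path in watch_dir.rglob("*"):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if seen.get(path) == mtime:
            continue  # unchanged since the last scan
        seen[path] = mtime
        # One folder per watched file, one timestamped copy per version.
        version_dir = backup_dir / path.relative_to(watch_dir)
        version_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.fromtimestamp(mtime).strftime("%Y%m%dT%H%M%S_%f")
        target = version_dir / stamp
        shutil.copy2(path, target)  # copy2 also preserves the file's timestamps
        copied.append(target)
    return copied


def watch(watch_dir: str, backup_dir: str, interval: float = 5.0) -> None:
    """Poll forever, snapshotting changed files every `interval` seconds."""
    seen: dict[Path, float] = {}
    while True:
        snapshot_changes(Path(watch_dir), Path(backup_dir), seen)
        time.sleep(interval)
```

Something like `watch("C:/Users/me/Documents", "D:/hot-backup")` (both paths hypothetical) would run until interrupted. Polling is the simplest approach to sketch, but it only captures the state at scan time, so two saves between scans produce a single version.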

The important difference is that repositories know about your data’s logical history and organization. They understand what files go together, can show you the commented history of a file and do a good job of merging changes made by multiple people. Versioned backups don’t always have that.

For me, this means that data which is part of an organized, incremental development process (websites, software, artwork, etc.), where you might want to gather and publish “releases” or have multiple people working on it at the same time, should go in repositories. Data that is not part of a formalized incremental development process (email data files, the “My Documents” folder, even the repositories themselves) should just go straight to the hot backups.
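In Subversion, a “release” is conventionally made by copying the project's trunk into a tags folder; the copy is cheap and records exactly which file versions went out. Here is a minimal Python sketch, assuming the conventional trunk/ and tags/ layout; the function names and repository URL are hypothetical.

```python
import subprocess


def tag_release_command(repo_url: str, version: str) -> list[str]:
    """Build an `svn copy` command that snapshots trunk as tags/<version>.

    Assumes the conventional trunk/ and tags/ layout under repo_url.
    """
    base = repo_url.rstrip("/")
    return [
        "svn", "copy",
        f"{base}/trunk",
        f"{base}/tags/{version}",
        "-m", f"Tagging release {version}",
    ]


def tag_release(repo_url: str, version: str) -> None:
    # URL-to-URL copy happens on the server; no working copy is needed.
    subprocess.run(tag_release_command(repo_url, version), check=True)
```

For example, `tag_release("https://svn.example.com/website", "1.0")` (hypothetical URL) would mark the current trunk as release 1.0.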

So far, stage one is complete. I have my repositories in place, and I am also using them as a limited “hot backup” system. For example, my email files are in a “soulhuntre” repository, and I check them in every day or so. Eventually I will take the next step.
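That routine check-in can be scripted. This is a minimal sketch, assuming the mail folder is already an svn working copy; the function names, path and log message format are hypothetical. `svn add --force` schedules any new files without complaining about ones already under version control, and the commit then records everything that changed.

```python
import subprocess
from datetime import date


def checkin_commands(working_copy: str, today: date) -> list[list[str]]:
    """Build the svn commands for a routine 'hot backup' style check-in."""
    return [
        # Schedule any new, not-yet-versioned files under the working copy.
        ["svn", "add", "--force", working_copy],
        # Commit all changes under the working copy with a dated log message.
        ["svn", "commit", "-m", f"Daily check-in {today.isoformat()}", working_copy],
    ]


def daily_checkin(working_copy: str) -> None:
    for cmd in checkin_commands(working_copy, date.today()):
        subprocess.run(cmd, check=True)  # stop if either svn command fails
```

Run once a day from a scheduler, `daily_checkin("C:/Mail")` (hypothetical path) would approximate the manual routine. Files deleted from the folder are not handled here; they would also need an `svn delete`.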