Protecting Your Data

February 15th, 2011

Protecting your data is critical, yet it is often something you don’t adequately consider until it’s too late. Fortunately, there are many great solutions available, both commercial and free, that can help you secure your data consistently before disaster strikes.

Open Source Back Ups Using Dirvish

First of all, I’m a Linux fan, so this article explores an open source solution that I have deployed numerous times for a broad range of clients and that has served me quite well.

When first considering alternatives and then selecting a back up solution, I always evaluate a few important aspects:

  1. Is the solution attended or unattended? We all know that time is money and if you’re spending too much time per week on managing your back ups then there’s probably a better solution available that should be identified as soon as possible.
  2. What about storage requirements? Some solutions are very inefficient as to how they utilize available disk space. Evaluate this aspect thoroughly to ensure the efficient use of valuable disk space. You should not be required to throw an excessive amount of disk space at a good back up solution.
  3. How easy is it to find and restore data in a crunch? Let’s say you have a great unattended back up solution, but it takes you a day to dig through the information to find the data you need. This is not a good back up solution. Time is money. Deadlines are crucial. Back up solutions should make search and retrieval functions quick and accurate.

When evaluating these critical aspects, one back up solution I found that meets all of my specified needs is the open source solution called Dirvish. It is very easy to learn and to use, it is very quick and very efficient, and the price is right: free!

How does Dirvish stack up to my requirements specified earlier? Let’s take a look.

  1. Dirvish is an unattended solution. I like that aspect. Once you’ve configured your Dirvish “vaults” you can literally sit back and let it “do its thing” every night. The only thing you’ll want to do is check periodically that the back ups are still running and that disk space isn’t becoming an issue.
  2. Dirvish meets my storage requirements. It makes use of hard links, which lets it consume disk space efficiently. Dirvish makes a complete back up on day one; on day two it checks files for changes and copies only the data needed to bring the back up in line with your original files. Everything that has not changed is stored as a hard link to the copy already held in the previous back up. This gives you a complete set of files for every day without creating full copies each day.
  3. Dirvish makes it easy for me to restore data in a crunch. This back up solution does not compress the back up files and because they are mostly hard links, it is a simple matter of changing into the directory matching the date that you wish to restore from, and copying the files. I recommend storing all critical system files. This ensures that in the event of a total server failure, you can reinstall Debian and get back up and running in short order.
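
The hard link behaviour described in point 2 can be tried out by hand. The following is a minimal sketch, assuming a POSIX shell with mktemp available; it does not call Dirvish at all. The dated folder names and the “hosts” file are made up for the demonstration and live in a throwaway temporary directory.

```shell
#!/bin/sh
# Sketch: how two dated snapshots can share one copy of an unchanged file.
# All paths live in a throwaway temp directory; nothing here calls Dirvish.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# "Day one": a full copy of the (hypothetical) source file.
mkdir -p "$demo/2011-02-14/tree" "$demo/2011-02-15/tree"
echo "127.0.0.1 localhost" > "$demo/2011-02-14/tree/hosts"

# "Day two": the file is unchanged, so the new snapshot gets a hard link
# to the existing data instead of a second copy.
ln "$demo/2011-02-14/tree/hosts" "$demo/2011-02-15/tree/hosts"

# Both names report the same inode number, i.e. the same data on disk.
inode_day1=$(ls -i "$demo/2011-02-14/tree/hosts" | awk '{print $1}')
inode_day2=$(ls -i "$demo/2011-02-15/tree/hosts" | awk '{print $1}')
echo "day 1 inode: $inode_day1"
echo "day 2 inode: $inode_day2"

# Removing the older snapshot does not remove the data: the day-two name
# still refers to it.
rm -r "$demo/2011-02-14"
remaining=$(cat "$demo/2011-02-15/tree/hosts")
echo "after removing day 1: $remaining"
```

Because both names refer to the same data, editing a file inside one snapshot would change it in every snapshot that links to it, so it is safer to treat the back up folders as read-only and copy files out of them.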

Planning and Deploying Dirvish

It’s always important to carefully identify what data is critical and requires back up. Make sure that whatever it is you’re backing up will be beneficial in an emergency restore situation. Be careful not to miss any vital data.

In the Linux environment, one of the very first things I choose to back up – and what I will be using for the following example – is the Unix/Linux “etc” folder, because this is where most of the critical configuration files exist in this environment.

Setting up your Bank and Vaults

Dirvish stores back ups in Vaults and the Vaults are stored within Banks. In this example, we will just be creating one bank and one vault within that bank to store our “etc” back up. Our bank will be “/backups” which points to an external USB back up device. Inside the folder “/backups”, we will create a folder called “etc”.

# mkdir /backups/etc

Now that we have our folder structure set up, we are going to need to edit the “/etc/dirvish/master.conf” file to let the back up system know about our “bank” and the “vault” we have created within that bank. Here is the configuration file as I have defined in my example set up.

# Banks
bank:
    /backups

# Vaults
Runall:
    etc

# Expire more than 5 days old
expire-default: +5 days

This configuration tells Dirvish where the bank and its vaults are located, and it also tells the system to keep 5 days’ worth of changes, which is sufficient in most cases.

The second configuration file is located in “/backups/etc/dirvish/default.conf” and its purpose is to let Dirvish know where the source files that you are backing up are located, how the back up folders will be named and if you wish to exclude anything from the back ups. Please note that there is much more you can do here but for the sake of this example, this is all we’ll be discussing for now.

The default.conf I have created is basic; it tells Dirvish to back up the system’s “/etc/” folder, to create the back up folders in YYYY-MM-DD format and to exclude any files ending in a tilde character. Files like these are typically back up copies left behind by the emacs editor, and they are all over the place on some systems. The configuration file looks like this:

client: myServer
tree: /etc
xdev: 0
index: gzip
log: gzip
image-default: %Y-%m-%d
exclude:
    *~

For a line by line explanation of this example, I offer the following:

  • client – This is the name of the client machine we are backing up
  • tree – This is the full path to the source files
  • xdev – This controls whether the back up stays on a single filesystem. Set to 0 (or false), it lets the copy follow mount points found under the tree; please see the man page for details
  • index: gzip – This tells Dirvish to compress its index, which is useful
  • log: gzip – This tells Dirvish to compress its logs, which is also very useful
  • image-default – This specifies the folder naming convention
  • exclude – This is a pattern for files we wish to have excluded from our back ups
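
To make the folder naming and the exclude pattern more concrete, here is a small sketch in plain shell. It relies on two assumptions: that date(1) accepts the same strftime-style codes as the image-default setting, and that a shell glob is a fair stand-in for the tilde pattern. The helper is_excluded and the file names are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: what the %Y-%m-%d format and a "*~" exclude pattern mean.
set -eu

# date(1) understands the same strftime-style codes used by image-default.
image_name=$(date +%Y-%m-%d)
echo "today's back up folder would be named: $image_name"

# Hypothetical helper: report whether a file name ends in a tilde.
# Shell globbing is used here only to approximate the exclude pattern.
is_excluded() {
    case "$1" in
        *~) return 0 ;;
        *)  return 1 ;;
    esac
}

# Made-up file names, some of them editor back up copies.
for name in hosts hosts~ fstab passwd~; do
    if is_excluded "$name"; then
        echo "$name: skipped"
    else
        echo "$name: backed up"
    fi
done
```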

Initializing your Bank Vault

Now that we have configured our bank and our vault, we have to initialize the vault so that the initial back up can be made. This is important because without completing this step, the scheduler will not process our back up every night as it should.

# dirvish --vault etc --init

Once you execute this command, it will immediately begin copying the contents of “/etc/” to the back up folder you created. Once the command completes, you should see a new folder in “/backups/etc” named with today’s date, and inside that folder should be another folder called “tree”. The “tree” folder will contain the files found in “/etc”.

If all goes well, when you check tomorrow, you should find another dated folder in “/backups/etc” that contains the files for that day, and so on. These files can be restored by simply copying them back!
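
As an illustration of restoring by copying, here is a minimal sketch. The function restore_file is a hypothetical helper and not a Dirvish command. It takes the snapshot folder as an argument, so it makes no assumption about where that folder sits inside the vault. The demonstration runs entirely in a temporary directory that stands in for the vault and for “/etc”.

```shell
#!/bin/sh
# Sketch: restore one file from a snapshot by copying it back.
# restore_file is a hypothetical helper, not a Dirvish command.
set -eu

# Usage: restore_file SNAPSHOT_DIR RELATIVE_PATH DEST_DIR
restore_file() {
    snapshot_dir=$1
    rel_path=$2
    dest_dir=$3
    if [ ! -e "$snapshot_dir/$rel_path" ]; then
        echo "not in snapshot: $rel_path" >&2
        return 1
    fi
    mkdir -p "$dest_dir/$(dirname "$rel_path")"
    # -p preserves mode and timestamps (and ownership when run as root).
    cp -p "$snapshot_dir/$rel_path" "$dest_dir/$rel_path"
}

# Demo in a throwaway directory standing in for the vault and for /etc.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
snapshot="$demo/backups/etc/2011-02-14/tree"
live="$demo/live-etc"
mkdir -p "$snapshot/network" "$live/network"
echo "iface eth0 inet dhcp" > "$snapshot/network/interfaces"
echo "broken edit" > "$live/network/interfaces"

# Copy the good version from the snapshot over the damaged live file.
restore_file "$snapshot" network/interfaces "$live"
restored=$(cat "$live/network/interfaces")
echo "restored contents: $restored"
```

On a real system the first argument would be the folder holding the backed up files for the chosen date and the last would be “/etc”. Copying out of the back up this way leaves the snapshot itself untouched.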

Lonnie Waugh is a web developer/programmer for Bitstorm Web, a division of TDH Marketing, Inc., a Dayton, Ohio-based marketing firm with alliances and joint ventures in the U.S., England and Singapore that support its global client base. TDH Marketing provides business development, strategic, marketing and operational planning and implementation for large, mid-size and small corporations looking to develop profitable, technology-driven business growth. Bitstorm Web offers award-winning custom website design, custom apps and outstanding flash animated sequences to attract and impress visitors. The division also provides 2D and 3D illustration, CAD visualization and digital animation used to visually explain complex engineered products and processes, entertain consumers or train employees and customers for greater retention.