Snapshot backups

I’ve been wanting a way to easily recover a file that is accidentally deleted from one of our websites, either by us or by a client. It would also be useful to be able to get back to the state your code was in X days ago, for example when the client changes his mind about the direction you’ve been developing. Source control offers a solution to some degree, but it won’t help if the client has access to the website and has changed a file there. And some shops just don’t use source control for all their projects.

Tape backups also offer a partial solution; I’ve had to pull a file off yesterday’s tape several times. But restoring from tape is a hassle, especially if it’s stored off site (which it should be!).

Enter rsnapshot. Rsnapshot is a Perl script that uses rsync to take snapshots of any set of files you want. Rsnapshot itself runs on Linux and other Unix-like systems that support hard links (see the comments below), but you can easily back up any machine that can run rsync, whatever its OS.

It would be fairly easy to write a bash script or Windows batch file to copy ‘snapshots’ of your source code to a backup area. Rsnapshot does more than this: it only backs up files that have changed, while still letting you browse all the source code as it was yesterday, the day before, etc. This results in a very easy way to pull up old files without taking up a lot of disk space. I’ve looked at other tools that do incremental backups to save disk space, but with them you must use their special tools to pull files back out, since all the incremental bits need to be put back together.

Rsnapshot solves this problem by using a filesystem feature called hard links. Hard links are kind of like a Windows shortcut, but the ‘shortcut’ appears to be the actual file in every way. The first time a snapshot is taken, all your files are copied to the daily.0 (or weekly.0 if you are only doing weekly, etc.) directory. On the next run, that snapshot is rotated to daily.1 and a new daily.0 is created, so daily.0 always holds the most recent snapshot. Only changed files are actually copied into the new daily.0; for the files that have not changed, rsnapshot creates hard links to the copies already stored in daily.1. So when you browse through either directory, it appears that all your files are there, even though the newer snapshot may only take up a few KB of additional disk space, depending on how many files changed.
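To make the hard-link idea concrete, here is a minimal sketch that fakes two snapshots by hand. It is not rsnapshot's actual code: the directory names snapshot.old and snapshot.new, the file names, and the temporary location are all made up for illustration, and it assumes GNU cp (for the -l option) on a filesystem that supports hard links.

```shell
#!/bin/sh
# Illustration only: a hand-made imitation of two snapshots.
# All names below are hypothetical; nothing touches a real snapshot_root.
set -e
demo=$(mktemp -d)

# "First snapshot": plain copies of two files.
mkdir "$demo/snapshot.old"
echo "version 1" > "$demo/snapshot.old/index.html"
echo "logo data" > "$demo/snapshot.old/logo.txt"

# "Second snapshot": start as a tree of hard links to the first one.
# -a preserves attributes, -l creates hard links instead of copying data (GNU cp).
cp -al "$demo/snapshot.old" "$demo/snapshot.new"

# Simulate one changed file arriving. Writing a new file and renaming it
# into place replaces the link in snapshot.new without touching the data
# that snapshot.old still points to.
echo "version 2" > "$demo/snapshot.new/index.html.tmp"
mv "$demo/snapshot.new/index.html.tmp" "$demo/snapshot.new/index.html"

# The first column is the inode number: the unchanged file shares one
# inode across both snapshots, the changed file has two different ones.
ls -i "$demo"/snapshot.*/logo.txt
ls -i "$demo"/snapshot.*/index.html

# du counts each inode once, so the second tree adds very little.
du -sh "$demo"
```

Both directories look complete when you browse them, but only index.html is stored twice. Note that the changed file is swapped in by rename rather than edited in place; editing a hard-linked file in place would change it in every snapshot that shares it.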

You can configure rsnapshot to back up on a very flexible schedule. I have mine set up to back up once a day for 7 days, once a week for 4 weeks, and once a month for 3 months. That is, I can look back day by day for 7 days, then I only have a snapshot for every week for 4 weeks, then I only have a snapshot for every month, going back 3 months. Some people like to also back up hourly for 8 hours. Here’s what my backup area looks like:

[Screenshot: rsnapshot backup area]

Here is how you implement it:

Installation

  • Install rsync on your Linux box if you don’t already have it. Chances are it’s already there. Type ‘rsync’ at a shell to see if it’s installed. If it’s not, look to your OS documentation on how to install it. On Red Hat or a similar OS use “yum install rsync”.
  • Install rsnapshot. You can download it from rsnapshot.org. I downloaded the RPM file from there and then installed it with “yum localinstall rsnapshot-1.3.0-1.noarch.rpm”.
  • Install the rsync daemon on the Windows machines you want to back up. The Windows port of rsync requires Cygwin to run. Cygwin is a Windows DLL that provides a lot of Linux functionality on a Windows platform. Many Linux applications ported to Windows require it. Handily, there is a version of rsync for Windows that bundles the necessary Cygwin pieces with an rsync implementation: cwRsync. Go to the cwRsync website and download the cwRsync Server. It’s a straightforward Windows installer. All my backup transfers are done inside our secure network, so I did not install the OpenSSH part of cwRsync. If you want to do snapshots across the internet, you should install that component and set up keys so your transfers will be encrypted.
  • Start the cwRsync service: go to the Services applet in your Control Panel and start the cwRsync service. Set it to Automatic so it will start on boot.

Configuration

In the Start menu on your Windows machine you’ll find an entry for “cwRsync Server”, and in there is a shortcut to the rsyncd.conf file. In this file you need to set up which files are allowed to be accessed through rsync. Rsync calls a group of accessible files a ‘module’. My module is called websites:

[websites]
path = /cygdrive/d/websites/
read only = true
transfer logging = yes

That shares out the D:\websites area. Notice the Cygwin naming convention for accessing your drives: D:\ becomes /cygdrive/d/. You’ll need to restart the cwRsync service after changing this file.

To test your Windows rsync setup, run this command on your Linux server: “rsync mywindowsserver.mydomain.com::” (substituting your own host name). This should list the available modules on that machine:

[root@web-dev3 ~]# rsync cf7dev.cfwebtools.com::
websites
[root@web-dev3 ~]#

Next configure the /etc/rsnapshot.conf file on your Linux server. Here’s a tip: parameters in this config file must be separated by tabs, not spaces. This allows you to easily specify spaces in your file paths.
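Since tabs and spaces look the same in most editors, a quick check can save some head scratching. The following is only a sketch: it writes a throwaway file rather than the real /etc/rsnapshot.conf, the third line is deliberately wrong, and the grep check is my own rough heuristic, not something rsnapshot provides.

```shell
#!/bin/sh
# Illustration only: $conf is a temporary stand-in for /etc/rsnapshot.conf.
set -e
conf=$(mktemp)
tab=$(printf '\t')

# printf turns \t into a real tab character, which avoids editors that
# silently convert tabs to spaces.
printf 'interval\tdaily\t7\n'  >  "$conf"
printf 'interval\tweekly\t4\n' >> "$conf"
printf 'interval monthly 2\n'  >> "$conf"   # deliberately wrong: spaces

# Number the non-comment, non-blank lines, then keep only those that
# contain no tab at all. Anything printed here needs fixing.
bad_lines=$(grep -n -v -e '^#' -e '^$' "$conf" | grep -v "$tab" || true)
echo "$bad_lines"
```

On the real file, rsnapshot’s own “rsnapshot configtest” command is the more thorough check, since it parses the whole configuration.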

The first thing I changed was the snapshot_root directive. I pointed this to the area on my Linux server where I wanted the snapshots to be stored. I put them in an area where we have a Samba share so any developer can easily browse the snapshots.

Then scroll down to the BACKUP INTERVALS section. Define here the resolution of your snapshots. Here’s what I have (the fields are separated by tabs):

interval	daily	7
interval	weekly	4
interval	monthly	2

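Note that they must be listed in order from most frequent to least frequent, because each interval is rotated out of the oldest snapshot of the one above it (weekly.0 is made from the oldest daily snapshot, and so on).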

Then in the BACKUP POINTS / SCRIPTS section, define what you want backed up. Here’s my entry for the Windows server mentioned above (again tab-separated):

backup	cf7dev.cfwebtools.com::websites	cf7dev

The second parameter is the server and the module, and the third is the directory in which to place the snapshot. This directory is created under the snapshot_root declared earlier.

The last step is to add rsnapshot to cron. You need to call rsnapshot every day with the daily parameter, every week with the weekly parameter, etc. So every day you should run “rsnapshot daily”, every week you should run “rsnapshot weekly”, and so on. I just added these commands to scripts in my cron.daily, cron.weekly, and cron.monthly directories. To test my configuration I just manually ran “rsnapshot weekly”.

That’s it! It takes less than an hour to set up, and you’ll have easily accessible snapshots to refer back to when something goes wrong.
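When something does go wrong, getting a file back is just a copy. Here is a rough sketch of what that looks like, using a fake snapshot tree in a temporary directory. The paths, the file name index.cfm, and the list_versions helper are all hypothetical; only the daily.N/cf7dev layout follows the configuration above.

```shell
#!/bin/sh
# Illustration only: $root stands in for snapshot_root and $live for the
# website directory. Every path and file name here is made up.
set -e
root=$(mktemp -d)
live=$(mktemp -d)
mkdir -p "$root/daily.0/cf7dev" "$root/daily.1/cf7dev"
echo "good copy"   > "$root/daily.1/cf7dev/index.cfm"
echo "broken copy" > "$root/daily.0/cf7dev/index.cfm"
cp "$root/daily.0/cf7dev/index.cfm" "$live/index.cfm"

# Hypothetical helper: show every snapshot's copy of one file, so you can
# compare sizes and dates to spot where it changed.
list_versions() {
    for f in "$root"/*/"$1"; do
        if [ -e "$f" ]; then
            ls -l "$f"
        fi
    done
}
list_versions cf7dev/index.cfm

# Restore by copying OUT of the snapshot; -p keeps the original timestamp.
cp -p "$root/daily.1/cf7dev/index.cfm" "$live/index.cfm"
cat "$live/index.cfm"
```

The snapshot tree is only ever read from here. Because unchanged files are shared between snapshots through hard links, editing a file inside the tree would alter it in every snapshot that shares it.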

11 Responses to “Snapshot backups”

  1. Cory Says:

    Nicely written article. I came across this page by your comment on my blog, also about backing up computers with rsnapshot.

  2. Kyle Piper Says:

    Very nice. When I move to Ubuntu Linux, I’ll keep this in mind.

  3. Sexy Bern Says:

    To be fair to rsnapshot virgins, the following should be noted.

    rsnapshot is not magic. It’s a very well-structured wrapper around “cp -al” and “rsync”.

    rsnapshot uses Linux hard links. If you ever edit any of the files in the “rsnapshot tree”, you will hose all other links to it. Treat your rsnapshot tree as read-only, warts ’n’ all.

    rsnapshot uses rsync. rsync doesn’t copy changes as such, it synchronises trees. If you delete a load of files in the “source tree” then the corresponding files will be deleted in the “rsnapshot tree”. This won’t affect previous snapshots, only the one that’s in progress.

    If you move or rename a directory in the “source tree” you will break “true” synchronisation at the same point in the “rsnapshot tree” - rsync will delete the sub-tree under the old name and create a new tree under the new name (since you can’t hard link directories). eg.

    foo/bar/(5 gigs of data) -> foo/wibble/(5 gigs of data)

    Here, “bar” was renamed to “wibble” and you lose the benefit of 5 gigs of hard links.

    You can avoid this problem if you know in advance that it’s going to take place. Go into your most-frequently-created rsnapshot (eg. daily.0) and do the corresponding “mv bar wibble” before rsnapshot runs. Nothing will be broken as the “wibble” directory now exists in both places and rsync won’t go through the delete/create phase.

    I’ve used rsnapshot to synchronise trees with literally MILLIONS of files in them. It needs RAM but it works a treat.

  4. David Cantrell Says:

    Some of the most common questions people have about rsnapshot are about how to back up Windows machines, so thanks - I’ll link to you from the website shortly :-)

    One point needs clarification though - rsnapshot doesn’t only run on Linux. We aim to support any operating system and filesystem that supports hard links. That includes *BSD, Solaris, Irix, AIX, HPUX and others.

  5. David Keegel Says:

    It is also a good idea to add “hosts allow = …” in rsyncd.conf on the windows machine to restrict the IP addresses which can connect to the rsync server. And possibly also to add “auth users = …” (and probably “secrets file = …”) if you would like user/password authentication. By default rsync server allows anonymous access from anywhere.

    Or for advanced users, you could set rsyncd.conf to have “hosts allow = 127.0.0.1” and access the rsync server through an ssh tunnel (e.g. from Linux run ssh -L 873:localhost:873 cf7dev and then have rsnapshot do a backup of localhost::websites to cf7dev). That would also mean the network traffic would be encrypted.

  6. LD Says:

    David Cantrell,

    NTFS does support hard links and junctions (like soft links). In theory it should be possible to perform a similar function on windows that you do with *nix systems.

  7. Matthew Says:

    This solution works great; however, is there any way you can initiate this process from the client being backed up, instead of vice versa? In my scenario the clients being backed up are behind routers with dynamic IPs, hence I do not have a static IP to point the host at, nor the ability to always configure the router appropriately.

    ?

  8. Ryan Stille Says:

    Matthew - I don’t think so. You can initiate rsync from the client, which is what rsnapshot uses… you may be able to hack it together.

    One option would be to rsync data from the client into a temporary place on your rsync backup server. Then use rsnapshot (run from the backup server) to snapshot that data, then delete it.

  9. David Keegel Says:

    If you are prepared to do some extra work and know what you’re doing, the link below describes a way you can do “push” backups with rsnapshot (in a manner of speaking) :-

    http://lists.samba.org/archive/rsync/2007-December/019470.html

    I haven’t tried it myself.

  10. Terry Barnum Says:

    To expand on David Cantrell’s statement that rsnapshot doesn’t only run on Linux, it’s been working great for us on Mac OS X (10.3, 10.4 & 10.5).

  11. Rax Says:

    Rsnapshot now seems to be available for Linux as well.. though only in RPM (for the moment), Debian appears to be “Coming soon”
