I’ve been wanting a way to easily recover a file that is accidentally deleted from one of our websites, either by us or by a client. It would also be useful to be able to get back to the state your code was in X days ago, for example when the client changes his mind about the direction you’ve been developing. Source control offers a solution to some degree, but it won’t help you if the client has access to the website and has changed a file there. And some shops just don’t use source control for all their projects.
Tape backups also offer a partial solution; I’ve had to pull a file off yesterday’s tape several times. But restoring from tape is a hassle, especially if it’s stored off site (which it should be!).
Enter rsnapshot. Rsnapshot is a Perl script that uses rsync to take snapshots of any set of files you want. Rsnapshot itself runs on Linux (and other Unix-like systems), but you can easily back up a machine running any OS with it.
It would be fairly easy to code a bash script or Windows batch file to copy ‘snapshots’ of your source code to a backup area. Rsnapshot does more than this: it only backs up files that have changed, while still offering you a view where you can see all the source code as it was yesterday, the day before, etc. This gives you a very easy way to pull up old files without taking up a lot of disk space. I’ve looked at other tools that do incremental backups to save disk space, but with those you must use their own special tools to pull the files back out, since all the incremental bits need to be put back together.
Rsnapshot solves this problem by using a feature of the Linux filesystem called hard links. A hard link is kind of like a Windows shortcut, except that the ‘shortcut’ appears to be the actual file in every way. The first time a snapshot is taken, all your files are copied to the daily.0 directory (or weekly.0 if you are only doing weekly, etc.). On the second run, that first snapshot is rotated to daily.1 and a new daily.0 is created; only the files that have changed are copied into it. For the files that have not changed, rsnapshot creates hard links to the copies already stored in the previous snapshot. So when you browse through any snapshot directory, it appears that all your files are there, even though the newest snapshot may have added only a few KB of data, depending on how many files changed.
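If hard links are new to you, here is a minimal sketch you can run in a scratch directory to see the idea. It is not rsnapshot's own code; the directory names snap.old and snap.new and the file index.html are made up for the demo.

```shell
#!/bin/sh
# Hard-link demo in a throwaway directory; every name below is hypothetical.
demo=$(mktemp -d)
mkdir "$demo/snap.old" "$demo/snap.new"

# The older snapshot holds the only real copy of the file.
echo "hello" > "$demo/snap.old/index.html"

# The file is unchanged, so link it into the newer snapshot instead of copying.
ln "$demo/snap.old/index.html" "$demo/snap.new/index.html"

# -ef is true when both names refer to the same file on disk.
if [ "$demo/snap.new/index.html" -ef "$demo/snap.old/index.html" ]; then
    same_file=yes
else
    same_file=no
fi
echo "same file: $same_file"

# Deleting the older name does not delete the data; the other name still works.
rm "$demo/snap.old/index.html"
content=$(cat "$demo/snap.new/index.html")
echo "still readable: $content"

rm -rf "$demo"
```

The last step is why old snapshots can be thrown away safely: the data only disappears once the last name pointing at it is removed.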
You can configure rsnapshot to back up on a very flexible schedule. I have mine set up to back up once a day for 7 days, once a week for 4 weeks, and once a month for 3 months. That is, I can look back day by day for 7 days, then after that I only have one snapshot per week for 4 weeks, then after that I only have one snapshot per month, going back 3 months. Some people like to also back up hourly for 8 hours. Here’s what my backup area looks like:
Here is how you implement it:
- Install rsync on your Linux box if you don’t already have it. Chances are it’s already on there. Type ‘rsync’ at a shell to see if it’s installed. If it’s not, look to your OS documentation on how to install it. On Red Hat or a similar OS, use “yum install rsync”.
- Install rsnapshot. You can download it from rsnapshot.org. I downloaded the RPM file from there and then installed it with “yum localinstall rsnapshot-1.3.0-1.noarch.rpm”.
- Install the rsync daemon on the Windows machines you want to back up. The Windows port of rsync requires Cygwin to run. Cygwin is a Windows DLL that provides a lot of Linux functionality on a Windows platform. Many Linux-ported-to-Windows applications require this. Handily, there is a version of rsync for Windows that bundles the necessary Cygwin stuff with an rsync implementation, called cwRsync. Go to the cwRsync website and download the cwRsync Server. It’s a straightforward Windows installer. All my backup transfers are done inside our secure network, so I did not install the OpenSSH part of cwRsync. If you want to do snapshots across the internet, you should install that component and set up keys so your transfers will be encrypted.
- Start the cwRsync service: go to the Services applet in your Control Panel (which is located in “Administrative Tools”), and start the cwRsync service. Set it to Automatic so it will start up at boot.
In the Start menu on your Windows machine you’ll find an entry for “cwRsync Server”, and in there is a shortcut to the rsyncd.conf file. In here you need to set up which files are allowed to be accessed through rsync. Rsync calls a group of accessible files a ‘module’. My module is called websites:
[websites]
path = /cygdrive/d/websites/
read only = true
transfer logging = yes
That shares out the D:\websites area. Notice the Cygwin naming convention for accessing your drives. You’ll need to restart the cwRsync service after changing this file.
To test your Windows rsync setup, run this command on your Linux server: “rsync mywindowsserver.mydomain.com::”. This should list the available modules on that machine:
[root@web-dev3 ~]# rsync cf7dev.cfwebtools.com::
Next, configure the /etc/rsnapshot.conf file on your Linux server. Here’s a tip: parameters in this config file must be separated by tabs, not spaces. This is what allows file paths to contain spaces.
The first thing I changed was the snapshot_root directive. I pointed this to the area on my Linux server where I wanted the snapshots to be stored. I put them in an area where we have a Samba mapping so any developer can easily browse the snapshots.
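Since a stray space instead of a tab is easy to miss, here is a rough sketch of a check you could run before trusting the file. The function name check_tabs and the sample lines are my own invention, and the rule it applies (every line that is not blank or a comment should contain at least one tab) is only a heuristic.

```shell
#!/bin/sh
# check_tabs is a hypothetical helper: it prints every directive line
# that contains no tab character at all.
check_tabs() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }        # skip comments and blank lines
        !/\t/ { printf "line %d: %s\n", NR, $0 } # directive without a tab
    ' "$1"
}

# Demo on a temporary file; the contents are made up for the example.
conf=$(mktemp)
printf 'snapshot_root\t/backup/snapshots/\n' > "$conf"   # correct: tab separated
printf 'interval daily 7\n' >> "$conf"                   # wrong: spaces only
printf '# a comment\n' >> "$conf"

bad_lines=$(check_tabs "$conf")
echo "$bad_lines"
rm -f "$conf"
```

On your real file you would call it as check_tabs /etc/rsnapshot.conf. Rsnapshot also has a built-in “rsnapshot configtest” command that validates the config file, which is worth running after every edit.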
Then scroll down to the BACKUP INTERVALS section. Define here the resolution of your snapshots. Here’s what I have:
interval daily 7
interval weekly 4
interval monthly 2
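Note that they must be listed in order from most frequent to least frequent, because rsnapshot builds each less frequent snapshot by rotating out the oldest snapshot of the interval listed before it (the oldest daily becomes weekly.0, and so on).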
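To picture what happens to the numbered directories on each run, here is a simplified sketch of the rotation. It only mimics the renaming; it is not rsnapshot's actual implementation, and the rotate function and marker files are hypothetical.

```shell
#!/bin/sh
# rotate <dir> <name> <count>: drop the oldest snapshot, then shift each
# remaining one up by one slot so that <name>.0 is free for a new snapshot.
rotate() {
    dir=$1; name=$2; count=$3
    i=$((count - 1))
    rm -rf "$dir/$name.$i"                  # the oldest snapshot falls off the end
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -d "$dir/$name.$prev" ]; then
            mv "$dir/$name.$prev" "$dir/$name.$i"
        fi
        i=$prev
    done
}

# Demo: two existing snapshots, keeping 3 in total.
snap_root=$(mktemp -d)
mkdir "$snap_root/daily.0" "$snap_root/daily.1"
echo newest > "$snap_root/daily.0/marker"
echo older  > "$snap_root/daily.1/marker"

rotate "$snap_root" daily 3
mkdir "$snap_root/daily.0"                  # where the next snapshot would go

slot1=$(cat "$snap_root/daily.1/marker")
slot2=$(cat "$snap_root/daily.2/marker")
echo "daily.1 now holds: $slot1"
echo "daily.2 now holds: $slot2"
rm -rf "$snap_root"
```

The point to take away is that the .0 directory is always the most recent snapshot, and higher numbers are older.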
Then in the BACKUP POINTS / SCRIPTS section, define what you want backed up. Here’s my entry for the windows server mentioned above:
backup cf7dev.cfwebtools.com::websites cf7dev
The second parameter is the server and the module; the third parameter is the directory in which to place the snapshot. This is under the snapshot_root declared earlier.
The last step is to add rsnapshot to cron. You need to call rsnapshot every day with the daily parameter, every week with the weekly parameter, etc. So every day you should run “rsnapshot daily”, every week you should run “rsnapshot weekly”, etc. I just added these lines to my cron.daily, cron.weekly, and cron.monthly files. To test my configuration I just manually ran “rsnapshot weekly”.
That’s it! It takes less than an hour to set up and you’ll have easily accessible snapshots to refer back to when something goes wrong.
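If you would rather not paste a bare command into each cron file, a small wrapper like the following is one way to do it. This is only a sketch: the run_rsnapshot function and the RSNAPSHOT override variable are my own names, not part of rsnapshot, and the list of intervals assumes the daily/weekly/monthly setup described above.

```shell
#!/bin/sh
# Hypothetical cron wrapper. RSNAPSHOT may be set to another command for
# testing; by default the rsnapshot found on PATH is used.
run_rsnapshot() {
    case "$1" in
        daily|weekly|monthly)
            "${RSNAPSHOT:-rsnapshot}" "$1"
            ;;
        *)
            # Catches typos in the interval name before rsnapshot is called.
            echo "unknown interval: $1" >&2
            return 2
            ;;
    esac
}

# The copy placed in cron.daily would end with:
#   run_rsnapshot daily
# and the cron.weekly and cron.monthly copies with weekly and monthly.
```

Setting RSNAPSHOT=echo before calling the function lets you see which command would run without actually taking a snapshot.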