Automatic backup plan for Linux servers using rsync and crontab

Many companies run Linux servers to host their websites, databases, and other services. I personally prefer Linux to Windows Server, not only because Linux is free, but also because it offers better performance and more powerful tools. You don’t always need a physical machine to run a Linux server; renting a VPS from Linode, DigitalOcean, or Amazon can be a better choice. Although most cloud providers offer backup or snapshot services (usually not free), it’s better to have your own backup plan as well.

The rsync command is an ideal tool for copying and synchronizing files and directories with a remote computer, while the crontab command is used to schedule jobs that run periodically. Combining the two, we can set up a lightweight and effective backup solution.

Because Linux distributions differ, this article uses a CentOS/Red Hat system as the example for setting up the backup plan. Before that, however, here are several questions to think about:

First, what data should be backed up?

Generally, we only need to back up the data that matters to us, such as website pages, databases, configuration files, and personal data. It is usually unnecessary to back up Linux system files or installed software, since these can be reinstalled.

Here are several directories that need attention:

  • /etc directory: Although some files in this directory don’t need to be backed up, I wouldn’t bother picking them out. Since the whole directory is usually no larger than 50 megabytes, it does no harm to back up all of it.
  • /home directory: This is where the personal data of all accounts (except root) is stored, so the backup plan should obviously cover it. There is a problem, however: this directory also holds a lot of cache data, log files, downloaded software, and history records, which are pointless to back up. Rather than backing up the whole /home directory, it is better to put only specific sub-directories, such as /home/someone/gitdata and /home/anotherone/documents, into your backup list.
  • /var/www directory: This is the default directory for website files. (If your web files are located elsewhere, find them and add them to your backup list.)
  • /var/spool/mail: This is where mail data is stored, and it should definitely be backed up.
  • /var/lib/mysql: This is the directory that holds the database data. Include it in your backup list, but keep in mind that files copied while the database server is running may not be in a consistent state.

You may have other utility or service data scattered across other directories that also needs to be backed up. Think carefully and locate it before you take action.
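Before settling on a backup list, it can help to see how large each candidate directory actually is. The following is a minimal sketch, assuming GNU du is available; the function name report_backup_sizes is made up for illustration, and the directory list simply mirrors the ones discussed above.

```shell
#!/bin/sh
# Print the size of each candidate backup directory that exists on this host.
# report_backup_sizes is a hypothetical helper name, not a standard command.
report_backup_sizes() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # du may hit unreadable sub-directories when not run as root.
            du -sh "$dir" 2>/dev/null || echo "warn: could not fully read $dir"
        else
            echo "skip: $dir (not found)"
        fi
    done
}

report_backup_sizes /etc /var/www /var/spool/mail /var/lib/mysql
```

Run it as root on the active server so that du can read every sub-directory; add your own directories as extra arguments.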

Second, full backup or incremental backup?

If you want a full backup of all the data listed above every time, you can write a shell script that archives the files with the tar command and then sends the tarball to the backup location with scp. This method works well within a local area network, but it may not be feasible for backing up a remote VPS to a local computer, because hundreds or even thousands of megabytes are transferred between the two computers each time. That is a waste of bandwidth and disk storage.
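As a rough illustration of the full-backup approach, the sketch below builds a dated tarball and shows, commented out, how it might be sent with scp. The function name make_full_archive, the archive naming scheme, and the host name B.com are assumptions made for this example only.

```shell
#!/bin/sh
# Create a dated, gzip-compressed tarball of the given directories.
# Usage: make_full_archive OUTPUT_DIR DIR...
# Prints the path of the archive on success.
make_full_archive() {
    out_dir=$1; shift
    archive="$out_dir/full-backup-$(date +%Y%m%d).tar.gz"
    tar -czf "$archive" "$@" && echo "$archive"
}

# Example usage (commented out: it needs root and a reachable backup host;
# B.com is a placeholder host name):
# archive=$(make_full_archive /tmp /etc /var/www)
# scp "$archive" root@B.com:/var/ServerBackup/
```

Every run produces and transfers a complete archive, which is exactly the cost the paragraph above describes.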

An incremental backup, done here with the rsync utility, transfers only the data that has been modified since the last run. In most cases this is the right choice because it is efficient and saves cost.
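The incremental behaviour is easy to observe locally before involving a second host. This small sketch syncs two temporary directories twice; the function name demo_incremental is invented for the example, and the --itemize-changes option is used only to make rsync list what it transfers.

```shell
#!/bin/sh
# Local demonstration: after the first sync, only the modified file is sent again.
demo_incremental() {
    command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; return 0; }
    src=$(mktemp -d); dst=$(mktemp -d)
    echo "one" > "$src/a.txt"
    echo "two" > "$src/b.txt"
    rsync -a "$src/" "$dst/"                     # first run copies both files
    echo "changed" > "$src/b.txt"                # modify one file only
    rsync -a --itemize-changes "$src/" "$dst/"   # second run lists only what changed
    rm -rf "$src" "$dst"
}

demo_incremental
```

The second rsync call should mention b.txt but not a.txt, since a.txt has the same size and modification time on both sides.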

Third, where should the backup data be stored?

Generally, backup data should be stored on a remote computer, either another Linux VPS or a Linux computer inside your company.

Fourth, how should the automatic backup be scheduled?

A cron job is the best way to run a command periodically. For example, a backup script can be scheduled to run at midnight each day.

The following are detailed instructions for setting up an automatic, incremental backup:

Host A: the active server, running CentOS/Red Hat.

Host B: the backup server, running CentOS/Red Hat.

I.)  Make sure rsync is installed on both Host A and Host B.  If it is not installed, install it with the command:

yum install rsync
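To find out whether the install step is needed at all, a quick check like the following can be run on each host. It is only a sketch; is_installed is a hypothetical helper name.

```shell
#!/bin/sh
# Return success if the named command can be found in PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

if is_installed rsync; then
    rsync --version | head -n 1
else
    echo "rsync is missing; install it with: yum install rsync"
fi
```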

II.) Log in to Host B as root (editing /etc/crontab requires root permission; a non-root account can work in theory, but it is not easy), and create a directory to hold the backup data:

mkdir -p  /var/ServerBackup

III.) Generate the SSH key pair on Host B (leave the passphrase empty so that the scheduled job can run unattended):

ssh-keygen

Two files are generated in the /root/.ssh directory: id_rsa is the private key file, and id_rsa.pub is the public key file, which must be copied to Host A:

scp /root/.ssh/id_rsa.pub  root@A.com:/root/id_rsa.pub

IV.)  Log in to Host A as root, and append the content of the id_rsa.pub file to the authorized_keys file.  If the /root/.ssh/authorized_keys file doesn’t exist on Host A, execute the following commands to create it first:

 mkdir -p /root/.ssh
 chmod 700 /root/.ssh
 touch /root/.ssh/authorized_keys
 chmod 600 /root/.ssh/authorized_keys

To append the public key generated on Host B to the authorized_keys file:

cat /root/id_rsa.pub >> /root/.ssh/authorized_keys

Now Host B can use scp or rsync to transfer data from Host A without being asked for a password.
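If a password prompt still appears, the permissions on Host A are a common culprit, because sshd by default ignores an authorized_keys file whose permissions are too open. The sketch below checks them; check_mode is a hypothetical helper, and it assumes GNU stat as shipped with CentOS.

```shell
#!/bin/sh
# Compare a path's octal permission bits with the expected value.
# Usage: check_mode PATH EXPECTED_MODE
check_mode() {
    actual=$(stat -c %a "$1") || return 1
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 is $actual"
    else
        echo "fix: $1 is $actual, expected $2"
        return 1
    fi
}

# On Host A (as root):
#   check_mode /root/.ssh 700
#   check_mode /root/.ssh/authorized_keys 600
# On Host B, confirm that key login works without any prompt:
#   ssh -o BatchMode=yes root@A.com 'echo key login works'
```

With BatchMode=yes, ssh fails instead of asking for a password, which is the same situation the cron job will be in.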

V.)  On Host B, modify the /etc/crontab file to schedule the backup script.  Add these lines to the end of the file (this assumes the script is saved as /root/backup.sh; adjust the path to wherever you keep it):

# Run the backup script backup.sh every day at 2:00 AM.
0 2 * * * root bash /root/backup.sh

The content of the backup.sh script is something like this:

#!/bin/sh

# The -i option must point to the private key (id_rsa), not the public key.
/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  root@A.com:/etc  /var/ServerBackup
/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  --exclude mysite/updraft  --exclude mysite/.cache    root@A.com:/var/www   /var/ServerBackup
# ........  (other similar commands)
/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  root@A.com:/var/lib/mysql   /var/ServerBackup

This script is just a sample; modify it to suit your needs.  See the rsync man page (man rsync) for more options.
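Because the script runs unattended, it is useful to record whether each transfer succeeded. One possible variation is sketched below: a small wrapper that prints a timestamped OK or FAIL line per directory. The function name sync_dir and the overridable variables REMOTE, DEST, KEY, and RSYNC are assumptions introduced for this example; note that the ssh -i option is given the private key file.

```shell
#!/bin/sh
# Hypothetical settings; override them through the environment if needed.
REMOTE=${REMOTE:-root@A.com}
DEST=${DEST:-/var/ServerBackup}
KEY=${KEY:-/root/.ssh/id_rsa}
RSYNC=${RSYNC:-/usr/bin/rsync}

# Pull one remote directory and print a timestamped result line.
# Arguments after the directory (such as --exclude patterns) go to rsync.
sync_dir() {
    src=$1; shift
    if "$RSYNC" -az -e "ssh -i $KEY" "$@" "$REMOTE:$src" "$DEST"; then
        echo "$(date '+%F %T') OK   $src"
    else
        echo "$(date '+%F %T') FAIL $src"
        return 1
    fi
}

# Example calls (commented out because they need a reachable Host A):
# sync_dir /etc
# sync_dir /var/www --exclude mysite/updraft --exclude mysite/.cache
```

Redirecting the script's output to a file from the cron entry would keep these lines as a simple backup log.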

Now you can rest easy without worrying about data loss.