Automatic backup plan for Linux servers using rsync and crontab

Linux servers are widely used by companies to host websites, databases, and other services.  I personally like Linux much more than Windows Server: not only is Linux free, but it also offers better performance and more powerful tools.  To set up a Linux server, you don’t always need a physical machine; subscribing to a VPS from Linode, DigitalOcean, or Amazon can be a better choice.  Though most cloud server providers offer backup or snapshot services (usually not free), it’s still wise to have your own backup plan.

The rsync command is an ideal tool for copying and synchronizing files and directories to a remote computer, while the crontab command schedules jobs to be executed periodically.  Combining these two commands, we can set up a lightweight and effective backup solution.

Because Linux distributions differ, this article uses a CentOS/Red Hat system as the example for setting up the backup plan.  Before that, here are several questions to think about:

First, what data should be backed up?

Generally, we only need to back up data that is important to us, such as website pages, databases, configuration files, and personal data.  It is usually unnecessary to back up data such as Linux system files and installed software.

Here are several directories that need attention:

  • /etc directory:  Though some files in this directory don’t need to be backed up, I wouldn’t bother to pick them out.  Since the whole directory is usually no more than 50 megabytes, it doesn’t hurt to back it up entirely.
  • /home directory:  This is the location of personal data for all accounts (except root), so the backup plan should obviously cover it.  However, there is a problem: this directory also holds lots of cache data, log files, downloaded software, and history records, which are meaningless to back up.  Rather than backing up the whole /home directory, it is better to put only specific subdirectories, such as /home/someone/gitdata or /home/anotherone/documents, on your backup list.
  • /var/www directory:  This is the default directory for website files. (If your web files live elsewhere, find them and add them to your backup list.)
  • /var/spool/mail:  This is where mail data is stored, and it should definitely be backed up.
  • /var/lib/mysql:  This directory holds the MySQL database data.  Include it in your backup list.

You may have other utilities or service data scattered across other directories that need to be backed up; think carefully and find them before you take action.
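Before finalizing the list, it can help to check how large each candidate directory actually is. This is an optional sketch, not part of the original plan; the paths are just the examples from above, so adjust them to your system:

```shell
# Check the size of each candidate directory before adding it to the list
# (directories that don't exist on your system are silently skipped).
du -sh /etc /var/www /var/spool/mail /var/lib/mysql 2>/dev/null

# Spot unexpectedly large subtrees (caches, downloads) worth excluding:
du -xh --max-depth=2 /home 2>/dev/null | sort -rh | head -10
```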

Second, full backup or incremental backup?

If you want to fully back up all the data listed above every time, you can write a bash script that archives the files with the Linux tar command and then sends (scp) the tarball to the backup location. This method works well within a local area network, but it may not be feasible for backing up a remote VPS to a local computer, because hundreds or even thousands of megabytes would be transferred between the two machines each time, which wastes bandwidth and disk storage.
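As a concrete illustration of the full-backup approach, here is a minimal sketch. It archives a scratch directory so it can run anywhere; on a real server the source would be the directories listed above, and the commented-out scp line (with a placeholder host) would ship the tarball to the backup machine:

```shell
#!/bin/sh
set -e

SRC=$(mktemp -d)                  # stand-in for /etc, /var/www, ...
echo "hello" > "$SRC/site.html"

STAMP=$(date +%Y%m%d)
TARBALL="/tmp/full-backup-$STAMP.tar.gz"

# -c create, -z gzip-compress, -f output file; -C avoids absolute paths
tar -czf "$TARBALL" -C "$SRC" .

# On a real setup, ship the tarball to the backup host:
# scp "$TARBALL" [email protected]:/var/ServerBackup/

tar -tzf "$TARBALL"               # list the archive contents to verify
```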

Incremental backup, which employs the rsync utility in Linux, transfers only the data modified since the last run. In most cases this is the right choice because of its efficiency and cost savings.

Third, where should the backup data be stored?

Generally, backup data should be stored on a remote computer: either another Linux VPS or a Linux machine inside your company.

Fourth, how to schedule an automatic backup plan?

A cron job is the best way to schedule a command to run periodically; for example, a backup script can be scheduled for midnight every day.

The following are the detailed instructions for making an automatic and incremental backup:

Host A: the active server with CentOS/Redhat system.

Host B: the backup server with CentOS/Redhat system.

I.)  Make sure rsync is installed on both Host A and Host B.  If it is not installed, install it with:

yum install rsync

II.) Log in to Host B as root (crontab effectively requires root permissions here; it can theoretically be done with a non-root account, but that is not easy), and create a directory to hold the backup data:

mkdir -p /var/ServerBackup

III.) Generate the SSH key pair:

ssh-keygen

Two files are generated in the /root/.ssh directory: id_rsa is the private key file, and id_rsa.pub is the public key file, which must be copied to Host A:

scp /root/.ssh/id_rsa.pub  [email protected]:/root/id_rsa.pub

IV.)  Log in to Host A as root and append the content of the id_rsa.pub file to the authorized_keys file.  If /root/.ssh/authorized_keys doesn’t exist on Host A, execute the following commands to create it first:

 mkdir -p /root/.ssh
 chmod 700 /root/.ssh
 touch /root/.ssh/authorized_keys
 chmod 600 /root/.ssh/authorized_keys

To append the public key generated on Host B to the authorized_keys file:

cat /root/id_rsa.pub >> /root/.ssh/authorized_keys

Now we can use scp or rsync to transfer data from Host A to Host B without being asked for a password.
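Before relying on passwordless login, it is worth double-checking the permissions, since sshd ignores an authorized_keys file (or .ssh directory) that is group- or world-writable. A small sketch, using a scratch directory so it can run anywhere; on the real host the path is /root/.ssh:

```shell
#!/bin/sh
set -e
D="$(mktemp -d)/.ssh"             # stand-in for /root/.ssh on Host A
mkdir -p "$D"
chmod 700 "$D"
touch "$D/authorized_keys"
chmod 600 "$D/authorized_keys"

stat -c '%a' "$D" "$D/authorized_keys"   # should print 700 then 600

# On the real hosts, verify from Host B that key auth works without a
# password prompt (BatchMode makes ssh fail fast instead of asking):
#   ssh -o BatchMode=yes -i /root/.ssh/id_rsa [email protected] echo ok
```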

V.)  Modify the /etc/crontab file to schedule the execution of the backup script.  Add this line at the end of the crontab file:

0 2 * * * root bash backup.sh     # run backup.sh every day at 2:00 AM
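For reference, the five leading fields of a system crontab line are minute, hour, day of month, month, and day of week, followed by the user and the command. A few illustrative variants (not part of the tutorial, just examples of the format):

```shell
# minute hour day-of-month month day-of-week user command
# 0 2 * * *    root bash backup.sh   # 02:00 every day (the line above)
# 30 3 * * 0   root bash backup.sh   # 03:30 every Sunday
# 0 */6 * * *  root bash backup.sh   # every six hours
```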

The content of the backup.sh script looks something like this:

#!/bin/sh

/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  [email protected]:/etc  /var/ServerBackup
/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  --exclude mysite/updraft  --exclude mysite/.cache  [email protected]:/var/www  /var/ServerBackup
........  (other similar commands)
/usr/bin/rsync -avz -e "ssh -i /root/.ssh/id_rsa"  [email protected]:/var/lib/mysql  /var/ServerBackup

This script is just a sample; modify it to fit your needs.  See the rsync man page for more options.

Now you can rest easy without worrying about data loss.


19 thoughts on “Automatic backup plan for Linux servers using rsync and crontab”

  1. I tested the command and it asks me to “Enter passphrase for key”.
    The bash script is executed on Server B, right?

    1. Yes, the script runs on Server B.
      @Tom, make sure the permission of /root/.ssh on A is 700 and the authorized_keys file’s permission is 600. Note: 777 or 666 will not work; it seems strange, but sshd refuses key files with overly loose permissions.

  2. I have to back up my data from a Linux database server to a remote storage space.

    What command should I use with rsync for that?

    Please advise if anybody knows.

    1. The rsync solution should work well for you: A is your database server, and B is the remote storage host.

    2. You need a mysqldump command added to your bash script, in the same script where you do the rsync job. Then you add it to crontab.

      Done

  3. It’s all good, but how can I add a layer of security here? I don’t want the backup server to see what my data is. How can I encrypt all the data I send to Server B with this script?

  5. Hi Jim Zhai,
    Thank you so much for this short but to the point tutorial.

    The only issue I had was in the backup.sh file here: “ssh -i /root/.ssh/id_rsa.pub”. The id_rsa.pub key normally goes into the authorized_keys file on the host being logged into (Host A here), while the matching private key id_rsa is what belongs after -i: “ssh -i /root/.ssh/id_rsa”. Not sure if it was a typo, but I was not able to send data across until I made that change. Still, I learnt a great deal from this tutorial.

    Good job!!

    Jibagast

  6. I have tried to follow the instructions, but it still asks for Host A’s password. Please kindly advise. Appreciate the help.

  7. Thanks to the author for such a good tutorial.
    However, it is not clear at the end which machine is used to generate the key and which machine receives the public key. It would be much clearer if you specified these details.
    What’s more, it is kind of weird that you run the cron job on the backup server. My instinct is to run the cron job on the local machine, and ssh to the remote server without a password from there.

  8. Thanks for this tutorial. I’m new to Linux and have been asked to manage storage and backup servers.
    The data comes into an OEM Windows Server 2016 storage server with a VMware Linux virtual drive of 60TB, and I’ve been asked to keep backups for 3 months.
    So I plan to have two 140TB Linux servers with RAID 6, mirrored, as suggested by our IT.
    My question is: how do I write a bash script to mirror the incoming data onto both backup servers?
    1. Do I need to run a separate script from each backup server to rsync the incoming data?
    2. Can you have 2 jobs running rsync, moving the data across to each backup server at the same time?
    3. Will this slow down the data coming onto the storage server from the source, if the rsync scripts run at the same time?

    The data coming into the storage server is from an electron microscope and cannot hang or be queued, as that would put the automated acquisition software into an error state. We typically collect data for 24 hours, and this can be 5 days a week, so there is no spare time to rsync after data collection. The syncing of data for backup must not impact collection.

    Thanks in advance for your free time.
