Automatic backup of users’ files on a NAS device to an external USB HDD

One of my Linux machines is a 4-bay server that performs various roles, one of which is as NAS (network-attached storage) for family and visitors’ devices connected to my home network. I had configured each pair of HDDs in a RAID 1 array in order to provide some internal redundancy, but I was nervous about not having an external backup for users’ shares. Therefore I recently purchased a 6TB external USB 3.0 HDD (Western Digital Elements Desktop WDBWLG0060HBK-EESN) to connect permanently to one of the server’s USB 3.0 ports for backup purposes. I created a Bash script ~/backup_to_usbhdd.sh to perform the backup, plus a cron job to launch it automatically at 05:01 daily:

user $ sudo crontab -e
user $ sudo crontab -l | grep -v ^# | grep backup
01 05 * * * sudo /home/fitzcarraldo/backup_to_usbhdd.sh

The use of ‘sudo’ in the crontab entry may appear superfluous because the cron job was created for the root user (i.e. by using ‘sudo crontab -e’ rather than ‘crontab -e’). However, this is done to make cron use the root user’s environment rather than the minimal set of environment variables cron would otherwise use [1].
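
A minimal sketch for checking how a command behaves under a sparse environment similar to the one cron provides. The helper name run_like_cron and the variable values (in particular PATH=/usr/bin:/bin) are assumptions for illustration; the defaults of your own cron implementation may differ.

```shell
#!/bin/bash
# Hypothetical helper: run a command string with a stripped-down environment,
# roughly as cron would. The values below are assumptions, not cron's
# guaranteed defaults.
run_like_cron() {
    env -i HOME="$HOME" LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin SHELL=/bin/sh \
        /bin/sh -c "$1"
}

# Show the PATH that the command would see.
run_like_cron 'echo "$PATH"'
```

Commands that live in /sbin or /usr/sbin are not found with such a PATH unless they are called by their full path, which is one way a script can work from a terminal yet fail under cron.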

~/backup_to_usbhdd.sh

#!/bin/bash
#
# This script backs up to an external USB HDD (NTFS) labelled "Elements" the contents
# of the directories /nas/shares/ on my server.
# It can be launched from the server either manually using sudo or as a root-user cron
# job (Use 'sudo crontab -e' to configure the job).
#
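# Clean up if the backup did not complete last time:
umount /media/usbhdd 2>/dev/null
# Remove leftovers only if nothing is still mounted there; otherwise a failed umount would let rm wipe the backup:
mountpoint -q /media/usbhdd || rm -rf /media/usbhdd/*
# Unmount the external USB HDD if mounted by udisks2 with the logged-in username in the path:
umount /media/*/Elements 2>/dev/null
# Find out the USB HDD device:
DEVICE=$( blkid | grep "Elements" | cut -d ":" -f1 )
# Create a suitable mount point if it does not already exist, and mount the device on it:
mkdir -p /media/usbhdd
mount -t ntfs-3g "$DEVICE" /media/usbhdd 2>/dev/null
sleep 10s
# Stop if the USB HDD is not mounted, so that the copy cannot land on the root filesystem:
if ! mountpoint -q /media/usbhdd; then
    echo "USB HDD not mounted; backup aborted" >> /home/fitzcarraldo/backup_to_usbhdd.log
    exit 1
fi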
# Create the backup directories on the USB HDD if they do not already exist:
mkdir -p /media/usbhdd/nas 2>/dev/null
# Backup recursively the directories and add a time-stamped summary to the log file:
echo "********** Backing up nas shares directory **********" >> /home/fitzcarraldo/backup_to_usbhdd.log
date >> /home/fitzcarraldo/backup_to_usbhdd.log
# Need to use rsync rather than cp, so that I can rate-limit the copying to the USB HDD:
rsync --recursive --times --perms --links --compress --bwlimit=22500 /nas/shares /media/usbhdd/nas/ 2>> /home/fitzcarraldo/backup_to_usbhdd.log
# No --delete option is used, so that any backed-up files deleted on the server are not deleted from the USB HDD.
echo "Copying completed" >> /home/fitzcarraldo/backup_to_usbhdd.log
date >> /home/fitzcarraldo/backup_to_usbhdd.log
df -h | grep Filesystem >> /home/fitzcarraldo/backup_to_usbhdd.log
df -h | grep usbhdd >> /home/fitzcarraldo/backup_to_usbhdd.log
echo "********** Backup completed **********" >> /home/fitzcarraldo/backup_to_usbhdd.log
cp /home/fitzcarraldo/backup_to_usbhdd.log /media/usbhdd/
# Unmount the USB HDD:
umount /media/usbhdd
exit 0

The initial version of the above script used ‘cp’ rather than ‘rsync’, which worked fine when I launched the script manually:

user $ sudo ./backup_to_usbhdd.sh

However, the script always failed when launched as a cron job. In this case the command ‘df -h’ showed the root filesystem on the server was ‘100% used’ (full). Also, the mount point directory /media/usbhdd/ had not been unmounted. The log file had twenty or so lines similar to the following, indicating the script had failed due to the root filesystem becoming full:

cp: failed to extend ‘/media/usbhdd/nas/user1/Videos/20130822_101433.mp4’: No space left on device

Apparently data was being read from the server’s HDD into the RAM buffer/cache faster than it could be written to the external HDD. The bottleneck in this case is not USB 3.0, but the USB HDD itself. The specifications for the USB HDD do not mention drive write speed, but a quick search of the Web indicated that an external USB HDD might have a write speed of around 25 to 30 MBps (Megabytes per second). I do not know why the problem happened only when the script was launched as a cron job, but I clearly needed to throttle the rate of writing to the external HDD. Unfortunately the ‘cp’ command does not have such an option, but the ‘rsync’ command does:

--bwlimit=RATE          limit socket I/O bandwidth

where RATE is in KiB per second if no units are specified. I opted for a rate of 22500 KiB per second (about 23 MBps) to be safe, which is not too far below the aforementioned 25 MBps. Indeed, with this limit the script runs to completion successfully when launched by cron:

user $ cat backup_to_usbhdd.log
********** Backing up nas shares directory **********
Thu Sep 13 05:01:26 BST 2018
Copying completed
Thu Sep 13 11:41:31 BST 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf1       5.5T  386G  5.1T   7% /media/usbhdd
********** Backup completed **********
********** Backing up nas shares directory **********
Fri Sep 14 05:01:26 BST 2018
Copying completed
Fri Sep 14 05:20:08 BST 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf1       5.5T  403G  5.1T   8% /media/usbhdd
********** Backup completed **********
********** Backing up nas shares directory **********
Sat Sep 15 05:01:26 BST 2018
Copying completed
Sat Sep 15 05:04:58 BST 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf1       5.5T  404G  5.1T   8% /media/usbhdd
********** Backup completed **********
********** Backing up nas shares directory **********
Sun Sep 16 05:01:26 BST 2018
Copying completed
Sun Sep 16 05:15:14 BST 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf1       5.5T  416G  5.1T   8% /media/usbhdd
********** Backup completed **********
********** Backing up nas shares directory **********
Mon Sep 17 05:01:26 BST 2018
Copying completed
Mon Sep 17 05:04:15 BST 2018
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf1       5.5T  416G  5.1T   8% /media/usbhdd
********** Backup completed **********

Notice that the first job listed in the log file took much longer than subsequent jobs. This was because rsync had to copy every file to the external USB HDD. In subsequent runs it only had to copy new files and files that had changed since they were last copied.
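
The figures in the log can be checked with a little arithmetic. The sketch below uses hypothetical helper names; it assumes the ‘386G’ reported by df is in GiB and that all of it was written during the first run.

```shell
#!/bin/bash
# Convert a clock time HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Elapsed seconds between two times on the same day.
elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Convert a rate in KiB per second to MB per second (two decimal places).
kib_to_mb() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k * 1024 / 1000000 }'
}

# Shortest possible copy time in seconds for a size in GiB at a limit in KiB/s.
min_copy_seconds() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

elapsed_seconds 05:01:26 11:41:31   # first run in the log
elapsed_seconds 05:01:26 05:20:08   # second run in the log
kib_to_mb 22500                     # the --bwlimit value expressed in MB/s
min_copy_seconds 386 22500          # lower bound for the first run
```

The first run took 24005 seconds (6 hours 40 minutes) and the second 1122 seconds (under 19 minutes). At the 22500 KiB per second limit, which is 23.04 MB per second, copying 386 GiB cannot take less than 17988 seconds, i.e. just under 5 hours, so the observed duration of the first run is consistent with the throttle.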

The disk in the external USB HDD spins down after 10 minutes of inactivity and the drive goes into Power Saver Mode. Its LED blinks to indicate the drive is in this mode. Therefore the cron job only spins up and down the external HDD once per day.
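
The device lookup in the script matches any blkid line that contains the text "Elements", which could in principle match more than one line. A stricter lookup can be sketched as follows; the helper name device_for_label is hypothetical, and the sample lines in the usage comment are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read blkid-style lines on standard input and print the
# device whose filesystem LABEL is exactly the given name.
device_for_label() {
    # The leading space stops PARTLABEL="..." from matching.
    grep -F " LABEL=\"$1\"" | head -n 1 | cut -d ":" -f1
}

# Usage in the backup script (run as root so that blkid lists every device):
#   DEVICE=$( blkid | device_for_label "Elements" )
```

The blkid command also has a -L option that looks up a device by label directly, which may be simpler if it behaves as expected on your system.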

Reference
1. Why does root cron job script need ‘sudo’ to run properly?


My personal backup strategy

There are plenty of tools to back up your personal files, and all sorts of backup strategies (such as performing a full backup once a week and incremental daily backups in between), but it seems that no two people have precisely the same approach, and there are many tutorials in books and on the Web on the subject of backing up in Linux. Anyway, here I explain my personal way of making backups, which fits my modest requirements and may be of interest to someone. If this post does nothing other than spur someone to back up their data, whatever the method, then I’ll be happy.

A recent round-up of eight GUI backup tools was published in the December 2010 issue of Linux Format magazine. It is also readable in the on-line article Best Linux backup software: 8 tools on test. Some of these GUI tools are front-ends for command line tools such as rsync and rdiff. You might want to evaluate those GUI tools, as they can make the backup process much easier for the desktop user.

Rather than backing up specific directories of files or individual files, some people prefer to create images of the whole hard disk or of specific partitions. Linux also offers some choices there, such as Clonezilla or Partimage (see, for example, the Sabayon Linux Wiki article HOWTO: Disk imaging using Partimage). I do make backup images of my partitions from time to time, but my personal files change too often for that method to be viable as my main backup strategy. And, to be honest, I find it rather inconvenient and time consuming.

GUI backup tools for Linux have improved significantly over the last three years. Some years ago my search came up blank for a Linux GUI tool having the simplicity and reliability of the shareware GUI tool I used with Windows XP. Things have certainly changed since then, but I continue to use the simple method I put together based on an rsync command given in Scott Granneman’s excellent Linux Phrasebook. Being paranoid, I eschewed encrypted backups, having had trouble recovering files from encrypted backups in the past. Anyway, without further ado, here is my backup strategy. I’m using KDE 4, but the principles apply whatever the desktop environment.

MY OBJECTIVES

This is my ‘functional specification’.

  1. Back up the contents of the Linux /home directory and the contents of a Thunderbird e-mail directory on an NTFS partition shared between Thunderbird for Windows and Thunderbird for Linux. I’m not interested in backing up files in the system directories such as /etc as I can reinstall if necessary. Any configuration files in /etc and the other system directories that I want to back up I simply copy to a directory in my /home directory, so that they are backed up with the other files in the /home directory.

  2. Have two separate external hard drives, and back up to them alternately.

  3. Copy any new files to the backup drive, copy any files that have changed since the previous backup to that drive and overwrite the corresponding files on the backup drive, and delete files from the backup drive if they were deleted from my laptop. This strategy is referred to as ‘incremental backup’, because it does not copy files that have not changed.

  4. Log the date and time of completion of each backup session in a text file.

  5. Launch a backup session manually by double-clicking on an icon on the Desktop. As I am often away from home with my laptop, I prefer to launch a backup session manually, rather than automatically using e.g. a cron job.

HARDWARE

I have two external USB hard drives, each with 1 TB capacity, connected via a USB hub to my main laptop when I’m at home. Both are off-the-shelf NTFS drives. I gave them the volume names USBHDD01 and USBHDD02 (this can be done via Linux using, e.g. GParted, or via Windows).

SOFTWARE

I have two shell scripts backup1 and backup2 in my home directory:

~/backup1

#!/bin/bash
sudo rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete /home/fitzcarraldo/ /media/USBHDD01/Backup_of_Mesh_DX_Linux_1

sudo rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete "/media/Windows7/Documents and Settings/Fitzcarraldo/Application Data/Thunderbird/" /media/USBHDD01/Backup_of_Mesh_Thunderbird_1

echo "Mesh DX Linux back up 1" >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log
date >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log

echo "Backup completed"
date

~/backup2

#!/bin/bash
sudo rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete /home/fitzcarraldo/ /media/USBHDD02/Backup_of_Mesh_DX_Linux_2

sudo rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete "/media/Windows7/Documents and Settings/Fitzcarraldo/Application Data/Thunderbird/" /media/USBHDD02/Backup_of_Mesh_Thunderbird_2

echo "Mesh DX Linux back up 2" >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log
date >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log

echo "Backup completed"
date

I made them both executable:

$ chmod +x ~/backup1
$ chmod +x ~/backup2

I created two Desktop Configuration Files on my Desktop: Backup_1 backs up to USBHDD01, and Backup_2 backs up to USBHDD02.

~/Desktop/Backup_1

[Desktop Entry]
Comment[en_GB]=Backup /home directory to Seagate USB external HDD 01
Comment=Backup /home directory to Seagate USB external HDD 01
Encoding=UTF-8
Exec=sh /home/fitzcarraldo/backup1
GenericName[en_GB]=Backup /home to USBHDD01
GenericName=Backup /home to USBHDD01
Icon=/home/fitzcarraldo/Pictures/Icons/save_all.png
MimeType=
Name[en_GB]=Backup_1
Name=Backup_1
Path=
StartupNotify=true
Terminal=true
TerminalOptions=\s--noclose
Type=Application
X-DBUS-ServiceName=
X-DBUS-StartupType=none
X-DCOP-ServiceType=
X-KDE-SubstituteUID=false
X-KDE-Username=

~/Desktop/Backup_2

[Desktop Entry]
Comment[en_GB]=Backup /home directory to Seagate USB external HDD 02
Comment=Backup /home directory to Seagate USB external HDD 02
Encoding=UTF-8
Exec=sh /home/fitzcarraldo/backup2
GenericName[en_GB]=Backup /home to USBHDD02
GenericName=Backup /home to USBHDD02
Icon=/home/fitzcarraldo/Pictures/Icons/save_all.png
MimeType=
Name[en_GB]=Backup_2
Name=Backup_2
Path=
StartupNotify=true
Terminal=true
TerminalOptions=\s--noclose
Type=Application
X-DBUS-ServiceName=
X-DBUS-StartupType=
X-DCOP-ServiceType=
X-KDE-SubstituteUID=false
X-KDE-Username=

I downloaded a pretty PNG icon from the Web for the two Desktop Configuration Files, and stored it in a convenient directory, declared in the Desktop Configuration Files.

To launch a backup session I just need to double-click on the relevant icon.

Each of the two scripts records in a text file ~/Desktop/Mesh_DX_Linux_back_up.log the date and time when it completed. The next time I launch a backup, the event is appended to the log. An example of the contents of this log file is shown below:

Mesh DX Linux back up 1
Tue May 18 01:12:00 BST 2010
Mesh DX Linux back up 2
Fri May 28 19:47:33 BST 2010
Mesh DX Linux back up 1
Thu Jul  1 17:28:43 BST 2010
Mesh DX Linux back up 2
Sat Jul 31 01:03:03 BST 2010
Mesh DX Linux back up 1
Thu Aug 19 20:55:18 BST 2010
Mesh DX Linux back up 2
Sat Sep 11 19:13:12 BST 2010
Mesh DX Linux back up 1
Sun Sep 19 19:42:30 BST 2010
Mesh DX Linux back up 2
Fri Sep 24 23:51:19 BST 2010
Mesh DX Linux back up 1
Sun Sep 26 21:23:28 BST 2010
Mesh DX Linux back up 2
Tue Oct  5 01:19:18 BST 2010
Mesh DX Linux back up 1
Fri Oct  8 09:59:13 BST 2010
Mesh DX Linux back up 2
Fri Oct 22 11:36:29 BST 2010
Mesh DX Linux back up 1
Mon Nov  1 15:49:07 GMT 2010
Mesh DX Linux back up 2
Fri Dec  3 02:43:59 GMT 2010

By looking in this file I can see when I performed the latest backup and which of the two icons I need to double-click for my next backup.
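
The same check can be done from the command line. This is only a sketch with a hypothetical helper name, next_backup; it assumes the log lines have exactly the form shown above.

```shell
#!/bin/bash
# Hypothetical helper: print 1 or 2, the backup to run next, based on the
# last "Mesh DX Linux back up N" line in the given log file.
next_backup() {
    last=$(grep -E '^Mesh DX Linux back up [12]$' "$1" 2>/dev/null | tail -n 1 | awk '{ print $NF }')
    if [ "$last" = "1" ]; then
        echo 2
    else
        echo 1   # also the answer when the log is empty or missing
    fi
}

# Usage:
#   next_backup ~/Desktop/Mesh_DX_Linux_back_up.log
```

With the example log above, whose last entry is for back up 2, the helper prints 1.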

If my house is to be unoccupied for any length of time, I take one of the drives or the laptop with me, or leave them in a secure remote location. So, if the house were to burn down or a thief were to break in, at least one hard disk with my work files on it would still be in my possession.

ENHANCEMENTS

Of course, I could increase the number of hard drives in order to enable me to retrieve files from further back in time, but my current strategy fits my needs. Your needs may be different to mine, so adapt the strategy as you wish. If you are not backing up your data, please do something about it now! I have been saved a few times in the past by retrieving a file from one of my backup drives that I inadvertently deleted or modified on my laptop.

EDIT (April 18, 2012): Here is a small improvement to my original scheme. It is better to move the sudo command to the Desktop Configuration File so that you only have to enter your password once and can leave the backup unattended. In my original scheme, if the number of files to back up is large enough, the sudo timeout would mean you would have to re-enter your password when the second sudo command is executed by the shell script. So I changed the two shell scripts and the two Desktop Configuration Files slightly as shown below. I won’t bother listing ~/backup2 and ~/Desktop/Backup_2 as the change is the same.

~/backup1

#!/bin/bash
echo -n "Mount the backup drive and then press ENTER: "
read ACKNOWLEDGE

echo "********** Backing up my home directory **********"

rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete /home/fitzcarraldo/ /media/USBHDD01/Backup_of_Mesh_DX_Linux_1

echo "********** Backing up my Thunderbird directory **********"

rsync --verbose --progress --stats --recursive --times --perms --links --compress --delete "/media/Windows7/Documents and Settings/Fitzcarraldo/Application Data/Thunderbird/" /media/USBHDD01/Backup_of_Mesh_Thunderbird_1

echo "Mesh DX Linux back up 1" >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log
date >> /home/fitzcarraldo/Desktop/Mesh_DX_Linux_back_up.log

echo "********** Backup completed **********"
date

~/Desktop/Backup_1

[Desktop Entry]
Comment[en_GB]=Backup /home directory to Seagate USB external HDD 01
Comment=Backup /home directory to Seagate USB external HDD 01
Encoding=UTF-8
Exec=sudo sh /home/fitzcarraldo/backup1
GenericName[en_GB]=Backup /home to USBHDD01
GenericName=Backup /home to USBHDD01
Icon=/home/fitzcarraldo/Pictures/Icons/save_all.png
MimeType=
Name[en_GB]=Backup_1
Name=Backup_1
Path=
StartupNotify=true
Terminal=true
TerminalOptions=\s--noclose
Type=Application
X-DBUS-ServiceName=
X-DBUS-StartupType=none
X-DCOP-ServiceType=
X-KDE-SubstituteUID=false
X-KDE-Username=
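
The prompt in the revised ~/backup1 relies on the drive really being mounted before ENTER is pressed. If the mount point directory exists but nothing is mounted on it, rsync would copy into the root filesystem instead. A guard along the following lines could catch that; the function name require_mounted is hypothetical, and the sketch assumes the mountpoint utility from util-linux is installed.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if the given directory is a mount point.
require_mounted() {
    if mountpoint -q "$1"; then
        return 0
    fi
    echo "Nothing is mounted on $1; backup not started." >&2
    return 1
}

# Usage near the top of ~/backup1, after the ENTER prompt:
#   require_mounted /media/USBHDD01 || exit 1
```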