Transferring large amounts of data over unreliable network connections

Ever tried to transfer a batch of multi-gigabyte files through a slow DSL link and run into trouble? Then this article may help: it shows how to do this properly with standard Linux/Unix command-line tools, assuming SSH key authentication already works between the two hosts.

Let's look at the problems one may encounter when attempting to, say, transfer an 8 GiB video file over the Internet, uploading through a slow link:

The connection to the destination host may break at any time, due to a failure of any network equipment in between, DSL reconnects forced by the ISP, or your son pulling the plug of your home router, …
The data may get corrupted by faulty hardware or software along the way.
The upload may clog your ADSL connection, making it essentially unusable for anything else.
Transferring the data without encryption would allow possibly sensitive data to be read by others, e.g. someone at your ISP or near the destination host.
The transfer may take hours or days, depending on the ratio of data volume to link speed, so you will probably want to be notified when it finishes.

Here is my solution to all the above problems, in a couple of commands. Once entered, you don’t need to worry about the transfer – just wait for a notification e-mail:

ssh-agent bash
ssh-add
while ! rsync \
    --bwlimit <KB/s value> \
    -rP \
    /path/to/directory_that_contain_the_data_to_be_transferred \
    user@destination.host:/path/to/target_directory ; \
do sleep 60 ; done && \
echo "File transfer completed successfully at $(date)." | mail -s "File transfer completed" your@e-mail.address

Quick explanation of the commands:

Start a shell that is ssh-agent enabled, i.e. the SSH key passphrase can be cached within that shell
Unlock the SSH key by entering the passphrase (which is then cached for all following commands in this shell session)
Start a loop that will only end when the ‘rsync’ command (used as the loop’s ending condition) completes successfully, i.e. the file transfer is done.
rsync is the perfect tool for the job, since it transfers files reliably (through checksums), can resume efficiently, and all traffic is going through an encrypted SSH connection.
The --bwlimit option of rsync throttles the transfer to <KB/s value> kilobytes per second. With my 512 kbit/s (= 64 KB/s) ADSL upload, I would use a value of around 35 or 40, so that some upload bandwidth is always left for other things.
-r stands for recursive (transfer the whole directory, and all sub-directories), and -P keeps partially transferred files for resuming and shows progress information; for details see ‘man rsync’
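The source directory that contains the data to be transferred. Note that rsync treats a source path with a trailing slash differently from one without; for details see ‘man rsync’
The destination, given as user@host:/path, i.e. the SSH login on the destination host followed by the target directory there; for details see ‘man rsync’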
The actual content of the while loop is just “wait for 60 seconds”. So in case there is a connection problem, the ‘rsync’ command will be retried every minute.
Once the whole loop has completed successfully, which equals a successful transfer, send a short e-mail that notifies you about the completed transfer.
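
The arithmetic behind the --bwlimit value can also be scripted. The following is only a minimal sketch and not part of the original commands: the function name bwlimit_for and its percentage parameter are made up for illustration. It converts an upload speed in kbit/s to KB/s (a division by 8) and keeps the given share of it for rsync.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): suggest a value for --bwlimit.
# $1 = upload link speed in kbit/s
# $2 = percentage of the link that rsync may use
bwlimit_for() {
    link_kbit=$1
    percent=$2
    # kbit/s -> KB/s is a division by 8, the percentage a division by 100;
    # shell integer arithmetic rounds the result down.
    echo $(( link_kbit * percent / 800 ))
}

# Example: 512 kbit/s uplink, let rsync use 60 percent of it.
bwlimit_for 512 60
```

For the 512 kbit/s link mentioned above, a share of 60 percent yields 38, which lies within the range of 35 to 40 that I would pick by hand.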

(By the way, you can of course put lines 3 to 9 on a single line; I only split them up to make the command easier to read and explain.)
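
If you use the retry loop often, it can be wrapped in a small shell function. This is a sketch under assumptions, not the command from above: the names retry_until_success and flaky_transfer are hypothetical, and flaky_transfer is merely a stand-in for rsync that fails twice before succeeding, so that the example runs without a network connection.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it exits with status 0,
# sleeping a fixed number of seconds after each failed attempt.
# $1 = seconds to wait between attempts, remaining arguments = the command.
retry_until_success() {
    delay=$1
    shift
    attempts=1
    until "$@"; do
        echo "Attempt $attempts failed, retrying in $delay s" >&2
        attempts=$(( attempts + 1 ))
        sleep "$delay"
    done
    echo "Succeeded after $attempts attempt(s)"
}

# Stand-in for rsync (an assumption for this demo only): it counts its
# calls in a temporary file and succeeds from the third call onwards.
counter=$(mktemp)
echo 0 > "$counter"
flaky_transfer() {
    n=$(( $(cat "$counter") + 1 ))
    echo "$n" > "$counter"
    [ "$n" -ge 3 ]
}

# Demo run with a delay of 0 seconds so that it finishes immediately.
retry_until_success 0 flaky_transfer
rm -f "$counter"
```

For a real transfer, the function would be called with a delay of 60 and the complete rsync command, including its options, source and destination, as the remaining arguments. The notification e-mail can then be chained with && exactly as in the commands above.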
