Snapsync


Snapsync is a wrapper around rsync, enabling snapshot style backups of directory trees.



NAME

snapsync.pl


SYNOPSIS

snapsync.pl [OPTIONS] /path/to/dir /path/to/backups


DESCRIPTION

Snapsync is a wrapper around rsync, enabling snapshot-style backups of directory trees. Snapsync is used on numerous production systems, and works equally well on small data sets of only a few files, on huge systems with gigabytes of data spread across thousands of directories, and on anything in between.

In default mode, given a source dir and a destination dir, snapsync:

 1. looks in destdir for backup sets named backup.datex
 2. syncs srcdir against backup.datenow, using hardlinks where the data has not changed
 3. rotates or deletes sets based on the settings in the --rotation argument

The set backup.dateoldest is a full copy of the data, but all other backup sets are mostly hardlinks, except for any differences between sets. This means that if your data is size n, your entire set of backup sets is only size n + differences.
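The space saving described above can be reproduced by hand. The following is a minimal sketch of the hard-link idea using only ln, cp, and cmp; it is not how snapsync itself is implemented (snapsync delegates this work to rsync), and the set names backup.1 and backup.2 and the file names are made up for illustration.

```shell
# Sketch of hard-link snapshots; all names below are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/backup.1" "$work/backup.2"

# First set: a full copy of the data.
echo "stable"   > "$work/src/a.txt"
echo "version1" > "$work/src/b.txt"
cp "$work/src/a.txt" "$work/src/b.txt" "$work/backup.1/"

# The source changes: b.txt is modified, a.txt is not.
echo "version2" > "$work/src/b.txt"

# Second set: hard-link unchanged files to the previous set, copy changed ones.
for f in "$work"/src/*; do
    name=$(basename "$f")
    if cmp -s "$f" "$work/backup.1/$name"; then
        ln "$work/backup.1/$name" "$work/backup.2/$name"   # no extra data stored
    else
        cp "$f" "$work/backup.2/$name"                      # only the difference costs space
    fi
done

# a.txt is a single inode visible in both sets; b.txt exists as two separate files.
if [ "$work/backup.1/a.txt" -ef "$work/backup.2/a.txt" ]; then echo "a.txt: shared"; fi
if [ "$work/backup.1/b.txt" -ef "$work/backup.2/b.txt" ]; then :; else echo "b.txt: separate"; fi
```

Both sets look like complete copies when browsed, yet only the changed file occupies additional disk space. The scratch directory in $work can be removed afterwards.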

Snapsync takes numerous arguments that can affect operation drastically.


SYNC ENGINE SELECTION

Snapsync was originally written with rsync in mind: a well-tested, widely used synchronization tool with a wide array of features. Psync is an experimental internal algorithm that offers far fewer features than rsync, but was written to test ideas about synchronizing file system paths that contain massive numbers of files. Nosync lets snapsync apply its rotation semantics without syncing, so that other external copying/syncing tools can be used.

Rsync does comparisons by building lists of files in the source and destination locations, and appears to transfer data only after recursing the entire folder structure. Psync builds no comparison lists, recursing the folder structure comparing differences and transferring data in real time. This is done with the goal of keeping memory consumption down and doing data transfer at time of comparison.

Rsync is preferred, and psync is considered experimental. See --sync in the list of options below.


REMOTE SOURCE DIRECTORIES

Snapsync can back up source directories on remote hosts. This is achieved by specifying the source directory in the standard user@host:/path/to/dir format. It only works when rsync is the sync engine, because snapsync first connects to the remote host over ssh to determine whether the directory exists, and then executes rsync using the default remote shell mechanism on your system, which on modern systems is usually ssh as well. While passwords can be entered at execution time, automation requires ssh keys.
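As a rough illustration of the remote check, the sketch below splits a source argument into host and path and prints the kind of ssh command that could verify the directory. The helper name split_source, the variables src_host and src_path, and the use of "test -d" are assumptions made for this example; the text above does not say how snapsync performs the check.

```shell
# split_source ARG: sets $src_host and $src_path (hypothetical helper).
# A colon that appears before the first slash marks a remote source.
split_source() {
    prefix=${1%%/*}               # everything before the first slash
    case "$prefix" in
        *:*) src_host=${1%%:*}; src_path=${1#*:} ;;   # user@host:/path/to/dir
        *)   src_host=;         src_path=$1 ;;        # plain local path
    esac
}

split_source 'user@host:/path/to/dir'
if [ -n "$src_host" ]; then
    # Printed instead of executed so the sketch runs without a network.
    echo "would run: ssh $src_host test -d $src_path"
fi
```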


GFS ROTATION

Snapsync's concept of rotation levels (--rotation) can be leveraged to implement a grandfather-father-son rotation scheme to store backups reaching back for as long as is desired. A setup might look like this:

 snapsync.pl [OPTIONS] --rotation=daily:7 (run daily)
 snapsync.pl [OPTIONS] --rotation=daily:7,weekly:4 (once per week)
 snapsync.pl [OPTIONS] --rotation=daily:7,weekly:4,monthly:3 (once per month)

As for scheduling, cron does not understand the concept of once-per-month, but one way to achieve that is by chaining to the date command to run on the first Saturday of the month, for example:

 1 0 * * sat [ `date '+\%e'` -le 7 ] && snapsync.pl ... # first saturday
 1 0 * * sat [ `date '+\%e'` -gt 7 ] && snapsync.pl ... # any other saturday
 
Adjust accordingly. Note that levels can be named arbitrarily, run at arbitrary intervals, and exist in any number.
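The date test in the crontab lines can be tried outside cron. The sketch below wraps it in a function, in_first_week, which is a name invented for this example. Note that the backslash in '+\%e' is needed only inside a crontab, where an unescaped % has special meaning; in an ordinary shell the format is written '+%e'.

```shell
# Succeeds when the given day of the month falls within the first seven days.
in_first_week() {
    [ "$1" -le 7 ]
}

day=$(date '+%e')             # %e: day of month, padded with a space (" 1" to "31")
if in_first_week $day; then   # left unquoted on purpose so the padding is dropped
    echo "day $day: first-week branch"
else
    echo "day $day: later-in-the-month branch"
fi
```

Because every month has exactly one Saturday among its first seven days, combining this test with the "sat" day-of-week field selects the first Saturday of the month.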


OPTIONS

-d, --dry-run

Do not actually do any operations. Combines well with --verbose to debug problems.

-e, --exclude

Path to exclude in the source directory when doing sync. This currently applies when using sync engines rsync:atomic or psync. Behavior is the same as rsync --delete-excluded: the atom will not appear in the backup set in the destination directory. Multiple --exclude options are allowed.

Note that when using rsync non-atomically (the default), or when needing to exclude a file or folder inside an atom when running atomically, one must use rsync's native exclusion mechanisms, exposed by passing the needed --options to snapsync, which are in turn passed to rsync. See man rsync.

-f, --flock

Path to lockfile to prevent concurrent runs.

-h, --help

Print brief usage message.

-i, --include

Path to include in the source directory when doing sync. This currently applies when using sync engines rsync:atomic or psync. When this option is given, every atom in the source directory that is not explicitly included is excluded, with the same behavior as the exclude option above. See that section for details.

-l, --log

Path to logfile.

-m, --mode

Determines the characteristics of the backup set, most visibly the extension placed on the backup set folders in the destination folder. Can be 'count', 'epoch', 'date', or 'reverse'. Count mode simply counts up, the most recent set having the highest number. Both epoch and date modes use the current timestamp as the extension. Reverse mode is a special version of count mode in which sets are shuffled around so that the most recent backup is always the lowest number (set 0) and the oldest backup is the highest number. Default: 'date'

WARNING: Running snapsync against a destination directory containing different backup set mode types is not recommended and could lead to unpredictable behavior. However, it may be possible for you to convert your sets to one mode. See the -z option.
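Reverse mode is the least obvious of the four, so here is a minimal sketch of the shuffle it implies. The function name shift_sets is hypothetical, and the sketch assumes sets are directories named prefix.number; snapsync's actual renaming logic may differ.

```shell
# shift_sets DIR PREFIX: rename PREFIX.0..PREFIX.N to PREFIX.1..PREFIX.N+1,
# then create an empty PREFIX.0 for the newest backup (hypothetical helper).
shift_sets() {
    dir=$1 prefix=$2
    n=0
    while [ -e "$dir/$prefix.$n" ]; do n=$((n + 1)); done   # n = number of existing sets
    while [ "$n" -gt 0 ]; do
        mv "$dir/$prefix.$((n - 1))" "$dir/$prefix.$n"       # highest number moves first
        n=$((n - 1))
    done
    mkdir "$dir/$prefix.0"
}
```

Calling shift_sets twice on a directory that holds only backup.0 leaves backup.0, backup.1, and backup.2, with the original set now at backup.2. In count mode, by contrast, existing sets keep their names and the new set simply takes the next higher number.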

-o, --options

String of command-line arguments to pass to the sync engine. Note that snapsync always passes '-a --delete' to rsync, and '--delete' to psync, while this argument merely specifies additional options. This is controlled by the $RSYNC_BASE_OPTS and $PSYNC_BASE_OPTS variables at the top of the script.
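To make the layering concrete, the sketch below prints the rsync command line that would result from a given --options string. It only mirrors the description above: build_rsync_cmd and the SET placeholder are invented for this example, and the real script may assemble its command differently. The extra option shown, --exclude=*.tmp, is a standard rsync option.

```shell
RSYNC_BASE_OPTS='-a --delete'    # always passed to rsync, per the text above

# Print the rsync command line for a given --options string (illustration only).
build_rsync_cmd() {
    extra=$1 src=$2 dest=$3
    printf '%s\n' "rsync $RSYNC_BASE_OPTS $extra $src $dest"
}

# SET stands in for whatever backup set folder is being written.
build_rsync_cmd '--exclude=*.tmp' /path/to/dir /path/to/backups/SET
```

The additional options land after the base options, so they extend rather than replace '-a --delete'.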

-r, --rotation

Defines how backup sets are named, rotated through levels, and purged. For example, 'daily:7' creates folders named daily.X until the maximum of seven sets exists, after which the next run purges the oldest set. With 'daily:7,weekly:4', however, sets are promoted (renamed) to weekly.X rather than deleted, up to a maximum of four at that second level. Setting any level to 0 turns off both deletion and promotion at that level, so its sets never expire. Default: 'backup:0'

NOTE: See the section GFS ROTATION for ideas on how this option can be used to implement a grandfather-father-son rotation scheme.
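The purge-or-promote rule can be sketched as a small shell function. This is an illustration under assumptions, not snapsync's code: rotate_level is a hypothetical name, sets are assumed to be directories named level.number with the lowest number being the oldest (as in count mode), and the sketch trims all excess sets at once rather than one per run.

```shell
# rotate_level DIR LEVEL MAX [NEXT]  (hypothetical helper)
# While more than MAX sets named LEVEL.<n> exist, the oldest one is renamed to
# NEXT.<n> if NEXT is given, and deleted otherwise. MAX=0 disables both.
rotate_level() {
    dir=$1 level=$2 max=$3 next=${4:-}
    [ "$max" -eq 0 ] && return 0              # 0: never purge or promote
    while :; do
        # Numeric suffixes of the existing sets, lowest (oldest) first.
        nums=$(for d in "$dir/$level".*; do
                   if [ -d "$d" ]; then echo "${d##*.}"; fi
               done | sort -n)
        [ "$(echo "$nums" | grep -c .)" -gt "$max" ] || break
        oldest=$(echo "$nums" | head -n 1)
        if [ -n "$next" ]; then
            mv "$dir/$level.$oldest" "$dir/$next.$oldest"   # promote to the next level
        else
            rm -rf "$dir/$level.$oldest"                    # purge
        fi
    done
}
```

With this helper, 'daily:7,weekly:4' would correspond to calling rotate_level DIR daily 7 weekly followed by rotate_level DIR weekly 4, while 'daily:7' alone corresponds to rotate_level DIR daily 7.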

-s, --sync

Determines which underlying synchronization engine to use for the backup, in the form engine:option:option, where engine may be either 'nosync' (rotation only), 'psync' (internal algorithm), or 'rsync' (the venerable rsync program). The rsync engine offers these additional options:

 atomic - descend one directory deep and do all ops atomically
 linkcopy - separate cp -al step instead of using rsync --link-dest

The atomic option is very handy for isolating rsync to one file/directory at a time, with the additional benefit of timing information for each atom. The linkcopy option exists for rsync versions older than 2.5.6, which lack the --link-dest option. It is still offered because it seems to be quicker in some circumstances.

Nosync takes no options, only rotating sets and leaving a new empty set. Psync takes no options, always doing a separate linkcopy step and inherently operating atomically. Default: 'rsync', no options
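The engine:option:option form is simple to take apart. The following sketch shows one way to validate such a string against the rules described above; parse_sync and the variables it sets are hypothetical names, and the script's own parsing may differ.

```shell
# parse_sync SPEC: sets $engine, $atomic and $linkcopy; returns 1 on bad input.
parse_sync() {
    engine=${1%%:*}
    atomic=no
    linkcopy=no
    case "$engine" in
        rsync|psync|nosync) ;;
        *) echo "unknown engine: $engine" >&2; return 1 ;;
    esac
    rest=${1#"$engine"}                 # ":atomic:linkcopy", or empty
    while [ -n "$rest" ]; do
        rest=${rest#:}                  # drop the leading colon
        opt=${rest%%:*}                 # next option name
        rest=${rest#"$opt"}             # what is left after it
        case "$engine:$opt" in
            rsync:atomic)   atomic=yes ;;
            rsync:linkcopy) linkcopy=yes ;;
            *) echo "engine $engine does not accept option '$opt'" >&2; return 1 ;;
        esac
    done
}

parse_sync rsync:atomic:linkcopy && echo "$engine atomic=$atomic linkcopy=$linkcopy"
```

Only the rsync engine accepts options here, which matches the statement that nosync and psync take none.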

-v, --verbose

Print lots of useful information. For more verbosity, pass --verbose multiple times.

-z

Attempt to convert all backup sets from the destination directory to the new mode given by the --mode argument, then exit.

WARNING: It is highly recommended that you first run in dry-run mode to see how your backup sets will be affected by a conversion. Snapsync makes a best guess based on the mtime of the backup sets, which may or may not work for you.


EXAMPLES

Simple snapsync of a folder:

snapsync.pl /path/to/dir /path/to/backups

Simple snapsync with all defaults spelled out:

snapsync.pl --rotation=backup:0 --mode=date --sync=rsync /path/to/dir /path/to/backups

Use psync instead:

snapsync.pl --rotation=backup:0 --mode=date --sync=psync /path/to/dir /path/to/backups

Use rsync atomically:

snapsync.pl --sync=rsync:atomic /path/to/dir /path/to/backups

Exclude some atoms:

snapsync.pl --sync=rsync:atomic --exclude /path/to/dir/atom /path/to/dir /path/to/backups

Include only some atoms:

snapsync.pl --sync=rsync:atomic --include /path/to/dir/atom /path/to/dir /path/to/backups

A verbose dry-run test:

snapsync.pl --verbose --dry-run /path/to/dir /path/to/backups

Do a separate cp -al step instead of relying on rsync --link-dest:

snapsync.pl --sync=rsync:linkcopy /path/to/dir /path/to/backups

And again atomically:

snapsync.pl --sync=rsync:atomic:linkcopy /path/to/dir /path/to/backups

Against a remote source directory:

snapsync.pl --sync=rsync:atomic:linkcopy user@host:/path/to/dir /path/to/backups

Convert sets to epoch format:

snapsync.pl -z backup,weekly,mystuff --mode epoch /path/to/backups


CHANGES

snapsync 1.00 (20120418)

- Initial release.


RESOURCES

http://www.mikerubel.org/computers/rsync_snapshots/