What is Incremental Backup?
Incremental backup is a methodology of digital data backup. In an incremental backup framework, an initial backup is performed of all data selected for backup. After that initial full backup, only data that has changed since the previous backup is backed up. This means there’s no need to spend resources (storage, bandwidth, etc.) backing up a fresh copy of everything in scope, while the latest version of every file is still preserved.
What’s the difference between incremental, differential, and full backups?
There are three common types of business data backup strategies that are traditionally used:
- Full backups create a complete copy of the entire set of data at each backup interval
- Differential backups back up all data modified since the last full backup
- Incremental backups back up all data modified since the last backup of any type
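The difference between the three strategies comes down to which reference point each one compares against. The sketch below illustrates this with a hypothetical helper; the file names, timestamps, and the `files_to_back_up` function are all made up for the example.

```python
from datetime import datetime

def files_to_back_up(files, strategy, last_full, last_any):
    """files: dict of {path: last_modified}; returns paths to include."""
    if strategy == "full":
        return sorted(files)  # everything, every time
    if strategy == "differential":
        # everything changed since the last FULL backup
        return sorted(p for p, m in files.items() if m > last_full)
    if strategy == "incremental":
        # everything changed since the last backup of ANY type
        return sorted(p for p, m in files.items() if m > last_any)
    raise ValueError(strategy)

files = {
    "report.docx": datetime(2024, 1, 10),  # changed after the last backup
    "logo.png":    datetime(2024, 1, 3),   # changed after full, before last backup
    "notes.txt":   datetime(2023, 12, 1),  # unchanged since the full backup
}
last_full = datetime(2024, 1, 1)  # most recent full backup
last_any  = datetime(2024, 1, 8)  # most recent backup of any type

print(files_to_back_up(files, "full", last_full, last_any))          # all three files
print(files_to_back_up(files, "differential", last_full, last_any))  # two files
print(files_to_back_up(files, "incremental", last_full, last_any))   # one file
```

Note how the incremental run includes only one file, while the differential run grows to include everything modified since the last full backup.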
Why is incremental backup a good idea?
Incremental backups have two distinct advantages over full backups.
- They complete much faster, since the only files included are ones that have changed since the last backup.
- Impact on resources such as disk space and network bandwidth is minimized for the same reason.
A concept called deduplication is key to performing an incremental backup efficiently. Many solutions scan changed files to determine whether any of the data in scope already exists within the stored backup. Deduplication further increases backup speed and minimizes resource consumption by letting the backup software check the repository for data it already has, avoiding duplicated effort. Deduplication can be performed at the file level, at the block level, or both.
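At its simplest, deduplication means hashing incoming data and skipping anything whose hash is already in the repository. The sketch below shows the idea at the file level; the in-memory `repository` dict and `store_file` function are stand-ins invented for illustration, not any particular product’s API.

```python
import hashlib

repository = {}  # {sha256 hex digest: file contents} — stand-in for backup storage

def store_file(data: bytes) -> bool:
    """Store data unless an identical copy already exists.
    Returns True if new data was actually written."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in repository:
        return False          # duplicate: skip the upload entirely
    repository[digest] = data
    return True

first = store_file(b"quarterly report v1")      # new content -> stored
duplicate = store_file(b"quarterly report v1")  # identical content -> skipped
```

Because the check is by content hash rather than by file name, the same data is never stored twice even if it appears under different names.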
What’s the difference between file level and block level backups?
Depending on the backup software in question, incremental backups will leverage either file level or block level deduplication. The strongest solutions leverage both.
- A file level solution will back up a complete copy of any changed file. This process is simpler than a block-level solution, but it has the disadvantage of treating every change to a file as though it were a “Save As,” storing multiple copies of the same data.
- A block-level solution will dissect each target file into components called “blocks.” Each individual block can then be checked against the data already in storage, minimizing duplicate storage. A block level solution saves significant time and energy over file level when, for example, you add a paragraph in the middle of a document while the beginning and the end are unchanged. A file level solution would treat the modified file as wholly new data; a block level solution only needs to back up the blocks related to your new paragraph.
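The added-paragraph scenario can be sketched with block-level deduplication. This minimal example uses fixed-size blocks for clarity (real solutions often use kilobyte-scale or content-defined blocks so insertions don’t shift every boundary); the `block_store` dict and `backup_blocks` function are illustrative inventions.

```python
import hashlib

BLOCK_SIZE = 4   # tiny block size so the example is easy to follow

block_store = {}  # {block hash: block bytes} — stand-in for backup storage

def backup_blocks(data: bytes) -> int:
    """Split data into fixed-size blocks, store only blocks not already
    in the repository, and return the number of new blocks written."""
    new_blocks = 0
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:
            block_store[digest] = block
            new_blocks += 1
    return new_blocks

first = backup_blocks(b"AAAABBBBCCCC")        # three unique blocks stored
# Insert new content in the middle; surrounding blocks are unchanged,
# so only the new "XXXX" block needs to be backed up.
second = backup_blocks(b"AAAAXXXXBBBBCCCC")
```

A file level solution would have re-stored the whole second file; here only one new block is written.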
A strong business data backup solution will leverage both file and block level backups. The software first detects file-level changes: any files which haven’t changed are considered already backed up and will not be collected again. For files that have changed, a block-level process then backs up only the modified blocks at your defined backup interval, minimizing wasted resources and optimizing your backup speed. Combined, the two give you the best mixture of coverage and cost.
What are some methods of creating an incremental backup?
There are two main methods incremental backup software uses to determine whether a file needs to be backed up.
The first is to run a periodic scan of the entire backup selection to check for changes in files. On Mac and Linux systems, this is done by comparing each file’s modification timestamp to the timestamp of the most recent backup. If the modification timestamp is newer, the file is backed up; if not, the file has not changed since the last backup and can be skipped. On the Windows platform there is also a special metadata flag called an “archive bit” which can be used to determine whether a file needs to be backed up. This flag is typically set when a file is modified and cleared when a backup is run. However, since it is easy for software to change this bit, it is a less reliable method than using the timestamp.
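The timestamp-comparison scan can be sketched in a few lines. This uses the standard library’s `os.walk` and `os.path.getmtime`; the `changed_since` function and the temporary-directory demonstration are illustrative, not any product’s implementation.

```python
import os
import tempfile
import time

def changed_since(root: str, last_backup: float) -> list:
    """Return files under root modified after last_backup (a Unix timestamp)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Newer modification time than the last backup -> include it
            if os.path.getmtime(path) > last_backup:
                changed.append(path)
    return changed

# Demonstration with a temporary directory standing in for the backup selection:
with tempfile.TemporaryDirectory() as root:
    last_backup = time.time() - 10  # pretend the last backup ran 10 seconds ago
    with open(os.path.join(root, "new.txt"), "w") as f:
        f.write("changed after the last backup")
    result = changed_since(root, last_backup)  # picks up only new.txt
```

The cost of this approach is that every file in the selection must be visited on every scan, which is why the notification-based method described next is attractive.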
The other method is to rely on the operating system to notify the backup software of any changes made to files. For example, the FileSystemWatcher service can be used on Windows, and Spotlight can be used on macOS. This method has two primary advantages over the periodic scan. For one, it uses far fewer system resources than scanning the entire backup selection. Changes are also reported immediately, so the backup software can back up changes much more quickly than if it needed to wait for a scan.
Similarly to file vs. block level deduplication, a strong backup solution will use both methods to ensure a complete backup. Having the operating system report file changes is the fastest way of knowing which files to back up, while a periodic verification scan catches files which may otherwise have been missed due to software glitches or operating system inconsistencies.
What are the drawbacks of incremental backup?
In traditional backup software, incremental backups are stored as blobs (not the technical term) of data scattered across the backup provider’s storage infrastructure. This means that restoring files from backup can require accessing multiple separate locations within the provider’s storage, so restores are typically much slower than restores from a full backup. The best analogy we’ve found to explain this phenomenon is to imagine that you’re cooking dinner and need to gather ingredients (backed up data blocks) to assemble your recipe (the file(s) you’re trying to restore). Most of the time, an incremental backup stores ingredients separately across a large store, requiring a lot of back-and-forth between aisles and careful selection of specific items from each aisle. In contrast, a full backup has every ingredient in a single aisle, where you can basically wipe everything off the shelf directly into your cart.
Additionally, if tiered storage is used, some of that data may need to be pulled from higher-latency cold storage (to continue our grocery metaphor, a supplier which must ship items from a warehouse to the store) before the restore can proceed, which increases the time and relative expense of a recovery operation.
One workaround which traditional backup software uses for this problem is to create a synthetic backup. With synthetic backups, only incremental backup data still needs to be sent to the storage location; the backup software then combines that incremental data with the full backup on the backend. This is very useful for restore speeds because it creates a de facto full backup after each backup interval. The downside of most synthetic backups is a massive increase in storage utilization compared with traditional incremental backup, since each synthetic backup is effectively a complete additional copy.
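Conceptually, synthesizing a full backup is a merge: start from the previous full copy and overlay everything the incremental contains. The sketch below uses plain dicts mapping paths to contents; the `synthesize_full` function and sample data are hypothetical, meant only to show the backend merge step.

```python
def synthesize_full(previous_full: dict, incremental: dict) -> dict:
    """Merge incremental changes into the previous full backup,
    producing an up-to-date full copy on the backend."""
    synthetic = dict(previous_full)  # start from the last full backup
    synthetic.update(incremental)    # changed/new files overwrite or extend it
    return synthetic

full_v1 = {"a.txt": "v1", "b.txt": "v1"}
incr    = {"b.txt": "v2", "c.txt": "v1"}   # only changed/new files were sent
full_v2 = synthesize_full(full_v1, incr)
# full_v2 holds a.txt (v1), b.txt (v2), c.txt (v1) — a complete restore point
```

The client only ever transmitted `incr`, but a restore can now read everything from `full_v2` in one place — at the cost of storing another complete copy.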
How to implement incremental backup
We would be remiss at this point if we did not say that CrashPlan (the software we make) employs many of the methods mentioned above to ensure complete and secure incremental backups. Plus, CrashPlan provides reliable business data backup while avoiding resource and speed concerns by storing the initial full backup and incremental data for each device in a single location. This means that all backed-up data is immediately added to the existing full backup and can be restored immediately, without fragmented storage archives or the overhead of a synthetic backup.
Find out more or start a trial here.