Backups – safeguarding your data
There’s an adage that says one should never put all one’s eggs in the same basket. The assumption is that even if some eggs are lost, the rest will remain safe. This needs an update before being applied to computers, because one of the things that computers can do really, really well is to copy your eggs! That means that you can keep all of your eggs in several baskets at once. We call these copies “backups”. If you want just bullet points based on the information below, look at Backups in a nutshell.
Unless your computer is only used for the most trivial purposes, you will have files that you simply cannot afford to lose, whether they are your business accounts, your family’s store of irreplaceable digital photos or chapters of your latest novel. Backups are safety copies of your files.
Whatever type of computer you use, you already have some means of making backups yourself, but when the number of files is very large the management of backups becomes a big task. That is just the sort of task that should be left to a computer, and indeed you probably already have a program on your computer that will do it, at least in a simple way. What you have to do is plan your backup strategy.
Computer professionals often mean something very specific when they talk about backup, sometimes for example drawing clear distinctions between a backup and an archive. There is merit in these views, but in this page we consider a backup to be multiple separate copies that let you recover your valuable data after it has been wholly or partially lost from its normal location.
We do not give all the answers. Each situation has its own solutions, but we hope that this overview will illustrate why there is a wide range of backup solutions including many types of hardware and special software. We end by declaring that some things are not backups, insisting on our “multiple separate copies” definition above. In particular, techniques for data integrity should not be confused with backup techniques – data integrity tries to keep your data as it is, but a backup lets you restore it as it was.
Keep your backups on a separate site, chiefly because of risks such as fire and burglary. If you are backing up an office system, keep the backups at home. If you back up your home system, keep your backups at work, or leave them with a friend. “If it is in the same building as the computer it isn’t a backup.” At the very least, never keep your backups beside the computer (we’ve actually had otherwise intelligent clients who did this)! If you have a car, how about having a little cubby-hole somewhere in the boot where you store your backups?
But wherever you store your backups, consider using a technique that encrypts them so that a thief could not read your data. That’s a subject for another time.
You need more than one backup copy because the purpose of a backup is to act as a safety net. Suppose you have only one copy and your main disk has just crashed. The “backup” is no longer a safety net but your only copy. You replace the disk and fetch the backup to perform a recovery. Now your only copy is in harm’s way, because there is an outside chance that it might be damaged in the machine before the recovery has been completed. “Two copies is a backup system – one copy is not a backup.”
Or suppose that you want to create a more up-to-date backup. You decide to overwrite your only “backup” but during the process, the computer crashes and you are left with a damaged disk and a half-finished copy. No backup.
How you can lose your data
- Disk errors
Fortunately rare, at least on a newish disk. The most likely cause is turning off the power while data are being written to the disk, which is why it is important to shut a computer down properly. There are safeguards against inaccurate data being written to or read from a disk, and modern disk technology is very effective. If there are imperfections in the surface of the disk, a modern drive will automatically move the affected data to a hot fix area.
But one thing that can be said about a disk is that, eventually, it will fail. There is a reason why most disks have about a three-year guarantee. When a disk reaches the end of its life the hot fix area can fill up rapidly, and then the disk can no longer correct errors. The good thing is that until the final failure, the disk is likely to be entirely reliable in returning accurate data. The bad thing is that if you ignore the signs, the failure will catch you by surprise.
- Program error
If a program has an error then it may write incorrect data to the disk. There is no safeguard against this because the disk will faithfully and reliably record the erroneous data as instructed by the program.
- User action
If a user of the computer enters data incorrectly or accidentally deletes files, the disk will faithfully record the action. This is by far the most common reason for data loss. It happens all the time, and it will happen to you. It does not even have to be an error – it can just creep up on you. Suppose that you have spent three weeks slaving over the final chapter of your novel and in the end you realise that you preferred the original draft. If you have not kept different versions, but worked all the time in the file called “Final chapter”, you can only get the draft back by restoring that version of the file from a backup.
Hardware for backing up
- Tape
There are several types of tape, with cost varying widely as capacity increases. Tape tends to give the lowest cost for the highest capacity and is the preferred method for commercial use, but can still be expensive. Modern disk sizes, even for home and office use, are larger than the capacity of all the reasonably priced tape systems, so multiple tapes might be needed for a full backup.
- CD / DVD
A low cost option as the disks are cheap and the writers are a standard feature of nearly all PCs sold. If you have no backup and your machine is fitted with a writer, make a copy of your important data right now before reading any further – we'll wait for you.
Very low capacity in comparison with tape, but also highly convenient for selective backups. Rewriteable disks can be used for rotated sets, but CD-Rs and DVD-Rs are so cheap that archiving them permanently would be an option. Use a well-regarded brand of disk; the cheaper ones make better drink coasters than storage media.
- External disk
For a full backup, a portable external hard disk can be very useful. This could be brought on site for backing up and then stored elsewhere, and a series of incremental or differential backups continued on CD, DVD or tape. Many of them are sold with quite useful backup management software included. You will need at least two, of course – “One copy is not a backup.”
- Online archiving
Generally not suitable for full backups, but you might consider this for important sets of files. Space can be rented on servers on the Internet (many ISPs now provide space as part of the deal), and files stored there are kept securely and safeguarded by the providers’ own industrial-strength backup systems. The files can also be downloaded to other locations, so the system has some advantages for file sharing.
Nowadays, online archiving is included in the generalised term “Cloud Storage”.
- Cloud Storage
This encompasses the online archiving mentioned above, but cloud providers offer lots of additional useful facilities. For example, there might be a means of automatically uploading files that you have altered, and a mechanism for retrieving old versions of your files, giving you a combined data integrity and backup service. But beware of cloud services that encourage you to create and edit all of your files “in the cloud” but don’t remind you to keep backups of your own where only you have control over them.
The advantage of Cloud Services is that, most of the time, they give a very high level of data integrity. The downside can be expressed in one very simple statement: Clouds evaporate. How long do you expect your cloud provider to stay in business? If all of your data exist only on the servers of a single cloud provider then, quite simply, you have no backup.
Finally, if confidentiality matters to you, consider this alternative description of Cloud Storage: “Other people’s servers”. Need we say more?
Strategies for backing up
The most essential attribute of a backup is that it is a historical record. When you use your backup to perform a recovery it is because you do not want your data as it is (incomplete or destroyed), but as it was. A vital supplementary question is, “You want your data as it was when?” There are two most likely scenarios for the restoration of backups. The first is disaster recovery, when you want to recover all of your data right up to the moment that disaster struck. The second is historical recovery, where you realise that a mistake was made some time ago and you have to recover the correct data from immediately before that moment.
So a comprehensive backup strategy means having both recent backups, and a set of several backups going back over time. The core of your strategy is to decide how often to back up, how long to keep your backups and what files to save each time, and major factors in those decisions are the value you place on your data, and how much or how often the data change.
Your strategy will almost inevitably use the following methods, including rotations with full and probably incremental backups at different times.
- Grandfather, father, son
A set of media (it does not have to be three) used in rotation so that the oldest is erased and re-used each time a backup is done, becoming the newest. The idea of rotations is fundamental to most backup systems, sometimes using more than one set, e.g. daily and weekly rotations.
- Full backup
Does exactly what it says – a full backup will contain a full copy of the system. Ideal for disaster recovery, as long as it is a recent full backup.
- Selective backup
Select a subset of important data to be backed up. Quicker than a full backup, so it can be done more often. This is handy where a subset of your data changes more frequently.
- Incremental backup
Back up only items that have changed since the last backup. This usually results in a quick and small backup, so it can be done more often. In many cases (for example if tapes are used) it can be added to the medium that holds the previous incremental backup. To do a disaster recovery you would have to restore your most recent full backup followed by each of the subsequent incremental backups, in order.
- Differential backup
Similar to the incremental, but back up all items that have changed since a specific time, usually the time of the last full backup. The first differential backup would be identical to an incremental backup, but the next would back up all of those files again, plus those that have changed, and so on. So a differential backup is not as small and quick as an incremental, but to perform a disaster recovery you would only need your most recent full backup and your most recent differential backup.
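To make the difference between incremental and differential backups concrete, here is a minimal sketch in Python. It is an illustration only: the file names, the day numbers and the helper files_changed_since are hypothetical, and a real backup program would examine timestamps or archive flags on disk rather than a dictionary.

```python
from __future__ import annotations


def files_changed_since(last_changed: dict[str, int], since_day: int) -> list[str]:
    """Return the names of files modified after the given day."""
    return sorted(name for name, day in last_changed.items() if day > since_day)


# Hypothetical files; each value is the day on which the file was last changed.
last_changed = {"accounts.xls": 0, "photos.zip": 0, "novel.doc": 0}
FULL_BACKUP_DAY = 0  # assume a full backup is taken at the end of day 0

# Day 1: only the accounts change.
last_changed["accounts.xls"] = 1
# Incremental: everything changed since the last backup of any kind (the full).
incremental_day1 = files_changed_since(last_changed, since_day=0)
# Differential: everything changed since the last full backup.
differential_day1 = files_changed_since(last_changed, since_day=FULL_BACKUP_DAY)

# Day 2: only the novel changes.
last_changed["novel.doc"] = 2
# Incremental: everything changed since day 1's incremental backup.
incremental_day2 = files_changed_since(last_changed, since_day=1)
# Differential: still measured from the full backup on day 0.
differential_day2 = files_changed_since(last_changed, since_day=FULL_BACKUP_DAY)

print("incremental, day 1: ", incremental_day1)   # ['accounts.xls']
print("differential, day 1:", differential_day1)  # ['accounts.xls']
print("incremental, day 2: ", incremental_day2)   # ['novel.doc']
print("differential, day 2:", differential_day2)  # ['accounts.xls', 'novel.doc']
```

In this sketch a disaster recovery after day 2 needs three restores with incrementals (the full backup, then day 1, then day 2) but only two with differentials (the full backup, then day 2), at the price of copying accounts.xls twice.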
Techniques for data integrity
None of the following are backup techniques. If you make accidental changes, or delete your data, any of these systems will just record your decision with faithful integrity. Do not let anybody sell you one of these as a backup system; you still need backups.
- RAID arrays
The Redundant Array of Inexpensive (or Independent) Disks is a system that makes disk storage faster and more reliable. It uses a mathematical trick to spread the data over three or more disks in such a way that a proportion of the data is redundant, so that any one disk can fail without any loss of data and without even stopping the computer. When the failed disk is replaced, the data can be rebuilt onto it from the remaining disks, restoring the resilience of the system; more highly specified systems even have a spare disk already installed so that this happens automatically.
- Disk mirroring
Two disks that each hold an identical copy of the data, kept constantly in step. This technique pre-dates the coining of the term “RAID”, but is now regarded as the most basic form of RAID.
- Remote disk mirroring
Disk, or array, mirroring where the second disk or array may be many miles away. Although this is not a backup system, it is an important part of many disaster recovery strategies.
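The “mathematical trick” mentioned under RAID arrays is, in common parity-based levels such as RAID 5, the exclusive-or (XOR) operation. The sketch below is a toy model in Python, not a description of any particular product: the three “disks” are just byte strings, the helper xor_blocks is hypothetical, and a real array works on fixed-size stripes and rotates the parity between the disks. It shows both halves of the argument in this section: a failed disk can be rebuilt, but a mistake is preserved just as faithfully as good data.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Three hypothetical "disks": two hold data, the third holds parity.
disk_a = b"business accounts".ljust(20)
disk_b = b"final chapter".ljust(20)
parity = xor_blocks(disk_a, disk_b)

# Disk B fails: its contents can be rebuilt from the two survivors.
rebuilt_b = xor_blocks(disk_a, parity)
print(rebuilt_b == disk_b)  # True

# A user overwrites the chapter; the array updates its parity to match.
disk_b = b"(deleted)".ljust(20)
parity = xor_blocks(disk_a, disk_b)

# Rebuilding now returns the overwritten data, not the lost chapter.
print(xor_blocks(disk_a, parity))
```

The second rebuild succeeds perfectly and returns exactly what the user last wrote, which is why an array of this kind protects against disk failure but cannot replace a backup.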