Team LiB

Backup Types

Backups can consist of tarring up your data and sticking it on another server, dumping a live filesystem to tape, or e-mailing yourself 562MB uuencoded tar files to be reassembled at home. None of these is really a production-quality form of backup, but they all share the whole point of backups: being able to get your system back to its original operating condition after some catastrophic event.

Most production systems require a regular and strictly regimented backup schedule to a set of tapes. To be effective, your plan should include backing up regularly and often, and occasionally rotating sets of tapes off site for safekeeping in case of disaster.

Before you learn how to work with the software tools to get the job done, you need to understand the basics of the various types of backups (levels), backup methodology, media, and related technologies. Then you can start making intelligent decisions about which backup tools to use and how to use them most effectively.

Types of Backups

What are commonly referred to as types of backups should really be called levels of backups. The level of backup you choose dictates how many files make it to tape (or your backup media of choice). The following sections describe the three main levels: full backup, incremental backup, and differential backup.

Full Backup

A full backup simply backs up every file in the system or target directories, regardless of the previous backup time or changes since the last backup. A full backup is what most nonadministrators think of when they think of backups; however, it is not what you should do every time you back up your system. The time it requires and the number of tapes it needs make it something you don't want to do every time backups are needed, but it is still required once in a while as part of a good overall backup strategy.

If you have a corporate file server, for example, running a full backup of it every night of the year just is not feasible, to say nothing of all the other systems in the corporation. Let's say you have a 300GB RAID-5 corporate file server and you want to back it up every night to your four measly little local 12/24GB tape drives. If you get a 70% compression rate (yielding ~20GB/tape), this would take around 15 tapes per night at a cost of around $60/night. With a standard 90-day data retention time, that comes to about $5,400 for 3 months of tape. And that's just one of your servers. You may have spent only half that much on the server itself. Worse, over 90 percent of the data on all those tapes is the same from one night to the next. This is not an efficient method to use in a production environment.
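The tape arithmetic above can be checked with a few lines of Python. This is only a sketch of the example's own numbers (300GB of data, ~20GB per tape, $60 per night, 90 days of retention); the function names are made up for illustration.

```python
import math

def tapes_per_night(data_gb, tape_capacity_gb):
    """Tapes needed to hold one full backup, rounding up to whole tapes."""
    return math.ceil(data_gb / tape_capacity_gb)

def retention_cost(nightly_cost, retention_days):
    """Total tape cost if every night's tapes are kept for the retention period."""
    return nightly_cost * retention_days

tapes = tapes_per_night(300, 20)   # 300GB server, ~20GB per compressed tape
cost = retention_cost(60, 90)      # ~$60/night, 90-day retention
print(f"{tapes} tapes per night, ${cost:,.0f} of tape over 90 days")
```

Plug in your own server size and tape capacity to see how quickly nightly full backups get expensive.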

Another issue with full backups is the time required. If each of your four little 12/24GB tape drives chugs along at a real-world transfer rate of 35MB/min, the combined transfer rate is 140MB/min (assuming that you have your four tape drives on separate SCSI buses running at full speed, with a constant data stream). All things being equal, your daily 300GB file server full backup should take you around 36 hours of continuous backing up, or around 1.5 days! (And this is assuming that you're there for the full backup, swapping tapes when prompted, and making runs to Taco Bell between tapes.) As you can see, a nightly backup that takes longer than a day is not feasible in most real-world environments. But more than this, it's wasteful and will turn you into a gray, bitter, old administrator who mumbles to himself in the hallway. Don't do it.
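Here is the same backup-window estimate as a small Python sketch. It assumes 1GB = 1024MB and a perfectly constant data stream across all drives, which is optimistic; the function name is hypothetical.

```python
def backup_hours(data_gb, drives, mb_per_min_per_drive):
    """Hours needed to stream data_gb across several drives running in parallel."""
    total_mb = data_gb * 1024                    # assumption: 1GB = 1024MB
    combined_rate = drives * mb_per_min_per_drive
    return total_mb / combined_rate / 60

hours = backup_hours(300, 4, 35)   # 300GB over four drives at 35MB/min each
print(f"about {hours:.1f} hours, or {hours / 24:.1f} days")
```

Any tape swapping, bus contention, or stalls in the data stream only push the real number higher.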

But do you really need full backups every night? After all, how much of your data really changes every day? From experience, I can tell you that the delta (the amount of file change) on our example would generally be less than 1 percent. So in the real world, you could probably get away with backing up a delta of no more than 3GB of data per night (you could get a whole week on one tape!). This leads us to consider the alternatives: incremental and differential backups.
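A quick sketch of the delta arithmetic, using the example's figures (300GB of data, a 1 percent daily change rate as the upper bound, and ~20GB per tape). The function names are illustrative only.

```python
def nightly_delta_gb(data_gb, change_percent):
    """Data changed per day, given the daily change rate as a percentage."""
    return data_gb * change_percent / 100

def nights_per_tape(tape_capacity_gb, delta_gb):
    """Whole nights of changes that fit on one tape."""
    return int(tape_capacity_gb // delta_gb)

delta = nightly_delta_gb(300, 1)
nights = nights_per_tape(20, delta)
print(f"{delta:.0f}GB per night, {nights} nights per tape")
```

At the full 3GB ceiling a ~20GB tape holds six nights, so a Monday-through-Friday run fits with room to spare, and a delta below 1 percent stretches the tape further.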

Incremental Backup

Incremental backups are more frugal than full backups. Simply put, incremental backups record everything that's changed since the last backup. To do a restore, you will need the most recent full backup tape(s), plus the tapes of each incremental backup since the full. So with incrementals, you start with a full backup, and then build on that with each incremental only backing up what's changed from night to night. This translates to less data to back up per night, hence less tape used. In fact, you can probably squeeze more than one day's backup per tape if you're careful. However, when implementing an incremental backup system, if you have to do a full restore on Friday, you need the tapes from Thursday, Wednesday, Tuesday, Monday, and all the full tapes from Saturday. Figure 5-1 shows what a daily incremental-based backup looks like.

Figure 5-1: Example of using full and incremental backups together.
Note 

If you don't have people working weekends and can afford to skip backups on Sunday, consider keeping it as "the day of rest" (for the tape drives, anyway). It gives you an emergency time slot for large unexpected backups or restores, or for "full backup overflow" from Saturday if need be, so that you don't eat into Monday morning when people start coming in. Nothing slows down access to a live file server more than running a remote backup of it while people are trying to use it.

Because a full and current restore under this scheme requires more tapes (today's tape plus each tape back to and including the full backup), you are in serious trouble if you can't locate and retrieve every one of them. Even so, an incremental-based backup system can be used in a production environment; it's just a nightmare to restore from and it increases the risk of a failed restore. It can get the job done, especially if your department is in a financial pinch and doesn't want to buy all those tapes for full backups. Just keep in mind that there is no "one best solution" here. The pros and cons of each backup scenario must be considered for your given situation. As we'll see a bit further down, occasional incrementals make for a good, diverse backup strategy; they're just not ideal to rely on as your sole backup strategy.
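To make the restore chain concrete, here is a minimal Python sketch that works out which tapes an incremental scheme needs. The history format (a list of label and kind pairs, oldest first) and the function name are assumptions made for this illustration, not part of any backup tool.

```python
def incremental_restore_set(history):
    """Return the backups needed for a full restore, in the order to apply them.

    history is a list of (label, kind) pairs, oldest first, where kind is
    "full" or "incr". Walk backward from the newest backup until a full
    backup is found; every backup passed along the way is required.
    """
    needed = []
    for label, kind in reversed(history):
        needed.append(label)
        if kind == "full":
            break
    else:
        raise ValueError("no full backup in history; restore is impossible")
    return list(reversed(needed))

week = [("Sat", "full"), ("Mon", "incr"), ("Tue", "incr"),
        ("Wed", "incr"), ("Thu", "incr")]
print(incremental_restore_set(week))
```

For a Friday restore this lists Saturday's full set followed by Monday through Thursday, and losing any one of those tapes breaks the chain.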

Tip 

If you don't always have budget for new tapes, get them when you can. You'll need them the most when you're totally out. Trust me.

Differential Backups

While an incremental backup records whatever has changed since the last backup of any type, a differential backup records only what has changed since the last full backup. So if we first back up the 300GB file server with a full backup on Saturday, and then back up every night using differentials, we will typically have something like the graph shown in Figure 5-2.

Figure 5-2: Example of using full and differential backups together.
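The only thing that separates the three levels is the point in time each one compares against. The Python sketch below shows that idea using plain modification times; the function names and the day-number timestamps are invented for illustration, and real backup tools keep track of this in their own ways.

```python
def cutoff_for(level, last_full, last_backup):
    """Pick the reference time a backup level compares file changes against."""
    if level == "full":
        return None           # no cutoff: take everything
    if level == "differential":
        return last_full      # changed since the last full backup
    if level == "incremental":
        return last_backup    # changed since the last backup of any type
    raise ValueError(f"unknown backup level: {level}")

def select_files(mtimes, level, last_full, last_backup):
    """Return the paths whose modification time is newer than the cutoff."""
    cutoff = cutoff_for(level, last_full, last_backup)
    return sorted(p for p, m in mtimes.items() if cutoff is None or m > cutoff)

# Hypothetical timestamps as day numbers: full backup on day 1, last nightly on day 3.
mtimes = {"a.txt": 0, "b.txt": 2, "c.txt": 4}
for level in ("full", "differential", "incremental"):
    print(level, select_files(mtimes, level, last_full=1, last_backup=3))
```

Here the full backup takes all three files, the differential takes the two changed since day 1, and the incremental takes only the one changed since day 3.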
Tip 

When scheduling backups with cron via crontab -e, many inexperienced administrators just schedule a time such as midnight or 1:00 A.M. for the backup job. Don't do this. Pick the hour you want your backups to start, and then add a random number of minutes from -12 to +12 to come up with your backup start time (for instance, 1:08 A.M.). Scheduling your backups like this will keep you from running into the classic "top of the hour" problem that less experienced administrators face when their system starts an on-the-hour database dump, runs backups, fires off a massive tarball download, and starts scanning the system for file alterations all at once. All this activity starting at the exact same second is bad and can take down an otherwise stable server. Also, note the system's own standard mass cronjob times in /etc/crontab and avoid those times as well.
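If you'd rather not pick the offset by hand, a few lines of Python can do it for you. This is a sketch only: the function name and the script path are placeholders, and skipping an offset of zero is my own choice here so that the job never lands exactly on the hour.

```python
import random
from datetime import datetime, timedelta

def jittered_cron_fields(hour, max_offset=12, rng=random):
    """Return (minute, hour) for a start time near the top of the given hour.

    The offset is drawn from -max_offset..+max_offset minutes, skipping 0.
    A negative offset rolls back into the previous hour.
    """
    offsets = [m for m in range(-max_offset, max_offset + 1) if m != 0]
    offset = rng.choice(offsets)
    start = datetime(2000, 1, 1, hour, 0) + timedelta(minutes=offset)  # the date is arbitrary
    return start.minute, start.hour

minute, hour = jittered_cron_fields(1)
# crontab field order: minute hour day-of-month month day-of-week command
print(f"{minute} {hour} * * * /usr/local/sbin/nightly-backup.sh")  # placeholder path
```

Run it once, paste the resulting line into crontab -e, and check it against the times already used in /etc/crontab.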

Figure 5-2 shows a full backup on Saturday, with differential backups beginning on Monday. Each weeknight's differential backs up everything that has changed since the full backup on Saturday. The great thing about this is that to do a full restore of your system, all you need is the previous night's differential tape, as well as the last full backup set (from Saturday in this case). The major disadvantage of this scenario is that on weeknights you are still backing up data that has already been backed up, possibly several times, and so you use more tape space (or tapes) than the absolute minimum required to get the job done.
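The trade-off can be sketched in Python: a differential restore needs only two backups, but the nightly volume keeps growing through the week. The history format and function names are made up for this illustration, and the size comparison assumes the worst case, where each day's ~3GB of changes touches different files than the day before.

```python
def differential_restore_set(history):
    """Return the last full backup plus the newest differential taken after it.

    history is a list of (label, kind) pairs, oldest first, where kind is
    "full" or "diff".
    """
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup in history; restore is impossible")
    last_full = full_positions[-1]
    needed = [history[last_full][0]]
    later_diffs = [label for label, kind in history[last_full + 1:] if kind == "diff"]
    if later_diffs:
        needed.append(later_diffs[-1])
    return needed

def nightly_sizes(daily_change_gb, nights, scheme):
    """GB written each night, assuming every day's changes hit different files."""
    if scheme == "incremental":
        return [daily_change_gb] * nights
    if scheme == "differential":
        return [daily_change_gb * n for n in range(1, nights + 1)]
    raise ValueError(f"unknown scheme: {scheme}")

week = [("Sat", "full"), ("Mon", "diff"), ("Tue", "diff"),
        ("Wed", "diff"), ("Thu", "diff")]
print(differential_restore_set(week))
print(sum(nightly_sizes(3, 5, "incremental")), "GB vs",
      sum(nightly_sizes(3, 5, "differential")), "GB over five weeknights")
```

Under those assumptions five weeknights of differentials write three times as much data as incrementals would, which is the price of needing only two backup sets to restore.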

