3.2. Use Source Control

The first rule is hopefully the most obvious: all development teams and even individual developers should be using source control for all of their work. Unfortunately, this is often not the case, and the repercussions can be fairly dire.

This first rule is by far the most important. It's the key to creating a solid development environment. If you're already working on a web application that doesn't use source control, now is the time to halt development and get it into place. The importance of source control cannot be emphasized enough.

3.2.1. What Is Source Control?

If you've not encountered source control (often called software configuration management or SCM) in your work before, you're going to kick yourself for missing out on it for so long. It could be summarized as "the ability to undo your mistakes."

If your code-editing software didn't have an undo function, you'd soon notice. It's one of the most basic features we expect. But typically, when you close a file you lose its undo history and are left with only the final saved state of the document. The exception to this rule is with "fat" file formats, such as Corel's WordPerfect, which save the undo history (or at least a segment of it) right into the file.

When you're dealing with source code files, you don't have the ability to store arbitrary data segments into the file, as with WordPerfect documents, and can only save the final data. Besides, who would want source files that kept growing and growing? Source control, at its most basic, allows you to keep a full undo history for a file, without storing it in the file itself. This feature, called versioning, isn't all that source control is good for, and we'll look at the main aspects in turn.

3.2.1.1. Versioning

Versioning, the most basic feature of a source-control system, describes the ability to store many versions of the same source file. The typical usage sequence, once a file is established in source control, is as follows. A user checks out a file from the repository, a term used to describe the global collection of files under source control. A repository is then typically broken down into multiple projects or modules that then contain the actual files. The term "checking out" refers to grabbing a copy of the file to be edited from the repository, while maintaining an association with the version in the repository (often with hidden metadata files). Once the file has been checked out, the user performs her edits on the file. When the user is happy with her changes (or just wants to checkpoint them), the file is then committed (or checked in) and updated in the repository.

At this point, the repository knows about two versions of the file. Each version has some kind of unique identifier, called a revision number. When a user checks out the file again, she is given the latest revision.
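The check-out and commit cycle described above can be modeled in a few lines. This is only a minimal sketch in Python (not a language this chapter otherwise uses); the FileHistory class and its method names are hypothetical and do not correspond to any real source-control product.

```python
class FileHistory:
    """Toy model of one file under source control (illustration only)."""

    def __init__(self):
        # Every committed version is kept; list index + 1 is the revision number.
        self._revisions = []

    def commit(self, content):
        """Store a new version and return its revision number."""
        self._revisions.append(content)
        return len(self._revisions)

    def checkout(self, revision=None):
        """Return the requested revision, or the latest one when none is given."""
        if not self._revisions:
            raise LookupError("nothing has been committed yet")
        if revision is None:
            revision = len(self._revisions)
        if not 1 <= revision <= len(self._revisions):
            raise LookupError(f"no such revision: {revision}")
        return self._revisions[revision - 1]


history = FileHistory()
first = history.commit("<?php echo 'hello'; ?>\n")
second = history.commit("<?php echo 'hello, world'; ?>\n")
latest = history.checkout()         # a plain checkout returns the newest revision
original = history.checkout(first)  # older revisions stay retrievable
```

Real systems store deltas rather than full copies and hand out identifiers such as 1.1 and 1.2, but the contract is the same: each commit yields a new identifier, and a checkout without one returns the latest revision.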

3.2.1.2. Rollback

Keeping an undo history would be pointless if there were no way to traverse that history and perform the actions in reverse (to get back to the state the document was in before the editing started). Source-control systems thus let you retrieve any revision of a file in the repository.

Revisions can be accessed in a few different ways: via the revision number that was returned when the revision was committed, via the date and time at which the revision existed, and via tags and branches, which we'll deal with shortly. When accessing by date and time, source-control systems let you return a snapshot of your source code from that point in time (that is to say, the most recent revision at that time). This lets you regress your code to any point from the present to back when the file was added to source control.
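Access by date and time amounts to finding the most recent revision committed at or before the requested moment. A minimal Python sketch of that lookup follows; the function name revision_at and the sample timestamps are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def revision_at(commit_times, when):
    """Return the 1-based revision that was current at `when`.

    `commit_times` must be sorted oldest first; entry i is the commit time of
    revision i + 1. Returns None if the file had not been added yet.
    """
    index = bisect_right(commit_times, when)
    return index if index > 0 else None


# Hypothetical commit times for revisions 1, 2, and 3 of one file.
commit_times = [
    datetime(2005, 3, 1, 9, 0),
    datetime(2005, 3, 4, 16, 30),
    datetime(2005, 3, 9, 11, 15),
]

snapshot = revision_at(commit_times, datetime(2005, 3, 5, 12, 0))
```

Asking for a time before the first commit returns None, which matches the idea that you can only regress as far back as the point at which the file entered source control.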

3.2.1.3. Logs

When each revision of a file is committed, the committer can optionally add a message. This message can be used to describe the changes made since the last revision, both in terms of material changes and the reasons behind the change. These log messages can then be viewed for each revision of the file. When read in sequence, commit log messages give an evolving story of what's happened to the file.

3.2.1.4. Diffs

Source-control systems can provide diffs similar to the Unix diff(1) program. A diff shows deltas (changes) between two files, or in the case of source control, two revisions of the same file. There are a few different diff formats, but they all highlight the lines that have been added, modified, or removed.

By looking at the diff between two revisions, you can see exactly what changes were made. Couple this with the commit log messages and you can see the explanation behind the changes. You can also see who made the change. . . .
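To get a feel for what a diff between two revisions contains, the following sketch uses difflib from the Python standard library to produce a unified diff, one of the common formats. The file name and revision labels are invented for the example, and a source-control client would normally generate this output for you.

```python
import difflib

# Two hypothetical revisions of the same small file, as lists of lines.
old = ["function greet($name){\n", "    echo 'hello';\n", "}\n"]
new = ["function greet($name){\n", "    echo 'hello, ' . $name;\n", "}\n"]

diff_lines = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="greet.php (1.1)",
        tofile="greet.php (1.2)",
    )
)

# Removed lines are prefixed with "-", added lines with "+",
# and unchanged context lines with a single space.
diff_text = "".join(diff_lines)
print(diff_text)
```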

3.2.1.5. Multiuser editing and merging

A primary feature of most source-control systems is support for multiple simultaneous users. Users require "accounts" for checkouts and commits; a user has to authenticate himself for each action he performs. In this way, every file revision users commit and log is tagged with their details, and the repository keeps track of who did what.

When you let more than one user edit files in a repository, there's a chance that two users might try and edit the same file at the same time. In this case, a merge operation is performed, merging the changes from both revisions into a single new version (often called a three-way merge).

Some source control systems can do many merges automatically: if one user edits a function at the top of a file and another edits a function at the end of the same file, it's fairly straightforward to merge the two changes. The Unix merge(1) program does just this, and is in fact used by some source control systems. In the case where code cannot be automatically merged, a conflict occurs, and the second user to commit is notified. The two versions of the changed code are shown side by side, and it's up to the user to manually merge the changes together (see Example 3-1).

Example 3-1. Manual merge process

<<<<<<< library.php
function my_function($a, $b){
=======
function my_function_2($b, $a){
>>>>>>> 1.3
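The automatic case can be sketched as follows: work out how each user's copy differs from the common ancestor, apply both sets of edits when they touch different regions, and report a conflict otherwise. This is a deliberately conservative Python illustration built on difflib, not the algorithm used by merge(1) or by any particular product; the function names are hypothetical.

```python
import difflib


def changes(base, other):
    """List the (start, end, replacement) edits that turn `base` into `other`."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way_merge(base, mine, theirs):
    """Merge two edited copies of `base`, or raise ValueError on a conflict."""
    # Identical edits made by both users collapse into one via the set union.
    edits = sorted(set(changes(base, mine)) | set(changes(base, theirs)))
    merged, position, previous_end = [], 0, None
    for start, end, replacement in edits:
        # Edits that overlap, or even touch the same spot, count as conflicts.
        if previous_end is not None and start <= previous_end:
            raise ValueError(f"conflict near line {start + 1}")
        merged.extend(base[position:start])  # untouched lines before the edit
        merged.extend(replacement)           # the edited region
        position = previous_end = end
    merged.extend(base[position:])
    return merged


base = ["function top(){", "  return 1;", "}", "", "function bottom(){", "  return 2;", "}"]
mine = ["function top(){", "  return 10;", "}", "", "function bottom(){", "  return 2;", "}"]
theirs = ["function top(){", "  return 1;", "}", "", "function bottom(){", "  return 20;", "}"]

merged = three_way_merge(base, mine, theirs)
```

Here the two users changed different functions, so the merged result carries both edits. Had both changed the same line in different ways, the function would raise instead, which is the point at which a real system writes conflict markers like those in Example 3-1.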

In practice, conflicts should happen very rarely. If they occur frequently, it indicates one of two things. The first, and far easier to solve, is that the user isn't checking out files often enough. A good rule of thumb is to re-check out files (sometimes called updating) before starting each work session (such as at the start of the day) and again before beginning any large change. The second cause can be a real problem: when two users are working on the same piece of code simultaneously, it usually indicates that the users aren't communicating enough. Having two engineers work on the same piece of code at the same time, unless they're pair programming, is only going to lead to wasted time merging the two edits, if they can be merged at all.

Some files simply cannot be merged, or require special client software to do so. If you were using your repository to store program binaries, for instance, the actual binary differences between versions don't really matter to you. If you were using your repository to store images, the differences might be expressible and mergeable in some sort of GUI-based application.

Although HTML and XML documents are simple text at heart, specialized programs are often useful for examining the differences between revisions and conducting three-way merges. This is especially useful when making sure that a document is well-formed XML while merging multiple revisions together.

3.2.1.6. Annotation (blame)

Source-control systems allow users to view files in annotated or blame mode, in which each line or block of lines is identified by the revision and the user who last modified it (Example 3-2).

Example 3-2. Annotation log

413 cal  1.77  if (tags_delete_tag($HTTP_GET_VARS[deletetag])){
414
415 cal  1.42      $photo_url = account_photo_url($photo_row);
416
417 asc  1.74      if ($context_ok) {
418                    $photo_url .= "in/" . $current_context['uid'];
419                }
420
421 cal  1.42      header("location: $photo_url?removedtag=1");
422 cal  1.37      exit;
423 eric 1.88  }

This example shows that lines 413 and 414 were committed by "cal" in revision 1.77. Lines 417 to 420 were committed by "asc" in revision 1.74. "eric" contributed the closing brace on line 423 in revision 1.88.

A blame log like this can be very useful when tracking down the rationale behind a particular block of code or identifying the revision in which something changed. The example log tells us that handling for $context_ok was probably added by "asc" in revision 1.74. We can then zoom in on this revision, looking at the diff and commit log messages, and see if the feature was indeed added in that revision or if it existed before and was only modified in that revision. In the latter case, we can then look at an annotation log for revision 1.74 to see which previous revision the code came from.

In this manner, the annotation log lets us quickly find the revision number and commit log message which accompanied a particular change, without having to hunt through the diffs for every commit. This quickly proves useful when files have hundreds or even thousands of revisions.
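One way to picture how an annotation log is derived is to replay the revisions in order: lines that survive a revision unchanged keep their existing credit, and new or modified lines are credited to the revision that introduced them. The Python sketch below does this with difflib; it is an illustration under that assumption, not how any specific tool implements blame, and the sample revisions are invented.

```python
import difflib


def annotate(revisions):
    """Return [(revision_id, author, line), ...] for the newest revision.

    `revisions` is an oldest-first list of (revision_id, author, lines).
    """
    annotated = []
    for revision_id, author, lines in revisions:
        old_lines = [line for _, _, line in annotated]
        matcher = difflib.SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever was credited before.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # for a pure deletion, lines[j1:j2] is empty.
                updated.extend((revision_id, author, line) for line in lines[j1:j2])
        annotated = updated
    return annotated


revisions = [
    ("1.1", "cal", ["if ($ok) {", "    exit;", "}"]),
    ("1.2", "asc", ["if ($ok) {", "    log_it();", "    exit;", "}"]),
]

blame = annotate(revisions)
for revision_id, author, line in blame:
    print(f"{author:5}{revision_id:5}{line}")
```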

3.2.1.7. The locking debate

The default working mode (and sometimes the only working mode) in most source control systems is to avoid the use of locks. That is to say, any user can edit any file at any time, and conflicts are managed using merges. An alternate approach is to force users to explicitly check out a file for editing (sometimes called reserved checkouts), locking other users from editing the file until the first user commits his changes and releases the lock.

Using file-editing locks completely eliminates conflicts because no user is ever able to work on a file while another user is editing it. There are drawbacks to this approach, however. To start with, it doesn't allow users to work on two parts of the same file simultaneously, where there could have been no conflict. The more serious drawback occurs when a user checks out a file for editing, and then changes her mind and forgets to unlock the file, resulting in a file that nobody else can edit. Couple this with an engineer who goes on holiday, and you can get in a spot of bother.
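The reserved-checkout model boils down to a table of who holds which file. The Python sketch below uses a hypothetical LockTable class to show both the guarantee and the failure mode described above: a second user is refused until the first releases the lock, however long that takes.

```python
class LockTable:
    """Toy model of reserved checkouts (illustration only)."""

    def __init__(self):
        self._owners = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        """Try to reserve `path`; return False if someone else holds it."""
        owner = self._owners.setdefault(path, user)
        return owner == user

    def unlock(self, path, user):
        """Release a lock; only the holder may do so in this sketch."""
        if self._owners.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._owners[path]


locks = LockTable()
cal_got_it = locks.lock("library.php", "cal")   # cal reserves the file
asc_got_it = locks.lock("library.php", "asc")   # asc is refused while cal holds it
locks.unlock("library.php", "cal")
asc_retry = locks.lock("library.php", "asc")    # succeeds once the lock is released
```

Systems that support locking usually give an administrator some way to break a forgotten lock; that escape hatch is left out here.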

3.2.1.8. Projects and modules

Nearly all source-control systems allow you to group sets of files and directories into projects or modules. A number of files comprise a single module and the repository tracks all of the files together. The code for a single application can then be collected together as a single module, which you check out all at once.

3.2.1.9. Tagging

The action of tagging a module means marking the current revision in each file with a specific marker. For example, you might add the tag "release-1.00" at the point of release. This would add a tag to the current revision of every file, which would not apply to subsequent revisions. Then, at some point in the future, you can request to check out all files tagged with "release-1.00" and return to the state of the repository when the tag was added. This has the same basic effect as asking for a repository snapshot at a particular point in time, but allows you to use meaningful labels for releases and milestones.
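Conceptually, a tag is a named record of which revision of each file was current when the tag was made. A minimal Python sketch, with hypothetical file names and revision numbers:

```python
def tag_module(head_revisions, tags, name):
    """Record the current revision of every file in the module under `name`."""
    if name in tags:
        raise ValueError(f"tag already exists: {name}")
    # Copy the mapping so that later commits do not move the tag.
    tags[name] = dict(head_revisions)


# Current revision of each file in the module (made-up values).
head = {"index.php": "1.4", "library.php": "1.12"}
tags = {}

tag_module(head, tags, "release-1.00")

# Work continues after the release; the tag still points at the old revision.
head["library.php"] = "1.13"
release_files = tags["release-1.00"]
```

Checking out by tag then means fetching, for each file, the revision recorded in the mapping rather than the head.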

3.2.1.10. Branching

The latest revision is usually called the head, but there are some circumstances in which you might need more than one head; that's where branches come in. A branch effectively splits the repository into two mirrored versions, both of which can progress independently of one another. A user chooses which branch he wishes to work on while checking out, and any work committed is only applied to that branch.

The canonical example is to create a separate branch at the point of a major version release. The branch that contains the main head, often called the trunk, can then continue as normal, working toward the next release. If a bug is found that needs to be fixed in the current release and can't wait for the next release, then the bug can be fixed in the second branch. This branch is identical to the first release point, so you can safely fix the bug and re-release the code without incorporating any of the work going into the next release. A branch like this is often called a maint (short for maintenance) branch.

Branches can be used to good effect in the opposite direction, too. If you want to work on a new feature that drastically affects the code base, but you don't want to block the releasability of the trunk, then you can create a branch for feature development and keep the trunk moving.

3.2.1.11. Merging

Creating a branch is all well and good, but at some point you're not going to want it anymore, such as when you're ready for your next release or when the large feature you've been working on is ready for prime time. At this point, you're going to want to merge your branch changes back into the trunk, so it's apt that this action is called merging.

A branch merge is similar to a conflict merge when committing changes; each file that has changed in both the branch and the trunk is merged. If the files can be merged automatically (in the case where changes don't overlap), then they are; otherwise, the conflicting section is presented to the user for a manual merge.

It's best to think of the trunk as a special kind of branch, so all merges happen between the branch you want to keep and the branch you want to merge into it. In the simple case, the former is the trunk, while the latter is the branch, but they can both be branches equally and validly. Most source-control systems allow branches of branches of branches, ad infinitum.

3.2.2. Utilities: the "Nice to Haves"

Aside from the server and client software required for the source control system, there are a number of common "nice to have" features, usually provided by extra software. We'll talk about these features one by one, and then have a look at which products provide them.

3.2.2.1. Shell and editor integration

Most source-control systems were invented and spent their childhood in the world of the command line. The end effect is that most basic source-control clients are pure command-line applications, requiring arcane syntax knowledge.

But the command line is no longer the most obvious choice for code editing: a lot of programmers use Windows or OS X, or even Linux with a window manager. In many of these instances, people are using GUI-based editors and file transfer programs to increase their productivity (in theory).

Most source control products have evolved to support GUI clients, or where there have been gaps in official products, independent software developers and open source groups have stepped in and filled them. Many modern editors have support for various source control systems built in or support for adding them via plug-ins. Most developers now take for granted the ability to commit, roll back, and diff revisions right in the editor.

Where editor integration isn't possible, or when you're handling nontext documents in source control (such as images), some source control products have shell integration to allow checkouts, commits, and diffs directly from the OS file browser.

Although listed here under "nice to haves," good editor and shell integration is often a "must have" feature, especially when your developers aren't command-line savvy.

3.2.2.2. Web interfaces

The ability to look back through previous revisions of files, check commit logs, and perform diffs between revisions is an inherently visual process. With command-line tools, it quickly becomes difficult to search through files with hundreds of revisions, drilling down into diffs and cross-referencing file change sets. Luckily, we already have something that includes languages and tools for browsing highly linked repositories of information: the Web.

The major source control systems have web interfaces that allow you to publish and browse code repositories in real time, view lists of files and folders, browse lists of their revisions, generate diffs between arbitrary revisions, and so on. Most interfaces allow access in any way the command-line tools might: you can browse annotation logs for each revision, view revisions by date or tag, and navigate branches.

The persistence of a web-based repository browser also brings advantages for people working in teams. A diff can be permalinked and the URL shared, so developers can share and talk about changes.

3.2.2.3. Commit-log mailing list

In addition to acting as a record of the changes to a particular file, commit messages can also act as a general activity indicator. By sending each commit message to a mailing list, often with a link to the relevant diff on a web-browsable version of the repository, a team of developers can be notified of everything being worked on.

This can also play a useful role in tracking down unexpected behavior and bugs. By looking at the recent messages on the commits mailing list, a developer can immediately see a list of commits made since the last known good checkpoint.

A list of all committed work can also play a role in project management, giving an engineering or project manager an idea of who's been working on what, at least in the sense of committed code.

3.2.2.4. Commit-log RSS feed

It's a short step from thinking about a commit mailing list to thinking about a commit RSS feed. The concept is similar, and it can be used for almost all the same things. The downside compared to email-based mailing lists is that an RSS feed limits the number of items shown. Depending on the feed reader used, old items that drop from the feed document may or may not be kept in the client. For looking back over a long period of time, this can be a problem.

For developers who already use feed readers for keeping up with news, weblogs, and such, using feeds for repository commits can be a useful way of adding commit-log access without adding too much extra email for the developer. Aside from engineering managers using commit logs as a source of tactical overview, a commit log is very much something to dip in and out of, rather than sequentially consume.

3.2.2.5. Commit database

The fifth feature that's often deemed important by developer groups is the commit database. The premise is fairly simple: as code is committed, the commit transaction is added to a relational database management system (RDBMS). Some kind of frontend, usually a web app, can then be used to query and browse this database.

At first glance, this feature seems to provide many of the same services as the web-based repository browser and the commits mailing list or feed. This is certainly true, although a commit database is not really a sensible replacement for either, as the separate tools offer much more. The features unique to a commit database, however, are what we're interested in. A few practical examples should help explain why you should care.

If you know a certain variable or snippet of code is now in the code base across multiple files, a good grep will tell you where the code resides. But what if you want to know which revisions that code was added in? A few individual trips through the revisions of each file will tell you, but for large file sets, the search quickly becomes very tedious. A commit database can tell you immediately which revisions a snippet of code first appeared in and who added it.

Even trickier is if you want to find out when something disappeared. If you have a code snippet that doesn't exist in any of the current revisions of files in the repository, you can't even grep for it. You could check out successively older snapshots of the repository and grep through them until you find the culprit, but again, for large code bases with thousands of revisions, this task is almost impossible. Once again, a commits database can return this information instantly.

Checking what an individual user has been working on is possible with the commits mailing list: just sort your mail by sender. But how would you go about finding out who's been committing files in a specific folder, or finding out what the most recent commits were from a specific developer in a specific subset of files? These actions are only possible using a commit database or by painstaking recursive queries against commit logs.
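The three kinds of question above (when a snippet appeared, when it disappeared, and who committed what where) become simple queries once commits sit in a relational store. The Python sketch below uses an in-memory SQLite database with an invented single-table schema and sample rows; real commit databases such as the ones discussed later use their own, richer schemas.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per file per commit, with the text added and removed.
db.execute(
    """CREATE TABLE commits (
           path TEXT, revision TEXT, author TEXT, committed_at TEXT,
           added_lines TEXT, removed_lines TEXT)"""
)
db.executemany(
    "INSERT INTO commits VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("lib/tags.php", "1.1", "cal", "2005-03-01 09:00", "function tags_add(){", ""),
        ("lib/tags.php", "1.2", "asc", "2005-03-04 16:30", "$context_ok = true;", ""),
        ("www/photo.php", "1.1", "eric", "2005-03-05 10:00", "echo $photo_url;", ""),
        ("lib/tags.php", "1.3", "cal", "2005-03-09 11:15", "", "$context_ok = true;"),
    ],
)


def first_added(snippet):
    """Earliest commit whose added text contains `snippet`."""
    return db.execute(
        "SELECT path, revision, author FROM commits "
        "WHERE instr(added_lines, ?) > 0 ORDER BY committed_at LIMIT 1",
        (snippet,),
    ).fetchone()


def first_removed(snippet):
    """Earliest commit whose removed text contains `snippet`."""
    return db.execute(
        "SELECT path, revision, author FROM commits "
        "WHERE instr(removed_lines, ?) > 0 ORDER BY committed_at LIMIT 1",
        (snippet,),
    ).fetchone()


def recent_commits(author, folder):
    """Commits by `author` under `folder`, newest first."""
    return db.execute(
        "SELECT path, revision FROM commits "
        "WHERE author = ? AND path LIKE ? ORDER BY committed_at DESC",
        (author, folder + "%"),
    ).fetchall()


added = first_added("$context_ok")
removed = first_removed("$context_ok")
cal_in_lib = recent_commits("cal", "lib/")
```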

The small cost of setting up a commit database (they often plug right into the web repository browser) is quickly paid back the first time you need to track down any kind of obscure subset of commits.

3.2.2.6. Commit hooks

The sixth and final "nice to have" is a slightly nerdier feature than the others: commit hooks. Source control systems with hooks allow you to run custom code and programs when certain actions are performed, such as the committing of a file, the branching of a repository, or the locking of a file for modification.

Commit hooks have various uses, ranging from powering the other "nice to have" features to performing complex application-specific tasks. For applications with strict coding guidelines, a commit hook can check that newly committed code complies. For files that need to be stored in more than one format (such as a JavaScript file that is kept in a well-commented "source" version and a compressed "production" version), a commit hook can be used to keep two versions in sync. For applications with automated test suites, tests can be performed on commit and the results immediately sent back to the developers, alerting them if the code they committed breaks the build.
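As an illustration of the coding-guidelines case, the Python sketch below runs a list of check functions over the files in a commit and collects any complaints. The two rules (no tab characters, file must end with a newline) and all of the names are assumptions made for the example; how a hook is registered and how it rejects a commit differs from one source-control system to the next.

```python
def no_tabs(path, content):
    """Hypothetical guideline: indent with spaces, never tabs."""
    if "\t" in content:
        return f"{path}: contains tab characters"
    return None


def ends_with_newline(path, content):
    """Hypothetical guideline: every file ends with a newline."""
    if not content.endswith("\n"):
        return f"{path}: does not end with a newline"
    return None


PRE_COMMIT_HOOKS = [no_tabs, ends_with_newline]


def run_pre_commit(files, hooks=PRE_COMMIT_HOOKS):
    """Run every hook on every file; an empty result means the commit may proceed."""
    problems = []
    for path, content in files.items():
        for hook in hooks:
            message = hook(path, content)
            if message:
                problems.append(message)
    return problems


clean = run_pre_commit({"a.php": "<?php\n    echo 1;\n"})
dirty = run_pre_commit({"b.php": "<?php\n\techo 1;"})
```

The same shape works for the other uses mentioned: a post-commit hook could just as easily send mail, regenerate a compressed file, or start a test run.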

There are literally hundreds of uses for system hooks, many of which can turn a monotonous rote task into an automated action. Automating our processes, as we'll see in the next section, helps us reduce mistakes and streamline our workflow.

3.2.3. Source-Control Products

There are quite a few source-control systems available, both for free and for cash (in some cases, quite a lot of it), and there are a few factors that you should consider when choosing a system for your development. Aside from supporting the basic features you need, the availability of client software for your developers' operating systems can play a vital role: if your programmers can't commit code to the repository, there's little point in having it.

It's not just the client that matters, but the tools and utilities around it. Does the tool you choose have a web interface, mailing list integration, and a searchable commit database? We'll discuss a few of the more popular choices and weigh the pros and cons.

3.2.4. The Revision Control System (RCS)

RCS is the grandfather of modern source-control systems. Created by Walter Tichy at Purdue University in the early 1980s, RCS has been around for a long time, is easy to understand, and is used internally by CVS. RCS is actually based on the earlier SCCS (Source Code Control System), which was a closed-source application suite provided with versions of Unix in the 70s.

RCS is not a full source-control system, but rather a format for storing revisions of a file in a single other file. RCS has no way to connect to a repository from another machine (apart from via a remote shell, which you might regard as cheating), has no support for multiple users, and no sense of projects or modules. I include it here primarily as an item of historical interest.

3.2.4.1. The Concurrent Versions System (CVS)

CVS was released in 1986 by Dick Grune at the Vrije Universiteit in Amsterdam as a source-control system for handling large projects with many developers and a remote repository. Using RCS for the actual revision storage, CVS provides several remote access protocols, support for branching, merging, tagging, locking, and all the usual functions.

3.2.4.1.1. Client availability

Because CVS has been around for a long time, there are clients for virtually every possible platform. If you have a Linux box, chances are you already have the CVS client and server software installed. Windows and OS X have clients in abundance, both with the official command-line clients and with independent GUI clients. WinCVS for Windows is fairly popular and easy to use (http://www.wincvs.org/).

In terms of shell and editor integration, CVS is far ahead of its alternatives, with support in virtually all editors that allow plug-ins. BBEdit on OS X (http://www.barebones.com/products/bbedit/) and Vi or Emacs for Unix are fairly popular choices. Under Windows, TortoiseCVS (http://tortoisecvs.org/) gives comprehensive shell integration, adding CVS actions to file context menus in Explorer.

3.2.4.1.2. Web interfaces

CVS has a couple of major web interfaces, confusingly called ViewCVS (open source, written in Python) and CVSView (open source, written in Perl). Both look very similar, provide all the basic features you want, and are possible to extend, given enough patience. If neither of those takes your fancy, there is a wide array of alternative web interfaces, including Chora, which is part of the Horde suite (open source, written in PHP; http://www.horde.org/), so it's usually possible to find one in your favorite language.

3.2.4.1.3. Mailing list and RSS feed

CVS has a pluggable system for executing arbitrary commands whenever something is checked in. This allows scripts to be easily plugged in to turn commit events into emails and RSS feeds, just by committing some scripts to the CVSROOT folder in the root of the repository. The usual suspects are CVSspam for mail (http://www.badgers-in-foil.co.uk/projects/cvsspam/) and cvs2rss for RSS (http://raa.ruby-lang.org/project/cvs2rss/).

This pluggable system affords other advantages, allowing such things as automatically running regression test suites when new code is committed, populating commit databases with events, and keeping checked out copies in sync with the repository head.

3.2.4.1.4. Commit database

CVS supports the Bonsai commit database (http://www.mozilla.org/projects/bonsai/) through its pluggable triggers feature. Bonsai was created by Terry Weissman as a Mozilla project tool and written in Perl, storing its data using MySQL. Bonsai supports all the usual commit database features and is easy to set up and use, and fairly easy to integrate into your application by accessing the stored data directly.

Pros

It's free: Speaks for itself.
Tried and tested: CVS has been around forever. This means that other users have already gone through the painful process of discovering its bugs and weaknesses, which have either subsequently been fixed, or have at least been well documented (see "Cons" section).
Great client and utility availability: This ubiquity also means that there are clients and servers available on just about every platform, as well as the full complement of utilities.

Cons

File-level, rather than repository-level, versioning

Files in CVS each have a version number, rather than the repository having a version number. A side effect of this design is that all file modifications are saved independently of each other, even when committed as a group. The only way to tell if two modifications were committed in one go is to compare their commit times (which can vary a little, depending on how long the commit took) or compare the commit log message (if one was sent). Lacking a concept of "change sets" also means that commits of multiple files are not transactional; that is to say, some files can be committed, while others are not (in the case of conflicts), which leaves the repository in an inconsistent state.
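The comparison of commit times and log messages described above can be automated as a heuristic. The Python sketch below groups per-file commits that share an author and message and fall within a short time window; the five-minute window, the tuple layout, and the sample data are assumptions for the example, and the result is a best guess rather than a true change set.

```python
from datetime import datetime, timedelta


def group_change_sets(file_commits, window=timedelta(minutes=5)):
    """Group per-file commits into probable change sets.

    `file_commits` holds (path, author, message, committed_at) tuples. A commit
    joins an existing group when author and message match and it happened no
    more than `window` after that group's most recent commit.
    """
    change_sets = []
    for commit in sorted(file_commits, key=lambda c: c[3]):
        _, author, message, when = commit
        for group in reversed(change_sets):  # newest groups first
            last = group[-1]
            if last[1] == author and last[2] == message and when - last[3] <= window:
                group.append(commit)
                break
        else:
            # No matching group was found, so this commit starts a new one.
            change_sets.append([commit])
    return change_sets


commits = [
    ("a.php", "cal", "fix tag deletion", datetime(2005, 3, 9, 10, 0, 0)),
    ("c.php", "asc", "add context", datetime(2005, 3, 9, 10, 0, 2)),
    ("b.php", "cal", "fix tag deletion", datetime(2005, 3, 9, 10, 0, 3)),
]

groups = group_change_sets(commits)
paths = [[c[0] for c in group] for group in groups]
```

The heuristic fails in exactly the ways the text suggests: empty or reused log messages merge unrelated commits, and a very slow commit can be split in two.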

No ability to move or rename files

Because CVS tracks each file in the repository using an individual RCS file, there is no way to move a file within the repository. To simply rename a file, the recommended procedure is to delete the old file and add it again with its new name. This is fairly flawed because although the repository remains consistent, the revision history of the file is "lost," or at least well hidden.

An alternative approach is to manually copy the RCS file in the repository, so that the new version of the file has the full revision history. The downside to that approach is twofold. First, it requires an administrator to manually poke around inside the repository, which leaves you open to dangerous mistakes. Deleting a file via CVS leaves its history intact and restorable, but deleting its RCS file makes it lost forever. Second, when previous versions of the repository are checked out, the file exists both with its old name (which is correct) and with its new name, even though the file didn't exist under that name at that time in the repository's history. In this way, the repository is left in an inconsistent state.

3.2.4.2. Subversion (SVN)

Subversion (http://subversion.tigris.org/) is an open source project, started in 2000, with the clear aim of developing a drop-in replacement for CVS, while fixing its major problems.

Unlike CVS, Subversion does not store its revision history using individual RCS files in the background. Subversion in fact has a pluggable backend and is currently able to store the repository using either Berkeley DB or FSFS. This database approach for revision storage allows Subversion to overcome the two main issues with CVS.

3.2.4.2.1. Client availability

Subversion clients are slowly becoming more common. For Windows users, TortoiseSVN (a clone of TortoiseCVS) is a sensible choice, giving full shell integration and making common actions trivially easy (http://tortoisesvn.tigris.org/). SCPlugin for OS X (http://scplugin.tigris.org/) allows you to browse and commit code directly from the Finder. As of version 8.2, BBEdit has integrated Subversion support. The Unix stalwarts Vi and Emacs both have Subversion support through plug-ins, as does the Eclipse IDE.

3.2.4.2.2. Web interfaces

There are a few good choices for browsing your Subversion repository over the Web. The Subversion authors created websvn (http://websvn.tigris.org/), a PHP application that works well and is fully featured. The Trac project management tool (http://www.edgewall.com/trac/) has support for Subversion integration and allows you to tie together revision control and issue tracking. Chora (http://horde.org/chora/) supports both CVS and Subversion, but requires the Horde framework. ViewVC (http://www.viewvc.org/), the successor to ViewCVS, includes full Subversion support.

3.2.4.2.3. Mailing list and RSS feed

Subversion provides an extension mechanism similar to CVS, allowing execution of arbitrary scripts after a successful commit. Subversion itself comes with an email notification mechanism written in Perl. Programs to convert Subversion commits to RSS feeds are harder to find, though. There's no current favorite, but SubveRSSed fills the role quite well (http://base-art.net/Articles/46/).

3.2.4.2.4. Commit database

Subversion has a Bonsai clone in the shape of Kamikaze (http://kamikaze-qscm.tigris.org/). Kamikaze uses a Perl script at the backend to post commit data into a MySQL store. The data can then be queried using a PHP frontend application.

Pros

It's free: Speaks for itself.
Fileset commits: Sets of files are atomically committed together in a way which can be later referenced. The repository as a whole, rather than the individual files, has a revision number. Each set of committed files thus has a unique revision number that refers to the state of the repository immediately after the commit. Tags become less important because every revision number is a tag across the entire repository.
Allows moves and renames: Files can be renamed and moved in the repository with their edit histories intact. Subversion suffers none of the inconsistencies of CVS, and both old and new repository snapshots have only the correct files.

Cons

Unreliable storage

The original default repository storage method, Berkeley DB, has some problems with corruption. Because Berkeley DB runs in the same process and Subversion accesses it directly (in contrast to a standalone database), when Subversion crashes or hangs, the repository store also crashes and becomes wedged. At this point, the system administrator needs to recover the database and roll it back to the last checkpoint before the failure.

From Subversion 1.2 onward, FSFS, a second storage method, has become the default. FSFS stores data using flat files, so the whole repository can't be damaged by a single hung process. FSFS also solves the portability issues with Berkeley DB, in which it's not possible to move the repository from one kind of machine to another. The FSFS store is just a directory tree full of files and can be backed up and moved around between systems, just as with CVS.

Hard to compile

Although it's true that Subversion is difficult to compile, this isn't so much of an issue now as it was previously. Subversion client and server software is now available as precompiled binaries, so unless you're trying to use bleeding edge features, it's advisable to just use the provided binaries.

3.2.4.3. Perforce

Perforce (http://www.perforce.com/perforce/) is a commercial source-control system used by some large open source projects (including Perl 5), and is favored for its speed and flexibility. Although not free, it's included in our list here as an example of a commercial solution that provides a good alternative to its open source competitors. If you're looking for a source-control system with a company behind it to provide support, Perforce is a sensible choice.

Perforce follows the usual naming conventions, except for repositories, which it calls depots.

3.2.4.3.1. Client availability

Perforce's own client software is the usual choice for developers. P4V, the Perforce Visual Client, is available for Windows, OS X, Linux, Solaris, and FreeBSD. P4V includes sophisticated tools for browsing the repository and tracking changes. P4Win is an additional Windows client with full feature support. The P4EXP extension also allows full Explorer shell integration for Windows users.

The Perforce client API allows third-party developers to integrate Perforce support into their clients, and BBEdit for OS X, among others, has built-in Perforce support.

3.2.4.3.2. Web interfaces

P4Web is the official Perforce web repository browser and offers all the features of the CVS and Subversion web repository browsers, as well as some Perforce-specific features. P4Web is free for licensed Perforce users.

3.2.4.3.3. Mailing list and RSS feed

Perforce has a trigger-based plug-in architecture similar to CVS and Subversion, allowing the generation of emails or RSS feeds. The Perforce manual has examples for creating an email generator but lacks any RSS examples. Creating one yourself should, however, be a trivial exercise, as the trigger scripting interface is well documented.

3.2.4.3.4. Commit database

Bonsai support for Perforce is currently planned, though there hasn't been any work on this feature as of yet. Some Bonsai-like features are exposed through P4V already, so this isn't as much of an issue.

Pros

Atomic commits and changesets: Like Subversion, Perforce supports atomic commits of changesets (a group of files modified together).
Good permissions framework: Perforce has a full permissions system, allowing you to restrict different portions of your repository to different developers.
Commercial-grade support: As Perforce is a commercial product, you're paying for real support from a real company. If you're having problems, you can open a support ticket rather than searching Usenet for the solution.

Cons

Expensive: The Perforce end-user license costs between $500 and $800 per user, which also includes a year of support. Open source projects can apply for a free license.

3.2.4.4. Visual Source Safe (VSS)

VSS (http://msdn.microsoft.com/ssafe/) is Microsoft's source-control system. It was historically a separate product but is now part of Visual Studio. In the past, VSS has only worked in locking mode, where files had to be opened for exclusive editing, but now it supports simultaneous editing and multiple edit merging.

VSS requires a license for each user, and has a single central repository that needs to be accessed over a network share. There is some support for developer permissions, with five different assignable levels. VSS is reportedly not used internally for Microsoft's application development (although this has never been stated publicly), which might indicate that they don't have a lot of confidence in the product.

3.2.4.4.1. Client availability

The official client runs on Windows only, although several Windows IDEs have VSS support built in via its API. MainSoft (http://www.mainsoft.com) provides a Unix client, and CodeWarrior on OS X also provides integrated support.

3.2.4.4.2. Web interfaces

VssWebHandler is an ASP.NET interface to VSS repositories, requiring IIS, .NET, and a copy of the VSS client, although at present it hasn't been released to the public. The forthcoming version of VSS is rumored to include an official web interface named "SourceSafe Web Service," but information about it is sketchy. The VSS team's weblog contains periodic information (http://blogs.msdn.com/checkitout/).

3.2.4.4.3. Mailing list and RSS feed

There are currently no generally available email or RSS generation tools for VSS repositories.

3.2.4.4.4. Commit database

There is currently no generally available Bonsai-like commit database for VSS.

Pros

Easy Visual Studio integration: If you're using Visual Studio to build a .NET web application, using VSS is trivially easy.

Cons

No atomic changesets

As with CVS, VSS treats every file individually with no concept of sets of changes.

No ability to rename or move files

VSS doesn't allow files to be renamed or moved while retaining the version history. As with CVS, you can get around this issue by deleting and re-adding the file, although the revision history is lost.

No annotation or blame log

VSS has no annotation feature, so it doesn't allow you to easily see who edited which part of a file. You can emulate an annotation feature by viewing diffs between revisions, but this process quickly becomes impossible for files with many revisions.

Bad networking support

VSS relies on Windows network shares to allow multiple users to work on the same repository. This makes working in remote locations very difficult, without either using something like VNC to work remotely on a local machine or using a VPN to mount the network drive as if you were on the same network.

For developers who absolutely must work remotely with VSS repositories, the product SourceOffSite (http://www.sourcegear.com/sos/) allows VSS to be used in a classic client-server model, with encryption to avoid letting people read your source code as it's sent over the wire.

3.2.4.5. And the rest . . .

These four products are the most widely used source-control systems in web development today, but there are plenty of other systems with smaller install bases and similar features. If you feel that none of these four meets your needs, then you should check out some of the others by visiting the Better SCM web site (http://better-scm.berlios.de/alternatives/), which compares a larger collection of systems.

3.2.4.6. Summary

Unless you have a clear reason not to, Subversion is the obvious choice for working with web applications. Subversion has all the tools you could want, both in the official distribution and in the shape of third-party applications.

There are still a few circumstances in which you might want to go with another product, though. Client integration is usually the biggest issue, since it's most important that it is easy for your developers to work with their code and commit changes.

If you're already using a source-control product and it's working well for you, switching can be quite a pain: a new repository, new client software, and a set of new tools are all going to take time to transition to. Most source-control systems (with the exception of VSS) allow you to import your repository from other source-control systems and retain version histories, so transitioning systems doesn't have to mean throwing away everything you already have.

Finally, if you're looking for a product with commercial support, then you're going to want to pick something like Perforce. But it's worth bearing in mind that after spending a couple of hours with Subversion, you might start to feel that it's easy enough to not require a support contract.

3.2.5. What to Put in Source Control

When people first approach source control, it seems obvious to use it for application code, but there's no reason to stop there. Typically a whole web application, including source files, static HTML, and image assets, can all be added as a single project or module. If you're not going to simply expand source control to cover the entirety of the application, there are a few other key assets that do belong there.

3.2.5.1. Documentation

Hopefully, your application has some documentation. This might include internal or external API documentation; software, hardware, and network architecture diagrams; recovery and maintenance procedures; OS install and setup guides; and any sort of programmer reference material.

Putting general site documentation into source control helps you in a couple of ways. The obvious benefits of versioning and rollback apply, allowing you to look over the history of changes to your procedures and architecture. Such a history is often useful when tied into your application's main source control, as changes in application source code and documentation can be cross-referenced. But in addition to the regular source control benefits, having your documentation in a single location that can't be accidentally deleted, and that all developers can read and write, brings a lot of benefit to your development. It can help to enforce an amount of discipline in documentation (as I'm sure you know, most developers have an acute allergy to both the reading and writing of documentation) by making it accountable and trackable via source-control annotation logs and commit logs. It also means that developers can always find documentation when they need it: all your developers can check out code from source control, and there's only one place to look for it.

It can be useful to automatically publish your documentation project to a web server to allow your documentation to be read via the Web. Automating this procedure to publish documents directly from source control every hour or so ensures that the documentation never goes out of sync and is available instantly, which is useful in cases where developers can't perform a source-control checkout, such as when they're desperately trying to restart services from an Internet cafe in the middle of nowhere and can't remember the procedure.
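
A minimal sketch of such a publishing job, assuming a Subversion repository and a script run hourly from cron, might look like the following. The repository URL, web root, and function names are hypothetical placeholders. The only real tool used is the svn export command with its --force and --quiet options.

```python
import subprocess

# Both locations are hypothetical placeholders for illustration.
DOCS_URL = "http://svn.example.com/repos/docs/trunk"
WEB_ROOT = "/var/www/docs"


def export_command(url, dest):
    """Build the svn command that copies an unversioned tree to dest.

    --force lets the export overwrite files left by the previous run.
    """
    return ["svn", "export", "--force", "--quiet", url, dest]


def publish(url=DOCS_URL, dest=WEB_ROOT):
    """Export the latest documentation into the web server's document root."""
    subprocess.run(export_command(url, dest), check=True)
```

Calling publish() from an hourly cron entry would keep the web copy within an hour of the repository. One caveat of this simple approach is that files deleted from the repository linger in the web root until they are cleaned out separately.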

3.2.5.2. Software configurations

Assuming that you're serving your application on Unix, your software configurations are almost certainly simple text files. As such, they're prime candidates for storing in source control. Although not traditionally thought of as part of your application's code, applications can be so reliant on specific software setups that it helps to think of your application as including, for example, your web and mail server configurations.

The Apache configuration file (httpd.conf) will often contain application-tuned rules (such as connections, keepalives, special modules) and even application logic (rewrite rules, HTTP auth configs, vhosts), so keeping it in source control alongside your application source code makes a lot of sense. But it's not just web server configurations; you should consider storing any configuration file that isn't the default out-of-the-box version. This can include web server (httpd.conf, php.conf, etc.), mail server (aliases, virtual, transport, etc.), PHP (php.ini), MySQL (my.cnf), and anything else you've installed as part of your application setup.

At first it may seem adequate to keep a snapshot of current configs for jump-starting new machines and restoring broken ones, but the other features of source control can become useful. If you see that the load profile for your web servers jumped up a month ago, you can look at the change history for your configs and see if anything was changed around that time, and then roll back if deemed necessary. You can see who changed an incoming mail rule, and read the commit log to find out why.
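
As a sketch of that kind of investigation, the helper below builds an svn log command restricted to a window of dates around a suspected change. It assumes the configs live in a Subversion working copy. The function name and the default one-week window are illustrative, while the {DATE}:{DATE} revision range is Subversion's own date syntax.

```python
import subprocess
from datetime import date, timedelta


def log_command(path, around, days=7):
    """Build an `svn log` command covering `days` either side of a date."""
    start = around - timedelta(days=days)
    end = around + timedelta(days=days)
    revision_range = "{%s}:{%s}" % (start.isoformat(), end.isoformat())
    # --verbose also lists the paths changed in each revision.
    return ["svn", "log", "--verbose", "-r", revision_range, path]


def show_history(path, around, days=7):
    """Print who changed the file, when, and why, within the window."""
    subprocess.run(log_command(path, around, days), check=True)
```

For example, show_history("conf/httpd.conf", date(2005, 3, 15)) would list every commit to that (hypothetical) file in the two weeks surrounding March 15, along with its author and log message.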

3.2.5.3. Build tools

Your build tools (which we'll talk about in the next section) are prime candidates for source control, along with any programs or scripts used to support your application. Hopefully by this point the general rule of thumb has become apparent: any file you modify more than once should be in source control, whether you consider it a part of the application or not. Taking a holistic view of web applications, you can say that anything that's contained on your servers, including the servers themselves, is part of your application.

3.2.6. What Not to Put in Source Control

Not everything is necessarily right for source control, depending on the system you choose to use and your available hardware platform. If your application has a compiled program component, then it's usually only worth putting the source for the application (including makefiles or build instructions) into source control, rather than the compiled output. Large binary files aren't what source control is designed for. Instead of storing the delta between two subsequent revisions, the system typically has to store every revision separately, and you can't perform diffs or merges or look at annotation logs on the result.

Other compiled software falls into the same category. If you compile your own Linux kernel, there's no need to store the kernel in source control, just the changes you made to the source code. In addition to binary files, large files (larger than a megabyte) are not usually suited to source control (though this can depend very much on your choice of system: Subversion plays nicely with large files, while CVS does not). Source control is designed for small text files, and using it for anything else will only cause you problems.
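
One way to apply these guidelines before an initial import is to scan the tree for files that are probably unsuitable. The sketch below is a rough heuristic rather than a feature of any source-control system. It flags files over a size limit (one megabyte by default, following the rule of thumb above) and files whose first few kilobytes contain a NUL byte, which is a common but imperfect test for binary content. All names are hypothetical.

```python
import os

SIZE_LIMIT = 1024 * 1024  # one megabyte, per the rule of thumb above


def looks_binary(path, sample_size=8192):
    """Guess that a file is binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def unsuitable_files(root, size_limit=SIZE_LIMIT):
    """Return sorted (path, reason) pairs for files worth a second look."""
    flagged = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getsize(path) > size_limit:
                flagged.append((path, "large"))
            elif looks_binary(path):
                flagged.append((path, "binary"))
    return sorted(flagged)
```

The result is only a list of candidates to review. Small images that belong to the site, for instance, would be flagged as binary but may still be worth keeping under source control.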