Revision control concepts – Revision Control System (RCS), Concurrent Versions System (CVS) and Subversion
Revision control (also known as version control (system) (VCS), source control or (source) code management (SCM)) is the management of multiple revisions of the same unit of information. It is most commonly used in engineering and software development to manage ongoing development of digital documents like application source code, art resources such as blueprints or electronic models, and other projects that may be worked on by a team of people. Changes to these documents are usually identified by incrementing an associated number or letter code, termed the “revision number”, “revision level”, or simply “revision” and associated historically with the person making the change. A simple form of revision control, for example, has the initial issue of a drawing assigned the revision number “1”. When the first change is made, the revision number is incremented to “2” and so on.
Version control systems are most commonly stand-alone applications, but revision control is also embedded in various types of software like word processors (e.g. Microsoft Word, OpenOffice.org Writer, KOffice, Pages, Google Docs), spreadsheets (e.g. OpenOffice.org Calc, Google Spreadsheets, Microsoft Excel), and in various content management systems. Integrated revision control is a key feature of wiki software packages such as MediaWiki, DokuWiki, TWiki, etc. In wikis, revision control makes it possible to revert a page to a previous revision, which is critical for allowing editors to track each other’s edits, correct mistakes, and defend public wikis against vandalism and spam.
The Revision Control System (RCS) is a software implementation of revision control that automates the storing, retrieval, logging, identification, and merging of revisions. RCS is useful for text that is revised frequently, for example programs, documentation, procedural graphics, papers, and form letters. RCS is also capable of handling binary files, though with reduced efficiency and efficacy. Revisions are stored with the aid of the diff utility.
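To make the role of diffs concrete, here is a minimal Python sketch of delta storage. It keeps the newest revision in full and records each older revision as a reverse delta, which is the general approach RCS takes on its main line of development. The names `DeltaStore`, `make_delta` and `apply_delta` are hypothetical, and the delta format is invented for illustration; it is not the RCS file format.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Return a list of operations that rebuild `target` from `source`.

    Both arguments are lists of lines. Unchanged runs are stored as
    ("copy", start, end) references into `source`; everything else is
    stored literally as ("insert", lines).
    """
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": keep the new lines
            ops.append(("insert", target[j1:j2]))
        # a pure "delete" needs no operation: the lines are simply not copied
    return ops


def apply_delta(source, ops):
    """Rebuild the target revision from `source` and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Newest revision in full, older revisions as reverse deltas."""

    def __init__(self, first_revision):
        self.head = list(first_revision)
        # reverse_deltas[k - 1] turns revision k + 1 back into revision k
        self.reverse_deltas = []

    def check_in(self, lines):
        lines = list(lines)
        self.reverse_deltas.append(make_delta(lines, self.head))
        self.head = lines
        return len(self.reverse_deltas) + 1  # new revision number

    def check_out(self, revision):
        latest = len(self.reverse_deltas) + 1
        if not 1 <= revision <= latest:
            raise ValueError(f"no such revision: {revision}")
        lines = self.head
        for k in range(latest - 1, revision - 1, -1):
            lines = apply_delta(lines, self.reverse_deltas[k - 1])
        return list(lines)
```

With this layout, checking out the newest revision costs nothing extra, while older revisions are rebuilt by walking the deltas backwards.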
RCS was initially developed in the 1980s by Walter F. Tichy while he was at Purdue University as a free and more evolved alternative to the then-popular Source Code Control System (SCCS). It is now part of the GNU Project but is still maintained by Purdue University.
RCS operates only on single files and has no way of working with an entire project. Although it provides branching for individual files, the version syntax is cumbersome. Instead of using branches, many teams just use the built-in locking mechanism and work on a single head branch.
A simple system called CVS was developed to deal with RCS files en masse. It was the next natural step in the evolution of this concept, as it “transcends but includes” elements of its predecessor. CVS was originally a set of scripts which used RCS programs to manage the files. It no longer does that; instead, it operates directly on the files itself.
A later, higher-level system, PRCS, uses RCS-like files but was never simply a wrapper. In contrast to CVS, PRCS improves the delta compression of the RCS files using Xdelta.
In single-user scenarios, such as server configuration files or automation scripts, RCS may still be the preferred revision control tool: it is simple, and no central repository needs to be accessible for it to save revisions. This makes it a more reliable tool when the system is in dire maintenance conditions. Additionally, the saved backup files are easily visible to the administrator, so the operation is straightforward. However, there are no built-in tamper protection mechanisms (that is, users who can use the RCS tools to version a file are also, by design, able to directly manipulate the corresponding version control file), and this is leading some security-conscious administrators to consider client/server version control systems that restrict users’ ability to alter the version control files.
Some wiki engines, including TWiki, use RCS for storing page revisions.
Basics of RCS
The Revision Control System (RCS) was designed by Walter Tichy of the Department of Computer Science at Purdue University. RCS is a software tool for UNIX systems which lets people working on the system control “multiple revisions of text … that is revised frequently, such as programs or documentation.” [TIC82] It can be applied to development situations of all sorts, including the creation of documents, drawings, forms, articles, and of course, source code.
In a typical software development environment, many developers will be engaged in work on one code base. If everyone were allowed to edit and modify any development item whenever they felt like it, chaos would tend to result and little productive work would get done. Instead of suffering under such an environment, most developers prefer to use version control tools such as the one described here.
Version control tools allow an individual developer to maintain control over an item such as a source file while they implement and test changes to it. During this period, while the file is “checked out”, no one else is allowed to modify it. When the developer is satisfied that the changes have been implemented successfully and that they broke nothing in the meantime, the file may be “checked in” to the baseline database for use by everyone. The specifics of this process for RCS will be discussed in the tutorial proper.
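The check-out/check-in cycle described above can be sketched in a few lines of Python. This is only an illustration of the locking idea, not how RCS is implemented; `LockingStore` and its method names are hypothetical.

```python
class LockingStore:
    """Toy model of lock-based version control: one writer per file."""

    def __init__(self):
        self.revisions = {}  # file name -> list of contents, oldest first
        self.locks = {}      # file name -> user currently holding the lock

    def add(self, name, content):
        """Put a new file under version control as revision 1."""
        self.revisions[name] = [content]

    def check_out(self, name, user):
        """Lock the file for `user` and return its latest content."""
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.revisions[name][-1]

    def check_in(self, name, user, content):
        """Store a new revision and release the lock."""
        if self.locks.get(name) != user:
            raise PermissionError(f"{user} does not hold the lock on {name}")
        self.revisions[name].append(content)
        del self.locks[name]
        return len(self.revisions[name])  # new revision number
```

While one user holds the lock, a second `check_out` of the same file fails, which is exactly the "no one else is allowed to modify it" rule from the paragraph above.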
Earlier implementations of version control tools include AT&T’s SCCS, IBM’s CLEAR/CASTER, and DEC’s CMS. RCS, implemented in 1982, attempts to fix several shortcomings the author found in previous tools.
What is CVS?
CVS is a version control system. Using it, you can record the history of your source files. For example, bugs sometimes creep in when software is modified, and you might not detect the bug until a long time after you make the modification. With CVS, you can easily retrieve old versions to see exactly which change caused the bug. This can sometimes be a big help.
You could, of course, save every version of every file you have ever created. This would, however, waste an enormous amount of disk space. CVS stores all the versions of a file in a single file in a clever way that only stores the differences between versions.
CVS also helps you if you are part of a group of people working on the same project. It is all too easy to overwrite each other’s changes unless you are extremely careful. Some editors, like GNU Emacs, try to make sure that two people never modify the same file at the same time. Unfortunately, if someone is using another editor, that safeguard will not work. CVS solves this problem by insulating the different developers from each other. Every developer works in his own directory, and CVS merges the work when each developer is done.
CVS started out as a bunch of shell scripts written by Dick Grune and posted to the newsgroup comp.sources.unix in the volume 6 release of July 1986. While no actual code from these shell scripts is present in the current version of CVS, many of the CVS conflict resolution algorithms come from them.
In April 1989, Brian Berliner designed and coded CVS. Jeff Polk later helped Brian with the design of the CVS module and vendor branch support.
CVS uses a client-server architecture: a server stores the current version(s) of a project and its history, and clients connect to the server in order to “check out” a complete copy of the project, work on this copy and then later “check in” their changes. Typically, the client and server connect over a LAN or over the Internet, but client and server may both run on the same machine if CVS has the task of keeping track of the version history of a project with only local developers. The server software normally runs on Unix (although at least the CVSNT server supports various flavors of Microsoft Windows and Linux), while CVS clients may run on any major operating-system platform.
Several developers may work on the same project concurrently, each one editing files within their own “working copy” of the project, and sending (or checking in) their modifications to the server. To avoid the possibility of people stepping on each other’s toes, the server will only accept changes made to the most recent version of a file. Developers are therefore expected to keep their working copy up-to-date by incorporating other people’s changes on a regular basis. This task is mostly handled automatically by the CVS client, requiring manual intervention only when a conflict arises between a checked-in modification and the yet-unchecked local version of a file.
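The rule that the server only accepts changes made against the most recent version can be illustrated with a small Python sketch. It models a single file and skips merging entirely; `Server` and `StaleWorkingCopy` are hypothetical names and do not correspond to anything in the CVS code base.

```python
class StaleWorkingCopy(Exception):
    """Raised when a check-in is based on an out-of-date revision."""


class Server:
    """Toy single-file server that rejects check-ins from stale copies."""

    def __init__(self, content):
        self.revisions = [content]  # revision n lives at index n - 1

    @property
    def head(self):
        return len(self.revisions)

    def check_out(self):
        """Return (revision number, content) of the newest revision."""
        return self.head, self.revisions[-1]

    def check_in(self, base_revision, content):
        """Accept the change only if it was made against the newest revision."""
        if base_revision != self.head:
            raise StaleWorkingCopy(
                f"working copy is at revision {base_revision}, "
                f"server is at {self.head}; update first"
            )
        self.revisions.append(content)
        return self.head
```

A developer whose check-in is rejected fetches the newest revision, reapplies or merges the local change, and tries again; in real CVS the client does most of that merging automatically.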
If the check-in operation succeeds, then the version numbers of all files involved automatically increment, and the CVS server writes a user-supplied description line, the date and the author’s name to its log files. CVS can also run external, user-specified log processing scripts following each commit. These scripts are installed by an entry in CVS’s loginfo file, which can trigger email notification or convert the log data into a Web-based format.
Clients can also compare versions, request a complete history of changes, or check out a historical snapshot of the project as of a given date or as of a revision number. Many open-source projects allow “anonymous read access”, a feature pioneered by OpenBSD. This means that clients may check out and compare versions with either a blank or simple published password (e.g., “anoncvs”); only the check-in of changes requires a personal account and password in these scenarios.
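A date-based checkout boils down to picking, for every file, the last revision recorded at or before the requested time. The Python sketch below shows that selection under simplifying assumptions: timestamps are plain numbers, each file's history is already sorted, and the function names are hypothetical.

```python
from bisect import bisect_right


def revision_as_of(history, when):
    """Return the content of the last revision made at or before `when`.

    `history` is a list of (timestamp, content) pairs sorted by timestamp.
    Returns None if the file did not exist yet at that time.
    """
    stamps = [stamp for stamp, _ in history]
    index = bisect_right(stamps, when)
    return history[index - 1][1] if index else None


def snapshot_as_of(project, when):
    """Build a {path: content} snapshot of the whole project at `when`."""
    snapshot = {}
    for path, history in project.items():
        content = revision_as_of(history, when)
        if content is not None:
            snapshot[path] = content
    return snapshot
```

Because every file is resolved independently, a snapshot taken in the middle of a large check-in can contain only part of that change.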
Clients can also use the “update” command in order to bring their local copies up-to-date with the newest version on the server. This eliminates the need for repeated downloading of the whole project.
CVS can also maintain different “branches” of a project. For instance, a released version of the software project may form one branch, used for bug fixes, while a version under current development, with major changes and new features, can form a separate branch.
CVS uses delta compression for efficient storage of different versions of the same file. The implementation favors files with many lines (usually text files) – in extreme cases the system may store individual copies of each version rather than deltas.
Distributed Concurrent Versions System
DCVS is a CVS-based version control system for software development across physically distributed sites. It extends the programs CVS and CVSup with functionality to distribute CVS repositories with local lines of development, and it automatically handles the synchronization of the distributed repositories in the background. DCVS enables globally distributed developer teams to collaborate efficiently and securely on the same software product components. The DCVS concept of unique association of development lines prevents collisions and thus data loss.
In particular, DCVS provides:
- any desired number of programmers concurrently developing an arbitrary number of development lines in workspaces isolated from each other
- worldwide and transparent distribution of repositories via secure connection; efficient replication and quick local server access for all operations
- integration in nearly all current GUIs and IDEs through the use of the default CVS protocol between the CVS clients and DCVS servers
- the functionality of snapshots replacing the more susceptible tags
- the functionality of change sets as powerful means for structuring code development and release/change management
- configurable e-mail notifications (add on)
- hooks for process control
- user control and authentication either by local system specific defaults or by distributed LDAP servers
- installation packages for all popular operating systems
Subversion is a free/open-source version control system. That is, Subversion manages files and directories over time. A tree of files is placed into a central repository. The repository is much like an ordinary file server, except that it remembers every change ever made to your files and directories. This allows you to recover older versions of your data, or examine the history of how your data changed. In this regard, many people think of a version control system as a sort of “time machine”.
Subversion can access its repository across networks, which allows it to be used by people on different computers. At some level, the ability for various people to modify and manage the same set of data from their respective locations fosters collaboration. Progress can occur more quickly without a single conduit through which all modifications must occur. And because the work is versioned, you need not fear that quality is the trade-off for losing that conduit—if some incorrect change is made to the data, just undo that change.
Some version control systems are also software configuration management (SCM) systems. These systems are specifically tailored to manage trees of source code, and have many features that are specific to software development—such as natively understanding programming languages, or supplying tools for building software. Subversion, however, is not one of these systems. It is a general system that can be used to manage any collection of files. For you, those files might be source code—for others, anything from grocery shopping lists to digital video mixdowns and beyond.
In early 2000, CollabNet, Inc. (http://www.collab.net) began seeking developers to write a replacement for CVS. CollabNet offers a collaboration software suite called SourceCast, of which one component is version control. Although SourceCast used CVS as its initial version control system, CVS’s limitations were obvious from the beginning, and CollabNet knew it would eventually have to find something better. Unfortunately, CVS had become the de facto standard in the open source world largely because there wasn’t anything better, at least not under a free license. So CollabNet determined to write a new version control system from scratch, retaining the basic ideas of CVS, but without the bugs and misfeatures.
In February 2000, they contacted Karl Fogel, the author of Open Source Development with CVS (Coriolis, 1999), and asked if he’d like to work on this new project. Coincidentally, at the time Karl was already discussing a design for a new version control system with his friend Jim Blandy. In 1995, the two had started Cyclic Software, a company providing CVS support contracts, and although they later sold the business, they still used CVS every day at their jobs. Their frustration with CVS had led Jim to think carefully about better ways to manage versioned data, and he’d already come up with not only the name “Subversion”, but also with the basic design of the Subversion repository. When CollabNet called, Karl immediately agreed to work on the project, and Jim got his employer, Red Hat Software, to essentially donate him to the project for an indefinite period of time. CollabNet hired Karl and Ben Collins-Sussman, and detailed design work began in May. With the help of some well-placed prods from Brian Behlendorf and Jason Robbins of CollabNet, and Greg Stein (at the time an independent developer active in the WebDAV/DeltaV specification process), Subversion quickly attracted a community of active developers. It turned out that many people had had the same frustrating experiences with CVS, and welcomed the chance to finally do something about it.
The original design team settled on some simple goals. They didn’t want to break new ground in version control methodology; they just wanted to fix CVS. They decided that Subversion would match CVS’s features, and preserve the same development model, but not duplicate CVS’s most obvious flaws. And although it did not need to be a drop-in replacement for CVS, it should be similar enough that any CVS user could make the switch with little effort.
After fourteen months of coding, Subversion became “self-hosting” on August 31, 2001. That is, Subversion developers stopped using CVS to manage Subversion’s own source code, and started using Subversion instead.
While CollabNet started the project, and still funds a large chunk of the work (it pays the salaries of a few full-time Subversion developers), Subversion is run like most open-source projects, governed by a loose, transparent set of rules that encourage meritocracy. CollabNet’s copyright license is fully compliant with the Debian Free Software Guidelines. In other words, anyone is free to download, modify, and redistribute Subversion as he pleases; no permission from CollabNet or anyone else is required.
When discussing the features that Subversion brings to the version control table, it is often helpful to speak of them in terms of how they improve upon CVS’s design. If you’re not familiar with CVS, you may not understand all of these features. And if you’re not familiar with version control at all, your eyes may glaze over unless you first read Chapter 2, Basic Concepts, in which we provide a gentle introduction to version control in general.
- Directory versioning
- CVS only tracks the history of individual files, but Subversion implements a “virtual” versioned filesystem that tracks changes to whole directory trees over time. Files and directories are versioned.
- True version history
- Since CVS is limited to file versioning, operations such as copies and renames—which might happen to files, but which are really changes to the contents of some containing directory—aren’t supported in CVS. Additionally, in CVS you cannot replace a versioned file with some new thing of the same name without the new item inheriting the history of the old—perhaps completely unrelated— file. With Subversion, you can add, delete, copy, and rename both files and directories. And every newly added file begins with a fresh, clean history all its own.
- Atomic commits
- A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.
- Versioned metadata
- Each file and directory has a set of properties—keys and their values— associated with it. You can create and store any arbitrary key/value pairs you wish. Properties are versioned over time, just like file contents.
- Choice of network layers
- Subversion has an abstracted notion of repository access, making it easy for people to implement new network mechanisms. Subversion can plug into the Apache HTTP Server as an extension module. This gives Subversion a big advantage in stability and interoperability, and instant access to existing features provided by that server—authentication, authorization, wire compression, and so on. A more lightweight, standalone Subversion server process is also available. This server speaks a custom protocol which can be easily tunneled over SSH.
- Consistent data handling
- Subversion expresses file differences using a binary differencing algorithm, which works identically on both text (human-readable) and binary (human-unreadable) files. Both types of files are stored equally compressed in the repository, and differences are transmitted in both directions across the network.
- Efficient branching and tagging
- The cost of branching and tagging need not be proportional to the project size. Subversion creates branches and tags by simply copying the project, using a mechanism similar to a hard-link. Thus these operations take only a very small, constant amount of time.
- Hackability
- Subversion has no historical baggage; it is implemented as a collection of shared C libraries with well-defined APIs. This makes Subversion extremely maintainable and usable by other applications and languages.
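The "Atomic commits" item in the list above is easy to show in miniature. The following Python sketch builds each new revision on a private copy of the tree and publishes it only if every change applies cleanly; it also gives the whole tree a single revision number per commit. `Repository` and its methods are hypothetical, and copying the whole tree on each commit is a simplification that a real repository would avoid.

```python
class Repository:
    """Toy repository with all-or-nothing commits and one revision number."""

    def __init__(self):
        self.trees = [{}]  # revision 0 is the empty tree

    @property
    def head(self):
        return len(self.trees) - 1

    def checkout(self, revision):
        """Return a copy of the {path: content} tree at `revision`."""
        return dict(self.trees[revision])

    def commit(self, changes):
        """Apply `changes` ({path: content, or None to delete}) atomically.

        The new tree is built on a private copy. If any change is invalid,
        an exception is raised and nothing is published.
        """
        tree = dict(self.trees[-1])
        for path, content in changes.items():
            if content is None:
                if path not in tree:
                    raise KeyError(f"cannot delete missing path: {path}")
                del tree[path]
            else:
                tree[path] = content
        self.trees.append(tree)  # the only step that publishes the commit
        return self.head
```

A failed commit leaves the head revision and every existing tree untouched, so no reader can ever observe half of a change.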
Pros and Cons of Subversion over CVS
This list is written from the point of view of Subversion. That is, everything labelled “pro” below is an advantage of Subversion over CVS, and everything labelled “con” is a disadvantage of Subversion.
- Most obvious change is of course the single revision number.
- pro: makes it easy to know what went into a build
- pro: largely removes the need for date-based checkouts, since remembering one revision number is even easier than remembering a date. Just arrange for the revision number to be embedded in every binary you build, and you’re done.
- The branching model is extremely different. In CVS, branches are represented as a fork in the time dimension; both branches of a file are seen at the same absolute pathname, and you switch between them by deciding which revision number branch you’re working on. In SVN, the time dimension is unbroken and linear; an SVN repository is simply an ordinary-looking file system seen at a number of snapshots throughout its history. Branches are not officially recognised by the software at all; to create a branch you just copy (say) the path “/myproj” to “/myproj-branch” and do further development by checking out from there instead (exactly as you might have done if you wanted to branch the software in the absence of any version control). SVN’s support for cheap file and directory copies makes this internally efficient, and the “svn switch” command is provided as a means of conveniently moving a single working copy directory between branches.
- pro: creating a branch is now a version-controlled operation, meaning you get to track who did it, when and why
- pro: “svn ls” can easily show you what branches exist and are active, which was always hard to keep track of in CVS
- pro: you can delete a branch when you’ve finished with it (although if anyone later needs to recover it, they can always do so by using “svn cp” from an earlier revision)
- con: the benefit of the single revision number is partially undone by this branching model, because to specify what files went into a build you now have to specify a pathname within the repository as well as the revision number. That said, at least you can do that at all – CVS didn’t even permit you to give a branch tag and a date tag together!
- con: there is no way to bring together all the various branched versions of a single source file, if (for example) you’re trying to remember which branch a particular change was made on. CVS didn’t exactly support this either, but it was at least practically feasible in some cases by grepping the
- con: it’s also rather heavyweight if you only wanted to branch one single file for some unofficial or private purpose.
- SVN stores a pristine copy of every checked-out source file on the client side, in a subdirectory of “
- pro: this allows common operations such as “svn status” and some forms of “svn diff” to work entirely locally without needing to talk to the SVN server.
- con: but of course your working copies take up twice as much space. Where I work this is a significant issue, since we have so much source!
- con: also, if your working copies are stored on an NFS volume, you might not even save much time, since it doesn’t make much difference whether you’re comparing your working files with pristine copies on an NFS server or on an SVN server – the network RTT and transfer rate is the limiting factor either way. So if you use network file servers, you don’t even get much benefit in return for the space cost.
- con: recursive greps of source directories now turn up lots of bogus hits in “.svn” subdirectories. I don’t doubt that at some point someone will provide a recursive grep which leaves out .svn directories without having to be told to every time, but I haven’t seen it yet.
- pro: despite the large number of cons to go with the one pro, I will stress that the pro is a big pro in many circumstances and often outweighs all of the cons!
- pro: rumour has it that at some point the local pristine copies might become optional, allowing users to choose their own tradeoff between local storage and network utilisation
- improved security model as discussed at length in my article
- pro: even better still in SVN 1.3, with proper access control even in svnserve
- repository storage mechanism is totally different
- con: Berkeley DB format is a horrid mess, self-admittedly prone to corruption incidents requiring manual recovery, and IME terribly slow
- pro: fortunately, they saw the light and implemented fsfs, which is outstandingly cool and now also the default
- pro: fsfs is extremely friendly to incremental backup strategies
- pro: fsfs permits safe (race-condition-free) read-only access to a repository
- con: neither of SVN’s formats provides the simplicity of CVS’s from the point of view of someone poking around the on-disk repository by hand. Many functions not officially implemented in CVS could be worked around by hand-grepping – or occasionally hand-editing – the ,v files. This is not feasible in SVN, so you’re stuck with only the repository search and manipulation tools provided. Fortunately, these are by and large good enough. (Also, CVS’s conceptually simple storage format was a large part of what made it hard to support file renames.)
- lots of small but important ways in which SVN has fixed CVS’s annoying brokennesses
- pro: genuinely atomic commits (CVS can fail part-way through, and also tags the various files in a single commit with independently generated timestamps, so that a date-based checkout can occasionally give you only half of a big change)
- pro: support for moving and renaming files
- pro: separate “status” from “update” command, so you can quickly see what you’ve changed locally without having to risk making a mess with conflict markers. “cvs -n up” would have done this too, but not so nicely, and nobody I know ever remembered to use it.
- pro: when conflicts do show up during “svn update”, all three versions of the file are preserved (the local version, the new remote version, and their common parent the old remote version), which is occasionally a lifesaver
- con: SVN will now not automatically cope once you’ve removed the conflict markers from a file. It will have marked the file as “conflicted” when it first displayed the C status, and when you’ve sorted it out you have to manually tell it “resolved”. OTOH, this could be seen as a pro, since CVS’s failure to track this allowed you to check in conflict markers by mistake.
- pro: the all-important “revert” command is something that it ought to have been a criminally culpable oversight to have left out of CVS, or at least to have continued to leave out of it after the first half-hour of use!
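The complaint in the list above about recursive greps turning up bogus hits in “.svn” subdirectories can be worked around with a few lines of scripting. The Python sketch below is one possible approach, not a tool shipped with Subversion; `grep_tree` is a hypothetical helper that does a plain substring search and prunes the named directories while walking the tree.

```python
import os


def grep_tree(root, needle, skip_dirs=(".svn",)):
    """Return (path, line number, line) for every line containing `needle`.

    Directories whose name appears in `skip_dirs` are pruned from the walk,
    so the pristine copies kept under .svn never produce hits.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [name for name in dirnames if name not in skip_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it rather than abort
    return hits
```

Calling `grep_tree("src", "some_function")` searches the working files only; passing `skip_dirs=()` restores the unfiltered behaviour described in the list.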
IDEs with support for CVS
IDEs with support for Subversion
Integrated Development Environments (IDEs) that support or can be integrated with Subversion:
- Panic’s Coda — As of version 1.5, Coda has integrated Subversion support!
- IntelliJ IDEA
- Xcode — Apple’s Mac OS X IDE
- Microsoft Visual Studio
- Borland Delphi
- DelphiSVN — Subversion integration for Borland Delphi
- Subclipse — an open source project that integrates Subversion into Eclipse IDE
- Subversive — SVN Team Provider is an Eclipse project aimed at providing Subversion support in Eclipse similar to that previously provided for CVS.
- EasyEclipse — a software bundle that packages Eclipse and several other plug-ins and tools into a single installer. One of the Eclipse plug-ins included in this bundle is a Subversion plug-in
- Zend Studio
- Eric Python IDE
- ActiveState Komodo
- BBEdit — Mac OS X-based text editor that interfaces with Subversion (among other version control tools)
- UEStudio — lightweight IDE
- SharpDevelop — free IDE for C#, VB.NET and Boo projects on Microsoft’s .NET platform.
- TextMate — an extensible editor for Mac OS X
- Dreamweaver version CS4 now includes support for Subversion. Previous versions did not offer support.
- Subweaver — a Dreamweaver extension that integrates SVN commands by interacting with a TortoiseSVN installation
Comparison of revision control software
The following tables compare general and technical information for notable revision control and software configuration management (SCM) software. This article is not all-inclusive and may become out of date quickly.