Why PVCS is unsuitable for Linux software development

1. Overview

The purpose of a source code control system is to allow developers to make changes to a body of code, while allowing anyone to obtain snapshots of that code as it existed at earlier times.

The terminology for the "code base" varies. PVCS calls it a product; CVS calls it a module. Whatever the name, the purpose is the same. We have some symbolic identifier for the code base as a whole, and that same identifier is used by all the team members to identify which particular code base they are working on (isp, kiosk, etc).

Likewise, the terminology for those snapshots varies. PVCS calls them baselines, CVS calls them tags. Whatever the name, the function is the same. We have some symbolic identifier for the state of the code at some previous point in time. At a minimum, we must be able to tag or name all the pieces of code that were present at the time of a release to QA or production. As we shall see later, this is tied into the problem of automatically tagging or naming nightly builds.

2. Fetch workset fails to fetch checked-out items

We have a script running under Linux to fetch the latest version of a workset. Check out a file, make a modification, and check it in. The version number increases, so we now have file#1 and file#2 versions. Fetch the latest, and we get version 2. So far, all is well. Now check out that file again, and the PVCS GUI client on Windows does not update the version number - it stays at version 2. The history command shows two different version 2 copies, one created from version 1, and one in an intermediate "to be created" state. Now fetch the latest version, and we get version 1, which is clearly incorrect, since moments ago the latest version was version 2.

3. Replication issues

The operations group has serious problems making replication to remote repositories work properly. This is replication between continents over slow data links. The only workaround that has been provided involves deleting all the logs before running the replication, which transfers the entire repository again, even though almost all of it already exists at the remote site. In our environment, this takes over 14 hours to complete.

Update - replication is apparently now working without copying the entire repository, but we have a new replication issue. When the remote site in India updates an item, it is apparently placed on some PVCS branch that is then locked. We cannot then update that item. So any item that is modified by a remote site can no longer be modified by the main site.

4. AIWS does not work on Linux (partial resolution)

In CVS we just remove/commit to delete a file. The equivalent in PVCS is the RIWS command, which removes the file from the workset. The file or item is still in the global workset, since there may be other snapshots that refer to that item. Now suppose that the file is again added to the system. In CVS, we just add/commit and the file is back. The equivalent in PVCS is the AIWS command, which adds the item back to the workset.

There are two problems here. The first is that the AIWS command seems to require the fully specified item id, and since the item only exists in the global workset, it is unclear how to obtain that from the Linux command line. The second, more serious problem is that the AIWS command simply does not work under Linux. From the Windows GUI, we can "export to workset" a file, and that GUI shows us the actual AIWS command that it used to do the work. Running that same command on Linux gives us a 2320 error - item not in workset. Wonderful, that is the problem we are trying to fix by running that command.

This has been partially resolved by removing and recreating the entire project in PVCS. For now, we cannot reproduce this issue.

5. Problems finding the item id given a filename

This is tied into both the performance issues below, and the (mostly solved) issue of removing files. In order to remove a file, we need to issue the RIWS command, which requires the fully qualified item id including the revision number. We can obtain a list of (item id, filename) pairs by using the CBL/REL/DREL/DBL commands. Those are the commands that we use to fetch the entire project from PVCS, and as a side effect, we keep the parts file which includes that list of (item id, filename) pairs. This allows us to grep that list for the filename, and obtain the item id for the RIWS command. The problem arises when we want to fetch some updates from PVCS via the FI command. That command does not tell us the item id of the item that it just fetched, and if we later want to remove that item, we no longer have the required fully qualified item id. We can solve that by issuing the CBL/REL/DREL/DBL commands for every item that we fetch with the FI command, but that will lead to very poor performance.
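The lookup described above can be sketched in a few lines of Python. This is only an illustration: we assume the (item id, filename) pairs have already been parsed out of the parts file, and the item ids shown are placeholders, not real PVCS item ids. The names build_index and lookup are ours.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each filename to the list of item ids recorded for it.

    `pairs` is an iterable of (item_id, filename) tuples, assumed to have
    been parsed from the parts file kept after a full project fetch.
    """
    index = defaultdict(list)
    for item_id, filename in pairs:
        index[filename].append(item_id)
    return dict(index)


def lookup(index, filename):
    """Return the known item ids for a filename, or an empty list."""
    return index.get(filename, [])


# Placeholder data; real item ids would come from the parts file.
pairs = [
    ("ITEM-ID-1", "src/main.c"),
    ("ITEM-ID-2", "src/util.c"),
    ("ITEM-ID-3", "src/main.c"),
]
index = build_index(pairs)
known = lookup(index, "src/main.c")
# A file fetched later with FI is not in the index, so the lookup is empty.
missing = lookup(index, "src/fetched_later.c")
```

The empty result for the file fetched later is exactly the gap described above: the index is only as fresh as the last full fetch.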

We may also run into authorization problems using CBL/REL/DREL/DBL to obtain the list of (item id, filename) pairs. It is not clear that we can allow every user the ability to create and delete baselines.

6. Apparent inability to branch

Many source code control systems use the 'branch' terminology, but they mean very different things. For the purpose of this discussion, a branch to a code base will mean that we have two or more symbolic names for versions of that code base, and we can make independent changes to each of those branches. Note that these changes may include adding or removing files, in addition to modifying existing source files. In particular, we may add function_one() to file.c on branch 1, add newfile.c on branch 1, add function_two() to file.c on branch 2, and remove oldfile.c on branch 2. So far, that could be accomplished by simply duplicating the entire code base under a different project or module name, and that is how some systems handle this.

However, we also want to be able to easily

The early solutions that were proposed in PVCS for this problem essentially involve duplicating the code base. At the time of creation of the branch, it seems to be required to check in all the code to the new branch. Although PVCS uses the word 'branch' in its documentation, we have obtained the following response from support.serena.com - "Named branches are used for Replication. There is no way to create a baseline off a named branch". This just confirms that they are using the word "branch" to mean something entirely different.

It has now been proposed that we use PVCS worksets as a replacement for CVS branches. It seems to be relatively easy to create a workset based on an existing workset. But I am sceptical about the ability to do the merge. PVCS has the CMB (create merged baseline) and MWS (merge worksets) commands, and it seems likely that they operate in a similar manner. The command line reference guide documentation for CMB is very clear that the result of the merge depends on the order in which the input baselines are specified. Basically, the first baseline that contains an item (file) wins. So in our case above, we either get file.c with function_one or function_two, but not both. In contrast to the CMB documentation, the command line reference guide does not define the results of the MWS command. The PVCS User Guide does have some documentation about merging worksets, but it is all given in terms of the GUI client or web interfaces, and it never defines the results of the merge.
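To make the "first baseline wins" rule concrete, here is a toy model in Python. It is a sketch of the rule as we read it in the CMB documentation, not of PVCS itself: baselines are modelled as plain dictionaries from filename to content, and first_wins_merge is our own name.

```python
def first_wins_merge(*baselines):
    """Merge baselines item by item; the first baseline holding an item wins."""
    merged = {}
    for baseline in baselines:
        for item, content in baseline.items():
            # setdefault keeps the content from the earliest baseline.
            merged.setdefault(item, content)
    return merged


# The two branches from the example above, as filename -> content.
branch_1 = {
    "file.c": "file.c with function_one",
    "newfile.c": "newfile.c",
    "oldfile.c": "oldfile.c",
}
branch_2 = {
    "file.c": "file.c with function_two",  # oldfile.c removed on this branch
}

one_first = first_wins_merge(branch_1, branch_2)
two_first = first_wins_merge(branch_2, branch_1)
```

In this model one_first["file.c"] carries only function_one and two_first["file.c"] carries only function_two. No ordering of the inputs produces a file.c containing both functions, which is what a real merge of the two branches would need.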

7. Problems with automated tagging

During development, we do nightly builds so that all the developers have access to a machine installed with the latest code. Using CVS, it is trivial to tag every build, and we do so. Our build scripts also automatically increment the version number, so we can guarantee that every build has a unique version number, and that version number is also used as the name of the tag. This allows any developer to see all the changes that were made between any two builds, which helps to isolate the source code changes responsible for unwanted changes in behavior. As a result, we currently have over 500 different tags in the CVS repository for the kiosk software, after about 10 months of development.
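The value of per-build tags can be illustrated with a small Python sketch. It assumes each tag is available as a snapshot mapping filename to revision. The tag contents, filenames and the name changes_between are made up for illustration; this is not how CVS stores tags.

```python
def changes_between(old, new):
    """Compare two tagged snapshots (filename -> revision).

    Returns the files added, removed and modified going from old to new.
    """
    old_files, new_files = set(old), set(new)
    return {
        "added": sorted(new_files - old_files),
        "removed": sorted(old_files - new_files),
        "modified": sorted(
            name for name in old_files & new_files if old[name] != new[name]
        ),
    }


# Hypothetical snapshots for two nightly builds.
build_a = {"main.c": "1.4", "ui.c": "1.9", "old.c": "1.2"}
build_b = {"main.c": "1.5", "ui.c": "1.9", "new.c": "1.1"}

diff = changes_between(build_a, build_b)
```

With a tag for every build, a developer chasing a regression only has to look at the files listed in diff for the two builds on either side of the change in behavior.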

According to reports, PVCS is apparently unable to handle more than a few dozen such tags (baselines in PVCS), without serious performance problems. This means that we cannot automatically tag or name the nightly builds. If the nightly build tests good, without a tag you cannot actually retrieve that set of source code files again. So even after knowing that that particular build tests good, you cannot then tag or name those source files, unless you can GUARANTEE that no one has made any changes between the time of the build, and the time when such testing was complete. That time difference is typically some small number of days.

According to Serena, this is not a problem. We will test that when we start automated tagging again, but for now this item is closed.

8. No concurrent development

There are MANY times where we have multiple developers making concurrent changes to the same files. For example, we typically have a single file that contains all the messages that appear in the UI. This single file is then supplied to the translators for localization. If we cannot have concurrent access to that file, then we need to lock it with some check-out/check-in scheme. This raises the obvious bottleneck issues, where someone checks out the file, and then forgets that it is checked out, and then goes home or on vacation. With CVS, such problems do not exist, since we have concurrent access to the entire source code tree.

9. No support for vendor branches

Suppose we have GPL (or other) source code from a "vendor" such as Debian or Samba. Further, suppose we need to make changes to that code to support our project. So we start with some version of the vendor code, and we have a bunch of source changes. Now we notice there is a new version of the vendor code available, and we would like to update to that new version, but we need to preserve all our custom changes. This is a trivial operation in CVS, but there does not seem to be support for anything like this in PVCS.

10. No integration with Eclipse

This item is self-explanatory. When this was first written, PVCS had no module to transparently do the required operations within Eclipse 3, on either Windows or Linux. Any project that is primarily written in Java should be developed with Eclipse since it is free, superior to anything else that is available, and is available on multiple platforms including Linux. PVCS now has a Windows Eclipse 3 plugin, but they do not support Eclipse on Linux as of 2006-03-27. They claim to fully support Eclipse on both Windows and Linux in the next release (Dimensions 10?) but I have not seen that version.

11. Performance

For the kiosk project, it currently takes about 2 minutes to update the version number, retrieve the latest source code, and tag or name that version in CVS. The entire automated build process takes about 10 minutes, including the previous steps. It takes about 10 minutes per kiosk to force the new build onto the kiosk, but that is done in parallel across all the kiosks to be installed.

Using PVCS, it takes about 20 minutes just to retrieve the source code, so PVCS is about one order of magnitude slower than CVS, and that does not include the time to tag or create a baseline.

12. Flexible Search

Whenever a developer modifies a file, they generally add a comment in the source code control system which documents the reason for the change. This extra metadata may include such things as links to further documentation, references to the bug report, etc. Depending on the source code control system, those extra items may be required, and the links or references may be verified before the source code control system accepts the modified file into the repository.

It is nice (almost to the point of a requirement) that this metadata be easily searchable. That is apparently not the case with PVCS. There is apparently no mechanism in the PVCS GUI to search for a string in the comment or description fields of the items in the PVCS database. Such a search required a custom SQL query to be written by the PVCS administrator.

13. Apparent inability to remove files (solved)

Consider the following situation. At time T1, our project has files A, B and C. We generate a release to QA, and this gets tagged or named as QA1. At any later time, we must be able to use the source code control system to fetch the files that went into QA1, and of course that must include files A, B and C.

At some later time, we remove file B, since the existence of that file was an error. We are now at time T2, and our project has files A and C. We release this to QA and tag or name this QA2. At any later time, we must be able to use the source code control system to fetch the files that went into QA2, and of course that must include files A and C, but it must not include file B, since that was NOT part of the QA2 release.

Of course this is trivial in CVS: we just remove and commit B at time T2. The difficulty with PVCS was caused by a bug in the mechanism used to fetch all the files from either QA1 or QA2. The REL (release) command generates a list of all the FI (fetch item) commands that are needed, but that list needs to be modified. As generated by PVCS, that list specifies the wrong workset. Using grep/awk/sed we can change that to specify /WORKSET="$GENERIC:$GLOBAL".
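The same rewrite can be sketched in Python instead of grep/awk/sed. The sketch assumes that each generated FI line carries a /WORKSET="..." qualifier. The sample line is a made-up shape used only to exercise the substitution, not actual REL output.

```python
import re

# Matches an existing /WORKSET="..." qualifier, whatever workset it names.
WORKSET_RE = re.compile(r'/WORKSET="[^"]*"')
TARGET = '/WORKSET="$GENERIC:$GLOBAL"'


def fix_workset(line):
    """Replace the workset qualifier on one generated FI line."""
    # A function is used as the replacement so TARGET is inserted literally.
    return WORKSET_RE.sub(lambda _match: TARGET, line)


def fix_script(lines):
    """Apply the rewrite to every line of the generated command list."""
    return [fix_workset(line) for line in lines]


# Hypothetical line shape; only the /WORKSET qualifier matters here.
sample = 'FI <item spec> /WORKSET="WRONG:WORKSET"'
fixed = fix_workset(sample)
```

Lines without a /WORKSET qualifier pass through unchanged, so the rewrite can be applied to the whole generated list without first filtering it.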

We use the RIWS command to remove the item from the workset. Actually, we need to repeat that command until it fails, thereby removing ALL versions of file B from the workset. Unless that is done, an extract of the current workset will still include some old version of file B. One issue with the RIWS command is that the documentation claims that it will accept the filename OR the fully qualified item id, but that is incorrect. At least on our version, it requires the fully qualified item id, and there does not seem to be any easy mechanism to obtain that id given a filename that you want to remove.
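The "repeat until it fails" step can be sketched as a small loop in Python. How RIWS is invoked from a script and how it reports failure depend on the installation, so the sketch takes a caller-supplied function remove_once that returns True on success and False on failure. That function, the fake workset used to exercise it, and the max_attempts safety limit are all our own additions.

```python
def remove_all_versions(remove_once, max_attempts=100):
    """Call remove_once() until it fails; return the number of removals.

    max_attempts guards against looping forever if the command never fails.
    """
    removed = 0
    while removed < max_attempts and remove_once():
        removed += 1
    return removed


def make_fake_workset(versions):
    """Stand-in for the real command: one version is removed per call."""
    remaining = list(versions)

    def remove_once():
        if not remaining:
            return False  # nothing left, so the "command" fails
        remaining.pop()
        return True

    return remove_once, remaining


remove_once, remaining = make_fake_workset([1, 2, 3])
count = remove_all_versions(remove_once)
```

In real use, remove_once would wrap whatever runs the RIWS command with the fully qualified item id and would interpret its exit status or output.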