Are subversion externals an antipattern?

I am the author of the quote in the question, which came from a previous answer.

Jason is right to be suspicious of brief statements such as mine, and to ask for an explanation. Of course, if I fully explained everything in that answer, I would need to have written a book.

Mike is also right to point out that one of the problems with an svn:external-like feature is that changes in the targeted source could break your own source, especially if that targeted source is in a repository that you do not own.

In further explaining my comment, let me first say that there are “safe” ways to use the svn:external-like feature, just as with any other tool or feature. However, I refer to it as an antipattern because the feature is far more likely to be misused. In my experience, it has always been misused, and I find myself very unlikely to ever use it in that safe manner nor to ever recommend that use. Please further note that I mean NO disparagement to the Subversion team–I love Subversion, although I plan to move on to Bazaar.

The primary issue with this feature is that it encourages and it is typically used to directly link the source of one build (“project”) to the source of another, or to link the project to a binary (DLL, JAR, etc.) on which it depends. Neither of these uses is wise, and they constitute an antipattern.

As I said in my other answer, I believe that an essential principle for software builds is that each project constructs exactly ONE binary or primary deliverable. This can be considered an application of the principle of separation of concerns to the build process. This is particularly true regarding one project directly referencing the source of another, which is also a violation of the principle of encapsulation. Another form of this kind of violation is attempting to create a build hierarchy to construct an entire system or subsystem by recursively invoking sub-builds. Maven strongly encourages/enforces this behavior, which is one of the many reasons that I don’t recommend it.

Finally, I find that there are various practical matters that make this feature undesirable. For one, svn:external has some interesting behavioral characteristics (but the details escape me for the moment). For another, I always find that I need such dependencies to be explicitly visible to my project (build process), not buried as some source control metadata.

So, what is a “safe” manner of using this feature? I would consider that to be when it is used temporarily by only one person, such as a way to “configure” a working environment. I could see where a programmer might create their own folder in the repository (or one for each programmer) where they would configure svn:external links to the various other parts of the repository that they are currently working on. Then, a checkout of that one folder will create a working copy of all their current projects. When a project is added or finished, the svn:external definitions could be adjusted and the working copy updated appropriately. However, I prefer an approach that is not tied to a particular source control system, such as doing this with a script that invokes the checkouts.

For the record, my most recent exposure to this issue occurred during the summer of 2008 at a consulting client that was using svn:external on a massive scale–EVERYTHING was cross-linked to produce a single master working copy. Their Ant & Jython-based (for WebLogic) build scripts were built on top of this master working copy. The net result: NOTHING could be built stand-alone, there were literally dozens of subprojects, but not one was safe to checkout/work on by itself. Therefore, any work on this system first required a checkout/update of over 2 GB of files (they put binaries in the repository also). Getting anything done was a exercise in futility, and I left after trying for three months (there were many other antipatterns present as well).

EDIT: Expound on recursive builds –

Over the years (especially the last decade), I have built massive systems for Fortune 500 companies and large government agencies involving many dozens of subprojects arranged in directory hierarchies that are many levels deep. I have used Microsoft Visual Studio projects/solutions to organize .NET-based systems, Ant or Maven 2 for Java-based systems, and I have begun using distutils and setuptools (easyinstall) for Python-based systems. These systems have also included huge databases typically in Oracle or Microsoft SQL Server.

I have had great success designing these massive builds for ease of use and repeatability. My design standard is that a new developer can show up on their first day, be given a new workstation (perhaps straight from Dell with just a typical OS installation), be given a simple setup document (usually just one page of installation instructions), and be able to fully setup the workstation and build the full system from source, unsupervised, unassisted, and in half a day or less. Invoking the build itself involves opening a command shell, changing to the root directory of the source tree, and issuing a one-line command to build EVERYTHING.

Despite that success, constructing such a massive build system requires great care and close adherence to solid design principles, just as with constructing a massive business-critical application/system. I have found that a crucial part is that each project (which produces a single artifact/deliverable) must have a single build script, which must have a well-defined interface (commands for invoking portions of the build process), and it must stand alone from all other (sub)projects. Historically, it is easy to build the whole system, but hard/impossible to build only one piece. Only recently have I learned to carefully ensure that each project truly stands alone.

In practice, this means that there must be at least two layers of build scripts. The lowest layer are the project build scripts that produce each deliverable/artifact. Each such script resides in the root directory of its project source tree (indeed, this script DEFINES its project source tree), these scripts know nothing about source control, they expect to be run from the command line, they reference everything in the project relative to the build script, and they reference their external dependencies (tools or binary artifacts, no other source projects) based on a few configurable settings (environment variables, configuration files, etc.).

The second layer of build scripts is also intended to be invoked from the command line, but these know about source control. Indeed, this second layer is often a single script that is invoked with a project name and a version, then it checks out the source for the named project to a new temporary directory (perhaps specified on the command line) and invokes its build script.

There may need to be more variation to accommodate continuous integration servers, multiple platforms, and various release scenarios.

Sometimes there is a need for a third layer of scripts that invokes the second layer of scripts (which invoke the first layer) for the purpose of building specific subsets of the overall project set. For example, each developer may have their own script that builds the projects that they are working on today. There may be a script to build everything in order to generate the master documentation, or to calculate metrics.

Regardless, I have found that attempting to treat the system as a hierarchy of projects is counterproductive. It ties the projects to each other so that they cannot be freely built alone, or in arbitrary locations (temporary directory on the continuous integration server), or in arbitrary order (assuming dependencies are satisfied). Often, attempting to force a hierarchy breaks any IDE integration that one might attempt.

Finally, building a massive hierarchy of projects can simply be too performance intensive. For example, during the spring of 2007 I attempted a modest source hierarchy (Java plus Oracle) that I built using Ant, which eventually failed because the build always aborted with a Java OutOfMemoryException. This was on a 2 GB RAM workstation with 3.5 GB swap space for which I had tuned the JVM to be able to use all available memory. The application/system was relatively trivial in terms of amount of code, but the recursive build invocations eventually exhausted memory, no matter how much memory I gave it. Of course, it also took forever to execute as well (30-60 minutes was common, before it aborted). I know how to tune VERY well, but ultimately I was simply exceeding the limits of the tools (Java/Ant in this case).

So do yourself a favor, construct your build as stand-alone projects, then compose them into a full system. Keep it light and flexible. Enjoy.

EDIT: More on antipatterns

Strictly speaking, an antipattern is a common solution that looks like it solves the problem but doesn’t, either because it leaves important gaps or because it introduces additional problems (often worse than the original problem). A solution necessarily involves one or more tools plus the technique for applying them to the problem at hand. Therefore, it is a stretch to refer to a tool or a specific feature of a tool as an antipattern, and it seems that people are detecting and reacting to that stretch–fair enough.

On the other hand, since it seems to be common practice in our industry to focus on tools rather than technique, it is the tool/feature that gets the attention (a casual survey of questions here on StackOverflow seems to easily illustrate). My comments, and this question itself, reflect that practice.

However, sometimes it seems particularly justified to make that stretch, such as in this case. Some tools seem to “lead” the user to particular techniques for applying them, to the point where some argue that tools shape thought (slightly rephrased). It is mostly in that spirit that I suggest that svn:external is an antipattern.

To more strictly state the issue, the antipattern is to design a build solution that includes tying projects together at the source level, or to implicitly version the dependencies between projects, or to allow such dependencies to implicitly change, because each of these invokes very negative consequences. The nature of the svn:external-like feature makes avoiding those negative consequences very difficult.

Properly handling the dependencies between projects involves addressing those dynamics along with the base problem, and the tools and techniques lead down a different path. An example that should be considered is Ivy, which helps in a manner similar to Maven but without the many downsides. I am investigating Ivy, coupled with Ant, as my short-term solution to the Java build problem. Long term, I am looking to incorporate the core concepts and features into an open-source tool that facilitates a multiplatform solution.

Leave a Comment