My Top 5 Why-oh-Whys of the Day
Just a few questions that I can't seem to find any answers for.
- Why do people still use CVS or Subversion when there are tools like Darcs, Mercurial, or Git?
- Why are people still developing in and for Windows?
- Why are people still declaring variable types ahead of time?
- Why isn't Objective-C more popular?
- Why did I learn about S-expressions so late in life?
It's been a rough day. I hate trying to figure out how to get a project set up in Visual Studio when all you have are source files.
I'm aggravated at the fact that Subversion is such a piece of crap, but to use anything better you really have to get away from a crappy Windows command prompt and into something more useful, like csh or bash. The droppings left by Subversion in .svn directories (oops, sorry, they are _svn in Windows, WTF?) are super-duper annoying.
I'm also tired of the excessive typing (pun intended) required to develop in C++ or C#. Why can't I just say "x = y" without declaring the type of x, if the type of y is already known? Even worse, why do I have to write code like this (in C#):
```csharp
XmlDocument x = new XmlDocument();
x.Load("file.xml");
foreach (XmlElement e in x.SelectNodes("//foo[@title='bar']")) {
    doSomethingUsefulWith(e);
}
```
When all I really need is this (in Ruby):
```ruby
x = XmlDocument.new
x.load("file.xml")
x.each do |e|
  doSomethingUsefulWith(e)
end
```
I know, silly arguments. Especially since C# 3.0 brings in LINQ, var, and a whole slew of other new features to make it more tolerable.
But you know what? You still have to create a bloody class with a static Main method in order to execute anything. And you still have to wait for 3.0 to come out and stabilize. And then you have to pay for it if you want VS.NET. That's just not right. Yet people continue to pay for it. Suckers? Nah, they just won't listen.
Subversion and those annoying .svn directories
Nearly a year ago I wrote about a Load-Bearing Kludge. In that article I talked about how we implemented a work-around for getting rid of those pesky .svn directories that Subversion uses for meta-data. The solution wasn't to get around them, but to relocate them. To do this I had to create a shared library that intercepted and decorated a few low-level C library functions like open, fopen, unlink, and a few others.
Since that article last February I've received quite a few requests for the source code that did this. It isn't anything spectacular; in fact, the code was abandoned in favor of a much better solution: not using Subversion at all. At any rate, I just received approval from Matt Lightner, the CEO of Site5, to go ahead and give it away. So here it is.
The code
Build it
The provided Makefile is the simplest of Makefiles. It only knows how to build the .so, and only with gcc. AFAIK this code will not easily work on Windows.
Just copy the fix_dotsvn.c and Makefile into a directory and run make. This should create fix_dotsvn.so.
Use it
fix_dotsvn.so will reroute all ".svn" directory access to whatever directory is specified in the environment variable "DOT_SVN_PREFIX", which defaults to "/backup/svn_home". Change it to wherever you would like to reroute your .svn directories.
To load it, set your LD_PRELOAD environment variable to /path/to/fix_dotsvn.so before running the svn command. Example:
$ LD_PRELOAD=/path/to/fix_dotsvn.so svn update
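The exact rerouting rule lives in fix_dotsvn.c and may well differ from this, but as a rough illustration, here is a minimal C sketch of the kind of mapping described above. It assumes absolute paths and assumes the relocated location is simply the original path appended to DOT_SVN_PREFIX; the names is_dot_svn_path and rewrite_svn_path are hypothetical, not taken from the original source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if `path` is a .svn directory or lies
 * inside one. Assumes absolute paths such as "/home/u/www/.svn/entries". */
static int is_dot_svn_path(const char *path)
{
    size_t len = strlen(path);

    if (strstr(path, "/.svn/") != NULL)
        return 1;
    /* also catch the directory itself, e.g. "/home/u/www/.svn" */
    return len >= 5 && strcmp(path + len - 5, "/.svn") == 0;
}

/* Hypothetical helper: returns a malloc'd "$DOT_SVN_PREFIX<path>" when
 * `path` touches a .svn directory, or NULL for ordinary paths (and, in
 * this sketch, on allocation failure). The caller frees the result. */
static char *rewrite_svn_path(const char *path)
{
    const char *prefix;
    size_t plen, total;
    char *out;

    if (path == NULL || !is_dot_svn_path(path))
        return NULL;

    prefix = getenv("DOT_SVN_PREFIX");
    if (prefix == NULL || *prefix == '\0')
        prefix = "/backup/svn_home";   /* default named in the post */

    plen = strlen(prefix);
    total = plen + strlen(path) + 1;
    out = malloc(total);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    strcpy(out + plen, path);          /* copies the terminating NUL too */
    return out;
}
```

Keeping the full original path under the prefix means two customers' .svn trees can never collide, at the cost of mirroring each web space's directory layout inside the prefix.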
That's all folks
That's really all there is to it. It worked perfectly while we used it.
A Load-Bearing Kludge
Every now and then you run into a problem that can be resolved in only one way. A kludge. A kludge is generally what happens when you are forced to work around problems you can’t control. It happens a lot more frequently than it should, and it almost always comes back to bite you in the ass.
A few months ago was such a time. In the paragraphs that follow I’ll present a kludge that was perfectly functional but gave me some serious headaches. I am not entirely sure how my associates felt about this particular kludge, but for me it was both discomforting and annoying. It also should never have been an issue, but I’ll elaborate on that further down.
The Stage
Allow me a few more moments and then I’ll get to the point.
- The project: Flashback at Site5
- The platform: Ruby on Rails
- The back-end: MySQL, Subversion, Linux (CentOS 4)
In a nutshell, Flashback is a versioning system that automatically sweeps and stores changes to Site5 customers’ web space. The key function is to provide rapid undo for both simple and massive changes, allowing a customer to restore individual files or even their entire web space to any point in time with just a few clicks.
The Problem
Flashback originally used Subversion as the underbelly for the version control system. Subversion has a nasty habit of storing its meta-data in .svn directories that sit right there within the data being versioned. For anyone who has tarred up their work area while using Subversion (or CVS, for that matter) this is evident, and it generally isn’t a big deal, but in our case it was a major roadblock. You see, we can’t pollute users’ home directories with a bunch of Subversion meta-data without their consent. Even with their consent it would be inconsiderate at best, and destructive at worst. We just couldn’t allow this behavior to persist. Unfortunately, Subversion doesn’t provide an alternative location for storing this meta-data, and so our adventure begins with finding a work-around.
Phase 1 – Research
Many ideas were tossed around, from using a union file system, to staging the directories prior to versioning them, to patching Subversion directly. I am pretty sure each and every member of the Site5 Engineering Team had an idea or two, but unfortunately I forget who had which ideas, so I am unable to give credit where credit is due. At one point I think we even discussed using something other than Subversion!
Phase 2 – Trial and Error
I worked diligently to get a UnionFS hacked just right, so we could hide the .svn directories in a completely different place, transparently to the customer. I am pretty sure David Felstead started working on the idea of staging the data, and even had a good deal of success with it.
Phase 3 – Aha!
With almost any tricky problem comes a tricky solution. This was no exception. I had a bright idea and got to work immediately. I didn’t even bother to tell anyone about it until it was about 90% done; I knew it would work. All good kludges work, no matter how ugly they are. If it doesn’t work, it isn’t a kludge, it’s a catastrophe.
The Kludge
So you want to know what it was, do ya? It was almost too simple. Linux (and UNIX in general) has a great facility in ld.so(8) that lets you pre-load a set of shared libraries before the executable loads its own. It is invoked by setting the LD_PRELOAD environment variable prior to execution. So, with a bit of strace(1) magic, I set out to write my own shared library whose entire purpose was to intercept every file- and directory-related call made during an invocation of Subversion (either through the library or the svn executable) and rewrite the file paths, relocating every path that contained “/.svn/” to a different directory tree.
Here is a small sample of how the code looked for the fopen(3) intercept function:
```c
001 /**
002  * intercept fopen
003  */
004 FILE *fopen(const char *path, const char *mode)
005 {
006     static FILE *(*orig_func)(const char *, const char *);
007     if (!orig_func) {
008         orig_func = (FILE *(*)(const char *, const char *)) dlsym(RTLD_NEXT, "fopen");
009     }
010
011     IF_DOT_SVN(path) {
012         char *new_name = adjusted_filename(path);
013         FILE *ret = orig_func(new_name, mode);
014         free(new_name);
015         return ret;
016     }
017
018     return orig_func(path, mode);
019 }
```
Because a preloaded library sits ahead of the application’s own libraries (including the standard C library) in the dynamic linker’s symbol search order, this fopen trumps any version loaded later. The call to dlsym on line 8 is how we find the original in the standard C library. So now, any time the application calls fopen, a .svn path is rewritten to point into an internal directory tree, outside of the user’s web space, before the true fopen is called. In all, 24 different functions had to be intercepted for Subversion to be completely covered. Luckily for us, Subversion delegates most of these tasks to the Apache Portable Runtime, which is pretty easy to mine. Tools like strace (or truss) are handy for detecting system calls, but you still have to figure out where those calls originate. GDB can be extremely useful here; I highly recommend it.
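To make the pattern concrete, here is a self-contained sketch of one more intercept, this time for unlink(2), using the same dlsym(RTLD_NEXT, ...) lookup as the fopen sample. It is not the original fix_dotsvn.c code: the relocate helper is hypothetical, it assumes absolute paths and the DOT_SVN_PREFIX default described earlier, and it fails with ENAMETOOLONG rather than ever falling back to the user's tree when the rewritten path does not fit.

```c
#define _GNU_SOURCE            /* exposes RTLD_NEXT in <dlfcn.h> */
#include <dlfcn.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not from fix_dotsvn.c. If `path` touches a .svn
 * directory, writes "$DOT_SVN_PREFIX<path>" into buf and returns 1.
 * Returns 0 for ordinary paths and -1 if the rewritten path won't fit. */
static int relocate(const char *path, char *buf, size_t size)
{
    const char *prefix;
    size_t len = strlen(path);
    int n;

    if (strstr(path, "/.svn/") == NULL &&
        !(len >= 5 && strcmp(path + len - 5, "/.svn") == 0))
        return 0;

    prefix = getenv("DOT_SVN_PREFIX");
    if (prefix == NULL || *prefix == '\0')
        prefix = "/backup/svn_home";

    n = snprintf(buf, size, "%s%s", prefix, path);
    return (n < 0 || (size_t)n >= size) ? -1 : 1;
}

/* Interposed unlink(2): look up the next definition once, then forward. */
int unlink(const char *path)
{
    static int (*real_unlink)(const char *);
    char buf[PATH_MAX];
    int r;

    if (real_unlink == NULL) {
        real_unlink = (int (*)(const char *))dlsym(RTLD_NEXT, "unlink");
        if (real_unlink == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    r = relocate(path, buf, sizeof buf);
    if (r < 0) {
        errno = ENAMETOOLONG;  /* never touch the user's tree instead */
        return -1;
    }
    return real_unlink(r ? buf : path);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o unlink_shim.so unlink_shim.c -ldl`, where the file names are placeholders) and loaded through LD_PRELOAD, every unlink of a .svn path is silently redirected. Each of the other intercepted functions would follow the same shape.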
Post Mortem
Ultimately this strategy worked very well. We were able to separate the Subversion meta-data from the users’ data. It went into production, and pre-loading our custom “fix_dotsvn.so” caused no problems. It was easier than installing a UnionFS. It was faster than staging all of the data. But it did come with its own baggage. We had to be absolutely certain that the kludge was in place, or we could easily corrupt a repository beyond repair. It’s a very easy mistake to make and not one that is easily fixed. That is a high price to pay.
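One cheap way to reduce that risk is for a wrapper to refuse to run svn unless the library is actually listed in LD_PRELOAD. Here is a hypothetical guard in C; preload_has is an invented name, not part of the original kludge. It only inspects the environment, so it cannot prove the library really loaded, but it catches the most likely mistake of forgetting to set the variable. ld.so(8) accepts both colons and spaces as LD_PRELOAD separators, so the sketch splits on either.

```c
#include <string.h>

/* Hypothetical guard: returns 1 if the LD_PRELOAD-style list `preload`
 * contains an entry whose file name is exactly `libname` (either bare or
 * after a '/'), else 0. Entries are separated by ':' or ' '. */
static int preload_has(const char *preload, const char *libname)
{
    size_t want = strlen(libname);
    const char *p = preload;

    if (preload == NULL)
        return 0;

    while (*p != '\0') {
        size_t len = strcspn(p, ": ");   /* length of the current entry */

        if (len >= want) {
            const char *tail = p + len - want;   /* last `want` chars */
            if (strncmp(tail, libname, want) == 0 &&
                (tail == p || tail[-1] == '/'))
                return 1;
        }
        p += len;
        if (*p != '\0')
            p++;                         /* skip the separator */
    }
    return 0;
}
```

A wrapper would call `preload_has(getenv("LD_PRELOAD"), "fix_dotsvn.so")` and exit with an error before exec'ing svn whenever it returns 0.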
This is why it is a Load-Bearing Kludge.
Load-Bearing Kludge
There is no real definition that I am aware of. The term actually comes from Eugene Szedenits, Jr., whom I regard quite highly and with whom I worked at Clareos before joining the Site5 Engineering Team. I will attempt to provide the definition as I see it, with Gene's blessing:
Any kludge you cannot remove without causing the entire application to die a horrible death.
That about sums it up.
What now?
Our Load-Bearing Kludge no longer exists. Thanks to David Felstead’s incredible work we were able to supplant Subversion with a versioning system that he wrote in pure Ruby. For our application it blows the doors off of Subversion in both speed and reliability. You have to read this entry in his blog to get the full picture.
At the beginning of this article I mentioned that I’d explain why I don’t think we should ever have been put in this position in the first place. The explanation is simple: Subversion should not be polluting your source code with meta-data directories. There are better ways. It isn’t likely to change any time soon, and I’m not likely to start using any of the alternative systems out there, because in reality Subversion is the best I’ve seen. I just don’t like it when applications dirty up my source trees.