I mention this because it made me smile when a little while ago a friend showed me a paper (pdf) from Google that described how they implemented the same algorithm in several different languages and found the one written in C++ the fastest. You know how brand loyalty is apparently tied up with our self-esteem? Hey, C++ may not be cool, but when you got important things you gotta do fast you need a tool man-enough for the job, I thought. Of course, benchmarks are notoriously unreliable, and I didn't really believe this. But it still made me smile.
Maybe I shouldn't let my self-esteem get entangled with a programming language. But I think it's not unnatural that after spending considerable time with a language one may become attached to it. I'd like to state that I am aware of this proverb: if all you have is a hammer, everything looks like a nail; I subscribe to the notion that learning an unfamiliar programming language may broaden your mind. A little. Which is a good thing.
Reading the Google paper I came across this:
“We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.”

They have a high opinion of themselves, don’t they? Doubtless it’s justified. But it sounds like a challenge...
One of the promises of object oriented design is that you may be able to change the implementation of an object without changing its interface. So I wondered if I could improve on the performance of the Google C++ implementation without changing the test driver. After about 30 seconds of reading the code I decided that that sounded too much like hard work: much more fun to continue in the great programming tradition and rewrite the code. (The Google code has objects with getters that return pointers to private member variables; this restricts the freedom to change the implementation. Breaking encapsulation may be justified for run-time performance gains. The penalty is that a change to the implementation may mean a change to the user code, or a run-time penalty to maintain the old interface.)
So anyway, I wrote the version shown below. To make execution time comparisons meaningful I tried to keep the functionality identical to the Google C++ implementation from Robert Hundt's paper. Russ Cox also implemented the algorithm. So when I talk about Robert's code or Russ's code, these are what I'm referring to.
(I found I needed to reserve about 10 MB stack space to get Robert's code to produce a result. You may need to do the same for mine.)
Probably the most interesting difference between the implementations is that Robert's and Russ's both use explicit deletes and mine does not. I don't mean to say that there is anything wrong with raw pointers - I use them all the time - I simply chose a different approach to the problem, using vectors and indices instead. The fact that the nodes are named with successive integers and may be referenced by a 'preorder' number, also an integer, suggested a vector as the obvious data structure to use.
Using collections, such as vectors, can make the code simpler, less likely to leak memory and easier to make exception-safe.
I think the code is in the spirit of the Google paper, which was that the code should be written using "the default, idiomatic data structures in each language, as well as default type systems, memory allocation schemes, and default iteration constructs - the basic facilities provided in the language."
Is it correct?
If it were not a requirement to be correct we could give an answer to any problem in an arbitrarily short time. So let's first ask if the code gives the right answer.
My code calculates there are 76,001 loops in the test graph. Robert's gives 76,002 and Russ's gives 76,000. Oh dear. Which is right?
Playing with Robert's code I found that it always exaggerates the number of loops by exactly one; for example it returns 1 for a graph containing no loops at all. So adjusting for this means mine and Robert's agree.
I corresponded briefly with Russ and he was kind enough to respond to my questions. It turns out there is a slight difference between Russ's graph and Robert's in the first few nodes: in Robert's there is an edge connecting nodes 1 -> 2, in Russ's there isn't. This difference means there is one fewer loop in Russ's graph, so his result is correct for his graph.
Looking at the code that builds the graph I would have said there were ((((3 x 25) + 1) x 100) + 1) x 10 = 76,010 loops. But I'd be wrong, probably because overlapping loops are not counted more than once: the '+ 1' loop inside the outermost 'x 10' is really a single loop shared by all ten copies, so it's counted once rather than ten times, giving (((3 x 25) + 1) x 100) x 10 + 1 = 76,001.
In short, I believe there are 76,001 loops in the Robert Hundt C++ graph and my implementation counts them correctly.
I like to remember I have an amazing symbol manipulation tool at my command that I can use to help me check my own work. (Not my own brain, lol.) I mean I may be able to write some code to test my code. Since I have the two Google implementations to test against the obvious thing to do is to build some graphs and compare the loop counts returned by each implementation. I wrapped each in its own namespace and linked all three into one test program.
The first idea I had was to generate graphs containing a random number of nodes with a random number of edges between random nodes, like this
Here is the output from a typical run
During several hours running this test there were no disagreements between any of the implementations (after adjusting Robert's by -1).
The second idea I had was to generate graphs with all possible combinations of connections between nodes. This is only going to be practical for small graphs: for one node there are two possible graphs: the node has no outgoing edges and the node has an outgoing edge to itself. For two nodes there are 16 possible graphs ranging from the graph where neither node has any outgoing edges to the graph where each node has an outgoing edge to itself and to the other node. I think for n nodes there will be 2^(n^2) possible graphs (I'm assuming no duplicate edges). For 5 nodes there are 33,554,432 possible graphs and for 6 there are 68,719,476,736. So 5 nodes is probably about the limit of what I could reasonably check. Here's what I wrote to test all possible graphs up to 5 nodes
It takes about 10 minutes to count the loops in all possible graphs having 5 nodes or fewer and confirm that in all cases my implementation returns the same loop count as Robert's. (I also tested against Russ's implementation, but that plotted a stairway to heaven on my memory usage monitor: Russ didn't free any of the memory he allocated. The memory usage for Hundt+Hay was a flat line.)
I'm sure there are other tests I could do. I could count loops in more 'hand-built' graphs like the original 76,001 looper, for example. But at this point I'm going to say that my implementation is an accurate analog of the original Robert Hundt code.
Is it efficient?
Actually, let's just ask if it's fast. Here are the timings for the three implementations running on my 2008 Mac with a 2.8 GHz Core 2 Duo processor:
rhundt 31.3 seconds = 100% of rhundt
anthay 6.5 seconds = 20.8% of rhundt
russcox 5.5 seconds = 17.6% of rhundt
(The above is an update to the original post after I upgraded to GCC 4.2)
So Russ's code wins, but mine is a creditable runner-up.
It's interesting watching the random graph test program run: occasionally a graph will be generated where it looks like Robert's code will count the loops quicker than either Russ's or mine. I'm wary of benchmarks. People say that, like statistics, you can find a benchmark to support any point of view. Hang on!... I wonder if... Voilà!...
I added code to the random graph test to time how long each implementation took to count loops and ran it for 100 graphs. The code and the results are shown here.
In this test Russ's implementation counted the loops in 90% of the time it took Robert's implementation to count the loops in the same graphs. But my implementation took only 38% of the time. My code wins. I think benchmarks can be a pretty accurate measure of performance, now I come to think about it.
I have my tongue in my cheek, in case you didn't know. I mean, these results are genuine, but I'm still wary of benchmarks. Just to confirm my code's superiority I changed one of the test parameters to 50000 and reran the test. Here are the last few lines of output
In this test my code took twice as long as Robert's, and Russ's code took twice as long as mine! Robert's code wins. I'm going off benchmarks again.
Three different benchmarks. Three different winners. It would be interesting to analyse the performance characteristics of these three implementations of the same algorithm to understand what's happening. But that's for another time.
Is it readable?
From the user's perspective code readability is neither here nor there. But in my experience from the programmer's perspective, you are more likely to achieve some of the things the user does care about, like being correct, if you can reason about the code. And to reason about the code you need to understand it. And code readability aids understanding. I believe the way the code looks is of lesser importance than other goals, but it isn't of no importance.
The core analyze_loops algorithm in the Havlak paper (reproduced in the Google paper) is written in a mathematical pseudocode. It is 30 lines long. I think there is a fairly clear resemblance to that pseudocode in my implementation, which is 53 lines long. One line of pseudocode to two of real code is a pretty good ratio, I think. (Note: I don't count blank lines, comments or lines containing only a curly brace.)
In general I think the resemblance between someone's description of an algorithm and the concrete realisation of that algorithm in a real programming language that actually works on a real computer is a good thing. Sometimes the algorithm description is clear but the computer language doesn't allow the realisation to reflect it.
It may be that computer scientists who publish algorithms often describe them in a way suited to implementation in C++ because they all use C or C++ to develop their algorithms. I don't know.
In what way is an algorithm "readable?" Some algorithms are simple and you can just see what they do without much difficulty. And some you can't keep enough in your head to just "see." In that case you resort to pencil and paper and boxes joined to other boxes with arrows. You work through simple concrete examples following each step in the algorithm to see how it transforms the data. That's what I do anyway.
The code is short, but I didn't find it easy to write. I bought the Havlak paper for $15 from the ACM, but I didn't find it easy to move from the abstract description of the complex algorithm in the paper to the concrete representation of a working program; without the two Google examples by Robert Hundt and Russ Cox I may not have succeeded. I do not intend anything I've said here to imply any criticism of those two guys' work; I don't know them personally but I have no doubt they have a far deeper understanding of the Havlak algorithm than I do. But I suspect they were not very interested in the C++ implementation of this algorithm, perhaps because C++ is not a cool language. (Update: I regret saying this last bit; how do I know how interested they were? It was a snide comment. Sorry.)
One of my difficulties was a certain amount of confusion about when I should be using the basic block name, an int, and when the preorder number, also an int. Using the two typedefs for these two concepts helped me keep them separate.
The most interesting thing I learned was the UNION/FIND algorithm, which, I'm sad to say, I had never seen before (or if I had, I had forgotten). That path compression is really neat.
In the above narrative I may have made it sound like I implemented the algorithm, checked it against the Google results and found it was correct. That is not how it happened. I fiddled around for some time getting wrong answers and trying to figure out what mistakes I'd made. So what? I believe a poet might work on a poem for years before it's published. Getting it right matters; getting it right first time is overrated. Since I'm not 100% sure this code is right, I'm going to stop pontificating right here.