C++ is not a cool language. I think it once was, back in the early 1990s; certainly there were a lot of books and magazine articles written about it at that time. Back then I wrote code mostly in C. I think it took quite a while before I really
got object orientation, but eventually I became comfortable with C++, probably around the time that Java became the new cool language.
I mention this because it made me smile when, a little while ago, a friend showed me a paper (pdf) from Google that described how they implemented the same algorithm in several different languages and found the one written in C++ the fastest. You know how brand loyalty is apparently tied up with our self-esteem? Hey, C++ may not be cool, but when you've got important things you've gotta do fast, you need a tool man enough for the job, I thought. Of course, benchmarks are notoriously unreliable, and I didn't really believe this. But it still made me smile.
Maybe I shouldn't let my self-esteem get entangled with a programming language. But I think it's not unnatural that after spending considerable time with a language one becomes attached to it. For the record, I am aware of the proverb: if all you have is a hammer, everything looks like a nail. I subscribe to the notion that learning an unfamiliar programming language may broaden your mind. A little. Which is a good thing.
Reading the Google paper I came across this:
“We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.”
They have a high opinion of themselves, don't they? Doubtless it's justified. But it sounds like a challenge...
One of the promises of object oriented design is that you may be able to change the implementation of an object without changing its interface. So I wondered if I could improve on the performance of the Google C++ implementation without changing the test driver. After about 30 seconds of reading the code I decided that sounded too much like hard work: much more fun to continue in the great programming tradition and rewrite the code. (The Google code has objects with getters that return pointers to private member variables; this restricts the freedom to change the implementation. Breaking encapsulation may be justified for run-time performance gains. The penalty is that a change to the implementation may mean a change to the user code, or a run-time penalty to maintain the old interface.)
So anyway, I wrote the version shown below. To make execution time comparisons meaningful I tried to keep the functionality identical to the Google C++
implementation from Robert Hundt's paper. Russel Cox also
implemented the algorithm. So when I talk about Robert's code or Russel's code these are what I'm referring to.
antloops.h
#ifndef ANTLOOPS_H_INCLUDED
#define ANTLOOPS_H_INCLUDED

/* This code is an implementation of the analyze_loops algorithm from
   Nesting of Reducible and Irreducible Loops by Paul Havlak, published
   in ACM ToPLaS, July 1997. (Referred to below as "Havlak's paper.")
   See http://howtowriteaprogram.blogspot.com/

   Copyright (c) 2011 Anthony C. Hay. Some Rights Reserved.
   Published under the BSD 2-clause license.
   See http://www.opensource.org/licenses/BSD-2-Clause */

#include <vector>
#include <stdexcept>

namespace anthay {

typedef int block_name;

/* a basic_block is a node or vertex in a control-flow graph (a basic
   block is a sequence of zero or more statements in a computer program
   that would be executed in-sequence with no jumps into or out of the
   sequence, described briefly in Havlak's paper; it doesn't matter, for
   our purposes it's just a node in a graph) */
class basic_block {
public:
    void add_out_edge(block_name to)  { out_edges_.push_back(to); }
    void add_in_edge(block_name from) { in_edges_.push_back(from); }

    typedef std::vector<block_name> edge_vec_t;
    const edge_vec_t & out_edges() const { return out_edges_; }
    const edge_vec_t & in_edges() const  { return in_edges_; }

private:
    edge_vec_t out_edges_;
    edge_vec_t in_edges_;
};

// represent one control-flow graph
class mao_cfg {
public:
    mao_cfg() : start_node_(undefined) { }

    void reserve(int size) { node_vec_.reserve(size); }

    // return a pointer to a basic_block with the given 'name';
    // pointer is guaranteed valid only until next call to any non-const function
    basic_block * create_node(block_name name)
    {
        const int size = static_cast<int>(node_vec_.size());
        if (size == 0)
            start_node_ = name; // assume first created block is graph start
        if (name >= size)
            node_vec_.resize(name + 1);
        return &node_vec_[name];
    }

    void create_edge(block_name from, block_name to)
    {
        create_node(from)->add_out_edge(to);
        create_node(to)->add_in_edge(from);
    }

    // return the name of the start node
    block_name start_node() const
    {
        if (start_node_ == undefined)
            throw std::runtime_error("undefined start node");
        return start_node_;
    }

    // return the number of nodes in the graph
    int size() const { return static_cast<int>(node_vec_.size()); }

    // return a reference to the node with the given 'name';
    // reference guaranteed valid only until next call to any non-const function
    const basic_block & node(block_name name) const { return node_vec_[name]; }

private:
    enum { undefined = -1 };
    block_name start_node_;
    typedef std::vector<basic_block> node_vec_t;
    node_vec_t node_vec_;
};

class loop_structure_graph {
public:
    loop_structure_graph() : count_(0) {}
    void clear() { count_ = 0; }
    void add_loop() { ++count_; }
    int size() const { return count_; }

    void calculate_nesting_level()
    {
        /* I'm not implementing this because it has no effect on the
           performance of either the Robert Hundt or Russel Cox C++
           implementations of the Havlak algorithm. Robert's
           LoopStructureGraph::CalculateNestingLevel() is never called.
           Russel's LoopGraph::CalculateNesting() is called, but as
           Loop::isRoot is not initialised (and is therefore mostly "true"),
           it does (almost) nothing. */
    }

private:
    int count_;
};

// find loops in given 'cfg' using the analyze_loops algorithm from Havlak's
// paper; return the number of loops found
int find_havlak_loops(const mao_cfg & cfg, loop_structure_graph & lsg);

} // namespace anthay

#endif
antloops.cpp
/* This code is an implementation of the analyze_loops algorithm from
   Nesting of Reducible and Irreducible Loops by Paul Havlak, published
   in ACM ToPLaS, July 1997. (Referred to below as "Havlak's paper.")
   See http://howtowriteaprogram.blogspot.com/

   Copyright (c) 2011 Anthony C. Hay. Some Rights Reserved.
   Published under the BSD 2-clause license.
   See http://www.opensource.org/licenses/BSD-2-Clause */

#include "antloops.h"
#include <algorithm>

namespace anthay {
namespace {

typedef int preorder;

// implements the DFS algorithm from Havlak's paper
class depth_first_search {
public:
    depth_first_search(const mao_cfg & cfg)
    : cfg_(&cfg), current_preorder_(0),
      node_(cfg.size(), unvisited),
      number_(cfg.size(), unvisited),
      last_(cfg.size())
    {
        dfs(cfg.start_node());
    }

    // return the preorder number of the last visited node, or -1
    // if no nodes were visited
    preorder last_preorder() const { return current_preorder_ - 1; }

    // return true iff 'p' is set to the preorder number assigned to node
    // with the given name 'a'; false means there was no path to 'a'
    bool preorder_number(block_name a, preorder & p) const
    {
        p = number_[a];
        return p != unvisited;
    }

    // return the block name with the given preorder number 'p';
    // 'p' _must_ be in the range 0 .. last_preorder()
    block_name block(preorder p) const { return node_[p]; }

    // return true iff node 'w' is an ancestor of node 'v', where
    // w and v are the preorder numbers of the nodes
    bool is_ancestor(preorder w, preorder v) const
    {
        return w <= v && v <= last_[w];
    }

private:
    const mao_cfg * cfg_;
    preorder current_preorder_;    // used in dfs() to assign a preorder number to each node visited
    std::vector<block_name> node_; // node_[a] is the block_name of node with preorder number a
    std::vector<preorder> number_; // number_[a] is the depth-first (or preorder) number of node named a
    std::vector<preorder> last_;   // last_[number_[a]] is the last descendant of node a

    enum { unvisited = -1 };

    // recursive depth-first search; the DFS algorithm from Havlak
    void dfs(block_name a)
    {
        node_[current_preorder_] = a;
        number_[a] = current_preorder_++;
        // foreach node b such that there exists an edge (a,b) do...
        const basic_block::edge_vec_t & out_edges = cfg_->node(a).out_edges();
        for (unsigned i = 0; i < out_edges.size(); ++i)
            if (number_[out_edges[i]] == unvisited)
                dfs(out_edges[i]);
        last_[number_[a]] = current_preorder_ - 1;
    }
};

// implements the UNION/FIND algorithm required by Havlak's algorithm
class union_find {
public:
    // create 'size' elements named 0 .. size-1, with no connections between elements
    explicit union_find(int size)
    : element_(size)
    {
        for (int i = 0; i < size; ++i)
            element_[i] = i;
    }

    // return the name of the root element of the element named 'p'
    // (beneficial side-effect: the path from 'p' to the root is compressed)
    int find_root(int p)
    {
        // I found this a good place to learn about the UNION/FIND algorithm:
        // http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
        while (p != element_[p])
            p = element_[p] = element_[element_[p]];
        return p;
    }

    // merge the set containing element 'a' into the set containing element 'b'
    void unite(int a, int b)
    {
        const int ra = find_root(a);
        element_[ra] = find_root(b);
    }

private:
    std::vector<preorder> element_;
};

// return true iff given 'v' contains given element 'elem'
inline bool is_member(const std::vector<preorder> & v, preorder elem)
{
    return std::find(v.begin(), v.end(), elem) != v.end();
}

const int max_non_back_preds = 32 * 1024;

} // anonymous namespace

// find loops in given 'cfg' using the analyze_loops algorithm from Havlak's
// paper; record loops in given 'lsg' and return the number of loops found
int find_havlak_loops(const mao_cfg & cfg, loop_structure_graph & lsg)
{
    if (cfg.size() == 0)
        return 0;

    // step a:
    const depth_first_search dfs(cfg);

    // step b:
    enum node_type { non_header, self, reducible, irreducible };
    // type[p] is node_type of node with preorder number p
    std::vector<int> type(dfs.last_preorder() + 1, non_header);
    std::vector<preorder> header(cfg.size(), cfg.start_node());
    std::vector<std::vector<preorder> > back_preds(cfg.size());     // backedge predecessors
    std::vector<std::vector<preorder> > non_back_preds(cfg.size()); // non-backedge predecessors

    for (preorder w = 0; w <= dfs.last_preorder(); ++w) {
        const basic_block::edge_vec_t & in_edges = cfg.node(dfs.block(w)).in_edges();
        for (unsigned j = 0; j < in_edges.size(); ++j) {
            preorder v;
            if (!dfs.preorder_number(in_edges[j], v))
                continue; // in_edges[j] was not visited
            if (dfs.is_ancestor(w, v))
                back_preds[w].push_back(v);
            else if (!is_member(non_back_preds[w], v))
                non_back_preds[w].push_back(v);
        }
    }
    header[cfg.start_node()] = 0;

    // step c:
    union_find preorder_set(dfs.last_preorder() + 1);
    std::vector<preorder> p;
    for (preorder w = dfs.last_preorder(); w >= 0; --w) {
        p.clear();

        // step d:
        const std::vector<preorder> & back_preds_w = back_preds[w];
        typedef std::vector<preorder>::const_iterator preorder_iter;
        for (preorder_iter v = back_preds_w.begin(); v != back_preds_w.end(); ++v) {
            if (*v != w)
                p.push_back(preorder_set.find_root(*v));
            else
                type[w] = self;
        }
        if (!p.empty()) {
            type[w] = reducible;
            for (unsigned ix = 0; ix != p.size(); ++ix) {
                const preorder x = p[ix];

                // step e:
                const std::vector<preorder> & non_back_preds_set = non_back_preds[x];
                if (non_back_preds_set.size() > max_non_back_preds) {
                    // algorithm has degenerated, apparently
                    lsg.clear();
                    return 0;
                }
                for (preorder_iter y = non_back_preds_set.begin();
                     y != non_back_preds_set.end(); ++y) {
                    const preorder yprime = preorder_set.find_root(*y);
                    if (!dfs.is_ancestor(w, yprime)) {
                        type[w] = irreducible;
                        if (!is_member(non_back_preds[w], yprime))
                            non_back_preds[w].push_back(yprime);
                    }
                    else if (yprime != w && !is_member(p, yprime))
                        p.push_back(yprime);
                }
            }
        }

        if (!p.empty() || type[w] == self) {
            for (preorder_iter x = p.begin(); x != p.end(); ++x) {
                header[*x] = w;
                preorder_set.unite(*x, w);
            }
            lsg.add_loop();
        }
    }

    return lsg.size();
}

} // namespace anthay
antloops_test.cpp
/* This code is an implementation of the test driver in the paper
   Loop Recognition in C++/Java/Go/Scala by Robert Hundt.
   See http://code.google.com/p/multi-language-bench/

   Copyright (c) 2011 Anthony C. Hay. Some Rights Reserved.
   Published under the BSD 2-clause license.
   See http://www.opensource.org/licenses/BSD-2-Clause
   See also http://howtowriteaprogram.blogspot.com/ */

#include "antloops.h"
using namespace anthay;

#include <iostream>
#include <cassert>

// create a diamond from 'start' to two intermediate nodes back to one
// end node; return the name (or index) of the end node
int build_diamond(mao_cfg & cfg, int start)
{
    cfg.create_edge(start, start + 1);
    cfg.create_edge(start, start + 2);
    cfg.create_edge(start + 1, start + 3);
    cfg.create_edge(start + 2, start + 3);
    return start + 3;
}

// when n > 0, create a daisy chain of nodes containing 'n' edges:
//   start -> start+1 -> start+2 -> ... -> start+n
// return the name of the last node in the chain
int build_straight(mao_cfg & cfg, int start, int n)
{
    for (int i = 0; i < n; ++i)
        cfg.create_edge(start + i, start + i + 1);
    return start + n;
}

// create two diamonds in series, add three further edges to introduce
// 3 loops; return the name of the last node in the created graph
int build_base_loop(mao_cfg & cfg, int from)
{
    int header = build_straight(cfg, from, 1);
    int diamond1 = build_diamond(cfg, header);
    int d11 = build_straight(cfg, diamond1, 1);
    int diamond2 = build_diamond(cfg, d11);
    int footer = build_straight(cfg, diamond2, 1);
    cfg.create_edge(diamond2, d11);
    cfg.create_edge(diamond1, header);
    cfg.create_edge(footer, from);
    footer = build_straight(cfg, footer, 1);
    return footer;
}

int main()
{
    // build a control flow graph and count loops like the code
    // in Robert Hundt's LoopTesterApp.cpp so that execution time
    // comparisons with that code are meaningful
    std::cerr << "counting loops...";

    mao_cfg cfg;
    cfg.create_node(0);
    build_base_loop(cfg, 0);
    cfg.create_node(1);
    cfg.create_edge(0, 2);

    for (int i = 0; i < 15000; ++i) {
        loop_structure_graph lsg;
        find_havlak_loops(cfg, lsg);
    }

    int n = 2;
    for (int parlooptrees = 0; parlooptrees < 10; parlooptrees++) {
        cfg.create_edge(2, ++n);
        for (int i = 0; i < 100; i++) {
            const int top = n;
            n = build_straight(cfg, n, 1);
            for (int j = 0; j < 25; j++)
                n = build_base_loop(cfg, n);
            const int bottom = build_straight(cfg, n, 1);
            cfg.create_edge(n, top);
            n = bottom;
        }
        cfg.create_edge(n, 1);
    }

    loop_structure_graph lsg;
    const int num_loops = find_havlak_loops(cfg, lsg);
    std::cerr << "\nnum_loops = " << num_loops << "\n";
    assert(num_loops == 76001);

    for (int i = 0; i < 50; i++) {
        loop_structure_graph lsg;
        std::cerr << ".";
        find_havlak_loops(cfg, lsg);
    }

    lsg.calculate_nesting_level();
}
(I found I needed to reserve about 10 MB stack space to get Robert's code to produce a result. You may need to do the same for mine.)
Probably the most interesting difference between the implementations is that Robert's and Russel's both use explicit new and delete and mine does not. I don't mean to say that there is anything wrong with raw pointers - I use them all the time - I simply chose a different approach to the problem, using vectors and indices instead. The fact that the nodes are named with successive integers and may be referenced by a 'preorder' number, an integer, suggested a vector would be an obvious data structure to use.
Using collections, such as vectors, can make the code simpler, less likely to leak memory and easier to make exception-safe.
I think the code is in the spirit of the Google paper, which was that it should be written using "the default, idiomatic data structures in each language, as well as default type systems, memory allocation schemes, and default iteration constructs": the basic facilities provided in the language.
Is it correct?
If it were not a requirement to be correct we could give an answer to any problem in an arbitrarily short time. So let's first ask if the code gives the right answer.
My code calculates there are 76,001 loops in the test graph. Robert's gives 76,002 and Russel's gives 76,000. Oh dear. Which is right?
Playing with Robert's code I found that it always exaggerates the number of loops by exactly one; for example it returns 1 for a graph containing no loops at all. So adjusting for this means mine and Robert's agree.
I corresponded briefly with Russel and he was kind enough to respond to my questions. It turns out there is a slight difference between Russel's graph and Robert's in the first few nodes: in Robert's there is an edge connecting nodes 1 -> 2, in Russel's there isn't. This difference means there is one less loop in Russel's graph, so his result is correct for his graph.
Looking at the code that builds the graph I would have said there were ((((3 x 25) + 1) x 100) + 1) x 10 = 76,010. But I'd be wrong, probably because overlapping loops are not counted more than once, so the final '+ 1' is applied once overall rather than once per tree: (((3 x 25) + 1) x 100) x 10 + 1 = 76,001.
In short, I believe there are 76,001 loops in the Robert Hundt C++ graph and my implementation counts them correctly.
I like to remember that I have an amazing symbol manipulation tool at my command that I can use to help me check my own work. (Not my own brain, lol.) I mean I may be able to write some code to test my code. Since I have the two Google implementations to test against, the obvious thing to do is to build some graphs and compare the loop counts returned by each implementation. I wrapped each in its own namespace and linked all three into one test program.
The first idea I had was to generate graphs containing a random number of nodes with a random number of edges between random nodes, like this:
int bigrand()
{
    return rand() * rand();
}

// generate three representations of the same random graph
void generate_random_graph(
    rhundt::MaoCFG & rhcfg,
    russcox::CFG & russcfg,
    anthay::mao_cfg & antcfg,
    int num_nodes,
    int num_edges)
{
    // create all nodes; nodes are named 0..num_nodes-1
    for (block_name b = 0; b < num_nodes; ++b) {
        rhcfg.CreateNode(b);
        russcfg.NewBlock();
        antcfg.create_node(b);
    }

    // link pairs of nodes chosen at random
    for (int i = 0; i < num_edges; ++i) {
        const block_name a = bigrand() % num_nodes;
        const block_name b = bigrand() % num_nodes;
        // we don't care if we link a node to itself
        // or duplicate an existing link
        new rhundt::BasicBlockEdge(&rhcfg, a, b);
        russcfg.Connect(russcfg.block[a], russcfg.block[b]);
        antcfg.create_edge(a, b);
    }
}

// create a random graph and count the loops using all three implementations
void test_graph(int num_nodes, int num_edges)
{
    rhundt::MaoCFG rhcfg;
    russcox::CFG russcfg;
    anthay::mao_cfg antcfg;

    generate_random_graph(rhcfg, russcfg, antcfg, num_nodes, num_edges);
    std::cerr << "nodes " << std::setw(6) << num_nodes
              << " edges " << std::setw(6) << num_edges
              << " loops ";

    anthay::loop_structure_graph antlsg;
    const int ant_loops = anthay::find_havlak_loops(antcfg, antlsg);
    std::cerr << std::setw(5) << ant_loops;

    russcox::LoopGraph russlsg;
    russcox::LoopFinder f;
    f.FindLoops(&russcfg, &russlsg);
    const int rcox_loops = russlsg.loop.size();
    std::cerr << " " << std::setw(5) << rcox_loops;

    rhundt::LoopStructureGraph rhlsg;
    // (to make consistent with others shift result from 1..n to 0..n-1)
    const int rhundt_loops = rhundt::FindHavlakLoops(&rhcfg, &rhlsg) - 1;
    std::cerr << " " << std::setw(5) << rhundt_loops;

    if (rhundt_loops != ant_loops || rcox_loops != ant_loops)
        std::cerr << " <=== disagreement";

    std::cerr << "\n";
}

int main()
{
    const int max_graph_size = 10000;

    srand(20111002);
    for (;;)
        test_graph((bigrand() % max_graph_size) + 1,
                   bigrand() % max_graph_size);
}
Here is the output from a typical run:
nodes 8834 edges 1940 loops 0 0 0
nodes 3112 edges 5351 loops 379 379 379
nodes 6031 edges 5518 loops 0 0 0
nodes 7929 edges 2756 loops 0 0 0
nodes 3841 edges 8340 loops 740 740 740
nodes 840 edges 608 loops 12 12 12
nodes 9548 edges 8367 loops 134 134 134
nodes 6 edges 9525 loops 6 6 6
nodes 1185 edges 7226 loops 581 581 581
nodes 8411 edges 8406 loops 0 0 0
nodes 3179 edges 2630 loops 0 0 0
nodes 9249 edges 6192 loops 0 0 0
nodes 4937 edges 9384 loops 657 657 657
nodes 6911 edges 1174 loops 0 0 0
nodes 6541 edges 5294 loops 0 0 0
nodes 4117 edges 6625 loops 324 324 324
nodes 6245 edges 449 loops 0 0 0
nodes 4289 edges 4160 loops 0 0 0
nodes 9031 edges 5296 loops 0 0 0
nodes 3083 edges 1959 loops 0 0 0
nodes 321 edges 7315 loops 270 270 270
nodes 6165 edges 4654 loops 5 5 5
nodes 4377 edges 877 loops 0 0 0
nodes 5009 edges 1100 loops 0 0 0
nodes 6081 edges 7288 loops 144 144 144
nodes 7559 edges 5344 loops 0 0 0
nodes 2729 edges 2896 loops 0 0 0
nodes 7425 edges 6250 loops 65 65 65
nodes 5134 edges 9532 loops 795 795 795
nodes 7571 edges 3600 loops 0 0 0
nodes 4475 edges 8744 loops 648 648 648
nodes 1182 edges 3650 loops 328 328 328
nodes 4268 edges 4450 loops 95 95 95
nodes 761 edges 284 loops 0 0 0
nodes 7021 edges 50 loops 0 0 0
nodes 6036 edges 7762 loops 370 370 370
nodes 4816 edges 6359 loops 327 327 327
nodes 2417 edges 9300 loops 1207 1207 1207
nodes 289 edges 5292 loops 259 259 259
. . . and so on . . .
During several hours running this test there were no disagreements between any of the implementations (after adjusting Robert's by -1).
The second idea I had was to generate graphs with all possible combinations of connections between nodes. This is only going to be practical for small graphs. For one node there are two possible graphs: the node has no outgoing edges, or the node has an outgoing edge to itself. For two nodes there are 16 possible graphs, ranging from the graph where neither node has any outgoing edges to the graph where each node has an outgoing edge to itself and to the other node. I think for n nodes there will be 2^(n^2) possible graphs (I'm assuming no duplicate edges). For 5 nodes there are 33,554,432 possible graphs and for 6 there are 68,719,476,736. So 5 nodes is probably about the limit of what I could reasonably check. Here's what I wrote to test all possible graphs up to 5 nodes:
// generate all possible graphs having 'n' nodes and test
// all loop counting implementations return the same count
void test_all_possible_graphs(int n)
{
    assert(1 <= n && n <= 5);
    const int end = 1 << (n * n); // we will test this number of graphs
    for (int bitmap = 0; bitmap < end; ++bitmap) {
        /* think of bitmap as n groups of n bits; each node has its own
           group, each bit in that group determines whether there is an
           out edge from that node to the node corresponding to that bit */

        // create all nodes
        anthay::mao_cfg antcfg;
        rhundt::MaoCFG huncfg;
        for (int a = 0; a < n; ++a) {
            antcfg.create_node(a);
            huncfg.CreateNode(a);
        }

        // add edges only where the corresponding bit in bitmap is set
        int bit = 1;
        for (int a = 0; a < n; ++a) {
            for (int b = 0; b < n; ++b) {
                if (bitmap & bit) {
                    antcfg.create_edge(a, b);
                    new rhundt::BasicBlockEdge(&huncfg, a, b);
                }
                bit <<= 1;
            }
        }

        // count loops
        anthay::loop_structure_graph antlsg;
        const int ant_loops = anthay::find_havlak_loops(antcfg, antlsg);
        rhundt::LoopStructureGraph hunlsg;
        const int rhundt_loops = rhundt::FindHavlakLoops(&huncfg, &hunlsg) - 1;

        // compare counts
        if (rhundt_loops != ant_loops)
            std::cout << "disagreement for 0x" << std::hex << bitmap << "\n";
        if (bitmap % 1000000 == 0)
            std::cerr << '.';
    }
}

int main()
{
    for (int n = 1; n <= 5; ++n)
        test_all_possible_graphs(n);
}
It takes about 10 minutes to count the loops in all possible graphs having 5 nodes or fewer and confirm that in all cases my implementation returns the same loop count as Robert's. (I also tested against Russel's implementation, but that plotted a stairway to heaven on my memory usage monitor: Russel didn't free any of the memory he allocated. The memory usage for Hundt+Hay was a flat line.)
I'm sure there are other tests I could do. I could count loops in more 'hand-built' graphs like the original 76,001 looper, for example. But at this point I'm going to say that my implementation is an accurate analog of the original Robert Hundt code.
Is it efficient?
Actually, let's just ask if it's fast. Here are the timings for the three implementations running on my 2008 Mac with a 2.8 GHz Core 2 Duo processor:
$ g++
i686-apple-darwin11-llvm-g++-4.2: no input files
$
$
$ g++ -O3 LoopTesterApp.cpp mao-loops.cpp
$ time ./a.out
Welcome to LoopTesterApp, C++ edition
Constructing App...
Constructing Simple CFG...
15000 dummy loops
Constructing CFG...
Performing Loop Recognition
1 Iteration
Another 50 iterations...
..................................................
Found 76002 loops (including artificial root node)(3800100)
loop-0, nest: 0, depth: 0

real    0m31.326s
user    0m29.501s
sys     0m1.503s
$
$
$ g++ -O3 russcox.cpp
$ time ./a.out
# of loops: 76000 (including 1 artificial root node)

real    0m5.497s
user    0m5.355s
sys     0m0.086s
$
$
$ g++ -O3 antloops_test.cpp antloops.cpp
$ time ./a.out
counting loops...
num_loops = 76001
..................................................

real    0m6.511s
user    0m6.009s
sys     0m0.279s
$
In summary:
rhundt 31.3 seconds = 100% of rhundt
anthay 6.5 seconds = 20.8% of rhundt
russcox 5.5 seconds = 17.6% of rhundt
(The above is an update to the original post after I upgraded to GCC 4.2)
So Russel's code wins, but mine is a creditable runner-up.
It's interesting watching the random graph test program run: occasionally a graph will be generated where it looks like Robert's code will count the loops quicker than either Russel's or mine. I'm wary of benchmarks. People say that, like statistics, you can find a benchmark to support any point of view. Hang on!... I wonder if... Voilà!...
I added code to the random graph test to time how long each implementation took to count loops and ran it for 100 graphs. The code and the results are shown here.
// create a random graph and count the loops using all three implementations
void test_graph(
    int num_nodes, int num_edges,
    clock_t & ant_ms, clock_t & rus_ms, clock_t & rhu_ms)
{
    rhundt::MaoCFG rhcfg;
    russcox::CFG russcfg;
    anthay::mao_cfg antcfg;

    generate_random_graph(rhcfg, russcfg, antcfg, num_nodes, num_edges);
    std::cout << "nodes " << std::setw(6) << num_nodes
              << " edges " << std::setw(6) << num_edges
              << " loops ";

    static_assert(CLOCKS_PER_SEC >= 1000,
        "code assumes at least millisecond clock() resolution");

    clock_t start = clock() / (CLOCKS_PER_SEC / 1000);
    anthay::loop_structure_graph antlsg;
    const int ant_loops = anthay::find_havlak_loops(antcfg, antlsg);
    clock_t stop = clock() / (CLOCKS_PER_SEC / 1000);
    ant_ms = stop - start;
    std::cout << std::setw(5) << ant_loops
              << " " << std::setw(5) << ant_ms << "ms";

    start = clock() / (CLOCKS_PER_SEC / 1000);
    russcox::LoopGraph russlsg;
    russcox::LoopFinder f;
    f.FindLoops(&russcfg, &russlsg);
    const int rcox_loops = russlsg.loop.size();
    stop = clock() / (CLOCKS_PER_SEC / 1000);
    rus_ms = stop - start;
    std::cout << " : " << std::setw(5) << rcox_loops
              << " " << std::setw(5) << rus_ms << "ms";

    start = clock() / (CLOCKS_PER_SEC / 1000);
    rhundt::LoopStructureGraph rhlsg;
    // (to make consistent with others shift result from 1..n to 0..n-1)
    const int rhundt_loops = rhundt::FindHavlakLoops(&rhcfg, &rhlsg) - 1;
    stop = clock() / (CLOCKS_PER_SEC / 1000);
    rhu_ms = stop - start;
    std::cout << " : " << std::setw(5) << rhundt_loops
              << " " << std::setw(5) << rhu_ms << "ms";

    if (rhundt_loops != ant_loops || rcox_loops != ant_loops)
        std::cout << " <=== disagreement";

    std::cout << "\n";
}

int main()
{
    clock_t ant_ms_total = 0;
    clock_t rus_ms_total = 0;
    clock_t rhu_ms_total = 0;

    const int max_graph_size = 10000;

    srand(20111002);
    for (int i = 0; i < 100; ++i) {
        clock_t ant_ms, rus_ms, rhu_ms;
        test_graph((bigrand() % max_graph_size) + 1,
                   bigrand() % max_graph_size,
                   ant_ms, rus_ms, rhu_ms);
        ant_ms_total += ant_ms;
        rus_ms_total += rus_ms;
        rhu_ms_total += rhu_ms;
    }

    std::cout << "ant ms " << ant_ms_total
              << " russcox ms " << rus_ms_total
              << " rhundt ms " << rhu_ms_total << "\n";
    std::cout << "ant " << (100 * ant_ms_total) / rhu_ms_total
              << "% russcox " << (100 * rus_ms_total) / rhu_ms_total
              << "% rhundt " << (100 * rhu_ms_total) / rhu_ms_total << "%\n";
}
nodes 8834 edges 1940 loops 0 1ms : 0 2ms : 0 21ms
nodes 3112 edges 5351 loops 379 22ms : 379 107ms : 379 54ms
nodes 6031 edges 5518 loops 0 0ms : 0 0ms : 0 8ms
nodes 7929 edges 2756 loops 0 0ms : 0 1ms : 0 10ms
nodes 3841 edges 8340 loops 740 59ms : 740 292ms : 740 158ms
nodes 840 edges 608 loops 12 0ms : 12 1ms : 12 1ms
nodes 9548 edges 8367 loops 134 2ms : 134 39ms : 134 22ms
nodes 6 edges 9525 loops 6 1ms : 6 29ms : 6 7ms
nodes 1185 edges 7226 loops 581 110ms : 581 125ms : 581 288ms
nodes 8411 edges 8406 loops 0 0ms : 0 0ms : 0 11ms
nodes 3179 edges 2630 loops 0 0ms : 0 0ms : 0 5ms
. . . and so on . . .
nodes 126 edges 610 loops 52 0ms : 52 1ms : 52 3ms
nodes 758 edges 6658 loops 476 79ms : 476 81ms : 476 243ms
nodes 6553 edges 674 loops 0 0ms : 0 0ms : 0 10ms
nodes 7009 edges 4569 loops 0 1ms : 0 0ms : 0 9ms
nodes 6656 edges 157 loops 0 0ms : 0 0ms : 0 10ms
nodes 7617 edges 6113 loops 0 1ms : 0 0ms : 0 10ms
nodes 2392 edges 8981 loops 804 213ms : 804 378ms : 804 415ms
nodes 6829 edges 4110 loops 0 0ms : 0 0ms : 0 9ms
nodes 1899 edges 9336 loops 922 292ms : 922 346ms : 922 556ms
ant ms 2842 russcox ms 6680 rhundt ms 7368
ant 38% russcox 90% rhundt 100%
In this test Russel's implementation counted the loops in 90% of the time it took Robert's implementation to count the loops in the same graphs. But my implementation took only 38% of the time. My code wins. I think benchmarks can be a pretty accurate measure of performance, now I come to think about it.
I have my tongue in my cheek, in case you didn't know. I mean, these results are genuine, but I'm still wary of benchmarks. Just to confirm my code's superiority I changed max_graph_size to 50000 and reran the test. Here are the last few lines of output:
. . . and so on . . .
nodes 29381 edges 11279 loops 0 1ms : 0 3ms : 0 42ms
nodes 8736 edges 46720 loops 3114 17774ms : 3114 28904ms : 3114 9482ms
nodes 24733 edges 26201 loops 26 4ms : 26 11ms : 26 51ms
nodes 13453 edges 35531 loops 3800 9737ms : 3800 22393ms : 3800 5261ms
nodes 40050 edges 42545 loops 1445 473ms : 1445 22515ms : 1445 704ms
nodes 9523 edges 35465 loops 4498 18358ms : 4498 22415ms : 4498 9013ms
nodes 8233 edges 1458 loops 0 0ms : 0 1ms : 0 15ms
nodes 39426 edges 13725 loops 0 2ms : 0 5ms : 0 56ms
nodes 23209 edges 46433 loops 3839 5555ms : 3839 41132ms : 3839 3542ms
nodes 20855 edges 31847 loops 1462 362ms : 1462 9499ms : 1462 565ms
nodes 19441 edges 19568 loops 0 1ms : 0 2ms : 0 31ms
nodes 22524 edges 30654 loops 1701 765ms : 1701 14722ms : 1701 907ms
nodes 16581 edges 43258 loops 4331 16599ms : 4331 50270ms : 4331 7703ms
nodes 8926 edges 40470 loops 4035 24388ms : 4035 34164ms : 4035 11233ms
nodes 41346 edges 27901 loops 0 2ms : 0 6ms : 0 58ms
nodes 11934 edges 16560 loops 944 151ms : 944 2433ms : 944 305ms
nodes 23446 edges 16700 loops 0 1ms : 0 3ms : 0 34ms
nodes 23147 edges 16770 loops 0 2ms : 0 3ms : 0 34ms
nodes 1601 edges 43744 loops 1487 4655ms : 1487 4644ms : 1487 5378ms
nodes 8003 edges 15245 loops 1144 179ms : 1144 1550ms : 1144 335ms
ant ms 312512 russcox ms 755303 rhundt ms 177003
ant 176% russcox 426% rhundt 100%
In this test my code took nearly twice as long as Robert's, and Russel's code took more than twice as long as mine! Robert's code wins. I'm going off benchmarks again.
Three different benchmarks. Three different winners. It would be interesting to analyse the performance characteristics of these three implementations of the same algorithm to understand what's happening. But that's for another time.
Is it readable?
From the user's perspective code readability is neither here nor there. But in my experience from the programmer's perspective, you are more likely to achieve some of the things the user does care about, like being correct, if you can reason about the code. And to reason about the code you need to understand it. And code readability aids understanding. I believe the way the code looks is of lesser importance than other goals, but it isn't of no importance.
The core analyze_loops algorithm in the Havlak paper (reproduced in the Google paper) is written in a mathematical pseudocode. It is 30 lines long. I think there is a fairly clear resemblance to that pseudocode in my implementation, which is 53 lines long. One line of pseudocode to two of real code is a pretty good ratio, I think. (Note: I don't count blank lines, comments or lines containing only a curly brace.)
In general I think the resemblance between someone's description of an algorithm and the concrete realisation of that algorithm in a real programming language that actually works on a real computer is a good thing. Sometimes the algorithm description is clear but the computer language doesn't allow the realisation to reflect it.
It may be that computer scientists who publish algorithms often describe them in a way suited to implementation in C++ because they all use C or C++ to develop their algorithms. I don't know.
In what way is an algorithm "readable?" Some algorithms are simple and you can just see what they do without much difficulty. And some you can't keep enough in your head to just "see." In that case you resort to pencil and paper and boxes joined to other boxes with arrows. You work through simple concrete examples following each step in the algorithm to see how it transforms the data. That's what I do anyway.
Candid conclusion
The code is short, but I didn't find it easy to write. I bought the Havlak paper for $15 from the ACM, but I didn't find it easy to move from the abstract description of the complex algorithm in the paper to the concrete representation of a working program; without the two Google examples by Robert Hundt and Russ Cox I might not have succeeded. I do not intend anything I've said here to imply any criticism of those two guys' work; I don't know them personally, but I have no doubt they have a far deeper understanding of the Havlak algorithm than I do. But I suspect they were not very interested in the C++ implementation of this algorithm, perhaps because C++ is not a cool language. (Update: I regret saying this last bit; how do I know how interested they were? It was a snide comment. Sorry.)
One of my difficulties was a certain amount of confusion about when I should be using the basic block name, an int, and when the preorder number, also an int. Using the two typedefs for these two concepts helped me keep them separate.
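To show the idea concretely, here is a minimal sketch of that technique. The names `BlockName` and `PreorderNumber` are my own illustrative inventions, not the identifiers used in the actual implementation; the compiler cannot enforce the distinction (both are still `int`), but the names make a mix-up much easier to spot.

```cpp
#include <cassert>
#include <vector>

// Two distinct integer-valued concepts, kept apart by name.
typedef int BlockName;       // identifies a basic block in the flow graph
typedef int PreorderNumber;  // the block's position in the depth-first preorder

// Given the DFS preorder table, look up a block's preorder number.
// The signature now documents which kind of int goes in and comes out.
inline PreorderNumber preorder_of(const std::vector<PreorderNumber>& preorder,
                                  BlockName b) {
    return preorder[b];
}
```

A function taking a `BlockName` that is accidentally handed a `PreorderNumber` still compiles, but the mismatch is visible at the call site, which was the point: documentation rather than type safety.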
The most interesting thing I learned was the UNION/FIND algorithm, which I'm sad to say, I had never before seen (or if I had I had forgotten). That path compression is really neat.
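For anyone else meeting it for the first time, here is a minimal sketch of UNION/FIND with path compression. The class and method names are mine, not those used in any of the benchmarked implementations; Havlak's algorithm uses a structure like this to merge the blocks of nested loops as they are discovered.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Disjoint-set (UNION/FIND) structure with path compression.
class DisjointSets {
public:
    explicit DisjointSets(int n) : parent_(n) {
        // Each element starts as the root of its own singleton set.
        std::iota(parent_.begin(), parent_.end(), 0);
    }

    // find with path compression: every element visited on the way to
    // the root is re-pointed directly at the root, flattening the tree
    // so that later finds along the same path are nearly constant time.
    int find(int x) {
        if (parent_[x] != x)
            parent_[x] = find(parent_[x]);
        return parent_[x];
    }

    // Merge the sets containing x and y.
    void unite(int x, int y) {
        parent_[find(x)] = find(y);
    }

private:
    std::vector<int> parent_;
};
```

The neat part is that `find` does useful work as a side effect: merely asking which set an element belongs to makes the next query cheaper.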
In the above narrative I may have made it sound like I implemented the algorithm, checked it against the Google results and found it was correct. That is not how it happened. I fiddled around for some time, getting wrong answers and trying to figure out what mistakes I had made. So what? I believe a poet might work on a poem for years before it's published. Being right matters. Being right first time is overrated. Since I'm not 100% sure this code is right, I'm going to stop pontificating right here.
Congratulations on this really great job!
Our congratulations are particularly sincere because we have (almost) been through the same experiment: reimplementing Hundt's C++ version.
Our approach was different: we are developing a new language (a new hammer ;-), so we wanted to produce an exact "cut and paste" of Hundt's version ("you push back an element? OK, we do it too, next move?", and so on...) but with our own language/STL...
We even copied the getter/setter functions.
We were not really interested in the efficiency of the algorithm itself, but in the comparison between C++ and Cawen.
The results are here : http://melvenn.com/en/home/
We don't know if C++ is cool or not ;-), but we will be very happy to implement your own version as soon as we find some time to do it !
TS & GC
Hi TS & GC. Thank you for your kind compliment. I wish you well with your endeavour (I looked but couldn't actually find any code written in your new language). I think computer languages are fascinating and useful things. I'd love to invent one myself. I recently read that when Jerry Weinberg was asked what his greatest contribution to the field of computer science was he said "That's easy. I answered that some years ago, and my answer hasn't changed. My biggest contribution to the software world is that I never invented yet another programming language." (from http://jonjagger.blogspot.co.uk/2010/09/interview-with-jerry-weinberg.html)
He he he, the quote is excellent.
It reminds me of this one: "The nice thing about standards is that there are so many of them to choose from."
But please believe that we did not do too much harm to the software world, since:
1) Cawen is not exactly a new language... it is better thought of as a way to template (structs and functions) good old C (which has never been cool but does not mind, for he knows he is the King ;-)... It's some kind of "Back to the Future VI" language... You can download the C99 sources the Cawen preprocessor has generated here: http://www.melvenn.com/data/havlak_cawen_benchmark.tgz
2) maybe no one will ever hear about it except you, R. Hundt, my friend Thomas and me.
regards,
Gwenael Chailleu
Great Job, Anthony, just saw this now...
Regards
Robert Hundt
Hi Robert. I'm honoured you took time to read it. Thank you for creating the original paper.