# How to Write a Program

Anthony C. Hay

## 15 January 2010

### Roman Numbers

I was dipping into the literate source of Donald Knuth's TeX.web when I came upon this

The TeX.web source that generates this snippet looks like Listing A.

Listing A:
 @ Roman numerals are produced by the |print_roman_int| routine. Readerswho like puzzles might enjoy trying to figure out how this tricky codeworks; therefore no explanation will be given. Notice that 1990 yields\.{mcmxc}, not \.{mxm}.@p procedure print_roman_int(@!n:integer);label exit;var j,@!k: pool_pointer; {mysterious indices into |str_pool|}@!u,@!v: nonnegative_integer; {mysterious numbers}begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000;loop@+ begin while n>=v do begin print_char(so(str_pool[j])); n:=n-v; end; if n<=0 then return; {nonpositive input produces no output} k:=j+2; u:=v div (so(str_pool[k-1])-"0"); if str_pool[k-1]=si("2") then begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); end; if n+u>=v then begin print_char(so(str_pool[k])); n:=n+u; end else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); end; end;exit:end;

I converted Knuth's original Pascal more-or-less directly into C++, except that this version returns the Roman numerals as a string rather than printing them, which makes it referentially transparent and easier to test.

 // converted from Donald Knuth's Pascal print_roman_int procedure in TeX.webstd::string roman_int(int n){ std::string result; const char * j = "m2d5c2l5x2v5i"; int v = 1000; for (;;) { while (n >= v) { result += *j; n = n - v; } if (n <= 0) break; // nonpositive input produces no output const char * k = j + 2; int u = v / (k[-1] - '0'); if (k[-1] == '2') { k = k + 2; u = u / (k[-1] - '0'); } if (n + u >= v) { result += *k; n = n + u; } else { j = j + 2; v = v / (j[-1] - '0'); } } return result;}

The wikipedia article on Roman numerals shows this table:

×1×2×3×4×5×6×7×8×9
OnesIIIIIIIVVVIVIIVIIIIX
TensXXXXXXXLLLXLXXLXXXXC
HundredsCCCCCCCDDDCDCCDCCCCM
ThousandsMMMMMMIVVVIVIIVIIIIX
Ten thousandsXXXXXXXLLLXLXXLXXXXC
Hundred thousandsCCCCCCCDDDCDCCDCCCCM

From this table a pattern is fairly easy to see and I wrote my own version shown in Listing B (like Knuth I didn't include over-barred numerals, though they would be just as easy to add).

Listing B:
 // return lower-case Roman numeral equivelant of given 'num'std::string roman_int_ii(int num){ struct pair { const char * const symbol; int value; }; const pair table[] = { {"m", 1000}, {"cm", 900}, {"d", 500}, {"cd", 400}, {"c", 100}, {"xc", 90}, {"l", 50}, {"xl", 40}, {"x", 10}, {"ix", 9}, {"v", 5}, {"iv", 4}, {"i", 1} }; std::string result; for (const pair * p = table; num > 0; ++p) { while (num >= p->value) { result += p->symbol; num -= p->value; } } return result;}

And then I checked my version against Knuth's and found the two functions produce identical results:

 int main(){ for (int i = 0; i < 5000; ++i) { const std::string a(roman_int(i)); const std::string b(roman_int_ii(i)); if (a != b) { std::cout << "Doesn't work for " << i << " got '" << b << "' instead of '" << a << "'\n"; } }}

I enjoy reading bits and pieces of Knuth's literate code, such as TeX and Adventure. Knuth's commentary can be interesting and the code looks quite beautiful in its typeset form. But I'd prefer to be working with code like that in Listing B rather than that in Listing A, principally because I can readily understand B whereas A requires a lot more effort. I guess, as he hints in the comment, in this case he just wanted to have fun with some tricky code; he’s produced a kind of domain-specific language interpreter in a dozen or so lines of code. (I love code.) But it’s not just the choice of algorithm I’m uncomfortable with, it’s the WEB source code itself.

I have not written any code using Knuth’s literate programming tools but from what I’ve seen of what others have written I think that WEB sources are less readable than plain sources.

Here is a fragment of the CWEB source for Adventure:

 @ Here is our overall strategy for administering the game. It is understoodthat the program might |goto quit| from within any of the subsectionsnamed here, even though the section names don't mention this explicitly.For example, while checking for interference we might find out thattime has run out, or that a dwarf has killed you and no morereincarnations are possible. The execution consists of two nested loops: There are minorcycles'' inside of major cycles.'' Actions define minor cycles in whichyou stay in the same place and we tell you the result of your action.Motions define major cycles in which you move and we tell you what youcan see at the new place. @=while (1) { @; loc=newloc; /* hey, we actually moved you */ @; commence: @; while (1) { @; @; } try_move: @; oldoldloc=oldloc; oldloc=loc; go_for_it: @;} @ Our main task in the simulation loop is to parse your input.Depending on the kind of command you give, the following sectionof the program will exit in one of four ways: \smallskip\item{$\bullet$} |goto try_move| with |mot| set to a desired motion. \item{$\bullet$} |goto transitive| with |verb| set to a desired actionand |obj| set to the object of that motion. \item{$\bullet$} |goto intransitive| with |verb| set to a desired actionand |obj=NOTHING|; no object has been specified. \item{$\bullet$} |goto speakit| with |hash_table[k].meaning| theindex of a message for a vocabulary word of |message_type|. \smallskip\noindentSometimes we have to ask you to complete an ambiguous command before weknow both a verb and its object. In most cases the words can be ineither order; for example, \.{take} \.{rod} is equivalent to \.{rod} \.{take}.A~motion word overrides a previously given action or object. Lots of special cases make the program a bit messy. For example,if the verb is \.{say}, we don't want to look up the object in ourvocabulary; we simply want to say'' it. @=verb=oldverb=ABSTAIN;oldobj=obj;obj=NOTHING;cycle: @;@;listen();pre_parse: turns++;@;@;@;parse:@;@;shift: strcpy(word1,word2);@+*word2='\0';@+goto parse;

The way the code is kind of lost in all the words and CWEB command codes and verbose section names, the way the structure is lost when some code is jammed up to the left margin, the way all the variables seem to end up global (how do you keep track of scope?), I just don’t like. And those goto labels all over the place! I really don’t know how you keep the structure in your head. I think I must be very visual: I need to see the structure on the screen.

Perhaps I’m mixing up all literate programming with the specific examples I’ve looked at; and anyway I might find it just as hard to read Mr. Knuth’s code if it was written in plain C. It’s clear that we are all different and we don’t all share the same coding style. And thank goodness for that! As I say, I haven’t worked with any literate code, just read some. So my opinions are just first impressions. Perhaps if the reason for the code is primarily to be typeset for exposition, the ugliness of the source is a price worth paying. But for ordinary working-class code I'd rather read and write plain idiomatic native code - but code written with care.

index of blog posts