I was dipping into the literate source of
Donald Knuth's TeX.web when I came upon this
The TeX.web source that generates this snippet looks like Listing A.
Listing A:@ Roman numerals are produced by the |print_roman_int| routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields \.{mcmxc}, not \.{mxm}.
@p procedure print_roman_int(@!n:integer); label exit; var j,@!k: pool_pointer; {mysterious indices into |str_pool|} @!u,@!v: nonnegative_integer; {mysterious numbers} begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000; loop@+ begin while n>=v do begin print_char(so(str_pool[j])); n:=n-v; end; if n<=0 then return; {nonpositive input produces no output} k:=j+2; u:=v div (so(str_pool[k-1])-"0"); if str_pool[k-1]=si("2") then begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); end; if n+u>=v then begin print_char(so(str_pool[k])); n:=n+u; end else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); end; end; exit:end;
|
I converted Knuth's original Pascal more-or-less directly into C++, except that this version returns the Roman numerals as a string rather than printing them, which makes it referentially transparent and easier to test.
// converted from Donald Knuth's Pascal print_roman_int procedure in TeX.web std::string roman_int(int n) { std::string result; const char * j = "m2d5c2l5x2v5i"; int v = 1000; for (;;) { while (n >= v) { result += *j; n = n - v; } if (n <= 0) break; // nonpositive input produces no output const char * k = j + 2; int u = v / (k[-1] - '0'); if (k[-1] == '2') { k = k + 2; u = u / (k[-1] - '0'); } if (n + u >= v) { result += *k; n = n + u; } else { j = j + 2; v = v / (j[-1] - '0'); } } return result; }
|
The wikipedia article on
Roman numerals shows this table:
| ×1 | ×2 | ×3 | ×4 | ×5 | ×6 | ×7 | ×8 | ×9 |
---|
Ones | I | II | III | IV | V | VI | VII | VIII | IX |
---|
Tens | X | XX | XXX | XL | L | LX | LXX | LXXX | XC |
---|
Hundreds | C | CC | CCC | CD | D | DC | DCC | DCCC | CM |
---|
Thousands | M | MM | MMM | IV | V | VI | VII | VIII | IX |
---|
Ten thousands | X | XX | XXX | XL | L | LX | LXX | LXXX | XC |
---|
Hundred thousands | C | CC | CCC | CD | D | DC | DCC | DCCC | CM |
---|
From this table a pattern is fairly easy to see and I wrote my own version shown in Listing B (like Knuth I didn't include over-barred numerals, though they would be just as easy to add).
Listing B:// return lower-case Roman numeral equivelant of given 'num' std::string roman_int_ii(int num) { struct pair { const char * const symbol; int value; }; const pair table[] = { {"m", 1000}, {"cm", 900}, {"d", 500}, {"cd", 400}, {"c", 100}, {"xc", 90}, {"l", 50}, {"xl", 40}, {"x", 10}, {"ix", 9}, {"v", 5}, {"iv", 4}, {"i", 1} }; std::string result; for (const pair * p = table; num > 0; ++p) { while (num >= p->value) { result += p->symbol; num -= p->value; } } return result; }
|
And then I checked my version against Knuth's and found the two functions produce identical results:
int main() { for (int i = 0; i < 5000; ++i) { const std::string a(roman_int(i)); const std::string b(roman_int_ii(i)); if (a != b) { std::cout << "Doesn't work for " << i << " got '" << b << "' instead of '" << a << "'\n"; } } }
|
I enjoy reading bits and pieces of Knuth's literate code, such as TeX and Adventure. Knuth's commentary can be interesting and the code looks quite beautiful in its typeset form. But I'd prefer to be working with code like that in Listing B rather than that in Listing A, principally because I can readily understand B whereas A requires a lot more effort. I guess, as he hints in the comment, in this case he just wanted to have fun with some tricky code; he’s produced a kind of domain-specific language interpreter in a dozen or so lines of code. (I love code.) But it’s not just the choice of algorithm I’m uncomfortable with, it’s the WEB source code itself.
I have not written any code using Knuth’s literate programming tools but from what I’ve seen of what others have written I think that WEB sources are less readable than plain sources.
Here is a fragment of the CWEB source for Adventure:
@ Here is our overall strategy for administering the game. It is understood that the program might |goto quit| from within any of the subsections named here, even though the section names don't mention this explicitly. For example, while checking for interference we might find out that time has run out, or that a dwarf has killed you and no more reincarnations are possible. The execution consists of two nested loops: There are ``minor cycles'' inside of ``major cycles.'' Actions define minor cycles in which you stay in the same place and we tell you the result of your action. Motions define major cycles in which you move and we tell you what you can see at the new place. @<Simulate...@>= while (1) { @<Check for interference with the proposed move to |newloc|@>; loc=newloc; /* hey, we actually moved you */ @<Possibly move dwarves and the pirate@>; commence: @<Report the current state@>; while (1) { @<Get user input; |goto try_move| if motion is requested@>; @<Perform an action in the current place@>; } try_move: @<Handle special motion words@>; oldoldloc=oldloc; oldloc=loc; go_for_it: @<Determine the next location, |newloc|@>; } @ Our main task in the simulation loop is to parse your input. Depending on the kind of command you give, the following section of the program will exit in one of four ways: \smallskip \item{$\bullet$} |goto try_move| with |mot| set to a desired motion. \item{$\bullet$} |goto transitive| with |verb| set to a desired action and |obj| set to the object of that motion. \item{$\bullet$} |goto intransitive| with |verb| set to a desired action and |obj=NOTHING|; no object has been specified. \item{$\bullet$} |goto speakit| with |hash_table[k].meaning| the index of a message for a vocabulary word of |message_type|. \smallskip\noindent Sometimes we have to ask you to complete an ambiguous command before we know both a verb and its object. In most cases the words can be in either order; for example, \.{take} \.{rod} is equivalent to \.{rod} \.{take}. A~motion word overrides a previously given action or object. Lots of special cases make the program a bit messy. For example, if the verb is \.{say}, we don't want to look up the object in our vocabulary; we simply want to ``say'' it. @<Get user input...@>= verb=oldverb=ABSTAIN; oldobj=obj; obj=NOTHING; cycle: @<Check if a hint applies, and give it if requested@>; @<Make special adjustments before looking at new input@>; listen(); pre_parse: turns++; @<Handle special cases of input@>; @<Check the clocks and the lamp@>; @<Handle additional special cases of input@>; parse:@<Give advice about going |WEST|@>; @<Look at |word1| and exit to the right place if it completes a command@>; shift: strcpy(word1,word2);@+*word2='\0';@+goto parse;
|
The way the code is kind of lost in all the words and CWEB command codes and verbose section names, the way the structure is lost when some code is jammed up to the left margin, the way all the variables seem to end up global (how do you keep track of scope?), I just don’t like. And those goto labels all over the place! I really don’t know how you keep the structure in your head. I think I must be very visual: I need to see the structure on the screen.
Perhaps I’m mixing up all literate programming with the specific examples I’ve looked at; and anyway I might find it just as hard to read Mr. Knuth’s code if it was written in plain C. It’s clear that we are all different and we don’t all share the same coding style. And thank goodness for that! As I say, I haven’t worked with any literate code, just read some. So my opinions are just first impressions. Perhaps if the reason for the code is primarily to be typeset for exposition, the ugliness of the source is a price worth paying. But for ordinary working-class code I'd rather read and write plain idiomatic native code - but code written with care.
index of blog posts
No comments:
Post a Comment