I've been struggling with a blog post that may never see the light of day. In the mean time, have you seen this?
About Me
Anthony C. Hay
23 June 2011
27 November 2010
Lisp interpreter in 90 lines of C++
Just for fun I wondered if I could write one in C++. My goals would be
1. A Lisp interpreter that would complete Peter's Lis.py test cases correctly...
2. ...in no more than 90 lines of C++.
Although I've been thinking about this for a few weeks, as I write this I have not written a line of the code. I'm pretty sure I will achieve 1, and 2 will be... a piece of cake!
In one short line of Python Mr. Norvig implements Lisp functions car, cdr and append. Another line and we've done the four basic mathematical operators on bignums. Gulp.
To give myself any sort of chance I don't intend to support bignums, garbage collection or error handling and I'm only going to implement the bare minimum to pass the test cases.
. . .
OK, I've done it. Here it is:
|
With Lis.py to guide me writing this was fairly straight-forward, with two exceptions. The first point I had to stop and think was in
eval()
when the symbol to be evaluated was "lambda
", i.e. at the point of definition of the lambda. In Peter's code he returns a Python lambda; not something available to me. Then I realised that I didn't have to actually do anything with the lambda body, just keep it around until it needs to be executed. That's why I can just change the cell type from Symbol to Lambda and return it as-is.The second pause for thought was much longer; this time it was at the point of execution of the lambda. Partly because I don't know Python (or Scheme) it wasn't clear to me what was going on with the environment. The problem was that lambdas that returned lambdas didn't work: variables that were defined at the time the lambda was returned were not defined at the time that returned lambda was executed.
It took a little while to work through the problem, but eventually I did. It helped to look at some simple examples to see where things went wrong. Here is my program executing some sample code adapted from an article that appeared in February 1988 Byte magazine:
|
Clearly,
doubler
is not only associated with the procedure (lambda (y) (* y n))
but it must also have access to an environment where the value of n
is 2
.I added some code to the start of the
eval
function to dump out the value of the paramaters x
and env
each time it is called. Here is the output while executing some of the above sample code.
|
"
[global]
" means the global environment where "*
" maps to the built-in proc_mul
, etc."
[y->4 [n->2 [global]]]
" means an innermost environment where y
maps to 4
, that environment's outer environment where n
maps to 2
and that environemt's outer environment, which is the global environment.When looking up the value of a variable the code starts with the innermost environment and if the variable is not defined there it tries the next outer environment and so on.
I don't know whether I've explained that clearly enough to make sense. The important bit I realised is that a new environment must be created when a lambda is evaluated containing (a) all the paramater->argument associations, and (b) this new environment's outer environment must be the reference to the environment that existed at the time the lambda was defined. I believe this is called a lexical closure.
Here's my program executing a another example I like, this time adapted from that Wikipedia article on lexical closures. It just underlines that the closure captures the variable, not just the value of the variable, and that the captured variable may be changed.
|
Here is another example, again adapted from the Wikipedia article on closures. This demonstrates that not only can the closure capture existing variables, it also captures variables created inside the closure (the variable
hidden
in this case). Also in this example, two procedures are created that share the same closure. (I spread some of the code over several lines to make it more readable; it has to be entered all on one line for my primitive program.)
|
Testing
Here are the 29 tests for Lis.py. The
main()
function in the code above is replaced by all this code to run the tests.
|
All 29 tests pass. Goal 1 achieved.
Goal 2: I'm not counting blank lines or comment-only lines, and I'm not counting lines that contain only an opening or closing curly brace, because I'm just not counting them.
|
Not quite 90. But south of 200 isn't bad, is it? (Just to be clear: I'm not claiming I've produced the equivalent of Lis.py in under 200 lines of C++.)
Conclusion
Compared with Lis.py my effort is incomplete, inefficient and leaks memory. Unlike some of the code I've presented in other blog posts the end result here is not useful in itself; it was the process of developing it that was useful, if only for me.
Nevertheless, I get a little thrill out of having made something that allows me to type
(define cube (lambda (n) (* n n n)))
and then (cube 3)
. It's like magic.EDIT June 2013: Since I published this post it has been viewed 37,561 times, which is a big deal for me. Occasionally someone will ask me if I'll allow the code to be used for their project and I always say yes. To make it simpler I've removed the copyright and placed it in the public domain.
17 June 2010
Why?
Actually, I'm not sure I like the name any more. I think at the time I wanted something simple and straight-forward. It isn't meant to be ironic!
(EDIT: I should also have said that the idea came from the title of an article written by Ian Fleming: How to Write a Thriller.)
I've been programming since I was 16 years old. I enjoy writing code. I write code for nothing, like I did when I was 16, as well as writing code for a living. I hope to be writing code on the day I die.
14 February 2010
Write a C function taking two strings...
My big question: "Write a C function taking two strings and returning a string containing one copy of the characters that occur in both." Hint: Spend some time looking at one of the strings and use an array of flags.
Given that hint, I guess we can assume the strings contain only 8-bit single-byte characters and the sort of thing the Google interviewers might have been looking for was something like this:
|
Yes, I wrote it in C++. (I'm the kind of guy Linus Torvalds loves to hate and Peter Seibel doesn't understand.) So how would we get the result back to the caller in C? I can think of about a dozen ways, all of which are also available to me in C++, and all of which are more tedious and error-prone both for the implementation and for the callers of the function. Of course, if it made sense to use one of those methods in the context in which this function was to operate, then I would use it.
Going back to the whiteboard: if I was standing in front of a hyper-intelligent Googler I would probably wet my pants and I suspect I would have written the function differently: I wouldn’t have commented it; I probably would have written 256 instead of UCHAR_MAX + 1; more seriously, I may have forgotten to cast away the (possible) sign of the char when used as an array index. All of these things would have occurred to me only after I'd left the building. (And, since the requirement was for a C function, I guess I would have had to fiddle around with buffers and buffer sizes and null terminators and shit.)
In the real world I would unit test this function with something like this:
|
I've been writing code for a long time - long enough to know that I make mistakes. I have found unit tests to be an excellent way to pick up some of those mistakes as well as being useful for other reasons I'll write about another time.
Of course, passing a bunch of unit tests does not mean the code doesn't have horrible bugs in it. For example, I tested this code under a compiler where
char
s are signed. If you remove the static_cast<unsigned char>()
casts the unit tests still all pass. What this means is that when *a
is '\x80'
the expression occurs_in_a[*a++] = 1
is similar to occurs_in_a[-128] = 1
and this sets a byte to 1 on the stack 128 bytes before the occurs_in_a array. But that's alright (NOT!) because if (occurs_in_a[*b])
will look for it in the same place, and find it there, which is why the tests still pass. But the stack is now corrupted, the consequences of which may be nil or catastrophic, immediate or postponed.19 July 2009
Einstein's Riddle
Perhaps 'sucky' is a bit harsh. In its favor I believe it to be correct: it cycles through all 24,883,200,000 (= 5^5 x 4^5 x 3^5 x 2^5 x 1^5) possible solutions and finds the one and only combination that satisfies all the requirements. It is also a straight-forward representation of the stated riddle, and was therefore quick to write. On the downside, it does take 10 minutes to find the solution, which does not compare well with the solutions referred to above.
|
Update November 2013:
I wanted to create a solution that was nearer to the performance of Edi Weitz’s Lisp code, referred to above. Here is what I came up with:
|
Notes:
- Let's call my original code Einstine09 and my new code Einstine13.
- Einstine13 run time is 0.012s (real 0.012s, user 0.001s, sys 0.002s) on a 2.8GHz Intel E8235 processor. (But an "int main() {}" runs in 0.010s (real), so should we subtract that from 0.012s and call it 2ms? I'm not sure. It doesn't really matter; what matters is that Einstine13 is in the same ballpark as the fastest solutions given by others, and not in the next county as Einstine09 was.) Einstine09 runs in 350.808s (real) on the same machine, compiled with the same compiler and -O3 option. Choosing a better algorithm gave us a 30,000× speed improvement (350.808 / 0.012 = 29,234).
- There are 5 different attributes (such as colour, nationality) and each attribute has 5 unique values. There are 5! permutations of 5 unique values so the total number of possible solution states is (5!)5 = 24,883,200,000. Einstine09 calls is_solution() for each and every one of these states and is sloooow. Einstine13 calls the possible() functions a total of just 18,360 times and is much faster.
- I knew the problem was one of combinatorial search, like the eight queens problem. But I couldn’t see how to prune the search tree by applying all the rules at each branch because a rule might refer to something that hadn’t yet been decided: you can’t chop off a branch where the Winfield smoker doesn’t drink beer if you don’t yet know who the Winfield smoker is. And so on. My personal aha! moment came while reading Edi Weitz’s explanation of his Lisp solution: “Note that every condition has to return T if [...] not all of its parameters are defined yet.” Yes, it’s obvious to me too, now.
- Reading Edi Weitz’s commentary gave me one of the clearest glimpses I’ve had of the secret ingredient in Lisp and I recommend it to anyone struggling to understand why so many Lisp cognoscenti rave about the language. But I have to say that without his explanation I think I would have struggled to understand his code. My C++ solution is fast, but it's specific to solving the Einstein problem. Dr. Weitz's Lisp solution is also fast, but in more-or-less the same amount of code he has created a domain-specific language for solving any Einstine-like problem. Thank you Dr. Weitz for sharing it.
13 May 2009
Lutz Prechelt's Language Comparison
“Interpersonal variability, that is the capability and behavior differences between programmers using the same language, tends to account for more differences between programs than a change of the programming language.”Over the years some similar studies were done for other languages and various individuals have taken up the challenge using the language of their choice; D, for example. So I thought I’d give it a go (see below). Note that I wrote this in the context of a challenge where keeping the number of lines of code to a minimum was more in my mind than usual. For example, code I usually work on would need to take some action if an attempted file open failed.
I’m rather pleased that my 49 line C++ effort is only 4 lines longer than the great Peter Norvig’s Lisp version.
It signifies not much; it’s just a bit of fun. If you want to try it you’ll need the data sets. My output is the same as the expected output, but in a different order, which you may verify with sort and diff.
|