13 May 2009

Lutz Prechelt's Language Comparison

About a decade ago Lutz Prechelt conducted a study to compare various computer languages, including C, C++, Java and Python. In the study he asked about 70 different programmers to code their own implementation of a given set of requirements. You can read all about it here. I think the most interesting conclusion he draws is that
“Interpersonal variability, that is the capability and behavior differences between programmers using the same language, tends to account for more differences between programs than a change of the programming language.”
Over the years some similar studies were done for other languages and various individuals have taken up the challenge using the language of their choice; D, for example. So I thought I’d give it a go (see below). Note that I wrote this in the context of a challenge where keeping the number of lines of code to a minimum was more in my mind than usual. For example, code I usually work on would need to take some action if an attempted file open failed.

I’m rather pleased that my 49 line C++ effort is only 4 lines longer than the great Peter Norvig’s Lisp version.

It signifies not much; it’s just a bit of fun. If you want to try it you’ll need the data sets. My output is the same as the expected output, but in a different order, which you may verify with sort and diff.


// Anthony C. Hay - my implementation of Lutz Prechelt's requirements
// see http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprtTR.pdf
// and http://www.flownet.com/ron/papers/lisp-java/
// and http://www.norvig.com/java-lisp.html
// and http://www.digitalmars.com/d/lisp-java-d.html

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>

// word_map: the key is the sequence of digits that encodes
// all the words in the associated vector
typedef std::vector<std::string> words;
typedef std::map<std::string, words> word_map;

// return digit associated with given letter 'ch' (must be in a..z or A..Z)
char encode(char ch)
{
/*
E | J N Q | R W X | D S Y | F T | A M | C I V | B K U | L O P | G H Z
e | j n q | r w x | d s y | f t | a m | c i v | b k u | l o p | g h z
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
*/
return "57630499617851881234762239"[toupper(ch) - 'A'];
}

// return digit sequence corresponding to given 'word'
std::string encode(const std::string & word)
{
std::string encoded_word;
for (const char * p = word.c_str(); *p; ++p) {
if (*p != '"') // ignore embedded double quotes (umlauts)
encoded_word += encode(*p);
}
return encoded_word;
}

// return the word map for dict.txt
word_map read_words()
{
word_map wp;
std::ifstream is("dictionary.txt");
std::string word;
while (std::getline(is, word))
wp[encode(word)].push_back(word);
return wp;
}

// return the given number with all the non-digit characters removed
std::string filtered(const std::string & num)
{
std::string result;
for (const char * p = num.c_str(); *p; ++p) {
if (isdigit(*p))
result += *p;
}
return result;
}

// the (messy) heart: find and print the sentences
void find_sentences(const word_map & dict, const std::string & rawnum,
const std::string & num, const std::string & sentence, bool trailing_digit)
{
if (num.empty()) {
if (!sentence.empty())
std::cout << rawnum << ":" << sentence << '\n';
}
else {
bool found_word = false;
for (size_t i = 2; i <= num.size(); ++i) {
word_map::const_iterator entry = dict.find(num.substr(0, i));
if (entry != dict.end()) {
found_word = true;
for (unsigned j = 0; j < entry->second.size(); ++j)
find_sentences(dict, rawnum, num.substr(i), sentence + " " + entry->second[j], false);
}
}
if (!found_word && !trailing_digit)
find_sentences(dict, rawnum, num.substr(1), sentence + " " + num[0], true);
}
}

int main()
{
const word_map dict(read_words());

std::ifstream is("input.txt");
std::string raw_number;
while (std::getline(is, raw_number))
find_sentences(dict, raw_number, filtered(raw_number), "", false);
}

// that's 49 non-blank, non-comment, non-just-curly-brace lines

index of blog posts

No comments:

Post a Comment