Thursday 4 September 2008 @10:13
I was bitten by this bug in version 1.9.2 of the Moodle course management system. It prevented restoring a backup made by any newer version of Moodle. Of course, such a restore could fail if it relies on features particular to the newer version. So there’s a conditional that will display a message to that effect, but within the conditional was a typo:
//We compare Moodle's versions
if ($CFG->version < $info->backup_moodle_version && $status) {
- $message = new message();
+ $message = new object();
$message->serverversion = $CFG->version;
$message->serverrelease = $CFG->release;
$message->backupversion = $info->backup_moodle_version;
$message->backuprelease = $info->backup_moodle_release;
print_simple_box(get_string('noticenewerbackup','',$message),
"center", '', "20", "noticebox");
}
In other words, the code meant to warn the user about possibly erratic behavior caused a definite failure instead.
It’s dreadfully familiar, isn’t it? Just a throwaway conditional that you assume will work, so it never gets tested sufficiently. Just a fault that slipped into a point release and was fixed in the next one. Not a big deal, but there’s a lesson here.
My position on the static/dynamic typing divide is well established, even though I’m not as entrenched as some. I will somewhat happily use Python or PHP for web applications. I advocate for Scheme, and I think Ruby is neat. But I’m still disappointed whenever I produce or find faults that by all rights should have been caught by a compiler. It just demonstrates the increased importance of automated testing with coverage analysis when using dynamic languages.
Wednesday 12 March 2008 @10:04
Bootstrapping a compiler can be a finicky process, because many compilers are written in the language that they compile. It’s easy to paint yourself into a corner if you’re not extremely vigilant about binaries and configuration management.
I wanted to try SMLserver, a system for writing database-backed web services using Standard ML. It is tightly integrated with the MLkit compiler.
Unfortunately, the distributed binaries would not run on my installation of Debian stable (codename ‘etch’) because they expect a newer version of ‘libc’, the main system library. Upgrading that library could be pretty disruptive, which defeats the purpose of running a ‘stable’ distribution.
Unlike many compiler code bases, MLkit can be built by compilers other than itself, namely SML/NJ and MLton. Ordinarily, this would simplify the process, except that the SML/NJ version it requires is ancient (and doesn’t itself compile out of the box anymore on this system) and MLton has extreme memory demands.
Fortunately, an older binary of MLkit (4.3.0) runs on this system, but at best that’s a starting point. Once a particular revision can bootstrap itself, it’s natural for the compiler’s own source code to start relying on things the previous revision can’t handle. In this case, though, the gap was small: version 4.3.0 lacks some pieces of the ‘Posix.FileSys’ module that 4.3.2 needs, and they were easy enough to rewrite:
Index: src/Tools/MlbMake/MlbFileSys.sml
===================================================================
--- src/Tools/MlbMake/MlbFileSys.sml (revision 2311)
+++ src/Tools/MlbMake/MlbFileSys.sml (working copy)
@@ -49,9 +49,9 @@
| EQUAL => SysWord.compare (b,d)
fun unique link f =
- let val s = if link then Posix.FileSys.lstat f else Posix.FileSys.stat f
- in (Posix.FileSys.inoToWord(Posix.FileSys.ST.ino s),
- Posix.FileSys.devToWord(Posix.FileSys.ST.dev s))
- end
+ let val {dev,ino} = OS.FileSys.fileId f
+ in (Word.fromInt ino,
+ Word.fromInt dev)
+ end
end
Index: src/Manager/Manager.sml
===================================================================
--- src/Manager/Manager.sml (revision 2311)
+++ src/Manager/Manager.sml (working copy)
@@ -807,7 +807,6 @@
val fu = (Posix.IO.close (Posix.FileSys.creat (lockfile ^ unique,Posix.FileSys.S.iwusr)) ; true) handle OS.SysErr _ => false
val f = if fu
then (Posix.FileSys.link{old=lockfile ^ unique, new=lockfile}; true)
- handle OS.SysErr _ => Posix.FileSys.ST.nlink (Posix.FileSys.stat (lockfile ^ unique)) = 2
else false
in if fu then (Posix.FileSys.unlink (lockfile ^ unique); f) handle _ => f
else false
Patched this way, 4.3.2 can be built by 4.3.0, and the resulting binary can then bootstrap the unmodified 4.3.2 source. Really, I lucked out here. Imagine having to do this across a major compiler release cycle!
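For readers who don’t read SML, here is roughly what the two patched functions do, transcribed into Python as a sketch. This is an illustration, not MLkit’s code, and the `unique_suffix` parameter name is my own. `unique` identifies a file by its inode and device numbers. The lock routine uses the lock-by-hard-link pattern, and its `st_nlink == 2` fallback corresponds to the line the patch removed, presumably because `Posix.FileSys.ST.nlink` was among the pieces 4.3.0 lacks.

```python
import os
from typing import Tuple


def unique(link: bool, path: str) -> Tuple[int, int]:
    """(inode, device) pair identifying a file, as in MlbFileSys.unique.

    With link=True a symbolic link itself is examined (lstat); otherwise it is
    followed (stat). The patched SML ignores `link` altogether.
    """
    s = os.lstat(path) if link else os.stat(path)
    return (s.st_ino, s.st_dev)


def try_lock(lockfile: str, unique_suffix: str) -> bool:
    """Lock by hard link, in the style of the Manager.sml code (a sketch)."""
    tmp = lockfile + unique_suffix
    try:
        # Rough equivalent of Posix.FileSys.creat with owner-write permission.
        os.close(os.open(tmp, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o200))
    except OSError:
        return False
    try:
        # link() fails if lockfile already exists, so only one process wins.
        os.link(tmp, lockfile)
        acquired = True
    except OSError:
        # The fallback the patch removed: link() can report an error even
        # though the link was made (e.g. over NFS), so check whether the
        # temporary file now has two names.
        acquired = os.stat(tmp).st_nlink == 2
    try:
        os.unlink(tmp)
    except OSError:
        pass
    return acquired
```

The second caller to `try_lock` finds `lockfile` already present, its `link()` fails, and its temporary file still has only one name, so it reports failure.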
Tuesday 20 February 2007 @8:07
I’m giving a talk on Thursday for our CS club (a student ACM chapter). Our compilers course is hardly ever offered, because it ends up being a fairly arcane topic considering the career goals of the majority of our students. There are ways to make it more relevant, of course, but I don’t want to argue either way on that today.
Instead, I decided to put together a fun little talk for the club on some of the ‘big ideas’ in the area. Here’s the abstract:
One of the more profound concepts in computer science is compiler bootstrapping: very often, the compiler for a programming language is written in that language itself. This raises an almost mystical question: what compiles the compiler? (And what compiled that compiler, and so on…) The first part of this talk is an adaptation of the famous Turing Award speech “Reflections on Trusting Trust” by Ken Thompson, co-inventor of the UNIX operating system. We explore the bootstrapping concept, and how to exploit it to devious ends. The second part is a very brief introduction to program analysis and compiler optimization, using static single assignment form.
I’ve been experimenting with some code to demonstrate the technique of teaching the compiler something once and then removing it from the source. Ken Thompson used the example of control codes (a fragment of code that essentially said case '\n': return '\n';), but I found an enlightening post on the topic that cites the same trick in a Pascal compiler:
insertSymbolConstantBinding("INTEGER_MAX", INTEGER_MAX)
The value of INTEGER_MAX was ‘taught’ to the compiler at some point in its evolution, but since the compiler compiles itself, the literal value no longer has to appear anywhere in its source. Pray you don’t lose all the binaries!
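Here is a toy model of that trick in Python, in the spirit of Thompson’s control-code example. It is my own illustration: `build`, the generation names and the dictionary “source” format are all hypothetical. A “binary” is just a function from an escape letter to a character code, and building a new generation consults the previous binary for any escape that is written symbolically.

```python
from typing import Callable, Dict, Optional, Union

# A "binary" maps an escape letter (the x in '\x') to its character code.
Binary = Callable[[str], int]
# Hypothetical "source" for the escape table: each letter maps either to a
# numeric code or to an escape expression such as "\\n" that the compiler
# building this source must already understand.
Source = Dict[str, Union[int, str]]


def build(source: Source, compiler: Optional[Binary]) -> Binary:
    """Compile the escape-table source with an existing compiler binary."""
    table: Dict[str, int] = {}
    for letter, body in source.items():
        if isinstance(body, int):
            table[letter] = body  # knowledge stated explicitly in the source
        else:
            if compiler is None:
                raise SyntaxError(f"no compiler to interpret {body!r}")
            table[letter] = compiler(body[1])  # knowledge inherited from the binary

    def binary(letter: str) -> int:
        if letter not in table:
            raise SyntaxError(f"unknown escape \\{letter}")
        return table[letter]

    return binary


# Generation 0: hand-written, knows only newline.
gen0 = build({"n": 10}, None)
# Generation 1: teach '\v' (vertical tab) once, by its ASCII code.
gen1 = build({"n": "\\n", "v": 11}, gen0)
# Generation 2: the number 11 is gone from the source...
final_source: Source = {"n": "\\n", "v": "\\v"}
gen2 = build(final_source, gen1)
print(gen2("v"))  # ...yet the binary still knows it: prints 11
```

`gen2` can rebuild `final_source` indefinitely, but if only `gen0` survived, building `final_source` would fail with an unknown-escape error: the toy version of losing all the binaries.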
I’d like to turn the talk into a screen-cast, just because I’d like to experiment with that as a pedagogical format, and (I think) I have the tools. We’re starting to see lots of ‘Web 2.0’ tool-builders publish video tutorials. I probably won’t record any audio or video during the talk itself; instead, I’ll use the talk as a trial run of the script. Watch this space!
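The SSA half of the talk rests on one simple idea: every variable is assigned exactly once. For straight-line code (no branches, so no φ-functions are needed), conversion is just renaming, which a few lines of Python can sketch. The tuple-based statement format and the `name_k` naming scheme are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

Stmt = Tuple[str, str]  # (assigned variable, right-hand-side expression)
IDENT = re.compile(r"[A-Za-z_]\w*")


def to_ssa(stmts: List[Stmt]) -> List[Stmt]:
    """Rename straight-line assignments so each name is defined exactly once."""
    version: Dict[str, int] = {}

    def current(match: "re.Match[str]") -> str:
        name = match.group(0)
        # Uses refer to the latest definition; never-assigned inputs stay as-is.
        return f"{name}_{version[name]}" if name in version else name

    out: List[Stmt] = []
    for target, expr in stmts:
        renamed = IDENT.sub(current, expr)  # rename the uses first...
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", renamed))  # ...then the definition
    return out


program = [("x", "a + 1"), ("y", "x * 2"), ("x", "x + y")]
for target, expr in to_ssa(program):
    print(f"{target} = {expr}")
```

This prints `x_1 = a + 1`, `y_1 = x_1 * 2` and `x_2 = x_1 + y_1`. Because each name now has a single definition, a fact such as “x_1 is a + 1” holds wherever x_1 appears, which is what makes many optimizations simpler on SSA form.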
Monday 24 April 2006 @16:20
Each spring, our campus has an event called “discovery day,” where faculty and students get together in one space to exhibit their latest research and scholarship, using posters and demonstrations. In some ways, a presentation at an event like this is necessarily shallow; it just isn’t possible to appreciate one another’s contributions without a lot of background in the field. Sometimes this is hard even across sub-disciplines within a single computer science department, to say nothing of exhibiting to your colleagues in chemistry, media arts, or political science. Nevertheless, I think it’s a great exercise. I’m reminded of Feynman’s sentiment (paraphrased): “If I can’t teach a freshman lecture on it, it means I don’t really understand it.”
Last year, I presented my work on MetaOCaml Server Pages, which at that point was under consideration for a journal (now accepted). This year, I wanted to exhibit some work that is more preliminary. In fact, this is the first public mention of the result. My poster is called Type-based compression of XML streams (1.4M, PDF). [If you choose to print it, be careful to specify shrink-to-fit-page, because the PDF is formatted for the full-size 40×30ʺ poster. It should still be readable shrunk to a letter-size page.]
The idea is basically to use information from the XML document type to encode the tree structure extremely compactly. Depending on the nature of the document type, many tags take no space at all in the compressed form (wherever the type leaves only one possibility), and others are represented in just a few bits. This is much like how Amme et al. encode their SafeTSA intermediate language.
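A much-simplified sketch of the idea in Python follows. It is not the encoding from the poster; the content-model format, function names and example schema are all hypothetical. A required child element costs zero bits because the document type already determines it, and a repeated choice among n element names costs ⌈log2(n + 1)⌉ bits per child, the extra code marking the end of the repetition. Text content and attributes are ignored.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified content model: each item is either a required
# child name (str) or ("*", [names]): zero or more children drawn from names.
Item = Union[str, Tuple[str, List[str]]]
Schema = Dict[str, List[Item]]
Tree = Tuple[str, List["Tree"]]  # (tag, children); text and attributes omitted


def width(choices: int) -> int:
    """Bits needed to pick one of `choices` alternatives (0 if forced)."""
    return math.ceil(math.log2(choices)) if choices > 1 else 0


def encode(tree: Tree, schema: Schema) -> str:
    """Encode the element structure as a bit string, guided by the schema."""
    tag, children = tree
    bits: List[str] = []
    i = 0
    for item in schema[tag]:
        if isinstance(item, str):
            # Required child: the decoder already knows its name, so no bits.
            if i >= len(children) or children[i][0] != item:
                raise ValueError(f"<{tag}> expects <{item}> here")
            bits.append(encode(children[i], schema))
            i += 1
        else:
            names = item[1]
            w = width(len(names) + 1)  # +1 for the end-of-repetition code
            while i < len(children) and children[i][0] in names:
                bits.append(format(names.index(children[i][0]), f"0{w}b"))
                bits.append(encode(children[i], schema))
                i += 1
            bits.append(format(len(names), f"0{w}b"))  # end of repetition
    if i != len(children):
        raise ValueError(f"unexpected children in <{tag}>")
    return "".join(bits)


schema: Schema = {
    "result": ["count", ("*", ["id"])],
    "count": [],
    "id": [],
}
doc: Tree = ("result", [("count", []), ("id", []), ("id", []), ("id", [])])
print(encode(doc, schema))  # "0001": <count> is free, each <id> costs 1 bit, plus 1 to end
```

The greedy handling of repetitions assumes an unambiguous content model; a real DTD-driven encoder would also need choices, optional elements and character data.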
In fact, my goal for this work is to be able to use XML libraries and tools for compiler intermediate representations. Other than that application area, I was never especially interested in XML. To me, it seemed that S-expressions were an improvement on XML, and they were around in the 1950s. (Lately I’m finding many advances in computing seem to be summed up by the phrase “same sh*t, different buzzword.”)
Our results are still very preliminary, and I’m probably not ready to release the code just yet. But as it is implemented now, our technique does extremely well on very ‘taggy’ XML documents, and is still somewhat reasonable (though not always the winner) on more ‘texty’ XML documents. Our best-case examples are XML results from the National Center for Biotechnology Information at NIH. I expect compiler intermediate representations to be extremely taggy too, but more on that as things develop.
One important advantage of our technique (over just gzipped XML, for example) is that I can generate SAX events directly from the compressed tree representation. This should prove much faster than decompressing to text and then re-parsing the XML.
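To illustrate why no text needs to be regenerated, here is a hypothetical decoder, much simplified and not the poster’s actual format, that walks a bit string and produces SAX-style start/end events directly. Here the events come from a Python generator rather than being delivered to callbacks. The sketch assumes an encoding in which a required child element costs no bits, and each zero-or-more repetition over n element names spends a fixed-width index of ⌈log2(n + 1)⌉ bits per child plus one terminating code equal to n. The content-model format, `BitReader` and the event tuples are my own.

```python
import math
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical content model: a required child name (str), or ("*", [names])
# meaning zero or more children chosen from names.
Item = Union[str, Tuple[str, List[str]]]
Schema = Dict[str, List[Item]]
Event = Tuple[str, str]  # ("start" | "end", tag), like SAX callbacks


class BitReader:
    """Reads fixed-width unsigned integers from a string of '0'/'1'."""

    def __init__(self, bits: str) -> None:
        self.bits, self.pos = bits, 0

    def read(self, width: int) -> int:
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) != width:
            raise ValueError("truncated input")
        self.pos += width
        return int(chunk, 2)


def events(tag: str, schema: Schema, reader: BitReader) -> Iterator[Event]:
    """Yield start/end events for `tag` without building a tree or any XML text."""
    yield ("start", tag)
    for item in schema[tag]:
        if isinstance(item, str):
            yield from events(item, schema, reader)  # required: no bits consumed
        else:
            names = item[1]
            w = math.ceil(math.log2(len(names) + 1))
            while True:
                index = reader.read(w)
                if index == len(names):  # terminator ends the repetition
                    break
                yield from events(names[index], schema, reader)
    yield ("end", tag)


schema: Schema = {"result": ["count", ("*", ["id"])], "count": [], "id": []}
for kind, tag in events("result", schema, BitReader("0001")):
    print(kind, tag)
```

Each event is produced as soon as the few bits that determine it have been read, so a consumer can start working before the whole stream is decoded.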