Wednesday 19 March 2008 @18:10
This short video (3:24) is about an idea I have for an automated tool that would tutor and assess students on fundamental skills in computer programming. I call it the ‘codeTutor’. I find that students that need a lot of practice on rudiments are often not willing or not confident enough to create their own exercises, and I don’t have the time to keep them inundated with practice problems. So here’s my idea:
If you think this is a {useful,great,dumb,old,…} idea, leave a comment or get in touch by email. If you’d like to participate in building or testing, even better. I’d want the tool to run as some kind of {Flash,Java} applet, or maybe Java WebStart program, just to lower the barrier to running it. Since it will do some non-trivial manipulation of syntax trees (and for a variety of other good reasons), I may write it in Scala instead of just Java. I’ve been toying with Scala for a few days, even producing little Swing (GUI) apps, and I’m fairly pleased.
Wednesday 12 March 2008 @10:04
Bootstrapping a compiler can be a finicky process, because many compilers are written in the language that they compile. It’s easy to paint yourself into a corner if you’re not extremely vigilant about binaries and configuration management.
I wanted to try SMLserver, a system for writing database-backed web services using Standard ML. It is tightly integrated with the MLkit compiler.
Unfortunately, the distributed binaries would not run on my installation of Debian stable (codename ‘etch’) because they expect a newer version of ‘libc’, the main system library. Upgrading that library could be pretty disruptive, which defeats the purpose of running a ‘stable’ distribution.
Unlike many compiler code bases, MLkit can be built by compilers other than itself, namely SML/NJ and MLton. Ordinarily, this would simplify the process, except that the SML/NJ version it requires is ancient (and doesn’t itself compile out of the box anymore on this system) and MLton has extreme memory demands.
Fortunately, an older binary of MLkit (4.3.0) runs on this system, but at best that’s a starting point. Once a particular revision can bootstrap itself, it’s natural for the source language to evolve beyond what the previous revision could handle. But in this case, the changes were small. Version 4.3.0 lacks some pieces of the ‘Posix.FileSys’ module that 4.3.2 needs, but they were small enough to rewrite:
Index: src/Tools/MlbMake/MlbFileSys.sml
===================================================================
--- src/Tools/MlbMake/MlbFileSys.sml (revision 2311)
+++ src/Tools/MlbMake/MlbFileSys.sml (working copy)
@@ -49,9 +49,9 @@
| EQUAL => SysWord.compare (b,d)
fun unique link f =
- let val s = if link then Posix.FileSys.lstat f else Posix.FileSys.stat f
- in (Posix.FileSys.inoToWord(Posix.FileSys.ST.ino s),
- Posix.FileSys.devToWord(Posix.FileSys.ST.dev s))
- end
+ let val {dev,ino} = OS.FileSys.fileId f
+ in (Word.fromInt ino,
+ Word.fromInt dev)
+ end
end
Index: src/Manager/Manager.sml
===================================================================
--- src/Manager/Manager.sml (revision 2311)
+++ src/Manager/Manager.sml (working copy)
@@ -807,7 +807,6 @@
val fu = (Posix.IO.close (Posix.FileSys.creat (lockfile ^ unique,Posix.FileSys.S.iwusr)) ; true) handle OS.SysErr _ => false
val f = if fu
then (Posix.FileSys.link{old=lockfile ^ unique, new=lockfile}; true)
- handle OS.SysErr _ => Posix.FileSys.ST.nlink (Posix.FileSys.stat (lockfile ^ unique)) = 2
else false
in if fu then (Posix.FileSys.unlink (lockfile ^ unique); f) handle _ => f
else false
Version 4.3.0 can build 4.3.2 patched thusly, which can then bootstrap its unmodified self. Really, I lucked out here. Imagine having to do this across a major compiler release cycle!
Monday 4 February 2008 @16:46

Hm, why is my laptop running so hot this morning? Hm, what is that new icon in my menu bar? Why, it’s the Cisco Clean Access agent, eating up 100% CPU time! While I’m on a wired connection. Even if they don’t know how to tie into the MacOS network configuration properly, you’d think they might have heard of the sleep() system call…
Friday 21 December 2007 @14:43
A few months back, I was at a workshop on campus where some other staff member noticed my Mac and remarked (disapprovingly) on how quickly some hackers had gotten the latest OS X release running on stock (non-Apple) x86 hardware. “Why do people do that… why not just buy a Mac?” he asked. “They’re good computers.” I offered that I might like to have such skilled hackers as students (or employees). This seemed to surprise him, but I think it’s actually true.
For the moment, let’s set aside arguments supporting the morality of digital restrictions circumvention (legal or not). Just assume that the hack is ethically dubious. Assume also that it’s non-trivial. Then, I maintain that I would aim to recruit the hacker. Why? Ethics can be instilled, but raw technical talent is so rare that it’s still a net win.
Monday 9 April 2007 @22:44
For my rngzip XML compression tool, written in Java, I wanted to incorporate LZMA, a fast compressor based on Lempel-Ziv (like gzip), but with better compression ratios and the same or better speed. Apparently 7-Zip is somewhat popular in the Microsoft world, but has not made much impact on Unix yet, perhaps because the original implementation was extremely dependent on Microsoft APIs. That seems to be changing: ‘lzma’ and ‘p7zip-full’ packages are available in the new Debian etch release.
Anyway, there is a Java implementation of LZMA, but it does not use the standard FilterOutputStream approach of the java.util.zip package — closely followed for implementations of BZip and PPM-based arithmetic coders. And it’s not nearly as simple as changing around some method names.
No, the LZMA encoder wants to pull bytes from an InputStream and push the compressed representation to an OutputStream. In contrast, the FilterOutputStream approach is a ‘push’ model — the caller remains in control, pushing bytes to the encoder as it pleases. I had seen this contrast before, for example in XML parsing itself. The Simple API for XML (SAX) is a ‘push’ technique, which is why writing SAX event handlers is such a pain. I can hardly fault the author of the LZMA encoder for wanting to structure the code this way. It makes it much simpler, and if all you’re doing is compressing existing files, you don’t need a stream processor.
Of course, the programming languages geek in me recognizes this dilemma as a need for coroutines! The push/pull controversy is really about which module wants to be in control. With a single thread of control, all but one of the cooperating modules must be turned inside-out, written in a passive structure. Sometimes this isn’t too hard, and sometimes it’s monstrous.
With coroutines (or continuations, which can implement them), all of the modules can be written as though they maintain control, simply ‘yielding’ results to each other across the divide. Java doesn’t have coroutines or continuations, but it does have multi-threading. So, once I recognized what was needed, in short order I whipped up an LzmaOutputStream (and LzmaInputStream) that spawns the Encoder (Decoder) in a separate thread and passes bytes to it using a blocking queue. Classic producer/consumer concurrency, which should delight my students in Operating Systems.
I’ll make the code available here at some point, but if you’re looking and can’t find it yet, let me know! Efficiency is horrible at the moment, because I’m currently using ArrayBlockingQueue<Integer> from the Java library. This means, I presume, that each byte dropped into the queue is allocated as a new object! Might have to implement my own blocking queue with semaphores and a lighter-weight int[] representation.
Update (18 April 2007): code is now available here, and I ditched the generic ArrayBlockingQueue.
Saturday 7 April 2007 @17:27
Ugh, again I fight this battle. While hacking on my XML compression tool ‘rngzip’, I’m subclassing Java’s input/output stream hierarchy. The read and write methods use int instead of byte so that we may use -1 to represent end-of-file.
But surprisingly, Java bytes are signed. There is no way to specify unsigned numbers, and the primitive casts automatically do sign extension So when you cast a byte containing 0xFF into an integer you get 0xFFFFFFFF, which is -1. This can cause a great many bugs, some of them not apparent until you’re processing a binary stream with bytes equal to 0xFF.
The decision to use int here is questionable too, but Java has no lightweight way to specify ‘byte option’.
Tuesday 20 February 2007 @8:07
I’m giving a talk on Thursday for our CS club (a student ACM chapter). Our compilers course is hardly ever offered, because it ends up being a fairly arcane topic considering the career goals of the majority of our students. There are ways to make it more relevant of course, but I don’t want to argue either way on that today.
Instead, I decided to put together a fun little talk for the club on some of the ‘big ideas’ in the area. Here’s the abstract:
One of the more profound concepts in computer science is compiler bootstrapping: very often, the compiler for a programming language is written in that language itself. This begs an almost mystical question: what compiles the compiler? (And what compiled that compiler, and so on…) The first part of this talk is an adaptation of the famous Turing Award speech “Reflections on Trusting Trust” by Ken Thompson, co-inventor of the UNIX operating system. We explore the bootstrapping concept, and how to exploit it to devious ends. The second part is a very brief introduction to program analysis and compiler optimization, using static single assignment form.
I’ve been experimenting with some code to show that technique of teaching the compiler once, and then removing it from the source. Ken Thompson used the example of control codes, with a fragment of code that essentially said case '\n': return '\n'; but I found an enlightening post on the topic that cites this trick, from a Pascal compiler:
insertSymbolConstantBinding("INTEGER_MAX", INTEGER_MAX)
The value of INTEGER_MAX was ‘taught’ to the compiler at some point in its evolution, but since the compiler compiles itself, it is no longer needed. Pray you don’t lose all the binaries!
I’d like to turn the talk into a screen-cast, just because I’d like to experiment with that as a pedagogical format, and (I think) I have the tools. We’re starting to see lots of ‘Web 2.0’ tool-builders publish video tutorials. I doubt I’ll record any audio/video directly during my talk, but rather use that as a trial run of the script. Watch this space!
Saturday 20 January 2007 @16:11
The first week of classes seemed to go okay. I’m excited about exchanging code with students using Subversion. Not for group projects, mind you, but in place of downloading my support code and stubs, then emailing back their solutions.
Today I’ve been toying with GeekOS, trying to increase my understanding, and look for other projects to do with it. When I was taking and teaching OS at Maryland, I recall doing much lower-level projects: video and keyboard drivers, getting the task switching code to work, etc. Finally understanding the task switching mechanism was a big eureka moment, I believe. I kept trying to figure out how to jump to the next task in the queue, when really what you do is return to it. From the point of view of each task, it makes a call to the Yield() function, and then sometime later, returns from it. The whole trick is to do the context and stack switch and make the call from one task return to another.
Anyway, GeekOS seems to be designed for higher-level projects now, because the video, keyboard, and task-switching code is all in place from the start. But of course, that doesn’t prevent me from clearing it away, down to just the bootstrap code, and taking my students along on that journey.
Next week, I’m going to talk about PC video modes, and how to write characters and attributes (colors) to video memory. So today I started playing with basic VGA modes, the simplest standard one being 320×200 with 256 colors, also known as mode 13h. I imported some bitmap font code from the Linux console drivers, hacked on the screen.c driver a bit, and now I have GeekOS running in VGA:

That’s a 6×11 font, and seemed to be the best-looking choice with the limited resolution and non-square pixels of mode 13h. But it’s parameterized well enough that I can change a few definitions and employ other fonts stolen from Linux. Here’s a more typical 8×8:

Kind of deliciously chunky, and takes me back to those early days of the PC, when many graphical programs had text that looked about like this. Linux also has a 4×6 font, which is pretty much unreadable, but at least you get 80 columns:

(All of the screen-shots are shown here at 67%, so click through to see the actual size.)
Sunday 19 November 2006 @8:36
The byte type in Java is, strangely, signed. So bytes range from –127 to +128. So you cannot write byte b = 0xCA directly (because 202 > 128) but the (byte) cast does sign extension.
The problem I’m fighting with now is with input/output streams. The OutputStream class has two methods that take byte arrays, but the abstract method write that takes an int. With the InputStream, the abstract read returns an int mainly so that it can return –1 for end-of-file, so I guess that write uses an int for consistency.
But here’s where it gets really weird. Normally converting between byte and int does sign extension, but apparently that’s not what you want here… or is it? The specification of void OutputStream.write(int):
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
and of int InputStream.read():
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
Got it? In both methods, just the lower 8 bits are used, except in the case where read() returns –1 for EOF. But here’s the thing—OutputStream violates its own contract:
import java.io.*;
class StreamTest extends OutputStream {
public void write(int b) throws IOException {
System.out.println("Writing int " + b);
}
public static void main(String[] args) throws IOException {
byte[] bs = { (byte)0xCA, (byte)0xFE,
(byte)07F, (byte)032 };
new StreamTest().write(bs);
}
}
The above code sets up a byte array, and passes it to OutputStream.write(byte[]), which in turn iterates and calls my overridden write(int). But in converting byte to int, it does the normal sign-extension widening, so that it’s not just the lower 8 bits I need to pay attention to in write, it’s the lower 7 bits and the sign. (Update: oops, I overlooked that the representations are the same thanks to two’s complement!)
Arguably, write doesn’t need to be as picky about the sign as read, since read uses –1 to indicate EOF. Indeed, if I try:
import java.io.*;
class ReadTest extends InputStream {
private byte[] bs = { (byte)0xCA, (byte)040,
(byte)0xFF, (byte)032 };
private int i = 0;
public int read() throws IOException {
return bs[i++];
}
public static void main(String[] args) throws IOException {
byte[] buf = new byte[4];
new ReadTest().read(buf);
for( int i=0; i<4; i++ ) System.out.println(buf[i]);
}
}
then it outputs –54 64 0 0, because the 0xFF = –1 is interpreted as EOF. The read method needs some sign-inverting logic like: bs[i] > 0? bs[i] : bs[i]+256
This all started with some 3rd-party library code I’m trying to use, for arithmetic encoding (compression) using prediction by partial match. They defined filtering input and output streams that seem to work okay in tests, but fail miserably as soon as I wrap them in a DataOutputStream. It turned out that they were doing sign-inversions — not in the way I showed above — but in a weird place. As long as you write bytes as an array and then read them as an array, it happened to work. But if you write an array and then read one at a time, the conversions don’t work out. They also seemed to take the specification of write(int) at face value, just assuming that the parameter would be non-negative.
What a mess!
Thursday 21 September 2006 @20:41
Tonight went well. I mentioned AI previously, but another course I’m teaching this semester is CS101. This is not like the ACM CS1, but more like CS0 — a very broad introduction to computer science for non-majors and majors with limited prior experience. It’s a fun concept; we do a little algorithmics and problem-solving, followed by a brief survey of all the major topics in CS: databases, networks, languages, AI, etc. There’s a bit of programming here and there — just to make the algorithmics more concrete — but that stuff doesn’t begin in earnest until CS102.
Anyway, the sequence I’m following begins with algorithmics and pseudo-code. I’m just trying to convey the concept of what an algorithm is, and what it means to specify an unambiguous, step-by-step method. This can be incredibly difficult for someone who has never thought about things this way before. Everyone encounters tasks like searching, sorting, and path-finding in their daily lives, but most have no idea how these things actually are accomplished. And when you write down a multi-step procedure with a while loop and an auxiliary variable — just to find the largest number in a list — the uninitiated get justifiably annoyed. They protest, “Anyone can see that the largest number in the list (33 21 73 86 23 51) is 86… just look at it!”
Over the years I have encountered and developed several prop-based methods for conveying these and other complex ideas in computer science. One great source has been the Computer Science Unplugged project in New Zealand. I really believe that manipulating small props can help people understand things more concretely than chalk or markers or even computers.
Today I brought in 10 large index cards (about 5×7 inch) with one word written large on each card. I chose words with a food theme: Apple, Banana, Cilantro, Dates, Eggplant, etc. I began the lecture by placing 6 of the cards in arbitrary order on the chalk tray at the front of the room. Above them, I wrote the numbers 1 through 6. Nobody had any doubt that they could arrange the cards in alphabetical order.
So then I imposed a small restriction: you’re not allowed to slide the cards around on the tray; the only operation you can perform is swapping two cards. So I went around the room asking which two cards to swap (i.e., swap #1 with #3, or swap #4 with #5), with the aim of bringing us ‘closer’ to alphabetical order. As people called out pairs of cards, I swapped them, and eventually they were alphabetized. Distributed computation!
Next I tried to examine what led us to choose the pairs of cards that we did. In many cases, people chose pairs that would place cards directly into the positions where they belonged. For example, if Apple was #4, then they would swap #1 and #4 so that Apple would become #1. Although this is an efficient strategy, I actually tried to discourage it because it meant that they had already sorted the words in their heads and were just trying to make what was on the board look like what was in their head. The strategy is not ‘blind’ in the way that an algorithm must be.
Eventually we hit on the fact that you could arbitrarily choose any two cards that are out of order (relative to each other) and swap them. After that step, the number of out-of-order pairs necessarily decreases. So if you continue doing this — again, choosing completely arbitrarily — eventually you’ll hit a point where there are no more out-of-order pairs, and that means the cards are completely sorted. Although inefficient as an algorithm, this is pretty sophisticated reasoning!
Next I wanted to demonstrate the sense in which I mean that an algorithm must be ‘blind’. To do this, I sat in a chair facing the students, so my back was to the board. I asked a student to come up and choose any 4 cards and put them in any order on the board, under the numbers 1 through 4. I was not allowed to see the cards, and the other students would watch my helper to ensure her honesty. What ensued was reminiscent of the game “20 questions,” although I guess it was actually just 5 questions. It went like this:
Are #1 and #2 out of order? Yes.
Please swap #1 and #2. Ok.
Are #3 and #4 out of order? No.
Are #1 and #3 out of order? Yes.
Please swap #1 and #3. Ok.
Are #2 and #3 out of order? No.
Are #3 and #4 out of order? Yes.
Please swap #3 and #4 Ok.
And now, as if by magic, the words were suddenly in alphabetical order. I had caused it to happen by asking my helper extremely simple questions and giving her extremely simple instructions — without once looking at the board or knowing what words were there. This is actually fairly impressive — and I wasn’t afraid to tell the students when to be impressed!
(For those in the know, this was a 4-element merge sort.)
I went through several more exercises like this, and eventually was doing a binary search in the same ‘blind’ way. Does card #5 say ‘fennel’? No. Does the word on card #5 come before fennel in the alphabet? No. Does card #3 say ‘fennel’? And so on.
At any rate, I’m telling this story because I really felt that the technique was successful — in a way that just writing on the board is not — at conveying the concept of what it is like to design and specify algorithms like these. The completeness of your case analysis matters when you can’t see the input data. And when your assistant will respond only to the most rudimentary questions and commands, you have to enumerate every little step.
Friday 15 September 2006 @18:44
Such a gloomy, rainy day. Perfect for staying home and hacking.
One course I’m teaching this semester is artificial intelligence. It has nothing to do with my usual areas of interest, but it has always had a special place in my brain. Especially the early symbolic stuff… it’s good fun, no matter that the grand claims for it never panned out.
So I decided, in my subversive PL geek way, to use this opportunity to introduce my students to Scheme. Specifically, I’m using the PLT DrScheme environment as a companion for my AI course. I know that true Lisp is traditional, but I was really seduced by Scheme in my younger years (until ML came along) and it has been a pleasure so far to come back to it. Plus I can set up some nifty cross-platform graphical simulation thingamabobs for my students, which they’ll appreciate.
I chose the AI book by Nilsson. In my first course, we used Winston, and Russell/Norvig seems to be dominant these days. But I’m never one to choose the dominant platform just because it’s dominant (haven’t owned a copy of MS Windows since version 3.0). I still have fond memories from Winston, and was a touch disappointed to find that it hasn’t been updated in all these intervening years.
Anyway, Nilsson starts out with a “grid world” inhabited by an intelligent agent (robot). The first task is to try to make the robot follow the wall clockwise around the room. In other words, the first smidgen of intelligent, goal-directed behavior is to avoid walking into walls. Impressive, eh? Next stop: The Matrix.
So, I used DrScheme to bring the grid world to life, and allow my students (who, after all, have no Scheme exposure at all) to dig right in developing little wall-following robots, just by typing stuff like (if (or s1 s2) ’north (if … )), where s1–s8 are sensors for the 8-directions (clockwise starting from north-west) that return true if a wall is detected. Once you set up your little if-then-else robot controller, you can graphically place a robot on the grid and have it travel one step at a time around the room. The code for my simulator is here: gridworld.zip; just open grid-main.scm in DrScheme. (Select the graphical language, with MrEd + MzScheme.)
Next we’re going to jump right in to two kinds of machine learning, first by inferring a decision tree from a set of examples (Quinlan ID3 algorithm), then by evolving robot behavior by genetic programming. Both of these can be readily applied to the robot world, among other applications.
The students seemed fairly excited by Scheme, on their first exposure. “It’s so easy,” one said, although admittedly he had just been watching me; they have yet to write any code on their own. Nevertheless, I was a little surprised by the enthusiasm. Good signs.
Saturday 29 July 2006 @12:07
Really, I wanted to love you. You seem cleanly designed. I adore your model of a persistent file system, where even branches and tags are just sub-directories. Your commands mostly make sense. I appreciate that many of them work without repository access, so I don’t have to wait long to get a status or a diff.
But now you’re screwing me over. All I wanted to do was take the WordPress 2.0.4 upgrade for a spin in your vendor branch. I know it has been a few weeks since I last spoke to you. But now all you can tell me on my Powerbook is “Bad database version: compiled with 4.4.16, running against 4.3.29.” And on Debian, you say only “svn: bdb: Program version 4.2 doesn’t match environment version.” What did I do to deserve this?
You’re jealous of darcs, aren’t you? How petty. Anyway, it’s your affair with Berkeley DB that got us into this mess. I know, you’re seeing ‘FSFS’ now, whatever that is. You say things are better this way, but where does that leave me?
I suppose I should accept some blame too. Some of my repositories are private — read and written only by me — and I wanted them available for commits when I’m offline. So I put them in my home directory and synchronized with unison to other architectures and operating systems. I know, I know. To say this is ‘not recommended’ is understatement. But somehow it seemed to work okay for a while.
I wish I could quit you.
But I need access to my files first.
Wednesday 21 June 2006 @11:00
Sorry, I know I’m a few years late jumping on the wiki bandwagon. Yesterday I decided I wasn’t entirely happy with MediaWiki after all, and tried out PmWiki. I may have overlooked it before, because at some point in my life I guess I drank the RDBMS kool-aid and became skeptical of projects based on a standard file system (the database that god gave us).
Once I opened my mind to the PmWiki philosophy, I found that it does a lot of things quite well. One of the expected advantages of a database is to make searches faster and more convenient, but as the author points out, even this advantage is illusory. You usually fare better by having some dedicated tool (htDig, glimpse, google) index your files and provide search capability, rather than coding up your own based on a custom data model. This corroborates my experience with WordPress… you don’t see a search box on my site at the moment, do you? (But you can always add site:contrapunctus.net to a Google query.) The debate here also reminds me of one of the many differences between subversion and darcs; the latter uses flat files entirely for its bookkeeping.
PmWiki doesn’t insist on CamelCase for links, which is good. It has a ‘monobook’ skin that imitates the look of Wikipedia (still the least ugly wiki in my opinion). It has a decent authentication, permissions, and groups system, and it’s trivial to hack up more flexible policies (e.g., you must be on campus or logged in to view pages in this section). Actually, the lack of database is a slight security advantage too: you can embed the cryptographic hashes of passwords in the config file(s); there’s no need to store plain-text passwords that must be transmitted to the database.
The main limitation I see so far is that it doesn’t seem to support displaying arbitrary older versions of a page. It can show the page history and diffs, and you can restore older versions, so I believe all the required information is there, we just need a query interface for it. On Wikipedia, there’s a ‘permalink’ that provides you a URL for this exact version of the page that you are viewing. It’s useful for citing the page elsewhere, if you want to be sure that someone accesses the exact version you saw. It’s not clear to me that PmWiki can do that yet.
Also, there are a couple of choices for producing typeset (printable) copies of pages, but most of them look miserable or else are too complicated. I haven’t decided yet whether it’s worth my time to hack up something using pdftex. Let’s get some real documents up there first, I guess.
Oh yeah, the installation is here, if anyone wants to peek. Not much to see yet, particularly from the outside… unless the security is totally broken. 
Thursday 1 June 2006 @11:50
WordPress 2.0.3 was released today, so I had a chance to try out the vendor drop technique in Subversion… and it worked well!
The idea is to maintain a branch in the repository that mirrors the releases of the vendor. Mine now has this directory structure:
vendor/wordpress/latest/
vendor/wordpress/2.0.2/
vendor/wordpress/2.0.3/
Where the numbered directories are tags (snapshots) of the latest versions of WordPress at different points in time. The copy with my revisions — the code that powers this site — is in trunk/wp/.
Once the vendor branch was up to date, I just merged the changes made between 2.0.2 and 2.0.3 into the trunk. The changes touched many files, but since this was a point release, most of those changes were minor. In fact, only one line of code conflicted, and that was just a simple syntax issue. After fixing the conflict and briefly testing it, I uploaded the changes to the server, and now I’m running Wordpress 2.0.3.
Of course, a more substantial release (2.1 or 3.0, for example) would cause more problems. Probably some of my hacks will have to be rewritten. But still, the version control is an indispensible safety net. I won’t have to remember everywhere that I made revisions.
I had been programming for more than 10 years before I learned about version control. (But remember, I started programming when I was 10 years old!) So when I first encountered it, I thought, “where have you been all my life?” And that was just SCCS on Ultrix — pretty primitive compared to what we have now.
Subversion is pretty usable, but my first choice on new projects lately is Darcs. I like its distributed/disconnected nature, and its clean design. But there is some question as to how well patch-oriented (rather than snapshot-oriented) tools handle vendor drops. I’ll have to experiment more.
Tuesday 23 May 2006 @23:57
I’m toying with the newish JUnit version 4 for setting up test cases for my current Java project. I had tried the 3.x series, and didn’t find the experience quite as satisfying. Setting up parameterized suites with the TestSuite class, and having to rely on method naming conventions seemed… well… kinda ugly.
JUnit 4 is a radically different API, and it depends on lots of newish features in the Java language, including annotations on classes and methods. I really knew nothing about this feature until I saw it used in JUnit, and I still know very little. I’m a language nerd, but not a Java nerd, I guess. But it may be worth exploring further.
Anyway, I need to be able to pull test cases from files within a directory, so the Parameterized ‘runner’ is just the thing. You designate one method to produce all the test cases as a Collection, and then JUnit instantiates your class with arguments taken from that collection.
Unfortunately, when a test fails, the output leaves a lot to be desired. It prints the name of the class, the name of the method, but as far as which test case failed… they’re just numbered serially, like this:
There were 4 failures:
1) tryFib[1](FibTest)
java.lang.AssertionError: expected:<1> but was:<0>
2) tryFib[2](FibTest)
java.lang.AssertionError: expected:<1> but was:<0>
3) tryFib[3](FibTest)
java.lang.AssertionError: expected:<2> but was:<0>
4) tryFib[4](FibTest)
java.lang.AssertionError: expected:<3> but was:<0>
How are you supposed to know which test actually failed? As someone else pointed out, there really needs to be a way to annotate a name or description for each case.
Until then, here’s my silly work-around: just wrap your whole test method in a try/catch block, and rethrow the exception with a new message containing a description of the test case. I use the toString() method to produce that description. Complete example follows.
import java.util.LinkedList;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import static org.junit.Assert.*;
@RunWith(Parameterized.class)
public class FibTest {
@Parameterized.Parameters
public static LinkedList<Object[]> getTestCases() {
LinkedList<Object[]> cases = new LinkedList<Object[]>();
cases.add(new Object[] {0, 0});
cases.add(new Object[] {1, 1});
cases.add(new Object[] {2, 1});
cases.add(new Object[] {3, 2});
cases.add(new Object[] {4, 3});
return cases;
}
private int input, expected;
public FibTest( int i, int e ) {
input = i;
expected = e;
}
public String toString() {
return "fib(" + input + ")";
}
@Test
public void tryFib() {
try {
assertEquals( expected, fib(input) );
}
catch( Throwable th ) {
throw new Error(this + ": " + th.getMessage(), th);
}
}
private int fib( int i ) {
return 0; }
}
Here is the failure output now:
There were 4 failures:
1) tryFib[1](FibTest)
java.lang.Error: fib(1): expected:<1> but was:<0>
2) tryFib[2](FibTest)
java.lang.Error: fib(2): expected:<1> but was:<0>
3) tryFib[3](FibTest)
java.lang.Error: fib(3): expected:<2> but was:<0>
4) tryFib[4](FibTest)
java.lang.Error: fib(4): expected:<3> but was:<0>
The differences are highlighted. I guess maybe this isn’t such a convincing example, since the test description—fib(2)—is not sufficiently different from the serial number (2). In my application, test cases will be described by the input file name.