contrapunctus, by Christopher League
archive of ‘research’

To appear at CCSC

In April, I will be presenting a new paper — Something for Everyone: AI Lab Assignments that Span Learning Styles and Aptitudes — at the Consortium for Computing Sciences in Colleges, held in the exotic, distant locale of Staten Island. A preprint can be found in the publications section of my site.

I will be building a repository and wiki about the assignments. With any luck, it will be available in time for the presentation. Shown here are some photos from our Connect 4 tournament during the first year I used these lab assignments.

End of summer

I was in denial about it for as long as possible, but yesterday my classes began. I can’t pretend it’s still summer anymore. Although the summer wasn’t nearly as productive as I planned back in May (I believe in aiming high; realistic goals are for weak minds ;) ), I did get two papers accepted and I made progress on a new manuscript that I’d like to submit by mid-October.

I also sharpened my tools, battled a web-spam invasion, built a new home computer, and changed my web host. Didn’t travel very far this summer, but we hit Block Island (RI) in May, and Boston and Montréal for a few days each in August. In the coming weeks and months, I plan to finish final revisions on my RNGzip paper for JCP, pound out the new manuscript with Stefan, organize my coursework ideas for AI and write something about that, and, of course, submit my tenure portfolio. Oh, and I’ll be going to Germany for ICFP.

And the livin’ is easy

Hm, this space has been quiet for a while, but for justifiable reasons: I have two journal manuscripts submitted since the summer break began.

I’m never thrilled about writing for journals, because it often means that the key problem is already solved, and I usually would prefer to work on new problems than to “dot all the i’s” on old ones. On the other hand, it’s liberating to escape the strict space constraints of a conference paper. On the third hand, constraints are sometimes cited as catalysts for creativity. I’m reminded of the proverb “I wrote you a long letter because I didn’t have time to write a short one.”

I have also been ‘sharpening the saw’, also known as… Emacs hacking! Version 22.1 was finally released, and I took it as an opportunity to run through the manual and look for all the great little features and tweaks that have become available since the last time I studied the manual so intently. For example, just one thing that I adore for Java programming is glasses-mode (o^o). On-screen, it inserts some customizable little character in between LongCamelCaseWords so that you see them as Long·Camel·Case·Words. Ha!

Now I’d like to ‘sharpen my shell’ too. Zsh has lots of great stuff that I’m not currently using. I learned shell scripting in the early 90s on straight Bourne shell and tcsh, and only recently learned I could do concise parameter-frobbing things like ${file/foo/bar} rather than `echo $file | sed ’s/foo/bar/’` or whatever. Tab-completion for sub-commands (of svn, darcs, etc.) and host names (for ssh) would be great, and I know there are some directory-hopping features (beyond pushd/popd) that would help me. But one thing I’m grappling with is that I currently use zsh both in regular xterms and inside Emacs shell-mode. In the latter case, a lot of the fancy stuff in zsh won’t work. So do I avoid running shells inside Emacs, or hack shell-mode, or get term-mode working instead? Or, maybe forget zsh and do everything with eshell? Am I prepared to run always in Emacs, even when logged in to remote machines? I’m stuck.

Meanwhile, I cleaned up /usr/local/ on most of my machines. I try to avoid installing anything that’s not managed by apt, even if I have to backport it myself (such as with emacs22 on Debian etch). But sometimes it’s inevitable: either it’s something impossibly obscure, or I need a newer version than what’s available already, or it’s something I have hacked on myself and I need my version installed. So now what I do is keep a branch in /usr/local/src/, install it to /usr/local/stow/, and everything else in /usr/local/ is a symbolic link managed by GNU Stow. This should solve the problem of discovering some problematic file or library in local that I make-installed six years ago, and can’t remember what package it’s from or why it’s there.

Month of conferences

When it rains, it pours: I’m attending three conferences within the span of just about five weeks! Early in March, I presented at a conference on Combinatorics, Graph Theory, and Computing in Boca Raton, Florida. The talk was about a certain kind of subgraph for extracting test sequences from a formal protocol specification. It was mostly the work of a colleague at LIU — I joined the project fairly late — but he wasn’t available to present it, so I went. We intend to submit the work to a math journal sometime in May.

Model checking, type theory, and testing all have similar aims — improving the correctness and reliability of software and hardware systems — and yet the techniques seem to come from different communities, with different vocabularies and outlooks. I know of a few papers that have tried to build bridges between these areas, but I’ve worked in all three and I’m still not sure how they relate!

The second conference was the IEEE Data Compression Conference, this past week in Snowbird, Utah. I presented our work on type-based compression of XML. This was strange, being in a completely different community, and yet I found another person — a functional programming guy — who was in the same situation! The conference went reasonably well, and I did learn a lot. I learned that I have no understanding of (or affinity for) image compression, with its wavelet transforms and peak signal-to-noise ratios. Text compression just makes much more sense to me.

I saw very little of it, but Utah was indeed beautiful. The constant backdrop of mountains around Salt Lake City was breathtaking. I imagine if you live there it’s easy to lose sight of them. Just like how we never noticed how often the twin towers appeared in the background of movies filmed in pre-9/11 NYC — until you watch them post-9/11 and they suddenly stand out.

A ski resort is probably not my ideal conference location. I don’t really ski; I’ve been against it since grade school, when I observed the regularity with which my skiing classmates returned from trips with all manner of broken limbs and sprained joints. No, I tend to like urban settings, preferably foreign ones, where there are lots of other things to see and do, and not everyone stays at the same conference mega-hotel. Other recent conferences I’ve attended were in great cities such as Glasgow, Warsaw, Nice, London, Vancouver, and Paris.

Finally, tomorrow is the start of the Trends in Functional Programming conference here in New York City. I’m not presenting at this one, but since it’s in town it seemed worthwhile to go. I’ll have to miss most of Tuesday’s events because of my teaching schedule, but half of the day is reserved for a city tour, so that’s not a big loss for a NYC resident! :)

Epistemology and Wikipedia

I wrote once before about staunch criticisms of Wikipedia I heard in a faculty development seminar. Since then I’ve been thinking about different kinds of knowledge, the nature of truth, citation, and verification.

In my professional life, I have used Wikipedia for background on some mathematical concept or details on some algorithm I’m teaching. In these areas, knowledge has a sort of self-consistency, and appeals to authority are worth very little. Suppose a Wikipedia article says, “The following algorithm searches for a number in an ordered list in logarithmic time,” and provides the algorithm. The claim is something that an informed reader can verify for herself, independent of any works cited. The algorithm either works or it doesn’t.

Now, suppose the article also claims that “This algorithm was invented by Charles Babbage in 1856.” It seems to me that this is an entirely different kind of knowledge. There’s no self-consistency, no way to verify this on its own terms. A citation — an appeal to authority — is critical. And probably the cited authoritative work will cite other works, and so on, like a house of cards.

This is a curious situation. We’re not marshaling evidence to support an opinion, but a claim of fact. It’s either true or false. And yet there’s no way to convince you either way except by appealing to a vast network of authorities.

To someone versed in mathematical logic and proof, this is deeply unsatisfying. And yet for most people, in most fields, I guess it’s just the nature of knowledge. If a vast network of authorities is the only handle you have on the truth, then I could understand getting a little anxious at the thought of Wikipedia entering that space.

I have distinguished mathematical truth (e.g., “the real numbers are uncountably infinite”) from historical truth (e.g., “Cantor proved the real numbers are uncountably infinite in 1891”). Perhaps the ‘hard’ sciences offer something in between. Part of the purpose of claims made in scientific papers is that the evidence can — albeit at great expense — be replicated by readers. But still there’s no self-consistency in, for example, biology. Maybe because it deals in knowledge about the real world, you can’t just consider a claim on its own and verify its truth.

Mathematics is not about the real world. If anything, mathematics is about itself. Mathematical truth is meaningful and accessible, but no amount of observing, measuring, or citing esteemed scholars will get you there.

Two types of types

Follow-up about BitTorrent: at some point, I read the FAQ and found that for BT to work correctly, one needs to forward ports to overcome the Network Address Translation done by the router. (Would have been easy except that I forgot my router config password and had to reset it.) But after doing that, the number of peers increased significantly, and with it both upload and download rates. That’s more like it!

In my email the other day, I received a message with an image claiming that:

Leitura is an elegant, versatile Type System

Wow, sounds great, I thought. Where was this when I was doing my thesis research? So is it based on the lambda calculus? I read on:

that comes in 4 varieties: A crisp Roman, a stylish Sans, a News version, etc.

Ah, that kind of ‘type system’. Come to think of it, the message was from myfonts.com…

Happy 2007

The holidays were good this year. The fridge is packed with food from last night’s fest. My head is still a little cloudy. I have a stack of new books to read, a new digital SLR camera — maybe I can start posting new photos here more regularly — and several interesting projects to work on before classes begin again in two weeks.

One is the final draft of our paper on type-based compression of XML. It was accepted, so I’ll be going to Utah in March to present. One review in particular was extremely helpful, and I’m looking forward to studying what he/she wrote more closely, and chasing some more references.

I’ll be submitting my tenure portfolio in the Fall of this year. Hard to believe I’m in year 5 already. This past semester was busy as hell — good in some ways, and frustrating in others. There are a few unfinished administrative things — “open loops” in David’s jargon — that are nagging at me. Hopefully I can knock them out of the way and they don’t interfere with the more interesting projects.

And let’s hope that “Web 2.0” (among other things) inspires more students to come back to computing in ’07.

Paper compression

Our paper is just about ready for Wednesday’s conference deadline. This might be the earliest I’ve ever ‘finished’ a paper — is it ever really finished? — nearly 36 hours before the deadline. :)

Of course, 5 hours ago it was a page and a half too long, so I started applying my time-tested ‘paper compression’ techniques:

  • reduce the vertical space taken up by figures containing code and examples, even if that means abandoning customary line-breaking and indentation,
  • look for paragraphs with just one or two words on the last line, and rewrite them to make them a line shorter,
  • hunt down bibliographic entries that go on too long, and shorten Transactions to Trans., November to Nov., etc.

It’s amazing how much mileage you can get out of these silly tricks. Now it’s at the point where if certain key paragraphs need one more word inserted, the whole thing spills onto page 11.

This is a new field for me. I’ve read a fair bit, but still it’s strange and scary to be submitting somewhere that I don’t recognize any of the names on the program committee. (No offense folks, if any referees look me up and read this!) I had an idea that was vaguely related to compiler implementation, but actually could be generalized to a different field, and that seemed the best place for it. Plus this work was far more accessible to my M.S. students — one of them is my co-author — than any of my work on type theory. There’s something to be said for that. I don’t think I’ll post any version of the paper here until I hear whether it is accepted, but for the curious I put a poster online some time ago — with very preliminary data — that describes much of the result.

MetaOCaml proof

No, not proof of a theorem… a pre-print from Elsevier, the publisher of my MetaOCaml paper. It arrived by email this morning. So far, practically all of my publications are with ACM or Springer. This is the first that will appear in ScienceDirect. I almost feel like a real scientist. :P

metaserv-proof.png

Without digressing too much on the role of scientific publishers in the Internet age, one thing I do enjoy is getting back a proof that looks like a proper article. Whether it’s due to the banner, or little widgets they add in the header and footer, or just a typeface other than Computer Modern or Times, at least it looks like it was touched by a publishing house since I submitted it.

I’m reminded of visiting technical and academic bookstores when I was still an undergrad, or the first years of grad school. I would pick up a slim $90, 180-page treatise by some professor, and be disappointed when it looked exactly like what I could have printed out myself if only I had access to the .tex file: the Computer Modern typeface and LaTeX book class, with all the default settings. Not that these are necessarily ugly, they’re just not special. For $90 a copy, is it too much to ask that the publisher hire a designer?

The strange thing with this paper is that I used the LaTeX document class provided by Elsevier for my manuscript, and it looked like garbage. It was full of widows and orphans and huge irregular spaces between paragraphs. One of the reviewers even commented, “this can’t possibly be done by TeX.” I was indeed embarrassed by the typography, and I could have fixed it, but I chose not to stray from the publisher’s settings.

Maybe their public document class is intentionally crummy. That way, when you get back the real proofs, you’ll be pleasantly surprised. It will look like they actually did something in exchange for you relinquishing your copyright…

CASSIS paper published

Just now, a lovely message landed in my inbox from Heidelberg. It was a brief notice that our CASSIS paper now appears on the Springer web site, and is available for ordering. So I finally have page numbers and a publication date!

We are very pleased to be the first to congratulate you on the electronic publication of your article ‘Typed Compilation Against Non-manifest Base Classes’ published in Lecture Notes in Computer Science. If your institution has access to this journal, you may view your paper at: dx.doi.org.

This was initially a presentation at a workshop back in March 2005, but selected speakers were invited to submit a paper for the proceedings, which was then peer-reviewed and published in LNCS volume 3956. My thanks go out to the editors, reviewers, and to my co-author Stefan. Cheers!

Discovery day poster: XML compression

Each spring, our campus has an event called “discovery day,” where faculty and students get together in one space to exhibit their latest research and scholarship, using posters and demonstrations. In some ways, a presentation at an event like this is necessarily shallow; it just isn’t possible to appreciate one another’s contributions without a lot of background in the field. Sometimes this is even hard with different sub-disciplines within a computer science department, to say nothing of exhibiting to your colleagues in chemistry, media arts, or political science. Nevertheless, I think it’s a great exercise. I’m reminded of Feynman’s sentiment (paraphrased): “If I can’t teach a freshman lecture on it, it means I don’t really understand it.”

Last year, I presented my work on MetaOCaml Server Pages, which at that point was under consideration for a journal (now accepted). This year, I wanted to exhibit some work that is more preliminary. In fact, this is the first public mention of the result. My poster is called Type-based compression of XML streams (1.4M, PDF). [If you choose to print it, be careful to specify shrink-to-fit-page, because the PDF is formatted for the full size 40×30ʺ poster. It should still be readable shrunk to a letter-size page.]

The idea is basically to use information from the XML document type to encode the tree structure extremely compactly. Depending on the nature of the document type, many tags will take no space at all in the compressed form, and others are represented in just a few bits. This is essentially similar to how Amme, et al. encode their SafeTSA intermediate language.

In fact, my goal for this work is to be able to use XML libraries and tools for compiler intermediate representations. Other than that application area, I was never especially interested in XML. To me, it seemed that S-expressions were an improvement on XML, and they were around in the 1950s. (Lately I’m finding many advances in computing seem to be summed up by the phrase “same sh*t, different buzzword.”)

Our results are still very preliminary, and I’m probably not ready to release the code just yet. But as it is implemented now, our technique does extremely well on very ‘taggy’ XML documents, and is still somewhat reasonable (though not always the winner) on more ‘texty’ XML documents. Our best-case examples are XML results from the National Center for Biotechnology Information at NIH. I expect compiler intermediate representations to be extremely taggy too, but more on that as things develop.

One important advantage of our technique (over just gzipped XML, for example) is that I can generate SAX events directly from the compressed tree representation. This should prove to be much faster than uncompressing to text, and then re-parsing the XML.

Research pages ported

My research pages have now been ported to the new site design. I’m proudest of the publication list, which now contains nifty JavaScript doodads to reorganize the list without waiting for a round-trip to my server. Try the ‘Research’ tab at the top of the page.

This also serves as the first post in a few significant WordPress categories (aka topics, aka tags). I will post to the ‘publications’ topic whenever I add a new publication to the list, and I will post to ‘talks’ to announce upcoming talks. Finally, the ‘research’ topic will be for any ideas, commentary, or other tidbits that are relevant to my research interests.