contrapunctus, by Christopher League
 

Submit scripts and version control

Next semester, I’ll be teaching an intermediate programming course on OOP and design patterns in C++. Additionally, I may do a series of projects based on GeekOS in my operating systems course. (I taught CMSC 412 at UMCP before the advent of GeekOS, but I see in it some influences from the more ad hoc projects we did back then.)

Recently I have been thinking about online tutorial, submission, and assessment systems. Since both of next semester’s courses will involve exchanging a good bit of code, I hit on the idea of using Subversion both to distribute project code to my students and to collect their submissions for assessment. This has been done before; I found a SIGCSE paper on using CVS for this purpose [Reid & Wilson, 2005].

In the old days (the early 1990s), most CS students did their major programming assignments on a semi-centralized (UNIX) system, and most departments maintained some setuid script for managing submissions. At UMCP, my friend Gabe automated the assessment of assignments to amazing levels, with the help of Perl and shell scripts.

One area that I think is still under-explored is using some kind of automated tutor to help students in CS0 or CS1 comprehend and practice the very fundamentals of programming: conditionals, loops, arrays, etc. JERIC (the Journal on Educational Resources in Computing) recently ran a special issue on automated assessment, but many of the articles aimed to save time and give individualized feedback in classes of 400 students. That seems a little dated now, with CS enrollments down as they are, but I guess it may still occur at a few large schools.

I’m interested in automated assessment less for the time savings or scalability than as a mechanism that can encourage students to practice on their own time, outside of class, and in addition to assigned work. The system should be able to generate a variety of unique problems to solve, offer hints and help, and assess the student’s progress.

Anyway, I did figure out today how to set up Subversion as a submission tool. It requires the (slightly) more sophisticated access control that you get by running it under Apache 2 with the authz module. I set up the top level of the repository with a public/ folder and a folder for each student: alice/, bob/, carol/, etc. The instructor and TAs should be able to read and write anywhere, but students can read public/, and read and write only their own folders. Here’s my authz file that seems to do the right thing:

[cs150s07:/]
league = rw
* = 

[cs150s07:/public]
league = rw
* = r

[cs150s07:/alice]
alice = rw

[cs150s07:/bob]
bob = rw
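Writing one section per student gets tedious for a full roster. Here is a minimal Python sketch that generates the same layout; the function name make_authz and the roster are hypothetical, and it assumes that staff (instructor and TAs) should inherit read/write access from the root section, just as league does above.

```python
def make_authz(repo: str, staff: list[str], students: list[str]) -> str:
    """Build an authz file with the layout above: staff get rw everywhere,
    everyone can read public/, and each student gets rw on their own folder."""
    staff_rules = [f"{name} = rw" for name in staff]
    sections = [
        # Root: staff read/write; an empty '*' rule denies everyone else.
        [f"[{repo}:/]", *staff_rules, "* ="],
        # public/: staff read/write, everyone else read-only.
        [f"[{repo}:/public]", *staff_rules, "* = r"],
    ]
    for student in students:
        # Staff are not named here, so their access comes from the root
        # section; other students fall back to the root's '* =' (no access).
        sections.append([f"[{repo}:/{student}]", f"{student} = rw"])
    return "\n\n".join("\n".join(lines) for lines in sections) + "\n"


if __name__ == "__main__":
    # Hypothetical roster for illustration.
    print(make_authz("cs150s07", ["league"], ["alice", "bob", "carol"]))
```

This leans on Subversion's rule that when no entry in the most specific path section matches a user, the parent path's section decides.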

Files provided for each assignment are then committed to public/a1/, public/a2/, etc., and copied into the student folders with svn copy.
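The copying step can be scripted. This is a sketch, not a tested tool: the repository URL and the function name are hypothetical. It builds one URL-to-URL svn copy per student, which runs on the server without a working copy and produces one commit per student.

```python
import shlex

# Hypothetical repository URL; substitute the course's real one.
REPO = "https://svn.example.edu/cs150s07"


def distribution_commands(assignment: str, students: list[str]) -> list[list[str]]:
    """Return an 'svn copy' command per student copying public/<assignment>
    into <student>/<assignment>. Subversion copies are cheap (no file data
    is duplicated), so giving every student a copy costs little space."""
    cmds = []
    for student in students:
        src = f"{REPO}/public/{assignment}"
        dst = f"{REPO}/{student}/{assignment}"
        msg = f"Distribute {assignment} to {student}"
        cmds.append(["svn", "copy", src, dst, "-m", msg])
    return cmds


if __name__ == "__main__":
    for cmd in distribution_commands("a1", ["alice", "bob"]):
        # Print instead of running; pass each list to subprocess.run to execute.
        print(shlex.join(cmd))
```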

Two tips from the CVS paper that I think are good ideas: first, when students have problems and seek assistance, insist that they commit what they have to the repository, so you can update and help them out without the awkward emailing of files back and forth. When helping a student through a problem face-to-face, check out a fresh copy, show him how to fix the problem, and then wipe the fresh copy so he still has to make the fix on his own.

Second, if we can encourage students to commit often, we may get a better glimpse of their working habits (such as when they start on assignments) and confront them about problems early on. This buys back a little of the surveillance power we had when everyone did their work on the same machine: you know there’s a problem when Johnny hasn’t even logged in and the assignment is due in 5 hours.
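One way to get at those start times is the XML form of the log (svn log --xml -v, run against the repository URL). A rough sketch, assuming the folder layout above; the function name is hypothetical, and commits by anyone other than the folder's owner (such as the instructor's svn copy) are ignored.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def first_commits(log_xml: str, assignment: str) -> dict[str, datetime]:
    """Map each student to the time of their earliest commit under
    /<student>/<assignment> in the output of 'svn log --xml -v'."""
    first: dict[str, datetime] = {}
    for entry in ET.fromstring(log_xml).iter("logentry"):
        author = entry.findtext("author")
        date = entry.findtext("date")
        if author is None or date is None:
            continue
        folder = f"/{author}/{assignment}"
        paths = [p.text or "" for p in entry.iter("path")]
        # Only count commits by the student that touch their own folder.
        if not any(p == folder or p.startswith(folder + "/") for p in paths):
            continue
        # svn writes UTC timestamps such as 2007-02-01T14:03:22.123456Z
        when = datetime.strptime(date, "%Y-%m-%dT%H:%M:%S.%fZ")
        when = when.replace(tzinfo=timezone.utc)
        if author not in first or when < first[author]:
            first[author] = when
    return first
```

Comparing these times against the due date shows who started early and who has not started at all (students absent from the result).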

As this semester is winding down, I feel the need to debrief myself a bit about how it went. But I’m going to try to hold off on that (at least publicly) until all the grades are in!

Subversion, the honeymoon is over

Really, I wanted to love you. You seem cleanly designed. I adore your model of a persistent file system, where even branches and tags are just sub-directories. Your commands mostly make sense. I appreciate that many of them work without repository access, so I don’t have to wait long to get a status or a diff.

But now you’re screwing me over. All I wanted to do was take the WordPress 2.0.4 upgrade for a spin in your vendor branch. I know it has been a few weeks since I last spoke to you. But now all you can tell me on my PowerBook is “Bad database version: compiled with 4.4.16, running against 4.3.29.” And on Debian, you say only “svn: bdb: Program version 4.2 doesn’t match environment version.” What did I do to deserve this?

You’re jealous of darcs, aren’t you? How petty. Anyway, it’s your affair with Berkeley DB that got us into this mess. I know, you’re seeing ‘FSFS’ now, whatever that is. You say things are better this way, but where does that leave me?

I suppose I should accept some blame too. Some of my repositories are private (read and written only by me), and I wanted them available for commits when I’m offline. So I put them in my home directory and synchronized them with Unison to machines with other architectures and operating systems. I know, I know. To say this is ‘not recommended’ is an understatement. But somehow it seemed to work okay for a while.

I wish I could quit you.

But I need access to my files first.

Trial by fire for subversion

WordPress 2.0.3 was released today, so I had a chance to try out the vendor drop technique in Subversion… and it worked well!

The idea is to maintain a branch in the repository that mirrors the releases of the vendor. Mine now has this directory structure:

  vendor/wordpress/latest/
  vendor/wordpress/2.0.2/
  vendor/wordpress/2.0.3/

The numbered directories are tags (snapshots) of latest/ taken at different points in time, one per WordPress release. The copy with my revisions, the code that powers this site, is in trunk/wp/.
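The fiddly part of updating latest/ is noticing which files the vendor added or removed, since those need svn add or svn delete before the commit (Subversion's contrib script svn_load_dirs.pl automates this). A minimal sketch of the comparison, assuming each release has been unpacked to a directory; the function names are hypothetical, and only files are tracked, not empty directories.

```python
import hashlib
import os


def snapshot(root: str) -> dict[str, str]:
    """Map each file under root ('/'-separated relative path) to a SHA-1 of its bytes."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".svn"]  # skip svn admin dirs
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                result[rel] = hashlib.sha1(f.read()).hexdigest()
    return result


def classify(old: dict[str, str], new: dict[str, str]):
    """Return (added, deleted, modified) paths between two snapshots.
    Added paths need 'svn add', deleted ones 'svn delete'; modified
    files are simply overwritten in the working copy of latest/."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, deleted, modified
```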

Once the vendor branch was up to date, I just merged the changes made between 2.0.2 and 2.0.3 into the trunk. The changes touched many files, but since this was a point release, most of those changes were minor. In fact, only one line of code conflicted, and that was just a simple syntax issue. After fixing the conflict and briefly testing it, I uploaded the changes to the server, and now I’m running WordPress 2.0.3.
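For reference, the tag-and-merge steps can be written down as commands. This sketch assumes the new release is already loaded into vendor/wordpress/latest/ and committed, that the working copy of the trunk is at trunk/wp, and that the repository URL is a hypothetical local file:// path. It uses the two-URL form of svn merge, which applies the difference between the two vendor tags to the working copy.

```python
# Hypothetical repository URL; the layout matches the directories above.
REPO = "file:///home/me/svn/site"


def vendor_drop_commands(old: str, new: str, wc: str = "trunk/wp") -> list[list[str]]:
    """Commands to tag a freshly loaded vendor release and merge it into my copy."""
    vendor = f"{REPO}/vendor/wordpress"
    return [
        # 1. Tag the new release: a cheap server-side copy of latest/.
        ["svn", "copy", f"{vendor}/latest", f"{vendor}/{new}",
         "-m", f"Tag WordPress {new}"],
        # 2. Apply the vendor's changes between the two tags to my working copy.
        ["svn", "merge", f"{vendor}/{old}", f"{vendor}/{new}", wc],
        # 3. After resolving conflicts and testing, commit the merged trunk.
        ["svn", "commit", wc, "-m", f"Merge WordPress {old} to {new} into trunk"],
    ]
```

Keeping the commands as argument lists (rather than shell strings) makes them easy to print for review or to hand to subprocess.run one at a time, stopping after the merge to look at conflicts.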

Of course, a more substantial release (2.1 or 3.0, for example) would cause more problems. Probably some of my hacks will have to be rewritten. But still, the version control is an indispensable safety net. I won’t have to remember everywhere that I made revisions.

I had been programming for more than 10 years before I learned about version control. (But remember, I started programming when I was 10 years old!) So when I first encountered it, I thought, “where have you been all my life?” And that was just SCCS on Ultrix — pretty primitive compared to what we have now.

Subversion is pretty usable, but my first choice on new projects lately is Darcs. I like its distributed/disconnected nature, and its clean design. But there is some question as to how well patch-oriented (rather than snapshot-oriented) tools handle vendor drops. I’ll have to experiment more.

Subversion vs. distributed version control

Yesterday I started playing with GNU arch, version 1.x. (I know there’s a version 2.x now, but it seems like a very different interface, and the documentation is even more sparse.)

This is my first experience with a version control system that is not based on a centralized repository. And I must admit I find it a bit strange conceptually. Through the years I’ve tried SCCS, RCS, CVS, and now my default is Subversion. Mostly I’m quite happy with Subversion, particularly for my own work or when there are just one or two other collaborators who need write access. It is indeed a vast improvement over CVS.

I’m using Subversion in a project course with about 10 M.S. students. The project platform is LAMP (Linux, Apache, MySQL, PHP). They log in to the Linux server to do most of their programming, and use the svn command-line utility. Since everyone is on the same machine, I set up the repository using a file:// URL, made everyone part of the same group, and gave that group write access to the repository.

Wow, this was a mistake. Every three days or so, Subversion reports that the repository is corrupted, and I have to go in, run the recovery process (svnadmin recover), fix up the permissions again, etc. I’m not sure what it will take to make this work… perhaps just making sure that everyone’s umask is always set exactly right. But sometimes other things seem to go wrong too. We also use websvn for browsing the repository over HTTP, and sometimes it seems to leave behind weird files owned by www-data (the user that runs the web server).
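The umask matters because it silently strips permission bits from every file a process creates. A small sketch of the arithmetic, plus a hypothetical checker for files that ended up in the wrong group (the symptom left behind by a process like the web server). It checks group ownership rather than write bits, because some repository files are deliberately read-only.

```python
import os
import stat


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a new file actually gets: the requested mode minus the umask."""
    return requested & ~umask & 0o777


def group_writable(mode: int) -> bool:
    return bool(mode & stat.S_IWGRP)


def foreign_group_paths(repo_root: str, gid: int) -> list[str]:
    """Paths under repo_root whose group is not the shared group gid
    (hypothetical helper; gid would come from the course group)."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_gid != gid:
                bad.append(path)
    return bad


# With the common default umask 022, a file created with mode 0666 gets 0644,
# which the group cannot write; with umask 002 it gets 0664, which it can.
assert effective_mode(0o666, 0o022) == 0o644
assert effective_mode(0o666, 0o002) == 0o664
```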

Next time I will do this with an svn:// URL and let the svn user be the only one who ever touches the repository. This means keeping a separate set of passwords, stored in the clear somewhere in the repository and managed by hand. That’s okay for a handful of users, and that’s generally how I collaborate with co-authors and individual students.
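With svnserve, those passwords live in a plain-text file named by the repository's conf/svnserve.conf. A sketch that renders both files from a user list; the function name, realm, and credentials are hypothetical, and the settings shown (no anonymous access, authenticated users may commit) are one reasonable choice rather than the only one.

```python
def svnserve_config(realm: str, users: dict[str, str]) -> tuple[str, str]:
    """Return (svnserve.conf, passwd) contents for a repository's conf/ directory.
    Passwords are stored unencrypted, so conf/ should be readable only by
    the account that runs svnserve."""
    conf = "\n".join([
        "[general]",
        "anon-access = none",    # no anonymous reads
        "auth-access = write",   # authenticated users may commit
        "password-db = passwd",  # path relative to the conf/ directory
        f"realm = {realm}",
    ]) + "\n"
    passwd = "\n".join(
        ["[users]", *(f"{user} = {pw}" for user, pw in sorted(users.items()))]
    ) + "\n"
    return conf, passwd
```

The server itself would then run as that single account, for example with svnserve -d -r pointing at the directory that holds the repositories.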

Anyway, one version control strategy I rely on for myself is keeping a vendor branch. For example, I made a few hacks on WordPress 2.0.2, which I use to power this site. For this I keep a local Subversion repository within my home directory. It has two main branches (actually just directories in Subversion’s persistent file system): the vendor branch and my development branch. When WordPress 2.0.3 is released, I load it into the vendor branch, look at the changes they made since the last release, and merge those changes into my own hacked version. So I spent a good part of my day yesterday figuring out how to do something like this with GNU arch.

Some concepts from arch are appealing: (1) creating clean deltas (change sets) for adding particular features or fixing particular bugs, (2) cherry-picking which deltas to apply to a particular tree, (3) mirroring and branching from projects where you don’t have write access, (4) publishing a repository (archive) without needing any special software on the server, (5) etc.
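To make points (1) and (2) concrete, here is a toy model of cherry-picking, not arch's actual changeset format: a tree maps paths to contents, and each named changeset records what it expects a file to contain before and after. Applying a changeset fails if the tree no longer matches its "before" state, which is roughly where a real tool would report a conflict. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Tree = Dict[str, str]  # path -> file contents


@dataclass
class Changeset:
    """A clean delta: for each path, (contents before, contents after); None = absent."""
    name: str
    edits: Dict[str, Tuple[Optional[str], Optional[str]]]


class Conflict(Exception):
    pass


def apply(tree: Tree, cs: Changeset) -> Tree:
    """Apply one changeset to a copy of tree, refusing if a file has diverged."""
    out = dict(tree)
    for path, (before, after) in cs.edits.items():
        if out.get(path) != before:
            raise Conflict(f"{cs.name}: {path} does not match")
        if after is None:
            out.pop(path, None)
        else:
            out[path] = after
    return out


def cherry_pick(tree: Tree, changesets: List[Changeset], wanted: Set[str]) -> Tree:
    """Apply only the changesets named in wanted, in their original order."""
    for cs in changesets:
        if cs.name in wanted:
            tree = apply(tree, cs)
    return tree
```

A snapshot-oriented tool like Subversion records whole-tree states and computes differences between them on demand; in this model the deltas themselves are the primary objects, which is what makes picking and choosing among them natural.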


But I’m not sure I have the hang of it yet. Version control is, without a doubt, a very complex problem. If we think we can solve it with simple tools, we’re probably kidding ourselves. So why do I find Subversion much simpler than Arch? Am I just more accustomed to its perspective on the problem? Or, as Tom Lord might say, is it because Subversion doesn’t actually solve the problem at all?