contrapunctus, by Christopher League
 

A bad Linux day

Yesterday was a bad GNU/Linux day. I was getting angry, and I rarely get angry. Every once in a while, I type “apt-get upgrade” and everything goes to hell. Like the print server’s ability to print from PDF. Right when I’m trying to print last-minute boarding passes, confirmation pages for hotels and rental cars, directions from the airport, etc.

I of course found workarounds, then in some cases had to work around the workarounds. But in all it took about 3 hours to figure out what was going on and get 10 pages printed. Something was broken in how CUPS talks to ‘pdftops’, causing it to hang. There were recent security updates to CUPS, but when I downgraded it, the problem persisted. It’s still not fixed, but I found and subscribed to a probably-relevant entry on bugs.debian.org.

I guess I use Debian ‘stable’ so that this sort of thing happens very rarely. The frustration had me questioning my commitment to GNU/Linux and other free software. But of course it’s not that simple: what I treasure is the hackability, and so some instability is inevitable. Even on a Mac, I’m likely to access unsanctioned functionality with “sudo vi /etc/cups/cupsd.conf” or whatever, but this kind of customization rarely survives updates.

I suppose the solution is to take updates more seriously, evaluating them the way a business would: apply them only when there’s sufficient time to run a systematic set of regression tests.

New home PC

I think I was a tinkerer even before I encountered computer programming or computer science. And even though my research is pretty theoretical, I still enjoy breaking out the screwdrivers and anti-static wrist strap.

Our main home PC was a midrange Dell Optiplex that Art bought when he started med school in 2000. About two years later he switched to a Mac laptop and I converted the PC to a GNU/Linux workstation. We mainly use it as a home file and backup server, but it’s also my desktop when I’m working at home. Over the years I added substantially more disk capacity, bought a DVD writer, swiped a video card from an older machine for a dual-head display, etc. By this summer, I was itching for a more substantial upgrade. Software builds were pretty slow, Firefox was struggling with memory limitations, and occasionally I had trouble helping out friends with USB backup drives because the system didn’t support USB 2.0.

One of the great things about the PC architecture (as opposed to laptops and small form-factor consumer systems like the Mac mini and iMac) is that it’s entirely possible to upgrade it piecemeal. I had two high-capacity disks and an optical drive that were newer than the base system — no point in replacing those. So I went onto Newegg and did some research on the latest specifications. I generally don’t like buying the very latest stuff because the price/performance ratio is too high. The economical sweet spot on the curve is usually a generation or two back.

I went for the AMD Athlon 64-bit X2 dual-core processor. I got a compatible ASUS mini-ATX motherboard, 2G RAM, and a new mini-tower case. One thing I knew I needed out of the motherboard was two IDE (PATA) buses: one for the optical drive, another for the two legacy disks. The newer I/O bus is called SATA. The prices on drives are so good that I bought a 320G SATA disk too. The new PC will have well over half a terabyte of storage over the 3 disks.

Putting it all together went okay. I scraped my fingers to bleeding twice. :( The case seemed roomy — I chose it because having four hard drive bays was fairly rare for inexpensive cases — until I started putting the components inside. Before installing the disks, I booted an Ubuntu live CD to check that all the other components would work.

It turns out that three disks in a case this size is not ideal, even though it physically would hold four. After a few days of use, the disks were running hot. Really hot — the SMART temperature sensors reported 54°C (129°F)! Manufacturers are not always very precise about max operating temperatures. According to some numbers I was finding, this was on the high end, but not out of range. Also, the lifetime of the drive seems to depend more on the ambient case temperature, and ACPI was reporting 40°C on the motherboard.

Ultimately, I found a small fan in an older, unused system that I’ve been too lazy to take for recycling. I managed to secure it in between two of the drives and bring some air across them from the vents in the front of the case. Now the drive temperatures are in the range 45-47°C and reach 50°C only during heavy use (such as backups with rsync). I guess I’m satisfied with that, for now. But next time I think I will choose a case with more drive space, and with better front cooling facilities.

This is the first time I built a PC from the motherboard up, and overall, it has been a good experience.

And the livin’ is easy

Hm, this space has been quiet for a while, but for justifiable reasons: I have submitted two journal manuscripts since the summer break began.

I’m never thrilled about writing for journals, because it often means that the key problem is already solved, and I would usually rather work on new problems than “dot all the i’s” on old ones. On the other hand, it’s liberating to escape the strict space constraints of a conference paper. On the third hand, constraints are sometimes cited as catalysts for creativity. I’m reminded of the proverb “I wrote you a long letter because I didn’t have time to write a short one.”

I have also been ‘sharpening the saw’, also known as… Emacs hacking! Version 22.1 was finally released, and I took it as an opportunity to run through the manual and look for all the great little features and tweaks that have become available since the last time I studied the manual so intently. For example, just one thing that I adore for Java programming is glasses-mode (o^o). On-screen, it inserts some customizable little character in between LongCamelCaseWords so that you see them as Long·Camel·Case·Words. Ha!

Now I’d like to ‘sharpen my shell’ too. Zsh has lots of great stuff that I’m not currently using. I learned shell scripting in the early 90s on straight Bourne shell and tcsh, and only recently learned I could do concise parameter-frobbing things like ${file/foo/bar} rather than `echo $file | sed 's/foo/bar/'` or whatever. Tab-completion for sub-commands (of svn, darcs, etc.) and host names (for ssh) would be great, and I know there are some directory-hopping features (beyond pushd/popd) that would help me. But one thing I’m grappling with is that I currently use zsh both in regular xterms and inside Emacs shell-mode. In the latter case, a lot of the fancy stuff in zsh won’t work. So do I avoid running shells inside Emacs, or hack shell-mode, or get term-mode working instead? Or, maybe forget zsh and do everything with eshell? Am I prepared to run always in Emacs, even when logged in to remote machines? I’m stuck.
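For anyone who hasn't met these expansions, here is a tiny sketch of the difference. The file name is made up, and the ${var/pat/repl} form is an extension found in zsh, bash, and ksh rather than plain Bourne shell, so this assumes one of those.

```shell
# Illustrative file name only.
file="report-foo-2007.txt"

# The old habit: pipe through sed for a simple substitution.
old=$(echo "$file" | sed 's/foo/bar/')

# The same result with parameter expansion, no external process needed.
new=${file/foo/bar}

echo "$old"
echo "$new"

# A couple of related expansions in the same family.
echo "${file%.txt}"    # strip a trailing .txt
echo "${file//-/_}"    # replace every hyphen with an underscore
```

Both of the first two lines print report-bar-2007.txt; the expansion version just skips the subshell and the sed process.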

Meanwhile, I cleaned up /usr/local/ on most of my machines. I try to avoid installing anything that’s not managed by apt, even if I have to backport it myself (such as with emacs22 on Debian etch). But sometimes it’s inevitable: either it’s something impossibly obscure, or I need a newer version than what’s available already, or it’s something I have hacked on myself and I need my version installed. So now what I do is keep a branch in /usr/local/src/, install it to /usr/local/stow/, and everything else in /usr/local/ is a symbolic link managed by GNU Stow. This should solve the problem of discovering some problematic file or library in /usr/local/ that I make-installed six years ago, and not being able to remember what package it’s from or why it’s there.
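To make the payoff concrete, here is a small sketch that builds the same kind of layout by hand in a scratch directory. The package name hello-1.0 is made up, and the ln -s line only stands in for what GNU Stow would do automatically; the point is that readlink answers the "which package is this from?" question.

```shell
# Throwaway sandbox standing in for /usr/local (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/stow/hello-1.0/bin" "$root/bin"

# Pretend "make install" put one program into the package's own tree.
printf '#!/bin/sh\necho hello\n' > "$root/stow/hello-1.0/bin/hello"

# Stow would create this relative link for us; here it is done by hand.
ln -s ../stow/hello-1.0/bin/hello "$root/bin/hello"

# Years later: where did bin/hello come from?
owner=$(readlink "$root/bin/hello")
echo "$owner"
```

The link target names the package directory, so removing the package later is a matter of deleting that one directory and its links.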

Spamitude

I had a run-in with a spamcatcher this week. At some point I realized that messages I sent from home into liu.edu were not getting through, and not returning error messages either.

It turned out they were getting caught up in the university’s new spam filtering appliances. Apparently the servers run by my ISP Optimum Online (Cablevision) have some pretty bad scores on some of the black-lists. That really sucks.

It’s a little hard to describe how I felt when I realized that 4–5 days’ worth of messages — to my colleagues, to my dean, to members of the committee that I chair, to the chairs of committees on which I serve — that all of them were lost. It must be something like the “metaphor shear” that Neal Stephenson describes in In the beginning was the command line. Email has become so natural and essential that it really feels now like communication is happening as I write it. To find out days later that the communication never really happened at all is jarring.

And I know it’s not rational — these filters are just computer programs after all — but the thought that kept seeping back into my brain was, “your work isn’t valued.” As if the filters were a collective “talk to the hand” from the university itself. There’s probably something deeper going on there — the brain’s a marvel, ain’t it? — but I’m going to let it slide for the moment.

It’s not clear to me that things are completely resolved, but I’ve been enough of a thorn in the side of our IT folks for now. They’re actually supremely competent. We just have a clear difference of opinion on the tolerance for false positives. :)

It seems there’s often some minor tension between computer science faculty and IT staff. Many larger CS departments hire their own system administrators and just ask the campus IT department to carry the bits in and out of the building with no questions asked.

And even then the faculty often dump on the sysadmins. I’ve seen it from both sides. As if the ability to sling some code and prove some theorems makes one well-qualified to provide semi-reliable email service to tens of thousands of needy users.

The admins did put me on a list so that messages addressed to me bypass the filters. That doesn’t help with mail I send to others from off campus, of course. But this morning I managed to get one particular message through, so maybe the black-listing of Optimum Online was a fluke anyway. If not, I guess I can always tunnel onto campus and send to their blessed SMTP servers through the tunnel.
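The tunnel idea would look roughly like the sketch below. Every name in it is a placeholder (the gateway account, the SMTP host, the local port), since I'm not publishing the real ones, and it assumes you have ssh access to some on-campus machine.

```shell
# All host names and the port number are placeholders, not real servers.
campus_smtp_tunnel() {
    gateway=${1:-user@gateway.example.edu}   # an on-campus machine you can ssh into
    smtp_host=${2:-smtp.example.edu}         # the campus SMTP server
    local_port=${3:-2525}                    # unprivileged port on the home machine

    # -N: run no remote command, just forward.
    # -L: connections to localhost:local_port emerge on campus at smtp_host:25.
    ssh -N -L "${local_port}:${smtp_host}:25" "$gateway"
}

# Usage: run campus_smtp_tunnel in one terminal, then set the mail client's
# outgoing server to localhost, port 2525.
```

From the filter's point of view the message then originates on campus, so the ISP's reputation never enters into it.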

Meanwhile, being left outside the spam wall did cause my junk messages to increase slightly. I had been running my own SpamAssassin installation since before all this began, but it has had trouble keeping up with spammers’ ‘innovations’ in the past 6 months or so. I train the Bayes filters weekly, but it’s not always enough. So I upgraded to the latest and realized that all the network-dependent tests (Razor, Pyzor, etc.) had not been enabled. Since upgrading and reconfiguring SpamAssassin my junk messages have dropped dramatically, and still no false positives yet.

The IDE revelation

When writing about my encrypted disk partitions, I noted that “the drop in performance [with IDE] is definitely noticeable [compared to SCSI]… any disk-intensive activity also drives up the CPU load.”

Well, it turns out that my kernel was configured incorrectly, leading to extremely poor IDE performance. Some investigation with hdparm revealed that DMA (direct memory access) was not enabled! No wonder it was so horrible. What’s worse, hdparm is unable to turn it on with my custom kernel. So I booted an Ubuntu stock kernel, and DMA was enabled automatically and read throughput was about 7 or 8 times faster. Wow, all this time I thought IDE just really sucked that bad!
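For the record, the kind of check involved looks something like this sketch. The device name /dev/hda is an assumption (substitute your own), the commands need root to do anything useful, and the guard is only there so the snippet does nothing harmful on a machine without hdparm or without that device.

```shell
# Device name is an assumption; adjust for your system.
dev=${1:-/dev/hda}

if command -v hdparm >/dev/null 2>&1 && [ -b "$dev" ]; then
    # Report the current using_dma flag for the drive.
    hdparm -d "$dev" || echo "query failed (not root?)"
    # Try to switch DMA on; this can fail if the kernel lacks chipset support.
    hdparm -d1 "$dev" || echo "could not enable DMA"
    # Time buffered sequential reads, useful for a before/after comparison.
    hdparm -t "$dev" || echo "timing failed"
else
    echo "hdparm or $dev not available; nothing to check"
fi
```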

So now I need to figure out what’s wrong with my custom kernel. True, I did build it back when I was primarily using SCSI — I think the only IDE drive I had in this machine at that time was for Windows 98.

If crypto is outlawed…

Last week, I bought a pair of 200G IDE disks, just because they were dirt cheap. Probably I’ll use one at home and one at work. I already have a 150G at home for music and such.

I used to be a SCSI snob — and I guess in some ways I still am — but I just can’t afford that habit anymore! Although I miss the performance of SCSI, the price differential per GB is enormous. The drop in performance is definitely noticeable, particularly since I now have two large disks on the same bus. Any disk-intensive activity also drives up the CPU load, which it never would do with SCSI. And forget running more than one disk-intensive process at a time. If I’m still on the computer when ‘updatedb’ starts running, it’s time for bed.

Anyway, in rearranging my file systems at home, I decided to try something new. I now have my root and /home file systems on encrypted partitions. Why? Just because I can, I guess. It might be a fairly valuable technique on a laptop, which is more easily lost or stolen. At least then, you can be reasonably confident the thief can’t access your data.

On a home desktop machine though, crypto seems admittedly frivolous. Am I part of the tinfoil hat set, who thinks the FBI (or some darker, more sinister organization) is going to sneak in and confiscate or clone my drives? Do I have anything on there to hide anyway? Not really. But I do believe strongly in a right to privacy. And if we don’t exercise the rights we do have, we are likely to lose them.

The Disk Encryption HOWTO by David Braun was essential reading, although I didn’t follow its prescriptions precisely. You will need a Linux 2.6 kernel with ‘cryptoloop’ and ‘aes’ compiled in, and the ‘loop-aes-utils’ package that provides crypto-aware versions of ‘mount’ and ‘losetup’.

What happens, essentially, is this: I keep a small unencrypted boot partition near the beginning of the disk. It contains the kernel, the aforementioned ‘loop-aes-utils’, some scripts, a set of keys, and a few other essential binaries: sh, ls, and pivot_root. I configure grub to boot and root from this partition, and provide the kernel with a custom init script. This script prompts the console for a master password (must be 20 or more characters), and uses this to unlock an image containing the keys to each partition. The keys themselves are totally random 60-character strings.

Once the keys are available, the init script uses ‘losetup’ to configure crypto-enhanced loop-back devices for each partition. Then it can unmount the keys, mount the soon-to-be root partition, pivot_root to it, and invoke the real /sbin/init. The remaining partitions will be mounted automatically later on, so long as you use the /dev/loop devices in /etc/fstab, or better yet, refer to them by filesystem label.

  LABEL=debian-root  /      ext3  defaults 0 1
  LABEL=linux-home   /home  ext3  defaults 0 2
  LABEL=linux-swap   none   swap  sw       0 0

It sounds fancy, but once I was familiar with the tools and their capabilities, it wasn’t that bad to set up. The HOWTO describes booting off of a USB stick that contains the keys and kernel; this way authentication is based on something you know (master password) and something you have (the USB stick). This was too much of a pain for my home setup, plus my BIOS is too old to boot from USB.

What took the most work was allaying my fears that I’d be totally hosed when something goes wrong with the boot process. It turns out that the current Ubuntu Live CD (6.06) includes a kernel with the required modules. So I can boot from the Live CD, mount my /boot partition, and then manually use the keys to mount my encrypted partitions. The only important thing is to keep a safe backup of the boot partition, especially the keys. If I lose those 60-character keys I really am hosed. Currently I have /boot mirrored on both disks, and the keys file copied to various other machines.

Why not just encrypt /home, or even just $HOME for the current user? Encrypting the root filesystem is something of a pain, involving as it does a pivot_root and delicate boot-time hacking. But as the HOWTO points out, a GNU/Linux system really makes no guarantees about information flow; there’s no telling what stuff from /home may show up in /var/log or wherever. So it’s simplest just to encrypt everything, including the swap partition.

So maybe this display of Linux wizardry makes up for my gaffe about iptables earlier in the week. :)

(Funny that I started this post complaining about the performance of IDE drives, and then proceeded to add a layer of encryption on top of that. I haven’t done extensive benchmarking, but I did run ‘iozone’ a few times, and as far as I can tell the crypto only slows down reads and writes by 1 or 2%.)

Unison inode inadequacy

Okay, I resolved my performance problems with Unison. It seemed like it was taking far too long to search for changes; I noticed that it paused for significant lengths of time on extremely big files. If there are no changes and the archive files are intact, then it should just have to stat the file, so the time per file should be constant, not proportional to file size.

But this wasn’t happening. Then I realized that my recent disk reorganization probably had something to do with it. I installed a new disk, then repartitioned and moved file systems around on my home machine recently (more on that later), and this of course changed the inode numbers on all the files, which unison tracks in its archive.

Now, my expectation was that unison would be slow the first time around, but after noticing all the inode changes, it would be fast thereafter. This didn’t seem to happen. After a full sync (which was painful because I had to do it in pieces to avoid the dropped ssh connection), I had to delete the archive files, and then resync (again, in pieces). And now the new archive file has the new inode numbers and the normal sync is fast again. Yay!

Or, at least that’s my model of what happened.
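A quick way to see the effect I'm describing, under the assumption (it's my model, not something I verified in the Unison source) that a fast change check compares inode numbers along with size and modification time. The file names are throwaway, and stat -c is the GNU coreutils syntax.

```shell
# Scratch directory; file names are purely illustrative.
dir=$(mktemp -d)
echo "some data" > "$dir/original"

# cp -p preserves size and mtime, but the copy is a new file with a new inode,
# much like files moved onto a freshly made file system.
cp -p "$dir/original" "$dir/copy"

# GNU stat: %i = inode number, %s = size in bytes, %Y = mtime in epoch seconds.
ino_a=$(stat -c %i "$dir/original")
ino_b=$(stat -c %i "$dir/copy")
meta_a=$(stat -c '%s %Y' "$dir/original")
meta_b=$(stat -c '%s %Y' "$dir/copy")

echo "inodes:     $ino_a vs $ino_b"
echo "size/mtime: $meta_a vs $meta_b"
```

Size and mtime match while the inode numbers differ, so a checker that trusts the inode has to fall back to reading the whole file, which is where the time proportional to file size would come from.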

Port forwarding

I’m trying to understand iptables on Linux 2.4 or 2.6, for a fairly simple task, but it doesn’t seem to do anything.

My desktop machine at work is behind a firewall. As a department, we have control over just one server that has a range of ports open to the wider internet. So I either ssh twice, or tunnel as needed to access services on my own machine. I also wrote a tiny C program to open a socket and forward all traffic back and forth; I run it from inetd on the server, so now if I ssh to port XYZ on the server, that will be forwarded directly to my desktop.

Unfortunately, things aren’t working so well lately, and I think it might be the fault of my little C program. I will ssh through it, only to have the connection reset after a few idle minutes. This doesn’t happen when I ssh directly to the server, only through my forwarder program.

It’s so bad that when trying to synchronize between home and work with unison, my home machine takes much longer to look for changes, and by the time it tries to communicate changes to the work machine, the connection has died.

So it seems like the right way to do this is throw away my crummy C program and just do port forwarding in the kernel. The tutorials and FAQs make it seem easy:

  iptables -t nat -A PREROUTING -p tcp --dport PORT \
      -j DNAT --to ADDRESS:22

The rule shows up in the tables just fine, but it doesn’t seem to change anything when I try to connect to the specified port. I wrote a 1 to /proc/sys/net/ipv4/ip_forward, as suggested. I checked that the modules ip_tables and iptable_nat were loaded. I’ve tried it both on 2.4 and 2.6 kernels. Still no changes.

My Linux kernel knowledge is fairly good these days, but networking is certainly the weak spot. In fact, in obtaining my 3 degrees in CS, I don’t think I ever once took a networking course! Not that it would necessarily help me now…

Update (23:35) Some tracing with tcpdump revealed what was happening. I compared the packet-slinging going on with a successful connection to that with my faulty iptables rules. It turns out (I presume) that the DNAT (destination network address translation) needs a corresponding SNAT (source NAT). I thought I gathered from the various tutorials and docs that iptables does the reverse translation for you. Ah, but now I understand… this isn’t really the reverse translation; it’s the same packet, but now the other end will know where to send the ACK. Blah, at least it works now. Here’s the successful configuration, with the actual IP addresses replaced by SERVER and DESKTOP.

# Generated by iptables-save v1.2.11 on Wed Jul 26 23:30:21 2006
*nat
:PREROUTING ACCEPT [7757:741448]
:POSTROUTING ACCEPT [4166:279366]
:OUTPUT ACCEPT [4175:279906]
-A PREROUTING -p tcp -m tcp --dport 2000 -j DNAT --to-destination DESKTOP:22
-A POSTROUTING -d DESKTOP -p tcp -m tcp --dport 22 -j SNAT --to-source SERVER
COMMIT
# Completed on Wed Jul 26 23:30:21 2006
# Generated by iptables-save v1.2.11 on Wed Jul 26 23:30:21 2006
*filter
:INPUT ACCEPT [939121233:266118375723]
:FORWARD ACCEPT [316:50624]
:OUTPUT ACCEPT [1194943253:1032490859121]
COMMIT
# Completed on Wed Jul 26 23:30:21 2006

So now I can throw away my little clunky inetd-spawned C program. But still the connection isn’t staying alive long enough for unison to do its thing. ARGH.

Another wiki tamed

Sorry, I know I’m a few years late jumping on the wiki bandwagon. Yesterday I decided I wasn’t entirely happy with MediaWiki after all, and tried out PmWiki. I may have overlooked it before, because at some point in my life I guess I drank the RDBMS kool-aid and became skeptical of projects based on a standard file system (the database that god gave us).

Once I opened my mind to the PmWiki philosophy, I found that it does a lot of things quite well. One of the expected advantages of a database is to make searches faster and more convenient, but as the author points out, even this advantage is illusory. You usually fare better by having some dedicated tool (htDig, glimpse, google) index your files and provide search capability, rather than coding up your own based on a custom data model. This corroborates my experience with WordPress… you don’t see a search box on my site at the moment, do you? (But you can always add site:contrapunctus.net to a Google query.) The debate here also reminds me of one of the many differences between subversion and darcs; the latter uses flat files entirely for its bookkeeping.

PmWiki doesn’t insist on CamelCase for links, which is good. It has a ‘monobook’ skin that imitates the look of Wikipedia (still the least ugly wiki in my opinion). It has a decent authentication, permissions, and groups system, and it’s trivial to hack up more flexible policies (e.g., you must be on campus or logged in to view pages in this section). Actually, the lack of database is a slight security advantage too: you can embed the cryptographic hashes of passwords in the config file(s); there’s no need to store plain-text passwords that must be transmitted to the database.

The main limitation I see so far is that it doesn’t seem to support displaying arbitrary older versions of a page. It can show the page history and diffs, and you can restore older versions, so I believe all the required information is there; we just need a query interface for it. On Wikipedia, there’s a ‘permalink’ that provides you a URL for this exact version of the page that you are viewing. It’s useful for citing the page elsewhere, if you want to be sure that someone accesses the exact version you saw. It’s not clear to me that PmWiki can do that yet.

Also, there are a couple of choices for producing typeset (printable) copies of pages, but most of them look miserable or else are too complicated. I haven’t decided yet whether it’s worth my time to hack up something using pdftex. Let’s get some real documents up there first, I guess.

Oh yeah, the installation is here, if anyone wants to peek. Not much to see yet, particularly from the outside… unless the security is totally broken. :)

Finiki

I’m toying extensively with MediaWiki today. I mainly want to set up something to help coordinate the efforts of a school-wide committee for which I recently became the chair. The work of this committee involves collaborative editing and tracking of lots of documents, so a wiki seems like the natural thing. Up until now, members mostly emailed Word documents around, and there seemed to be no canonical location for the latest versions. (More on this committee’s agenda another time, perhaps.)

Someone suggested that all the minutes and other documents from the committee could be burned onto CDs and distributed to members. His heart is in the right place, even if the recommended medium isn’t.

Aside: I feel that way too about these USB memory sticks that are so prevalent lately. All students seem to have one or more, and they seem to be the schwag du jour at certain workshops. To me, these things were obsolete long before they were even invented… they’re just high-capacity floppy disks. And the reason floppies are obsolete has nothing to do with their size. Physically moving data around on some device, no matter how convenient, is fundamentally flawed: the data should reside on the network, where it can be synchronized, backed up, accessed, and modified from anywhere. This net-centric view was how computer science departments arranged things even in 1990, but the rest of the world has not yet caught on. The fact that a computer user would reach for a USB stick to copy files from machine A to machine B — when they’re both on the same LAN — means something is fundamentally broken. Incidentally, I place blame not on the user, but on the designer of the dominant operating system.

Anyway, MediaWiki seems sufficiently powerful, but a little more confusing to configure than I expected. Configuration can mean editing ‘LocalSettings.php’ (reasonably well documented), editing other arbitrary PHP files (haphazardly documented), and/or editing certain pages in the ‘MediaWiki’ namespace of the wiki itself (mostly undocumented).

As with most software whose primary documentation is a wiki (which is by no means limited to wiki software itself), entropy prevails. Information becomes scattered and duplicated. Questions and answers from users are mixed in with exposition. The wiki model can be extremely useful for reducing the barriers for contributions, and that helps keep things up to date. I’d probably rather have all documentation in a wiki than have it all in, for example, the archive of a mailing list. But the wiki can’t compete with a well written and thoughtfully edited manual.

I seem to be taming my installation now, following a solid afternoon of tinkering. I am using namespaces to separate different projects (rather than having separate installations) and to restrict edit access to certain user groups. I learned that it makes sense to create user accounts using real names (First Last) so that you can easily refer to them and link to their user page: “Committee members include [[User:Chris League|]], [[User:Lisa Simpson|]], etc.”

Now the next step is to convince my colleagues that they can use this. I am finding it difficult to predict who will adapt to new technology and who will not. Field of expertise and age are not the determinants I would have expected. In other words, you can’t assume all the over-50s will avoid it and the under-40s will adopt it. Similarly, you can’t assume that the CS profs will be happy to learn a new system and the English profs will be reluctant. This is good, in a way. Why should everyone conform to my expectations? Keep me guessing!

Let the memory live again

I find the default configurations of most computers to be too low on memory. So one of the first things I did after ordering my PowerBook last year was to order a 1G SDRAM module from NewEgg. (The PB came with 2×256M, and Apple’s RAM is way overpriced.)

Unfortunately, last week I started having trouble with it. I had noticed that my laptop was reporting 768M rather than 1.2G as it did when I first installed the module. This was strange; can just half the memory be working? I removed and re-seated the module, with no improvement.

So then I tried swapping my module with the identical one in Art’s iBook. His computer reported the full amount (1.2G) and now so did mine. A strange situation, but I thought the problem was solved.

But then the next day my PB experienced a kernel panic. If you’ve never seen this on Darwin/OS X, the screen dims slightly, and a message comes up in about 5 languages that “you need to restart your computer,” with little additional explanation. Like a stable Linux, this kernel really does not panic lightly, and it is often indicative of a hardware defect.

At first, I wasn’t sure whether to blame the panic on the recently swapped memory module. But then the computer was up just about 10 minutes before it crashed again. The next time, I only got about 8 minutes. This was becoming serious. As a stop-gap, I swapped the 1G module with the original 256M one, and the crashes ceased.

But this is strange: both 1G modules seem to work in Art’s iBook, neither works fully in my PB—although they exhibit very different symptoms—but the 256M module works fine. I don’t know whether to blame my laptop, the memory, or some crazy conspiracy between them. It’s far simpler to order a new 1G module than to phone AppleCare and complain about my laptop’s behavior with 3rd-party RAM. For now, I’m just surviving on 2×256M.