One would think that keyboard entry is a pretty basic feature of a spreadsheet program. Maybe I’m a newb — I just started using spreadsheets in, let’s see… 1984… with Lotus 1-2-3. Here is Excel 2004 for Mac, completely unusable for data entry. (One minute Flash movie, may not appear in feed.)
And people pay money for this? Denigrate free/open-source software all you wish, but I never had a problem just entering data with Gnumeric or OpenOffice.
Someone appears to have found a workaround, posted just a few days ago. Note that a workaround is not a fix. Normally, “edit directly in cell” is something you might want to do.
Bootstrapping a compiler can be a finicky process, because many compilers are written in the language that they compile. It’s easy to paint yourself into a corner if you’re not extremely vigilant about binaries and configuration management.
I wanted to try SMLserver, a system for writing database-backed web services using Standard ML. It is tightly integrated with the MLkit compiler.
Unfortunately, the distributed binaries would not run on my installation of Debian stable (codename ‘etch’) because they expect a newer version of ‘libc’, the main system library. Upgrading that library could be pretty disruptive, which defeats the purpose of running a ‘stable’ distribution.
Unlike many compiler code bases, MLkit can be built by compilers other than itself, namely SML/NJ and MLton. Ordinarily, this would simplify the process, except that the SML/NJ version it requires is ancient (and doesn’t itself compile out of the box anymore on this system) and MLton has extreme memory demands.
Fortunately, an older binary of MLkit (4.3.0) runs on this system, but at best that’s a starting point. Once a particular revision can bootstrap itself, it’s natural for the source language to evolve beyond what the previous revision could handle. But in this case, the changes were small. Version 4.3.0 lacks some pieces of the ‘Posix.FileSys’ module that 4.3.2 needs, but they were small enough to rewrite:
Index: src/Tools/MlbMake/MlbFileSys.sml
===================================================================
--- src/Tools/MlbMake/MlbFileSys.sml (revision 2311)
+++ src/Tools/MlbMake/MlbFileSys.sml (working copy)
@@ -49,9 +49,9 @@
        | EQUAL => SysWord.compare (b,d)
   fun unique link f =
-    let val s = if link then Posix.FileSys.lstat f else Posix.FileSys.stat f
-    in (Posix.FileSys.inoToWord(Posix.FileSys.ST.ino s),
-        Posix.FileSys.devToWord(Posix.FileSys.ST.dev s))
-    end
+    let val {dev,ino} = OS.FileSys.fileId f
+    in (Word.fromInt ino,
+        Word.fromInt dev)
+    end
 end
Index: src/Manager/Manager.sml
===================================================================
--- src/Manager/Manager.sml (revision 2311)
+++ src/Manager/Manager.sml (working copy)
@@ -807,7 +807,6 @@
       val fu = (Posix.IO.close (Posix.FileSys.creat (lockfile ^ unique,Posix.FileSys.S.iwusr)) ; true) handle OS.SysErr _ => false
       val f = if fu
               then (Posix.FileSys.link{old=lockfile ^ unique, new=lockfile}; true)
-                   handle OS.SysErr _ => Posix.FileSys.ST.nlink (Posix.FileSys.stat (lockfile ^ unique)) = 2
               else false
   in if fu then (Posix.FileSys.unlink (lockfile ^ unique); f) handle _ => f
      else false
Version 4.3.0 can build 4.3.2 with this patch applied, and the result can then bootstrap the unmodified 4.3.2. Really, I lucked out here. Imagine having to do this across a major compiler release cycle!
Unfortunately, the image size is somewhat big for the design of my web site, unless your browser is already maximized. Anyway, ‘jing’ is a tool to record and upload screen-shots and screen-casts in pretty much one shot. Then they can be linked and embedded from screencast.com just like YouTube.
Unfortunately, it seems hard or impossible to edit videos effectively after recording. Same as other one-shot recording tools like iShowU and Snapz Pro. If you want to edit, transition, or separately mix audio and video, then it seems iMovie needs to get involved. Still need to read up on that. But Jing seems to be the most convenient of the one-shot tools.
Hm, why is my laptop running so hot this morning? Hm, what is that new icon in my menu bar? Why, it’s the Cisco Clean Access agent, eating up 100% CPU time! While I’m on a wired connection. Even if they don’t know how to tie into the MacOS network configuration properly, you’d think they might have heard of the sleep() system call…
Over the past few days, I’ve been cleaning up compromised web sites that run PHP-based content management systems. This got me thinking (not for the first time) about the sad state of CMS security.
One of the key problems is that CMS software is not always treated with the same level of care as regular system software. As with any net-facing software, flaws must be carefully tracked and patches swiftly applied. This is tricky because CMSs are not always installed using the regular package administration system; often they are uploaded into the public_html spaces of regular users.
But I want to address an underlying problem that has more to do with the design of these systems themselves (which of course impacts their suitability for packaging and maintenance in something like the Debian archive). In my experience, many content management systems jumble together the following kinds of files and code that should — following the principles of least privilege and privilege separation — be kept distinct:
Data files or scripts that should directly correspond to URLs.
Data files or scripts that are merely included or opened from other files, and therefore should not be reachable from any URL.
Code that needs privilege to write to the file system or database.
Code that merely needs read privilege.
Directories in which the code can create and modify files.
Writable directories whose contents are accessible as URLs.
Writable directories accessible as URLs, in which script extensions (.php, .cgi, etc.) are honored.
The astute reader should be able to deduce the security implications of each of the above, but here are some hints. Web-accessible library code increases the surface area for exploits. Writable, web-accessible directories invite spam content. Writable, executable, web-accessible directories are havens for malware.
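To make the last two categories concrete, here is a minimal Python sketch that walks a web root and reports world-writable directories, along with any script files sitting in them. It is only a rough proxy: the list of script extensions is my assumption, and a real audit would check what the web server's own user can write rather than just the world-writable bit.

```python
import os
import stat

# Assumed set of extensions the web server would execute; adjust per site.
SCRIPT_EXTENSIONS = {".php", ".cgi", ".pl", ".py"}

def audit_web_root(root):
    """Return sorted (relative_dir, script_names) pairs for every
    world-writable directory under root. A non-empty script list means
    a writable directory that also holds executable content."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        if not mode & stat.S_IWOTH:
            continue  # not world-writable, so not of interest here
        scripts = sorted(
            name for name in filenames
            if os.path.splitext(name)[1].lower() in SCRIPT_EXTENSIONS
        )
        findings.append((os.path.relpath(dirpath, root), scripts))
    return sorted(findings)
```

Calling audit_web_root on a site's document root and looking for entries with a non-empty script list is a quick way to spot the "haven for malware" case.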
Keeping these categories distinct would not just increase the baseline security of the CMS, it would also improve usability with other tools like chroot jails and suPHP. The latter provides privilege separation by executing scripts with the privileges of their owner rather than the web user (www-data), but of course that requires spawning a new process for each request. One of the reasons that PHP (and later, Python) proliferated in web applications is the lower overhead of running the interpreters as modules within the web server. By isolating the bits of code that need write access, one might achieve a better compromise of efficiency and security.
I understand that a reason for flouting security conventions (apart from ignorance) is ease of installation. Content management systems are often set up by naive users over FTP connections to unprivileged accounts on shared hosts. But I believe it’s entirely possible to design the system more securely for expert users or site administrators, while still allowing naive users their (more vulnerable) one-click installs. Pay-as-you-go security. The content management systems that do get packaged for Debian are usually configured in a pretty reasonable way, but it’s telling that the worst offenders don’t get packaged at all.
Hm, this space has been quiet for a while, but for justifiable reasons: I have two journal manuscripts submitted since the summer break began.
I’m never thrilled about writing for journals, because it often means that the key problem is already solved, and I usually would prefer to work on new problems than to “dot all the i’s” on old ones. On the other hand, it’s liberating to escape the strict space constraints of a conference paper. On the third hand, constraints are sometimes cited as catalysts for creativity. I’m reminded of the proverb “I wrote you a long letter because I didn’t have time to write a short one.”
I have also been ‘sharpening the saw’, also known as… Emacs hacking! Version 22.1 was finally released, and I took it as an opportunity to run through the manual and look for all the great little features and tweaks that have become available since the last time I studied the manual so intently. For example, just one thing that I adore for Java programming is glasses-mode (o^o). On-screen, it inserts some customizable little character in between LongCamelCaseWords so that you see them as Long·Camel·Case·Words. Ha!
Now I’d like to ‘sharpen my shell’ too. Zsh has lots of great stuff that I’m not currently using. I learned shell scripting in the early 90s on straight Bourne shell and tcsh, and only recently learned I could do concise parameter-frobbing things like ${file/foo/bar} rather than `echo $file | sed 's/foo/bar/'` or whatever. Tab-completion for sub-commands (of svn, darcs, etc.) and host names (for ssh) would be great, and I know there are some directory-hopping features (beyond pushd/popd) that would help me. But one thing I’m grappling with is that I currently use zsh both in regular xterms and inside Emacs shell-mode. In the latter case, a lot of the fancy stuff in zsh won’t work. So do I avoid running shells inside Emacs, or hack shell-mode, or get term-mode working instead? Or, maybe forget zsh and do everything with eshell? Am I prepared to run always in Emacs, even when logged in to remote machines? I’m stuck.
Meanwhile, I cleaned up /usr/local/ on most of my machines. I try to avoid installing anything that’s not managed by apt, even if I have to backport it myself (such as with emacs22 on Debian etch). But sometimes it’s inevitable: either it’s something impossibly obscure, or I need a newer version than what’s available already, or it’s something I have hacked on myself and I need my version installed. So now what I do is keep a branch in /usr/local/src/, install it to /usr/local/stow/, and everything else in /usr/local/ is a symbolic link managed by GNU Stow. This should solve the problem of discovering some problematic file or library in local that I make-installed six years ago, and can’t remember what package it’s from or why it’s there.
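With that layout, anything under /usr/local/ that is a regular file rather than a symbolic link, and lives outside src/ and stow/, is by definition something Stow does not manage. A small Python sketch of that check follows; the function name and the skip list are mine, not part of Stow.

```python
import os

def unmanaged_files(prefix, skip=("src", "stow")):
    """List regular (non-symlink) files under prefix, ignoring the
    top-level directories named in skip. In a Stow-managed tree these
    are the leftovers that nobody can account for."""
    found = []
    for dirpath, dirnames, filenames in os.walk(prefix):
        if dirpath == prefix:
            # Prune src/ and stow/ at the top level only.
            dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                found.append(os.path.relpath(path, prefix))
    return sorted(found)
```

Running unmanaged_files("/usr/local") should come back empty once the cleanup is complete. os.walk does not descend into symlinked directories by default, so directories that Stow has folded into a single link are skipped automatically.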
I’m really not into file sharing. Really! We currently have a 40G music archive and the vast, vast majority of it is legit; I have a closet full of CDs to prove it. The few tracks that are less than legit are more likely to be ripped from a borrowed CD or copied from a friend’s hard drive than downloaded from an anonymous peer-to-peer network.
The first time I heard about BitTorrent, it sounded extremely cool. You get pieces of the file from various peers, and make the pieces you’ve got available to other peers. They call the whole thing a swarm. Until today, I have used it only to download images of Linux Live CDs — perfectly legit.
But even for less legit targets, I’m not sure it’s doing me much good, perhaps because my tastes are too obscure? If there’s just one peer, in rural Spain, with a 2 kB/s throttle, then it’s going to take two days to download a one-hour video? And if nobody wants the seeds I have, am I destined to remain a leech?
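The two-day figure is easy to sanity-check. This back-of-the-envelope Python sketch reads the throttle as 2 kilobytes per second and assumes a 350 MB file; the file size is my guess for a one-hour video, not a number from the torrent itself.

```python
def download_time_hours(size_bytes, rate_bytes_per_sec):
    """Hours needed to transfer size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 3600

size = 350 * 1000**2   # assumed file size: 350 MB
rate = 2 * 1000        # throttle read as 2 kB/s
hours = download_time_hours(size, rate)
print(f"{hours:.1f} hours ({hours / 24:.1f} days)")
```

Under those assumptions the transfer takes a little over 48 hours, which matches the two-day estimate.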
This doesn’t really seem worthwhile!
I blurred the torrent filename, but for the curious, I’ve been a Showtime subscriber for the past 6 years, and for now I have no intention of canceling. I guess we subscribed for Queer as Folk, but when that ended the various other original series have kept us hooked: Huff, Sleeper Cell, L Word, Weeds, Penn & Teller Bullshit; and I’m looking forward to The Tudors and This American Life. But the other day, my DVR messed up and missed one episode in an ongoing series. I already pay for the production of these shows, so I firmly believe using an ‘alternative’ distribution channel to access them is legitimate.
So here I am installing a VNC client on my Powerbook so I can connect to my desktop Linux at work and control a VMware installation running Windows XP. And on that virtual XP? I’m running a GeekOS kernel on Bochs.
Just thought I’d share. Although I’d never personally choose XP for anything, I must admit that it has been convenient to be able to run it in VMware, just so I can see what kind of environment the majority of my students are using, and what problems they may run into. All the software I require for my courses is cross-platform, because I don’t want to be tied to anything. I can even cross-compile GeekOS on my PPC Mac and run it on Bochs there.
I managed to get a virtual XP running on my Linux desktop at work, but so far it doesn’t work at home… and that Linux machine at home is so underpowered at this point, I’m not sure I’d want it on there anyway. So using VNC to connect to it from elsewhere made sense.
More on GeekOS later, but so far hacking it is definitely fun. Learned more about segmentation registers on Intel this week than I ever needed to know.
For the past several weeks, I had been getting up to 30 spam comments per day in the moderation queue. None of them appeared on the site, but receiving the email notifications and having to clear out the queue periodically was a pain. Besides, when I set up WordPress, I took care to implement my own custom “Turing test,” where would-be respondents must answer simple questions like “What is Prof. League’s first name?” Were the spam-bots lucky or clever enough to be answering these questions correctly? Or were they somehow bypassing the test?
This morning, I finally had a chance to investigate what was going on. I added some tracing statements to the commenting functions, so that when they were invoked I would receive an email with some information about variables and control flow. Some tracing emails started showing up within 15 minutes, and I learned two things: the spam-bots were not providing correct answers to my Turing questions (that’s good), and the IP addresses in the traces and the ones getting spam into the moderation queue were disjoint (that’s bad). Well, good and bad. It means that the spam-prevention measures in the regular comment code are working, but also that there must be a back door.
By grepping the server logs for yesterday’s spam-submitting IP addresses — don’t know why I didn’t think of that first thing — I discovered the back door: trackbacks. This is a facility for one blog post to link to another as a comment. This is an interesting idea, but since it’s some other blogging software that does the posting, I can’t really implement extra spam prevention measures here. So I decided just to disable trackbacks and pingbacks completely. That should do the trick!
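The log search is simple enough to sketch in Python: given the access log lines and the set of spam-submitting addresses, tally which URLs those addresses requested. The sample lines below are made up for illustration (documentation-range IPs, Common Log Format assumed) and are not taken from my actual logs.

```python
from collections import Counter

def requests_by_path(log_lines, suspect_ips):
    """Count (method, path) pairs requested by the suspect addresses.
    Assumes Common Log Format: the client IP is the first field and the
    request line is the first double-quoted string."""
    counts = Counter()
    for line in log_lines:
        ip = line.split(" ", 1)[0]
        if ip not in suspect_ips:
            continue
        quoted = line.split('"')
        if len(quoted) < 2:
            continue  # no request line; skip malformed entries
        request = quoted[1].split()
        if len(request) >= 2:
            path = request[1].split("?", 1)[0]  # drop the query string
            counts[(request[0], path)] += 1
    return counts

# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [12/Feb/2007:06:25:01 -0500] "POST /wp-trackback.php?p=12 HTTP/1.1" 200 78',
    '192.0.2.10 - - [12/Feb/2007:06:25:09 -0500] "POST /wp-trackback.php?p=15 HTTP/1.1" 200 78',
    '198.51.100.7 - - [12/Feb/2007:06:26:40 -0500] "POST /wp-comments-post.php HTTP/1.1" 302 0',
    '192.0.2.44 - - [12/Feb/2007:06:27:13 -0500] "GET /index.php HTTP/1.1" 200 5120',
]

counts = requests_by_path(SAMPLE_LOG, {"192.0.2.10", "192.0.2.44"})
for (method, path), n in counts.most_common():
    print(n, method, path)
```

If the suspect addresses cluster on one endpoint that the regular comment form never touches, that endpoint is the back door.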
Next semester, I’ll be teaching an intermediate programming course on OOP and design patterns in C++. Additionally, I may do a series of projects based on GeekOS in my operating systems course. (I taught CMSC 412 at UMCP before the advent of GeekOS, but I see in it some influences from the more ad hoc projects we did back then.)
Recently I have been thinking about online tutorial, submission, and assessment systems. Since both of next semester’s courses will involve exchanging a good bit of code, I hit on the idea of using Subversion both to distribute project code to my students, and for them to submit their code for assessment. This has been done before; I found a SIGCSE paper on using CVS for this purpose [Reid & Wilson, 2005].
In the old days (the early 1990s), most CS students did their major programming assignments on a semi-centralized (UNIX) system, and most departments maintained some setuid script for managing submissions. At UMCP, my friend Gabe automated the assessment of assignments to amazing levels, with the help of Perl and shell scripts.
An area that I think is still under-explored is using some kind of automated tutor to help students in a CS0 or CS1 course comprehend and practice the very fundamentals of programming: conditionals, loops, arrays, etc. There was a special issue of JERIC recently (Journal on Educational Resources in Computing) on automated assessment, but the aim of many of the articles was to save time and give individualized feedback to classes with 400 students. That seems a little dated now, with CS enrollments down as they are, but I guess it may still occur at a few large schools.
I’m interested in automated assessment not for the time-saving or scalability, so much as for a mechanism that can encourage students to practice on their own time, outside of class, and in addition to assigned work. The system should be able to generate a variety of unique problems to solve, offer hints and help, and assess the student’s progress.
Anyway, I did figure out today how to set up Subversion as a submission tool. It requires the (slightly) more sophisticated access control that you get running it from Apache 2 and the authz module. I set up the top-level of the repository with a public/ folder, and folders for each student: alice/, bob/, carol/, etc. The instructor and TAs should be able to read and write anywhere, but students can read from public and read/write their own folders only. Here’s my authz file that seems to do the right thing:
[cs150s07:/]
league = rw
* =
[cs150s07:/public]
league = rw
* = r
[cs150s07:/alice]
alice = rw
[cs150s07:/bob]
bob = rw
Then, files provided for the assignments are committed to public/a1/, public/a2/, etc. and copied into the student folders with svn copy.
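With more than a handful of students, an authz file like the one above gets tedious to maintain by hand. Here is a small Python sketch that generates the same layout from a roster. The function name and its arguments are my own invention; only the section and rule syntax comes from the file shown above.

```python
def make_authz(repo, staff, students):
    """Build authz text: staff get rw everywhere, everyone can read
    public/, and each student gets rw on a folder of the same name."""
    lines = [f"[{repo}:/]"]
    lines += [f"{user} = rw" for user in staff]
    lines.append("* =")                 # no access by default
    lines.append(f"[{repo}:/public]")
    lines += [f"{user} = rw" for user in staff]
    lines.append("* = r")               # everyone may read public/
    for student in students:
        lines.append(f"[{repo}:/{student}]")
        lines.append(f"{student} = rw")
    return "\n".join(lines) + "\n"

print(make_authz("cs150s07", ["league"], ["alice", "bob"]), end="")
```

Called with the roster above, this reproduces the hand-written file line for line. Adding a TA is then a matter of extending the staff list.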
Two tips from the CVS paper that I think are good ideas: first, when students have problems and seek assistance, insist that they commit what they have to the repository, so you can update and help them out without the awkward emailing of files back and forth. When helping a student through a problem face-to-face, check out a fresh copy, show him how to fix the problem, and then wipe the fresh copy so he still has to fix it again on his own.
Second, if we can encourage students to commit often, we may get a better glimpse of their working habits (such as when they start on assignments) and confront them about problems early on. This buys back a little of the surveillance power we had when everyone did their work on the same machine: you know there’s a problem when Johnny hasn’t even logged in and the assignment is due in 5 hours.
As this semester is winding down, I feel the need to debrief myself a bit about how it went. But I’m going to try to hold off on that (at least publicly) until all the grades are in!
If things go well in the next week or so, I’ll be submitting a paper to a conference in a field very different from my own. Somehow the progression makes sense to me; hopefully others agree.
Today I decided to check out the submission process, which is run by Microsoft. Not only do both Firefox and Safari complain about the SSL certificate (unknown certification authority), but the site is full of overlapping text and boxes (as shown in Firefox):
Is this stuff still so hard? Are the folks at MS not permitted to test against Firefox?
This TV show advert — plastered all around New York lately — is making me angry. Why?
That curly thing is not an apostrophe! And you don’t have to be a font freak or typography wonk to know the difference. In grade school — before I could distinguish Garamond from Gill Sans, before Adobe Systems was founded — I knew that an apostrophe curved down and to the left.
So how did this happen? Considering that computer keyboards have no ‘left curly single quote’ key — and that probably 98% of all computer users wouldn’t know how to type that character if their lives depended on it — how could this gaffe occur when the apostrophe key is right there on your keyboard?
Yes, you know where I’m going with this: SmartQuotes.™
This is the feature on many word processors and desktop publishers that automatically converts typewriter-style straight quotes into curly ones. Unfortunately, it does a poor job of it, and that’s often worse than not doing the job at all.
Now, I’m not one to ridicule or be offended by home-made garage sale fliers and grocery store signage with their superfluous quotation marks. Er, well, I don’t extensively ridicule them.
But here is a case of a major broadcasting firm with professional graphic artists plastering their large full-color ads across a major city in which you can’t swing a cat without hitting a designer. There’s just no excuse.
P.S., it’s even wrong in the HTML on the web site:
When writing about my encrypted disk partitions, I noted that “the drop in performance [with IDE] is definitely noticeable [compared to SCSI]… any disk-intensive activity also drives up the CPU load.”
Well, it turns out that my kernel was configured incorrectly, leading to extremely poor IDE performance. Some investigation with hdparm revealed that DMA (direct memory access) was not enabled! No wonder it was so horrible. What’s worse, hdparm is unable to turn it on with my custom kernel. So I booted an Ubuntu stock kernel, and DMA was enabled automatically and read throughput was about 7 or 8 times faster. Wow, all this time I thought IDE just really sucked that bad!
So now I need to figure out what’s wrong with my custom kernel. True, I did build it back when I was primarily using SCSI — I think the only IDE drive I had in this machine at that time was for Windows 98.
Really, I wanted to love you. You seem cleanly designed. I adore your model of a persistent file system, where even branches and tags are just sub-directories. Your commands mostly make sense. I appreciate that many of them work without repository access, so I don’t have to wait long to get a status or a diff.
But now you’re screwing me over. All I wanted to do was take the WordPress 2.0.4 upgrade for a spin in your vendor branch. I know it has been a few weeks since I last spoke to you. But now all you can tell me on my Powerbook is “Bad database version: compiled with 4.4.16, running against 4.3.29.” And on Debian, you say only “svn: bdb: Program version 4.2 doesn’t match environment version.” What did I do to deserve this?
You’re jealous of darcs, aren’t you? How petty. Anyway, it’s your affair with Berkeley DB that got us into this mess. I know, you’re seeing ‘FSFS’ now, whatever that is. You say things are better this way, but where does that leave me?
I suppose I should accept some blame too. Some of my repositories are private (read and written only by me) and I wanted them available for commits when I’m offline. So I put them in my home directory and synchronized with unison to other architectures and operating systems. I know, I know. To say this is ‘not recommended’ is an understatement. But somehow it seemed to work okay for a while.
Last week, I bought a pair of 200G IDE disks, just because they were dirt cheap. Probably I’ll use one at home and one at work. I already have a 150G at home for music and such.
I used to be a SCSI snob — and I guess in some ways I still am — but I just can’t afford that habit anymore! Although I miss the performance of SCSI, the price differential per GB is enormous. The drop in performance is definitely noticeable, particularly since I now have two large disks on the same bus. Any disk-intensive activity also drives up the CPU load, which it never would do with SCSI. And forget running more than one disk-intensive process at a time. If I’m still on the computer when ‘updatedb’ starts running, it’s time for bed.
Anyway, in rearranging my file systems at home, I decided to try something new. I now have my root and /home file systems on encrypted partitions. Why? Just because I can, I guess. It might be a fairly valuable technique on a laptop, which is more easily lost or stolen. At least then, you can be reasonably confident the thief can’t access your data.
On a home desktop machine though, crypto seems admittedly frivolous. Am I part of the tinfoil hat set, who thinks the FBI (or some darker, more sinister organization) is going to sneak in and confiscate or clone my drives? Do I have anything on there to hide anyway? Not really. But I do believe strongly in a right to privacy. And if we don’t exercise the rights we do have, we are likely to lose them.
The Disk Encryption HOWTO by David Braun was essential reading, although I didn’t follow its prescriptions precisely. You will need a Linux 2.6 kernel with ‘cryptoloop’ and ‘aes’ compiled in, and the ‘loop-aes-utils’ package that provides crypto-aware versions of ‘mount’ and ‘losetup’.
What happens, essentially, is this: I keep a small unencrypted boot partition near the beginning of the disk. It contains the kernel, the aforementioned ‘loop-aes-utils’, some scripts, a set of keys, and a few other essential binaries: sh, ls, and pivot_root. I configure grub to boot and root from this partition, and provide the kernel with a custom init script. This script prompts the console for a master password (must be 20 or more characters), and uses this to unlock an image containing the keys to each partition. The keys themselves are totally random 60-character strings.
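For the curious, generating keys of that shape takes only a few lines. This Python sketch is not how the HOWTO or my init script does it; the alphabet (letters and digits) is an assumption, and only the two lengths, 60 characters for keys and 20 for the master password, come from the description above.

```python
import secrets
import string

# Assumed alphabet; the real keys may draw on a different character set.
ALPHABET = string.ascii_letters + string.digits

def random_key(length=60):
    """Return a random key using the OS's cryptographic random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def master_password_ok(password, minimum=20):
    """Check the minimum-length rule for the master password."""
    return len(password) >= minimum
```

The secrets module is used instead of random because the latter is not meant for anything security-sensitive.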
Once the keys are available, the init script uses ‘losetup’ to configure crypto-enhanced loop-back devices for each partition. Then it can unmount the keys, mount the soon-to-be root partition, pivot_root to it, and invoke the real /sbin/init. The remaining partitions will be mounted automatically later on, so long as you use the /dev/loop devices in /etc/fstab, or better yet, refer to them by filesystem label.
It sounds fancy, but once I was familiar with the tools and their capabilities, it wasn’t that bad to set up. The HOWTO describes booting off of a USB stick that contains the keys and kernel; this way authentication is based on something you know (master password) and something you have (the USB stick). This was too much of a pain for my home setup, plus my BIOS is too old to boot from USB.
What took the most work was allaying my fears that I’d be totally hosed when something goes wrong with the boot process. It turns out that the current Ubuntu Live CD (6.06) includes a kernel with the required modules. So I can boot from the Live CD, mount my /boot partition, and then manually use the keys to mount my encrypted partitions. The only important thing is to keep a safe backup of the boot partition, especially the keys. If I lose those 60-character keys I really am hosed. Currently I have /boot mirrored on both disks, and the keys file copied on various other machines.
Why not just encrypt /home, or even just $HOME for the current user? Encrypting the root filesystem is something of a pain, involving as it does a pivot_root and delicate boot-time hacking. As the HOWTO points out, a GNU/Linux system really makes no guarantees about information flow; there’s no telling what stuff from /home may show up in /var/log or wherever. So it’s simplest just to encrypt everything, including the swap partition.
So maybe this display of Linux wizardry makes up for my gaffe about iptables earlier in the week.
(Funny that I started this post complaining about the performance of IDE drives, and then proceeded to add a layer of encryption on top of that. I haven’t done extensive benchmarking, but I did run ‘iozone’ a few times, and as far as I can tell the crypto only slows down reads and writes by 1 or 2%.)