For the past several weeks, I had been getting up to 30 spam comments per day in the moderation queue. None of them appeared on the site, but receiving the email notifications and having to clear out the queue periodically was a pain. Besides, when I set up WordPress, I took care to implement my own custom “Turing test,” where would-be respondents must answer simple questions like “What is Prof. League's first name?” Were the spam-bots lucky or clever enough to be answering these questions correctly? Or were they somehow bypassing the test?
This morning, I finally had a chance to investigate what was going on. I added some tracing statements to the commenting functions, so that whenever they were invoked I would receive an email with some information about variables and control flow. Tracing emails started showing up within 15 minutes, and I learned two things: the spam-bots were not providing correct answers to my Turing questions (that's good), and the IP addresses in the traces and the IP addresses getting spam into the moderation queue were disjoint sets (that's bad). Well, good and bad. It means that the spam-prevention measures in the regular comment code are working, but also that there must be a back door somewhere else.
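The comparison between the two groups of addresses boils down to a set check. Here is a minimal Python sketch of that idea; the function name `compare_ip_sets` and the addresses are made up for illustration (they come from the reserved documentation ranges, not from my logs).

```python
def compare_ip_sets(traced_ips, spam_ips):
    """Compare IPs seen by the comment-form tracing with IPs that got spam queued."""
    traced = set(traced_ips)
    spam = set(spam_ips)
    return {
        # Addresses that both hit the comment form and got spam into the queue.
        "overlap": traced & spam,
        # Addresses that got spam queued without ever touching the traced code.
        "spam_only": spam - traced,
        # True when the two groups share no address at all.
        "disjoint": traced.isdisjoint(spam),
    }


# Hypothetical sample data, not real addresses from the site.
traced = ["192.0.2.10", "192.0.2.11"]
spam = ["198.51.100.7", "198.51.100.8"]

result = compare_ip_sets(traced, spam)
print(result["disjoint"])
print(sorted(result["spam_only"]))
```

If `disjoint` comes back true, none of the queued spam went through the traced comment code, which is exactly the situation that points to a second entry point.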
By grepping the server logs for yesterday's spam-submitting IP addresses – don't know why I didn't think of that first thing – I discovered the back door: trackbacks. A trackback is a facility that lets a post on one blog show up as a comment on the post it links to. It's an interesting idea, but since it's the other blog's software that does the posting, rather than a person filling in my comment form, I can't really implement extra spam-prevention measures here. So I decided to just disable trackbacks and pingbacks completely. That should do the trick!
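For anyone who wants to repeat the log search, here is a rough Python equivalent of the grep. It is only a sketch: it assumes the server writes the common Apache-style access log format (client IP first, request line in quotes), and the function name `paths_hit_by`, the sample log lines, and the URL paths are invented for illustration.

```python
import re
from collections import Counter

# Assumed log layout: IP, two ignored fields, [timestamp], "METHOD path ...".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def paths_hit_by(log_lines, suspect_ips):
    """Count (method, path) pairs requested by any of the suspect IPs."""
    suspects = set(suspect_ips)
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") in suspects:
            hits[(match.group("method"), match.group("path"))] += 1
    return hits


# Hypothetical log excerpt; in practice, iterate over the opened log file.
sample_log = [
    '198.51.100.7 - - [01/Jan/2010:10:00:00 +0000] "POST /blog/some-post/trackback/ HTTP/1.1" 200 512',
    '192.0.2.10 - - [01/Jan/2010:10:00:05 +0000] "POST /blog/wp-comments-post.php HTTP/1.1" 302 0',
    '198.51.100.7 - - [01/Jan/2010:10:01:00 +0000] "POST /blog/some-post/trackback/ HTTP/1.1" 200 512',
]

hits = paths_hit_by(sample_log, ["198.51.100.7"])
for (method, path), count in hits.most_common():
    print(count, method, path)
```

Grouping by path rather than just printing matching lines makes the entry point stand out: if every request from the spam addresses lands on the same kind of URL, that URL is the back door.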