Sun. Dec 12, 2004
MT Plus Comment Spam Equals Dead Site
Let’s start by explaining our terms (and thus likely narrowing our audience). If you don’t know that the “MT” in the title stands for “Movable Type” (a very popular program used in creating weblogs), this article likely won’t interest you. As for “comment spam,” Adam Kalsey explains: “Usenet news succumbed to spam long ago. Email was next. Now spammers have turned their attention to weblogs and comment forms. In order to increase search engine rankings you are posting advertisements to our Web pages.” Note, his “manifesto” was written over a year ago, and the problem has only gotten worse since then. Much, much worse.
In fact, it’s now stressing web servers so greatly that a number of hosts are shutting down comments in Movable Type, or shutting down Movable Type itself. So, if you run a weblog using Movable Type, and have comments enabled (even with MT Blacklist, as you’ll see below), you’ve got a problem. Or rather, you may be causing one at your web host, and you may get shut down with no notice.
I’ve had to restore permissions on MT for a friend who got shut down at Pair due to comment spam, and the server my site is on at TextDrive has been taken down (briefly) twice in the past week or so by thousands of MT processes run amuck. And we’re talking industrial strength web servers, like the one discussed below:
You mean the persistent disruptions of my weekend luncheons? Yes.
All MT. All mt-comments.cgi.
We typically run at a [server] load less than 1 just about everywhere. Despite having millions of emails and millions of web hits every day now.
One.textdrive.com is a relatively expensive server with dual xeons, 6GB hard wired RAM, 6×73GB RAID-arrayed SCSI 10K Seagate Cheetahs and a FreeBSD kernel optimized for running web and database servers. It’s also on a 100Mbps switch with two Gigabit ethernet cards. I have a Linux-threaded static build of MySQL and I load that build into 1GB of RAM when the database server loads up. I have a fast static build of Apache in front of that.
The server is designed to be the kind of thing that could do 20,000,000 requests a day without a problem. In many ways no one uses servers like these for shared hosting.
If you were to lease one.textdrive.com’s hardware and its network connection, it would cost you about $12,000+ a year. If you were to contract someone like myself to build it and get it up and running, it would cost you about $10,000. If you were to hire someone like me to watch over it, it would cost you well into the six figures a year.
So that said…
MT-comment.cgi, its inherent nature and the fact that it’s so targeted can push this server to loads of nearly 300…
Jason at Textdrive Forum: MT comments on one are off
Jason also says “MT-Blacklist sucks,” and has the screenshots and figures to illustrate (a site that gets 300-400 legitimate comments per month is getting over 46,000 hits on mt-comments.cgi, while MT-Blacklist has less than 200). And it’s not just at TextDrive, as Joe Katzman reports:
...We have taken other measures to combat the 20,000 comment spam attempts we’ve seen in the last 2 weeks. For reasons we’re still trying to figure out, the spams are causing problems for our hosts at Total Choice Hosting due to server load.
Over the last 2 weeks, Blacklist may have blocked 18,000 spams, but it also forced moderation of another 2,000 or so, and in many cases they were already-blacklisted items that got through due to a flaw in the system.
Blacklist also changes in one more important way. Instead of comparing new comments against a text file blacklist, it stores the blacklist items (in our case about 2,750 items and 60 programmed “catch alls” for various things, and we aren’t unusual) in MySQL. This forces MySQL database calls whenever a comment is submitted. 1500 database hits a day may not mean much, but if you get 100 from various IPs in about 10 seconds, is that a problem? I don’t know what’s happening elsewhere, but it has been a problem for us at Total Choice Hosting.
Meanwhile, if you’re considering following in our recent technical footsteps [an “upgrade” to MT 3.x and Blacklist 2.0], a word of friendly advice: DON’T.
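To make the database point above concrete, here is a minimal sketch of the alternative: load the blacklist once and check each comment in memory, rather than querying MySQL on every submission. This is not how MT-Blacklist works internally. The function names and the sample entries are hypothetical, and the sketch assumes a long-running process, which a plain CGI script like mt-comments.cgi is not (CGI starts a fresh process per request).

```python
import re

def load_blacklist(lines):
    """Compile blacklist entries once, at startup, instead of per comment."""
    patterns = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comment lines
        patterns.append(re.escape(entry))  # treat each entry as a literal string
    if not patterns:
        return None
    # One combined, case-insensitive pattern means a single scan per comment.
    return re.compile("|".join(patterns), re.IGNORECASE)

def is_spam(comment_text, blacklist):
    """Return True if the comment contains any blacklisted string."""
    return blacklist is not None and blacklist.search(comment_text) is not None

# Hypothetical entries, for illustration only.
BLACKLIST = load_blacklist([
    "# sample blacklist",
    "cheap-pills.example",
    "texas-holdem.example",
])

print(is_spam("Nice post! Visit http://cheap-pills.example/ now", BLACKLIST))
print(is_spam("I disagree with your second point.", BLACKLIST))
```

With a few thousand entries, a check like this costs one regular expression scan per comment and no database round trip. Whether that would have been enough to save a server under a real attack is a separate question.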
And elsewhere, from Eric Rice: “I woke up this morning to pissed-off people. My ISP was faced with Movable Type (MT) scripts bringing down the server, as the newest wave of comment spam seemed to make anything near this server not respond — DNS, Email, WWW, SSH — you name it [...] The culprit? MT-comments.cgi was running tons of processes, taking half the memory for a light-trafficked community blog site.”
And from Chris Lehmann:
I came home and found us in the middle of a blog spam attack. The load on our machine was up around 200 by the time I was able to get a root prompt and shut down httpd. I had to let the machine calm down, restart httpd and then log into MT to be able to add the offending spammer to our blacklist. Each time I restarted httpd, the attack started again, and I had to shut it down after getting one step closer.
The whole process took about an hour of my life that I can’t have back.
Now, right now, Beacon can’t afford to upgrade to MT 3.1. So either we have to turn off comments, stop blogging or deal with the fact that we all will have really disgusting email messages in our inboxes every day and have periodic shutdowns of our system.
Elise Bauer has become well known for her helpful tutorials on using MT, and she says “Spammers are getting more aggressive with Movable Type blogs everyday. I have found the only really effective measure to completely block spam is the combination of using MT3 and TypeKey to require approval of comments before they are posted and the MT-Blacklist to keep your inbox from being swamped by hundreds of spam comments waiting for approval.”
I don’t mean to criticize Elise, as I’m sure what she says is true. But from what I’ve seen, Typekey is a real barrier to a lot of would-be commenters, who will simply go away rather than sign up (plus those who do create an account, and still have trouble leaving a comment). And then if the visitor successfully jumps through that hoop, the only way left for the site owner to block spam is to approve each and every comment manually, even though you’re running server hogging “automated” apps? That’s not a commenting system, that’s a commenting bureaucracy. On the front end, you’ve got a hurdle for the visitor who just wants to bang out their thought, and on the back end you’ve got a hurdle for the site owner, who must manually approve each and every comment. Like it’s a job, or something.
If it has truly come to that, why not just put an e-mail link at the bottom of each article that says “if you have a comment, please e-mail it to me,” and then paste them into a passworded form yourself. Because that is in essence what you’re doing with the above “system.”
In the saddest irony, posts from the developers themselves are afflicted with trackback spam (scroll to the bottom). Including the one announcing that Jay Allen is joining them to combat comment spam. I’ve watched spam appear on those threads, be removed, and then one week later, it’s back. On the developer’s site.
Not very encouraging, since Six Apart is the obvious direction MT users turn when they face this issue, and many of them have paid money for a license this year. To Six Apart’s credit, they did hire Jay Allen, and one would assume they’re hard at work on a solution. I don’t know, I’m not a programmer, but it seems to me they are so deep in the Perl Soup that the whole MT Community may take some heavy blows before any viable solution is publicly available. It’s going to take more than a band aid. Again, I’m no programmer, but it seems that as long as the rendering of comments involves static files rather than dynamic display (and as long as there’s no throttling of MySQL requests under the stress of a spam attack), there will continue to be big issues.
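For what "throttling" might look like in practice, here is a rough sketch of a sliding-window limiter placed in front of the comment handler. It is an illustration under stated assumptions, not anything Six Apart has announced: the class name and the limit of 20 submissions per 10 seconds are made up. The limit is global rather than per IP address, since the attacks described above arrive from many addresses at once.

```python
from collections import deque

class CommentThrottle:
    """Allow at most max_hits comment submissions per sliding time window."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = deque()  # timestamps of recently accepted submissions

    def allow(self, now):
        # Drop timestamps that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.max_hits:
            return False  # over the limit: reject before any database work
        self.hits.append(now)
        return True

# Simulate 100 submissions arriving over roughly 10 seconds.
throttle = CommentThrottle(max_hits=20, window_seconds=10)
results = [throttle.allow(now=i * 0.1) for i in range(100)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

In this simulation the first 20 submissions get through and the other 80 are turned away without ever touching the database. The obvious cost is that a legitimate commenter who shows up mid-attack gets turned away too.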
But I don’t think you can lay all of this on Six Apart. First of all, how can you damn them for the fact their software has become so popular? It is, in effect, the “Outlook” of blogging tools, and it is therefore targeted by Black Hats, just as Outlook is. For the same reason. Simple predominance. It’s hard to blame Six Apart for that.
Then, of course, you have the cockroaches of the Internet, the spammers themselves, who deserve nothing less than to be cast into a swirling sucking pit of despair, where they will spend eternity taking 80% of the overdose level of Phentermine, Viagra, and Rogaine, while being forced to play a version of Texas No Hold ‘Em Poker where the losers will get forcible breast implants … if they’re male … and penis enlargements for the rare spamming woman.
Finally, we have Google. In fact, they are the Patient Zero of this plague. These spammers leave comments with links in weblogs because many weblogs have a relatively high Page Rank (the way Google sorts returns for any search), and by creating a link within that highly rated site, they “steal” some of that Page Rank, in hopes of increasing their own search returns for their various nefarious schemes.
I hear those guys at Google are pretty smart, and gots lots of computers. I’m betting they could figure out a way to filter these spam comments from their index, based on keyword or URL, or even establish a common protocol where anything wrapped in a certain tag/id/class (like the whole list of comments) would have no URLs indexed by the Googlebot. Like I said, they’re pretty smart, and I feel certain they could provide a solution to this.
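To show the tag/id/class idea is at least mechanically simple, here is a small sketch of a crawler-side link collector that skips every link inside a marked element. No such protocol exists as far as this article knows. The class name `no-index-links` is invented for the example, and this says nothing about how the Googlebot actually parses pages.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect hrefs, skipping any found inside an element marked IGNORE_CLASS."""

    IGNORE_CLASS = "no-index-links"  # hypothetical marker, not a real standard
    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no close tag

    def __init__(self):
        super().__init__()
        self.links = []
        self.ignore_depth = 0  # greater than 0 while inside a marked element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.ignore_depth:
            if tag not in self.VOID:
                self.ignore_depth += 1  # track nesting inside the marked element
            return
        if self.IGNORE_CLASS in classes:
            if tag not in self.VOID:
                self.ignore_depth = 1
            return
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.ignore_depth and tag not in self.VOID:
            self.ignore_depth -= 1

PAGE = """
<div class="post"><a href="http://example.org/cited">source</a></div>
<div class="comments no-index-links">
  <p><a href="http://spam.example/pills">great post</a></p>
</div>
<a href="http://example.org/about">about</a>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```

The link in the post body and the one after the comments are collected; the one inside the marked comment list is not. The hard part of such a scheme would be getting everyone to agree on the marker, not the parsing.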
So … why haven’t they? Well, my guess is because Google owns Blogger. And Blogger competes with MT. So why would Google go out of its way, or be in any hurry at all to help a competitor with this problem? Perhaps especially now that it’s causing MT to be shut down in some places.
Meanwhile, in just my limited personal experience or reading over the past two weeks, five hosts have in some way disabled MT or MT comments because of the server load it was creating. Not five little Mom & Pop hosts; at least three of them I’d consider serious to top-notch hosts. One of them, Pair, has been around forever, has a serious rep, and as late as this May, claimed they were very “MT friendly.” Yet last week I had to log in to a disabled MT install at Pair, and get the permissions back up again so the author could switch over to Typekey. Which has generated problems and complaints from visitors.
If you’re an MT user, I’m not sure what advice to give you. I still use MT, but for two “miniblogs” in the sidebar with no comments, and one that does have comments enabled, but only gets a legitimate one maybe once per month (it’s never gotten spammed either). Still, my primary weblog, what you’re reading right now, is powered by Textpattern. In addition to its various built-in spam countermeasures, it’s a dynamic system rather than static. No rebuilds. WordPress is similar, in that it is dynamic and has spam countermeasures.
Of course, neither app’s user base is as large as MT’s, and therefore they are less targeted. That could well change. But if it does, I think both have some advantages that MT lacks. Both are open source, and both have devoted communities. The collective response to a serious problem would be organic, broad based, and swift. With Six Apart, it’s not open source, it’s a corporation’s property. Though they’ve been hiring coders at an increasing clip, Six Apart still has a limited number of man hours to throw at any problem, given the profit demands of product growth.
While I very much hope Six Apart can pull a mean fanged rabbit out of their hat, I wonder if they can do it in time. The problem is rapidly escalating, and users are having their sites shut down. Afterwards, they face the same problem with a choice of MT solutions that is very limited, and overly complex for many users.
Frankly, I’m out of the MT support business, simply because I’m now dangerous. I don’t know MT 3.x (just up to v2.6, for me), don’t know Typekey, and don’t know MT-Blacklist. Don’t wanna know. Don’t need to know. I’d rather spend the time learning about things with which I can earn money, and I think it’s pretty conclusive that there will never be a Blogger Caste living in mansions in Beverly Hills.
You, however, may be an MT user who hasn’t had a problem. Or, have only had a few comment spams. With the number of blogs out there (especially with less than attractive Page Ranks), the odds are in your favor. You just have to wait and see. And the same with a new solution from Six Apart. I have no doubt they will do something. You just have to wait and see what it is, and how long it takes.
Just know that the e-mail could come with no warning: “Due to problems it was causing on the server, we were forced to disable the following script located in your account…”
So you can wait. Or you can move forward. You’re going to invest time in it either way, so pick yer poison.
And be aware that MT is becoming less than popular on many web hosts out there on the InterWeb. It’s taken down this site twice in a couple of weeks, and this site wasn’t under attack. Its MT-using server-mates were.
Later: An example of how Typekey and MT-Blacklist can frustrate the hell out of a user.
Even Later: From Anil on the Six Apart site: “There are a variety of ways to deal with spam, ranging from technical to legal to social methods, and we’ll discuss them all [...] We’ll have more details today, and a full overview within 48 hours.”
Important Update: The solution is coming.