I’ve talked about this before (Spam/Splog Wars) but instead of telling you how to fight spam, I’m going to talk about how it works.
There are two main kinds of spambots, email and forum. Forum bots are a misnomer, as they’re really ‘posting’ bots, but we won’t get into the semantics of names. These posting bots basically are written to scan the web, via links, looking for anything that accepts data (like a comment form on your blog) and submit spam content. There is a limited number of ways for data to be entered on a website, and generally it’s all done via a form. All a bot has to to so is search the code of the page for form data and input it’s crap content. Given that coders tend to label forms things like ‘content’ and ’email’ and ‘username’ for ease of support and modification, it’s not too hard to see how the spambots are written to exploit that functionality.
The main ways to prevent spam are to block the spammers or make users prove their real people. Obviously manual spammers (and yes, there are people paid to post spam on blogs) can’t be defeated this way. Blocking spammers is either done by their behavior (if they access your site via anything other than a browser, or if they post data in any way other than from the ‘post comment’ link on your site, they’re probably a spammer) or by a known-bad-guy list. Those both have problems, as spammers have multiple methods of attack. Making a person prove their humanity is generally done via CAPTCHA, which I’ve mentioned before has its own issues. (CAPTCHA Isn’t Accessible ) Another way is to go at it in reverse. Knowing that the spam bots scan a webpage for fields they can fill in, you set it up so there’s a hidden field which only a spammer would fill in. If they fill that field in, they’re spam and they go to the bin.
I think that our methods for blocking spammers is wrong.
Everything we’re doing is defensive. We wait for a spammer to attack, sort out what they did, and write better code to stop them. Sure, we have some Bayesian spam tools out there, and some fairly impressive algorithmic learning engines that check the content of a post and punt it to the trash. But in the end, it’s all reactive medicine.
Anyone who does support will tell you that even one email or comment sent to the spam bin accidentally is bad. How bad? How many times have you said ‘Did you check your spam emails?’ to someone who didn’t get an email? Any time you impede your real users, the regular people who visit your site, the worse your spam-fighting is. If a user has to jump through hoops to comment, they’re not going to do it. If a user has to register to post, they probably won’t do it (forums are an exception to this).
The goal of spam fighting should be to prevent the spam from being sent.
We all know why spammers spam. If even one person clicks on their links and buys a product, they’ve profited. It’s just that simple. They make money for a low-cost output, it works, so they keep doing it. Therefore the only way to make them stop is to remove the profitability. Of course if it was that easy, we’d have done it already.
This begs the question ‘Who runs these spam bots?’ It’s two fold. There are people out there who run spamming programs, certainly, but there are also people who’ve had their computers infected by a virus or malware, and unknowingly spam people. That’s part of why I don’t like blocking IPs. However the software had to come from somewhere. Cut off the head of the snake and the spam stops! We already have the Spamhaus project, which patrols the internet and looks for sites pushing malware, as well as sending spam email. There’s also Project Honey Pot, which fakes links and forms to trick a spam-bot.
Sadly, until we have more resources dedicated to stopping spam service providers, we’re going to have to keep trying to out think them and play defense.