Spam / Splog Wars

If you run a blog where anyone can comment, no matter what the software du jour is, you’ve had spammers. There’s really no way around it. As soon as someone comes up with a great, easy, way for you to share content and open discussion with other people, the door opens for people to use that great, easy, way to spam you.

A lot of people take spam for granted. We get junk mail, we get telemarketers, we get spam. It’s a constant of life. And spam is just like junk mail and telemarketers. They want to get your attention, they want you to click on their links, and they want your money. Some spammers link to actual products (usually sex products) and others link to sites that will infect your computer with a virus. The end result, oddly enough, is the same. They want you, and your visitors, to click on their links and somehow make a profit. The cost overhead is so low that even if just one of my visitors clicks on a link and buys something, they’ve made a huge profit. 1

A spammer posts spam comments or uses your site to propagate their crap. A splogger is a spam-blog. A blog that only exists to pimp the same crap. If you have an open-registration CMS site where anyone can make a blog, you will get sploggers. Some people will argue that WordPress is less secure because it gets more splogs than Joomla. I would disagree. More people use WordPress’s built in ‘anyone can make a blog!’ feature than Joomla, so it’s a better target if you’re a spammer. You’re going to get more bang for your spam-buck, so you aim for the biggest target.

No matter what type of tool you use for spam-trapping, remember that the best tool you have is your eyes, your brain, and your common sense. YOU are the number one, best defense, against spammers. Yes, this means you have to give up of your free time to maintain the site, to monitor the new blogs, to monitor the new comments and users, and stop them. While some posts can be hard to determine if their spam, if you check the email, the URLs and the context, usually you can sort them out.

Unless you have a dedicated team of people monitoring your site for trouble, it’s hard to keep up. This is where I start throwing in tools to help my site. There are three levels of defense: Server, Account and Application. I’m not linking to many tools, since a lot of this is preference. I like certain tools, other people like other tools. At the end of the day, no matter how good your tools are, the human element is required to be attentive and aware of the site. I’ve blogged before on the dangers of an unchecked multisite, and they remain true. Running a website is work. If you’re not willing to put the time in to maintain, monitor, and defend your site, something bad will happen to your site.

Server Level

Set up a good firewall. I use ConfigServer Security & Firewall, which checks against Mod Security and bans people who hit it too hard. This prevents a lot of automated spammers and also stops them, once they GET in, from being able to send out spam emails. A good firewall does wonders for other reasons too, but only if you configure it correctly. ConfigServer has a test it can run to see if your setup is good or middlin’ or poor, and I check it every time I upgrade. Oh yeah, keep current with your firewall tool, too!

Account Level

I hesitate on this one, but .htaccess can be used to ban IPs. I don’t like to do this and, generally speaking, don’t do this. If someone skirts by my firewall, I’m not going to block them at the IP level, since there are probably some legit users. Also, the firewall is automated, whereas my .htaccess I’d have to manually update. The point of a good tool is that you don’t have to fiddle-fart around manually too much! That said, there are ways you can kill spammers and sploggers via the .htaccess.

D’Arcy Norman came up with this awesome way to stop them on WordPress Multisite:

# BEGIN ANTISPAMBLOG REGISTRATION
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} .wp-signup.php*
RewriteCond %{HTTP_REFERER} !.yourdomain.tld. [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule (.*) http://someotherpage.tld/ [R=301,L]

Credit: D’arcy Norman.net

The part I like best is that if you change wp-signup.php to where ever it is your site has a signup page, you can make this work with just about anything. What it does is check the POST requests (which is the server request when you submit a form) for the page wp-signup.php. If those requests do not come from your domain (which if you click on the sign-up button, they must), then it sends them to http://someotherpage.tld/. I send them to a page that says they’ve been caught as a spammer, and to behave better.

If you must use an .htaccess blacklist, I would strongly suggest you follow D’Arcy’s advice and block them from accessing your registration page only. His thermonuclear option is above and beyond what I need, but I can see it being useful.

Application Level

Once you’ve set up your server and account as best you can, you have to start modifying your application. Thankfully, any modern CMS has a good plugin or extension system, and you can leverage that. Sadly, most modern CMS have the same problem of keeping up with the spammers. The tools used are pretty much the same.

Most CMS have a limited array of built-in tools. WordPress has a way to make all first-time posts require approval for example, but really when you get down to it, you want to throw in some plugins/extensions/modules that are designed to help.

Content Moderation

Blacklists are simple. Here is a list of people I don’t want to have access my site. Done. There are the usual caveats for this and the same warning applies as with .htaccess IP blocking: you may block legit users. Personally I prefer moderation lists versus delete blacklists. They put the possible spammer into a bin for me to review and approve or not. Blacklists and mod lists work best, I’ve found, for spam comments rather than splogs. I know, normally, no one on my site will be talking about viagra, but what happens if they have a question about it? The term is on my moderation list. 2

That said. There are good, reliable, blacklists. Stop Forum Spam and Project Honey Pot both are blacklists maintained by the community, and get enough information that they can be use reliably. There’s also Spamhaus, but that’s mostly email. You can use these on most applications as well.

Behavior

There’s actually a plugin called Bad Behavior, which is a great example of what I mean. Bad Behavior the app analyzes the HTTP headers, IP address, and other metadata regarding the request to determine if it is spammy or malicious. What I mean by the term is simply ‘Is the visitor coming to my site from a legit browser?’ See, spammers don’t use the normal browsers that you and I use, so if someone comes from a special method to my site, they’re probably not a good person.

Behavioral monitors learn as they go, or as they’re updated, and can be pretty effective. Sometimes people on archaic browsers (IE 6, for example) will get nailed by false positives. There’s not a whole lot I worry about with those, since any browser since 2005 is usually safe to go. There are two people who hit my sites regularly who’ve run into this problem, and once I sorted out what they were doing (Netscape Navigator 4 came out in 1998 and IE 6 came out in 2001), I told them ‘Look, the site will look like crap anyway. Upgrade.’ One guy couldn’t, the other switched to FireFox and admitted to being much happier anyway. 3

CAPTCHA

Personally I hate these. CAPTCHA is named after ‘capture’ and is a contrived acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It looks like this:

The idea behind CAPTCHA is that it should provide a problem easy enough for all humans to solve while, at the same time, prevent standard auto-fill software from filling out the form. Sounds great, but the problem is people have created software to read the CAPTCHA files. Personally, I love the fact that we’re making AIs smart enough to parse this stuff, but it hurts CAPTCHA because in order to defeat the AIs, they tend to become harder and harder to read for real people. Also, most CAPTCHAs are not friendly to people with limited accessibility. If you have dyslexia or glaucoma, they’re of the devil. I would never consider using one on my site unless forced to.

Human vs Computer Test

Originally CAPTCHA just meant any challenge/response to stop automated form fillers. Since it now is used, almost exclusively, to refer to those images, I’m pulling out Human Tests. A human test is when you have a question, or form, that requires thought to answer. Like ‘Who’s buried in Grant’s Tomb?’ We all know the answer is ‘Grant’ so you type that into a text field when you register or comment, and magically you have access. I’m fond of simple math questions like ‘What is 12 + 8?’ but also good are site specific questions. One on my friend’s site is about Marg Helgenberger, and they use questions about Marg (like ‘what’s the last name of her character on CSI?’).

The reverse is to trick the computer. Put a hidden checkbox on your site, that is NOT human readable, and if that box is checked, aha! Spammer! That’s a pretty cool trick if you can make it work.


Do you have tips and tricks you use?

About these ads

Notes:

  1. By the way, if a post looks like spam, don’t click on their links. Only give your money to reliable companies. Research them, ask around. Be smart.
  2. Funny but true story. I had the word ‘sex’ on a mod filter on another site. Suddenly people were talking about how sexy someone was, and all the posts hit my mod filter. Sometimes these terms are great to block out, but sometimes you forget how they’re really used.
  3. I don’t believe in supporting browsers pre-2005 at this point. I monitor my site stats for them, but realistically, it’s not going to happen. Upgrade, people. You’ll be happier.
StudioPress Theme of the Month

Comments

  1. Great post. I’m not sure that I agree with blocking out pre 05 browsers but you have some really useful hints for wordpress multi site installations. Particularly liking the .htaccess option!

    • It’s 2011 in 22 days. Name me a pre-2005 browser that people are still using, besides Lynx.

      Anything pre 2005 doesn’t support HTML 4 and CSS completely, and lord knows not HTML 5. It’s old, out of date and should be upgraded. It’s like VHS. :)

  2. Use Akismet… It works fine…

Trackbacks

Half-Elf? Try Half OFF WordPress ebooks!