Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: spam

  • CAPTCHA Isn’t Accessible

    CAPTCHA Isn’t Accessible

    I’m just going to start this with a possibly startling fact. PWNtcha can break 90% of known CAPTCHA algorithms. If that doesn’t tell you why they’re totally useless, then I don’t know what will.

    It’s no secret that I detest and will not use CAPTCHA on any site I build. I have a math-test on one site where I get a lot, but that’s as far as I’m willing to get into that world. People often ask me why I hate it, and I tell them that it doesn’t work and it’s bad for accessibility. The fact that it doesn’t work is proven by PWNtcha pretty well, but the concept that it’s bad for accessibility seems to be lost on a lot of people.

    Screenshot from Star Trek episode 'Wink of an Eye' where Kirk is ordering dinner from the computer CAPTCHA stands for Completely Automated Public Turing test to Tell Computers and Humans Apart. In the begining, it was a great idea. The computer world had just started to try and make AI, and the first attempts at that on the Internet was to put little bots out that talked to people, asking and answering questions. That, in itself, is pretty damn cool, I agree. With working AI, we’re one step closer to ‘Computer, I’d like a bottle of Chateau Picard’s chardonnay, chilled to 68 degrees Fahrenheit, and play some Barry White at volume level 3.’ (illustrated to the right). AI is a great concept. But. What we actually got was people thinking ‘Wouldn’t it be cool if I made something that listened for key phrases and told them about my cool product?’ Basically, spam.

    An early defense against spam was that you had to enter a CAPTCHA code, which showed a picture with letters and numbers, and you entered those letters and numbers into a text field. The magic CAPTCHA verified they were the same and let you in. Pretty cool, right? Except that if there was a way for CAPTCHA to compare the image to the text you entered, then there had to be a way to reverse engineer that so a spam bot could read and enter the same code. Ever since then, it’s been an ongoing fight to make a better mousetrap.

    See, a human can easily read CAPTCHA like these:
    captcha examples that don't matter, suffice they're readable by most sighted people

    But the best ones, the ones that can’t be solved by computers, the ones that even PWNtcha says will last for a long time, are ones I look at and wince:
    captcha examples that don't matter, suffice they're totally unreadable by most sighted people

    Clearly if you make it good enough that a computer can’t crack it, you make it harder for a human to be able to understand it. In that one moment, anyone who has limited vision can’t access your site. Which means you’ve lost a visitor. If this is your business, you’ve lost revenue. And if you think there aren’t a lot of people that this will keep out of your site, think about how many people you know with some form of dyslexia. Think about how many people over the age of 40 (the age at which most of us need reading glasses) visit your site. Even if you run a trendy under-30 store, grammy may want to buy junior a new hip shirt. And don’t even pretend that older people don’t matter. Remember how long ago you were in College? Yeah, you’re getting older too, buddy.

    So they don’t work, they keep real people out of your site, and did I mention you probably don’t need it? I’ve been running ipstenu.org for a very long time (on Internet time – it’s been over a decade). I’ve had less than 20 spam posts show up on my site. None since I turned on comment approval (where I must approve your FIRST comment, but after that, you’re free to post). Akismet has caught about 50k spam posts. Bad Behavior’s caught even more (100k at last gasp) and only two ‘real’ people have ever complained about being caught (one had a virus, one had a bad firewall at school). Sure, if you’re Yahoo, you might need it, but did you know the ‘unreadable’ examples I used above were from Yahoo? Yeah. Google has a pretty basic, easy to read one, and so does Twitter. Facebook has too many, and they’re annoying. They actually probably don’t need them, either.

    Turn off your CAPTCHA. Your users will thank you.

    Continued Reading
    Inaccessibility of CAPTCHA – W3.Org
    It’s Official: Captchas Are Bad for Business – The ZURBlog
    Why you should never use a CAPTCHA – Online Aspect
    CAPTCHA Effectiveness – Coding Horror

  • Spam / Splog Wars

    Spam / Splog Wars

    If you run a blog where anyone can comment, no matter what the software du jour is, you’ve had spammers. There’s really no way around it. As soon as someone comes up with a great, easy, way for you to share content and open discussion with other people, the door opens for people to use that great, easy, way to spam you.

    A lot of people take spam for granted. We get junk mail, we get telemarketers, we get spam. It’s a constant of life. And spam is just like junk mail and telemarketers. They want to get your attention, they want you to click on their links, and they want your money. Some spammers link to actual products (usually sex products) and others link to sites that will infect your computer with a virus. The end result, oddly enough, is the same. They want you, and your visitors, to click on their links and somehow make a profit. The cost overhead is so low that even if just one of my visitors clicks on a link and buys something, they’ve made a huge profit. (By the way, if a post looks like spam, don’t click on their links. Only give your money to reliable companies. Research them, ask around. Be smart.)

    A spammer posts spam comments or uses your site to propagate their crap. A splogger is a spam-blog. A blog that only exists to pimp the same crap. If you have an open-registration CMS site where anyone can make a blog, you will get sploggers. Some people will argue that WordPress is less secure because it gets more splogs than Joomla. I would disagree. More people use WordPress’s built in ‘anyone can make a blog!’ feature than Joomla, so it’s a better target if you’re a spammer. You’re going to get more bang for your spam-buck, so you aim for the biggest target.

    No matter what type of tool you use for spam-trapping, remember that the best tool you have is your eyes, your brain, and your common sense. YOU are the number one, best defense, against spammers. Yes, this means you have to give up of your free time to maintain the site, to monitor the new blogs, to monitor the new comments and users, and stop them. While some posts can be hard to determine if their spam, if you check the email, the URLs and the context, usually you can sort them out.

    Unless you have a dedicated team of people monitoring your site for trouble, it’s hard to keep up. This is where I start throwing in tools to help my site. There are three levels of defense: Server, Account and Application. I’m not linking to many tools, since a lot of this is preference. I like certain tools, other people like other tools. At the end of the day, no matter how good your tools are, the human element is required to be attentive and aware of the site. I’ve blogged before on the dangers of an unchecked multisite, and they remain true. Running a website is work. If you’re not willing to put the time in to maintain, monitor, and defend your site, something bad will happen to your site.

    Server Level

    Set up a good firewall. I use ConfigServer Security & Firewall, which checks against Mod Security and bans people who hit it too hard. This prevents a lot of automated spammers and also stops them, once they GET in, from being able to send out spam emails. A good firewall does wonders for other reasons too, but only if you configure it correctly. ConfigServer has a test it can run to see if your setup is good or middlin’ or poor, and I check it every time I upgrade. Oh yeah, keep current with your firewall tool, too!

    Account Level

    I hesitate on this one, but .htaccess can be used to ban IPs. I don’t like to do this and, generally speaking, don’t do this. If someone skirts by my firewall, I’m not going to block them at the IP level, since there are probably some legit users. Also, the firewall is automated, whereas my .htaccess I’d have to manually update. The point of a good tool is that you don’t have to fiddle-fart around manually too much! That said, there are ways you can kill spammers and sploggers via the .htaccess.

    D’Arcy Norman came up with this awesome way to stop them on WordPress Multisite:

    # BEGIN ANTISPAMBLOG REGISTRATION
    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} POST
    RewriteCond %{REQUEST_URI} .wp-signup.php*
    RewriteCond %{HTTP_REFERER} !.yourdomain.tld. [OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteRule (.*) http://someotherpage.tld/ [R=301,L]
    

    Credit: D’arcy Norman.net

    The part I like best is that if you change wp-signup.php to where ever it is your site has a signup page, you can make this work with just about anything. What it does is check the POST requests (which is the server request when you submit a form) for the page wp-signup.php. If those requests do not come from your domain (which if you click on the sign-up button, they must), then it sends them to http://someotherpage.tld/. I send them to a page that says they’ve been caught as a spammer, and to behave better.

    If you must use an .htaccess blacklist, I would strongly suggest you follow D’Arcy’s advice and block them from accessing your registration page only. His thermonuclear option is above and beyond what I need, but I can see it being useful.

    Application Level

    Once you’ve set up your server and account as best you can, you have to start modifying your application. Thankfully, any modern CMS has a good plugin or extension system, and you can leverage that. Sadly, most modern CMS have the same problem of keeping up with the spammers. The tools used are pretty much the same.

    Most CMS have a limited array of built-in tools. WordPress has a way to make all first-time posts require approval for example, but really when you get down to it, you want to throw in some plugins/extensions/modules that are designed to help.

    Content Moderation

    Blacklists are simple. Here is a list of people I don’t want to have access my site. Done. There are the usual caveats for this and the same warning applies as with .htaccess IP blocking: you may block legit users. Personally I prefer moderation lists versus delete blacklists. They put the possible spammer into a bin for me to review and approve or not. Blacklists and mod lists work best, I’ve found, for spam comments rather than splogs. I know, normally, no one on my site will be talking about viagra, but what happens if they have a question about it? The term is on my moderation list. (Funny but true story. I had the word ‘sex’ on a mod filter on another site. Suddenly people were talking about how sexy someone was, and all the posts hit my mod filter. Sometimes these terms are great to block out, but sometimes you forget how they’re really used.)

    That said. There are good, reliable, blacklists. Stop Forum Spam and Project Honey Pot both are blacklists maintained by the community, and get enough information that they can be use reliably. There’s also Spamhaus, but that’s mostly email. You can use these on most applications as well.

    Behavior

    There’s actually a plugin called Bad Behavior, which is a great example of what I mean. Bad Behavior the app analyzes the HTTP headers, IP address, and other metadata regarding the request to determine if it is spammy or malicious. What I mean by the term is simply ‘Is the visitor coming to my site from a legit browser?’ See, spammers don’t use the normal browsers that you and I use, so if someone comes from a special method to my site, they’re probably not a good person.

    Behavioral monitors learn as they go, or as they’re updated, and can be pretty effective. Sometimes people on archaic browsers (IE 6, for example) will get nailed by false positives. There’s not a whole lot I worry about with those, since any browser since 2005 is usually safe to go. There are two people who hit my sites regularly who’ve run into this problem, and once I sorted out what they were doing (Netscape Navigator 4 came out in 1998 and IE 6 came out in 2001), I told them ‘Look, the site will look like crap anyway. Upgrade.’ One guy couldn’t, the other switched to FireFox and admitted to being much happier anyway. (I don’t believe in supporting browsers pre-2005 at this point. I monitor my site stats for them, but realistically, it’s not going to happen. Upgrade, people. You’ll be happier.)

    CAPTCHA

    Personally I hate these. CAPTCHA is named after ‘capture’ and is a contrived acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It looks like this:

    The idea behind CAPTCHA is that it should provide a problem easy enough for all humans to solve while, at the same time, prevent standard auto-fill software from filling out the form. Sounds great, but the problem is people have created software to read the CAPTCHA files. Personally, I love the fact that we’re making AIs smart enough to parse this stuff, but it hurts CAPTCHA because in order to defeat the AIs, they tend to become harder and harder to read for real people. Also, most CAPTCHAs are not friendly to people with limited accessibility. If you have dyslexia or glaucoma, they’re of the devil. I would never consider using one on my site unless forced to.

    Human vs Computer Test

    Originally CAPTCHA just meant any challenge/response to stop automated form fillers. Since it now is used, almost exclusively, to refer to those images, I’m pulling out Human Tests. A human test is when you have a question, or form, that requires thought to answer. Like ‘Who’s buried in Grant’s Tomb?’ We all know the answer is ‘Grant’ so you type that into a text field when you register or comment, and magically you have access. I’m fond of simple math questions like ‘What is 12 + 8?’ but also good are site specific questions. One on my friend’s site is about Marg Helgenberger, and they use questions about Marg (like ‘what’s the last name of her character on CSI?’).

    The reverse is to trick the computer. Put a hidden checkbox on your site, that is NOT human readable, and if that box is checked, aha! Spammer! That’s a pretty cool trick if you can make it work.


    Do you have tips and tricks you use?