Referrer Madness

Everyone’s heard of Semalt by now. They are, weirdly, an actual company run by actual people, who are entirely weird and annoying.

I should explain. I’ve talked to them over email and Twitter, and I’ve read about them all over the net, like everyone else has. They’re an ‘SEO’ company that trawls the net via bots, just like Google and everyone else, tracking you and your competitors. Here’s how they explain it:

Semalt is a professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.

That sounds vaguely legit when you look at it on the surface. They’re based in Ukraine, which explains the imperfect English, and showed up right around the time Russia was invading, so most of us made Putin jokes and moved on. They’re not actually doing anything bad. They’re just acting like a regular bot, scanning your site…

Except they’re not.

A coral reef

When asked, they’ll tell you that Semalt crawler bots visit websites and gather statistical data for their service, simulating real user behavior. Their crawler bots, and yes, they admit they’re bots, don’t click on advertising banners or extend links. And all the visits are automatic and random.

This means their goal is to get a bot that acts like a human. Now I don’t know about you, but I don’t trust anything when I can’t see its brain, and I certainly know better than to believe in true random when it comes to software. But what gets me is how you stop the bot from scanning.

Everyone uses a robots.txt file to block bots from scanning things they don’t need to scan. If you use WordPress and have pretty permalinks on, go to http://example.com/robots.txt and you’ll see a default file, made by WordPress, to block various folders like wp-admin from being scanned.
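To make that concrete, here is a minimal Python sketch of what a well-behaved bot does before it fetches a page. The rules below only resemble the WordPress default (the exact file varies by version and plugins), and both the ExampleBot user agent and the is_allowed helper are names I made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, resembling (not copied from) a default WordPress robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/wp-admin/", "http://example.com/hello-world/"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A polite crawler runs a check like this and skips anything disallowed. Nothing forces a bot to do so, which is the whole problem.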

Semalt ignores these. They also ignore things like bot rate limiting, and they use IPs from around the world to scan your site (arguably to get a better idea of real speed and response), so they end up acting a little like a DDoS attack. Worse, they claim to act like a ‘user’ but I never have a link to my wp-admin pages from the front of my site, which means their bot is checking for WordPress and going there not because a user would have any reason, but simply because Semalt knows WordPress is there.

They have a form you can fill out to have your site removed from their crawl if you want, but we’ve been using robots.txt for years, and I simply fail to understand why their form is ‘better’ than the standard.

Besides that, what’s the real issue here? Semalt is screwing up my stats. They’re using referrer links to check my sites out, which means I have a bunch of referral links like this:

semalt.com/competitors_review.php?u=http://halfelf.org

Those links tell me someone linked to me, and generally I go back and check them out to see if they’re someone I want to talk to or work with. These are not. Worse, they don’t really act like ‘real’ users, despite the claim. Karen Francis has a great explanation as to why Semalt is ruining your bounce rates in Google, and a couple of good ways to block them.
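If you’d rather clean up the numbers yourself, the basic idea is to match on the referrer’s hostname rather than on the whole string. Here is a rough Python sketch, assuming you can get referrers out of your logs as plain strings. BLOCKED_DOMAINS and is_blocked_referrer are hypothetical names, not part of any analytics tool.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; add whatever domains are polluting your stats.
BLOCKED_DOMAINS = {"semalt.com"}


def is_blocked_referrer(referrer: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the referrer's host is a blocked domain or a subdomain of one."""
    if not referrer:
        return False
    # Logged referrers sometimes lack a scheme; add one so urlsplit finds the host.
    if not referrer.lower().startswith(("http://", "https://")):
        referrer = "http://" + referrer
    host = urlsplit(referrer).hostname or ""
    # Match the domain itself or any subdomain, but not look-alikes such as
    # notsemalt.com or a search URL that merely mentions the name.
    return any(host == d or host.endswith("." + d) for d in blocked)


if __name__ == "__main__":
    print(is_blocked_referrer("semalt.com/competitors_review.php?u=http://halfelf.org"))
    print(is_blocked_referrer("https://www.google.com/search?q=semalt.com"))
```

Comparing hostnames instead of doing a substring search means a legitimate link that happens to contain the domain in its query string isn’t thrown out with the junk.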

Am I blocking them? No, not right now. Do I trust them? Not at all. They make it ‘easier’ for someone else to compare themselves to me, which is laudable, but they do it in a way that makes it harder for me to understand how my sites are doing. And that, to me, is the epitome of the goal of all black hat SEO companies. They gain at someone else’s loss.

Comments

  1. OMG I hate them with a passion! I had to block Semalt because they ruined my Google Analytics. It was a bit more difficult since my server runs nginx in front of Apache. So I had to add this to my nginx config, which works:

    if ($http_referer ~* (semalt\.com)) { return 403; }

  2. I see semalt.com as a referrer all the time and based on other traffic I’d guess they’re referrer spammers. For my WP site I use iThemes Security and have blocked #.semalt.com (because after I blocked semalt.com, other referrer variations like semalt.semalt.com and 100.semalt.com and 64.semalt.com showed up). It reminds me of the telemarketing ‘Do Not Call’ list implemented in Canada a few years ago: I swear robocalls increased after I added our phone number.

  3. Hi Mika,

    Thanks for the in-depth information in your post and for the pingback to the research I have done.
    I’ve used Analytics for many years and although the stats occasionally have a skew in them, this is far over the top and makes some of the data I expect to rely on unusable.

    Will I be swapping to another tool for analytics? No, there is too much benefit in the tool Google supplies. I’ll do what I can to take control of the data again and ride out the storm.

    I like your summary sentence “gain at other’s loss”… it’s rather appropriate in this situation.
