Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: analytics

  • Monstrous Site Notes

    Monstrous Site Notes

    If you have a MonsterInsights Pro or Agency license, you have access to Site Notes.

    They’re a great way to automagically connect your traffic reports to ‘things’ you’ve done on your site. The problem is that Site Notes are, by default, manual. To automate them, you need yet another plugin.

    Now to a degree, this makes sense. While the out-of-the-box code is pretty clearcut, there’s one ‘catch’, and it’s categories.

    The Basic Call

    The actual code to make a note is pretty simple:

    		$note_args  = array(
    			'note'        => 'Title',    // The note text
    			'author_id'   => 'author',   // A WordPress user ID
    			'date'        => 'date',     // The date for the note
    			'category_id' => 1,          // Where 1 is a category ID
    			'important'   => false,      // true or false
    		);

    		monsterinsights_add_site_note( $note_args );
    

    But as I mentioned, category_id is the catch. There isn’t actually an interface that tells you what those IDs are. The automator tools hook in and set that up for you.

    Thankfully I know my WP-CLI commands, and I can get a list:

    $ wp term list monsterinsights_note_category
    +---------+------------------+-----------------+-----------------+-------------+--------+-------+
    | term_id | term_taxonomy_id | name            | slug            | description | parent | count |
    +---------+------------------+-----------------+-----------------+-------------+--------+-------+
    | 850     | 850              | Blog Post       | blog-post       |             | 0      | 0     |
    | 851     | 851              | Promotion       | promotion       |             | 0      | 0     |
    | 849     | 849              | Website Updates | website-updates |             | 0      | 0     |
    +---------+------------------+-----------------+-----------------+-------------+--------+-------+
    

    But I don’t want to hardcode the IDs in.

    There are a couple of ways around this, thankfully. WordPress has a function called get_term_by() which lets you search by slug, name, or term ID. Since the list of categories shows the names, I can grab them!

    A list of the categories for site notes (screenshot of the Site Notes Categories page).

    That means I can get the term ID like this:

     $term = get_term_by( 'name', 'Blog Post', 'monsterinsights_note_category' );
    

    Now the gotcha here? You can’t rename them or you break your code.
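
    If you want to play it safe anyway, you can check the return value before using it. Here’s a minimal sketch (my addition, not from MonsterInsights) that bails out when get_term_by() comes back empty, so a renamed category quietly skips the note instead of throwing an error:

     $term = get_term_by( 'name', 'Blog Post', 'monsterinsights_note_category' );

     // get_term_by() returns false when no term matches, e.g. after a rename.
     // Inside your hook callback, bail before building the note args.
     if ( ! $term ) {
         return; // Skip the note rather than pass a bad category ID.
     }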

    Example for New Posts

    Okay so here’s how it looks for a new post:

    add_action( 'publish_post', 'create_site_note_on_post_publish', 10, 2 );
    
    function create_site_note_on_post_publish( $post_ID, $post ) {
        if ( ! function_exists( 'monsterinsights_add_site_note' ) ) {
            return;
        }
    
        if ( $post->post_type !== 'post' ) {
            return;
        }
    
        $post_title = $post->post_title;
        $term       = get_term_by( 'name', 'Blog Post', 'monsterinsights_note_category' );
    
        // Prepare the site note arguments
        $args = array(
            'note'        => 'New Post: ' . sanitize_text_field( $post_title ),
            'author_id'   => $post->post_author,
            'date'        => $post->post_date,
            'category_id' => $term->term_id,
            'important'   => false
        );
    
        monsterinsights_add_site_note( $args );
    }
    

    See? Pretty quick.

  • Referrer Spam In Adsense

    Referrer Spam In Adsense

    You may have heard of Semalt.com. I’ve heard them argue that they’re not spammers, they’re not evil, they’re not bad people.

    You know what? They are. They are spamming, they are doing evil, and they’re bad people.

    The other day I was checking my top-sites in Google Adsense, trying to think of how to increase revenue on my passive income, when I saw this random domain showing up on my list of sites. A site that wasn’t mine. A site that looked like a spammer:

    Adsense top sites shows one of Semalt's URLs

    Why is this happening?

    According to Google, this happens when a site loads cached content of your domain (Google does this). It can also happen when someone copies your whole webpage into an HTML email, or if someone uses a bad iframe.

    There’s also the obvious, but rare, case where someone uses your code without your knowledge.

    Do I need to block them?

    No. Except for the part where they screw up your analytics metrics and cause load on your server. Keep reading, I’ll explain.

    Will I Be Penalized by Google?

    My first thought was “Oh shit, Google’s going to yell at me!” I quickly checked that I had site authorization on, which means only domains I’ve approved and added can show my ads. Whew.

    This is a big deal, by the way. While it would be nice to earn more views, if a site that isn’t mine runs my ads without my knowledge, I can get in trouble. More than once I’ve told off plugin developers for using Adsense in their plugins. This is for a couple of reasons: first, you can use it to track who uses your plugin (bad), but also Google doesn’t want you to. They outright say that you cannot put ads “on any non-content-based page.” An admin dashboard is not a content page. Done and done. No ads in your plugins, thank you.

    But that’s exactly why I was worried!

    Where is Semalt showing my ads?

    What is this URL for anyway?

    The URL was http://keywords-monitoring-your-success.com/try.php?u=http%3A%2F%2Fexample.com (not my real URL). The only reason I could find it was I dug into my Google stats and found it as a referrer. If you happen to pop that into a browser, you will be redirected to http://semalt.com/ — Real nice.

    That is, by the way, how I knew it was Semalt.

    What is Semalt?

    Semalt is a professional SEO and marketing service. They literally make their money ‘crawling’ websites. When their site started, it was really the scammiest-looking thing I’d seen in a long time. A year and a half later, they’ve cleaned up their act a bit, but back in 2014 we all looked at them with a massive Spock eye.

    As it turned out, they were using infected computers to scan the web. My personal guess was that they are leveraging hacked computers and using them to scan for vulnerable websites. Once they find a site, they hack it and use it to push malware.

    That’s a guess. I have no proof. But based on their search patterns and behavior, it’s looking pretty likely to me.

    Can I block them?

    Yes! But there’s a catch.

    You see, everyone says you can do this:

    # Block visits from semalt.com 
    RewriteEngine on 
    RewriteCond %{HTTP_REFERER} ^http://([^.]+\.)*semalt\.com [NC]
    RewriteRule .* - [F]
    

    And while that works, it’s obvious that Semalt is on to us because now they use keywords-monitoring-your-success.com and other URLs as passthroughs.
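
    If you’d rather handle that inside WordPress, where you can update the list without touching .htaccess, here’s a minimal sketch of an mu-plugin that sends a 403 to any request referred by a known spam domain. This is my own illustration (the function name and domain list are placeholders you’d maintain yourself), not an official fix from anyone:

    add_action( 'init', 'hypothetical_block_spam_referrers' );

    function hypothetical_block_spam_referrers() {
        // Illustrative list; add whatever passthrough domains show up in your logs.
        $spam_domains = array( 'semalt.com', 'keywords-monitoring-your-success.com' );

        $referer = isset( $_SERVER['HTTP_REFERER'] ) ? $_SERVER['HTTP_REFERER'] : '';
        $host    = parse_url( $referer, PHP_URL_HOST );

        if ( empty( $host ) ) {
            return;
        }

        foreach ( $spam_domains as $domain ) {
            // Match the domain itself or any subdomain of it.
            if ( $host === $domain || substr( $host, -strlen( '.' . $domain ) ) === '.' . $domain ) {
                status_header( 403 );
                exit;
            }
        }
    }

    Like the .htaccess rule, this only catches visits that actually send a referrer header, so treat it as a mitigation, not a cure.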

    How do I get them out of my analytics?

    Do you use WordPress.com? Or Jetpack? Great! Report the referrer as spam! WordPress.com blocked Semalt back in 2014, but obviously they’re on the rise again.

    If you’re using Google Analytics, Referrer Spam Blocker is probably your best bet.

  • Referrer Madness

    Referrer Madness

    Everyone’s heard of Semalt by now. They are, weirdly, an actual company run by actual people, who are entirely weird and annoying.

    I should explain. I’ve talked to them in email and twitter, and I’ve read about them all over the net, like everyone else has. They’re an ‘SEO’ company who trawls the net via bots, just like Google and everyone else, tracking you and your competitors. Here’s how they explain it:

    Semalt is a professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.

    That sounds vaguely legit when you look at it on the surface. They’re based in the Ukraine, which explains the imperfect English, and showed up right around the time Russia was invading, so most of us made Putin jokes and moved on. They’re not actually doing anything bad, they’re just acting like a regular bot, scanning your site…

    Except they’re not.


    When asked, they’ll tell you that Semalt crawler bots visit websites and gather statistical data for their service, simulating real user behavior. Their crawler bots, and yes, they admit they’re bots, don’t click on advertising banners or extend links. And all the visits are automatic and random.

    This means their goal is to get a bot that acts like a human. Now I don’t know about you, but I don’t trust anything when I can’t see its brain, and I certainly know better than to believe in true random when it comes to software. But what gets me is how you stop the bot from scanning.

    Everyone uses a robots.txt file to block bots from scanning things they don’t need to scan. If you use WordPress and have pretty permalinks on, go to http://example.com/robots.txt and you’ll see a default file, made by WordPress, to block various folders like wp-admin from being scanned.
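
    WordPress builds that file on the fly, and you can append your own rules to it through the robots_txt filter. Here’s a minimal sketch (my example; the ‘SemaltBot’ user-agent string is a guess, since Semalt doesn’t publish one):

    // Append a rule to WordPress's virtual robots.txt.
    add_filter( 'robots_txt', function ( $output, $public ) {
        // 'SemaltBot' is illustrative; use whatever user agent shows up in your logs.
        $output .= "\nUser-agent: SemaltBot\nDisallow: /\n";
        return $output;
    }, 10, 2 );

    That only matters for bots that respect the standard, though.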

    Semalt ignores these. They also ignore things like bot rate limiting, and they use IPs from around the world to scan your site (arguably to get a better idea of real speed and response), so they end up acting a little like a DDoS attack. Worse, they claim to act like a ‘user’ but I never have a link to my wp-admin pages from the front of my site, which means their bot is checking for WordPress and going there not because a user would have any reason, but simply because Semalt knows WordPress is there.

    They have a form you can fill out to have it removed if you want, but we’ve been using robots.txt for years, and I simply fail to understand why they’re ‘better’ than the standard.

    Besides that, what’s the real issue here? Semalt is screwing up my stats. They’re using referrer links to check my sites out, which means I have a bunch of referral links like this:

    semalt.com/competitors_review.php?u=https://halfelf.org

    Those links tell me someone linked to me, and generally I go back and check them out to see if they’re something I want to talk to or work with. These are not. Worse, they don’t really act like ‘real’ users, despite the claim. Karen Francis has a great explanation as to why Semalt is ruining your bounce rates in Google, and a couple of good ways to block them.

    Am I blocking them? No, not right now. Do I trust them? Not at all. They make it ‘easier’ for someone else to compare themselves to me, which is laudable, but they do it in a way that makes it harder for me to understand how my sites are doing. And that, to me, is the epitome of the goal of all black hat SEO companies. They gain at someone else’s loss.

  • Mailbag: One Analytics to Bind Them

    Mailbag: One Analytics to Bind Them

    Mailbag on Monday because Angie Meeker asks:

    Do you have an article about getting Google Analytics right on Multisite, so SA can see indiv stats for each site, but also parent. Where the SAdmin owns the entire GA account (site owners don’t need their own GA account)

    There are a few ways to do this, and they’re all pretty easy.

    Google Mod_PageSpeed

    This is the ‘easiest’ way if you already have PageSpeed installed. You can put your GA filter in the .htaccess and be done with it:

    ModPagespeedEnableFilters insert_ga
    ModPagespeedAnalyticsID <Analytics ID>
    

    That’s actually what I do here, because I’m incredibly lazy and I have Pagespeed set up on my server. I can even make this a little more special by using if statements in Apache 2.4:

    <If "$req{Host} == 'www.domain.com'">
        ModPagespeedEnableFilters insert_ga
        ModPagespeedAnalyticsID <Analytics ID>
    </If>
    


    MU Plugin

    But if you’re still on Apache 2.2 or don’t want to mess with .htaccess for whatever reason, then you should try an mu-plugin, one of my favorite things in the world. And all you have to do is this:


    // Paste your Google Analytics code here
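
    To give that comment a home, here’s a minimal sketch of the whole file (the file name and function name are mine, not from the original post): save it as wp-content/mu-plugins/network-ga.php and it loads on every site in the network automatically, no per-site activation needed.

    <?php
    // mu-plugins run on every site in the network, so one file covers them all.
    add_action( 'wp_head', 'hypothetical_network_ga' );

    function hypothetical_network_ga() {
        ?>
        <!-- Paste your Google Analytics code here -->
        <?php
    }

    Since the super admin owns the whole GA account, the tracking snippet is the same everywhere; if you want per-site profiles under that one account, you could swap in a different property ID based on get_current_blog_id().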

  • Who’s Following Me Anyway

    Who’s Following Me Anyway

    There are two basic ways to deal with tracking people who follow you on the internet: don’t do it or overdo it.

    In a recent post, Brian Gardner talked about embracing his unsubscribers, as he noticed a number of people un-subbing after he posted a personal post on his personal site. Since then, my friends and followers have asked me about how I feel, and I’ve had to explain that my basic philosophy of ‘tracking’ followers is this: I don’t.

    Now this doesn’t mean I don’t keep tabs on metrics and browsers and the like, to know how to appeal to my readers, but it means the raw numbers, like how many people read a post, are by and large ignored. Except sometimes they’re not. At that point, I agreed that my methodology was complicated, and needed a blog post. So here’s when I do and when I don’t, and why I do or don’t, as needed.

    It’s Personal

    For the most part, I don’t track visitors on personal sites. I don’t track metrics. I couldn’t tell you who follows my blog on ipstenu.org, and I don’t really care. It’s my personal blog where I’ve decided to write for me, so if I track anything at all, it’s what browsers people use. A lot of people read the site on an iPhone? Okay, better have a good theme for that! One person is still using Netscape? Forget about ’em. (Sorry, Mr. Netscape. It’s 2013, the Internet called and wants you to upgrade.) I never pay attention to the number of my followers on my personal social media accounts. Facebook, Twitter, whatever. I know who I’m following. When you stop being entertaining/interesting/enjoyable, I unfollow. It’s all just me being me for me. You’re welcome to read along, but it’s a personal site for personal people.

    It’s Technical

    Okay, so what about my professional sites? Well, I do and I don’t follow along, depending on how professional the site is. Take this site, for example. While this is certainly my more professional site (I initially split it out because my family reads the main blog and didn’t care about tech babble), I don’t have a dedicated Twitter account for it, or a Facebook fan page. It’s just another aspect of me. I do track metrics here, though. It matters a little more when I’m presenting content for education. I want to make sure everyone can read the site, get the data they need, and move on. And I do keep tabs on my subscription numbers a little, but I don’t actively watch who signs up and who leaves. The way I figure it, if you find the information valuable, you read.

    The only times I’ve ever actually noticed traffic here was when Matt Mullenweg linked here and I got a massive uptick of rabid folks pissed off that I’d used the naming of Constantinople as a metaphor (you’re welcome for the earworm), and when Ars Technica linked to my posts about stopping the botnets with mod_security or with .htaccess.

    This does not mean I haven’t noticed the increase in visitors from tens a day to a hundred and beyond. It just means that since I’m not trying to make a living from this site, it’s not something I dwell on very much. Every time I have to write an article about stats, that means I have to sit and study them here, because I’m just not tracking.

    It’s Professional

    But that really wasn’t me being ‘professional.’ What about my site where I have a custom Facebook page, a Tumblr, a dedicated Twitter account, and the whole nine yards? Oh yes, I track. I check analytics to see entry and exit pages, and I even have conversion goals. I notice my bounce rate, traffic flow, and all of those things. For work, yes, I monitor all these things, talk to marketing and sales about how to improve them, and write code to make things serve up faster and better. How did our campaigns go? What should we be targeting?

    Most importantly here, I try to understand the data I’m getting. We’re really good at collecting data these days, but we’re pretty crap at understanding it and using it to our benefit. How often have you seen A/B testing result in flawed assumptions? It’s not easy understanding what to do with the data. It’s not something you can do quickly, and most of us can use metrics and analysis to prove the point we want to make.

    This is hard. It’s really hard, and worst of all, how much weight you put on everything depends on who your audience is. How hard? Well, there is a science to the testing, but not many people use it right.

    It all depends

    And that’s really my point here. It all depends on what your goal is. Who are your readers and who are you writing for (they may not be the same)? Also who do you want to write for?

    Everything comes down to having a goal, knowing what you want to do, and doing it. And there is absolutely nothing wrong with just writing for yourself.

  • Collecting Conflicting Stats

    Collecting Conflicting Stats

    While, like many people, I use Google Analytics, I don’t really trust its parsing. I do use mod_pagespeed, which lets me auto-embed my GA code in every page without plugins or extra work on my part, which is great, but the results are questionable and often wildly disparate and conflicting.

    Let me demonstrate:

                  Google    AWStats   Webalizer
    Page views    2,607     10,354    8,502
    Hits          n/a       49,830    59,542
    Visits        888       1,274     2,255

    First of all, I can’t find ‘hits’ anywhere on Google. Their layout is different and changes regularly. Secondly, and I’m sure this jumps out at you, according to AWStats and Webalizer, I’m getting 4 to 5 times the pageviews compared to Google. I previously configured AWStats and Webalizer to exclude wp-admin and other ‘back end’ pages by editing the configuration files. I did the same in my .htaccess for PageSpeed, so I know no one is tracking admin pages.

    I already know that AWStats errs on the side of users, so if it can’t tell something is a bot, it assumes it’s a user. I also know it tends to overcount, since it bases its counts on traffic in a way that is a little generous (a 60-minute window for a visit). Not a huge deal, but enough to say that yes, the 10k pageviews are probably closer to Webalizer’s 8 or 9k. Speaking of Webalizer, it uses a 30-minute window, so it skews higher too. Fine, let’s be harsh and halve them.

    That gives me 4000-ish pageviews. Google gave it 2600-ish.

    Interestingly, Google gives a 30-minute visit count too, but it also uses cookies and javascript, which, while fairly safe, doesn’t run on everyone’s browser. As an amusing side-bar, when I switched from using a plugin or manually injecting Google Analytics into my sites and started using mod_pagespeed’s insertion, my results went up. Noticeably. In part this is because my site is getting higher traffic than normal, but when I compared it to WordPress Stats, it was a bigger than expected jump. (I’m not using WordPress’s Stats a la Jetpack in this experiment because it only counts WordPress pages, and the site I’m using is not just WP. However, on a pure WP site, WP’s stats tend to skew higher than GA.)

    Which one is right? Most people will say Google is ‘closer to the truth’ but I don’t know how much I can rely on that. Certainly it’s more true for how many actual people are visiting my site, and when I’m judging metrics for marketing, I’m a little more inclined to use Google. That said, if I’m trying to understand why my page speed is slow, or where I’m getting hammered with traffic, AWStats and Webalizer are far more accurate, since they’re counting everything.

    Data that can, and cannot, be measured (from “Manga Guide to Statistics,” Shin Takahashi, 2008).

    Right now, I’m keeping Google Analytics on my sites. I don’t really need the measurements for marketing (that would involve doing marketing), but there are better social engagement stats provided that make it helpful. For example, of all the social media sites, Facebook and Twitter are tied for traffic, and Google Plus only scores high on my tech blog. I think that if Google let us auto-publish to Google+, those stats would change, but for now, it’s all manual.

    This is not to say that I think auto-posting is great for social engagement, but I find I actually pay attention more to the social aspect of the media if I don’t have to remember to post all over the place. This is a massive shift since October 2011, when I’d stopped auto-posting for SEO reasons. Why did I change my stance? Well, it became easier to autopost and keep that personal touch with Jetpack’s Publicize feature. Now I can easily insert a custom message, and I know it’s going to (mostly) use my excerpt. (For some reason Tumblr is a moron about this.) That saves me effort and allows me to spend more time actually interacting!

    Auto-generating my stats with little effort, and being able to easily read them without needing a degree in SEO (no, they don’t exist), is also hugely important. Google Analytics is easy to read, but curiously I find it overly complicated to understand. The different pages and layouts make it surprisingly hard to answer ‘What were my stats for yesterday?’ Sometimes I have a boom in traffic on one day (like the day I had a 600% increase) and I want to see what went on and why. Where was this traffic coming from? WordPress’s stats do this amazingly well, just as an example.

    No one tool provides all the data I need to measure every aspect of my site, nor does any one tool collect all the data. Google tells me more about browser size, screen resolution, and everything it can grab about the user, whereas AWStats and Webalizer give me more information about traffic by showing me everything, bots and humans. Basically, server tools are great for collecting server stats, and webpage tools are great for user stats. But you need both.

    So in the end, I have at least four different statistic programs I check on, regularly, to try and understand my traffic and measure success.