Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Referrer Madness

    Everyone’s heard of Semalt by now. They are, weirdly, an actual company run by actual people, who are entirely weird and annoying.

    I should explain. I’ve talked to them over email and Twitter, and I’ve read about them all over the net, like everyone else has. They’re an ‘SEO’ company that trawls the net with bots, just like Google and everyone else, tracking you and your competitors. Here’s how they explain it:

    Semalt is a professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.

    That sounds vaguely legit on the surface. They’re based in Ukraine, which explains the imperfect English, and they showed up right around the time Russia was invading, so most of us made Putin jokes and moved on. They’re not actually doing anything bad; they’re just acting like a regular bot, scanning your site…

    Except they’re not.

    When asked, they’ll tell you that Semalt crawler bots visit websites and gather statistical data for their service, simulating real user behavior. Their crawler bots, and yes, they admit they’re bots, don’t click on advertising banners or extend links. And all the visits are automatic and random.

    This means their goal is a bot that acts like a human. Now I don’t know about you, but I don’t trust anything when I can’t see its brain, and I certainly know better than to believe in true randomness when it comes to software. But what gets me is how you stop the bot from scanning.

    Everyone uses a robots.txt file to block bots from scanning things they don’t need to scan. If you use WordPress and have pretty permalinks on, go to http://example.com/robots.txt and you’ll see a default file, made by WordPress, to block various folders like wp-admin from being scanned.

    Semalt ignores these. They also ignore things like bot rate limiting, and they use IPs from around the world to scan your site (arguably to get a better idea of real speed and response), so they end up acting a little like a DDoS attack. Worse, they claim to act like a ‘user’ but I never have a link to my wp-admin pages from the front of my site, which means their bot is checking for WordPress and going there not because a user would have any reason, but simply because Semalt knows WordPress is there.

    They have a form you can fill out to have your site removed if you want, but we’ve been using robots.txt for years, and I simply fail to understand why their form should be ‘better’ than the standard.
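    For anyone who hasn’t looked closely at robots.txt: it is a request, not a lock. A crawler has to fetch the file and check it before each URL, and nothing enforces that except good manners. As a minimal sketch, Python’s standard urllib.robotparser shows what a well-behaved bot does. The rules are modeled on the virtual file WordPress serves (the exact default varies by version); the Crawl-delay line is a hypothetical addition, since it is a non-standard extension that some crawlers honor and others, Google included, ignore; ExampleBot and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Modeled on WordPress's virtual robots.txt; the Crawl-delay line is a
# hypothetical addition showing how a site asks bots to slow down.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A polite crawler asks before every request; a rude one never calls this.
admin_allowed = rules.can_fetch("ExampleBot", "https://example.com/wp-admin/")
post_allowed = rules.can_fetch("ExampleBot", "https://example.com/2014/07/referrer-madness/")
delay = rules.crawl_delay("ExampleBot")  # seconds between requests, if honored

print(admin_allowed, post_allowed, delay)  # False True 10
```

    None of that check runs on your server, which is the whole problem: a bot that skips it, as Semalt does, walks straight into /wp-admin/ anyway.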

    Besides that, what’s the real issue here? Semalt is screwing up my stats. They’re using referrer links to check my sites out, which means I have a bunch of referral links like this:

    semalt.com/competitors_review.php?u=https://halfelf.org

    Those links tell me someone linked to me, and generally I go back and check them out to see if they’re something I want to talk to or work with. These are not. Worse, they don’t really act like ‘real’ users, despite the claim. Karen Francis has a great explanation as to why Semalt is ruining your bounce rates in Google, and a couple of good ways to block them.
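    If you would rather clean your numbers than block the bot, one option is to drop those referrers before you read your stats. Below is a minimal Python sketch for referrer strings pulled from your own logs; it is not Karen Francis’s method or anything Google Analytics does, and the blocklist and sample referrers are illustrative. It compares parsed hostnames instead of searching the raw string, so a look-alike such as notsemalt.com is not caught and your own URL sitting in the u= query parameter doesn’t confuse it.

```python
from urllib.parse import urlsplit

# Illustrative blocklist; add whatever shows up in your own referrer reports.
SPAM_REFERRER_DOMAINS = frozenset({"semalt.com"})


def referrer_host(referrer: str) -> str:
    """Return the host of a referrer, tolerating entries logged without a scheme."""
    # Only look for "://" before the query string, since spam referrers often
    # carry a full URL (yours) in a parameter like ?u=https://...
    if "://" not in referrer.split("?", 1)[0]:
        referrer = "http://" + referrer
    return urlsplit(referrer).hostname or ""


def is_spam_referrer(referrer: str, blocked=SPAM_REFERRER_DOMAINS) -> bool:
    """True if the referrer's host is a blocked domain or one of its subdomains."""
    host = referrer_host(referrer)
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


def real_referrers(referrers):
    """Keep only the referrers worth following up on."""
    return [ref for ref in referrers if not is_spam_referrer(ref)]


print(real_referrers([
    "semalt.com/competitors_review.php?u=https://halfelf.org",
    "https://www.semalt.com/?u=https://halfelf.org",
    "https://wordpress.org/support/",
]))
# ['https://wordpress.org/support/']
```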

    Am I blocking them? No, not right now. Do I trust them? Not at all. They make it ‘easier’ for someone else to compare themselves to me, which is laudable, but they do it in a way that makes it harder for me to understand how my sites are doing. And that, to me, is the epitome of the goal of all black hat SEO companies. They gain at someone else’s loss.

  • The Revolution of .htaccess and Multiple Domains

    I mentioned this in a post about jiggering Google and Multisite WordPress, and my buddy Jan went “WOW!” So I’ll start with the best thing ever in Apache, to me at least. Apache 2.4 allows for real if/else statements in .htaccess.

    In 2012, I wrote about how I did a lot of request header detection in order to make myriad blocks of checks to sort out my .htaccess anarchy. As I explained here, I have to do a check for the domain for each and every rewrite rule:

    Why did I duplicate the RewriteCond? Typically, you cannot use multiple RewriteRule statements following a single RewriteCond. That means for every call I make to a domain, I can use but one rewrite rule. There are ways around that, but none of them worked well for me.

    It’s ugly on Apache 2.2. Since that time, however, I’ve moved to Apache 2.4, and the world is vastly different thanks to If/Else calls!

    Let’s take this old section I had from my move (about 6 years ago) from blog.ipstenu.org to ipstenu.org, as well as some permalink changes and a change to my uploads folder:

    # Ipstenu Moves
    RewriteCond %{HTTP_HOST} ^blog\.ipstenu\.org
    RewriteRule ^(.*) https://ipstenu.org/$1 [L,R=301]
    RewriteCond %{HTTP_HOST} ^ipstenu\.org
    RewriteRule ^blog/([0-9]{4})/([0-9]{2})/(.*)$ https://ipstenu.org/$1/$3 [L,R=301]
    RewriteCond %{HTTP_HOST} ^ipstenu\.org
    RewriteRule ^blog/(.*)$ https://ipstenu.org/$1 [L,R=301]
    

    Taking each block at a time, the first was easy:

    <If "%{HTTP_HOST} == 'blog.ipstenu.org'">
        RewriteRule ^(.*) https://ipstenu.org/$1 [L,R=301]
    </If>
    

    Pretty simple. If the host is blog.ipstenu.org, redirect. And it mostly worked, except that http://blog.ipstenu.org/dsfasfsdf sent me to https://ipstenu.org/home/ipstenu/public_html/dsfasfsdf … which isn’t good! The fix here is to use RedirectMatch instead of RewriteRule, which gives us this:

    <If "%{HTTP_HOST} == 'blog.ipstenu.org'">
        RedirectMatch 301 (.*) https://ipstenu.org$1
    </If>
    

    Excellent. Then on to the second one, which becomes this:

    <If "%{HTTP_HOST} == 'ipstenu.org'">
        RedirectMatch 301 ^/blog/([0-9]{4})/([0-9]{2})/(.*) https://ipstenu.org/$1/$3
        RedirectMatch 301 ^/blog/(.*) https://ipstenu.org/$1
    </If>
    

    What’s the difference here? Well, RewriteRule is handled by Apache’s mod_rewrite, while RedirectMatch is handled by mod_alias. I know I have mod_rewrite on, but I don’t know why it insists on tossing in the filesystem path.

    However I can assure you that these ifs work perfectly. I’m using them right here on this site. My whole .htaccess is wrapped with them.
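    Before trusting any redirect block, it can help to write down what each old URL should turn into. Here is a small Python sketch that mirrors the three RedirectMatch patterns for blog.ipstenu.org and ipstenu.org, checked in the same first-match-wins order mod_alias uses, so you can run a list of old URLs through it. It only models the patterns, not Apache itself; query strings and status codes are left out, and redirect_target is a hypothetical helper name.

```python
import re
from typing import Optional

# Host -> ordered (pattern, target) pairs, copied from the RedirectMatch lines.
# Python writes backreferences as \1 where Apache writes $1.
REDIRECTS = {
    "blog.ipstenu.org": [
        (re.compile(r"(.*)"), r"https://ipstenu.org\1"),
    ],
    "ipstenu.org": [
        (re.compile(r"^/blog/([0-9]{4})/([0-9]{2})/(.*)"), r"https://ipstenu.org/\1/\3"),
        (re.compile(r"^/blog/(.*)"), r"https://ipstenu.org/\1"),
    ],
}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return where a request would be sent, or None if no rule matches."""
    for pattern, target in REDIRECTS.get(host, []):
        # RedirectMatch searches the URL path (which starts with "/"),
        # and the first matching directive wins.
        match = pattern.search(path)
        if match:
            return match.expand(target)
    return None


for host, path in [
    ("blog.ipstenu.org", "/dsfasfsdf"),
    ("ipstenu.org", "/blog/2008/05/hello-world/"),
    ("ipstenu.org", "/blog/about/"),
    ("ipstenu.org", "/about/"),
]:
    print(host + path, "->", redirect_target(host, path))
```

    Note that the dated rule has to come before the catch-all /blog/ rule; swap them and every dated URL keeps its month.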

  • Don’t iframe Me In

    I review plugins for WordPress.org, and one of my pet peeves is when I see a plugin that purports to connect your site to their service…. using an iframe.

    I have a stock reply to those:

    Having the admin dashboard be just an iframe isn’t permitted.

    We don’t permit plugins to phone home like that (for two main reasons – security and appearance – too often people assume that they just signed into WORDPRESS and not your plugin). Please change your code to use an API or just link back to your site so they can configure things there.

    The minority of the time, this is accepted, fixed, and moved on. The majority of the time, people complain that it’s ‘easier’ or ‘not confusing’ or ‘someone else is doing it.’

    If everyone on the planet would stop using someone else doing something wrong as a reason to allow them to do it, I’d be so happy… But that isn’t the point. The point is that using an iframe in a plugin is a bad idea in general, and a horrible idea for your admin panel.

    Let me step back. Like everything else, iframes are awesome to a point. They’re a great, easy way to include content in your site without having to include a mess of code. When YouTube was new, iframes were the only way to include videos, and they looked like this:

    <iframe width="420" height="315" src="http://www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen></iframe>
    

    You’ll notice the iframe has to specify a height and width, which means I can’t adjust my site quite as much as I want to. The other major issue here is that I’ve had to specify http, which means if my site runs https, I will have security issues (browsers treat it as mixed content). Now, there are workarounds to this (and YouTube now uses a src of //www.youtube.com/embed/oHg5SJYRHA0 to mitigate the security issue), but there’s another, cooler aspect to how it’s all working.

    If you use WordPress, you don’t have to paste in the iframe at all, ever, because you have embeds! What the embed does is use the magic of oEmbed to … well … embed! This allows WordPress to reach out to YouTube, ask how it wants to embed itself, and use either embed code or iframes or HTML5, or whatever else we come up with! It does this using an API (application programming interface), which lets the two sites talk back and forth. The same principle applies to your plugin pages. If you use an iframe instead, it’s a quick and dirty way to include content from your service (like a login form) on the plugin user’s site.
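    To make the oEmbed idea concrete, here is a minimal Python sketch of the consumer side: build a request to the provider’s oEmbed endpoint, then pull the ready-made embed HTML out of the JSON reply. The endpoint is YouTube’s public one; the function names and the sample reply are illustrative, and WordPress’s own oEmbed handling does more (a list of trusted providers, discovery, caching) than this.

```python
import json
from urllib.parse import urlencode

# YouTube's public oEmbed endpoint.
YOUTUBE_OEMBED = "https://www.youtube.com/oembed"


def oembed_request_url(endpoint: str, page_url: str, max_width: int) -> str:
    """Ask the provider how it wants page_url embedded, no wider than max_width."""
    query = urlencode({"url": page_url, "format": "json", "maxwidth": max_width})
    return f"{endpoint}?{query}"


def embed_html(response_body: str) -> str:
    """Return the provider-supplied markup from an oEmbed JSON response."""
    data = json.loads(response_body)
    # "video" and "rich" responses carry ready-to-use HTML; "photo" and "link" do not.
    if data.get("type") not in ("video", "rich"):
        raise ValueError(f"no embeddable HTML for type {data.get('type')!r}")
    return data["html"]


print(oembed_request_url(YOUTUBE_OEMBED, "https://www.youtube.com/watch?v=dQw4w9WgXcQ", 640))

# An illustrative reply, trimmed to the fields this sketch reads.
sample = '{"type": "video", "html": "<iframe src=\\"https://www.youtube.com/embed/dQw4w9WgXcQ\\"></iframe>"}'
print(embed_html(sample))
```

    The provider decides the markup; the site only asks for it.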

    So why don’t I like people to use it? Let’s start with the login issue. The admin page could be changed to point that iframe anywhere it wants, making it easy to send you to a page that looks a lot like the ‘right’ page but isn’t. This is a lot harder to do if, instead, you have an API that securely transmits data. If someone can edit the PHP code of your plugin, either one can be redirected, but the API details are a little harder to fake.

    An iframe is also a problem when you consider layout. Yes, it can make design easier in that you are in full control of the design, but you’re not in control of the rest of the site. I like the eggplant color for my admin dashboard, and if your iframe clashes with that because you don’t ‘look’ like WordPress, then it’s jarring for a user. Another worry? Adblock and Ghostery, two of the most popular browser extensions around, will often block those iframes.

    Finally, and for me this is the biggest one, your users won’t know where their account is. Look, I know it should be obvious when you have something that ‘makes’ an account with a service that the account belongs to the service. And yet. One of the places I help out is with WordPress.org password resets. 90% of the emails are for people trying to reset the password on their own blogs. You can’t convince me that people actually know what they’re doing anymore, if they ever did.

    So please, don’t put an iframe in your plugin as the only way I can access your admin area. It’s lazy, it’s insecure, and it’s confusing. It’s 2014. Trust me, people understand allowing their blog to connect to Twitter now.

  • Customize Network Toolbars

    This came up in the WordPress Support Forums. If you use Multisite, the WordPress toolbar (at the top of your site when logged in) has a special item called “My Sites” which shows all the sites you belong to. This is great and works as a quick way to jump to a different site. It has problems, though, in that if you make a network with a bunch of sites named the same thing, it’s hard to tell which site you’re on.

    Now I know what you’re thinking! “Mika!” You say. “Mika, come on, no one has 100 sites with the same name unless they’re doing what you say is a terrible idea, and duplicating sites!!”

    Au contraire, mon frère. There are a few totally understandable reasons why this might happen. Fairly recently I was helping a school sort out Multisite, and they wanted a site for each classroom (easy) and the names of the sites would all be the same: Super Cool School – Class Frog

    And their ‘My Sites’ list was all the same.

    An example of a site list where the names aren't really readable because they're too long

    As you can see, WordPress wisely puts a practical limit on the title length, which makes sense. Now when I was faced with this problem, I remembered something that had come up in the forums, where someone wanted the ‘language’ of the site to show up in the site list, so his sites would show up as “SiteName (en)” and so on. Since he was using the site path (en, de, etc.) as the site’s slug, it was easy for him to come up with this, once I (accidentally) pointed him the right way:

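    <?php
    /*
    Plugin Name: Show Site Path
    Description: Show Site Path in My Sites Menu
    */
    
    // Append each site's path to its name in the toolbar's "My Sites" menu.
    function helf_customize_my_sites( $wp_admin_bar ) {
    
        // The sites the current user belongs to, as loaded by the toolbar.
        $mysites = $wp_admin_bar->user->{'blogs'};
    
        // Each $site is an object, so editing it here changes what the menu shows.
        foreach ( $mysites as $site ) {
            $site->blogname .= ' (' . $site->path . ')';
        }
    }
    
    // Default priority (10) runs before core builds "My Sites" at priority 20.
    add_action( 'admin_bar_menu', 'helf_customize_my_sites' );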
    

    I say it was an accident because I did read the question wrong, but it actually gave me the answer for my school. They too used the classroom name as the site path, so for them I changed one line:

        foreach ( $mysites as $site ) {
            // No quotes: '$site->path' in single quotes is literal text, not the path.
            $site->blogname = $site->path;
        }
    

    That was it. Now the sites showed up the way they wanted.

  • Mailbag: One Analytics to Bind Them

    Mailbag on Monday because Angie Meeker asks:

    Do you have an article about getting Google Analytics right on Multisite, so SA can see indiv stats for each site, but also parent. Where the SAdmin owns the entire GA account (site owners don’t need their own GA account)

    There are a few ways to do this, and they’re all pretty easy.

    Google Mod_PageSpeed

    This is the ‘easiest’ way if you already have PageSpeed installed. You can put your GA filter in your .htaccess and be done with it:

    ModPagespeedEnableFilters insert_ga
    ModPagespeedAnalyticsID <Analytics ID>
    

    That’s actually what I do here, because I’m incredibly lazy and I have PageSpeed set up on my server. I can even make this a little more special by using <If> statements in Apache 2.4:

    <If "%{HTTP_HOST} == 'www.domain.com'">
        ModPagespeedEnableFilters insert_ga
        ModPagespeedAnalyticsID <Analytics ID>
    </If>
    

    MU Plugin

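    But if you’re still on Apache 2.2 or don’t want to mess with .htaccess for whatever reason, then you should try an mu-plugin, one of my favorite things in the world. And all you have to do is this:

    <?php
    // Save in wp-content/mu-plugins/ (any file name); it loads on every site.
    add_action( 'wp_head', function () { ?>
        <!-- Paste your Google Analytics code here -->
    <?php } );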

  • Mailbag: Translations

    The second hardest thing about translations is trusting the translator.

    I sometimes joke that I barely speak English, so when someone said he had translated my entire ebook about Multisite into French, I was delighted and scared. While I do kind of understand French, I’m not qualified to translate it, so having someone else do it would be a fantastic offering. But since I can’t translate it, I have no way to gauge whether he understood my meaning, which is hard enough to figure out in English.

    After a while, I decided to tell him that I’d like to see them, but I wasn’t sure if I’d want to put them up online to sell or give away. Of course he was welcome to give them away all he wanted!

    The problem isn’t that I trust him, or not, but that I don’t have a failsafe. With coding, I have coworkers who can spot check me. With blog posts I could use an editor, and it’s the same with books. If this was a contracted book, I’d be able to let my publisher find someone we all agree fits the bill. When you’re on your own, it’s a lot harder.

    The same goes for my plugins. I don’t actually package any translations in my plugins. The closest I have is my Varnish plugin, which has a folder on GitHub for people to store translations. Since they don’t have to be included in core, it’s easy enough for me to say “Use at your own risk.”

    With code, though, there’s a lot more you have to do to make it translatable. With my books, I just write. With my code, I have to remember to escape properly, which I almost never get right the first time out. With code, you have to remember from the start to write your words in a way that can be translated, and every time you change things you have to worry that the next update will break them for everyone.
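    Here is roughly what “write your words in a way that can be translated” looks like in practice, sketched with Python’s standard gettext module as a stand-in for WordPress’s __() and _n() functions. The message text, the purge_notice helper, and the “myplugin” domain are made up for illustration. Each string a translator sees is a whole sentence with a named placeholder, plurals get their own form instead of a tacked-on “s”, and the result is escaped after translation, because a translated string headed into HTML is still untrusted text.

```python
import gettext
import html

# NullTranslations hands back the source strings untouched; a real project
# would load a compiled catalog instead, e.g. gettext.translation("myplugin", ...).
_catalog = gettext.NullTranslations()


def purge_notice(count: int) -> str:
    """Build a user-facing message that translators can handle as one unit."""
    # One complete sentence per plural form, with a named placeholder, so a
    # translator can move the number wherever their grammar needs it.
    template = _catalog.ngettext(
        "Purged {count} page from the cache.",
        "Purged {count} pages from the cache.",
        count,
    )
    # Escape after translating, the same order as WordPress's esc_html__().
    return html.escape(template.format(count=count))


print(purge_notice(1))
print(purge_notice(3))
```

    Gluing fragments together instead (“Purged ” + count + “ page” + “s”) bakes English word order and English plural rules into the code, and no translator can fix that from the outside.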

    It’s challenging, really, to realize I can’t just ‘write’ in my plugin like I do on my blog.

    So what’s the question and the answer? Should you translate your work? Maybe. You should always make it translatable, but whether or not you should manage the translations is a really strange question without a perfect answer. Unless you’re fluent in two languages.

    I keep the following links bookmarked, just to keep me on track when I start editing any plugin, and I try to work backwards to fix all my old ones, but it’s really slippery.