Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Blocking Referrer Spam Server Wide Sucks

    A while back I talked about Referrer Spam in Google Adsense and I mentioned how you could block referrer spam with some .htaccess calls. That’s cool, but when you have 12 sites on a server, making this one more thing to manage per site is a pain in the ass. Well okay, what can we do constructively? And sadly the answer is “Not much.”

    First of all, forget the idea of using a robots.txt file. If they were real SEO crawlers, they would honor this. They don’t, and that’s how I know they’re evil.

    Secondly, this will only work if you have server wide access. That should be obvious, but server wide settings need server wide access, and that’s just how it is. I say that it sucks because it can be a little complicated and messy to understand where you put things.

    If you have your own server, like I do, then you can make a custom VirtualHost template.

    Since I’m using Apache 2.4, I made local templates:

    $ cd /var/cpanel/templates/apache2_4/
    $ cp ssl_vhost.default ssl_vhost.local
    $ cp vhost.default vhost.local
    

    If you’re using 2.2 then the files are in /var/cpanel/templates/apache2_2/ instead. In each file, I added this to the top of the VirtualHost settings.

      RewriteEngine On
      RewriteOptions Inherit
    

    What that does is it tells Apache that it should inherit Rewrite rules from the main server. That means each virtual host (i.e. each website) will abide by any rules in the local settings.

    Where you put this in that file can be weird. I ended up looking for this section, and putting it right below:

    [% IF !vhost.hascgi -%]
      Options -ExecCGI -Includes
      RemoveHandler cgi-script .cgi .pl .plx .ppl .perl
    [% END -%]
    

    Put that in both files. Because you use HTTPS, right? Then you need to rebuild httpd.conf:

    /scripts/rebuildhttpdconf

    Since I’m using WHM, the next step is to go into the Apache Configuration section and open the Include Editor. Then you want to add your blocking directive in ‘Pre-Virtual Host Include’ for All versions. If you don’t use WHM, you’ll want to edit /usr/local/apache/conf/includes/pre_virtualhost_global.conf and bounce Apache after.

    As you can see, I have some content in there already.

    My pre_virtualhost_global.conf file

    I added this below:

    <IfModule mod_rewrite.c>
      RewriteEngine on
      RewriteCond %{HTTP_REFERER} spammerseocompany\.com [NC,OR]
      RewriteCond %{HTTP_REFERER} keywords-monitoring-your-success\.com [NC]
      RewriteRule .* - [F,L]
    </IfModule>
    

    Does it work? Yes. It blocks ‘spammerseocompany’ from all my domains on my server. I put in the other URL since that’s the one they have that’s currently spamming the heck out of my stuff. The [F] flag answers them with a 403 Forbidden, and there are a lot of other options with Apache 2.4. You should read up on using mod_rewrite to control access and pick the method you find most sustainable. For example, you could single-line it:

      RewriteCond %{HTTP_REFERER} (spammerseocompany|keywords-monitoring-your-success)\.com [NC]
    

    I find that a bit clunky.
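
    If you’d rather keep one domain per line without hand-editing the flags, a small shell function can print the block for you. This is only a rough sketch: make_referrer_block is a hypothetical helper name, not part of Apache or WHM, and you would still paste its output into the include file yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a mod_rewrite block that blocks the given
# referrer domains. One RewriteCond per domain, chained with OR.
make_referrer_block() {
    # Refuse to run with no domains: an unconditional rule would block everyone.
    [ "$#" -gt 0 ] || return 1

    printf '%s\n' '<IfModule mod_rewrite.c>' '  RewriteEngine on'
    while [ "$#" -gt 0 ]; do
        # Escape the dots so they match literally in the regex.
        pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
        # Every condition but the last is chained with OR.
        if [ "$#" -gt 1 ]; then flags='[NC,OR]'; else flags='[NC]'; fi
        printf '  RewriteCond %%{HTTP_REFERER} %s %s\n' "$pattern" "$flags"
        shift
    done
    printf '%s\n' '  RewriteRule .* - [F,L]' '</IfModule>'
}

# Example: regenerate the block used in this post.
make_referrer_block spammerseocompany.com keywords-monitoring-your-success.com
```

    Run with those two domains, it prints the same block shown earlier, so the list of domains can live in one place.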

    If you’re using nginx, you’ll want this I believe:

    if ($http_referer ~* "keywords-monitoring-your-success\.com|spammerseocompany\.com") {
        return 403;
    }
    

    A big note of caution here. If your list gets too long, you’ll end up slowing your server down. A lot. So keep it as simple as you can. I find that CSF does a dandy job of blocking most of my troublemakers, and I only need this for the unnamed spammerseocompany because they don’t abide by the common rules of robots.

    If, one day, they do, I will stop blocking them and allow their robots. As it stands, they’re idiots and need to go away.

  • How To Pick Your Webhost

    This is not a real conversation, except it totally is.

    User: I want hosting.
    Me: What kind of site do you want to host?
    User: A WordPress site!
    Me: What kind of content do you plan on writing?
    User: Oh you know, blog stuff.
    Me: Okay… A food blog, a photo blog, a tech blog…?
    User: Why are you asking me all this!?!?!

    I’ve had so many conversations like this, I’m of the opinion that recommending hosting is a mug’s game that simply cannot be ‘won’, so I generally don’t play.

    Then why am I presuming I can tell you how to pick a webhost? Because I’m telling you how to pick a webhost, not who the best webhost is.

    Preface

    Someone will hate every single webhost on the planet. I use Liquidweb and DreamHost. People hate both of those. There’s the bevy of EIG companies whom people will detest and lambast and accuse of shady actions to be listed somewhere. There are the millions of small companies. There are good and bad companies, and there are reasons to use them. Whenever someone asks what host to use, I remind them that someone will hate their choice. That’s okay, just don’t take it personally and I recommend you ignore people who simply jump on bandwagons to tell you “X SUCKS!” They’re not being helpful.

    Needs vs Wants

    I repeat this a lot in myriad situations. Your needs are what your website needs, which should be obvious. If you’re running WordPress you need a webserver that runs a modern version of PHP and a MySQL (or MariaDB) database. That’s it. But that isn’t all of what you need for a website, and to understand your needs you need to be very clear about your own abilities, your capabilities, and the time you’re willing to commit to your project. Running a website is very time consuming and stressful. You can’t just set it and forget it.

    Who Are You?

    We should all know who we are, what our skills are, and what we enjoy doing. I like playing with servers and code. My wife prefers practical experimentation (she makes cheese and mead). My father is a mathematician. We’re all writers of a sort, but of the three of us I’m the one who runs the website and puts up articles on the regular. This is not because the others can’t, but because they know who they are. My father sends me his articles to post, my wife posts her own, and I write both for myself and for my company, and I maintain the servers (and email). If you’re not me, and don’t have a me, you need a me. That may be your host, and it may not.

    How Do You Communicate Best?

    Do you get anxiety with phone calls? Look for a company with live chat and email support. Do you hate live chat? Are you dyslexic? Look for phone support. You know how you like to communicate with strangers, so pick a host that has what you need. I personally prefer ticket based systems, unless my server is actually on fire. That hasn’t happened much.

    How Do You Add People?

    Let’s say you decide to hire someone to work on your website. Do they need access to the server? Do they need access to your billing? How do you do that without giving them your passwords? Find out how the host handles this. Can you simply add a technical contact or will there be more complicated steps?

    What Is Your Site About?

    Why does this matter? Well, think of it this way. “I want to make a community site where people from my city can come and post news, events, crimes, etc.” Did you just think about BuddyPress? You will likely need a bigger server than Shared. “I want a photoblog!” Okay, you will need to seriously look at disk space, which means SSDs may be a little tricky for you since most limit space. Check if the host allows easy upgrades. “I’m going to run a multisite network for my school!” You need a private server. Knowing what your site is about will help you predict upcoming hurdles.

    Do You Know Any Metrics?

    Most people, especially people with a brand new site, are going to say “No!” here and that’s okay. But if you do know things like how much traffic you get or how often you post or how much disk space you use, talk to the host about it. Pre-sales questions like “What’s the best hosting plan for a site that gets 2000 visits a day, and then 12k on one day a week?” are the bread and butter of a host. If they can’t answer it, move on.

    Does The Host Make You Feel Good?

    If you get a bad feeling from the host at any step along the way, if you feel like they’re dismissive or maybe not a good fit, walk away. Look, you need to be comfortable with your host, and if the advertising practices of a host upset you, don’t use them. It’s that simple. Even if they’re the best for your needs, if they make you uncomfortable, you will be miserable. And remember, for every single host there will be people who hate them. That’s okay too. If it works for you, and you feel good about doing business with them, then that is really all that matters.

  • A Simpler Hugo Deploy

    I have a Hugo site that I’ve been deploying by running Hugo on the server. But this isn’t the only way about it.

    If you use Git and it’s on the same server as your site, and owned by the same user, it’s remarkably easy to do this.

    First make sure the public folder in your Hugo repository is being tracked. Yes, this can make your repository a little large but that’s not something to worry about too much. Space is cheap, or it should be. Next make a folder in tmp – I called mine library – to store the Git output in.

    The new post-update code then looks like this:

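    #!/bin/sh
    
    # Where Git checks the site out, and where the web server reads it from.
    SRC_DIR=$HOME/tmp/library/public/
    DST_DIR=$HOME/public_html/library/
    
    # Check the repository out into the temporary folder.
    export GIT_WORK_TREE=$HOME/tmp/library/
    git checkout -f
    
    # Mirror public/ into the web root, deleting anything no longer in the repository.
    rsync -a --delete "$SRC_DIR" "$DST_DIR"
    
    exit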
    

    What this does is check out the Git repository and then copy the public folder over. The --delete flag tells rsync to remove anything in the destination that isn’t in the source. Done.

    The benefit of this method is that you don’t need to install GoLang or Hugo on your server, and everything is pure and simple Git and copy. Rsync is a delightful way to copy everything over as well. You can delete the temp folder when you’re done, but the checkout process handles things for you. Another nice trick is you can specify what branch to checkout, so if you have a special one for publishing, just use that.
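
    The branch trick can be sketched like this. It’s a minimal sketch, assuming the same paths as the hook above; the function name deploy_branch and the branch name publish are made up for illustration.

```shell
#!/bin/sh
# Sketch of a branch-aware deploy step for a post-update hook.
# Assumes the same folders as the hook above; deploy_branch is a
# hypothetical name, not something Git provides.
deploy_branch() {
    branch=$1
    work_tree=$HOME/tmp/library

    # Check the named branch out into the temporary work tree.
    GIT_WORK_TREE=$work_tree git checkout -f "$branch" || return 1

    # Mirror the generated site into the web root, deleting stale files.
    rsync -a --delete "$work_tree/public/" "$HOME/public_html/library/"
}
```

    Calling deploy_branch publish at the end of the hook would check out the publish branch before copying, and nothing is copied if the checkout fails.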

    But could this be even easier? Yes and no. You see, what I’m doing is checking out the whole thing and then copying over folders. What if I could tell Git to just check out the code in that one folder?

    There’s a thing called a ‘sparse checkout’ wherein I can tell Git “Only check out this folder.” Then all I have to do is go into that folder and check out the content I wanted. The problem there is it literally checked out the folder ‘public’ and what I wanted was the content of the public folder. Which means while it’s ‘easier’ in that I’ve only checked out the code I need, I can’t just check it out into where I want. I will always have to have a little extra move.

    To set up my folder, I did this:

    cd ~/tmp/library/
    git init
    git remote add -f origin ~/repositories/library.git
    git config core.sparsecheckout true
    echo public/ >> .git/info/sparse-checkout
    git checkout master
    

    And then my script remains the same. But! This is going to be a faster checkout since it’s only ever going to be exporting and seeing the folders it needs.

  • Why I Don’t Use Git Flow Anymore

    Please don’t get me wrong. I love git-flow. I think it’s great. But it was great to teach me how to use git. It taught me not to use master for my development, and how to make branches and all that. Git Flow got me in the habit of doing good things and testing and showed me how to work with multiple projects. It was a great crutch to get comfortable with the ideas of Git that (for a long time) confounded me.

    But I don’t need it anymore. Instead, I do things very, very simply and my flow is as follows.

    $ git checkout master ; git pull

    I always start by assuming I’ve forgotten something and need to sync up. This works for me, since I run on two computers.

    $ git checkout NewProject

    Once I’m in the new project, I start making all my edits, add my code, etc. Now here’s where I get a little silly. If I’m working on my own stuff, it’s Coda, always, so I’ll constantly ‘commit all changes’ and fill in my commit messages and then cancel out. I do this over and over until I’ve reached a point where I think “This code is ready to be tested.” Then I commit for real.

    This means my commit logs look like this:

    Convert Font Icon to SVGs
    
     - Add new images for social media
     - Optimize CSS for pagespeed
     - Remove unused function.php file
     - Add shortcodes
    
    Fixes #1234
    

    There are other ways to do this, of course. I’m a huge proponent of keeping change logs but a commit message should be useful too.

    It’s too easy to put in this: git commit -m "Adding new icons"

    While it’s more time consuming, just use git commit and put in a good message like I did up at the top. Now, this is not new. A hundred people have all said this before, but it bears repeating.

    • The first line is your subject, keep it to 50 characters.
    • Capitalize the subject line but don’t use a period
    • Use the imperative mood – “Add new icons” and not “Adding new icons”
    • Leave an empty line between subject and body
    • Explain what you did in the body, keeping lines to 72 characters
    • Bullet points are okay – use a space before a hyphen for best compatibility
    • Reference any issues at the bottom – “Fixes: #123” or “See Also: #456 #789”
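
    Most of those rules are mechanical enough to check in a script. Here’s a rough sketch of a check for the subject-line rules. The function name check_commit_message is hypothetical, and it only covers the length, the capital letter, the trailing period, and the empty second line.

```shell
#!/bin/sh
# Hypothetical check for a commit message file. Returns 0 when the
# subject line follows the rules above, 1 otherwise.
check_commit_message() {
    msg_file=$1
    subject=$(sed -n '1p' "$msg_file")
    second=$(sed -n '2p' "$msg_file")

    # Subject line: 50 characters or fewer.
    if [ "${#subject}" -gt 50 ]; then
        echo "Subject is longer than 50 characters" >&2
        return 1
    fi

    # Subject line: starts with a capital letter.
    case $subject in
        [[:upper:]]*) ;;
        *) echo "Subject should start with a capital letter" >&2; return 1 ;;
    esac

    # Subject line: no trailing period.
    case $subject in
        *.) echo "Subject should not end with a period" >&2; return 1 ;;
    esac

    # An empty line separates the subject from the body.
    if [ -n "$second" ]; then
        echo "Leave an empty line after the subject" >&2
        return 1
    fi
    return 0
}
```

    Git hands a commit-msg hook the path to the message file as its first argument, so a hook could call check_commit_message "$1" and exit with its status.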

    If, like me, you commit and then, before merge, realize you have changes, use git commit --amend to add your new changes to the existing commit.

  • Encrypting Source Code Doesn’t Make It Safer

    I’d love to think that’s all I have to say on the matter, that you all will read the subject, go “Yup!” and we’re done.

    The reality is that I have to argue this, regularly, with people.

    Here’s the code from a plugin out there:

    <?php ${"\x47L\x4fB\x41\x4c\x53"}["w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c"]="\x73l_s\x65arch\x61bl\x65\x5fc\x6f\x6cu\x6d\x6e\x73";${"\x47L\x4fBAL\x53"}["\x66\x6b\x78xg\x63\x6ap\x68\x6d\x6ft"]="\x73\x6c\x5fdb";${"\x47\x4c\x4f\x42AL\x53"}["\x65\x62\x67\x79\x6b\x66\x64"]="\x69\x73\x5f\x73\x6c\x5fca\x74e\x67\x6f\x72\x69\x7a\x61\x74\x69\x6fn_\x63\x6f\x6c\x75\x6dn";${"\x47\x4c\x4fBA\x4c\x53"}
    

    The whole file is like that. The developer explained it was done that way for ‘security’ — it would make things harder to hack. I pointed out that’s simply not true.

    Here’s what having encrypted, hashed, packed code does:

    1. It makes your build process take longer.
    2. It adds another failure point into your code.
    3. It makes it harder for end users, other developers (who write plugins), web hosts, and you to debug.
    4. It makes you look like a developer with evil intents.
    5. It sets an expectation with users that this kind of code is ‘normal’ in WordPress.

    Recently Sucuri posted about a redirect hack that works by putting junk code in your header.php file which looks rather similar:

    Malicious injection in your header.php

    The issue here is that an end user, your normal WordPress user, cannot tell the difference between the somewhat safe code I quoted before and this code. They see ‘gibberish’ whereas I know they can use a hex decoder to translate ["w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c"] into ["wsxnifioql"] … which is still pretty terrible.

    Well written code, well named functions, are self-explanatory. You see a function called redirect_404_pages() and you have a pretty good idea of what it’s for. You see a function named wsxnifioql() and good luck knowing what the heck that’s for. This goes back to the claim that the code is more secure. It’s not. It’s needlessly complicated, and as I showed with the hex decoder tool, it can trivially be decrypted and read.
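
    You don’t even need a special tool for this. As a rough sketch, a few lines of shell and awk do the same translation. The function name decode_hex_escapes is hypothetical, and it only handles the \xNN form seen in the snippet above.

```shell
#!/bin/sh
# Hypothetical helper: turn \xNN escapes in a string into plain characters.
decode_hex_escapes() {
    printf '%s\n' "$1" | awk '
        # Convert a two-digit hex string to its numeric value.
        function hex2dec(h,    i, n) {
            h = tolower(h)
            n = 0
            for (i = 1; i <= length(h); i++)
                n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return n
        }
        {
            out = ""
            # Replace each \xNN with the character it stands for.
            while (match($0, /\\x[0-9A-Fa-f][0-9A-Fa-f]/)) {
                out = out substr($0, 1, RSTART - 1)
                out = out sprintf("%c", hex2dec(substr($0, RSTART + 2, 2)))
                $0 = substr($0, RSTART + RLENGTH)
            }
            print out $0
        }'
}

decode_hex_escapes 'w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c'   # prints wsxnifioql
decode_hex_escapes '\x47L\x4fB\x41\x4c\x53'                   # prints GLOBALS
```

    If a dozen lines of awk can undo it, it isn’t protecting anything.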

    So what is the real point of hiding your code? Who are you trying to protect? What’s ‘safer’ about any of this?

    The answer is that it’s about you, you, you. You don’t want someone to take your great idea.

    That’s it. And that’s foolish.

    WordPress is GPLv2 (or later). Furthermore, to be hosted on WordPress.org, your code cannot be encrypted or hidden or otherwise non-human-readable. The basic reason is that WordPress’ success is due to its understandability and extendability. Anyone can read WordPress’ core code, parse it, learn from it, and enhance it. When you take that away from users, you isolate your code and prevent people from extending it.

    This person, this developer, charges upwards of $1000 for the add ons to their code. Yes, a plugin that costs over a grand. It sounds economically sound to try and lock things down so people don’t steal their intellectual property. We can all understand that impetus. I support it. I also feel that part of being in an open source community is being aware of how your actions impact the world at large.

    Because WordPress is open and because there is a standard expectation of non-encrypted code (except by evil-doers), the burden moves to developers to not hide their code that is installed on users’ servers. The code that is deployed to an end-user is expected to be human readable. This comes at a risk. I have a copy of a theme I bought, and I could give it away to anyone I wanted. They may not get updates, which means I have to be aware of the risk I’m introducing to my friends when I give them something like a premium theme or plugin.

    Similarly, what are the risks of telling people it’s okay to install plugin code in uploads instead of the plugins folder? What are the risks of allowing people to think that encrypted code is generally okay? In and of themselves, neither action seems particularly dangerous. PHP code is PHP code, right? If it runs, you’re good. But the reality is not so. By installing code in uploads I’ve made it so it’s no longer fully protected by WordPress and ‘standard’ security practices. I’ve also made it riskier that my code would even run, since many hosts prevent executable code from running out of that folder for security.

    So how do I meet the (assumed) criteria of not having someone rip off my code?

    You don’t. Your machinations aren’t preventing it now, and they won’t prevent it tomorrow. Hexcode is easily parsed. Even the Zend framework has to be able to be reversed to be run, so a dedicated person will always find a way around it. And the majority of your users aren’t going to be the problem. It’s those extremes. So what you’ve done is wasted time, effort, and money to annoy the majority to stop the minority. Let people inspect your code. If someone steals it, there are laws to help you handle them. Use them. Theft is theft. The GPL may allow them to take your code, copy and expand on it, but it doesn’t let them violate your copyright.

    All the work you’re doing to hide your code is about as useful as preventing right-click on images. It doesn’t protect the end users, and it doesn’t protect your intellectual property.

  • PSA: DreamObjects URL Changes

    If you use my DreamObjects plugins, don’t worry, I’ll have them updated before September.

    What Changed?

    As part of ongoing service improvements, DreamHost made a subtle, but critical, change to how everyone accesses DreamObjects. Specifically, they changed the DreamObjects hostname to prepare for upcoming enhancements.

    Old hostname: objects.dreamhost.com
    New hostname: objects-us-east-1.dream.io

    I’m a developer who uses DreamObjects. What do I need to do?

    If you were ever using the URL objects.dreamhost.com in your site, you need to change it to objects-us-east-1.dream.io and everything will be awesome.

    Is it really that simple?

    Not always. For some plugins and code, it is that simple. For my backup plugin, I added in a new option:

    if ( !get_option('dh-do-hostname')) {update_option( 'dh-do-hostname', 'objects-us-east-1.dream.io' );}

    This way I can later make it a dropdown for people to select as they want.

    But for my CDN plugin there’s a major issue. You see, the way it works is that it updates the URLs of images within your posts. It, literally, edits your post content. And that means a change has to go back and fix all the URLs. I had to write some fairly weird code to do this. I’m still testing it fully, and I think it’ll do everything I need, except it’s not going to be perfect.

    Right now it does its utmost best to fix any URLs on a site, however it will only fix the ones for images it can detect are on DreamSpeed. I am aware that some people with a phenomenal number of images manually copied their uploads folder to DreamObjects and ran a command like this:

    wp search-replace 'example.com/wp-content/uploads' 'objects.dreamhost.com/bucketname/wp-content/uploads'

    Or maybe they used the InterconnectDB script or another plugin. Those people you’re going to have to watch out for and inform in a useful way as to how to fix it.

    I’m an end user. Do I need to care?

    Yes. You do. Do not try to fix it on your own just yet. Not until the plugins you use have updated and said they’ve corrected the issue. Then read the FAQ or any alerts to make sure you don’t need to do anything else. If you’re not using DreamObjects as a CDN, this should be pretty painless.

    If you did, or if you moved it all manually, you will have to do it again before September 5th, 2016 or all images will break.

    Thankfully, if you’re on DreamHost, you can run the following command:

    wp search-replace objects.dreamhost.com objects-us-east-1.dream.io

    That will upgrade everything for you. If you’re not, you’ll need to use the search/replace tool of your choice.
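
    If you want to be careful about it, a sketch like the following takes a database backup and does a dry run before changing anything. It assumes WP-CLI is installed and that you run it from your WordPress folder. The function name and the backup filename are made up for illustration.

```shell
#!/bin/sh
OLD_HOST=objects.dreamhost.com
NEW_HOST=objects-us-east-1.dream.io

# Hypothetical wrapper around the command above: back up, preview, then replace.
update_dreamobjects_hostname() {
    # Keep a copy of the database in case the replace goes wrong.
    wp db export before-hostname-change.sql || return 1

    # Report what would change without touching anything.
    wp search-replace "$OLD_HOST" "$NEW_HOST" --dry-run || return 1

    # Do the real replacement.
    wp search-replace "$OLD_HOST" "$NEW_HOST"
}
```

    The dry run prints a per-table count of replacements, which is a quick way to see whether any old URLs are left in your content at all.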

    Again: WAIT until any plugin you use for this has been updated. I personally contacted everyone in the WordPress.org repository and informed them about it, but since I know that’s not perfect, I’m doing this too.

    This sucks!

    I know. It does. But I don’t see a better way around it.

    When will your plugins be updated?

    Before the end of May. I just want to test it on as many large sites as I can find.