Half-Elf on Tech

Thoughts From a Professional Lesbian

Category: How It Works

  • Genericons Neue

    Genericons Neue

    In 2013 I made a silly little plugin called Genericon’d which let you include Genericons on your site in a theme-independent way, complete with shortcodes and flexibility for other plugins and themes that might be using it. In 2016, Genericons became Genericons Neue.

    The changes were small but huge:

    1. SVG instead of font icons
    2. No more social icons

    The problem I faced was equally small but huge:

    1. How to seamlessly transition from font icons to SVGs
    2. How to handle social!!?!?!

    Thankfully Automattic actually did the hardest work for me, with Social Logos. I can’t design logos. I didn’t want to abandon people. So being able to just include a second library in the plugin was a fast and easy fix.

    The long and drawn-out one was how to make the plugin magically transition. It took me a month, fiddling with it off and on, but as of version 4.0, Genericon’d defaults to using modern SVGs instead of fonts and combines the Genericons Neue icon pack with Social Logos to ensure your old code keeps working. If SVGs won’t work for your site, you can either use classic Genericons or the legacy font packs.

    Genericon'd default settings

    Yeah, I gave everyone ‘options’ while still making default decisions. For the most part, no one needs the legacy stuff unless they’re supporting IE, so this should work right out of the box for everyone, new installs and upgrades alike. My only ‘beef’ is that Social Logos doesn’t have a release strategy, so I’m going to have to randomly check for updates.

    A lot of the work I did to figure this out was just testing variations. I knew that by default I wanted everyone to use the minified, super fast SVG sprites, and by default you do. There are hidden options that would let you use the slower images, but I didn’t build out that interface because of the annoying complexity with setting up “if you have Genericons Neue, make sure you don’t have Genericons Classic!” That was a surprisingly large amount of ifs and elses to make it logically flow. I wanted to have it magically flip things over for you, but in the end I went with an alert if the plugin is active and you haven’t selected things.

    You can also make your load even lighter by not including the social icons, though one thing that’s nice about SVGs over font icons is that if you’re not using them, there’s no extra load on the site.
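
    If you want to confirm what your pages actually pull in after the switch, a quick look at the rendered source will tell you. A rough sketch, assuming curl and grep are available; example.com is a placeholder for your own site:

    # List every Genericons-related asset URL referenced in the page source
    # (stylesheets, sprite files, legacy fonts, whatever is still enqueued).
    curl -s https://example.com/ | grep -Eio '[^"]*genericons[^"]*' | sort -u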

  • OnPage is On Fleek

    OnPage is On Fleek

    I’ve gotten back into using Yoast SEO (I like to say, I don’t always use SEO plugins, but when I do I use Yoast), and I noticed they’d added integration with OnPage.org. Of course I did remember this being added to the plugin in general, having chatted about it with Yoast and his crew, but at the time it really didn’t seem like anything I needed.

    Flash forward to 2016: while I can certainly do all the things it takes to set up a site and make its SEO sing, a person can always learn more. So I set up an OnPage account and went to town.

    Actually I stared at it.

    You know how people say the WordPress dashboard can be overwhelming when you first look at it? Yeah, so was this. There were a lot of options. A lot of options. And a lot were sales things. Then there were a hundred pages of ‘metrics’ which, I confess, I didn’t fully understand. While I was still capable of churning out page after page of valuable content, and having context be my SEO queen, some of the nuances of what made pages ‘good’ to search engines were clearly being lost on me.

    Like I said, a person can always learn more, so after a moment of feeling dejected, I decided to sit down and study up. How was I going to master all this and understand how to make my site better and the content more search-engine friendly? I knew once I got people to the site, the content would be useful, but getting them there was a matter of advertising (something difficult to target when no one aims ads at lesbians) and word of mouth.

    Thankfully OnPage has a tool called Navigator, and once I started using it everything became clear. It walked me through the basics of setting up my site. It detected that I used WordPress, directed me to install an SEO plugin (Yoast, of course), and then gave me step-by-step tasks to get those simple, basic wins.

    The first 18 tasks were things I’d already done, like hooking up Google Analytics, signing up for Google and Bing Webmasters, submitting my sitemaps, etc etc.

    OnPage Navigator makes me feel successful

    And then I hit the section about Keyword Pages.

    Create Keyword Pages

    That confused me since I knew the page that the keyword “kima greggs” should point to is https://lezwatchtv.com/characters/kima-greggs/ (pretty obvious when you see it, right?). I clicked on the link to complete the task and was surprised.

    Define a subpage that should be ranked for the keyword kima greggs:

    The bottom part of the page was a list of pages on my site, and none of them was the right URL. Oh dear. I entered it in manually and hit save. That presented me with new tasks:

    • Add the keyword kima greggs in the meta title
    • Add the keyword kima greggs in the meta description
    • Add the keyword kima greggs to the content of your page
    • Add the keyword kima greggs in the headlines (<h1> to <h6>)

    And then there were a lot more, which were pretty easy, but those… well, those led me down a path that will be continued later.
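
    Three of those four keyword tasks are also easy to verify from the command line (the body-content one is simplest to just read). A rough sketch, assuming curl and grep are available; the greps are deliberately naive, and the URL and keyword are just my example from above:

    # Count how often the keyword appears in the meta title, the meta description,
    # and the headlines of a page. A zero means that task isn't done yet.
    URL="https://lezwatchtv.com/characters/kima-greggs/"
    KEYWORD="kima greggs"
    HTML=$(curl -sL "$URL")

    echo "$HTML" | grep -io '<title>[^<]*'            | grep -ic "$KEYWORD"   # meta title
    echo "$HTML" | grep -io 'name="description"[^>]*' | grep -ic "$KEYWORD"   # meta description
    echo "$HTML" | grep -Eio '<h[1-6][^>]*>[^<]*'     | grep -ic "$KEYWORD"   # headlines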

    The lesson to take here is that if you’re trying to figure out how to optimize SEO past the easy basics, OnPage can teach you about how you’re doing it wrong. It can teach you how to do it right. And eventually you can optimize things with the conscious understanding of why the code does what it does.

  • Chronic Infections: Blacklisted

    Chronic Infections: Blacklisted

    If you use Chrome, you may be used to those warnings about how a site is dangerous (or hacked) and maybe you shouldn’t visit it. If that happened to your site, you’d get an email if you use Google Webmasters (which I recommend you do), and then after you clean it up you can ask for a rescan. Or if you don’t, Google will rescan the site after a while and if it’s clean, carry on.

    That ends.

    Google found out something we’ve all known for a while, which is that people can be evil and malicious. And what they’ve done is create a ‘repeat offenders’ blacklist for sites that clean up only to allow themselves to be reinfected. As they say, “Sites that repeatedly switch between compliant and noncompliant behavior within a short window of time will be classified as Repeat Offenders.”

    This is dangerous for users when a hack is outside their control.

    The number one cause of reinfections is not plugging the hole. In the case of things like WordPress, it’s down to upgrading everything, deleting anything with a known hack or backdoor, and locking down users. Hacks like Pharma, where the database becomes vulnerable and repeatedly re-infects a site, are thankfully rare for WordPress, but the same cannot be said of other CMS applications.

    And far worse than that is this: what happens when your ad network is the cause of a hack?

    Recently, a friend of mine was hacked and got upset that his webhost’s scan of his site said it was clean while Google said otherwise. In looking at the site, I pointed out the hack was coming from his ads and not the files on the webhost. His webhost’s scanner didn’t hook into Google’s Safe Browsing service, so of course it didn’t come up. He was pissed off about the host missing it, but once I explained why, he realized the magnitude of the issue.
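
    If you want to see for yourself what Google thinks of your site, rather than relying on your host’s scanner, you can ask the Safe Browsing Lookup API directly. A minimal sketch, assuming you’ve created an API key in the Google developer console; YOUR_API_KEY and the example.com URL are placeholders:

    # Ask the Safe Browsing (v4) Lookup API whether a URL is currently flagged.
    curl -s -H 'Content-Type: application/json' \
      'https://safebrowsing.googleapis.com/v4/threatMatches:find?key=YOUR_API_KEY' \
      -d '{
        "client":     { "clientId": "my-site-check", "clientVersion": "1.0" },
        "threatInfo": {
          "threatTypes":      ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
          "platformTypes":    ["ANY_PLATFORM"],
          "threatEntryTypes": ["URL"],
          "threatEntries":    [ { "url": "https://example.com/" } ]
        }
      }'
    # An empty {} response means no current match; a "matches" array means you're flagged.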

    By adding an ad service to your site, you’re effectively trusting their behavior. And some ads are pretty scummy. While Google AdSense (and others) are usually pretty quick to kick-ban those idiots, the damage will be pretty hardcore. It takes but a small moment for a high-traffic site to serve up enough malware to make that attacker’s plan worthwhile. And worse, if the same kind of person gets in again and again (which happens) and your site is infected multiple times, you will end up on the shit-list.

    That’s enough FUD. Let’s talk about mitigations.

    We’re all going to need to get better at figuring out where the malware is from. All of us. Security companies are going to lose money if they can’t stop repeat attacks, and since even the best firewall can’t stop shitty ads, all our scanner tools are going to need to be better about detecting what the cause is and where it’s from. This is going to be hard, since the ad may be gone by the time the site scan runs.

    Google will need to get a lot better at telling us what they know. I don’t know if they will, but they’ll need to figure something out. At the same time, I get why they may not want to. It tips their hand to tell malicious people exactly how you caught on to them, but telling site owners “Your ads are serving up malware” would be impactful and hopefully not too harmful. I’m on the fence there.

    Finally, we all know ads on the internet are shit. We’re all barely making money off them. So if you get infected by an ad vendor twice, it’s time to turn those ads off and look for something new. If that ad vendor is Google, open a ticket with them and provide evidence that they’re hurting your SEO and could cause you to get on that repeat offender list.

    Yes, this is making a hard decision, but it’s one you must make. If you’re being betrayed by your ads, you need to quit them.

  • Torrenting Cache

    Torrenting Cache

    There’s a new cache in town, CacheP2P.

    The basic concept is that you can use BitTorrent to seed your cache across the world, making it even faster for everyone. Setting it up is fairly simple. Configuring it is not. At least not in an automated fashion.

    Traditional web browsing is a direct connection between user and server. Traditional caching is done by having the server (or a proxy of the server) create a static copy of the page and display that. In the case of WordPress and any other dynamic CMS, that helps by taking the load off PHP and MySQL, which would otherwise have to generate a new page on every visit.

    By using BitTorrent, this changes so that you get a cached copy not from a server but from someone else’s computer. If you and I were on the same network, I might get the page from you instead of the server. That sounds really weird, doesn’t it? Two JavaScript files combine to signal the torrent’s API, and a third file uses the unique page hash to determine freshness. Keep your eye on that last part, it’s what makes the idea of a plugin for WordPress such a pain.

    To get the content for that last file, you have to look at your page in dev tools to grab the security hash:

    [CacheP2P] this page's security hash: (2)
    "c72d19b8ed03be98ceebd06f7c93dc06410b4de4"
    "(http://www.cachep2p.com/api.html)"
    

    On Safari it looks like this:

    Example of what the hash looks like

    Now if it works, and you can see an example on the cachep2p.com domain, it would show results similar to this:

    Example of the cache working

    This did not actually work for me on Safari. At all. It was fine on Chrome, but Safari never served up the cache which is odd.

    My first concern was about cache contamination. That is, if someone downloads the page and messes with it, could they have my site show content I didn’t want it to show? By using hashes, this is minimized. I have a file that defines the valid hashes, and if the copy doesn’t match, it downloads my content, not the bad one.

    However the greater concern is that of accidentally releasing content I shouldn’t. Take this example. I accidentally publish something I shouldn’t, like the plan to tear down the Berlin Wall. Without caching, I can quickly redact it, and if Google didn’t scrape my page, it’s like it never happened. With caching (and Google…) the bad content (my destruction plans) remains out there unless I visit the cache provider and flush things. If you’ve ever used a 3rd-party proxy like Cloudflare to cache your content, it’s the same situation as when you update your CSS files and have to go force them to refresh.

    With the BitTorrent situation this becomes worse, because the cache is in the hands of the masses. If you were a politician and I your rival, I would have someone constantly visiting your site and saving the cache. Then I could go through it and look for accidental leaks.

    Now of course this could happen today. I could set up a simple content scraper and have it ping your site every so often to save the data. You could, in turn, block my IP, and I would retaliate by setting up a Tor connection to do it from obfuscated IPs. The difference here is that you’re actually encouraging me to cache your data with this plugin.

    An additional concern is the dynamic aspect of WordPress. The only way to grab the hash right now is to view the page. That hash will change when I save a page. In fact, it might change on every page load in some situations. I didn’t get too far into testing at this point, since I realized that in order for this to work I would have to load a page, grab a hash, edit a file, upload that file to the server, and only then would it cache…
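
    Even scripting the parts that can be scripted doesn’t save you much. Once you’ve copied the hash out of the console, you’re still repeating something like this for every page that changes. This is purely illustrative: the file name and its format below are placeholders for whatever CacheP2P actually expects, and the hash is the one from the console output above:

    # The manual round trip, sketched out. page-hashes.txt and its format are
    # placeholders -- use whatever file CacheP2P really reads its hashes from.
    HASH="c72d19b8ed03be98ceebd06f7c93dc06410b4de4"
    PAGE="http://www.cachep2p.com/api.html"

    echo "$PAGE $HASH" >> page-hashes.txt                        # record it locally
    scp page-hashes.txt me@example.com:/home/me/public_html/     # push it to the server
    # ...and do it all again the next time the page is edited.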

    That would be terrible on WordPress. For this to work on any large site, the generation of that hash file would have to be automated. Dynamic site or not, making people do that manually is preposterous. A vaguely WordPress solution I dreamed up was to somehow catch the cache hash as the page is saved, store it in a post-meta value, and then use WordPress to generate a ‘fake’ page with the URL and the hash for the cache tool to use.

    It might be easier to do that via something like WP Super Cache or W3TC, and have it save the file as it saves the cached page (and point to the static page instead of the dynamic one), but even then, the rapid changing of WordPress content would make it difficult for a cache to seed far enough out.

    Right now, I think this is something that might only be useful for a small, mostly static, site.

  • stdin: is not a tty

    stdin: is not a tty

    That was the error.

    stdin: is not a tty

    I use rsync to make a backup of my files via this shell script that runs every time I log in to my laptop:

    #!/bin/sh
    
    cd $(dirname $0)
    
    TODAY=$(date)
    echo "
    -----------------------------------------------------
    Date: $TODAY
    Host: example.com
    -----------------------------------------------------\n" > log.txt
    
    echo "Backup files..." >> log.txt
    rsync -aCv --delete --exclude-from 'backup-exclude.txt' -e ssh me@example.com:/home/me/public_html/ public_html >> log.txt
    
    echo "\nEnd Backup. Have a nice day." >> log.txt
    

    It’s a nice little script. It downloads everything into a folder called example.dev which I then use with DesktopServer to have a copy of my site. The database? That’s handled by another file which pulls down the DB backups from Amazon S3 (something built in to cPanel) which I may cover at a later point.

    Today though, let’s talk about what that error is, what it means, how we fix it, and why that fix works.

    The error is caused by having mesg at the top of a .bashrc file on my server. In my case, the line is not in the user’s file, but the root file. The message, on login, tells you when your last login was, where it was from, and what the IP was. It also tells you how many failed logins happened since your last login, a report that amuses me when I sudo into root now and then.

    Why do I get the error? When rsync logs in over ssh, there’s no terminal attached, so when the .bashrc tries to show that message there’s no tty to show it on, and it errors. The fix means I need to tell it not to do that for non-interactive sessions. And to do that we put this at the top of the .bashrc file:

    [ -z "$PS1" ] && return
    

    Another option would be this:

    if `tty -s`; then
       mesg n
    fi
    

    It depends on your flavor of Linux of course.
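
    Whichever you pick, the important part is that the check comes before anything that tries to talk to the terminal. The top of the .bashrc ends up looking roughly like this; the banner line is a stand-in for whatever your server actually prints:

    # ~/.bashrc (sketch) -- the guard goes first
    [ -z "$PS1" ] && return    # non-interactive session (rsync, scp, cron): stop here

    mesg n                     # only runs when a real terminal is attached
    echo "Last login report, failed logins, etc."   # stand-in for the login banner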

    The final question we have is: why do these fixes work?

    The second fix is simple. It checks for a tty, which is short for Teletype. If you’ve ever wondered how deaf people use the phone, it’s via a teletype machine. For the purposes of computers, it just means “this is text and we are going to talk in text interactively.” The tty is provided by your terminal of choice. If there’s no tty, the mesg line is skipped and the server just doesn’t try to show the message.

    The first fix is a little more weird. PS1 stands for Prompt String 1 and is the prompt your shell shows when it’s waiting for you to type a command, normally something like your username, hostname, and current directory. Interactive shells set it; non-interactive ones (like the shell rsync logs in with) don’t. Using -z checks whether $PS1 is empty, and if it is, we return (aka exit out of the .bashrc and do nothing else, so the message never runs).
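
    If you want to see the difference for yourself, compare an interactive login with a non-interactive command. A quick sketch, with example.com standing in for your own server:

    # Interactive login: stdin is a terminal and PS1 is set.
    ssh me@example.com
    tty                           # prints something like /dev/pts/0
    echo "${PS1:+PS1 is set}"     # prints "PS1 is set"

    # Non-interactive command (what rsync does under the hood): no terminal, no PS1.
    ssh me@example.com 'tty; echo "${PS1:-PS1 is empty}"'
    # prints "not a tty" and "PS1 is empty"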

  • Google Knowledge Blindspot

    Google Knowledge Blindspot

    With the new release of Yoast SEO 3.6, I decided to test out their new walk-through setup.

    As someone who’s been studying SEO to one degree or another since the 1990s, I find SEO in and of itself no great mystery. The bare bones of how it works, how you make good content, are understandable. But what Yoast SEO does, and what I like it for, is that it makes it obvious to people without my experience what is needed. It also takes the generally good SEO of WordPress (and a good theme) and optimizes it for the myriad, constant changes Google makes.

    For that, the walk-through is a wonderful idea. I like it. I think new users will love it. I think it’ll lower the barrier for those people who are told “You need a website!” but have no one willing (or able) to sit with them and help them get started.

    Initially I was super impressed. Yoast had clearly studied which aspects of walk-throughs worked and which didn’t, lifting pages from other complex plugins that need to be usable by, perhaps, less technically savvy people.

    Yoast Walk Through: What kind of site is this?

    Being asked what kind of site I was running was brilliant. For the purposes of this test, I decided to use my community/wiki/library – LezWatchTV. And right away I ran into a problem.

    Am I a company or a person?

    The tool wanted me to say if I was a company or a person.

    Well … Neither. I’m a community site. Or maybe a group? Either way, the two designations didn’t really apply properly. Where was “Other”?

    This couldn’t be Yoast making a boneheaded maneuver, I realized. Few people know better than Joost and his crew what WordPress is used for. They’re smart people. They’ve seen more of the Internet than most of the rest of us and they know well how it’s used. So could the screwup be Google or Schema.org?

    I went to Schema.org to look up how they would classify the site, and determined that DataCatalog was the most appropriate. Alright, knowing there was a good classification, I looked back at Google’s Knowledge Graph.

    Google’s Knowledge Graph is a weird thing. It’s Google’s attempt to figure out how to answer your questions. You know how you can type in “How do I do X?” into Google and you get that interesting formatted answer?

    Example: How do I bake a pie?

    That’s from their Knowledge Graph. But more importantly, so is this:

    Example: Knowledge Graph data of Root from Person of Interest

    The more you dig into it, the more you realize that the only boxes like that are for people or companies. So the breakdown is that Google has not yet figured out how to flag things that are neither people nor companies.

    This means my ultimate question of ‘what am I?’ has become a little more existential than I’d wanted, and the answer a little too simple. It’s not a person, therefore it must be a company. And while that is entirely, totally, daftly incorrect, it’s also less incorrect than a person.

    Thanks, Google.