Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • The Problem with Renaming

    If you caught my talk in Seattle last week, I talked about names, versions, and SVN.

    Jeff Atwood: There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

    One of the things I touched on with names was their problematic nature. And believe me, I know about that. You see, I’m a Cleveland Indians fan.

    And yes, I think the name (and the logo) are racist.

    You can’t rename things, but you can rebrand

    When I said this, I meant that you can’t rename a plugin slug. Yoast SEO will forever have the URL of wordpress-seo because we do not have a way to rename the slug and properly redirect everyone. We just don’t. And even if we did, the old URL would need to remain in perpetuity in order for everyone who upgraded super late to still get the new code.

    Names are really important. Your name is (often) your brand, and your brand is how people know you and how to find you. When you consider a name like the Cleveland Indians, today we can see the problems with it. Racism. But in 1914, we were a little simpler, a little more naive…. A little stupider. Okay a lot stupider.

    The problems that Cleveland faces with renaming are related to the problems you would face in renaming your product.

    Rebranding has a cost, and it could be everything

    The crux of all issues with renaming is that if people don’t like the new name, or can’t find you because of it, they will walk away. If you’re a small company with a few sales, and people can’t find your name anymore, you could go bankrupt. For Cleveland, it would be worse. If the baseball team went belly up, it would translate to thousands of people going out of work.

    This is not to say the renaming or rebranding isn’t important. Cleveland’s reasons are obvious. Yours may be less so. You may be asked to rebrand to prevent a potential legal issue. Or you may decide that Mailpoet is a better name than Wysija Newsletters. But the rebranding can come at a cost.

    Losing history can lose users

    In baseball, one of the rationales for not renaming a team is the team history. A team is known by its name and its mascot (and logo). The logos of most teams are fairly mutable over time; you can see the growth and development when you look at them historically. There’s a reason most redesigns are not dramatic, but careful and planned. That cannot be said of the names, which rarely (if ever) change unless a team moves.

    The same goes for your name. If Cleveland renames their baseball team to, say, the Lancers or the Blues, how do you handle the change? You have to make sure everyone knows (this is easier for baseball than the rest of us) and you have to make sure they know why.

    Warner Bros' 'we were racist' disclaimer

    When a similar warning was put up before Tom & Jerry cartoons, the Internet lost their shit. Go figure. And yet that’s the problem here. People react in unpredictable ways to being told “A thing you liked and empathized with is bad” because they think it means they were bad. Generally no, they weren’t.

    That’s a much bigger issue for baseball than for your code though. Unless you decided to name your theme Mein Kampf, or decided to present your plugin as ‘The Final Solution.’ That’s because a name is not isolated. You are not isolated. You live in a world where the implications and uses of a word and a logo can have far reaching effects. People who, for whatever reason, connect with your plugin name can feel left out when you rebrand.

    No matter how deep your pockets are, you will pay

    Thankfully you are way luckier than baseball (or football, hello). You have the opportunity to know the world you’re in. Today we are more aware of the implications of our words, but also we now listen to other cultures and viewpoints about how our words and actions are perceived.

    The cost of renaming yourself is high, but the ultimate question is not how much the renaming costs. The question is which cost is higher: the loss that stems from renaming, or the loss that stems from defending a name. If you’re being sued by Microsoft for trademark infringement over a name, and yes it happens, it doesn’t matter what your intentions were. What matters is you’re probably going to lose.

    If you’re baseball, you generally have a lot more money than the average joe. You might be able to win a lawsuit. But your reputation will be tarnished, and that too will impact your bottom line.

    If you know what to do and you don’t do it, there you bloody well are, aren’t you.

    — Lord Buckley

  • Torrenting Cache

    There’s a new cache in town, CacheP2P.

    The basic concept is that you can use BitTorrent to seed your cache across the world, making it even faster for everyone. Setting it up is fairly simple. Configuring it is not. At least not in an automated fashion.

    Traditional web browsing is a direct connection between user and server. Traditional caching is done by having the server (or a proxy of the server) create a static copy of the page and display that. In the case of WordPress and any other dynamic CMS, that works by taking the load off of PHP and MySQL having to generate a new page on every visit.

    By using BitTorrent, this is changed so that you would instead be getting a cached copy not from a server but from someone else’s computer. If you and I were on the same network, I might get the page from you instead of the server. That sounds really weird, doesn’t it? Two JavaScript files combine to signal the torrent’s API, and a third file uses the unique page hash to determine freshness. Keep your eye on that last part, it’s what makes the idea of a plugin for WordPress such a pain.

    To get the content for that last file, you have to look at your page in dev tools to grab the security hash:

    [CacheP2P] this page's security hash: (2)
    "c72d19b8ed03be98ceebd06f7c93dc06410b4de4"
    "(http://www.cachep2p.com/api.html)"
    

    On Safari it looks like this:

    Example of what the hash looks like

    Now if it works, and you can see an example on the cachep2p.com domain, it would show results similar to this:

    Example of the cache working

    This did not actually work for me on Safari. At all. It was fine on Chrome, but Safari never served up the cache which is odd.

    My first concern was about cache contamination. That is, if someone downloads the page and messes with it, could they have my site show content I didn’t want it to show? By using hashes, this is minimized. I have a file that defines the valid hashes, and if the copy doesn’t match, it downloads my content, not the bad one.
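
    The hash check described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how CacheP2P implements it: the function name is_trusted_copy and the file names are made up, and I’m assuming the 40-character hash is a SHA-1 of the page, which is a guess based on its length.

```shell
#!/bin/sh
# Hypothetical sketch: accept a peer-supplied copy only if its SHA-1
# matches the hash the site owner published. Not CacheP2P's actual code.

# is_trusted_copy FILE EXPECTED_HASH
# Returns 0 when the file's SHA-1 equals EXPECTED_HASH, 1 otherwise.
is_trusted_copy() {
    actual=$(sha1sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Throwaway files standing in for the real page and a tampered copy.
tmpdir=$(mktemp -d)
printf 'hello world\n' > "$tmpdir/page.html"
printf 'hello w0rld\n' > "$tmpdir/tampered.html"

# The owner publishes the hash of the genuine page.
published=$(sha1sum "$tmpdir/page.html" | cut -d ' ' -f 1)

if is_trusted_copy "$tmpdir/tampered.html" "$published"; then
    echo "serve the peer copy"
else
    echo "hash mismatch: fetch from the origin server instead"
fi
```

    The tampered copy differs by one character, so its hash no longer matches and the sketch falls back to the origin server.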

    However the greater concern is that of accidentally releasing content I shouldn’t. Take this example. I accidentally publish something I shouldn’t, like the plan to tear down the Berlin Wall. Without caching, I can quickly redact it and if Google didn’t scrape my page, it’s like it never happened. With caching (and Google…) the bad content (my destruction plans) remain out there unless I visit the cache provider and flush things. If you’ve ever used a 3rd party proxy like Cloudflare to cache your content, this is the situation when you update your CSS files and have to go force them to refresh.

    With the BitTorrent situation this becomes worse, because the cache is in the hands of the masses. If you were a politician and I your rival, I would have someone constantly visiting your site and saving the cache. Then I could go through it and look for accidental leaks.

    Now of course this could happen today. I could set up a simple content scraper and have it ping your site every so often to save the data. You could, in turn, block my IP, and I would retaliate by setting up a Tor connection to do it from obfuscated IPs. The difference here is that you’re actually encouraging me to cache your data with this plugin.

    An additional concern is the dynamic aspect of WordPress. The only way to grab the hash right now is to view the page. That hash will change when I save a page. In fact, it might change on every page load, in some situations. I didn’t get too far into testing at this point, since I realized that in order for this to work I would have to load a page, grab a hash, edit a file, save that file up on the server, and then it would cache…

    That would be terrible on WordPress. For this to work on any large site, the generation of that hash file would have to be automated. No matter if the site is dynamic or not, to make people manually do that is preposterous. A vaguely WordPress solution I dreamed up was to somehow catch the cache hash as the page is saved, store it in a post-meta value, and then use WordPress to generate a ‘fake’ page with the URL and the hash for the cache tool to use.

    It might be easier to do that via something like WP Super Cache or W3TC, and have it save the file as it saves the cached page (and point to the static page instead of the dynamic one) but even then, the rapid changing of WordPress content would make it difficult for a cache to seed far enough out.
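
    As a rough idea of what that automation could look like, here is a sketch that walks a folder of static cached pages and prints a path and hash for each one. Everything here is an assumption: the build_manifest name, the "path hash" output format, and the use of SHA-1 are mine, and CacheP2P’s real hash file may look nothing like this.

```shell
#!/bin/sh
# Hypothetical sketch: build a "path hash" manifest for every cached HTML file.
# The manifest format is invented for illustration only.

# build_manifest CACHE_DIR
# Prints one "relative/path sha1" line per .html file under CACHE_DIR.
build_manifest() {
    (
        cd "$1" || exit 1
        find . -type f -name '*.html' | sort | while IFS= read -r page; do
            printf '%s %s\n' "${page#./}" "$(sha1sum "$page" | cut -d ' ' -f 1)"
        done
    )
}

# A throwaway directory standing in for a cache plugin's output folder.
cache_dir=$(mktemp -d)
mkdir -p "$cache_dir/about"
printf '<p>home</p>\n' > "$cache_dir/index.html"
printf '<p>about</p>\n' > "$cache_dir/about/index.html"

build_manifest "$cache_dir"
```

    A caching plugin would have to rerun something like this every time it rewrote a cached page, which is exactly the churn that makes this hard on a busy WordPress site.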

    Right now, I think this is something that might only be useful for a small, mostly static, site.

  • Hot Hands And Playoffs

    Today I’m wandering off topic into a world of baseball and statistics.

    My family have been Cleveland Indians fans since they came to the United States and settled in the city. My grandmother was an accountant, my father a mathematician, and I a web developer who works on software used by 26% of the Internet. Give or take. I’m also a third (and probably final) generation Clevelander. Yes, I root for my home team.

    October of 2016 marked the first time since 2007 that Cleveland was in the American League Championship Series (ALCS). In the intervening years, my family had all migrated to iPhones and iMessage, allowing us to converse in real time across two continents, two countries, four time-zones, and five cities.

    My father, the mathematician and risk analyst, kept a close watch on Nate Silver’s FiveThirtyEight project, especially the MLB Predictions, as Mr. Silver has been quite spot on for things for a while, understanding the implications of probability and chance.

    On October 18th, FiveThirtyEight gave Cleveland a 53% chance of winning the ALCS game 4, a 94% chance of making the World Series, and a 38% chance of winning it all for the first time since 1948. The time Cleveland won before that? 1920. Not quite Cubs level of history, but it’s been a long enough time that my grandmother Taffy never got to see them win a third time (she was born July 7, 1920).

    The game on the 17th was nothing short of incredible. The starting pitcher was yanked after 2 outs because his cut pinky was dripping blood. There are, you see, a bevy of incredible rules about what pitchers can and cannot wear. More than the normal player. And we’re talking about a sport that demands all players use a glove that has colors only within a PANTONE® color set lighter than the current 14-series. These guys are nuts. And one of the rules is no bandages on the pitchers’ hands.

    a pitcher’s person cannot include any unessential or distracting thing (including jewelry, adhesive tape, or a batting glove), especially on his arm, wrist, hand, or fingers.

    Bauer’s pinky, which had 11 stitches in it, split open and was incredibly nasty, so he was replaced. Cleveland used seven pitchers, pretty much their entire relief bullpen, to get through the game. My family began to argue the intelligence of the move. Instead of using the rookie Merritt to start Game 4, a possible win-it-all game, manager Terry Francona decided to start his ‘ace,’ Kluber.

    To understand this, you have to start with the odd fact that Cleveland is down three of their best pitchers to injuries. That includes Drone Boy Bauer. Such a situation is rare for the playoffs, if not unheard of. That means they are more reliant than ever on their bullpen, so using every single pitcher possible on Monday meant they would all be a little tired on Tuesday. And Kluber would be starting on 3 days’ rest when a pitcher normally gets 4 or 5.

    Clearly Francona was banking on the team not needing to use the bullpen much in Game 4, but why would he make that decision knowing that the odds of winning on Tuesday were insanely low? As my dad said:

    Winning 7 games straight is an outlier. They won 6 in a row twice, of course the 14 streak, 4 games three times. I’m betting they will lose the next two in Toronto.

    Then he started emailing us all homework.

    Before we get to the math, let’s look at the baseball logic. The reason you would play Kluber is that the odds are Cleveland will lose on the 18th, and a good manager would know that and bet on it, like my father. Teams winning 7 games in a row is crazy. It’s rare. It’s risky. By playing Kluber, an experienced pitcher, you solve two problems. First, Merritt is a rookie. Him losing will have a deep psychological impact on the young guy. Kluber can take a hit and keep going. Second, it means if Cleveland does win, Kluber will be well rested for the World Series. If Game 4 is lost, Merritt will pitch the safer Game 5.
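
    To put a rough number on how rare seven straight wins is, here is a back-of-the-envelope sketch. It assumes every game is independent and that FiveThirtyEight’s 53% figure for Game 4 applies to each of the seven games, which it certainly did not. The function name streak_probability is mine.

```shell
#!/bin/sh
# Sketch under a big assumption: every game is an independent coin flip
# with the same win probability. Real games are not that tidy.

# streak_probability P N
# Prints the chance of winning N games in a row at per-game probability P.
streak_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", p ^ n }'
}

streak_probability 0.53 7   # seven straight wins
streak_probability 0.53 1   # the next game on its own
```

    Under those assumptions a seven-game run comes out a little over 1%, while the next game taken alone stays at 53%. That gap is the whole argument about whether a streak tells you anything.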

    The psychology of math is exactly why no one would discount the Cleveland Indians winning seven games in a row in the post season, however.

    […] what Terry is seeing is momentum, the intangible. You gotta measure the odds with numbers, but making good decisions goes beyond the odds … beyond just the odds. Like CoCo’s diving catch.

    This is where the homework comes into play. Nine papers about hot streaks later, I came to the conclusion I had always felt had to be true. There is no such thing as a winning streak. They are nothing more than standard deviations from the mean. Models of the math have told us that there is only one event in baseball that has happened outside of the frequency of said models. Everything else, the longest runs of losses and wins, is exactly as it should be and happens as often as it ought.

    Except for one: Joe DiMaggio. Joltin’ Joe’s 56-game hitting streak in 1941 doesn’t make any sense. As we read in Streak of Streaks by Stephen Jay Gould, in order to make it mathematically probable to have a run of 50 games with a hit, we should have had four batters with a lifetime average of .400, and 52 with .350 or higher over 1000 games. Instead, three players have achieved a batting average over .350 and not one has managed .400 lifetime.

    You’re thinking “But Ty Cobb!” right now, and guess what? His lifetime is .367, followed by Hornsby at .358, and Shoeless Joe Jackson brings up the rear at .356 for his short career.

    DiMaggio’s streak does not make sense.

    Most MLB records we consider to be unbreakable are only that way due to changes in the way the game is played. Pitchers no longer play complete games on the regular, nor do they start 60+ games a season. The weirdness of DiMaggio is that his numbers are off the charts for that year, and actually the entirety of MLB history.

    The Hot Hand: A New Approach to an Old “Fallacy”. Notice the quotes? The theory behind the Sloan paper is that the Hot Hand (or streak) is a fallacy because we’ve always been working under bad assumptions. To wit:

    However, prior research hinges on the assumption that player shot selection is random, independent of player-perceived hot or coldness. Said differently, it assumes that players will take the same types of shots, with the same level of defensive coverage, regardless of whether they have just made or missed three shots in a row. We find this assumption difficult to believe – if players have been shooting well, it seems logical that they would begin to attempt more difficult shots and opposing defenses would begin to cover them more tightly. This would potentially counteract the Hot Hand effect.

    To make this more obvious to the conversation at hand, basketball is not baseball and men are not potatoes. Baseball is a rarity in sports. The defense has control of the ball and, barring injury, everyone who plays will have an at-bat (designated hitter rules aside). Basketball has no promise that everyone who plays will have a chance to shoot a basket, or even touch a ball. Baseball hitting streaks come down to one person versus a hundred. The batter versus every pitcher they face. Provided they’re not walked, the batter remains in control of their destiny.

    All of this is quite fanciful. There are hundreds of articles, like Phil Birnbaum’s quest for evidence of the Hot Hand effect and Tangotiger’s Sabermetric blog on the impact of the Zone on streaks. The best we can say is ‘Streaks exist, but generally they do so within the expected norm of percentages.’

    None of this considers the psychological impact of a streak. The longer a streak goes on, the more stress and nerves are put on a player. At the same time, the more ease is given a player, as the expectation of winning becomes a short-term norm.

    Per FiveThirtyEight, the Cleveland Indians had a 53% chance of winning Game 4 of the ALCS on October 18, 2016. The Epstein family gave it much less of a chance. We were right.

  • Not Spilling SEO Juice

    Once in a while I still get people who ask me to do things for them. Some offer to pay, most don’t, but a common thread lately has been “How do I redirect and not screw up my SEO?”

    They tend to ask it nicer, but that’s what they mean. And to be honest, the question surprises me in 2016. I’ve read email after email about people who swear “I renamed my domain/page and now my SEO crapped out!” And to each and every one I ask “Did you redirect it properly?”

    I think what’s happening is that the problem is so clear cut, people are overthinking. That is, bar none, the biggest mistake I feel anyone ever makes. They overcomplicate a problem and mire themselves in the hell of debugging. I do it all the time.

    Let me tell you a story. When we started LezWatchTV, we had some pretty non-SEO friendly URLs. Our taxonomy for TV stations was “show-tags” and it went on and on with worse and worse names. Finally I sat down and drew out a map of what the ‘best’ SEO names would be for what I needed, and it was very short:

    • TV Stations: stations
    • Character Traits: cliches
    • Show Tropes: tropes
    • Characters: characters
    • Shows: shows

    Five items. And to do the redirects, I put this in the .htaccess file:

    RedirectMatch 301 ^/show-tags/(.*) /stations/$1
    RedirectMatch 301 ^/character-tags/(.*) /cliches/$1
    RedirectMatch 301 ^/lez_cliches/(.*) /tropes/$1
    RedirectMatch 301 ^/lez_tropes/(.*) /cliches/$1
    RedirectMatch 301 ^/lez_chars/(.*) /characters/$1
    RedirectMatch 301 ^/show-characters/(.*) /characters/$1
    RedirectMatch 301 ^/tv-shows/(.*) /shows/$1
    

    That’s all I needed. I took all the old, bad names and I redirected them to the good names. But I had to throw in an added wrinkle. I’d originally decided shows would have cliches and characters tropes, but I realized that was wrong and flipped them a week later. That meant I had things like /cliches/law-enforcement which had to be redirected to /tropes/law-enforcement and for that, there was nothing to be done save a massive section like this:

    # BEGIN Great Big Redirect Section
    RedirectMatch 301 ^/cliches/bisexual-love-triangle(.*) /tropes/bisexual-love-triangle$1
    RedirectMatch 301 ^/cliches/coming-out(.*) /tropes/coming-out$1
    RedirectMatch 301 ^/cliches/gay-for-ratings(.*) /tropes/gay-for-ratings$1
    RedirectMatch 301 ^/cliches/law-enforcement(.*) /tropes/law-enforcement$1
    [...]
    RedirectMatch 301 ^/tropes/athlete(.*) /cliches/athlete$1
    RedirectMatch 301 ^/tropes/cross-dresser(.*) /cliches/cross-dresser$1
    RedirectMatch 301 ^/tropes/firefighter(.*) /cliches/firefighter$1
    RedirectMatch 301 ^/tropes/teacher(.*) /cliches/teacher$1
    

    This goes on for 24 character clichés and 18 show tropes. It had to be done manually.
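
    If I were doing it again, I might let a small script type those lines for me. This is a sketch, not what I actually ran: the print_redirects function is made up, and the slug lists only reuse the handful of slugs shown above rather than the full 24 and 18.

```shell
#!/bin/sh
# Sketch: print one RedirectMatch line per slug instead of typing each by hand.
# The slug lists below are only the examples from the post, not the full set.

# print_redirects FROM TO SLUG...
# Emits a 301 rule moving /FROM/SLUG... to /TO/SLUG...
print_redirects() {
    from=$1
    to=$2
    shift 2
    for slug in "$@"; do
        printf 'RedirectMatch 301 ^/%s/%s(.*) /%s/%s$1\n' "$from" "$slug" "$to" "$slug"
    done
}

echo "# BEGIN Great Big Redirect Section"
print_redirects cliches tropes bisexual-love-triangle coming-out gay-for-ratings law-enforcement
print_redirects tropes cliches athlete cross-dresser firefighter teacher
```

    The output can be pasted straight into .htaccess. You still have to decide by hand which slug belongs in which list, so it only saves the typing.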

    Now let’s ask the big question. Did I lose any SEO Juice by doing this?

    Not a drop. Because I used a 301 redirect and I used it properly (calling it well above my redirect for WordPress). This is exactly what Google’s webmaster docs tell you to do:

    If you need to change the URL of a page as it is shown in search engine results, we recommend that you use a server-side 301 redirect. This is the best way to ensure that users and search engines are directed to the correct page. The 301 status code means that a page has permanently moved to a new location.

    Still not sure? Here’s what Joost (of Yoast SEO) says when asked if you should use a 301 or a 302 redirect:

    The answer is very simple. You should never use a 301 redirect if you ever want to use that url again. So if you’re redirecting one url to another and you’re sure that the old url is never going to be used again, it’s a 301. If you’re not sure, it’s a 302. That’s it. Good luck!

    This is, again, straightforward. If you’re never using the page again (which I’m not), you use the 301 redirect.
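
    If you want to check that a redirect really is a 301 and really lands where you expect, the response headers tell you. Here is a small sketch that boils headers down to the status code and the new location. The summarize_redirect name is mine, and the headers are canned so the example runs without a network.

```shell
#!/bin/sh
# Sketch: confirm an old URL answers with a 301 and points at the new home.

# summarize_redirect: reads raw HTTP response headers on stdin and
# prints "STATUS LOCATION" (LOCATION is empty when no redirect is sent).
summarize_redirect() {
    tr -d '\r' | awk '
        NR == 1                    { status = $2 }
        tolower($1) == "location:" { location = $2 }
        END                        { print status, location }'
}

# Canned headers standing in for a real response, so this runs offline.
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: /stations/abc\r\n\r\n' | summarize_redirect
```

    Against a live site you would pipe in the headers for the old URL instead, for example curl -sI https://example.com/show-tags/abc | summarize_redirect, where example.com is a placeholder for your own domain.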

    By the way, if you’re using Yoast SEO Premium, then you should still have access to their redirect tool. Personally I feel that’s a change that should be on the server level, not a plugin.

    Okay, so what happens if you forget to do this? Your SEO will tank. However this can be fixed! Again, if you’re using Yoast SEO and you set up Google Analytics and webmaster tools with it, you have a handy Search Console which will let you see what the errors are. If you don’t, but you use Google (and really this is where it’s quite helpful), you’ll go to your Search Console > Crawl > Crawl Errors page and check out what’s listed.

    Remember to check desktop and smart-phone pages. As long as you keep those old URLs redirecting properly to their new home, your juice will be safe and sound.

  • stdin: is not a tty

    That was the error.

    stdin: is not a tty

    I use rsync to make a backup of my files via this shell script that runs every time I log in to my laptop:

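    #!/bin/sh
    
    # Work from the directory this script lives in.
    cd "$(dirname "$0")" || exit 1
    
    TODAY=$(date)
    # printf treats \n the same way in every shell; echo does not.
    printf '%s\n\n' "
    -----------------------------------------------------
    Date: $TODAY
    Host: example.com
    -----------------------------------------------------" > log.txt
    
    echo "Backup files..." >> log.txt
    # -a archive mode, -C skip CVS-style files, -v verbose; --delete removes local files that are gone from the server.
    rsync -aCv --delete --exclude-from 'backup-exclude.txt' -e ssh me@example.com:/home/me/public_html/ public_html >> log.txt
    
    printf '\nEnd Backup. Have a nice day.\n' >> log.txt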
    

    It’s a nice little script. It downloads everything into a folder called example.dev which I then use with DesktopServer to have a copy of my site. The database? That’s handled by another file which pulls down the DB backups from Amazon S3 (something built in to cPanel) which I may cover at a later point.

    Today though, let’s talk about what that error is, what it means, how we fix it, and why that fix works.

    The error is caused by having mesg at the top of a .bashrc file on my server. In my case, the line is not in the user’s file, but the root file. The message, on login, tells you when your last login was, where it was from, and what the IP was. It also tells you how many failed logins happened since your last login, a report that amuses me when I sudo into root now and then.

    I get the error because rsync logs in over ssh without a terminal attached. The message tries to show up in the rsync output, which can’t parse it, and thus errors. The fix is to tell the server not to show the output. And to do that we put this at the top of the .bashrc file:

    [ -z "$PS1" ] && return
    

    Another option would be this:

    if tty -s; then
       mesg n
    fi
    

    It depends on your flavor of Linux of course.

    The final question we have with this is why does it work?

    The second fix is simple. It checks for tty, which is Teletype. If you’ve ever wondered how deaf people use the phone, it’s via a teletype machine. For the purposes of computers, it just means “This is text and we are going to talk in text interactively.” The tty setting is handled by your terminal of choice. If it doesn’t get tty, the server will just not show the message.

    The first fix is a little more weird. PS1 stands for Prompt String 1 and is the primary prompt you see once you’re logged in. Normally it shows something like your username, the host, and the current directory. Using -z checks whether PS1 is empty, which it is when the shell is not interactive. If it is empty, return (aka exit out and do nothing else).
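
    Here is a sketch that puts the same guard into a form you can run and poke at. One addition that is not from either fix above: it also looks at the shell’s option flags in $-, which include an i when the shell is interactive. The is_interactive name is mine.

```shell
#!/bin/sh
# Sketch: how a startup file can tell an interactive login apart from a
# non-interactive session such as rsync over ssh.

# is_interactive: succeeds when the shell's option flags ($-) include "i".
is_interactive() {
    case $- in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive && [ -n "$PS1" ]; then
    echo "interactive: safe to print messages or call mesg"
else
    echo "non-interactive: stay quiet"
fi
```

    Run as a script, this takes the quiet branch, because a script is not an interactive shell. Paste it into a terminal and you should get the other branch.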

  • Bundling – Not What We Wanted

    There is a problem with bundling.

    We like to bundle things together, to say “If you buy X, get Y as well for less!” And the simple problem is that our customers don’t actually always want X. We treat bundling like it solves our problems, but it doesn’t. It just makes angry customers who have more than they need, and pay more than they wanted for things they don’t use.

    Let’s think of it like a coffee shop for a moment.

    You go to Mooncoins and you want a latte and a gluten free donut. When you get there, you look for your $5 latte and see that you can’t buy it anymore but you can spend $10 for a latte and a vegan donut, non-gluten-free. If you want the gluten free bundle that’s another $15 and it comes with a muffin. You don’t want the muffin. You want the option to name your own bundle.

    Okay, so how does this relate to software?

    If you live in the US, you’ve probably heard about the Progressive insurance company. Yes, the Flo ones. Since they own Jacobs Field in Cleveland, I hear a lot more of their commercials than I care about, but they have a weird deal with a ‘name your own price bundle’ where you pick what you want, and how much you want to pay for it. Within reason.

    This means we ask “What are you bundling?”

    I get waxed once a month or so and they offer ‘packages.’ There’s a ‘whole face’ package and then separate services for lip, chin, and eyebrows. What I want is lip and chin, which they don’t offer, so instead of paying less for two services (which is what you get in a package), I have a choice of paying more for a package I don’t want (whole face) or more for two services I want (lip and chin).

    The company wants me to pay for a package, which would cost quite a bit less than buying the lip, chin, and eyebrows as separates. But they don’t have a ‘pick two’ option.

    When you decide what you want to put in a bundle, you presume you know more about what the customer wants than the customer. And the flaw in this plan is that you don’t know more than the customers. You presume you know what works best, but you don’t.

    You may have an idea of what works best for the people you’ve run into, but some of them would look at the price difference between paying for two services and paying for a package, see that it works out to less than $50 a year, and go for it. Then you have someone who thinks that $50 a year isn’t nothing, and would like to save it and not use services they don’t really want.

    Bundling is meant to reduce options and let people pick what they need. What it more often does is require people to make decisions not just about what they want and need, but also about what those things are worth in time and money.

    Maybe we should start rethinking what we bundle and instead consider how we bundle. Let the customers have options. Use features like “People who bought X also bought Y.” Add in discounts “If you buy 3 products, get 5% off your entire purchase.” Offer them example bundles that are your current deals “Most people buy these 3 together. Purchase them now and save 5%, or mix and match your own.”

    But start looking at how people purchase your products as a whole, and give them discounts not on the bundle you invent, but on the bundle they create. Guide them to what they need, not what you think they want.
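
    The ‘buy 3 products, get 5% off your entire purchase’ idea above is simple enough to sketch. This is just the arithmetic, with a made-up cart_total function and made-up prices in cents, not anyone’s real checkout code.

```shell
#!/bin/sh
# Sketch: discount the bundle the customer builds, not one you invented.

# cart_total PRICE_IN_CENTS...
# Prints the total in cents, taking 5% off when three or more items are bought.
cart_total() {
    total=0
    for price in "$@"; do
        total=$((total + price))
    done
    if [ "$#" -ge 3 ]; then
        total=$((total * 95 / 100))
    fi
    echo "$total"
}

cart_total 500 1000          # two items, no discount
cart_total 500 1000 1500     # three items, 5% off
```

    The rule only counts how many things are in the cart. It doesn’t care which three products they are, and that is the point.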