Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: development

  • Lesbians Eat Data

    The original title was “Lesbians Broke Jetpack” but it turned out to be even more complicated than all that. And thankfully more rare.

    This concerns three things.

    1) The website lezwatchtv.com
    2) Jetpack for WordPress
    3) ElasticSearch

    On Sept 6th, Jetpack released a new version – 4.3 – and I promptly upgraded. When I did, I started getting weird emails from my server of a “Suspicious process running under user lezwatchtv” and the content looked like this:

    Executable:
    
    /usr/bin/php-cgi
    
    Command Line (often faked in exploits):
    
    /usr/bin/php-cgi /home/lezwatchtv/public_html/wp-cron.php
    
    
    Network connections by the process (if any):
    
    tcp: MYIP:39734 -> JETPACKIP:443
    

    Being a proper code-nerd, I backed out a few things and tried again. Same error. I went into my process watcher and saw five processes calling wp-cron.php for that domain, but no others on the server. I killed the processes and turned off WordPress cron. Everything was fine. Then I installed WP Crontrol and manually kicked off cron jobs until it happened again.

    The culprit was a ‘runs every one minute’ job by Jetpack, which struck me as bewildering.

    		if ( ! wp_next_scheduled( 'jetpack_sync_cron' ) ) {
    			// Schedule a job to send pending queue items once a minute
    			wp_schedule_event( time(), '1min', 'jetpack_sync_cron' );
    		}
    

    The sync job is meant to update your data on Jetpack’s servers, which makes sense, and running every minute will copy up everything that changed in each minute. It seemed a little heavy to me, and disabling it stopped my run-away cron jobs. That meant the sync was failing. I reached out to a Jetpack tech and explained the situation. He re-ran the sync manually and it stalled.

    We determined the likely issue was that the job was, for some reason, hanging and unable to finish, so it would just stay active forever. And ever. And since it would see that the sync had never finished, it would start up all over again until, finally, my server killed the five (yes, five) processes and sent me an angry text about it. Yes, my server texts me.

    At this point I emailed support with full details and got a very insightful reply from Brandon Kraft:

    I’m interested in if there’s an issue with the server connecting with WP.com (seems unlikely given your other sites sound fine), if there’s a large amount of postmeta or something like that that is throwing a wrench into the system, or something to that effect. We’ve isolated some odd cases where when there is either a lot of postmeta or something yet undetermined in postmeta breaks things in a way similar to what you saw.

    DING!

    See, there are 40 posts, 22 pages, and then 1246 custom post type posts on LezWatchTV.

    906 posts are ‘characters’ and all characters have three separate taxonomies, two plain text post-meta values, and two serialized. 340 posts are ‘shows’ with two taxonomies, three plain text post-meta values, three integer (plain text) post-meta values, one true/false, six HTML, and one serialized data.

    So if I was going to point at “a site with a lot of weird post meta” I would pick this site.

    I spent a few hours on the 7th (the day after the release) beta testing their 4.3.1 version. We tried a patch for the bug where full sync wasn’t giving up on wp error. That helped a little, but the error kept happening, limiting itself to two or three processes. I pointed to a special API, I ran some weird wp shell commands, and all we came up with was that at 190 or so ‘chunks’ out of 443, my server would stop sending messages to Jetpack’s servers.

    Eventually I zipped up a copy of the theme and plugins and a sanitized DB (all secret information removed) and sent it over for them to play with. And they reproduced it! That was good. It meant it wasn’t my server, but it was my setup and the way Jetpack’s sync worked.

    Like everything that has to sync, Jetpack plays the game between ‘sync it all super fast’ and ‘don’t kill the server.’ The way they sync the posts, they apply filters to render the content, including embeds. Because it does that with embeds, it triggers update_post_meta to update the _oembed_time_{long_base64_string} value, so it can know when to update the embed code for best caching.

    Wasn’t I just talking about post meta the other day? Why yes! I was talking about optimizing post meta for search! The interesting thing about that is, since I’m using ElasticPress, it scans all my post meta for updates so it knows what to save as searchable data. That means when Jetpack triggers the update, it triggers ElasticPress, and all hell breaks loose.

    But why did this happen now? Because I turned on “Sitemaps” for Jetpack. And when you enable (or disable) a Jetpack Module, it triggers a full sync. This happened to be the first time I’d done that since installing ElasticPress.

    I did what any responsible person would do, and wrote this all up and submitted a bug report with ElasticPress. Sadly for now I’ve disabled ElasticPress until this can be resolved. I can probably turn it back on safely, since I won’t be triggering a full sync any time soon, but since I don’t want to accidentally crash things, I’ve left it off.

    And how was your week?

  • CMB2 And The Dropdown Years

    At WordCamp Montreal, I mentioned the database of dead lesbians that Tracy and I maintain. A camper looked at it and said “You know it would be awesome if you showed the shows’ airdates.”

    Good point! Except I just plain struggled with the concepts and how to do them in CMB2. I knew I could make multiple fields in one ‘metabox’ as I read up on the snippet for an address field, but try as I might, I couldn’t make it work.

    I tweeted my headache and ended up talking to Justin Sternberg who asked me if I could explain my use case better.

    I have 300+ posts, all of which have a start and end date. Some may have an end date of “current” however.

    Examples of valid data:

    • 1977-1979
    • 2016-current
    • 2000-2016

    I also need to sort by start and end year. So I can search for all posts with a start of 2014.

    I could have two year-sorts, easily, but that makes for a clunky interface as it would be separate fields. I know CMB2 can have a combined field (like addresses) but while I got it to save, it wouldn’t properly display on the edit page.

    This only needs to be editable on the WP admin edit post.

    That night, he replied and asked if this year-range field type would work.

    Mind? Blown. It works exactly how I need it to. I tweaked the code (and threw in a pull request) to set up a way to reverse the years (show newest first) which is more useful for my needs.

    Now? Editing 319 show entries.
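
    For anyone who wants to see the data shape from this post outside of CMB2, here is a minimal Python sketch. It is not the CMB2 year-range field's code; the names `YearRange`, `parse_year_range`, and `sort_by_start` are made up for illustration, and `None` is an assumed stand-in for an end date of “current”.

```python
from typing import List, NamedTuple, Optional


class YearRange(NamedTuple):
    start: int
    end: Optional[int]  # None is an assumed stand-in for "current"


def parse_year_range(text: str) -> YearRange:
    """Turn '1977-1979' or '2016-current' into a YearRange."""
    start_text, end_text = text.strip().split("-", 1)
    end_text = end_text.strip().lower()
    end = None if end_text == "current" else int(end_text)
    return YearRange(int(start_text), end)


def sort_by_start(ranges: List[YearRange], newest_first: bool = False) -> List[YearRange]:
    """Sort by start year; newest_first mirrors the 'reverse the years' tweak."""
    return sorted(ranges, key=lambda r: r.start, reverse=newest_first)


# The three valid examples from the post.
shows = [parse_year_range(s) for s in ("1977-1979", "2016-current", "2000-2016")]

# "Search for all posts with a start of ..." becomes a simple filter.
started_2016 = [r for r in shows if r.start == 2016]
```

    Keeping the start and end as separate values is what makes both the sort and the “start of 2014” style search cheap, even though the editor sees them as one field.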

  • Long Term Vision

    Say what you will about Jetpack, the plugin serves a great purpose in a few major ways.

    1. Once you register for the API, you never have to again.
    2. Everything is easy to find to update and configure (Menu -> Jetpack).
    3. New Features are added and you don’t need to install a new plugin.

    Now look at something else. A company released over a dozen Facebook plugins. All the plugins required you to connect via their API (a separate connection in each). All the plugins required you to use their admin panel to set up a per-plugin configuration. All the plugins deleted those settings on deactivation. Or how about a WooCommerce related set of plugins that all required the use of their API (legitimately) but all the plugin did was connect you and send you to where that specific plugin part was configured?

    Got that in your head? Good. Now what if Jetpack did that? What if, to enable each aspect of Jetpack, you had to install Jetpack Stats, Jetpack Comment Form, Jetpack Subscriptions, etc etc etc.

    You’d hate Jetpack. And worse, the Jetpack developers would too. They’d have to work extra hard to ensure the whole suite of plugins conformed to style and protocol. Shared libraries? Gotta update them in all of the plugins. Oh and don’t forget to make sure they’re all backwards compatible in case someone updates one but not another. Figure out which one takes priority, make sure someone else’s changes on Stats don’t break Comment Form, and on and on and on.

    There’s a reason Jetpack works as well as it does, and it’s not just because Automattic is behind it. Jetpack has one sign up, one registration, one setup for the connection. Each sub-app is toggled via Jetpack. New additions, when the main plugin is updated, are all easily checked for backcompat and everyone tests together before pushing out.

    So why do I call this the long view?

    Because the long view considers not just adding new users to your system, but keeping them in a way that makes them happy. The long view looks at the reality that your developers will leave. The long view thinks about the easiest way to maintain a lot of code. The long view makes sure that introducing old users to new things is easy.

    And that means, the long view would look at your 15 or 20 plugins that all use the same ‘base library’ and tell you it’s a shitty plan. It’s more hours on more code with more potential conflicts. It’s less cross-code checking. It’s more testing. It’s more unit tests that have to be repeated over and over.

    The biggest reason I see people argue that 18 plugins is better than 1 is ‘SEO.’ The quotes are there on purpose. Because it’s bullshit. Anyone who thinks 18 plugins will net you better SEO than one, well written, well curated document file on the master plugin has failed at SEO school and needs to meet Ted. Ted is a 12 inch lead pipe that the boss keeps in the top drawer of his desk at DreamHost. No, not really. But the point remains, they’re not an SEO Expert.

    Content is king. Remember that? Duplicate content is bad.

    However, in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results.

    That applies to your code too. Duplicate code, duplicate functionality, is bad.

    Now there is always a time and a place for multiple separate plugins. I only want to use the Easy Digital Downloads extension for Stripe, not any other payment gateway. So I don’t need the extra plugins in a ‘payment gateway suite.’ But there, EDD cleverly has all the base code in their plugin and the add-ons just enable more features. Yoast’s Video SEO is similarly an add-on. They didn’t waste time making a dupe of their main SEO plugin just to add in videos.

    I hope the point is made. You can make your code simpler, easier to maintain, and easier for your users to find the new things if you keep it all in one. And that is a win.

  • Why I Don’t Use Git Flow Anymore

    Please don’t get me wrong. I love git-flow. I think it’s great. But it was great to teach me how to use git. It taught me not to use master for my development, and how to make branches and all that. Git Flow got me in the habit of doing good things and testing and showed me how to work with multiple projects. It was a great crutch to get comfortable with the ideas of Git that (for a long time) confounded me.

    But I don’t need it anymore. Instead, I do things very, very simply and my flow is as follows.

    $ git checkout master ; git pull

    I always start by assuming I’ve forgotten something and need to sync up. This works for me, since I run on two computers.

    $ git checkout NewProject

    Once I’m in the new project, I start making all my edits, add my code, etc. Now here’s where I get a little silly. If I’m working on my own stuff, it’s Coda, always, so I’ll constantly ‘commit all changes’ and fill in my commit messages and then cancel out. I do this over and over until I’ve reached a point where I think “This code is ready to be tested.” Then I commit for real.

    This means my commit logs look like this:

    Convert Font Icon to SVGs
    
     - Add new images for social media
     - Optimize CSS for pagespeed
     - Remove unused function.php file
     - Add shortcodes
    
    Fixes #1234
    

    There are other ways to do this, of course. I’m a huge proponent of keeping change logs but a commit message should be useful too.

    It’s too easy to put in this: git commit -m "Adding new icons"

    While it’s more time consuming, just use git commit and put in a good message like I did up at the top. Now, this is not new. A hundred people have all said this before, but it bears repeating.

    • The first line is your subject, keep it to 50 characters.
    • Capitalize the subject line but don’t use a period
    • Use the imperative mood – “Add new icons” and not “Adding new icons”
    • Leave an empty line between subject and body
    • Explain what you did in the body, keeping lines to 72 characters
    • Bullet points are okay – use a space before a hyphen for best compatibility
    • Reference any issues at the bottom – “Fixes: #123” or “See Also: #456 #789”
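
    The mechanical rules in that list are easy to check automatically. Below is a minimal Python sketch, not an official git tool; `check_commit_message` is a hypothetical helper name. It only covers the rules a script can judge (lengths, capitalization, the trailing period, the blank line) and makes no attempt to detect the imperative mood.

```python
from typing import List


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["Missing subject line"]

    problems = []
    subject = lines[0]
    if len(subject) > 50:
        problems.append("Subject is longer than 50 characters")
    if not subject[0].isupper():
        problems.append("Subject should start with a capital letter")
    if subject.endswith("."):
        problems.append("Subject should not end with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("Leave an empty line between subject and body")
    # Body lines start at line 3 (after the subject and the blank line).
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"Line {number} is longer than 72 characters")
    return problems


# The commit message from earlier in this post.
sample = """Convert Font Icon to SVGs

 - Add new images for social media
 - Optimize CSS for pagespeed
 - Remove unused function.php file
 - Add shortcodes

Fixes #1234
"""
print(check_commit_message(sample))
```

    Something like this could be wired into a commit-msg hook, but that wiring is left out here since it depends on your setup.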

    If, like me, you commit and then, before merge, realize you have changes, use git commit --amend to add your new changes to the existing commit.

  • Encrypting Source Code Doesn’t Make It Safer

    I’d love to think that’s all I have to say on the matter, that you all will read the subject, go “Yup!” and we’re done.

    The reality is that I have to argue this, regularly, with people.

    Here’s the code from a plugin out there:

    <?php ${"\x47L\x4fB\x41\x4c\x53"}["w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c"]="\x73l_s\x65arch\x61bl\x65\x5fc\x6f\x6cu\x6d\x6e\x73";${"\x47L\x4fBAL\x53"}["\x66\x6b\x78xg\x63\x6ap\x68\x6d\x6ft"]="\x73\x6c\x5fdb";${"\x47\x4c\x4f\x42AL\x53"}["\x65\x62\x67\x79\x6b\x66\x64"]="\x69\x73\x5f\x73\x6c\x5fca\x74e\x67\x6f\x72\x69\x7a\x61\x74\x69\x6fn_\x63\x6f\x6c\x75\x6dn";${"\x47\x4c\x4fBA\x4c\x53"}
    

    The whole file is like that. The developer explained it was done that way for ‘security’ — it would make things harder to hack. I pointed out that’s simply not true.

    Here’s what having encrypted, hashed, packed code does:

    1. It makes your build process take longer.
    2. It adds another failure point into your code.
    3. It makes debugging harder for end users, other developers (who write plugins), web hosts, and you.
    4. It makes you look like a developer with evil intents.
    5. It sets an expectation with users that this kind of code is ‘normal’ in WordPress.

    Recently Sucuri posted about a redirect hack that works by putting junk code in your header.php file which looks rather similar:

    Malicious injection in your header.php

    The issue here is that an end user, your normal WordPress user, cannot tell the difference between the somewhat safe code I quoted before and this code. They see ‘gibberish’ whereas I know they can use a hex decoder to translate ["w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c"] into ["wsxnifioql"] … which is still pretty terrible.

    Well written code, well named functions, are self-explanatory. You see a function called redirect_404_pages() and you have a pretty good idea of what it’s for. You see a function named wsxnifioql() and good luck knowing what the heck that’s for. This goes back to the claim that the code is more secure. It’s not. It’s needlessly complicated, and as I showed with the hex decoder tool, it can trivially be decoded and read.
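
    To show how little effort that decoding takes, here is a minimal Python sketch that undoes the `\x..` escapes from the plugin snippet quoted above. It is not the hex decoder tool mentioned in this post; `decode_hex_escapes` is a hypothetical helper, and it assumes every escape has exactly two hex digits, which holds for the quoted code.

```python
import re

# Matches a literal backslash, an "x", and exactly two hex digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\xNN escape with the character it stands for."""
    return _HEX_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


# Raw strings keep the backslashes exactly as they appear in the PHP source.
print(decode_hex_escapes(r"\x47L\x4fB\x41\x4c\x53"))                  # GLOBALS
print(decode_hex_escapes(r"w\x73\x78\x6e\x69\x66\x69\x6f\x71\x6c"))  # wsxnifioql
print(decode_hex_escapes(r"\x73l_s\x65arch\x61bl\x65\x5fc\x6f\x6cu\x6d\x6e\x73"))  # sl_searchable_columns
```

    A dozen lines and one regular expression is all the ‘security’ amounts to.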

    So what is the real point of hiding your code? Who are you trying to protect? What’s ‘safer’ about any of this?

    The answer is that it’s about you, you, you. You don’t want someone to take your great idea.

    That’s it. And that’s foolish.

    WordPress is GPLv2 (or later). Furthermore, to be hosted on WordPress.org, your code cannot be encrypted or hidden or otherwise non-human-readable. The basic reason is that WordPress’ success is due to its understandability and extendability. Anyone can read WordPress’ core code, parse it, learn from it, and enhance it. When you take that away from users, you isolate your code and prevent people from extending it.

    This person, this developer, charges upwards of $1000 for the add ons to their code. Yes, a plugin that costs over a grand. It sounds economically sound to try and lock things down so people don’t steal their intellectual property. We can all understand that impetus. I support it. I also feel that part of being in an open source community is being aware of how your actions impact the world at large.

    Because WordPress is open and because there is a standard expectation of non-encrypted code (except by evil-doers), the burden moves to developers to not hide their code that is installed on users’ servers. The code that is deployed to an end-user is expected to be human readable. This comes at a risk. I have a copy of a theme I bought, and I could give it away to anyone I wanted. They may not get updates, which means I have to be aware of the risk I’m introducing to my friends when I give them something like a premium theme or plugin.

    Similarly, what are the risks of telling people it’s okay to install plugin code in uploads instead of the plugins folder? What are the risks of allowing people to think that encrypted code is generally okay? In and of themselves, neither action seems particularly dangerous. PHP code is PHP code, right? If it runs, you’re good. But the reality is not so. By installing code in uploads I’ve made it so it’s no longer fully protected by WordPress and ‘standard’ security practices. I’ve also made it riskier that my code would even run, since many hosts prevent executable code from running out of that folder for security.

    So how do I meet the (assumed) criteria of not having someone rip off my code?

    You don’t. Your machinations aren’t preventing it now, and they won’t prevent it tomorrow. Hexcode is easily parsed. Even the Zend framework has to be able to be reversed to be run, so a dedicated person will always find a way around it. And the majority of your users aren’t going to be the problem. It’s those extremes. So what you’ve done is wasted time, effort, and money to annoy the majority to stop the minority. Let people inspect your code. If someone steals it, there are laws to help you handle them. Use them. Theft is theft. The GPL may allow them to take your code, copy and expand on it, but it doesn’t let them violate your copyright.

    All the work you’re doing to hide your code is about as useful as preventing right-click on images. It doesn’t protect the end users, and it doesn’t protect your intellectual property.

  • Too Many SVGs

    I was looking into moving a site from Font Icons to SVGs for a few reasons. The primary is that, with an SVG, images will look crisp on all monitors, including the non-retina displays. They literally look better on my crappy old MacBook, instead of just on my iPad.

    Once I had the one site done, I went to look at another. It was a smaller site, running Hugo as a static site generator, and I thought it would be perfect.

    I was wrong.

    Using SVGs is Easy

    Replacing my font icon with an SVG was as easy as making my Facebook call this:

    <object type="image/svg+xml" data="/images/social/facebook.svg"><span class="screen-reader-text">Facebook</span></object>
    

    Done. It’s tiny (2kb) and there are six similarly sized images, which makes for 18kb, far smaller than the 200kb or more that Font Awesome can be. Simply put, I realized I was only using five of the icons (on every page), and how stupid was that? I don’t need the whole library!

    I will note that ‘styling’ SVGs can be an exercise in patience, since you cannot apply CSS styles when you embed as an object. Thankfully, I wanted to make the icon match my style so I edited the style directly (which is the topic of another post). If you use PHP, I recommend using file_get_contents() to get the contents of the svg, and then use normal CSS to style. I was using plain HTML. There are tradeoffs.

    Using too many SVGs sucks

    In my initial tests, using the footer first, my page loaded much faster. Elated, I jumped over to all uses of the fonts, and remembered I had a page that listed a series of items with star rankings (none through five). I changed the generator code behind that to be object icons and reloaded.

    The page was slow.

    It was like dialup modem slow. Absolutely painful.

    After some research, I ran into this post about why SVG was so slow, and found a graphic that explained it clearly.

    Render times per number of objects on a page

    What the graph demonstrates is simply that the more objects you have on a page, the slower it is. That part is obvious. The more anything on a page, the slower it is. So why are SVGs slower than PNGs? Why was I only seeing this on an HTML page with 50 images, and not on a WordPress generated PHP page with the same amount?

    The answer was that the SVG objects have to be rendered on the HTML page. I was using <object> tags on the HTML and file_get_contents on the PHP. The way the PHP code works, it pulls the file’s contents in and dumps them straight into the page, with no separate object to process. Since the files are so small, and since there’s no object rendering involved, the rendered PHP was faster than the static HTML. In this case.
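
    To make the ‘pull the file in and dump it out’ idea concrete, here is a minimal sketch of the same trick in Python rather than PHP. It is not the code from either site; `inline_svg` is a hypothetical helper, and pairing the SVG with a `screen-reader-text` span is an assumption borrowed from the object example earlier.

```python
import html
import tempfile
from pathlib import Path


def inline_svg(svg_path, label: str) -> str:
    """Read an SVG file and return its markup plus a screen-reader label."""
    markup = Path(svg_path).read_text(encoding="utf-8").strip()
    return f'{markup}<span class="screen-reader-text">{html.escape(label)}</span>'


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    icon = Path(folder) / "facebook.svg"
    icon.write_text(
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"></svg>\n',
        encoding="utf-8",
    )
    snippet = inline_svg(icon, "Facebook")

print(snippet)
```

    The browser ends up with one HTML document containing the SVG markup, instead of one document plus an embedded object per icon.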

    Can It Be Faster?

    After I was done face-palming, I asked myself if it was possible to speed this up. Fixing this comes with understanding the cause. Once I determined that the issue was rendering the object and not the SVG itself, the solution unfurled before me.

    Instead of using object tags, I could include SVGs like this:

    <svg version="1.1" id="facebook" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" 
    x="0px" y="0px" width="50px" height="50px" viewBox="0 0 438.536 438.536" 
    style="enable-background:new 0 0 438.536 438.536;" xml:space="preserve">
    <g>
    <path class="social" d="M414.41,24.123C398.333,8.042,378.963,0,356.315,0H82.228C59.58,0,40.21,8.042,24.126,24.123 C8.045,40.207,0.003,59.576,0.003,82.225v274.084c0,22.647,8.042,42.018,24.123,58.102c16.084,16.084,35.454,24.126,58.102,24.126 
    h274.084c22.648,0,42.018-8.042,58.095-24.126c16.084-16.084,24.126-35.454,24.126-58.102V82.225 C438.532,59.576,430.49,40.204,414.41,24.123z 
    M373.155,225.548h-49.963V406.84h-74.802V225.548H210.99V163.02h37.401v-37.402 
    c0-26.838,6.283-47.107,18.843-60.813c12.559-13.706,33.304-20.555,62.242-20.555h49.963v62.526h-31.401 
    c-10.663,0-17.467,1.853-20.417,5.568c-2.949,3.711-4.428,10.23-4.428,19.558v31.119h56.534L373.155,225.548z"/>
    </g></svg>
    

    The downside is that this looks uglier. The upside? This is hella fast and it’s still lighter weight than including a font icon, and I don’t have to upload images.

    SVGs or Font Icons?

    This is a question for the ages. They can both be made accessibility friendly, they can both be optimized. Arguably, font icons are compatible with more browsers, but it’s also 2016 and if people are still on IE 8 (sorry banks), the Internet looks pretty shitty anyway. I can’t tell you which is better, and I find use for both in different situations. I love font icons a great deal, but just as I love WordPress, there’s a time and a place for them. And a time and a place for something else.