Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: development

  • Open Source Doesn’t Mean Public

    Open Source Doesn’t Mean Public

    Someone made a vague implication that my post about licenses were shots fired from someone who doesn’t ‘do’ but is only an ‘observer.’

    This is quite inaccurate, though I don’t blog about it here and I don’t talk about it anywhere for one simple reason. I can’t. I signed a paper, years ago, that agreed the work I did for them would be private. I would neither reuse the code (which I can’t anyway) nor would I discuss it. In fact, I had to make a phone call to ask if I could blog about it in general. I understand why someone might assume I’m not speaking from experience, but that just makes an ass of you and me.

    This isn’t about me refuting or dismissing allegations from someone who, for whatever reason, dislikes me and likes to make their hate public. No, this is about the interesting predicament about what happens when you can’t release information about your code.

    Half Open

    Here’s your scenario. The front end is an open system, a plugin say, that one installs on WordPress. It’s GPLv2 (or later) compatible because it’s distributed code I want to put on WordPress.org. That right there is a requirement. Alright, so I have one half of a product that is GPL and Open Source. The other half lives on a server somewhere in the world and does all the backend work. The plugin? It just passes API data too and fro as needed.

    I just described Akismet.

    You and I know very little about how Akismet works on the backend. And here’s the thing, that’s how it should be. We have a lot of information on how to interact with the Akismet API but none about how it actually calculates what is and isn’t spam on the back end. I repeat – this is the way it should be.

    Look at what Akismet does. It magically identifies spam. While it’s all well and good to be open source, the very first thing that would happen if they opened up all their code is we would see spammers read it and subvert it.

    But then again, we have things like SpamAssassin, an open source product I use on my email servers. Does this mean SpamAssassin is too dangerous to use? Does it mean it should be avoided? No, absolutely not! While it’s far from perfect, SpamAssassin does a phenomenal job at catching and stopping spam. But at the same time, it’s imperfect and being public, it’s more likely to be subverted by clever spammers. Thankfully the things it checks for are parts of email that a clever server admin can protect from and, all in all, it’s useful.

    Half Closed

    If we accept the fact that having a code base open or closed actually has very little impact on it’s usability, then why do we lock down our systems? That’s easy. Security and profit.

    Profit is the easy answer. If a system is closed then you can’t download it and install it for yourself. This means if you want to use it, you have to pay. Again, we can look at Akismet and VaultPress, which I would wager actually are built on open source code, as examples. They don’t have to be free, after all. There’s nothing wrong with being closed, either.

    By making a system a closed system that no one sees the backend code for, we create a product where only people who have access to the source code can easily infiltrate. This, of course, offers no assurance that it will never be hacked, only it raises the bar and makes it harder to deal with when it does get hacked. But at the same time, it is harder to hack an unknown than a known, and it does make things somewhat safer.

    Of course, if I told you all the ATM code in the world was not only open source but freely distributable and it was out there right now, how would you feel? That probably filled you with a little dread, thinking about how much trouble we already have with card skimmers and ATMs. If we have people who already know how to jack in, how much worse could it be if they knew how to encode software into the fake cards they make, and use them to backdoor your accounts?

    Have Your Cake And Eat It Too

    Just because the code you work on is open source doesn’t mean you can talk about it in public. Just because the code is closed doesn’t mean you can’t.

    I’m not talking about licenses here, though, I’m talking about contracts. I signed a paper about certain code I’ve written that prevents me from discussing it. So while I’d love to tell you everything about everything I’ve worked on, I can’t. But that’s not a bad thing. I’ve been privileged to work on the open and the closed, and it’s given me a greater appreciation and understanding of when we should and shouldn’t open our work. And this comes down to understanding the nature of the risks involved.

    Things like ATMs, financial trading, and mortgages should be secured and private. Why? Because the risk is much too high. A license? Well a worst case scenario is that someone figures out how to backdoor a free license for themselves. Another is they figure out how to use someone else’s license to gain access to their information. Those are pretty bad. So if you want to make your license API open but the code behind it not, I support that call.

    But. I do think you should have a way to manage your licenses and updates. That’s just business sense.

  • On The Fly Checklists

    On The Fly Checklists

    When you do the same thing over and over, the best thing to do is automate it. We all know that.

    However there are simply some things you cannot automate. As horrible as that is to say in today’s world, you cannot automate things like “Proofread content for errors.” Why? Well a computer only knows to look for what you tell it to look for. And if you habitually typo homophones, you’re in for a long day.

    That’s why we have things like checklists. Do X, check Y, etc etc. They’re there to ensure when we run the automated steps, we do everything properly.

    I spent the first half of April generating Pre-Flight Checklists for a lot of things. How to upgrade things at DreamHost, how to update things at WordPress.org, how to migrate from X to Y, and so on and so forth. The sad thing is that I’ve had to do all these for long standing processes which exist mostly in someone’s head.

    Here are some tips to how I do it.

    Write Your Checklist As You Do It

    If it’s a process you know, have a document open (or a piece of paper) and write as you go, enumerating every single step. Be pedantic.

    On Your Computer:

    1. Generate list of X
      1. Go to Y and click BUTTON
    2. Copy list to DOC
    3. Run TOOL to output new file
    4. Add new file to repository
      1. $ git commit -a FILENAME
      2. $ git push

    On Remote Server:

    1. SSH to 123.45.67.89 as your ID
    2. Go to folder BAR
      1. cd /home/blah/foo/bar
    3. Dryrun sync command
      1. sync bar baz –dry-run

    You get the idea. Like I said, be specific and get every single thing you do. Don’t worry about getting everything perfect. This is about getting all the information.

    Walk Through Your Checklist

    Once you have it up, walk through it and do everything (except the actual code push parts). This will show you how you can clean it up and what needs to be explained better. I find it helpful to ask myself “If I was new, would I know what ‘the remote server’ is?” If the answer is no (which it often is), I become more pedantic and specify what server to connect to and exactly how.

    For example instead of “Connect to Servername” I’ll put in this:

    • Connect to production server
      • ssh servername — use your ID and password

    Using the different font helps me to know it’s a command and not directions. Any time you feel something is vague, explain it. Put in examples of output. Don’t be afraid to break the ‘flow’ of a checklist to ensure clarity.

    Ask Someone Who Knows to Proof

    If this is a process someone else has done, grab them and have them skim it for anything obviously wrong. Often I’m writing and refining a checklist for something I don’t actually do, but I watch regularly. With that mindset, I’m able to write from a place where I know enough to know what has to be clear to someone new. A place of no assumptions.

    That person should question your claims. if you say “Check X” and they come back and ask you “Why are you doing X in step 15? You do the work in step 4. Why not do them together?” That’s a good thing! You want to be clear, but if there’s work that’s duplicated, you can save yourself and the future-you time. Also that person is going to be the one who says “We should explain why we do this…” They’re going to teach you more about the process.

    Ask Someone Who Doesn’t Know to Proof

    This is harder. You need someone who knows ‘enough’ that you’re not explaining what SSH is, but doesn’t automatically make assumption. This may be why I end up on checklists a lot. I pestered the hell out of my coworker, Mike, and asked him to explain things in broad terms and then nitty gritty. I asked him to walk me through the steps, then I wrote them down and turned around to someone else and asked “Does this make sense?”

    Why? Because Mike was too close to it to see the forest for the trees, which is a good thing! He knows it all! I was fairly close to it, enough to possibly get in over my head. By grabbing a total novice, we had the trifecta of brilliance. That third person asked “What is this?” and noted multiple things that might be wrong.

    Release and Iterate

    You’re going to miss things, so every time you do the process, have the red pen ready.

  • Null and Zero

    Null and Zero

    This is an amusing anecdote.

    When I was working on my cross-check of shows and characters, I got everything working right except one thing. I noticed my count of ‘shows with death’ made a weird pie chart. You see, the number of shows was off by one compared to the total number of shows. I went back and forth, trying to figure out why it was doing that, and in the end I decided that I’d output the list and manually count to see what I was missing.

    I counted, by hand, the shows that I got with the list of ‘all characters are dead’ and that number matched what the code output. Then I subtracted that from the number of ‘shows with any death’ to get the ‘some characters are dead’ count, and that too was okay. This meant that, for some reason, the list of shows where no one died was wrong. But it was subtraction! How could it be wrong!

    Well as it turns out, I forgot that null and zero aren’t the same to a computer, but they tend to be to a human’s brain.

    Now I know this! I know that zero is a number, it’s a value of the known quantity of zero. And I know that ‘null’ is a non value, meaning that it’s a data point that cannot yet be known.

    You can’t math on null. And I knew this. Earlier in my code I’d written $foo = 0 outside of my loop where I check and, as needed, increment it with $foo++ specifically because you can’t add to null.

    Nothing from nothing, carry the nothing…

    But. In my calculations, I checked the following:

    1. If a show has been flagged with ‘death,’ count all the characters.
    2. For each character, if they’re dead, add 1 to the death count.
    3. If the number of characters is the same as the death count, add this show to the list of ‘kill ’em all.’
    4. If the number of characters is greater than the number of dead, and the value of either isn’t 0, add the show to the list of ‘kill some.’

    Then, separately, I checked “If the show has not been flagged with death, add the show to the ‘kill none’ list, because that’s a 1/0 check. And when you don’t think too deeply about that, it sounds fine, right? If they aren’t marked with death they must kill none. And if they don’t kill them all, and they don’t kill some, then the number should be the same as the ‘kill none’ list.

    The problem was that I had one show, just one, where all the characters were alive but it had been flagged with death. This was not an error. The show did kill off a character, but I’d not added the dead character yet. Which meant it ran through the checks like this:

    1. Flagged with death, we count all the characters (8).
    2. If the character is dead, add one to death count (death count is 0).
    3. If the number of characters (8) is the same as the dead (0), add to ‘kill them all’ (false).
    4. If the number of characters (8) is more than the dead (0) (true), add the value of either isn’t 0 (false), add them to ‘kill some.’ (false).

    Which isn’t wrong at all. It just meant that the show didn’t get listed on ‘kill none’ because it was flagged with dead, and it didn’t get listed on ‘kill them all’ because 8 is more than 0, and it didn’t get listed on ‘kill some’ because while 8 is more than 0, dead was 0.

    Oh silly me. The value of death characters was ‘null.’ They simply didn’t exist in a place where they should have.

    The fix? I added the dead character to the site and everything was fixed. But code wise, could I prevent this? I certainly should, since I know better than many that you can’t predict what users are going to do. So it begs the question of why was I checking if character count and dead count were not 0 to begin with? It was, originally, because my ‘no deaths’ list was screwed up and I threw that check in to omit cases where a show was flagged as death but didn’t have death.

    The accuracy of these things depends entirely on your data. Garbage in, garbage out. The best fix would be to check ‘if a show has death and it has no dead characters, adjust the dead count down by one’ but also ‘if a show has no death, but it has dead characters, adjust the live show count down by one.’ I haven’t done that yet. I will soon.

  • Combining Data from Multiple CPTs

    Combining Data from Multiple CPTs

    I wanted to get a list of all TV shows where 100% of the listed characters were dead.

    @Ipstenu … Did I just read that you’re using WordPress to compose a list of dead lesbians in media? I have to say, that’s kind of unique.
    Otto42

    Yes, yes I am. Television media, excluding reality TV.

    The problem is I store the information in two Custom Post Types (shows and characters). And while both shows and characters have a way to track if there are dead, getting that list was frustratingly complicated.

    We wanted, originally, a way to mark a show as having death and a separate way to track each dead character. Since they were separate CPTs, we made two custom taxonomies: dead-character and death. Logically then, I could use WP Query to get a list of all shows with the cliche taxonomy field of ‘death’ checked:

    $dead_shows_query = new WP_Query ( array(
    	'post_type'       => 'post_type_shows',
    	'posts_per_page'  => -1,
    	'post_status'     => array( 'publish', 'draft' ),
    	'tax_query' => array(
    		array(
    			'taxonomy' => 'cliches',
    			'field'    => 'slug',
    			'terms'    => 'death',
    			),
    		),
    	)
    );
    

    I know I could use a different way to get terms, but I don’t want to since I need to count these shows:

    $count_dead_shows = $dead_shows_query->post_count;
    

    But I also need to continue processing. Once I have a list of shows with death, I need to get a list of all characters who are on the show. That data is stored as Custom Meta Data and those don’t have a quick and easy sort method. Worse, the shows are listed as an array.

    Originally it was just a number representing the post ID for the show, and that was pretty easy to check. While we have posts in the dead show query, get a list of all characters with this ‘show ID’ which looks like this:

    	while ( $dead_shows_query->have_posts() ) {
    		$dead_shows_query->the_post();
    		$show_id = $post->ID;
    
    		$character_death_loop = new WP_Query( array(
    			'post_type'       => 'post_type_characters',
    			'post_status'     => array( 'publish', 'draft' ),
    			'orderby'         => 'title',
    			'order'           => 'ASC',
    			'posts_per_page'  => '-1',
    			'meta_query'      => array( array(
    				'key'     => 'the_show',
    				'value'   => $show_id,
    				'compare' => '=',
    			),),
    		) );
    
    		if ($character_death_loop->have_posts() ) {
    			[... do all the magic here ...]
    			wp_reset_query();
    		}
    	}
    

    The problem is that compare line. Once you stop looking for “Does 123 == 123” you have to use ‘IN’ or ‘LIKE’ and, since this is an array of show IDs, we need 'compare' => 'LIKE' for this check. That sounds simple, but there’s one more small problem. If the show ID is 1, then every single show with a 1 in it shows up. In addition, I actually wanted to get a list of all the shows where some characters died, so I couldn’t just check for characters with the meta_query of the show and the tax_query of death.

    My first step in all this was to convert the shows to an array:

    if ( !is_array (get_post_meta($post->ID, 'lezchars_show', true)) ) {
    	$shows_array = array( get_post_meta($post->ID, 'lezchars_show', true) );
    } else {
    	$shows_array = get_post_meta($post->ID, 'lezchars_show', true);
    }
    

    Now I can process everything as an array and check if the show ID is really in the array. If it is, we’re going to record that there is a character in a show with death.

    if ( in_array( $show_id, $shows_array ) ) {
    	$chardeathcount++;
    }
    

    Next we’re going to check if the character has the tag ‘dead-character’ (listed as a custom taxonomy for ‘show cliches’) or not and is in the show:

    if ( has_term( 'dead-character', 'cliches', $post->ID ) && in_array( $show_id, $shows_array ) ) {
    	$fulldeathcount++;
    }
    

    Once we have that data, I need two arrays. The first is for shows where the count of all characters on the show is the same as the ones who are dead. The second is for shows where some characters are dead:

    if ( $fulldeathcount == $chardeathcount ) {
    	$fullshow_death_array[$show_id] = array(
    		'url'    => get_permalink( $show_id ),
    		'name'   => get_the_title( $show_id ),
    		'status' => get_post_status( $show_id ),
    	);
    } elseif ( $fulldeathcount > '0' && $fulldeathcount <= $chardeathcount ) {
    	$someshow_death_array[$show_id] = array(
    		'url'    => get_permalink( $show_id ),
    		'name'   => get_the_title( $show_id ),
    		'status' => get_post_status( $show_id ),
    	);
    

    The full code for that magical snippage looks like this:

    			$fulldeathcount = 0;
    			$chardeathcount = 0;
    
    			while ($character_death_loop->have_posts()) {
    				$character_death_loop->the_post();
    
    				if ( !is_array (get_post_meta($post->ID, 'lezchars_show', true)) ) {
    					$shows_array = array( get_post_meta($post->ID, 'lezchars_show', true) );
    				} else {
    					$shows_array = get_post_meta($post->ID, 'lezchars_show', true);
    				}
    
    				if ( in_array( $show_id, $shows_array ) ) {
    					$chardeathcount++;
    				}
    
    				if ( has_term( 'dead', 'tropes', $post->ID ) && in_array( $show_id, $shows_array ) ) {
    					$fulldeathcount++;
    				}
    			}
    
    			if ( $fulldeathcount == $chardeathcount ) {
    				$fullshow_death_array[$show_id] = array(
    					'url'    => get_permalink( $show_id ),
    					'name'   => get_the_title( $show_id ),
    					'status' => get_post_status( $show_id ),
    				);
    			} elseif ( $fulldeathcount >= '1' && $fulldeathcount <= $chardeathcount ) {
    				$someshow_death_array[$show_id] = array(
    					'url'    => get_permalink( $show_id ),
    					'name'   => get_the_title( $show_id ),
    					'status' => get_post_status( $show_id ),
    				);
    			}
    

    And yes, it works just fine.

  • Because Developers Never Typo

    Because Developers Never Typo

    It’s not a problem. Only admins can use this.

    I’d pushed back on a plugin that wasn’t validating their post input wisely. Instead they just slapped sanitize_text_field() around everything and called it a day.

    One of the myriad reasons I’ll push back on a plugin is improper sanitization. When I say that, I mean they need to sanitize, validate, and escape the input. If I see things like update_option('my_cool_options', $_POST['my_cool_input']); I’ll push back and tell them to please sanitize. But really I tell them this:

    SANITIZE: All instances where generated content is inserted into the database, or into a file, or being otherwise processed by WordPress, the data MUST be properly sanitized for security. By sanitizing your POST data when used to make action calls or URL redirects, you will lessen the possibility of XSS vulnerabilities. You should never have a raw data inserted into the database, even by a update function, and even with a prepare() call.

    VALIDATE: In addition to sanitization, you should validate all your calls. If a $_POST call should only be a number, ensure it’s an int() before you pass it through anything. Even if you’re sanitizing or using WordPress functions to ensure things are safe, we ask you please validate for sanity’s sake. Any time you are adding data to the database, it should be the right data.

    ESCAPE: Similarly, when you’re outputting data, make sure to escape it properly, so it can’t hijack admin screens. There are many esc_*() functions you can use to make sure you don’t show people the wrong data.

    I say it often. Sanitize everything (but especially what you save or process), validate input, escape output.

    I understand though, why someone might naively assume that since only admins can do a thing, it’s ‘safer.’ The truth is admins screw up as much as anyone else. Worse, probably, since admins have more power and often think they know better.

    But the point I was trying to make to this guy was that it doesn’t matter who is inputting the data.

    I’ve told this story before. I used to have a job testing software packages for software I didn’t use. We replied on ‘scripts’ from people to know what to test. One day, John C and I were trying to test some new software and every time we hit a certain screen, we’d crash the box. We tried over and over and it failed. So John called the vendor and explained what we were doing. They walked us through it and it crashed. Since we were a VIP, they said they’d send over a couple developers. When the two guys showed up, one watched us very carefully and was shocked.

    The young dev exclaimed, “Why would you ever input wrong data there?!”

    I eyeballed him. “Why would putting in wrong data crash the computer?”

    The older dev chuckled. “We’ll put in an error check there.”

    The lesson I learned is simple. Never trust input.

  • Dependency Disaster

    Dependency Disaster

    Over the last few weeks and months, the nightmare that is WordPress plugin dependency hell has waxed and waned with the ire of a thousand burning suns. It flared up when I, perhaps naively, posted a reminder of a rarely called upon guideline to not submit Frameworks for hosting on WordPress’s official plugin repository.

    This brought up the perfectly valid arguments that these repositories are useful as they allow for deployment and updates that were hitherto unavailable when rolled up into one package. And at its heart, I understand their point and agree with them as to the benefits. But since WordPress core doesn’t have true plugin dependencies at this time, it’s exceptionally complicated and a hassle to include them.

    Personally, I feel that the NPM or Composer style development of plugin is the way to go. Development. Not deployment. With few exceptions, I feel the best way to release a plugin is all in one, rolled up together. The exceptions are, of course, plugins that are add-ons to others, like a plugin that adds a new email integration hook to MailChimp, or a new payment gateway for WooCommerce.

    The rest of our plugins should be self contained, reliant on naught but ourselves. I have two plugins which contain the same AWS SDK libraries, written in a way to check if the library is already loaded and to make sure it only loads what’s needed. Why did I do that? Because then someone can have one or the other or both without conflicts.

    The user doesn’t have to care about dependancies. They are invisible. That is as it should be. Users don’t have to care.

    But there’s also a danger with dependancies, as recently came to light in the JS world. Azer Koçulu liberated his modules from NPM after the threat of a trademark based lawsuit had NPM remove one of his projects.

    Sidebar: Open Source has no right to impinge on trademark law. Period. However some lawsuits are frivolous and daft and should be contested. Sadly most communities (NPM, WordPress.org, etc) do not have the money or resources to fight that battle for a developer. If you chose to fight, please contact the EFF.

    As is his right, Azer pulled all his packages from NPM. The fall out from this package removal is that a lot of automated builds failed. This has to do with the way Composer is often bundled. Instead of a wrapped up package like an exe or a dmg, it’s a stub that reaches out and downloads all its requirements on the fly. Just like the TGM Plugin Installer. Right? And what happens when those requirements are missing? The build fails.

    Perhaps worse, by unpublishing the name slugs used can be taken over by anyone else, and used to push more nefarious code. This did not happen. NPM checked everyone and verified they were as decent as one can before handing over the old names to new owners.

    My first thought was “Could this happen to WordPress?”

    Yes and no. First up, we don’t reuse names. If you have the plugin ‘foobar’ and ask to have it closed, the name is still reserved. In extremely rare cases we’ve turned over names to trademarked owners (like how Facebook owns the facebook slug) but always with communication of both parties and always with awareness to how many users might be impacted.

    But could we pull a plugin for trademark infringement and screw up package dependancies? You bet your ass.

    We’ve been lucky that all legal parties involved in similar arguments have been accepting of the ‘grandfathered’ ruling. That’s why we still have plugins with slugs like ‘google-analytics-by-joe’ out there. But it’s also why we don’t allow any more. And yes, when a plugin with a unique name is submitted, we do take the time to check if it’s already trademarked. To spare people this drama.

    But yes. It could totally happen. And since we have to name our dependancies and rely on those slugs, there’s no easy way out.

    I suggest reading the Hacker News thread on the matter. They weigh both sides of the issue, and show you how the pros and cons make this even more complex.