Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Linear Regressions in PHP

    Sometimes math exists to give me a headache.

    In calculating the deaths of queer females per year, my wife wondered what the trend was, other than “Holy sweat socks, it’s going up!” That’s called a ‘trendline’ which is really just a linear regression. I knew I needed a simple linear regression model and I knew what the formula was. Multiply the slope by the X axis value, and add the intercept (which is often a negative number), and you will calculate the points needed.

    Using Google Docs to generate a trend line is easy. Enter the data and tell it to make a trend line. Using PHP to do this is a bit messier. I use Chart.js to generate my stats into pretty graphs, and while it gives me a lot of flexibility, it does not make the math easy.

    I have an array of data for the years and the number of deaths per year. That’s the easy stuff. As of version 2.0 of Chart.js, you can stack charts, which lets me run two lines on top of each other like this:

    var myChart = new Chart(ctx, {
        type: 'bar',
        data: {
            labels: ['Item 1', 'Item 2', 'Item 3'],
            datasets: [
                {
                    type: 'line',
                    label: 'Line Number One',
                    data: [10, 20, 30],
                },
                {
                    type: 'line',
                    label: 'Line Number Two',
                    data: [30, 20, 10],
                }
            ]
        }
    });
    

    But. Having the data doesn’t mean I know how to properly generate the trend. What I needed was the most basic formula solved: y = x(slope) + intercept and little more. Generating the slope and intercept is the annoying part.

    For example, slope is (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²) where,

    • x and y are the variables.
    • b = The slope of the regression line
    • a = The intercept point of the regression line and the y axis.
    • N = Number of values or elements
    • X = First Score
    • Y = Second Score
    • ΣXY = Sum of the product of first and Second Scores
    • ΣX = Sum of First Scores
    • ΣY = Sum of Second Scores
    • ΣX² = Sum of squared First Scores

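
    As a quick sanity check of the formula, here is a worked example on three made-up points: (1, 2), (2, 3) and (3, 5). The points are purely illustrative, and the intercept line uses the same expression as the PHP further down, (ΣY - b·ΣX) / N.

```latex
\begin{aligned}
N &= 3, \quad \Sigma X = 6, \quad \Sigma Y = 10, \quad
\Sigma XY = 1\cdot 2 + 2\cdot 3 + 3\cdot 5 = 23, \quad
\Sigma X^2 = 1 + 4 + 9 = 14 \\
b &= \frac{N\,\Sigma XY - (\Sigma X)(\Sigma Y)}{N\,\Sigma X^2 - (\Sigma X)^2}
   = \frac{3 \cdot 23 - 6 \cdot 10}{3 \cdot 14 - 6^2}
   = \frac{9}{6} = 1.5 \\
a &= \frac{\Sigma Y - b\,\Sigma X}{N}
   = \frac{10 - 1.5 \cdot 6}{3}
   = \frac{1}{3} \approx 0.33
\end{aligned}
```

    So the trend line for those points is y = 1.5x + 0.33, which gives roughly 1.83, 3.33 and 4.83 at x = 1, 2 and 3.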
    If that made your head hurt, here’s the PHP to calculate it (thanks to Richard Thome):

    	function linear_regression( $x, $y ) {
    
    		$n     = count($x);     // number of items in the array
    		$x_sum = array_sum($x); // sum of all X values
    		$y_sum = array_sum($y); // sum of all Y values
    
    		$xx_sum = 0;
    		$xy_sum = 0;
    
    		for($i = 0; $i < $n; $i++) {
    			$xy_sum += ( $x[$i]*$y[$i] );
    			$xx_sum += ( $x[$i]*$x[$i] );
    		}
    
    		// Slope
    		$slope = ( ( $n * $xy_sum ) - ( $x_sum * $y_sum ) ) / ( ( $n * $xx_sum ) - ( $x_sum * $x_sum ) );
    
    		// calculate intercept
    		$intercept = ( $y_sum - ( $slope * $x_sum ) ) / $n;
    
    		return array( 
    			'slope'     => $slope,
    			'intercept' => $intercept,
    		);
    	}
    

    That spits out an array with two numbers, which I can plunk into my much simpler equation and, in this case, echo out the data point for each item:

    foreach ( $array as $item ) {
         $number = ( $trendarray['slope'] * $item['name'] ) + $trendarray['intercept'];
         $number = ( $number <= 0 )? 0 : $number;
         echo '"'.$number.'", ';
    }
    

    And yes. This works.
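
    If you’d rather not echo strings by hand, the same loop can be sketched as a small function whose output goes through json_encode(). This is only a sketch: trend_points() is a hypothetical helper name, and the slope and intercept below are made-up numbers standing in for whatever linear_regression() returns.

```php
<?php
/**
 * Hypothetical helper: turn a slope/intercept pair into trend-line values.
 *
 * @param array $trend    Array with 'slope' and 'intercept' keys.
 * @param array $x_values The X value (for example, the year) of each data point.
 * @return float[] One Y value per X value, with negatives clamped to zero.
 */
function trend_points( array $trend, array $x_values ) {
	$points = array();
	foreach ( $x_values as $x ) {
		$y        = ( $trend['slope'] * $x ) + $trend['intercept'];
		$points[] = max( 0.0, (float) $y ); // a negative count makes no sense
	}
	return $points;
}

// Made-up numbers for illustration only.
$trendarray = array( 'slope' => 2, 'intercept' => -3 );
$trend_data = trend_points( $trendarray, array( 1, 2, 3 ) );

// json_encode() produces a JavaScript-ready array for the Chart.js "data" key.
echo json_encode( $trend_data );
```

    Letting json_encode() build the array avoids the trailing comma and the quoted numbers that the echo version produces.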

    [Image: Trendlines and Death]

  • (Slightly) More Performant WP Queries

    One of the things 10up lists as a best engineering practice is this:

    Do not use posts_per_page => -1.
    This is a performance hazard. What if we have 100,000 posts? This could crash the site. If you are writing a widget, for example, and just want to grab all of a custom post type, determine a reasonable upper limit for your situation.

    This is a very valid point, but I found myself stymied by how to work around it in a case where I knew I needed to check all posts in a custom type. And worse, that post type was growing by 10 to 20 posts every week. In my case, the reasonable upper limit was an unknown that was also unpredictable. But an endless loop would also be bad.

    One of the other recommendations from 10up is not to run more queries than needed. I was already using no_found_rows => true to prevent counting the total rows, as it’s really only necessary for pagination. I also force in the post type I’m scanning, which again limits how many possible items will be queried. And yes, I have update_post_meta_cache and update_post_term_cache set to false as in most cases those aren’t needed either.

    But what could I do to improve the actual query, the one getting how many posts of type A had a post meta value that matched post type B? And to make it worse, I could have multiple values in the post metas. It’s really a case where, in retrospect, making them into a custom taxonomy might have been a bit wiser.

    What I decided to do was limit the number of posts queried based on how many posts were in the post type.

    'posts_per_page' => wp_count_posts( 'custom_post_type' )->publish,

    I’m not quite concerned with 100,000 posts, and I have some database caching installed to mitigate the load. But also I have set an upper limit and this feels less insane than the -1 value. Since I had to generate the count anyway for displaying statistics, I moved that check to a variable and used it twice.
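
    Put together, the query arguments described above might look something like this. It is a sketch under assumptions, not the code from my site: capped_query_args() is a hypothetical helper name and custom_post_type is a placeholder slug.

```php
<?php
/**
 * Hypothetical helper: build WP_Query arguments capped at the published count.
 *
 * @param string $post_type       The custom post type to scan.
 * @param int    $published_count How many published posts of that type exist.
 * @return array Arguments suitable for passing to WP_Query.
 */
function capped_query_args( $post_type, $published_count ) {
	return array(
		'post_type'              => $post_type,
		'posts_per_page'         => (int) $published_count, // upper limit instead of -1
		'no_found_rows'          => true,  // skip the total row count (no pagination here)
		'update_post_meta_cache' => false, // skip priming the post meta cache
		'update_post_term_cache' => false, // skip priming the term cache
	);
}

// Only runs inside WordPress; 'custom_post_type' is a placeholder slug.
if ( function_exists( 'wp_count_posts' ) ) {
	$count = (int) wp_count_posts( 'custom_post_type' )->publish; // count once...
	$query = new WP_Query( capped_query_args( 'custom_post_type', $count ) ); // ...reuse it here
	// $count is still available afterwards for the statistics display.
}
```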

  • Access: Denied or Granted?

    One of the topics I discuss regularly with developers is that of access. When writing any code, we must always consider who should have access to it. Not in the ‘who can look at my code?’ aspect, but that of who can run the code.

    This happens a lot with plugins and themes when people create their own settings pages. While I’m the first to admit that the settings API in WordPress is a bag of wet hair that makes little logical sense, using it gives you access to a lot of built in security settings, such as nonces.

    But as I often tell people, a nonce is not bulletproof security. We cannot rely on nonces for authorization purposes.

    What is a Nonce?

    A nonce is a word or expression coined for or used on one occasion. But in security, a nonce is an arbitrary number or phrase that may only be used once. Due to its similarity to a nonce word, it reuses the name. In WordPress, it’s a short hash generated when (say) a settings form is built and sent back when the save button is pressed, to ensure that the request really came from that form. Strictly speaking, a WordPress nonce isn’t single use: it’s tied to the user and the action, and stays valid for a limited time (up to 24 hours by default).

    For example, if your settings page has a nonce, then the nonce is generated with the form and passed, on save, to the function(s) that validate and sanitize and save. If the nonce doesn’t check out, then the request didn’t come from your form, and the save shouldn’t run. This prevents someone from just sending raw data at your code.
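
    A minimal sketch of that round trip, using the core functions wp_nonce_field() and wp_verify_nonce(). The action name my_plugin_save, the field names and both function names are placeholders invented for this example.

```php
<?php
// Print the form. wp_nonce_field() adds a hidden input holding the nonce.
function my_plugin_render_form() {
	echo '<form method="post">';
	wp_nonce_field( 'my_plugin_save', 'my_plugin_nonce' );
	echo '<input type="text" name="my_plugin_value" />';
	echo '<button type="submit">Save</button>';
	echo '</form>';
}

// Handle the submission. Returns true only when the nonce checks out.
function my_plugin_handle_save() {
	if ( ! isset( $_POST['my_plugin_nonce'] ) ) {
		return false; // no nonce at all: raw data sent straight at the code
	}

	$nonce = sanitize_text_field( wp_unslash( $_POST['my_plugin_nonce'] ) );
	if ( ! wp_verify_nonce( $nonce, 'my_plugin_save' ) ) {
		return false; // the nonce is wrong or has expired
	}

	// ... validate, sanitize and save here ...
	return true;
}
```

    Note that nothing in this sketch asks who the user is. It only confirms that the request came from the form.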

    Why isn’t that enough?

    With just a nonce, anyone who has access to the page can save data. This is okay when you think about something like a comment form or a contact form. Those are things that need to be accessible by non logged in users, right? What about the WordPress General settings page? The one that lets you change the URL of the website? Right. You only want that to be accessible to admins. Imagine if all logged in users could change that one. Yikes!

    In order to protect those pages, you have to consider who should have access to your code.

    1. Are the users logged in or out?
    2. What user permissions should be required?
    3. How much damage can be caused if the data is changed?

    That’s it. That’s all you have to ask. If it’s a contact form, then a nonce is all you need. If it’s a site definition change, then you may want to restrict it to admins or editors only.

    The Settings API

    When you use the settings API, some of this is made pretty straightforward for you:

    add_action( 'admin_menu', 'my_plugin_admin_menu' );
    function my_plugin_admin_menu() {
        add_options_page( 'My Plugin', 'My Plugin', 'manage_options', 'my-plugin', 'my_options_page' );
    }
    

    In that example, the third value for the options page is manage_options which means anyone who can manage site options (i.e. administrators) can access the settings page.

    The problem you run into is if you want a page to do double duty. What if you want everyone to see a page, but only admins can access the settings part? That’s when you need to use current_user_can() to wrap around your code. Just check if a user can do a thing, and then let them in. Or not.
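
    The double-duty case can be sketched like this. The function names and markup are placeholders made up for the example; current_user_can() and wp_die() are the real WordPress functions.

```php
<?php
// A page that everyone can see, with a settings section only admins get.
function my_plugin_render_page() {
	echo '<h1>My Plugin</h1>';
	echo '<p>Information every visitor of this page may read.</p>';

	// Only users who can manage site options see the settings form.
	if ( current_user_can( 'manage_options' ) ) {
		echo '<form method="post"><!-- settings fields go here --></form>';
	}
}

// Hiding the form is not enough: check again before saving anything.
function my_plugin_save_settings() {
	if ( ! current_user_can( 'manage_options' ) ) {
		wp_die( 'You are not allowed to change these settings.' );
	}

	// ... verify the nonce, then validate, sanitize and save ...
}
```

    The check appears twice on purpose. The first decides what gets shown, the second decides what gets run.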

    What Permissions are Best?

    In general, you should give admins access to everything, and after that, give people as little access as possible. While it may be annoying that an editor can’t, say, flush the cache, do they really need to? You have to measure the computational expense of what’s happening before you give access to anyone. It’s not just “Who should do a thing?” but “What are the consequences of the thing?”

    Look at the cache again. Flushing a cache, emptying it, is easy. But it takes time and server energy to rebuild. By deleting it, you force your server to rebuild everything, and that will slow your site down. An admin, who has access to delete plugins and themes, should be aware of that. An editor, who edits posts and menus, may not. And while some editors might, will all?

    This gets harder when you write your code for public use. You have to make hard decisions to protect the majority of users.

    Be as prudent as possible. Restrict as much as possible. It’s safer.

  • You Are Not Psychic

    The other day I heard someone mention that they were securing software that they didn’t even have the words for 5 years ago. That reminded me of how fast technology moves. Things that didn’t exist last year are vulnerable today, and the discovery of those things only happens faster as we invent weirder and weirder ways of reinventing the wheel.

    The Myth of Perfection

    A well known saying is that the perfect is the enemy of the good. We take that to mean that if we wait until a thing is perfect, we will never feel it is done enough and ready enough for the world. In Open Source technology we’re fond of releasing and iterating, accepting the lack of perfection and aiming instead for the minimum. What is ‘good enough’ to let people use our work and improve on it?

    Perfection doesn’t exist. There is nothing on the planet that is perfect and there never will be. But accepting this doesn’t mean we allow imperfection and danger into our lives.

    The Balance of Reality

    Everyone screws up code, no matter how awesome a professional you are. Accept it 🙂

    I said that nearly three years ago, and I’ve argued before that it’s okay to screw up. I really do believe that making mistakes isn’t a bad thing. But at the same time, many people understood that to mean I was alright with shipping code that I knew was bad. This is not at all the case.

    If you know your code is bad or insecure, you fix it. Period. You don’t let things you know are bad out the door. But you do release things that work and perhaps lack all the features you want. There’s a difference between releasing bad code and releasing imperfect code.

    The Attack of Security

    To turn this on its ear a little, if someone comes up to you and says “This code is wrong, you should do this other thing.” then you have a choice. You can listen and believe them and study if they’re right and test it and fix it, or you can ignore it.

    When someone you respect tells you those things, you’re inclined to believe them. But at the same time, your heart takes a hit because “this code is wrong” sounds a lot like “this code is bad.” And that sounds like “you wrote bad code” and that feels like “you’re a bad coder.”

    That slope slipped right down, didn’t it? It’s a reality. It’s in our nature to take admonishments of our work as a personal attack, even if they are rarely meant that way. I can count on my hands the number of times I’ve actually told someone they were a bad coder. I can count on one hand the number of times I’ve told someone who wasn’t me that they’re a bad coder. I’ve certainly said it to myself a lot. I believe I’ve told one person directly that I thought they were a bad coder.

    I don’t think they actually understood what I meant… Which says something.

    The Shield of Arrogance

    We are not psychic. If you are, please let me know who to bet on for the World Series. Being humans, and therefore fallible, we cannot be so arrogant as to presume we know all the possible ways our code might be vulnerable. Technology moves so fast that what looks safe today may turn out to be terribly dangerous tomorrow.

    Knowing this, knowing we are imperfect, we know that our fellow humans are also imperfect. The greatest danger to our security is ourselves. And that means, as developers writing code to be used by others, it’s incumbent upon us to protect our fellow humans from innocent mistakes.

    You are not psychic

    You don’t know what will be insecure next. You can’t. So secure your code as best you can and secure it better if people point out your shortcomings. Learn. Improve. Protect.

  • Datepicker and a Widget

    Last week, I worked on making a plugin that would safely and smartly allow for a date selection, and not permit people to put in junk data. I had my code ‘clean’ and it only accepted good data, but there was one question remaining. How do I make it look GOOD?

    Let’s be honest, pretty data matters. If things look good, people use them. It’s that simple. This let me play with another aspect of coding that I don’t generally look at. JavaScript.

    What Code Do I Need?

    There are a lot of ways to tackle this kind of problem. If you wanted to just list the months and days, and have those be drop-downs, you could do that. The option I went with was to have a calendar date-picker pop up, as I felt that would visually explain what the data should be.

    To do that I needed a date picker jQuery script (which is included in WordPress core) and a second script to format the output.

    My Script

    This part is really small:

    jQuery(function() {
        jQuery( ".datepicker" ).datepicker({
            dateFormat : "mm-dd"
        });
    });
    

    All it does is force the format to be “mm-dd” – so if you picked the date, that’s what it would be.
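
    The date format only shapes what the picker writes into the field. Anyone can still type or post something else, so the server side should check it again. One way that check could look (a sketch, not the plugin’s actual code; sanitize_month_day() is a made-up name, and 2000 is used only because it is a leap year, so 02-29 passes):

```php
<?php
/**
 * Hypothetical sanitizer: accept only a real calendar date in mm-dd form.
 *
 * @param string $value Raw input from the date field.
 * @return string The date as mm-dd, or an empty string if it is not valid.
 */
function sanitize_month_day( $value ) {
	$value = trim( (string) $value );

	// Exactly two digits, a hyphen, two digits.
	if ( ! preg_match( '/^(\d{2})-(\d{2})$/', $value, $matches ) ) {
		return '';
	}

	// checkdate() rejects things like 13-01 or 02-30.
	// The year 2000 is a leap year, so 02-29 is allowed.
	if ( ! checkdate( (int) $matches[1], (int) $matches[2], 2000 ) ) {
		return '';
	}

	return $value;
}
```

    Returning an empty string for junk fits a widget where a blank date simply means “use the current day.”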

    Enqueuing the Scripts

    In order to make sure the scripts are only loaded on the widgets page, my enqueue function looks like this:

    	public function admin_enqueue_scripts($hook) {
    		if( $hook !== 'widgets.php' ) return;
    		wp_enqueue_script( 'byq-onthisday', plugins_url( 'js/otd-datepicker.js', __FILE__ ), array( 'jquery-ui-datepicker' ), $this->version, true );
    		wp_enqueue_style( 'jquery-ui', plugins_url( 'css/jquery-ui.css', __FILE__ ), array(), $this->version );
    	}
    

    The CSS is because, by default, WordPress doesn’t include the jQuery UI CSS.

    Calling the Scripts

    In the widget class, I have a function for the form output. In there, I have an input field with a class defined as datepicker, which is used by the jQuery I wrote above, to know “I’m the one for you!”

    	function form( $instance ) {
    		$instance = wp_parse_args( (array) $instance, $this->defaults );
    		?>
    		<p>
    			<label for="<?php echo esc_attr( $this->get_field_id( 'title' ) ); ?>"><?php _e( 'Title', 'bury-your-queers' ); ?>: </label>
    			<input type="text" id="<?php echo esc_attr( $this->get_field_id( 'title' ) ); ?>" name="<?php echo esc_attr( $this->get_field_name( 'title' ) ); ?>" value="<?php echo esc_attr( $instance['title'] ); ?>" class="widefat" />
    		</p>
    
    		<p>
    			<label for="<?php echo esc_attr( $this->get_field_id( 'date' ) ); ?>"><?php _e( 'Date (Optional)', 'bury-your-queers' ); ?>: </label>
    			<input type="text" id="<?php echo esc_attr( $this->get_field_id( 'date' ) ); ?>" name="<?php echo esc_attr( $this->get_field_name( 'date' ) ); ?>" value="<?php echo esc_attr( $instance['date'] ); ?>" class="widefat datepicker" />
    			<br><em><?php _e( 'If blank, the date will be the current day.', 'bury-your-queers' ); ?></em>
    		</p>
    		<?php
    	}
    

    Making it Pretty

    To be honest, once I got the JS working, I left the default CSS alone. Why? Because I’m a monkey with a crayon when it comes to design. The default worked fine for me:

    [Image: The Default Date Picker]

    It does make me think that it would be nice if WordPress included their own customized datepicker colors in the admin colors, but I understand why they don’t. Not everyone or even most people will ever need this.

  • Names, Short Names, and More Names

    I was working on a side project with Tracy Levesque of Yikes! and she lamented my readme. I had totally half-assed it and I knew it, so she cleaned it up and asked me “What are the shortcodes?” I told her and about a second later she suggested two new names. Hers were better.

    Functionality Based Names

    I have a habit of naming things for what they are. I made a shortcode, for example, for a number of posts in a custom post type, and I called it [numposts] because that made sense. But when I added in a new one for number of posts in a taxonomy, I made a second shortcode named [numtax] which is kind of silly isn’t it?

    My problem is that I think about each shortcode as its own, stand-alone entity. It’s a functional thing. It does a thing. A function should be named uniquely to be clear what it’s for.

    Usage Based Names

    Perhaps without meaning to, Tracy jolted my brain into thinking about not the developer but the end user. Now, in my head, I thought “The user would know which shortcode to use and doesn’t have to think about more.” But. She suggested this: [plugin-name data="type"]

    I stared at that for a moment and felt the light slap me in the face. I’d named the shortcodes for their function, but not for the plugin they were in, which is akin to all those terribly named functions and classes I’m always ranting about. In short order, the plugin was fixed and I turned back to my posts code.

    One Name to Bind Them

    [numposts data="posts" posttype="post-type" term="term-slug" taxonomy="taxonomy-slug"]
    

    The code defaults to a data set of posts and a post type of post since those are the most common usages. After that it’s a fast check “Is this a posts data set or a taxonomy one?” and runs the same code it used to run, passing the data along.
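
    A single handler along those lines could be sketched as follows. This is not the plugin’s actual code: the function name is made up, and the way the counts are fetched (wp_count_posts() and get_term_by()) is an assumption about one reasonable way to do it.

```php
<?php
// Sketch of a single [numposts] handler; not the plugin's actual code.
function numposts_shortcode( $atts ) {
	$atts = shortcode_atts(
		array(
			'data'     => 'posts', // 'posts' or 'taxonomy'
			'posttype' => 'post',
			'term'     => '',
			'taxonomy' => '',
		),
		$atts,
		'numposts'
	);

	if ( 'taxonomy' === $atts['data'] ) {
		// Count the posts attached to one term of one taxonomy.
		$term = get_term_by( 'slug', $atts['term'], $atts['taxonomy'] );
		return $term ? (string) (int) $term->count : '0';
	}

	// Otherwise count the published posts of the requested post type.
	$counts = wp_count_posts( $atts['posttype'] );
	return isset( $counts->publish ) ? (string) (int) $counts->publish : '0';
}

// Register the shortcode only when WordPress is loaded.
if ( function_exists( 'add_shortcode' ) ) {
	add_shortcode( 'numposts', 'numposts_shortcode' );
}
```

    In this sketch both term and taxonomy have to be supplied for a taxonomy count. A real plugin may well fill in a default taxonomy instead.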

    What’s Really The Difference?

    “But Mika, now people have to type in more!” I hear you say.

    They do. [numposts] defaults to posts like it always did. [numposts posttype="page"] is four characters longer. But with the four extra characters (and really I could have left that out) comes something simpler: people only have to think of one shortcode.

    If they want to count the number of posts, then they just count the number of posts and call it a day. There’s no headache of realizing they meant to get the number of posts with a specific taxonomy. It all just works with one. Remember your terms, and those didn’t change except for needing [numposts data="taxonomy" term="wordpress"] which actually makes it more obvious what you’re doing.

    I have no idea if Tracy meant for me to get this deep into this, but she also knows I spent an hour contemplating the fact that the word ‘read’ exists in multiple tenses, and it’s only by context that I actually know which one anyone meant.