Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: charts

  • Stacked Charts Part 2: Rebuilding the Array

    Stacked Charts Part 2: Rebuilding the Array

    I’ve talked about this before in category statistics, but in order to get the data from a simple array into a Chart.js consumable one, we have to rebuild the array.

    All Arrays are not Equal

    In order to save the data in a way I could use and reuse, I had to aim at the lowest common denominator. But also I had to save the arrays at a per show basis, which is not the same as what I was going to need to output.

    Instead of just outputting the averages for the show, I needed to combine all this into a ‘by nation’ statistic. That is, I needed to get a list of all shows that were associated with a taxonomy value for that country (easy) and combine all their arrays (not quite easy) and order the data in a way that would make sense (not easy).

    So again we start with understanding the array. Here’s a show that happens to air in Argentina:

    Array
    (
        [cisgender]    => 2
        [trans-woman]  => 0
        [trans-man]    => 0
        [non-binary]   => 0
        [gender-fluid] => 0
        [gender-queer] => 0
        [agender]      => 0
    )
    

    This is the data for one show. Argentina has 2, oddly both with the same stats breakdown by gender identity. What I need to do is loop through both those shows and add the arrays to be this:

    Array
    (
        [cisgender]    => 4
        [trans-woman]  => 0
        [trans-man]    => 0
        [non-binary]   => 0
        [gender-fluid] => 0
        [gender-queer] => 0
        [agender]      => 0
    )
    

    Get the Base Arrays

    Just like before, we make an array of the base data as we have it in the gender, sexuality, and romantic orientations. In this case, we’re adding in a query to change the order to be largest to smallest overall from the taxonomy. While this may not be true for all nations in the future, it is today:

    $taxonomy = get_terms( 'lez_nations' );
    foreach ( $taxonomy as $the_tax ) {
    }
    

    I need to pause here. Everything from here out goes in that foreach. We’re going to be looping for each nation in the list of nations. Now… I re-use this code for multiple taxonomies, so lez_nations is actually lez_' . $data and it dynamically changes based on how I call this function.

    On we go!

    	$characters = 0;
    	$shows      = 0;
    	
    	// Create a massive array of all the character terms we care about...
    	$valid_char_data = array( 
    		'gender'    => 'lez_gender',
    		'sexuality' => 'lez_sexuality',
    		'romantic'  => 'lez_romantic',
    	);
    
    	if ( isset( $subdata ) && !empty( $subdata ) ) {
    		$char_data = array();
    		$terms     = get_terms( $valid_char_data[ $subdata ], array( 'orderby' => 'count', 'order' => 'DESC' ) );
    
    		if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
    			foreach ( $terms as $term ) {
    				$char_data[ $term->slug ] = 0;
    			}
    		}
    	}
    

    Now that we have those base arrays, again set to zero,

    By the way, $subdata and $data are parameters sent to the function that runs this. $subdata is for the taxonomy we’re calculating (sexuality etc) and $data is for the overall taxonomy (Nations or perhaps Stations or genres – we use a lot of those).

    This gets us started.

    Queery the Posts

    Next we need a WP_Query of all the posts in the taxonomy.

    	$count = wp_count_posts( 'post_type_shows' )->publish;
    	$queery = new WP_Query ( array(
    		'post_type'              => 'post_type_shows',
    		'posts_per_page'         => $count,
    		'post_status'            => array( 'publish' ),
    		'tax_query'              => array( array(
    			'taxonomy' => 'lez_' . $data,
    			'field'    => 'slug',
    			'terms'    => $the_tax->slug,
    			'operator' => '=',
    		),),
    	) );
    	wp_reset_query();
    

    Remember, this is still within that foreach above. And once we have the posts, let’s query all the shows:

    	if ( $queery->have_posts() ) {
    		foreach( $queery->posts as $show ) {
    
    			$shows++;
    			// Get all the crazy arrays
    			$gender = get_post_meta( $show->ID, 'lezshows_char_gender' );
    			if ( isset( $subdata ) ) { 
    				$dataset = get_post_meta( $show->ID, 'lezshows_char_' . $subdata );
    			}
    
    			// Add the character counts
    			foreach( array_shift( $gender ) as $this_gender => $count ) {
    				$characters += $count;
    			}
    
    			if ( !empty( $dataset ) ) {
    				foreach( array_shift( $dataset ) as $this_data => $count ) {
    					$char_data[ $this_data ] += $count;
    				}
    			}
    
    		}
    	}
    

    The weird section you see, // Add the character counts is there because every character has a gender, but not everyone has a sexuality or romantic orientation. Because of that, I decided it was safest to use that as my baseline count.

    The second section that checks if ( !empty( $dataset ) ) {...} is what adds things up for the array.

    Speaking of…

    Output the New Array

    Once I have those counts, I generate different arrays depending on what I’m outputting. The basic barchart is different from a percentage, which is different from the stacked bar.

    	// Determine what kind of array we need to show...
    	switch( $format ) {
    		case 'barchart':
    			$array[] = array (
    				'name'  => $the_tax->name,
    				'count' => $shows,
    			);
    			break;
    		case 'percentage':
    			$array = self::taxonomy( 'post_type_shows', 'lez_' . $data );
    			break;
    		case 'count':
    			$array = count( $taxonomy );
    			break;
    		case 'stackedbar':
    			$array[$the_tax->slug] = array(
    				'name'       => $the_tax->name,
    				'count'      => $shows,
    				'characters' => $characters,
    				'dataset'    => $char_data,
    			);
    	}
    

    And all of this is so I could get that silly stacked bar, which will have the count of total characters, shows, and the data.

    Whew.

  • Stacked Charts Part 1: Understanding Your Data

    Stacked Charts Part 1: Understanding Your Data

    There are a few different type of charts. Actually there are a lot. I find a nice bar chart fairly easy to read and understand. So when Tracy said we should generate some nice stats about nations, like how many shows there were per nation, I was able to do that pretty easily:

    An excerpt of shows by nation - USA has the most. Yaaaay.
    An excerpt of shows by nation

    And as far as that goes, it’s pretty cool. It’s really just the same code I use to generate category statistics already. This is, by the way, why using WordPress to generate your data is useful. It’s easy to replicate code you’ve already got.

    But then Tracy, who I think derives some perverse joy out of doing this to me, says “Can we find out how many trans characters there are per nation?”

    Use WordPress First

    If you heard my talks about Sara Lance, you’ve heard me tout that data based sites should always use WordPress functions first. By which I mean they should use taxonomies and custom post types when possible, because accessing the data will be consistent, regular, and repeatable.

    Ironically, it’s because I chose to use WordPress than I was in a bit of a bind.

    You see, we have three post types on the site right now: shows, characters, and actors. The shows have the taxonomy of ‘nation’ so getting that simple data was straightforward. The characters store the taxonomies of gender identity and sexual preference. That sounds pretty logical, right?

    So how, you may wonder, do we get a list of characters on a show? A query. Basically we search wp_post_meta for all characters with the array of lezchars_show_group and, within that multidimensional, have a show of the post ID of the show saved. Which means the characters are dynamically generated every single time a page is loaded. And yes, that is why I use The L Word as my benchmark for page speed.

    However by doing all this dynamically, generating the stats for characters per nation would look like this:

    1. Use get_terms to get a list of all shows in a nation to …
    2. Loop through all those shows and …
    3. Loop through all the characters on each show to extract the data to …
    4. Store the data per nation

    Ouch. Talk about slow.

    Solution? Use WordPress!

    Thankfully there was a workaround. One of the other odd things we do with shows is generate a show ‘score’ – a value calculated by the shows relative awesomeness, our subjective enjoyment of it, and the number of characters, alive or dead, it has.

    In order to make that generation run faster, every time a show or character is saved, I trigger the following post_meta values to be saved:

    • lezshows_characters – An array of character counts alive and dead
    • lezshows_the_score – The insane math of the score

    So I added three more:

    • lezshows_sexuality
    • lezshows_gender
    • lezshows_romantic

    All of those are generated when the post is saved, as it loops through all the characters and extracts data.

    Generate The Base

    In order to get the basics, we start by generating an array of everything we’re going to care about. I do this by listing all the taxonomies I want to use and then loop through them, adding each slug to a new array with a value of 0:

    $valid_taxes = array( 
    	'gender'    => 'lez_gender',
    	'sexuality' => 'lez_sexuality',
    	'romantic'  => 'lez_romantic',
    );
    $tax_data = array();
    
    foreach ( $valid_taxes as $title => $taxonomy ) {
    	$terms = get_terms( $taxonomy );
    	if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
    		$tax_data[ $title ] = array();
    		foreach ( $terms as $term ) {
    			$tax_data[ $title ][ $term->slug ] = 0;
    		}
    	}
    }
    

    That gives me a multidimensional array which, I admit, is pretty epic and huge. But it lets move on to step two, of getting all the characters:

    $count          = wp_count_posts( 'post_type_characters' )->publish;
    $charactersloop = new WP_Query( array(
    	'post_type'              => 'post_type_characters',
    	'post_status'            => array( 'publish' ),
    	'orderby'                => 'title',
    	'order'                  => 'ASC',
    	'posts_per_page'         => $count,
    	'no_found_rows'          => true,
    	'meta_query'             => array( array(
    		'key'     => 'lezchars_show_group',
    		'value'   => $post_id,
    		'compare' => 'LIKE',
    	),),
    ) );
    

    Next I stop everything as a new array. Which is where we get into some serious fun. See, I have to actually double check the character is in the show, since the ‘like’ search has a few quirks when you’re searching arrays. The tl;dr explanation here is that if I look for shows with a post ID of “23” then I get “23” and “123” and “223” and so on.

    Yeah. It’s about as fun as you’d think. If I wasn’t doing arrays, this would be easier, but I have Sara Lance to worry about.

    if ($charactersloop->have_posts() ) {
    	while ( $charactersloop->have_posts() ) {
    		$charactersloop->the_post();
    		$char_id     = get_the_ID();
    		$shows_array = get_post_meta( $char_id, 'lezchars_show_group', true );
    
    		if ( $shows_array !== '' && get_post_status ( $char_id ) == 'publish' ) {
    			foreach( $shows_array as $char_show ) {
    				if ( $char_show['show'] == $post_id ) {
    					foreach ( $valid_taxes as $title => $taxonomy ) {
    						$this_term = get_the_terms( $char_id, $taxonomy, true );
    						if ( $this_term && ! is_wp_error( $this_term ) ) {
    							foreach( $this_term as $term ) {
    								$tax_data[ $title ][ $term->slug ]++;
    							}
    						}
    					}
    				}
    			}
    		}
    	}
    	wp_reset_query();
    }
    

    You’ll notice there’s a quick $tax_data[ $title ][ $term->slug ]++; in there to increment the count. That’s the magic that gets processed all over. It tells me things like “this show has 7 cisgender characters” which is the first half of everything I wanted.

    Because in the end I save this as an array for the show:

    foreach ( $valid_taxes as $title => $taxonomy ) { 
    	update_post_meta( $post_id, 'lezshows_char_' . $title , $tax_data[ $title ] );
    }
    

    How Well Does This Run?

    It’s okay. It’s not super awesome, since it has to loop so many times, this can get pretty chunky. See The L Word and it’s 60+ characters. However. It only updates when the show is saved, or a character is added to the show, which means the expensive process is limited. And by saving this data in an easily retrievable format, I’m able to do the next phase. Generate the stats.

  • Linear Regressions in PHP

    Linear Regressions in PHP

    Sometimes math exists to give me a headache.

    In calculating the deaths of queer females per year, my wife wondered what the trend was, other than “Holy sweat socks, it’s going up!” That’s called a ‘trendline’ which is really just a linear regression. I knew I needed a simple linear regression model and I knew what the formula was. Multiple the slope by the X axis value, and add the intercept (which is often a negative number), and you will calculate the points needed.

    Using Google Docs to generate a trend line is easy. Enter the data and tell it to make a trend line. Using PHP to do this is a bit messier. I use Chart.js to generate my stats into pretty graphs, and while it gives me a lot of flexibility, it does not make the math easy.

    I have an array of data for the years and the number of death per year. That’s the easy stuff. As of version 2.0 of Chart.js, you can stack charts, which lets me run two lines on top of each other like this:

    var myChart = new Chart(ctx, {
        type: 'bar',
        data: {
            labels: ['Item 1', 'Item 2', 'Item 3'],
            datasets: [
                {
                    type: 'line',
                    label: 'Line Number One',
                    data: [10, 20, 30],
                },
                {
                    type: 'line',
                    label: 'Line Number Two',
                    data: [30, 20, 10],
                }
            ]
        }
    });
    

    But. Having the data doesn’t mean I know how to properly generate the trend. What I needed was the most basic formula solved: y = x(slope) + intercept and little more. Generating the slope an intercept are the annoying part.

    For example, slope is (NΣXY - (ΣX)(ΣY)) / (NΣX2 - (ΣX)2) where,

    • x and y are the variables.
    • b = The slope of the regression line
    • a = The intercept point of the regression line and the y axis.
    • N = Number of values or elements
    • X = First Score
    • Y = Second Score
    • ΣXY = Sum of the product of first and Second Scores
    • ΣX = Sum of First Scores
    • ΣY = Sum of Second Scores
    • ΣX2 = Sum of square First Scores

    If that made your head hurt, here’s the PHP to calculate it (thanks to Richard Thome ):

    	function linear_regression( $x, $y ) {
    
    		$n     = count($x);     // number of items in the array
    		$x_sum = array_sum($x); // sum of all X values
    		$y_sum = array_sum($y); // sum of all Y values
    
    		$xx_sum = 0;
    		$xy_sum = 0;
    
    		for($i = 0; $i < $n; $i++) {
    			$xy_sum += ( $x[$i]*$y[$i] );
    			$xx_sum += ( $x[$i]*$x[$i] );
    		}
    
    		// Slope
    		$slope = ( ( $n * $xy_sum ) - ( $x_sum * $y_sum ) ) / ( ( $n * $xx_sum ) - ( $x_sum * $x_sum ) );
    
    		// calculate intercept
    		$intercept = ( $y_sum - ( $slope * $x_sum ) ) / $n;
    
    		return array( 
    			'slope'     => $slope,
    			'intercept' => $intercept,
    		);
    	}
    

    That spits out an array with two numbers, which I can plunk into my much more simple equation and, in this case, echo out the data point for each item:

    foreach ( $array as $item ) {
         $number = ( $trendarray['slope'] * $item['name'] ) + $trendarray['intercept'];
         $number = ( $number <= 0 )? 0 : $number;
         echo '"'.$number.'", ';
    }
    

    And yes. This works.

    Trendlines and Death

  • Post Meta, Custom Post Meta, and Statistics

    Post Meta, Custom Post Meta, and Statistics

    I do a lot with weird stats. I do a lot with CMB2 and custom meta. I do a lot with CMB2 and weird post meta.

    Is it any surprise I wanted to add my new year dropdown data to my site for some interesting statistics?

    Of course not. But I started small. I have a saved loop called $all_shows_query

    // List of shows
    $all_shows_query = new WP_Query ( array(
    	'post_type'       => 'post_type_shows',
    	'posts_per_page'  => -1,
    	'no_found_rows'   => true,
    	'post_status'     => array( 'publish', 'draft' ),
    ));
    

    This really just gets the list of all the shows and holds on to it. The meat of the code runs with this:

    $show_current = 0;
    
    if ($all_shows_query->have_posts() ) {
    	while ( $all_shows_query->have_posts() ) {
    		$all_shows_query->the_post();
    		$show_id = $post->ID;
    		if ( get_post_meta( $show_id, "shows_airdates", true) ) {
    			$airdates = get_post_meta( $show_id, "shows_airdates", true);
    			if ( $airdates['finish'] == 'current' ) {
    				$show_current++;
    			}
    		}
    		// More IF statements are here.
    	}
    	wp_reset_query();
    }
    

    As mentioned, there are more if statements, like if ( get_post_meta( $show_id, "shows_stars", true) ) {...} which gets data for if a show has a star or not. And if ( get_post_meta( $show_id, "shows_worthit", true) ) {...} which checks if a show is worth it or not. I was already doing a lot of this, so slipping in one more check doesn’t hurt.

    To show the data, without Chart.js, it looks like this:

    	<h3>Shows Currently Airing</h3>
    
    	<p>&bull; Currently Airing - <?php echo $show_current .' total &mdash; '. round( ( ( $show_current / $count_shows ) * 100) , 1); ?>%
    	<br />&bull; Not Airing - <?php echo ( $count_shows - $show_current )  .' total &mdash; '. round( ( ( ( $count_shows - $show_current ) / $count_shows ) * 100) , 1); ?>%
    	</p>
    

    With Chart.js, it looks like this:

    <canvas id="pieCurrent" width="200" height="200"></canvas>
    
    <script>
    // Piechart for Currently Airing Data
    var pieCurrentdata = {
        labels: [
            "On Air (<?php echo $show_current; ?>)",
            "Off Air (<?php echo ($count_shows - $show_current ); ?>)",
        ],
        datasets: [
            {
                data: [ <?php echo '
    	            "'.$show_current.'",
    	            "'.($count_shows - $show_current ).'",
    	            '; ?>],
                backgroundColor: [
    	            "#4BC0C0",
    	            "#FF6384",
    	            "#E7E9ED"
                ]
            }]
    };
    
    var ctx = document.getElementById("pieCurrent").getContext("2d");
    var pieTriggers = new Chart(ctx,{
        type:'doughnut',
        data: pieCurrentdata,
        options: {
    		tooltips: {
    		    callbacks: {
    				label: function(tooltipItem, data) {
    					return data.labels[tooltipItem.index];
    				}
    		    },
    		},
    	}
    });
    </script>
    

    Right now the data’s a bit off, since I’ve only updated 200 of the 325 shows, but it’s enough to get the information.