Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • What Is The Measure of a Site?

    After you think about where you’re saving your data, internally or externally, you’re going to be faced with the biggest problem known to exist.

    What do you do with your data?

    Common Data is (Mostly) Obvious

    Some data, as I’ve said before, is obvious. That is, you know what you want to do with statistics of visits. The starting point is ‘figure out how many people visit my site.’ Right? Not too hard. But that isn’t all you want to know. You want to know when your site is busiest, what content people read, and maybe you want to know on what device.

    You want to know these things because they can help you optimize what you do next. If, for example, your Monday posts are super popular, then you want to make sure you post them at the time the most people are going to visit your site. If you know only 2 people view your site on an iPad, maybe fixing that little annoyance can wait a bit.

    Rare Data is A Headache

    On the other hand, when you look at statistics for your complex data, like a site with TV shows and characters and actors, you have a completely different problem. What public stats are both relevant and meaningful? And how do you represent them in ways that people can understand?

    Like, do you use pie charts?

    An example of two pie charts

    They can be helpful but only if you don’t have a large number of data slices.

    I made a pie chart with 28 slices and it was unreadable. Though that was mostly because everyone had between 1-5% except for one that had 75%.
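
    One common way to tame a chart like that is to fold the tiny slices into a single “Other” slice before drawing anything. The following is only a sketch of that idea, not what the site does: the function name, the 5% cut-off, and the sample genres are all made up for illustration.

    ```php
    <?php
    /**
     * Collapse slices below a minimum share into a single "Other" slice.
     * Hypothetical helper; the input shape (label => count) is an assumption.
     */
    function collapse_small_slices( array $counts, float $min_share = 0.05 ): array {
    	$total = array_sum( $counts );
    	if ( $total <= 0 ) {
    		return $counts;
    	}

    	arsort( $counts ); // Largest slice first, keys preserved.

    	$kept  = array();
    	$other = 0;
    	foreach ( $counts as $label => $count ) {
    		if ( $count / $total >= $min_share ) {
    			$kept[ $label ] = $count;
    		} else {
    			$other += $count;
    		}
    	}

    	// Only add the catch-all slice when something was actually folded into it.
    	if ( $other > 0 ) {
    		$kept['Other'] = ( $kept['Other'] ?? 0 ) + $other;
    	}

    	return $kept;
    }

    // Made-up sample data: one huge slice and several tiny ones.
    $slices = collapse_small_slices(
    	array( 'Drama' => 75, 'Comedy' => 10, 'Fantasy' => 6, 'Sci-Fi' => 4, 'Horror' => 3, 'Western' => 2 ),
    	0.05
    );
    ```

    With the sample numbers, the three slices under 5% end up in one “Other” slice, which leaves four slices instead of six.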

    The Question Is Usage

    This is a problematic question because it has no easily defined answer before you start building out your site. We’ve all seen an image of a paved path and then a foot-trail cutting away from it, or winding around an obstacle. People like to joke about how it’s design vs usage. While our goal when making any product is to avoid people walking off the paths, it’s unavoidable. And in the case of public statistics, it’s even harder to predict usage.

    A large reason for the problem is what is called a failure of imagination. This is, in part, the fault of the designers. That is, they didn’t predict things properly. Which requires metrics. Which can’t be gathered until people have used the site a little.

    You see the problem, I hope.

    Start With The Easy

    When I built out stats on my site, the ones I wanted people to use, I made sure to start with some easy things. Like those pie charts. Those are just pulled from a custom taxonomy which every character has. They’re simple. They’re easy. And they let people visualize.

    After I released it, someone asked “Could we have a chart to show how many actors a character has?”

    Actors per Character

    That was actually not easy, but the point is that by starting with something ‘easy’ I was able to inspire people to ask what they wanted to see.

    Don’t Be Afraid to Be Wrong

    Remember I mentioned that evil pie chart? You’re going to be wrong. You’re going to assume that the best way to show a specific data point is a pie chart when it really should be a bar chart. If you pick the right chart systems, it shouldn’t be too horrible to switch between them. But sometimes it will be.
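
    Switching is a lot less painful when the data is kept apart from the chart type. Here is a minimal sketch, assuming a Chart.js-style config (type, data, labels, datasets); the helper name and the counts are hypothetical.

    ```php
    <?php
    /**
     * Build a Chart.js-style config array from label => count pairs.
     * Hypothetical helper: only the chart type changes between charts.
     */
    function chart_config( string $type, array $counts ): array {
    	return array(
    		'type' => $type,
    		'data' => array(
    			'labels'   => array_keys( $counts ),
    			'datasets' => array(
    				array( 'data' => array_values( $counts ) ),
    			),
    		),
    	);
    }

    // Made-up counts, used for both charts.
    $counts = array( 'Cisgender' => 40, 'Non-binary' => 3 );

    $pie = chart_config( 'pie', $counts );
    $bar = chart_config( 'bar', $counts ); // Same data, different chart.

    echo json_encode( $bar );
    ```

    If the pie chart turns out to be the wrong call, the only thing that changes is the first argument.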

    Just remember, it’s okay to make mistakes. You can dig up a path and repave it after all.

  • Processing Numbers with WordPress

    The very idea of ‘I should make statistics’ or ‘what are the metrics of this’ starts from the same place. We have a desire to understand what a thing is. Statistics, like traffic, and metrics, like speed, can tell us obviously important information about our sites. Faster sites do better. More traffic gets you more… whatever.

    But those are the obvious things. There are easy to understand numbers and there are difficult to process numbers. And it all depends on where you save the data.

    Getting At The Data

    When I set about making statistics for LezWatchTV, the biggest problem I faced was determining what I wanted to show. Some things were simple. How many characters died and what percent of all characters was that? How many shows have dead characters?

    Since I chose to use WordPress features, like custom taxonomies, for the majority of the aspects of the site, getting those numbers was simple. There were, of course, some that were very difficult to get at, and this is fully of my own design. Sometimes there will be data you want to use that is just harder to get at than others.

    This means the question of understanding your numbers begins with understanding where they belong.

    Save Data in Smart Places

    I say this over and over. Use WordPress’ native features first.

    I mean use the taxonomies and the custom post types and the post meta wisely. But. When you’ve got a lot of data that needs to be cross related, consider saving it someplace else. For example, the reason FacetWP is so damn fast is that it doesn’t query WordPress all the time, and instead uses its own tables.

    Having its own tables means there’s less overhead, as it can make direct SQL calls to pull the data. When you have data spread across three post types, this becomes pretty much an imperative. You just have to script the code to save it properly.

    External Data

    While FacetWP does save data to its own tables, there is another option, and that is external locations. You’re most familiar with this with regards to Google Analytics. Some data makes sense to keep local, but keep in mind what you’re doing and what you’re generating with the data. When it’s just posts, local is perfectly logical. When you get into statistics… Well. Maybe you should export it.

    That brings up the next question. What data do you export, and to where?

  • Customizing Which Random Post

    Back in December, I posted about how I generated a random post of the day.

    After running it for 60 days, I realized I needed to exclude three things:

    1. Posts with a specific ‘placeholder’ image
    2. Posts with content ‘TBD’
    3. Posts with one of two specific meta values

    So today we will talk about how awesome WP_Query is.

    The Basic Query

    As a reminder, your basic query for a random post is this:

    $args = array( 
    	'post_type'      => 'post', // The built-in post type is 'post', not 'posts'
    	'orderby'        => 'rand', 
    	'posts_per_page' => '1'
    );
    $post = new WP_Query( $args );
    

    Now, let’s extend it!

    Posts With An Image

    In this example, I have a very specific default image I use – the mystery person – to indicate the post doesn’t have its own image yet. I went and found the image in my media library and took note of the value – 949. Then I added a meta query which said “If the _thumbnail_id does not equal 949.”

    	'meta_query' => array( 
    		array(
    			'key'     => '_thumbnail_id',
    			'value'   => '949', // Mystery Person
    			'compare' => '!=',
    		),
    	),
    

    Seriously. It’s magic.

    Posts Without ‘TBD’

    We also have a standard convention for when we have a pending data post, but we need it for statistical reasons. Since, as of WP 4.4, you can use negatives in searches, just add this to the basic query:

    	's'              => '-TBD',
    

    This could be useful for your stores, if you wanted to list a product of the day but perhaps not ones with “Coming Soon” in the description. Of course, you should also have some meta flag but you get the idea.

    Posts With One of Two Values

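    Okay. Here’s fun. Let’s say you have a post meta field called example_site_group and there are six choices in it but you only want one and two. A LIKE comparison only takes a single string, not an array of values, so for that you need a nested group with an OR relation and one LIKE per value:

    	'meta_query' => array( 
    		array(
    			'relation' => 'OR',
    			array( 'key' => 'example_site_group', 'value' => 'baseone', 'compare' => 'LIKE' ),
    			array( 'key' => 'example_site_group', 'value' => 'basetwo', 'compare' => 'LIKE' ),
    		),
    	),
    

    This is a little messier, but it certainly does work. Since LIKE is a substring match, it works even with serialized data.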

    Put It All Together

    Here’s the real code:

    // Grab a random post
    $args = array( 
    	'post_type'      => 'post_type_characters',
    	'orderby'        => 'rand', 
    	'posts_per_page' => '1',
    	's'              => '-TBD',
    	'meta_query' => array( 
    		array(
    			'key'     => '_thumbnail_id',
    			'value'   => '949', // Mystery woman
    			'compare' => '!=',
    		),
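    		array(
    			// LIKE takes a single string, so use one clause per value
    			'relation' => 'OR',
    			array( 'key' => 'lezchars_show_group', 'value' => 'regular', 'compare' => 'LIKE' ),
    			array( 'key' => 'lezchars_show_group', 'value' => 'recurring', 'compare' => 'LIKE' ),
    		),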
    	)
    );
    $post = new WP_Query( $args );
    

    And voila.

  • Stacked Charts Part 3: The Javascript

    Finally!

    We have our data in a properly consumable array. It’s formatted the way we need. Now we just need to write the JavaScript.

    Take a deep breath.

    What We Want

    What we want is simple. A stacked bar chart that shows the values of all possible permutations. It looks like this:

    A stacked chart that shows how many characters per gender orientation there are per country

    The chart shows how many characters there are per gender orientation, and stacks them for a total count (which is why we needed that count you see).

    Send In The Clowns

    Since I’m already using Chart.js, I just need to have a function to output the javascript. But. Since I also have to loop through the arrays to get the collective data, I need a bit of PHP:

    /*
     * Statistics Display Barcharts
     *
     * Output the list of data usually from functions like self::meta_array
     * It loops through the arrays and outputs data as needed
     *
     * This relies on ChartJS existing
     *
     * @param string $subject The content subject (shows, characters)
     * @param string $data The data - used to generate the URLs
     * @param array $array The array of data
     *
     * @return Content
     */
    static function stacked_barcharts( $subject, $data, $array ) {
    
    	// Defaults
    	$data       = ( $data == 'nations' )? 'nations' : substr( $data, 8 );
    	$title      = ucfirst( substr($subject, 0, -1) ) . ' ' . ucfirst( $data );
    	$height     = '550';
    
    	// Define our settings
    	switch ( $data ) {
    		case 'gender':
    		case 'sexuality':
    		case 'romantic':
    			$title    = 'Character per Nation by ' . ucfirst( $data );
    			$datasets = array();
    			$terms    = get_terms( 'lez_' . $data, array( 'orderby' => 'count', 'order' => 'DESC', 'hide_empty' => 0 ) );
    			if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
    				foreach ( $terms as $term ) $datasets[] = $term->slug;
    			}
    			$counter  = 'characters';
    			$height   = '400';
    			break;
    	}
    	?>
    	<h3><?php echo $title; ?></h3>
    	<div id="container" style="width: 100%;">
    		<canvas id="barStacked<?php echo ucfirst( $subject ) . ucfirst( $data ); ?>" width="700" height="<?php echo $height; ?>"></canvas>
    	</div>
    
    	<script>
    	// Defaults
    	Chart.defaults.global.responsive = true;
    	Chart.defaults.global.legend.display = false;
    
    	// Bar Chart
    	var barStacked<?php echo ucfirst( $subject ) . ucfirst( $data ); ?>Data = {
    		labels : [
    		<?php
    			foreach ( $array as $item ) {
    				// Label every row, zero counts included, so the labels line up with the dataset values below
    				$name = esc_html( $item['name'] );
    				echo '"'. $name .' ('.$item[$counter].')", ';
    			}
    		?>
    		],
    		datasets: [
    		<?php
    		foreach ( $datasets as $label ) {
    			$color = ( $label == 'undefined' )? 'nundefined' : str_replace( ["-", "–","-"], "", $label );
    			?>
    			{
    				borderWidth: 1,
    				backgroundColor: window.chartColors.<?php echo $color; ?>,
    				label: '<?php echo ucfirst( $label ); ?>',
    				stack: 'Stack',
    				data : [<?php
    					foreach ( $array as $item ) {
    						echo $item[ 'dataset' ][ $label ] . ',';
    					}
    				?>],
    			},
    			<?php
    		}
    		?>
    		]
    	};
    	var ctx = document.getElementById("barStacked<?php echo ucfirst( $subject ) . ucfirst( $data ); ?>").getContext("2d");
    	var barStacked<?php echo ucfirst( $subject ) . ucfirst( $data ); ?> = new Chart(ctx, {
    		type: 'horizontalBar',
    		data: barStacked<?php echo ucfirst( $subject ) . ucfirst( $data ); ?>Data,
    		options: {
    			scales: {
    				xAxes: [{ stacked: true }],
    				yAxes: [{ stacked: true }]
    			},
    			tooltips: {
    				mode: 'index',
    				intersect: false
    			},
    		}
    	});
    
    	</script>
    	<?php
    }
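
    Echoing every value by hand works, but it is easy to misplace a comma or a quote. As an alternative sketch (not the code the site runs), the labels and datasets can be built as a plain PHP array and handed to the script in one go with json_encode. The function name is hypothetical, the Canada row is invented, and the colors are left out to keep it short.

    ```php
    <?php
    /**
     * Turn the 'stackedbar' rows into Chart.js-style labels and datasets.
     * Hypothetical helper; $rows uses the name/characters/dataset keys from the array built earlier.
     */
    function stacked_chart_data( array $rows, array $dataset_slugs ): array {
    	$labels   = array();
    	$datasets = array();

    	foreach ( $rows as $row ) {
    		$labels[] = $row['name'] . ' (' . $row['characters'] . ')';
    	}

    	foreach ( $dataset_slugs as $slug ) {
    		$values = array();
    		foreach ( $rows as $row ) {
    			// Missing slugs count as zero so every dataset has one value per label.
    			$values[] = $row['dataset'][ $slug ] ?? 0;
    		}
    		$datasets[] = array(
    			'label' => ucfirst( $slug ),
    			'stack' => 'Stack',
    			'data'  => $values,
    		);
    	}

    	return array( 'labels' => $labels, 'datasets' => $datasets );
    }

    // Sample rows; the Canada numbers are made up.
    $rows = array(
    	'argentina' => array( 'name' => 'Argentina', 'count' => 2, 'characters' => 4, 'dataset' => array( 'cisgender' => 4, 'trans-woman' => 0 ) ),
    	'canada'    => array( 'name' => 'Canada', 'count' => 1, 'characters' => 3, 'dataset' => array( 'cisgender' => 2, 'trans-woman' => 1 ) ),
    );

    $chart = stacked_chart_data( $rows, array( 'cisgender', 'trans-woman' ) );

    echo json_encode( $chart );
    ```

    Inside WordPress, wp_json_encode() is the usual wrapper for the last step.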
    

    The Color

    You may have noticed a strange variable:

    $color = ( $label == 'undefined' )? 'nundefined' : str_replace( ["-", "–","-"], "", $label );
    

    Which was then called in the javascript here:

    backgroundColor: window.chartColors.<?php echo $color; ?>,
    

    I have this in a javascript file that is loaded on that page:

    // Color Defines
    window.chartColors = {
    	
    	// Gender
    	agender: 'rgba(255, 99, 132, 0.6)', // 'red'
    	cisgender: 'rgba(75, 192, 192, 0.6)', // 'aqua'
    	demigender: 'rgba(255, 205, 86, 0.6)', // 'goldenrod'
    	genderfluid: 'rgba(54, 162, 235, 0.6)', // 'light blue'
    	genderqueer: 'rgba(255, 159, 64, 0.6)', // 'orange'
    	nonbinary: 'rgba(201, 203, 207, 0.6)', // 'grey'
    	transman: 'rgba(0, 169, 80, 0.6)', // 'green'
    	transwoman: 'rgba(153, 102, 255, 0.6)', // 'purple'
    
    	// Sexuality
    	asexual: 'rgba(255, 99, 132, 0.6)', // 'red'
    	bisexual: 'rgba(75, 192, 192, 0.6)', // 'aqua'
    	heterosexual: 'rgba(255, 205, 86, 0.6)', // 'goldenrod'
    	homosexual: 'rgba(54, 162, 235, 0.6)', // 'light blue'
    	pansexual: 'rgba(255, 159, 64, 0.6)', // 'orange'
    	nundefined: 'rgba(201, 203, 207, 0.6)', // 'grey'
    	queer: 'rgba(0, 169, 80, 0.6)', // 'green'
    	demisexual: 'rgba(153, 102, 255, 0.6)', // 'purple'
    }
    

    The reason it’s ‘nundefined’ is that things got weird when I had a variable with a name of undefined.
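
    If the slug-to-color lookup is needed in more than one place, the $color line can be pulled into a small helper. This is just a sketch: the function name is hypothetical, and it assumes the intent is to strip hyphens, en dashes, and em dashes.

    ```php
    <?php
    /**
     * Map a taxonomy term slug to a key in window.chartColors.
     * Hypothetical helper wrapping the $color logic from the chart function.
     */
    function chart_color_key( string $slug ): string {
    	// 'undefined' gets its own name so it never collides with JavaScript's undefined.
    	if ( 'undefined' === $slug ) {
    		return 'nundefined';
    	}

    	// Strip hyphens, en dashes (U+2013), and em dashes (U+2014).
    	return str_replace( array( '-', "\u{2013}", "\u{2014}" ), '', $slug );
    }

    echo chart_color_key( 'trans-woman' );
    ```

    The returned keys match the names in the color file: trans-woman becomes transwoman, gender-fluid becomes genderfluid, and so on.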

  • Stacked Charts Part 2: Rebuilding the Array

    I’ve talked about this before in category statistics, but in order to get the data from a simple array into a Chart.js consumable one, we have to rebuild the array.

    All Arrays are not Equal

    In order to save the data in a way I could use and reuse, I had to aim at the lowest common denominator. But I also had to save the arrays on a per-show basis, which is not the same as what I was going to need to output.

    Instead of just outputting the averages for the show, I needed to combine all this into a ‘by nation’ statistic. That is, I needed to get a list of all shows that were associated with a taxonomy value for that country (easy) and combine all their arrays (not quite easy) and order the data in a way that would make sense (not easy).

    So again we start with understanding the array. Here’s a show that happens to air in Argentina:

    Array
    (
        [cisgender]    => 2
        [trans-woman]  => 0
        [trans-man]    => 0
        [non-binary]   => 0
        [gender-fluid] => 0
        [gender-queer] => 0
        [agender]      => 0
    )
    

    This is the data for one show. Argentina has two shows, oddly both with the same breakdown by gender identity. What I need to do is loop through both those shows and add the arrays together to get this:

    Array
    (
        [cisgender]    => 4
        [trans-woman]  => 0
        [trans-man]    => 0
        [non-binary]   => 0
        [gender-fluid] => 0
        [gender-queer] => 0
        [agender]      => 0
    )
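
    Stripped of all the WordPress parts, that addition is a small loop. Here is a minimal sketch using the Argentina numbers from above; the function name is hypothetical.

    ```php
    <?php
    /**
     * Add up several per-show count arrays into one total, slug by slug.
     * Hypothetical helper; each show is a slug => count array.
     */
    function sum_show_counts( array $shows ): array {
    	$totals = array();
    	foreach ( $shows as $show_counts ) {
    		foreach ( $show_counts as $slug => $count ) {
    			// Start each slug at zero the first time it appears.
    			$totals[ $slug ] = ( $totals[ $slug ] ?? 0 ) + $count;
    		}
    	}
    	return $totals;
    }

    // One show's breakdown, as in the first array above.
    $show = array(
    	'cisgender'    => 2,
    	'trans-woman'  => 0,
    	'trans-man'    => 0,
    	'non-binary'   => 0,
    	'gender-fluid' => 0,
    	'gender-queer' => 0,
    	'agender'      => 0,
    );

    // Argentina has two shows with the same breakdown.
    $argentina = sum_show_counts( array( $show, $show ) );

    print_r( $argentina );
    ```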
    

    Get the Base Arrays

    Just like before, we make an array of the base data as we have it in the gender, sexuality, and romantic orientations. In this case, we’re adding in a query to change the order to be largest to smallest overall from the taxonomy. While this may not be true for all nations in the future, it is today:

    $taxonomy = get_terms( 'lez_nations' );
    foreach ( $taxonomy as $the_tax ) {
    }
    

    I need to pause here. Everything from here out goes in that foreach. We’re going to be looping for each nation in the list of nations. Now… I re-use this code for multiple taxonomies, so lez_nations is actually 'lez_' . $data and it dynamically changes based on how I call this function.

    On we go!

    	$characters = 0;
    	$shows      = 0;
    	
    	// Create a massive array of all the character terms we care about...
    	$valid_char_data = array( 
    		'gender'    => 'lez_gender',
    		'sexuality' => 'lez_sexuality',
    		'romantic'  => 'lez_romantic',
    	);
    
    	if ( isset( $subdata ) && !empty( $subdata ) ) {
    		$char_data = array();
    		$terms     = get_terms( $valid_char_data[ $subdata ], array( 'orderby' => 'count', 'order' => 'DESC' ) );
    
    		if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
    			foreach ( $terms as $term ) {
    				$char_data[ $term->slug ] = 0;
    			}
    		}
    	}
    

    Now we have those base arrays, again set to zero.

    By the way, $subdata and $data are parameters sent to the function that runs this. $subdata is for the taxonomy we’re calculating (sexuality etc) and $data is for the overall taxonomy (Nations or perhaps Stations or genres – we use a lot of those).

    This gets us started.

    Queery the Posts

    Next we need a WP_Query of all the posts in the taxonomy.

    	$count = wp_count_posts( 'post_type_shows' )->publish;
    	$queery = new WP_Query ( array(
    		'post_type'              => 'post_type_shows',
    		'posts_per_page'         => $count,
    		'post_status'            => array( 'publish' ),
    		'tax_query'              => array( array(
    			'taxonomy' => 'lez_' . $data,
    			'field'    => 'slug',
    			'terms'    => $the_tax->slug,
    			'operator' => 'IN', // Valid operators are IN, NOT IN, AND, EXISTS, NOT EXISTS
    		),),
    	) );
    	wp_reset_query();
    

    Remember, this is still within that foreach above. And once we have the posts, let’s loop through all the shows:

    	if ( $queery->have_posts() ) {
    		foreach( $queery->posts as $show ) {
    
    			$shows++;
    			// Get all the crazy arrays
    			$gender = get_post_meta( $show->ID, 'lezshows_char_gender' );
    			if ( isset( $subdata ) ) { 
    				$dataset = get_post_meta( $show->ID, 'lezshows_char_' . $subdata );
    			}
    
    			// Add the character counts
    			foreach( array_shift( $gender ) as $this_gender => $count ) {
    				$characters += $count;
    			}
    
    			if ( !empty( $dataset ) ) {
    				foreach( array_shift( $dataset ) as $this_data => $count ) {
    					$char_data[ $this_data ] += $count;
    				}
    			}
    
    		}
    	}
    

    The weird section you see, // Add the character counts is there because every character has a gender, but not everyone has a sexuality or romantic orientation. Because of that, I decided it was safest to use that as my baseline count.
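
    That baseline loop is just a sum, so PHP’s array_sum gives the same number. A quick sketch with made-up counts, shaped like the lezshows_char_gender array:

    ```php
    <?php
    // Hypothetical per-show gender counts.
    $gender = array( 'cisgender' => 5, 'trans-woman' => 1, 'non-binary' => 1, 'agender' => 0 );

    // The loop from the code above...
    $characters = 0;
    foreach ( $gender as $this_gender => $count ) {
    	$characters += $count;
    }

    // ...and the built-in equivalent.
    $characters_sum = array_sum( $gender );

    echo $characters . ' ' . $characters_sum;
    ```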

    The second section that checks if ( !empty( $dataset ) ) {...} is what adds things up for the array.

    Speaking of…

    Output the New Array

    Once I have those counts, I generate different arrays depending on what I’m outputting. The basic barchart is different from a percentage, which is different from the stacked bar.

    	// Determine what kind of array we need to show...
    	switch( $format ) {
    		case 'barchart':
    			$array[] = array (
    				'name'  => $the_tax->name,
    				'count' => $shows,
    			);
    			break;
    		case 'percentage':
    			$array = self::taxonomy( 'post_type_shows', 'lez_' . $data );
    			break;
    		case 'count':
    			$array = count( $taxonomy );
    			break;
    		case 'stackedbar':
    			$array[$the_tax->slug] = array(
    				'name'       => $the_tax->name,
    				'count'      => $shows,
    				'characters' => $characters,
    				'dataset'    => $char_data,
    			);
    	}
    

    And all of this is so I could get that silly stacked bar, which will have the count of total characters, shows, and the data.

    Whew.

  • Stacked Charts Part 1: Understanding Your Data

    There are a few different types of charts. Actually there are a lot. I find a nice bar chart fairly easy to read and understand. So when Tracy said we should generate some nice stats about nations, like how many shows there were per nation, I was able to do that pretty easily:

    An excerpt of shows by nation - USA has the most. Yaaaay.

    And as far as that goes, it’s pretty cool. It’s really just the same code I use to generate category statistics already. This is, by the way, why using WordPress to generate your data is useful. It’s easy to replicate code you’ve already got.

    But then Tracy, who I think derives some perverse joy out of doing this to me, says “Can we find out how many trans characters there are per nation?”

    Use WordPress First

    If you heard my talks about Sara Lance, you’ve heard me tout that data-based sites should always use WordPress functions first. By which I mean they should use taxonomies and custom post types when possible, because accessing the data will be consistent, regular, and repeatable.

    Ironically, it’s because I chose to use WordPress that I was in a bit of a bind.

    You see, we have three post types on the site right now: shows, characters, and actors. The shows have the taxonomy of ‘nation’ so getting that simple data was straightforward. The characters store the taxonomies of gender identity and sexual preference. That sounds pretty logical, right?

    So how, you may wonder, do we get a list of characters on a show? A query. Basically we search wp_postmeta for all characters whose lezchars_show_group array (it’s multidimensional) has the show’s post ID saved as its show value. Which means the characters are dynamically generated every single time a page is loaded. And yes, that is why I use The L Word as my benchmark for page speed.

    However by doing all this dynamically, generating the stats for characters per nation would look like this:

    1. Use get_terms to get a list of all shows in a nation to …
    2. Loop through all those shows and …
    3. Loop through all the characters on each show to extract the data to …
    4. Store the data per nation

    Ouch. Talk about slow.

    Solution? Use WordPress!

    Thankfully there was a workaround. One of the other odd things we do with shows is generate a show ‘score’ – a value calculated by the show’s relative awesomeness, our subjective enjoyment of it, and the number of characters, alive or dead, it has.

    In order to make that generation run faster, every time a show or character is saved, I trigger the following post_meta values to be saved:

    • lezshows_characters – An array of character counts alive and dead
    • lezshows_the_score – The insane math of the score

    So I added three more:

    • lezshows_char_sexuality
    • lezshows_char_gender
    • lezshows_char_romantic

    All of those are generated when the post is saved, as it loops through all the characters and extracts data.

    Generate The Base

    In order to get the basics, we start by generating an array of everything we’re going to care about. I do this by listing all the taxonomies I want to use and then loop through them, adding each slug to a new array with a value of 0:

    $valid_taxes = array( 
    	'gender'    => 'lez_gender',
    	'sexuality' => 'lez_sexuality',
    	'romantic'  => 'lez_romantic',
    );
    $tax_data = array();
    
    foreach ( $valid_taxes as $title => $taxonomy ) {
    	$terms = get_terms( $taxonomy );
    	if ( ! empty( $terms ) && ! is_wp_error( $terms ) ) {
    		$tax_data[ $title ] = array();
    		foreach ( $terms as $term ) {
    			$tax_data[ $title ][ $term->slug ] = 0;
    		}
    	}
    }
    

    That gives me a multidimensional array which, I admit, is pretty epic and huge. But it lets me move on to step two, getting all the characters:

    $count          = wp_count_posts( 'post_type_characters' )->publish;
    $charactersloop = new WP_Query( array(
    	'post_type'              => 'post_type_characters',
    	'post_status'            => array( 'publish' ),
    	'orderby'                => 'title',
    	'order'                  => 'ASC',
    	'posts_per_page'         => $count,
    	'no_found_rows'          => true,
    	'meta_query'             => array( array(
    		'key'     => 'lezchars_show_group',
    		'value'   => $post_id,
    		'compare' => 'LIKE',
    	),),
    ) );
    

    Next I store everything in a new array. Which is where we get into some serious fun. See, I have to actually double check the character is in the show, since the LIKE search has a few quirks when you’re searching arrays. The tl;dr explanation here is that if I look for shows with a post ID of “23” then I get “23” and “123” and “223” and so on.

    Yeah. It’s about as fun as you’d think. If I wasn’t doing arrays, this would be easier, but I have Sara Lance to worry about.
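
    To see the quirk outside of WordPress: the array is stored serialized, and LIKE is a plain substring match. The sketch below fakes the LIKE with strpos and then does the exact check. The function name and the 'type' field are made up; the 'show' key is the one used in the loop that follows.

    ```php
    <?php
    // A character who is on show 123 only.
    $shows_array = array(
    	array( 'show' => 123, 'type' => 'regular' ),
    );

    // Post meta arrays are stored serialized, so this is roughly what the database holds.
    $stored = serialize( $shows_array );

    // A LIKE '%23%' search is a substring match, which strpos mimics here.
    $like_match = ( false !== strpos( $stored, '23' ) );

    /**
     * Exact check: is this character really attached to the given show ID?
     * Hypothetical helper mirroring the comparison done in the loop.
     */
    function character_is_on_show( array $shows_array, int $show_id ): bool {
    	foreach ( $shows_array as $char_show ) {
    		if ( (int) $char_show['show'] === $show_id ) {
    			return true;
    		}
    	}
    	return false;
    }

    var_dump( $like_match );                               // Substring search for show 23.
    var_dump( character_is_on_show( $shows_array, 23 ) );  // Exact check for show 23.
    var_dump( character_is_on_show( $shows_array, 123 ) ); // Exact check for show 123.
    ```

    The substring search says yes to show 23, which is why the second, exact comparison is needed.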

    if ($charactersloop->have_posts() ) {
    	while ( $charactersloop->have_posts() ) {
    		$charactersloop->the_post();
    		$char_id     = get_the_ID();
    		$shows_array = get_post_meta( $char_id, 'lezchars_show_group', true );
    
    		if ( $shows_array !== '' && get_post_status ( $char_id ) == 'publish' ) {
    			foreach( $shows_array as $char_show ) {
    				if ( $char_show['show'] == $post_id ) {
    					foreach ( $valid_taxes as $title => $taxonomy ) {
    						$this_term = get_the_terms( $char_id, $taxonomy );
    						if ( $this_term && ! is_wp_error( $this_term ) ) {
    							foreach( $this_term as $term ) {
    								$tax_data[ $title ][ $term->slug ]++;
    							}
    						}
    					}
    				}
    			}
    		}
    	}
    	wp_reset_query();
    }
    

    You’ll notice there’s a quick $tax_data[ $title ][ $term->slug ]++; in there to increment the count. That’s the magic that gets processed all over. It tells me things like “this show has 7 cisgender characters” which is the first half of everything I wanted.

    Because in the end I save this as an array for the show:

    foreach ( $valid_taxes as $title => $taxonomy ) { 
    	update_post_meta( $post_id, 'lezshows_char_' . $title , $tax_data[ $title ] );
    }
    

    How Well Does This Run?

    It’s okay. It’s not super awesome: since it has to loop so many times, this can get pretty chunky. See The L Word and its 60+ characters. However, it only updates when the show is saved, or a character is added to the show, which means the expensive process is limited. And by saving this data in an easily retrievable format, I’m able to do the next phase: generate the stats.