Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: data

  • Processing Numbers with WordPress

    The very idea of ‘I should make statistics’ or ‘what are the metrics of this’ starts from the same place. We have a desire to understand what a thing is. Statistics, like traffic, and metrics, like speed, can tell us obviously important information about our sites. Faster sites do better. More traffic gets you more… whatever.

    But those are the obvious things. There are easy-to-understand numbers and there are difficult-to-process numbers. And where you save the data matters.

    Getting At The Data

    When I set about making statistics for LezWatchTV, the biggest problem I faced was determining what I wanted to show. Some things were simple. How many characters died and what percent of all characters was that? How many shows have dead characters?
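    Once the data lives in taxonomies, the arithmetic is the easy part. Here's a minimal sketch of those two numbers in plain PHP. It assumes each character has already been loaded into an array with a hypothetical `dead` flag and a list of show IDs. On the real site those would come from taxonomy and post meta lookups, which aren't shown here.

```php
<?php
// Assumption: each character is an array with a 'dead' flag and a list of
// show IDs. On a real site these would come from taxonomies and post meta.
function death_stats( array $characters ): array {
	$total           = count( $characters );
	$dead            = 0;
	$shows_with_dead = array();

	foreach ( $characters as $character ) {
		if ( ! empty( $character['dead'] ) ) {
			$dead++;
			// Use show IDs as array keys so each show is only counted once.
			foreach ( $character['shows'] as $show_id ) {
				$shows_with_dead[ $show_id ] = true;
			}
		}
	}

	return array(
		'dead'            => $dead,
		'percent_dead'    => $total > 0 ? round( $dead / $total * 100, 1 ) : 0.0,
		'shows_with_dead' => count( $shows_with_dead ),
	);
}

// Placeholder data, not real site numbers.
$characters = array(
	array( 'dead' => true,  'shows' => array( 1, 2 ) ),
	array( 'dead' => false, 'shows' => array( 1 ) ),
	array( 'dead' => true,  'shows' => array( 2 ) ),
	array( 'dead' => false, 'shows' => array( 3 ) ),
);
$stats = death_stats( $characters );
```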

    Since I chose to use WordPress features, like custom taxonomies, for the majority of the aspects of the site, getting those numbers was simple. There were, of course, some that were very difficult to get at, and that is entirely my own doing. Sometimes there will be data you want to use that is simply harder to get at than the rest.

    This means the question of understanding your numbers begins with understanding where they belong.

    Save Data in Smart Places

    I say this over and over. Use WordPress’ native features first.

    I mean use the taxonomies and the custom post types and the post meta wisely. But. When you’ve got a lot of data that needs to be cross-related, consider saving it someplace else. For example, the reason FacetWP is so damn fast is that it doesn’t query WordPress all the time, and instead uses its own tables.

    Having its own tables means there’s less overhead, since it can make direct SQL calls to pull the data. When you have data spread across three post types, this becomes pretty much imperative. You just have to write the code to save it properly.
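    To show the shape of the idea, here's a minimal sketch of a flat index. Data from several post types is flattened into one list of rows, so a lookup is a single pass over one table instead of joins across post types. The row layout (`post_id`, `facet`, `value`) is an assumption for illustration, not FacetWP's actual schema. In a real database table, an index on `facet` and `value` would replace the loop.

```php
<?php
// Hypothetical flat index: one row per (post ID, facet, value). The column
// names are assumptions for illustration, not FacetWP's real schema.
function build_index( array $posts ): array {
	$rows = array();
	foreach ( $posts as $post_id => $fields ) {
		foreach ( $fields as $facet => $values ) {
			// A multi-value field (like several actors) becomes several rows.
			foreach ( (array) $values as $value ) {
				$rows[] = array(
					'post_id' => $post_id,
					'facet'   => $facet,
					'value'   => (string) $value,
				);
			}
		}
	}
	return $rows;
}

// Same idea as: SELECT DISTINCT post_id FROM index WHERE facet = ? AND value = ?
function find_posts( array $rows, string $facet, string $value ): array {
	$ids = array();
	foreach ( $rows as $row ) {
		if ( $row['facet'] === $facet && $row['value'] === $value ) {
			$ids[] = $row['post_id'];
		}
	}
	return array_values( array_unique( $ids ) );
}

// Placeholder post IDs: data from different post types lands in one list.
$index = build_index( array(
	101 => array( 'actor' => array( 'Ali Liebert' ), 'show' => array( 6951 ) ),
	102 => array( 'show' => 6951 ),
) );
```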

    External Data

    While FacetWP saves data to its own tables, there is another option: external locations. You’re probably most familiar with this from Google Analytics. Some data makes sense to keep local, but keep in mind what you’re doing and what you’re generating with the data. When it’s just posts, local is perfectly logical. When you get into statistics… Well. Maybe you should export it.

    That brings up the next question: what data do you export, and where to?
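    Whatever the destination, the first step is usually getting the numbers into a portable format. Here's a hedged sketch that turns computed statistics into CSV, which most external tools can import. The column names and numbers are placeholders, and it writes to an in-memory stream rather than to any particular service.

```php
<?php
// Hypothetical: turn rows of computed stats into CSV text for an external tool.
function stats_to_csv( array $rows ): string {
	if ( empty( $rows ) ) {
		return '';
	}
	$handle = fopen( 'php://temp', 'r+' );
	// Separator, enclosure and escape are passed explicitly. With an empty
	// escape character, quotes inside a field are simply doubled.
	$first = reset( $rows );
	fputcsv( $handle, array_keys( $first ), ',', '"', '' ); // header row
	foreach ( $rows as $row ) {
		fputcsv( $handle, array_values( $row ), ',', '"', '' );
	}
	rewind( $handle );
	$csv = stream_get_contents( $handle );
	fclose( $handle );
	return $csv;
}

// Placeholder numbers, not real statistics.
$csv = stats_to_csv( array(
	array( 'show_id' => 6951, 'dead_characters' => 2 ),
	array( 'show_id' => 7009, 'dead_characters' => 0 ),
) );
```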

  • Data Structure

    At WordCamp US, Tracy turned to me and said “I want to do something about the actors on our site.”

    Her idea was that, based on the traffic on our sites, people wanted to know a little more about the actors. The way I’d built out the site, you could get a list of all of an actor’s characters, but you couldn’t really get at everything. Tracy explained to me what people were searching for (actors who were queer, or not) and I mulled over the possibilities, sketching out three solutions.

    1. Facet and Smart PHP

    We originally added actors as a plain-text field, saved as an array, for all names associated with a character. Using this with FacetWP, it was trivial to look for all characters played by Ali Liebert. We had also already made it a repeatable field, so we could put the ‘most prominent’ actor on top (see “Sara Lance”).

    However, what was not trivial was identifying whether an actor was queer. You see, some characters have multiple actors, and while today those actors are either all queer or all not, one day they may not be. I pointed this out to Tracy using the Sara Lance conundrum and Tracy cried ‘Whhhyyyyyyyy?’ and lay down on the floor.
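    The conundrum is easier to see as code. This hypothetical helper takes a yes/no for each of a character's actors and reports the character's status. The `mixed` result is the case that a single yes/no field on the character can't hold.

```php
<?php
// Hypothetical: one boolean per actor who plays the character.
function actor_status( array $actors_are_queer ): string {
	if ( empty( $actors_are_queer ) ) {
		return 'unknown';
	}
	// array_filter() without a callback keeps only the true values.
	$queer = count( array_filter( $actors_are_queer ) );
	if ( $queer === count( $actors_are_queer ) ) {
		return 'all';
	}
	return 0 === $queer ? 'none' : 'mixed';
}
```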

    2. Taxonomies

    Characters already have a bunch of custom taxonomies, and I considered extending that to a new taxonomy for actors. That would immediately provide organization, and adding new actors on the fly is easy. With auto-complete, we could get away from the drama of my inability to spell names (or autocorrect’s inability to believe me that in this case, I is before C).

    But… Taxonomies lack extensibility. Even with a mess of custom meta added in for featured images, we wouldn’t have an ‘easy’ way to track all trans actors. And we wouldn’t have enough room to grow.

    3. Custom Post Types

    This is what we actually decided to use: a custom page for each actor. We added two new taxonomies for actor gender identity and sexual orientation, which are then used to determine whether the actor is queer. It gives us the most control and extensibility of the three choices, plus a nice permalink.
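    Here's a minimal sketch of how two such taxonomies could combine into one yes/no. The term slugs are hypothetical placeholders, not the site's real terms. An actor counts as queer if any gender identity term falls outside the cisgender list, or any orientation term falls outside the heterosexual list.

```php
<?php
// Placeholder term slugs; the real taxonomies may use different terms.
const NON_QUEER_GENDERS      = array( 'cisgender' );
const NON_QUEER_ORIENTATIONS = array( 'heterosexual' );

// An actor counts as queer if any gender identity or sexual orientation term
// falls outside the "not queer" lists above.
function is_queer_actor( array $gender_terms, array $orientation_terms ): bool {
	foreach ( $gender_terms as $term ) {
		if ( ! in_array( $term, NON_QUEER_GENDERS, true ) ) {
			return true;
		}
	}
	foreach ( $orientation_terms as $term ) {
		if ( ! in_array( $term, NON_QUEER_ORIENTATIONS, true ) ) {
			return true;
		}
	}
	return false;
}
```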

    There are two significant downsides to this. First, you have to add a page for the actor before you add the character page. Second, I had to ‘reproduce’ a loop to list all the characters played by an actor. However, I was able to reuse the same logic I use for shows, which made it easier than it might have been.
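    The actor loop can follow the same pattern as a show page: walk the characters and keep the ones whose repeatable actor field contains this actor's post ID. This is a minimal sketch, assuming the actor field stores actor post IDs. WordPress returns meta values as strings, which is why they are cast to integers first. All IDs are placeholders.

```php
<?php
// Assumption: each character stores its actors as a repeatable list of actor
// post IDs. Meta values come back as strings, so they are cast to int first.
function characters_for_actor( array $characters, int $actor_id ): array {
	$found = array();
	foreach ( $characters as $character_id => $character ) {
		$actor_ids = array_map( 'intval', $character['actors'] ?? array() );
		if ( in_array( $actor_id, $actor_ids, true ) ) {
			$found[] = $character_id;
		}
	}
	return $found;
}

// Placeholder IDs: character 501 has two actors.
$characters = array(
	501 => array( 'actors' => array( '900', '901' ) ),
	502 => array( 'actors' => array( '901' ) ),
	503 => array( 'actors' => array( '902' ) ),
);
```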

    What It Means

    Understanding and predicting an unknown future is hard. It’s near impossible. You have to guess what you want things to be, and as I have said many times with the building out of LezWatchTV, you must be alright with being wrong.

  • Data Based Sites

    Designing your website involves understanding the structure of the data within. Designing your data comes down to how you store it. At its base level, everything on your site is a post, but the way you handle the data WITHIN the posts is how you can plan for growth, adaptation, adoption, and the future. Building a site today involves making sure the data is easily consumable by multiple formats, like AMP, JSON APIs, and Amazon Alexa skills. And it all starts with understanding the data you’re using. Even if that data is from TV.

    Television is chewing gum for the eyes

    I love television. I love books, I’m always reading and writing, and I love radio. I like movies. But TV is something weird and wonderful. TV is escapism when a story brings you someplace new and amazing.

    One of the reasons I love TV is that I see myself reflected in media. I’m a Jewish lesbian. Growing up, I didn’t see a lot of me on TV. Any, really. If I think about it, the first Jewish lesbian I can remember seeing is Willow Rosenberg from Buffy. Yeah. I was out of college by then.

    Representation Matters

    In the entirety of television, worldwide, there are about 2000 queer female characters. Total. Yeah. 2000. That’s on about 600 shows. There are not a lot of them, and I know the numbers because I run, with Philly’s very own Tracy, a site that uses WordPress to record all of them. That’s right, I have the data.

    Data Driven Design

    This isn’t about themes. I can’t design themes. I don’t. I’m bad at it, I know that, sorry Tracy. And when we set about making this site, we had some lofty goals and ideas and we learned some lessons by prejudging the data. Originally we wanted to simply list shows and if you should watch them or not, and list the characters but…

    Somewhere around 100 characters and fifty shows, it became clear that the datasets we’d defined were underestimated and over-complicated and over-simplified. We had to consider what made a show good or bad. We had to consider the situation. And we had to make changes.

    When you build out any site, you have an idea of what you want. Maybe like us you make a massive list of everything you want to track and get dragged down into the weeds fast. Maybe you make a small list and have to go back and edit. In the end, the trick of it all is planning your site based on your data.

    Even after you do all that planning, you’re going to find out you missed things. You’ll overestimate things. You’re going to underestimate others. People are going to use your data in unpredictable ways. This is just how the world works. So you have to design your code to adapt.

    How to Design for Data

    This is about planning code. Designing your site for data comes down to how you store it. At its base level, everything on our site is a post. There are two types, shows and characters, and all posts have a bevy of ‘meta’ data for the various bits of information.

    What information?

    Shows: tropes, airdates, thumb score, why that score, tv stations, nations, genres, formats, gold star, trigger warnings, quality rating, screentime rating, realness rating, timeline, episodes, & ‘ships.

    Characters: clichés, actor, sexuality, gender, date of death, tv show & role on show.
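    Kept as structured data rather than prose, a character record can be handed to a JSON API, or any other format, without rework. A hypothetical record with the fields above might look like this. The key names and values are placeholders.

```php
<?php
// Hypothetical character record: key names and values are placeholders.
$character = array(
	'name'      => 'Example Character',
	'actors'    => array( 'Actor One', 'Actor Two' ), // repeatable
	'sexuality' => 'bisexual',
	'gender'    => 'cisgender',
	'cliches'   => array( 'parent' ),
	'deaths'    => array(),                            // repeatable dates
	'shows'     => array(                              // one entry per show
		array( 'show' => 6951, 'type' => 'regular' ),
	),
);

// The same structure can feed a JSON API, a theme template, or an export.
$json = json_encode( $character, JSON_PRETTY_PRINT );
```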

    So all that is what we record. It’s grown and shrunk. We had URLs in there and removed them because they were impossible to keep up to date when fansites vanish. I removed ‘ships and later restored them, after figuring out how to store all of it in a searchable way. But with all that data, we started to see the big picture. And then … we realized how the data was stored mattered.

    Understand What Your Data Is (And Isn’t)

    Most of the data is simple. Ratings are a 1-5 option. Trigger warnings were check boxes for a binary on or off … Were. Sexuality and gender are dropdown lists, and so are tropes and cliches. Managing those is easy. Ish. We understand taxonomies, at least, being WordPress developers.

    Most of the time, data is obvious. Again, I have items that are a binary. A yes or a no. And while I personally believe that sexuality and gender are a spectrum, TV hasn’t caught up there, so that can be stored as a one-to-one dropdown. You are what you are. The same goes for a character being on a TV show. It’s all one or the other.

    Having said all that, let me introduce you to Sara Lance, as played by Caity Lotz. Schrödinger’s bisexual time traveling action hero.

    The Complex Data: Sara Lance

    Schrödinger’s bisexual time traveling action hero assassin pirate captain.

    Sara Lance has two actors. She’s been on three shows in three separate capacities: guest, recurring, and regular. She’s died and come back to life. Sara Lance exists to make me, as a developer, cry. Because for her I have to store all of that data in a searchable manner that plain-text post meta doesn’t make easy. She made me rethink all our data storage.
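    One concrete reason plain-text meta is hard to search is that a substring match, roughly what a `LIKE '%6951%'` query does, can hit the wrong ID. This illustration uses a hypothetical serialized value.

```php
<?php
// Hypothetical stored meta: a serialized array of show IDs as strings.
$stored = serialize( array( '69510', '7009' ) );

// A substring test behaves like LIKE '%6951%': it matches inside '69510'.
$substring_match = false !== strpos( $stored, '6951' );

// Unserializing and comparing whole values gives the right answer.
$exact_match = in_array( '6951', unserialize( $stored ), true );
```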

    Nothing about Sara is simple. Nothing is straightforward. Not even the existing taxonomies and lists stayed the same. Both gender and sexuality went from a simple dropdown to a taxonomy. We moved from gay, straight, or bi to include pansexual and asexual, and I’m just waiting for Sara to step into pansexuality, y’all.

    Use WordPress First

    As much as possible, use WordPress. Taxonomies are heavily used for ‘lists’ like cliches and tropes and tv stations, because they come with built-in sorting. I can easily list all shows on NBC or all characters who are parents, because those are taxonomies. Even short lists like sexuality and gender work well for that. For the rest, it’s all post meta.

    Sara exists beyond taxonomies. Originally, we had death stored as a simple taxonomy item for character cliches. If you were dead, you got the cliche of dead. Sara came along and died. And came back. So suddenly we had to rethink whether someone kept the tag if they died and came back. Pro tip? They don’t. That was a quick decision, though. Don’t worry, she made us make harder decisions.
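    The rule is simple enough to sketch. This hypothetical helper removes the `dead` cliche when a character is alive and adds it back when they're dead, so the tag always reflects current status. The `dead` slug is an assumption.

```php
<?php
// Hypothetical: keep the 'dead' cliche in sync with whether the character is
// currently dead. Dates of death are kept in a separate field.
function sync_dead_cliche( array $cliches, bool $is_dead ): array {
	// Remove any existing 'dead' term, then re-add it only if still dead.
	$cliches = array_values( array_diff( $cliches, array( 'dead' ) ) );
	if ( $is_dead ) {
		$cliches[] = 'dead';
	}
	return $cliches;
}
```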

    Use Plugins Second

    I mentioned everything that isn’t a taxonomy is post meta. Adding metadata to posts that aren’t taxonomies sucks. Yes, I said it. It sucks. Plugins like CMB2 or ACF will save your vegan bacon by making it easier to create a check box or a dropdown or a plain text field, like for actors.

    Sara had two actors. While the field for actor is only plain text, and that’s relatively simple, we had to make it repeatable so we could add multiple actors. God help me, date of death had to be repeatable too. What if Sara dies again!? CMB2 has repeatable fields built in.
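    With deaths stored as a repeatable list, "when did she die?" becomes "what's the most recent date?". This is a minimal sketch that assumes dates are saved as `Y-m-d` strings. CMB2's date field format is configurable, so check what yours actually saves. The sample dates are placeholders.

```php
<?php
// Assumption: death dates are saved as 'Y-m-d' strings in a repeatable field.
function most_recent_death( array $dates ): ?string {
	$latest = null;
	foreach ( $dates as $date ) {
		// '!' resets the time fields so only the calendar date is compared.
		$parsed = DateTimeImmutable::createFromFormat( '!Y-m-d', (string) $date );
		if ( false === $parsed ) {
			continue; // Skip entries that don't match the expected format.
		}
		if ( null === $latest || $parsed > $latest ) {
			$latest = $parsed;
		}
	}
	return null === $latest ? null : $latest->format( 'Y-m-d' );
}
```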

    Use Third Party Add-Ons Third

    The bigger the data got, the more important admin design became. The more tropes, the worse a dropdown or multi-check section was. By using a Select2 add-on and some custom save code, I was able to convert taxonomies into an auto-complete, which is a lot easier to visualize. So are groups. By clumping related data together, the brain makes the right connections. And when it’s repeatable, your page grows with the data it uses, not with the data total.

    Sara has three TV shows. THREE!

    Sara’s page is much bigger than anyone else’s because she has three shows, and each show section has a dropdown for character role and another for the show name. And I can’t guess if she’s going to show up on another, like Supergirl maybe. CMB2 does have a limit to repeatable fields and groups, though. I hope Sara doesn’t hit it…

    Be Willing to Make Changes Fourth

    You will be wrong. I removed the list of ships, relationships, from shows. Tracy was sad. I added it back. My bad, totally, and my reasoning was that it was not easy to maintain and manage in a searchable way. It wasn’t. But it could be. And I made it. Be willing to be wrong, to make mistakes, and to recover from them.

    Because … Sara is always changing. Alive, dead, new show, new actor… Sara is always changing and always adapting and always being more awesome. She’s anything but static, and that’s a good thing. Sara Lance made me throw my preconceived notions of data storage and organization out the window. And I’m better for it.

  • CMB2: Repeatable Groups

    This is something that the plugin does out of the box, but my reason for doing it was a little odd.

    Background

    Originally, I had a set of TV characters as a custom post type, and each one had a single TV show. Since the TV shows are a second post type, the show was saved as its post ID, and that number was used to generate data on the show pages: look for every character whose TV show value matches the show’s post ID. Yay!

    The problem with it was spinoffs and crossovers. As time went on, certain characters began to appear on other shows. And it only got worse, until at length there were 30 characters on more than one show, and the number was only growing.

    The quick fix was to make the shows value a repeatable field in CMB2, where I could add multiple shows. Done and done. But then we reached critical mass with how we were handling character roles. Was the character a main, a recurring, or a guest?

    Shows and Roles

    Breaking the problem down to its simplest, we have one data set:

    • Show (stored as an ID in an array)
    • Role Type (stored as plain text)

    Instead of saving it as a data set together, the shows were one field (an array, as I mentioned) and the role types were another (a text field).

    In order to make this work, I would have to:

    1. Create a field ‘group’ in CMB2 that stored both show and role as related to that show
    2. Make that group repeatable for characters on multiple shows
    3. Migrate the data
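    Before migrating, it helps to pin down the target shape. This sketch assumes each character's group is saved as a list of `show`/`type` pairs, the same shape as the JSON in the migration command further down. It then shows how a show page could list its characters grouped by role. Character 7100 is a placeholder.

```php
<?php
// Assumed shape: each character's repeatable group is a list of show/role
// pairs. Character 7100 is a placeholder.
$characters = array(
	6957 => array(
		array( 'show' => '6951', 'type' => 'regular' ),
		array( 'show' => '7009', 'type' => 'regular' ),
	),
	7100 => array(
		array( 'show' => '7009', 'type' => 'guest' ),
	),
);

// For a show page: every character on that show, grouped by role.
function characters_on_show( array $characters, int $show_id ): array {
	$by_role = array();
	foreach ( $characters as $character_id => $groups ) {
		foreach ( $groups as $group ) {
			// Saved IDs are strings, so compare as integers.
			if ( (int) $group['show'] === $show_id ) {
				$by_role[ $group['type'] ][] = $character_id;
			}
		}
	}
	return $by_role;
}
```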

    Data Migration

    There are a lot of ways around this. I ended up going the super simple route. I exported two CSVs from my database: one of the shows and one of the role types. Each one had the Post ID associated with it, so I opened those up in a spreadsheet app and combined them wherever the Post IDs matched.

    This gave me a new table with rows like this (character post ID, show ID, role): 123, 456, regular

    More or less. The ones where shows were arrays looked like, obviously, arrays. I then converted that into a file with 1500 lines that looked like this:

    wp post meta add 6957 character_tvshow_group '[{"show":"6951","type":"regular"},{"show":"7009","type":"regular"}]' --format=json
    

    I could have done it differently, grabbing a file with the data and parsing it on the fly, but I like to look at my 1500 lines and make sure I don’t have weird extra quotes lying around.
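    If you'd rather generate those lines than build them in a spreadsheet, here's a hedged sketch. It merges the two exports by post ID and produces one command per character. It assumes the CSVs have already been parsed into arrays, for example with `fgetcsv()`. The old data had a single role per character, so that role is copied onto every show, and any exceptions still need fixing by hand.

```php
<?php
// Hypothetical inputs, already parsed from the two CSV exports:
// $shows maps character post ID => list of show IDs,
// $roles maps character post ID => the single old role value.
function build_meta_commands( array $shows, array $roles, string $meta_key ): array {
	$lines = array();
	foreach ( $shows as $post_id => $show_ids ) {
		// Only characters present in both exports can be merged.
		if ( ! isset( $roles[ $post_id ] ) ) {
			continue;
		}
		$group = array();
		foreach ( $show_ids as $show_id ) {
			// The old data had one role per character, so every show gets it.
			$group[] = array( 'show' => (string) $show_id, 'type' => $roles[ $post_id ] );
		}
		// escapeshellarg() quotes the JSON for a POSIX shell.
		$lines[] = sprintf(
			'wp post meta add %d %s %s --format=json',
			$post_id,
			$meta_key,
			escapeshellarg( json_encode( $group ) )
		);
	}
	return $lines;
}

$lines = build_meta_commands(
	array( 6957 => array( 6951, 7009 ) ),
	array( 6957 => 'regular' ),
	'character_tvshow_group'
);
echo implode( "\n", $lines ), "\n";
```

    On Linux or macOS, the first line this prints is the same command shown above, so the output can be saved to a file and reviewed before running.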

    Once that was done, I ran the file, having it execute every line one at a time. It took about one episode of House Hunters: International.

    The CMB2 Code

    In case you’re wondering, the code to do this in CMB2 looks like this:

    		// Field Group: Character Show information
    		// Made repeatable since each show might have a separate role. Yikes...
    		$group_shows = $cmb2->add_field( array(
    			'id'          => $prefix . 'show_group',
    			'type'        => 'group',
    			'repeatable'  => true,
    			'options'     => array(
    				'group_title'   => 'Show #{#}',
    				'add_button'    => 'Add Another Show',
    				'remove_button' => 'Remove Show',
    				'sortable' => true,
    			),
    		) );
    		// Field: Show Name
    		$cmb2->add_group_field( $group_shows, array(
    			'name'             => 'TV Show',
    			'id'               => 'show',
    			'type'             => 'select',
    			'show_option_none' => true,
    			'default'          => 'custom',
    			'options_cb'       => array( $this, 'cmb2_get_shows_options'),
    		) );
    		// Field: Character Type
    		$cmb2->add_group_field( $group_shows, array(
    			'name'             => 'Character Type',
    			'id'               => 'type',
    			'type'             => 'select',
    			'show_option_none' => true,
    			'default'          => 'custom',
    			'options'          => $this->character_roles,
    		) );
    

    You’ll notice the options are a bit extra custom.

    Get Shows

    This is done in two parts:

    	public function Sitename_get_post_options( $query_args ) {
    	    // Defaults: all published posts. 'numberposts' => -1 means no limit.
    	    $args = wp_parse_args( $query_args, array(
    	        'post_type'   => 'post',
    	        'numberposts' => -1,
    	        'post_status' => array('publish'),
    	    ) );
    
    	    $posts = get_posts( $args );
    
    	    // Build an array of post ID => post title for the select field.
    	    $post_options = array();
    	    if ( $posts ) {
    	        foreach ( $posts as $post ) {
    	          $post_options[ $post->ID ] = $post->post_title;
    	        }
    	    }
    
    	    // Sort alphabetically by title, keeping the IDs as keys.
    	    asort($post_options);
    	    return $post_options;
    	}
    
    	public function cmb2_get_shows_options() {
    		// Call the helper as a method on this class, not as a global function.
    		// Use -1 rather than the count of *published* shows: drafts and
    		// scheduled shows are queried too and would otherwise be cut off.
    		return $this->Sitename_get_post_options( array(
    				'post_type'   => 'post_type_shows',
    				'numberposts' => -1,
    				'post_status' => array('publish', 'pending', 'draft', 'future'),
    			) );
    	}
    

    The reason we search for all shows, from draft to future, is that sometimes we like to schedule updates.

    Character Roles

    		$this->character_roles = array(
    			'regular'   => 'Regular/Main Character',
    			'recurring' => 'Recurring Character',
    			'guest'     => 'Guest Character',
    		);
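    Both selects set `show_option_none`, and the defaults are `custom`, which isn't one of the roles. As a result, a saved group can include a row with no show picked or with an unexpected role. A hypothetical clean-up helper (not part of CMB2) could drop those rows before the data is used.

```php
<?php
// Roles as defined above.
$character_roles = array(
	'regular'   => 'Regular/Main Character',
	'recurring' => 'Recurring Character',
	'guest'     => 'Guest Character',
);

// Hypothetical helper: keep only group rows with a real show ID and a known role.
function clean_show_groups( array $groups, array $roles ): array {
	$clean = array();
	foreach ( $groups as $group ) {
		$show = isset( $group['show'] ) ? (int) $group['show'] : 0;
		$type = isset( $group['type'] ) && is_string( $group['type'] ) ? $group['type'] : '';
		if ( $show > 0 && isset( $roles[ $type ] ) ) {
			$clean[] = array( 'show' => (string) $show, 'type' => $type );
		}
	}
	return $clean;
}
```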