Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Elasticsearch as a Service

    Elasticsearch as a Service

    Search is hard. Searching when you have custom meta data in your posts is harder. By default, WordPress does not search your custom meta data. And my LezWatchTV site is 75% custom meta data.

    I’d been using Google Search, but that has a lot of issues of its own, like privacy, ads, accuracy, and most importantly, no way to tune it. I decided to try out Elasticsearch since I knew that was what WordPress.org’s internal search engine was moving towards. After I added custom post meta to my search content, this post was going to be about how to install Elasticsearch on an ELK stack on DreamCompute, which turned out to be rather easy, if time consuming and messy. And getting WordPress to work with it was as easy as installing the ElasticPress plugin (thank you 10up).

    What was complicated was making Elasticsearch work remotely. By default, it wants to only be accessible locally for your own security. But adding Shield on top, while still keeping all the logs and pretty things I needed to understand what was happening and how to manage it, escalated quickly when it was all new. It was simply too much all at once for me. Instead, I decided to look into Elasticsearch as a service.

    There are a lot of options here.

    Self Managed

    I know I said ‘as a service’ but you really can use DigitalOcean or DreamCompute to do this. And there’s all sorts of documentation about how to do it available (like DigitalOcean’s ‘How to install the ELK Stack on Ubuntu 14.04’ which works on DreamCompute too). And Amazon Elasticsearch is also an option here.

    But… they’re all very self-managed. They require you to jump into servers and run a lot of commands, and they’re not new-user friendly. Look, I get that this is complicated stuff, but people aren’t going to know if they want to learn all this if you make it monumental to get into.

    Services

    You can break these down into two main types.

    Enterprise Level:

    Free ‘Trials’:

    I wanted to use something ‘free’ to get started so I could figure out what I wanted to do and how to properly use Elasticsearch before deciding if I wanted to pay. But I also wanted to figure out exactly what to do with search. Therefore I needed something ‘free’ to test with, something with logs, that would help me understand it all. I ended up trying both Bonsai and Searchly. While Bonsai gave me more room, Searchly had more information in the interface, but neither had a ‘Hey, here’s how you tune Elasticsearch!’ page.

    Neither had Kibana 4 though, which is a little sad.

    So when you don’t know how to do ‘anything’ with Elasticsearch, what can you test? The same search. I checked which was faster, which was more accurate, and which had the results I wanted. Bonsai was the winner here, so that’s what I went with.

    Integrating WordPress

    Thankfully this is the easy part.

    Install the ElasticPress plugin. Go to Settings -> ElasticPress and add in the URL from your Bonsai panel as your Host. It should look like https://username:password@yourcluster.us-west-2.bonsai.io (with some variation based on location). Save, press the ‘Run Index’ button, and you’re done.
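
    As far as I know, ElasticPress will also read the host from an EP_HOST constant, which keeps the credentials out of the settings screen. Here is a minimal sketch of that, with a sanity check on the URL first. The helper name helf_check_es_host is made up for this example, the URL is the same placeholder as above, and insisting on HTTPS plus credentials is my own assumption about what a hosted URL should look like.

```php
<?php
// Hypothetical helper: sanity-check an Elasticsearch host URL before using it.
function helf_check_es_host( $url ) {
    $parts = parse_url( $url );
    if ( false === $parts || empty( $parts['scheme'] ) || empty( $parts['host'] ) ) {
        return false;
    }
    // Hosted services put credentials in the URL, so insist on HTTPS (assumption).
    if ( 'https' !== $parts['scheme'] ) {
        return false;
    }
    return ! empty( $parts['user'] ) && ! empty( $parts['pass'] );
}

// Placeholder URL in the same shape as the one from the Bonsai panel.
$host = 'https://username:password@yourcluster.us-west-2.bonsai.io';

if ( helf_check_es_host( $host ) && ! defined( 'EP_HOST' ) ) {
    // On a real site this line would live in wp-config.php.
    define( 'EP_HOST', $host );
}
```

    If the check fails, nothing gets defined and the plugin falls back to whatever is in its settings.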

    The nice thing about the plugin is if it breaks (like the service goes down), the plugin reverts to WordPress search! Which isn’t great, but … well.

    Next? How do I tune Elasticsearch?!

  • Search Options for Custom Post Data

    Search Options for Custom Post Data

    I use CMB2 to add in a bunch of custom meta data for my posts on a site. Seeing as I’m using it to allow layouts and formats to be consistent, it’s not surprising that I’ve chosen to split out my data like that. In another world, maybe it would be done differently, but this works.

    Except that search sucks. WordPress doesn’t search custom post meta out of the box, which just kills me. That meant all the data I stored for names and dates was never getting searched. There are two ‘easy’ solutions for this at least.

    Google Search

    Ew. I know. But ew. Since I’m using Genesis as my theme, it’s not super hard, just a little weird. Assuming you already have a Custom Search Engine set up, and you’re using Genesis, here’s what to do next.

    First I added this into my functions-site.php (note: I made a functions-site.php file so I can easily update my functions.php file on the rare occasion I need to update the child theme – it’s really rare – but also so I always know what’s me and what was Genesis):

    /* Google Custom Search Engine */
    add_filter( 'genesis_search_form', 'helf_search_form', 10, 4 );
    function helf_search_form( $form, $search_text, $button_text, $label ) {
        // Escape the placeholder text before embedding it in inline JavaScript.
        $js_text = esc_js( $search_text );
        $onfocus = " onfocus=\"if (this.value == '$js_text') {this.value = '';}\"";
        $onblur  = " onblur=\"if (this.value == '') {this.value = '$js_text';}\"";
        $form = '<form method="get" class="searchform search-form" action="' . esc_url( home_url( '/search' ) ) . '" >' . $label . '
    <input type="text" value="' . esc_attr( $search_text ) . '" name="q" class="s search-input"' . $onfocus . $onblur . ' /><input type="submit" class="searchsubmit search-submit" value="' . esc_attr( $button_text ) . '" /></form>';
        return $form;
    }
    

    Then I made a custom page template thanks to Rick Duncan:

    <?php
    /*
     * Template Name: Google Custom Search Engine
     *
     * This file adds the Google SERP template to our Genesis Child Theme.
     *
     * @author     Rick R. Duncan
     * @link       http://www.rickrduncan.com
     * @license    http://www.opensource.org/licenses/gpl-license.php GPL v2.0 (or later)
     *
     */
    
    //* Force Full-Width Layout
    add_filter( 'genesis_pre_get_option_site_layout', '__genesis_return_full_width_content' );
    
    //* Add Noindex tag to the page
    add_action( 'genesis_meta', 'lez_noindex_page' );
    function lez_noindex_page() {
    	echo '<meta name="robots" content="noindex, follow">';
    }
    
    //* Insert Google CSE code into <head> section of webpage
    add_action( 'genesis_meta', 'lez_google_cse_meta', 15 );
    function lez_google_cse_meta() { ?>
    
    	<script>
    	  (function() {
    	    var cx = '017016624276440630536:tpoclrwnxyy';
    	    var gcse = document.createElement('script');
    	    gcse.type = 'text/javascript';
    	    gcse.async = true;
    	    gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
    	    var s = document.getElementsByTagName('script')[0];
    	    s.parentNode.insertBefore(gcse, s);
    	  })();
    	</script><?php
    }
    //* Add custom body class
    add_filter( 'body_class', 'lez_add_body_class' );
    function lez_add_body_class( $classes ) {
    
       $classes[] = 'google-cse';
       return $classes;
    
    }
    //* Remove standard Genesis loop and insert our custom page content
    remove_action( 'genesis_loop', 'genesis_do_loop' );
    add_action( 'genesis_loop', 'lez_custom_content' );
    function lez_custom_content() { ?>
    
    	<div itemtype="http://schema.org/SearchResultsPage" itemscope="itemscope">
    		<header class="entry-header">
    			<h1 itemprop="headline">
        			<?php echo get_the_title(); ?>
        		</h1>
        	</header>
        	<div class="entry-content" itemprop="text">
        		<?php echo get_the_content();
        		//* Obtain querystring value if present and display on-screen
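    			if ( ! empty( $_REQUEST['q'] ) ) {
        			// Clean and escape the search phrase before echoing it back to the page.
        			$query = sanitize_text_field( wp_unslash( $_REQUEST['q'] ) );
        			echo '<strong>You Searched For:</strong> <em>' . esc_html( $query ) . '</em>';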
    			}
    			else {
    				echo 'Please enter a search phrase.';
    			}
    			if ( is_active_sidebar( 'google-cse' ) ) {
    				dynamic_sidebar( 'google-cse' );
    			}
    			?>
        		<gcse:searchresults-only></gcse:searchresults-only>
        	</div>
        </div>
    
    <?php }
    genesis();
    

    Finally I added a page called “Search Results” and assigned it this template. Done. Google, who searches the whole page content, will get everything. It just looks like Google.

    Having WordPress search your Custom Post Meta

    This was surprisingly annoying, but not as hard as all that. Adam Balee wrote Search WordPress by Custom Fields without a Plugin which, I know, is ‘without a plugin’ and sort of silly, but I put that in as an MU plugin and it worked perfectly!

    <?php
    /**
     * Extend WordPress search to include custom fields
     *
     * http://adambalee.com
     */
    
    /**
     * Join posts and postmeta tables
     *
     * http://codex.wordpress.org/Plugin_API/Filter_Reference/posts_join
     */
    function cf_search_join( $join ) {
        global $wpdb;
    
        if ( is_search() ) {    
            $join .=' LEFT JOIN '.$wpdb->postmeta. ' ON '. $wpdb->posts . '.ID = ' . $wpdb->postmeta . '.post_id ';
        }
        
        return $join;
    }
    add_filter('posts_join', 'cf_search_join' );
    
    /**
     * Modify the search query with posts_where
     *
     * http://codex.wordpress.org/Plugin_API/Filter_Reference/posts_where
     */
    function cf_search_where( $where ) {
        global $wpdb;
       
        if ( is_search() ) {
            $where = preg_replace(
                "/\(\s*".$wpdb->posts.".post_title\s+LIKE\s*(\'[^\']+\')\s*\)/",
                "(".$wpdb->posts.".post_title LIKE $1) OR (".$wpdb->postmeta.".meta_value LIKE $1)", $where );
        }
    
        return $where;
    }
    add_filter( 'posts_where', 'cf_search_where' );
    
    /**
     * Prevent duplicates
     *
     * http://codex.wordpress.org/Plugin_API/Filter_Reference/posts_distinct
     */
    function cf_search_distinct( $distinct ) {
        // The posts_distinct filter passes the current DISTINCT clause, not the WHERE clause.
        if ( is_search() ) {
            return "DISTINCT";
        }
    
        return $distinct;
    }
    add_filter( 'posts_distinct', 'cf_search_distinct' );
    

    This is not the most efficient search, I know. But it works and gets my data where it’s needed.
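
    If you want to see what the posts_where rewrite actually does to the SQL, here is a standalone sketch that runs the same kind of preg_replace against a sample clause. The wp_ table prefix and the search term are assumptions, and the sample clause is only roughly what WordPress generates. I also added preg_quote around the column name, which the original snippet doesn’t have.

```php
<?php
// Table names as they would look with the default wp_ prefix (an assumption).
$posts    = 'wp_posts';
$postmeta = 'wp_postmeta';

// Roughly the search clause WordPress builds for ?s=willow.
$where = " AND (((wp_posts.post_title LIKE '%willow%')"
       . " OR (wp_posts.post_excerpt LIKE '%willow%')"
       . " OR (wp_posts.post_content LIKE '%willow%')))";

// Match "(wp_posts.post_title LIKE '...')" and capture the quoted search term.
$pattern = '/\(\s*' . preg_quote( $posts . '.post_title', '/' )
         . '\s+LIKE\s*(\'[^\']+\')\s*\)/';

// Keep the title match and add the same LIKE against postmeta.meta_value.
$replacement = '(' . $posts . '.post_title LIKE $1) OR ('
             . $postmeta . '.meta_value LIKE $1)';

$rewritten = preg_replace( $pattern, $replacement, $where );

echo $rewritten, PHP_EOL;
```

    The title condition stays where it was and picks up one extra OR for meta_value, which is why the join and the DISTINCT are both needed.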

  • Pagination and Static Front Pages

    Pagination and Static Front Pages

    This is a post skewed towards the Genesis Framework. Actually, if you’re not using the Genesis Metro Pro theme, I don’t know how well this will work.

    My problem was simple. I used the Metro Pro Static Front Page to show some widgets and then custom displays of posts via those widgets. It works pretty darn well and looks like this:

    Metro Pro's Static Front Page

    There was just one small issue. It doesn’t show me pagination at the bottom of the page. Oh and the normal method of example.com/page/2/ just showed me the same front page over and over. Not what I wanted.

    One way I could work around this would be to treat the front page as a static front page and make a “blog” page. Except then my URLs would be example.com/blog/page/2 and I’d have duplicate content on example.com/blog/, which is not desirable. Causing me more frustration was the fact that the documentation said this:

    If no widgets are placed in any of the home page specific widget areas, a blog-style home page will be displayed.

    What I wanted was that blog-style page on the sub pages, along with navigation.

    Show Navigation Links

    This part was easy. In the file front-page.php I edited the function metro_homepage_widgets() to have this at the bottom:

    genesis_posts_nav();

    Really, that was it. Now I had navigation! But (as I already knew) the navigation didn’t work properly.

    Fix Paged Content

    At the top of the front-page.php file is a call to add an action with all the metro_home_genesis_meta content. I wrapped that in a check to see if the page we’re on is a ‘paged’ page using is_paged(), which specifically checks if the query is for a paged result and not for the first page.

    if ( !is_paged() ) {
    	add_action( 'genesis_meta', 'metro_home_genesis_meta' );
    }
    

    Again, really, that was it.

  • Greylist, RBLs, and Spam

    Greylist, RBLs, and Spam

    Recently I noticed I had 13 spam emails all from the same ‘company.’ The content was incredibly similar, though subtly different. The from email was always different, but you could tell by looking at it that it was the same. And even more damning, it all had ‘junk’ content and 100+ recipients. But for some reason, SpamAssassin wasn’t catching it!

    After 5 emails came in back to back, I decided to do something about it.

    At first I was trying to find a way to tell SpamAssassin or Exim how to auto-turf the emails with 100+ people listed in the ‘To’ field. This proved to be a little more difficult and complicated than I wanted, and I was sure that these spammers would catch on to that sooner or later.

    What I really wanted was for SpamCop to pick up on this, but I’ve been sending them in to no avail for a while. That got me looking into how cPanel handles SpamCop in the first place.

    Real-Time Blackhole Lists

    cPanel uses RBLs, Real-time Blackhole Lists, to determine if an email sent to you is spam or not. By default, it comes with SpamCop and Spamhaus. That means it will reject mail at SMTP time if the sender host is in the bl.spamcop.net or zen.spamhaus.org RBL. Well that was well and good, but could I add more to that list?

    Of course. I pulled up cPanel’s documentation on RBLs and determined I could add as many as I wanted. On the top of the Basic EXIM Editor is a link to Manage Custom RBLs which is what I wanted. All I had to do was figure out what to add.

    After reading through Wikipedia’s comparison of DNS blacklists, I picked a few and tested the latest emails that had come through, looking for ones that caught them. Then I tested known good emails and made sure they weren’t caught. I ended up adding Barracudacentral and IPRange.
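
    For anyone wondering what ‘testing’ an address against an RBL looks like, the lookup is just a DNS query: reverse the IPv4 octets, append the list’s zone, and see if an A record comes back. Below is a rough sketch of that. The function names are mine, the zone is one of the two cPanel defaults mentioned above, and some lists refuse queries from public DNS resolvers, so treat a ‘not listed’ answer with suspicion.

```php
<?php
// Build the DNS name an RBL lookup uses: reversed IPv4 octets plus the list's zone.
function helf_rbl_query_name( $ip, $zone ) {
    if ( ! filter_var( $ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 ) ) {
        return null; // IPv6 uses a different (nibble) format; not handled here.
    }
    return implode( '.', array_reverse( explode( '.', $ip ) ) ) . '.' . $zone;
}

// True when the list publishes an A record for the address, i.e. it is listed.
function helf_rbl_is_listed( $ip, $zone ) {
    $name = helf_rbl_query_name( $ip, $zone );
    return null !== $name && checkdnsrr( $name . '.', 'A' );
}

// 127.0.0.2 is the conventional "always listed" test address for DNS blacklists.
echo helf_rbl_query_name( '127.0.0.2', 'zen.spamhaus.org' ), PHP_EOL;
```

    Calling helf_rbl_is_listed() with the sending IP from a spam email’s headers and each candidate zone is roughly the check I was doing by hand.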

    Greylisting

    The next thing I did was introduce Greylisting to my email. The way Greylisting works is that if the server doesn’t recognize the sender, it will temporarily reject the email and tell the sending server to try again. If the email is real, the sending server tries again after a little while. There are some downsides to this, as it’s possible for a legit email to be trapped for a few hours (or days) if someone’s set up their server poorly. On the other hand, within half an hour, I blocked 11 emails.

    I mean. I’m pretty sure monica@getoffherpes.com is spam. You know what I mean?

    This was super easy to do, too. I turned on Greylisting, I restarted Exim, I walked away.

    Okay no, I didn’t. I sat and watched it to see if anyone legit got caught (one did, it passed itself through properly).
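
    The greylisting behaviour described above can be sketched as a toy in-memory class. This is not how cPanel or Exim implement it. The class name, the five minute delay, and the idea of keying on the sending IP, sender, and recipient together are assumptions I’m making for illustration.

```php
<?php
// Toy greylist: remembers (client IP, sender, recipient) triplets in memory.
class Helf_Greylist {
    private $first_seen = array();
    private $delay;

    public function __construct( $delay_seconds = 300 ) {
        $this->delay = $delay_seconds;
    }

    // Returns an SMTP-style reply code: 451 = try again later, 250 = accepted.
    public function check( $ip, $from, $to, $now ) {
        $key = strtolower( $ip . '|' . $from . '|' . $to );
        if ( ! isset( $this->first_seen[ $key ] ) ) {
            // Never seen this triplet: remember it and ask the sender to retry.
            $this->first_seen[ $key ] = $now;
            return 451;
        }
        // Seen before: accept only once the retry delay has passed.
        return ( $now - $this->first_seen[ $key ] >= $this->delay ) ? 250 : 451;
    }
}

$greylist = new Helf_Greylist( 300 );
echo $greylist->check( '203.0.113.9', 'a@example.org', 'me@example.com', 0 ), PHP_EOL;   // first attempt
echo $greylist->check( '203.0.113.9', 'a@example.org', 'me@example.com', 60 ), PHP_EOL;  // retried too soon
echo $greylist->check( '203.0.113.9', 'a@example.org', 'me@example.com', 600 ), PHP_EOL; // patient retry
```

    Real mail servers retry on their own, so they get through on a later attempt. Most spam blasters never come back, which is the whole trick.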

    Result?

    A little less spam. I don’t expect this to work for everything, but it had an immediate impact on many of the spam emails that were annoying me.

  • CMB2 And The Dropdown Years

    CMB2 And The Dropdown Years

    At WordCamp Montreal, I mentioned the database of dead lesbians that Tracy and I maintain. One camper looked at it and said “You know it would be awesome if you showed the shows’ airdates.”

    Good point! Except I just plain struggled with the concepts and how to do them in CMB2. I knew I could make multiple fields in one ‘metabox’ as I read up on the snippet for an address field, but try as I might, I couldn’t make it work.

    I tweeted my headache and ended up talking to Justin Sternberg who asked me if I could explain my use case better.

    I have 300+ posts, all of which have a start and end date. Some may have an end date of “current” however.

    Examples of valid data:

    • 1977-1979
    • 2016-current
    • 2000-2016

    I also need to sort by start and end year. So I can search for all posts with a start of 2014.

    I could have two year-sorts, easily, but that makes for a clunky interface as it would be separate fields. I know CMB2 can have a combined field (like addresses) but while I got it to save, it wouldn’t properly display on the edit page.

    This only needs to be editable on the WP admin edit post.

    That night, he replied and asked if this year-range field type would work.

    Mind? Blown. It works exactly how I need it to. I tweaked the code (and threw in a pull request) to set up a way to reverse the years (show newest first) which is more useful for my needs.

    Now? Editing 319 show entries.
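
    The data shape from my use case (a start year, then an end year or ‘current’, filterable by start) can be sketched in plain PHP. This is not the CMB2 year-range field type Justin shared, just a hypothetical illustration of the values involved. The function names and the ‘start’/‘finish’ keys are my own.

```php
<?php
// Parse "1977-1979" or "2016-current" into start and finish parts.
function helf_parse_year_range( $range ) {
    if ( ! preg_match( '/^(\d{4})-(\d{4}|current)$/', trim( $range ), $m ) ) {
        return null;
    }
    $start  = (int) $m[1];
    $finish = ( 'current' === $m[2] ) ? 'current' : (int) $m[2];
    if ( 'current' !== $finish && $finish < $start ) {
        return null; // a show cannot end before it starts
    }
    return array( 'start' => $start, 'finish' => $finish );
}

// Keep only the ranges that begin in a given year.
function helf_started_in( array $ranges, $year ) {
    return array_values( array_filter( $ranges, function ( $range ) use ( $year ) {
        $parsed = helf_parse_year_range( $range );
        return null !== $parsed && $parsed['start'] === $year;
    } ) );
}

print_r( helf_started_in( array( '1977-1979', '2016-current', '2000-2016' ), 2016 ) );
```

    Storing the two halves separately is what makes the ‘all posts with a start of 2014’ style of query possible.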

  • Mobile Ad Detection

    Mobile Ad Detection

    I screwed up not that long ago.

    I got an email from Google AdSense telling me that one of my sites was in violation because it was showing two ads on the same mobile screen, which is not allowed. Until I started using some of Google’s whole-page mobile ads (an experiment), this was never an issue. Now it was. Now I had to fix it.

    Thankfully I knew the simplest answer would be to detect if a page was mobile and not display the ads. Among other things, I know that I hate too many ads on mobile. So all I wanted was to use the Google page level ads – one ad for the mobile page, easily dismissible. Therefore it would be best if I hid all but two other ads. One isn’t really an ad as much as an affiliate box, and the other is a Google responsive ad.

    For my mobile detector, I went with MobileDetect, which is a lightweight PHP class. I picked it because I was already using PHP to determine what ads showed based on shortcodes so it was a logical choice.

    Now the way my simple code works is you can use a WordPress shortcode like [showads name="google-responsive"] and that calls a file, passing a parameter for name into the file to generate the ad via a mess of switches and sanitization. Really you can go to http://example.com/myads.php?name=leaderboard and it would show you the ad.

    The bare bones of the code looks like this:

    <?php
    
    require_once 'Mobile_Detect.php';
    $detect = new Mobile_Detect;
    
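    // Fall back to an empty name when the parameter is missing.
    $thisad = isset( $_GET['name'] ) ? trim( strip_tags( $_GET['name'] ) ) : '';
    $mobileads = array('google-responsive', 'affiliate-ad');
    
    // If it's mobile AND it's not in the array, bail.
    if ( $detect->isMobile() && !in_array($thisad, $mobileads) ) {
    	return;
    }
    
    // Escape the name before using it as a CSS class.
    echo '<div class="jf-adboxes ' . htmlspecialchars( $thisad, ENT_QUOTES ) . '">';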
    
    switch ($thisad) {
    	case "half banner":
    		echo "the ad";
    		break;
    	case "line-buttons":
    		echo "the ad";
    		break;
    	default:
    		echo "Why are you here?";
    }
    
    echo '</div>';
    

    The secret sauce is that check for two things:

    1. Is this mobile?
    2. Is the ad not one I’ve authorized for mobile?

    Only if both are true does the script bail. If either is false, it keeps running and prints the ad. It’s simple, but I like to have things stop as soon as possible to make loading faster. There’s no CSS trickery to hide things via mobile size detection. It’s as simple, and as close to a binary check, as I can make it.
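
    To make that check easier to reason about, here is a sketch that pulls it into a small function and prints every combination. The helper name helf_should_show_ad is made up for this example. In the real script, the first argument would come from $detect->isMobile().

```php
<?php
// Pure decision helper; $is_mobile would come from $detect->isMobile().
function helf_should_show_ad( $is_mobile, $ad_name, array $mobile_ads ) {
    // Hide only when BOTH are true: mobile visitor AND ad not approved for mobile.
    return ! ( $is_mobile && ! in_array( $ad_name, $mobile_ads, true ) );
}

$mobile_ads = array( 'google-responsive', 'affiliate-ad' );

// Walk through each device/ad combination and report the decision.
foreach ( array( true, false ) as $is_mobile ) {
    foreach ( array( 'google-responsive', 'line-buttons' ) as $ad ) {
        printf(
            "%-8s %-18s %s\n",
            $is_mobile ? 'mobile' : 'desktop',
            $ad,
            helf_should_show_ad( $is_mobile, $ad, $mobile_ads ) ? 'show' : 'hide'
        );
    }
}
```

    The only ‘hide’ row is mobile plus an ad that isn’t on the approved list. Everything else falls through to the switch.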