Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: search

  • Meilisearch at Home

    There are things most CMS tools are great at, and then there are things they suck at. Universally? They all suck at search when you get to scale.

    This is not really the fault of the CMS (be it WordPress, Drupal, Hugo, etc). The problem is search is difficult to build! If it were easy, everyone would do it. The whole reason Google rose to dominance was that it made search easy and reliable. And that’s great, but not everyone is okay with relying on 3rd party services.

    I’ve used ElasticSearch (too clunky to run, a pain to customize), Lunr (decent for static sites), and even integrated Yahoo and Google searches. They all have issues.

    Recently I was building out a search tool for a private (read: internal, no access if you’re not ‘us’) service, and I was asked to do it with MeiliSearch. It was new to me. As I installed and configured it, I thought … “This could be a nice solution.”

    Build Your Instance

    When you read the directions, you’ll notice they want to install the app as root, meaning it would be one install. And that sounds okay until you start thinking about multiple sites using one instance (for example, WordPress Multisite) where you don’t want to cross-contaminate your results. Wouldn’t want posts from Ipstenu.org and Woody.com showing up on HalfElf, and all.

    There are a couple of ways around that: Multi-Tenancy and multiple Indexes. I went with the indexes for now, but I’m sure I’ll want tenancy later.

    I’m doing all this on DreamHost, because I love those weirdos, but there are pre-built images on DigitalOcean if that floats your goat:

    1. Make a dedicated server or a DreamCompute (I used the latter) – you need root access
    2. Set the server to Nginx with the latest PHP – this will allow you to make a proxy later
    3. Add your ssh key from ~/.ssh/id_rsa.pub to your SSH keys – this will let you log in as root (or an account with root access)

    Did that? Great! The actual installation is pretty easy, you can just follow the directions down the line.

    Integration with WordPress

    The first one I integrated with was WordPress and for that I used Yuto.

    It’s incredibly straightforward to set up. Get your URL and your Master Key. Plunk them in. Save. Congratulations!

    On the Indices page I originally set my UIDs to ipstenu_posts and ipstenu_pages – to prevent collisions. But then I realized… I wanted the whole site on there, so I made them both ipstenu_org

    [Screenshot: example of Yuto’s search output]

    I would like to change the “Ipstenu_org” flag: something like ‘If there’s only one Index, don’t show the name’ plus a way to customize it.

    I will note, there’s a ‘bug’ in Yuto – it has to load all your posts into a cache before it will index them, and that’s problematic if you have a massive number of posts, or if you have anti-abuse tools that block long actions like that. So I made a quick WP-CLI command.

    WP-CLI Command

    The command I made is incredibly simple: wp ipstenu yuto build-index posts

    The code is fast, too. It took under a minute for over 1000 posts.

    After I made it, I shared it with the creators of Yuto, and their next release includes a version of it.

    Multiple Indexes and Tenants

    You’ll notice that I have two indexes. This is due to how the plugin works, making an index per post type. Insofar as my ipstenu.org sites go, I don’t mind having them all share a tenant. After all, they’re all on a server together.

    However… This server will also house a Hugo site and my other WP site. What to do?

    The first thing I did was I made a couple more API keys! They have read-write access to a specific index (the Key for “Ipstenu” has access to my ipstenu_org index and so on). That lets me manage things a lot more easily and securely.

    While Yuto will make the index, it cannot make custom keys, so I used the API:

    curl \
    -X POST 'https://example.com/keys' \
    -H 'Authorization: Bearer BEARERKEY' \
    -H 'Content-Type: application/json' \
    --data-binary '{
    "description": "Ipstenu.org",
    "actions": ["*"],
    "indexes": ["ipstenu_org"],
    "expiresAt": null
    }'
    

    That returns a JSON string with (among other things) a key that you can use in WordPress.
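
    The same endpoint can hand out narrower keys too. Here is a hedged sketch (not something Yuto does, and not what I ran above) that builds the request for a search-only key in JavaScript. The "search" action is one of Meilisearch’s key actions; the URL, BEARERKEY, and description are the same kind of placeholders as in the curl example, and buildKeyRequest is a made-up helper name.

```javascript
// Build the fetch() arguments for Meilisearch's POST /keys endpoint.
// baseUrl, adminKey and the description are placeholders, not real values.
function buildKeyRequest(baseUrl, adminKey, description, indexUid, actions) {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/keys",
    options: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + adminKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        description: description,
        actions: actions,
        indexes: [indexUid],
        expiresAt: null,
      }),
    },
  };
}

// A key that can only search the ipstenu_org index.
const request = buildKeyRequest(
  "https://example.com/",
  "BEARERKEY",
  "Ipstenu.org (search only)",
  "ipstenu_org",
  ["search"]
);

// To actually send it:
// fetch(request.url, request.options).then((res) => res.json());
console.log(request.url);
```

    A key like this would suit a public search box, while the read-write key stays on the server that does the indexing.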

    Will I look into Tenancy? Maybe. Haven’t decided yet. For now, separate indexes work for me.

  • Algolia: Search Faster

    Note: While this post is about using Algolia, the irony is that shortly after I posted it, I removed Algolia. The reason being, InstantSearch counts as a separate search per letter used — that means I was about to skyrocket over my allowance and hit the thousands-a-month. I feel their pricing was quite unclear about this. But hey, now you know!

    My friends, it’s been a while. And if you follow my rants on Twitter (this is not a suggestion that you should), you saw I faced off with ElasticSearch, my nemesis.

    Moons ago, I attempted to use ElasticSearch to make a site I run faster. At the time, I struggled with search ranking and all those things. Then it broke with Jetpack and made my server core-dump. So in 2016 I tossed it in the can and walked away. After all, I didn’t need it. WP’s search was sufficient.

    Fast forward 4 years and with around 12k posts to search, guess what isn’t so okay anymore?

    Nothing.

    When you search on WordPress, it uses SQL queries to check in and find all instances of ‘a thing’ (whatever it is you searched for). So logically if you have a lot of posts (or a lot of content, be that in the form of a few huge posts or a high number of smaller ones), you’re going to experience slower searches.

    Also WordPress’ search isn’t customizable. You can’t tell it “Don’t search page X” or even “Prioritize post titles over content.” This leads to some odd results.

    But realistically neither of those issues is ‘wrong.’ Those are broad choices made to support 80% of WordPress users.

    This means your question of “What’s wrong with search?” is really “Are there specific cases wherein the default search won’t be the best choice for me?” And those two issues? They’re why. If your site is large (or getting there) and if you need to ‘weigh’ search results to prioritize A over B, then this post is for you.

    Solving The Right Problem

    While the first thing you always look at is “What do I need to solve?” by the time you get around to ranting about how WP search sucks, you kind of know where to start. That is, either search is too slow or you need to customize it. Or both.

    If you need to customize your search results, I recommend you look at plugins like Relevanssi, which does a great job of handling that. However, there are two critical flaws for most (if not all) self-hosted plugin solutions. They’re going to make your database big. And let’s be clear here, a bigger DB is not going to help your speed issues. It becomes harder to back up and more fragile. Relevanssi is refreshingly honest about this, warning you that your DB will triple in size, but also making sure you know that over 50k posts won’t work.

    Subsequently, a large site means you need to start looking at services. Algolia, Swiftype, ElasticSearch, and Solr are all amazing, viable services. Some have plugins for easy WordPress integration, some do not. Some are open source, some are not. Some let you build your own… Let me just show you:

    Name      Open Source  Service  Roll Your Own  Plugin
    Algolia   No           Yes      No             Unofficial
    Swiftype  No           Yes      No             Official
    Elastic   Yes          Yes      Yes            Unofficial
    Solr      Yes          No       Yes            Unofficial

    You get the idea. Lots of options. And I did not pick Elastic (who owns Swiftype now, BTW).

    You see, Elastic is more than just a search. It’s really a whole database of your content. This means you can hook into it to speed up WordPress queries for long/large tables. But … That isn’t my problem. My problem is just search.

    Services are Spendy

    I ended up using a service because I was going bonkers. Seriously. The ‘directions’ for both Solr and Elastic are really terrible. They go in with some assumptions that you’ve done similar things (haven’t) and don’t have what I would call an ‘intro doc.’ I got a lot further with Solr than with Elastic, but WP integration was weird. And Elastic … No.

    Installing it was weirdly easy. The problem was I could not find any information about configuring it. People say “You must secure it.” Okay, sure, I can do that… But no one sat and explained why you want the nodes, what they do together, why you want them on separate servers (or even that you do) and … Honestly I wanted to throw my laptop out the window.

    It does, mind you, bring up an important note. Search storage and Elastic services are expensive. Even Jetpack, who offers a bundled Elasticsearch integration (yes, that’s what Jetpack Search is) would cost a site with 10,000 posts around $600 a year. Even using Amazon’s Elasticsearch it’s going to run you a lot. How much? Well if you just toss in their defaults and accept the large settings, it’s to the tune of $22 a day. Give or take. Small settings for my site? Around $8 a day.

    ElasticPress (whom I do recommend if you’re using Elastic) starts at $79 a month. Jetpack’s new search is free for small sites, but for mine (again, we’re over 10k posts) it would begin at $60.

    Algolia though … 12k records is about $3 a month. And it’s all search.
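
    To put those prices on the same footing, here is a back-of-the-napkin sketch that turns each quote into a yearly figure. It assumes flat pricing, 365 days and 12 months in a year, and nothing else; real bills will vary with usage.

```javascript
// Rough yearly totals for the prices quoted above.
// Assumes flat pricing, 365 days and 12 months per year.
const DAYS_PER_YEAR = 365;
const MONTHS_PER_YEAR = 12;

const yearlyCost = {
  amazonLargeDefaults: 22 * DAYS_PER_YEAR, // around $22 a day
  amazonSmallSettings: 8 * DAYS_PER_YEAR, // around $8 a day
  elasticPress: 79 * MONTHS_PER_YEAR, // starts at $79 a month
  jetpackSearch: 600, // already quoted per year
  algolia: 3 * MONTHS_PER_YEAR, // about $3 a month for 12k records
};

for (const [service, dollars] of Object.entries(yearlyCost)) {
  console.log(service + ": about $" + dollars + " a year");
}
```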

    Enter Algolia

    The name meaning delights me:

    inability to speak due to mental deficiency or a manifestation of dementia.

    Because when you are searching, you often feel like you’re losing your mind and you have a problem. Search is hard okay? There’s a reason AltaVista, Lycos, Yahoo, and now Google are important. Searching is crazy weird and hard and sometimes it’s faster to go “lezwatchtv ACTOR NAME” than to search on our site.

    That was not good at all.

    Algolia is one of the more straightforward setups I’ve had in a while.

    1. Register on algolia.com
    2. Spin up a new app
    3. Install WP Search with Algolia
    4. Add in your keys
    5. Tell it what to search
    6. Tell it if you want auto-complete
    7. Tell it if you want a new search-result page
    8. Index
    9. Done

    Oh you know there’s a little more.

    Reducing Records

    I decided to make my records smaller. Algolia only cares about the number of records, not the size, as long as each record is under 10k. I have a lot of meta data and a lot of records. If I were to index everything, I’d be around 15k records, which isn’t bad but I really only needed about 12k of them.

    One of the odd things the plugin does is that it uses separate indexes for Auto-Complete. So I could store all my searchable posts and all my shows, characters, etc etc. Which would make for 50k records, and I didn’t want that. Sure it makes some aspects of search easier, but I knew I could do this a better way.

    I started by making a plugin and removing some records:

    add_filter( 'algolia_should_index_user',  'my_prefix_algolia_never_index' );
    add_filter( 'algolia_should_index_term', 'my_prefix_algolia_never_index' );
    
    function my_prefix_algolia_never_index() {
    	return false;
    }
    

    This tells the plugin “Never index users or taxonomies.” Most of you will want the taxonomies! I don’t, mostly because they don’t really impact how people search. And yes, I did study my logs. No one cares who wrote what for my site, and that’s okay.

    Refining Search Results

    Next I needed to make sure that autocomplete (which I use) and the search page both put the right content to the top.

    There is one and only one ‘flaw’ with Algolia, and that’s they don’t make it easy to define a ‘perfect’ match. I have a case where I have 5 post types (posts, pages, shows, actors, characters) and there’s crossover. If I search for “One Day at a Time” I get everything that mentions it. Which is not what I wanted. And while the title of the post I wanted to find was the TV show “One Day at a Time”, it was bringing up my blog posts (and the page!) first.

    This was solvable because the plugin is amazing. I filtered and told it what attributes to remove:

    add_filter( 'algolia_post_shared_attributes', 'my_prefix_algolia_attributes', 10, 2 );
    add_filter( 'algolia_searchable_post_shared_attributes', 'my_prefix_algolia_attributes', 10, 2 );
    
    function my_prefix_algolia_attributes( array $attributes, WP_Post $post ) {
    
    	// Remove things we're not using to make it easier.
    	$remove_array = array( 'taxonomies_hierarchical', 'post_excerpt', 'post_modified', 'comment_count', 'menu_order', 'taxonomies', 'post_author', 'post_mime_type' );
    	foreach ( $remove_array as $remove_this ) {
    		if ( isset( $attributes[ $remove_this ] ) ) {
    			unset( $attributes[ $remove_this ] );
    		}
    	}
    	return $attributes;
    }
    

    This ensured I kept my records small enough, because I had some math to do.

    The same filter function also needed to promote certain posts over others, so I added a switch using some data I already saved:

    // Add Data for individual ranking
    switch ( $post->post_type ) {
    	case 'post_type_shows':
    		// Base score on show score + 50
    		$attributes['score'] = round( get_post_meta( $post->ID, 'lezshows_the_score', true ), 2 );
    		$attributes['score'] = 50 + (int) $attributes['score'];
    		break;
    	case 'post_type_characters':
    		$attributes['score'] = 150;
    		break;
    	case 'post_type_actors':
    		$attributes['score'] = 150;
    		break;
    	default:
    		$attributes['score'] = 0;
    		break;
    }
    

    This adds an attribute of ‘score’ based on post type. I could weigh them up or down as I wanted.
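
    If it helps to see the numbers, here is a small JavaScript port of that switch, for illustration only. The showScore argument stands in for the lezshows_the_score post meta, and scoreFor is a made-up name; the real work happens in the PHP filter.

```javascript
// JavaScript port of the PHP switch, for illustration only.
// showScore stands in for the 'lezshows_the_score' post meta value.
function scoreFor(postType, showScore) {
  switch (postType) {
    case "post_type_shows": {
      // PHP: round( $score, 2 ), then an (int) cast, then + 50.
      const rounded = Math.round(Number(showScore) * 100) / 100;
      return 50 + Math.trunc(rounded);
    }
    case "post_type_characters":
    case "post_type_actors":
      return 150;
    default:
      return 0;
  }
}

console.log(scoreFor("post_type_shows", 72.456));
console.log(scoreFor("post_type_characters"));
console.log(scoreFor("post"));
```

    So a show lands 50 points above its own score, characters and actors get a flat 150, and everything else gets 0.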

    Then I went into Algolia’s admin, and this is where the search tool becomes a champ. Under Indices -> Configuration, I changed up the Ranking and Sorting:

    Their default is: ["typo","geo","words","filters","proximity","attribute","exact","custom"]

    Mine is: ["exact","score","post_title","attribute","post_type_label","typo","proximity","words","is_sticky","post_date"]

    This actually handled 90% of what I needed without any custom tweaks or rules.
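
    The reason the order matters is that Algolia ranks by tie-breaking: it compares on the first criterion and only looks at the next one when two records are tied. Below is a deliberately simplified model of that idea, using only the first two entries from my list. The records and the rankRecords helper are hypothetical, and the real engine does far more than this.

```javascript
// Simplified model of tie-breaking: compare on the first criterion and
// only fall through to the next one when two records are tied.
function rankRecords(records, criteria) {
  return [...records].sort((a, b) => {
    for (const key of criteria) {
      const diff = Number(b[key]) - Number(a[key]); // higher value wins
      if (diff !== 0) {
        return diff;
      }
    }
    return 0;
  });
}

// Hypothetical hits for the query "One Day at a Time".
const hits = [
  { title: "Blog post that mentions the show", exact: false, score: 0 },
  { title: "A character from the show", exact: false, score: 150 },
  { title: "One Day at a Time", exact: true, score: 120 },
];

const ranked = rankRecords(hits, ["exact", "score"]);
console.log(ranked.map((hit) => hit.title));
```

    With “exact” first, the show wins even though the character has the higher score; the score only settles things among the hits that tie on exactness.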

    But weight, there’s more!

    Those ‘Attributes’ are the searchable parts of the attributes I was messing with in the refinements section. Most of what they’re used for is helping rank and sort the relevant data to make sure Sara Lance is on top. Which she always is. But. I always wanted to make some related data show up.

    By default, the searchable attributes were title and content. I added in a new attribute called lwtv_meta and in it I added more data. When the index is built for a character (say), it adds a list of all the actors who play the character and all the shows they’re on into that meta attribute. Then I added that attribute to the search. This means if you look for “Legends of Tomorrow” you will see our girl Sara Lance.

    That has a small side effect though… Where’s the show!?

    So I still have some kinks to work out, but the point is that with a couple tweaks and some extra data, I got everything set up in 3 days. Bonus? The plugin came with templates I quickly tweaked to match my theme. And I’m bad at design!

    Algolia? Let’s be demented together!

  • Rolling Your Own Related Posts

    To start out at the top, I did not write a whole ‘related posts’ plugin. As with all things, I started by asking myself “What’s the problem I’m trying to solve?”

    The answer is “I have a custom post type that needs to relate to other posts in the type, but based on my specific criteria which is currently organized into custom taxonomies and post meta.” And from the outset, that certainly sounds like a massive custom job. It was one I was dreading until I remembered that a developer I respected and trusted had once complained to me about the problems with all those other auto-related-posts plugins.

    1. They’re heavy and use a lot of ram
    2. They don’t let you customize ‘weight’ of relations
    3. They’re not extendable

    So I did the next logical thing and I looked up their plugins.

    The Plugin

    The crux of why I chose this plugin was simply that it’s extendable, but also that it started out with what I had:

    Posts with the most terms in common will display at the top!

    Perfect!

    Design The Basics

    Before you jump into coding, you need to know what you’re doing. I chose to isolate what I needed first. I made a list of everything I thought was relevant:

    • Taxonomies: Tropes, Genres, Intersectionality, Tags, Stars
    • Post Meta: Worth It, Loved, Calculated Score

    Yes, it’s that site again.

    I read the plugin documentation and verified that for most of that I just needed to list the taxonomies in the shortcode like this:

    [related_posts_by_tax fields="ids" order="RAND" title="" format="thumbnails" image_size="postloop-img" link_caption="true" posts_per_page="6" columns="0" post_class="similar-shows" taxonomies="lez_tropes,lez_genres,lez_stars,lez_intersections,lez_showtagged"]
    

    Initially I didn’t list the stars because the way the code works, it would say “If you have a Gold Star, show other Gold Stars.” And that wasn’t what I wanted to see. I wanted “If you have ANY star, show other shows with a star.” That said, once we got over 12 shows in each ‘star’ category, this became much easier to match and I could add it in.

    The rest of the code, those checks for meta, needed actual code written.

    Meta Checks

    There’s a helpful filter, related_posts_by_taxonomy_posts_meta_query, that lets you filter the meta queries used by the post. Leveraging that, we can make our checks:

    1. Match the ‘worth it’ value of a show
    2. If the show is loved, list other loved shows
    3. If the show isn’t loved, use the score to find shows with the same relative value

    Both Worth It and Loved are post meta values. Mine happen to be added by CMB2, but the logic remains the same regardless of how you add them. Worth It has four possible values (Yes, No, Maybe, TBD), and the check is either the value or false. Loved is a checkbox: the meta either exists or it doesn’t, which makes it a truthy/falsy check. The score is a number that’s generated every time the show is saved, and it’s crazy complicated and another story.

    The code I use looks like this:

    add_filter( 'related_posts_by_taxonomy_posts_meta_query', 'MYSITE_RPBT_meta_query', 10, 4 );
    function MYSITE_RPBT_meta_query( $meta_query, $post_id, $taxonomies, $args ) {
    	$worthit = ( get_post_meta( $post_id, 'lezshows_worthit_rating', true ) ) ? get_post_meta( $post_id, 'lezshows_worthit_rating', true ) : false;
    	$loved   = ( get_post_meta( $post_id, 'lezshows_worthit_show_we_love', true ) ) ? true : false;
    	$score   = ( get_post_meta( $post_id, 'lezshows_the_score', true ) ) ? get_post_meta( $post_id, 'lezshows_the_score', true ) : 10;
    
    	// We should match up the worth-it value as well as the score.
    	// After all, some low scores have a thumbs up.
    	if ( false !== $worthit ) {
    		$meta_query[] = array(
    			'key'     => 'lezshows_worthit_rating',
    			'value'   => $worthit,
    			'compare' => '=',
    		);
    	}
    
    	// If the show is loved, we want to include it here.
    	if ( $loved ) {
    		$meta_query[] = array(
    			'key'     => 'lezshows_worthit_show_we_love',
    			'compare' => 'EXISTS',
    		);
    	}
    
    	// If they're NOT loved, we use the scores for a value.
    	if ( ! $loved ) {
    		// Score: If the score is similar +/- 10
    		if ( $score >= 90 ) {
    			$score_range = array( 80, 100 );
    		} elseif ( $score <= 10 ) {
    			$score_range = array( 10, 30 );
    		} else {
    			$score_range = array( ( $score - 10 ), ( $score + 10 ) );
    		}
    		$meta_query[] = array(
    			'key'     => 'lezshows_the_score',
    			'value'   => $score_range,
    			'type'    => 'numeric',
    			'compare' => 'BETWEEN',
    		);
    	}
    
    	return $meta_query;
    }
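
    The score window is the least obvious part of that filter, so here is the same branching pulled out into a tiny JavaScript function you can poke at. It is only a port for illustration (scoreRange is a made-up name), and it keeps the PHP’s bounds exactly as they are, including the 10 to 30 window for low scores.

```javascript
// Port of the score window from the PHP filter above.
// Returns the [min, max] pair that feeds the BETWEEN meta query.
function scoreRange(score) {
  const value = Number(score);
  if (value >= 90) {
    return [80, 100];
  }
  if (value <= 10) {
    return [10, 30];
  }
  return [value - 10, value + 10];
}

console.log(scoreRange(95));
console.log(scoreRange(55));
console.log(scoreRange(5));
```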
    

    More Similar

    But there’s one more thing we wanted to include. When I built this out, Tracy said “There should be a way for us to pick the shows we think are similar!”

    She’s right! I built in a CMB2 repeatable field where you can pick shows from a dropdown and that saves the show post IDs as an array. That was the easy part, since we were already doing that in another place.

    Once that list exists, we grab the handpicked list, break it out into a simple array, check if the post is published and not already on the list, and combine it all:

    add_filter( 'related_posts_by_taxonomy', array( $this, 'alter_results' ), 10, 4 );
    function alter_results( $results, $post_id, $taxonomies, $args ) {
    	$add_results = array();
    
    	if ( ! empty( $results ) && empty( $args['fields'] ) ) {
    		$results = wp_list_pluck( $results, 'ID' );
    	}
    
    	$handpicked  = ( get_post_meta( $post_id, 'lezshows_similar_shows', true ) ) ? wp_parse_id_list( get_post_meta( $post_id, 'lezshows_similar_shows', true ) ) : array();
    	$reciprocity = self::reciprocity( $post_id );
    	$combo_list  = array_merge( $handpicked, $reciprocity );
    
    	if ( ! empty( $combo_list ) ) {
    		foreach ( $combo_list as $a_show ) {
    			//phpcs:ignore WordPress.PHP.StrictInArray
    			if ( 'publish' == get_post_status( $a_show ) && ! in_array( $a_show, $results ) && ! in_array( $a_show, $add_results ) ) {
    				$add_results[] = $a_show;
    			}
    		}
    	}
    
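    	// array_merge() appends; the + operator would drop entries whose numeric keys collide.
    	$results = array_merge( $add_results, $results );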
    
    	return $results;
    }
    

    But … you may notice $reciprocity and wonder what that is.

    Well, in a perfect world if you added The Good Fight as a show similar to The Good Wife, you’d also go back and add The Good Wife to The Good Fight. The reality is humans are lazy. There were two ways to solve this reciprocity of likes issue.

    1. When a show is added as similar to a show, the code auto-adds it to the other show
    2. When the results are generated, the code checks if any other show likes the current show and adds it

    Since we’re already having saving speed issues (there’s a lot of back processing going on with the scores) and I’ve integrated caching, it was easier to pick option 2.

    function reciprocity( $post_id ) {
    	if ( ! isset( $post_id ) || 'post_type_shows' !== get_post_type( $post_id ) ) {
    		// Always return an array so the array_merge() in alter_results() is safe.
    		return array();
    	}
    
    	$reciprocity      = array();
    	$reciprocity_loop = new WP_Query(
    		array(
    			'post_type'              => 'post_type_shows',
    			'post_status'            => array( 'publish' ),
    			'orderby'                => 'title',
    			'order'                  => 'ASC',
    			'posts_per_page'         => '100',
    			'no_found_rows'          => true,
    			'update_post_term_cache' => true,
    			'meta_query'             => array(
    				array(
    					'key'     => 'lezshows_similar_shows',
    					'value'   => $post_id,
    					'compare' => 'LIKE',
    				),
    			),
    		)
    	);
    
    	if ( $reciprocity_loop->have_posts() ) {
    		while ( $reciprocity_loop->have_posts() ) {
    			$reciprocity_loop->the_post();
    			$this_show_id = get_the_ID();
    			$shows_array  = get_post_meta( $this_show_id, 'lezshows_similar_shows', true );
    
    			if ( 'publish' === get_post_status( $this_show_id ) && isset( $shows_array ) && ! empty( $shows_array ) ) {
    				foreach ( $shows_array as $related_show ) {
    					if ( $related_show == $post_id ) {
    						$reciprocity[] = $this_show_id;
    					}
    				}
    			}
    		}
    		wp_reset_postdata();
    		$reciprocity = wp_parse_id_list( $reciprocity );
    	}
    
    	return $reciprocity;
    }
    

    There’s a little looseness in the LIKE check: in some cases shows turn up wrongly because of the IDs (ex: a check on 311 is positive for both show 311 and show 3112), so we have to double up on the checks to make sure that the show is really the same.
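
    Here is that false positive in miniature. WordPress stores array meta as a PHP-serialized string, and LIKE is basically a substring test against it. The serialized value below is a made-up example, and looseMatch and exactMatch are hypothetical stand-ins for the two passes.

```javascript
// WordPress stores array meta as a PHP-serialized string; this one is a
// made-up example of a show whose similar list is [3112, 42].
const serializedMeta = 'a:2:{i:0;s:4:"3112";i:1;s:2:"42";}';
const similarIds = [3112, 42];

// Roughly what the LIKE comparison does: a substring test.
function looseMatch(serialized, postId) {
  return serialized.includes(String(postId));
}

// What the second pass in the loop does: compare the actual IDs.
function exactMatch(ids, postId) {
  return ids.some((id) => Number(id) === Number(postId));
}

console.log(looseMatch(serializedMeta, 311)); // substring hit on "3112"
console.log(exactMatch(similarIds, 311)); // the real IDs do not include 311
```

    The loose pass keeps the query cheap, and the exact pass throws out the near misses.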

    What’s Next?

    There are still some places I could adjust this. Like if I use more filters I can make the show stars worth ‘more’ than the genres and so on. And right now, due to the way most Anime are based on Manga (and thus get flagged as “Literary Inspired”), anything based on Sherlock Holmes ends up with a lot of recommended Anime.

    Still, this gives me a way more flexible way to list what’s similar.

  • Stopwords and Sort Queries

    The code I use is part and parcel from a comment Pascal Birchler made in 2015 and Birgir E. riffed on in 2016. I made one small change.

    The Problem

    People like to name TV shows with ‘The’ or ‘A’ or ‘An’ as the first word. “The Fall” and “The Good Wife” for example. However, when we order such things in a human sensible way, “A Touch of Cloth” should be listed behind both of those.

    • The Fall
    • Frankenstein
    • The Good Wife
    • Grey’s Anatomy
    • A Touch of Cloth

    WordPress, though, sees them as absolutes and you get this:

    • A Touch of Cloth
    • Frankenstein
    • Grey’s Anatomy
    • The Fall
    • The Good Wife

    Not quite right, is it?

    The Fix

    This code does two things.

    The first part is the filter on the posts_orderby hook. That checks if the post type is the one I want to filter (in my case, only shows), and if so, strips my stop words of ‘the ‘, ‘an ‘, and ‘a ‘ from the front of the title using SQL’s TRIM( LEADING … ). The extra space in each word is important. I want to reorder “The Fall” and not “Then They Fall” after all!

    The second part is the helper function that builds the nested TRIM calls, so the title is only messed with for the ordering of posts.

    add_filter( 'posts_orderby', function( $orderby, \WP_Query $q ) {
        if( 'post_type_shows' !== $q->get( 'post_type' ) )
            return $orderby;
    
        global $wpdb;
    
        // Adjust this to your needs:
        $matches = [ 'the ', 'an ', 'a ' ];
    
        return sprintf(
            " %s %s ",
            MYSITE_shows_posts_orderby_sql( $matches, " LOWER( {$wpdb->posts}.post_title) " ),
            'ASC' === strtoupper( $q->get( 'order' ) ) ? 'ASC' : 'DESC'
        );
    
    }, 10, 2 );
    
    function MYSITE_shows_posts_orderby_sql( &$matches, $sql )
    {
        if( empty( $matches ) || ! is_array( $matches ) )
            return $sql;
    
        $sql = sprintf( " TRIM( LEADING '%s' FROM ( %s ) ) ", $matches[0], $sql );
        array_shift( $matches );
        return MYSITE_shows_posts_orderby_sql( $matches, $sql );
    }
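
    If SQL isn’t your thing, the same idea can be sketched in JavaScript. This is only a model of what the nested TRIM calls do to the sort key (the database still does the real sorting), and sortKey and sortTitles are made-up names.

```javascript
const STOP_WORDS = ["the ", "an ", "a "];

// Mimics the nested TRIM( LEADING ... ) calls on the lowercased title.
function sortKey(title) {
  let key = title.toLowerCase();
  for (const word of STOP_WORDS) {
    while (key.startsWith(word)) {
      key = key.slice(word.length);
    }
  }
  return key;
}

// Sort a copy of the titles by their stripped keys.
function sortTitles(titles) {
  return [...titles].sort((a, b) => {
    const keyA = sortKey(a);
    const keyB = sortKey(b);
    if (keyA < keyB) return -1;
    if (keyA > keyB) return 1;
    return 0;
  });
}

const shows = [
  "A Touch of Cloth",
  "Frankenstein",
  "Grey’s Anatomy",
  "The Fall",
  "The Good Wife",
];

console.log(sortTitles(shows));
console.log(sortKey("Then They Fall")); // untouched, thanks to the trailing space
```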
    

    If you’re using MariaDB, this can be even easier, but I have to test on my dev site, which uses MySQL.

  • Hugo and Lunr – Client Side Searching

    I use Hugo on a static website that has few updates but still needs a bit of maintenance. With a few thousand pages, it also needs a search. For a long time, I was using a Google Custom Search, but I’m not the biggest Google fan and they insert ads now, so I needed a new solution.

    Search is the Worst Part

    Search is the worst thing about static sites. Scratch that. Search is the worst part about any site. We all bag on WordPress’ search being terrible, but anyone who’s attempted to install and manage anything like ElasticSearch knows that WordPress’ search is actually pretty good. It’s just limited. And by contrast, the complicated world of search is, well, complicated.

    The beauty of many CMS tools like WordPress and Drupal and MediaWiki is that they have a rudimentary and perfectly acceptable search built in. And that’s the headache of static tools like Jekyll and Hugo. They simply don’t have it.

    Lunr

    If you don’t want to use third-party services, and are interested in self hosting your solution, then you’re going to have to look at a JavaScript solution. Mine was Lunr.js, a fairly straightforward tool that searches a JSON file for the items.

    There are pros and cons to this. Having it all in javascript means the load on my server is pretty low. At the same time I have to generate the JSON file somehow every time. In addition, every time someone goes to the search page, they have to download that JSON file, which can get pretty big. Mine’s 3 megs for 2000 or so pages. That’s something I need to keep in mind.

    This is, by the way, the entire reason I made that massive JSON file the other day.
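
    For a feel of what that file holds, here is a rough sketch. The field names (uri, title, tags, categories, content) are the ones the search template indexes; the two entries are invented, and resolveResults is a hypothetical helper that maps Lunr’s results (which carry a ref) back to full pages.

```javascript
// Hypothetical entries; the real file is generated from the site's pages.
const pagesIndex = [
  { uri: "/shows/the-fall/", title: "The Fall", tags: ["crime"], categories: ["Shows"], content: "..." },
  { uri: "/shows/the-good-wife/", title: "The Good Wife", tags: ["legal"], categories: ["Shows"], content: "..." },
];

// Build the lookup once instead of scanning the array for every result.
const pagesByUri = new Map(pagesIndex.map((page) => [page.uri, page]));

// Lunr results look like { ref, score, matchData }; ref is the page's uri.
function resolveResults(lunrResults) {
  return lunrResults
    .map((result) => pagesByUri.get(result.ref))
    .filter((page) => page !== undefined);
}

// Stand-in for what lunrIndex.search() would hand back.
const fakeLunrResults = [{ ref: "/shows/the-good-wife/", score: 1.2 }];
console.log(resolveResults(fakeLunrResults).map((page) => page.title));
```

    Every field you add to each entry makes the download bigger, which is worth remembering with a few thousand pages.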

    To include Lunr.js in your site, download the file and put it in your /static/ folder however you want. I have it at /static/js/lunr.js next to my jquery.min.js file. Now when you build your site, the JS file will be copied into place.

    The Code

    Since this is for Hugo, it has two steps. The first is the markdown code to make the post and the second is the template code to do the work.

    Post: Markdown

    The post is called search.md and this is the entirety of it:

    ---
    layout: search
    title: Search Results
    permalink: /search/
    categories: ["Search"]
    tags: ["Index"]
    noToc: true
    ---
    

    Yep. That’s it.

    Template: HTML+GoLang+JS

    I have a template file in layouts/_default/ called search.html and that has all the JS code as well as everything else. This is shamelessly forked from Seb’s example code.

    {{ partial "header.html" . }}
    
    	{{ .Content }}
    
    	<h3>Search:</h3>
    	<input id="search" type="text" placeholder="Just start typing...">
    
    	<h3>Results:</h3>
    	<ul id="results"></ul>
    
    	<script type="text/javascript" src="/js/lunr.js"></script>
    	<script type="text/javascript">
    	var lunrIndex, $results, pagesIndex;
    
    	function getQueryVariable(variable) {
    		var query = window.location.search.substring(1);
    		var vars = query.split('&');
    
    		for (var i = 0; i < vars.length; i++) {
    			var pair = vars[i].split('=');
    
    			if (pair[0] === variable) {
    				return decodeURIComponent(pair[1].replace(/\+/g, '%20'));
    			}
    		}
    	}
    
    	var searchTerm = getQueryVariable('query');
    
    	// Initialize lunrjs using our generated index file
    	function initLunr() {
    		// First retrieve the index file
    		$.getJSON("/index.json")
    			.done(function(index) {
    				pagesIndex = index;
    				console.log("index:", pagesIndex);
    				lunrIndex = lunr(function() {
    					this.field("title", { boost: 10 });
    					this.field("tags", { boost: 5 });
    					this.field("categories", { boost: 5 });
    					this.field("content");
    					this.ref("uri");
    
    					pagesIndex.forEach(function (page) {
    						this.add(page)
    					}, this)
    				});
    			})
    			.fail(function(jqxhr, textStatus, error) {
    				var err = textStatus + ", " + error;
    				console.error("Error getting Hugo index file:", err);
    			});
    	}
    
    	// Nothing crazy here, just hook up a listener on the input field
    	function initUI() {
    		$results = $("#results");
    		$("#search").keyup(function() {
    			$results.empty();
    
    			// Only trigger a search when 2 chars. at least have been provided
    			var query = $(this).val();
    			if (query.length < 2) {
    				return;
    			}
    
    			var results = search(query);
    
    			renderResults(results);
    		});
    	}
    
    	/**
    	 * Trigger a search in lunr and transform the result
    	 *
    	 * @param  {String} query
    	 * @return {Array}  results
    	 */
    	function search(query) {
    		return lunrIndex.search(query).map(function(result) {
    				return pagesIndex.filter(function(page) {
    					return page.uri === result.ref;
    				})[0];
    			});
    	}
    
    	/**
    	 * Display the first 100 results
    	 *
    	 * @param  {Array} results to display
    	 */
    	function renderResults(results) {
    		if (!results.length) {
    			return;
    		}
    
    		// Only show the ten first results
    		results.slice(0, 100).forEach(function(result) {
    			var $result = $("<li>");
    			$result.append($("<a>", {
    				href: result.uri,
    				text: "» " + result.title
    			}));
    			$results.append($result);
    		});
    	}
    
    	// Let's get started
    	initLunr();
    
    	$(document).ready(function() {
    		initUI();
    	});
    	</script>
    {{ partial "footer.html" . }}
    

    It’s important to note you will also need to load jQuery, but I do that in my header.html file since I have a bit of jQuery I use on every page. If you don’t, then remember to include it above <script type="text/javascript" src="/js/lunr.js"></script>, otherwise nothing will work.

    Caveats

    If you have a large search file, this will make your search page slow to load.

    Also I don’t know how to have a form on one page trigger the search on another, but I’m making baby steps in my javascripting.

  • Prettier Search Queries

    Prettier Search Queries

    By default, when you search on a WordPress site, your search URL has an /?s= parameter. Back in the old days of WordPress, we all had URLs like /?p=123 where 123 was the page ID. With the advent of Pretty Permalinks, we moved to pretty URLs like /2016/prettier-search-queries/ and everyone was happier.

    What about search?

    As it happens, the WP Rewrite API actually has a search base of … search. If you go to your Settings > Permalinks page, you won’t see it there, and yet on any site with pretty permalinks enabled, a URL like https://halfelf.org/search/apache will actually get you that nice, pretty path.

    Because of that, you could get away with adding this to your .htaccess file in order to get those pretty URLs.

    RewriteCond %{QUERY_STRING} s=(.*)
    RewriteRule ^$ /search/%1? [R,L]
    

    You can also use a plugin like Mark Jaquith’s Nice Search.

    Those methods work for nearly all sites.

    But you know me. I’m not ‘all’ sites.

    Extra Parameter Headache

    I had a different problem. Because my site had specialized data, it had extra search parameters. I was intentionally limiting my search to specific post types. This meant my URLs looked like this: /?s=rookie+blue&post_type[]=post_type_shows

    When I translated that to use the pretty search, well … /search/rookie+blue&post_type[]=post_type_shows just didn’t work.

    This is for a pretty obvious reason when you study the URLs. The first one has ?s=... and then later an &, while the second only has the & in there. If I changed the URL to this, it worked: /search/rookie+blue/?post_type[]=post_type_shows

    This is due to how parameters work in URLs. The query string has to start with a ?, which introduces the first parameter. All additional parameters are added with &param=value after that.
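
    If you want to see the difference for yourself, here is a minimal sketch using the standard URL class (available in browsers and Node). The describeUrl helper and the https://example.com origin are placeholders I made up for illustration, and this only shows how a generic URL parser splits the two paths, not what WordPress does internally.

```javascript
// Compare how a standard URL parser splits the two search URLs from above.
// "https://example.com" is a placeholder origin, used only so the paths parse.
function describeUrl(path) {
	const url = new URL(path, 'https://example.com');
	return {
		pathname: url.pathname,
		query: url.search,
		postTypes: url.searchParams.getAll('post_type[]'),
	};
}

// No "?" anywhere, so "&post_type[]=..." is treated as part of the path.
const broken = describeUrl('/search/rookie+blue&post_type[]=post_type_shows');

// The "?" starts the query string, so post_type[] is read as a parameter.
const working = describeUrl('/search/rookie+blue/?post_type[]=post_type_shows');

console.log(broken.query, broken.postTypes);
console.log(working.pathname, working.postTypes);
```

    In the first case the parser finds no query string at all, which is why the extra post type never makes it through as a parameter.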

    Semi Pretty Search Permalinks

    To me, the nicest URLs would be /search/rookie+blue/section/shows/. The reality is that people will search shows and characters and I wasn’t quite sure how I wanted to handle that. Did I want them to be sections separated by plus signs, or extra ‘folders’ or what? In the end, I decided that for now it was okay to just make these prettier.

    Taking Mark’s code as my start point, I came up with this:

    function pretty_permalink_search_redirect() {
    	// grab rewrite globals (tag prefixes, etc)
    	// https://codex.wordpress.org/Class_Reference/WP_Rewrite
    	global $wp_rewrite;
    
    	// if we can't get rewrites or permalinks, we're probably not using pretty permalinks
    	if ( !isset( $wp_rewrite ) || !is_object( $wp_rewrite ) || !$wp_rewrite->using_permalinks() )
    		return;
    
    	// Set Search Base - default is 'search'
    	$search_base = $wp_rewrite->search_base;
    
    	if ( is_search() && !is_admin() && strpos( $_SERVER['REQUEST_URI'], "/{$search_base}/" ) === false ) {
    
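    		// Get Post Types (fall back to the defaults if none were passed as an array)
    		$query_post_types = get_query_var('post_type');
    		if ( empty($query_post_types) || !is_array($query_post_types) ) {
    			$query_post_types = array( 'post_type_characters', 'post_type_shows' );
    		}
    
    		// The query string starts with '?'; additional parameters are joined with '&'
    		$query_post_type_params = array();
    		foreach ( $query_post_types as $value ) {
    			$query_post_type_params[] = 'post_type[]=' . $value;
    		}
    		$query_post_type_url = '/?' . implode( '&', $query_post_type_params );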
    
    		wp_redirect(
    			home_url( "/{$search_base}/"
    			. urlencode( get_query_var( 's' ) )
    			. urldecode( $query_post_type_url )
    			) );
    		exit();
    	}
    }
    add_action( 'template_redirect', 'pretty_permalink_search_redirect' );
    

    And that actually does work exactly as I want it to.
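
    To make the target URL shape concrete, here is a small JavaScript sketch of the same URL-building idea. It is not the WordPress code: buildSearchUrl is a hypothetical helper, and encodeURIComponent only approximates PHP’s urlencode() (the two differ on a handful of punctuation characters), so treat it as an illustration of the shape of the result.

```javascript
// Hypothetical helper: mirrors the URL shape the redirect aims for.
// searchBase defaults to "search", the WordPress default mentioned above.
function buildSearchUrl(term, postTypes, searchBase = 'search') {
	// PHP's urlencode() turns spaces into "+", so do the same here.
	const encodedTerm = encodeURIComponent(term).replace(/%20/g, '+');
	// The first parameter follows "?", the rest are joined with "&".
	const params = postTypes.map((type) => 'post_type[]=' + type).join('&');
	return '/' + searchBase + '/' + encodedTerm + '/?' + params;
}

console.log(buildSearchUrl('rookie blue', ['post_type_shows']));
// → /search/rookie+blue/?post_type[]=post_type_shows
```

    Passing both post types gives one post_type[] parameter per type, joined with &.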