Algolia: Search Faster

Being driven to dementia by search woes, it’s time to embrace insanity and find out what you really need.

Note: While this post is about using Algolia, the irony is that shortly after I posted it, I removed Algolia. The reason being, InstantSearch counts as a separate search per letter used — that means I was about to skyrocket over my allowance and hit the thousands-a-month. I feel their pricing was quite unclear about this. But hey, now you know!

My friends, it’s been a while. And if you follow my rants on Twitter (this is not a suggestion that you should), you saw I faced off with ElasticSearch, my nemesis.

Moons ago, I attempted to use ElasticSearch to make a site I run faster. At the time, I struggled with search ranking and all those things. Then it broke with Jetpack and made my server core-dump. So in 2016 I tossed it in the can and walked away. After all, I didn’t need it. WP’s search was sufficient.

Fast-forward 4 years and with around 12k posts to search, guess what isn’t so okay anymore?

Nothing.

When you search on WordPress, it uses SQL queries to check in and find all instances of ‘a thing’ (whatever it is you searched for). So logically if you have a lot of posts (or a lot of content, be that in the form of a few huge posts or a high number of smaller ones), you’re going to experience slower searches.
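
To make that concrete, here is a simplified sketch of the kind of WHERE clause a default search boils down to. This is not WordPress’s actual code (the function name is made up, and the real query also handles escaping, post status, and post types), but the shape is the point: every term becomes a set of LIKE '%term%' checks, and a leading wildcard means the database can’t lean on a normal index, so it reads every row.

```php
<?php
// Simplified sketch, NOT WordPress core code: the function name is hypothetical.
// It only illustrates the shape of the LIKE clauses a default search produces.
function my_prefix_sketch_search_where( string $search ): string {
	// Split the search phrase into individual terms.
	$terms   = preg_split( '/\s+/', trim( $search ), -1, PREG_SPLIT_NO_EMPTY );
	$clauses = array();

	foreach ( $terms as $term ) {
		// Each term is wrapped in wildcards, so no ordinary index can help.
		$like      = "'%" . addslashes( $term ) . "%'";
		$clauses[] = "(post_title LIKE $like OR post_excerpt LIKE $like OR post_content LIKE $like)";
	}

	// Every term has to match somewhere in the post.
	return implode( ' AND ', $clauses );
}

$where = my_prefix_sketch_search_where( 'sara lance' );
echo $where, PHP_EOL;
```

Two words means six LIKE checks against every post. Now multiply that by 12k posts.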

Also WordPress’ search isn’t customizable. You can’t tell it “Don’t search page X” or even “Prioritize post titles over content.” This leads to some odd results.

But realistically neither of those issues is ‘wrong.’ Those are broad choices made to support 80% of WordPress users.

This means your question of “What’s wrong with search?” is really “Are there specific cases wherein the default search won’t be the best choice for me?” And those two issues? They’re why. If your site is large (or getting there) and if you need to ‘weigh’ search results to prioritize A over B, then this post is for you.

Solving The Right Problem

While the first thing you always look at is “What do I need to solve?” by the time you get around to ranting about how WP search sucks, you kind of know where to start. That is, either search is too slow or you need to customize it. Or both.

If you need to customize your search results, I recommend you look at plugins like Relevanssi, which does a great job of handling that. However, there are two critical flaws for most (if not all) self-hosted plugin solutions. They’re going to make your database big. And let’s be clear here, a bigger DB is not going to help your speed issues. It becomes harder to back up and more fragile. Relevanssi is refreshingly honest about this, warning you that your DB will triple in size, but also making sure you know that over 50k posts won’t work.

Subsequently, a large site means you need to start looking at services. Algolia, Swiftype, ElasticSearch, and Solr are all amazing, viable, services. Some have plugins for easy WordPress integration, some do not. Some are open source, some are not. Some let you build your own… Let me just show you:

Name       Open Source   Service   Roll Your Own   Plugin
Algolia    No            Yes       No              Unofficial
Swiftype   No            Yes       No              Official
Elastic    Yes           Yes       Yes             Unofficial
Solr       Yes           No        Yes             Unofficial

You get the idea. Lots of options. And I did not pick Elastic (who owns Swiftype now, BTW).

You see, Elastic is more than just a search. It’s really a whole database of your content. This means you can hook into it to speed up WordPress queries for long/large tables. But … That isn’t my problem. My problem is just search.

Services are Spendy

I ended up using a service because I was going bonkers. Seriously. The ‘directions’ for both Solr and Elastic are really terrible. They go in with some assumptions that you’ve done similar things (haven’t) and don’t have what I would call an ‘intro doc.’ I got a lot further with Solr than with Elastic, but WP integration was weird. And Elastic … No.

Installing it was weirdly easy. The problem was I could not find any information about configuring it. People say “You must secure it.” Okay, sure, I can do that… But no one sat and explained why you want the nodes, what they do together, why you want them on separate servers (or even that you do), and… Honestly I wanted to throw my laptop out the window.

It does, mind you, bring up an important note. Search storage and Elastic services are expensive. Even Jetpack, who offers a bundled Elasticsearch integration (yes, that’s what Jetpack Search is) would cost a site with 10,000 posts around $600 a year. Even using Amazon’s Elasticsearch it’s going to run you a lot. How much? Well if you just toss in their defaults and accept the large settings, it’s to the tune of $22 a day. Give or take. Small settings for my site? Around $8 a day.

ElasticPress (which I do recommend if you’re using Elastic) starts at $79 a month. Jetpack’s new search is free for small sites, but for mine (again, we’re over 10k posts) it would begin at $60.

Algolia though … 12k records is about $3 a month. And it’s all search.
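
Putting those on the same yearly scale makes the gap obvious. A quick back-of-the-napkin sketch using only the rough prices quoted above (the labels are mine, the numbers are approximate, and Jetpack uses the ~$600 a year figure):

```php
<?php
// Rough yearly cost in USD, built only from the approximate prices quoted in this post.
$yearly = array(
	'Amazon Elasticsearch (large defaults)' => 22 * 365, // about $22 a day
	'Amazon Elasticsearch (small settings)' => 8 * 365,  // about $8 a day
	'ElasticPress (starting tier)'          => 79 * 12,  // $79 a month
	'Jetpack Search (~10k posts)'           => 600,      // about $600 a year
	'Algolia (~12k records)'                => 3 * 12,   // about $3 a month
);

// Cheapest first, keeping the labels attached to their totals.
asort( $yearly );

foreach ( $yearly as $service => $cost ) {
	printf( '%-40s $%d' . PHP_EOL, $service, $cost );
}
```

That works out to roughly $36 a year for Algolia against $948 for ElasticPress and north of $2,900 for even the small Amazon setup.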

Enter Algolia

The name meaning delights me:

inability to speak due to mental deficiency or a manifestation of dementia.

Because when you are searching, you often feel like you’re losing your mind and you have a problem. Search is hard okay? There’s a reason AltaVista, Lycos, Yahoo, and now Google are important. Searching is crazy weird and hard and sometimes it’s faster to go “lezwatchtv ACTOR NAME” than to search on our site.

That was not good at all.

Algolia is one of the more straightforward setups I’ve had in a while.

  1. Register on algolia.com
  2. Spin up a new app
  3. Install WP Search with Algolia
  4. Add in your keys
  5. Tell it what to search
  6. Tell it if you want auto-complete
  7. Tell it if you want a new search-result page
  8. Index
  9. Done

Oh you know there’s a little more.

Reducing Records

I decided to make my records smaller. Algolia only cares about the number of records, not the size, as long as each record is under 10KB. I have a lot of meta data and a lot of records. If I were to index everything, I’d be around 15k records, which isn’t bad but I really only needed about 12k of them.

One of the odd things the plugin does is that it uses separate indexes for Auto-Complete. So I could store all my searchable posts and all my shows, characters, etc etc. Which would make for 50k records, and I didn’t want that. Sure it makes some aspects of search easier, but I knew I could do this a better way.

I started by making a plugin and removing some records:

// Both filters share one callback that always answers "no".
add_filter( 'algolia_should_index_user', 'my_prefix_algolia_never_index' );
add_filter( 'algolia_should_index_term', 'my_prefix_algolia_never_index' );

function my_prefix_algolia_never_index() {
	return false;
}

This tells the plugin “Never index users or taxonomies.” Most of you will want the taxonomies! I don’t, mostly because they don’t impact how people search really. And yes, I did study my logs. No one cares who wrote what for my site, and that’s okay.
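
The same pattern works for individual posts or pages, which covers the “Don’t search page X” wish from earlier. A minimal sketch, assuming the plugin’s algolia_should_index_searchable_post filter hands you the current decision plus the post object. The IDs and the function name are made up for illustration:

```php
<?php
// Hypothetical post IDs to keep out of the searchable index.
const MY_PREFIX_EXCLUDED_IDS = array( 42, 1337 );

function my_prefix_algolia_skip_posts( $should_index, $post ) {
	// Respect an earlier "no" from the plugin or another filter.
	if ( ! $should_index ) {
		return false;
	}

	// Index everything except the IDs listed above.
	return ! in_array( (int) $post->ID, MY_PREFIX_EXCLUDED_IDS, true );
}

// Only hook in when WordPress is loaded, so the file is harmless on its own.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'algolia_should_index_searchable_post', 'my_prefix_algolia_skip_posts', 10, 2 );
}
```

Remember to re-index after adding a filter like this, since existing records won’t vanish on their own.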

Refining Search Results

Next I needed to make sure that autocomplete (which I use) and the search page both put the right content to the top.

There is one and only one ‘flaw’ with Algolia, and that’s that they don’t make it easy to define a ‘perfect’ match. I have a case where I have 5 post types (posts, pages, shows, actors, characters) and there’s crossover. If I search for “One Day at a Time” I get everything that mentions it. Which is not what I wanted. And while the title of the post I wanted to find was the TV show “One Day at a Time”, it was bringing up my blog posts (and the page!) first.

This was solvable because the plugin is amazing. I filtered and told it what attributes to remove:

add_filter( 'algolia_post_shared_attributes', 'my_prefix_algolia_attributes', 10, 2 );
add_filter( 'algolia_searchable_post_shared_attributes', 'my_prefix_algolia_attributes', 10, 2 );

// The function name has to match the callback registered above.
function my_prefix_algolia_attributes( array $attributes, WP_Post $post ) {

	// Remove things we're not using to make it easier.
	$remove_array = array( 'taxonomies_hierarchical', 'post_excerpt', 'post_modified', 'comment_count', 'menu_order', 'taxonomies', 'post_author', 'post_mime_type' );
	foreach ( $remove_array as $remove_this ) {
		if ( isset( $attributes[ $remove_this ] ) ) {
			unset( $attributes[ $remove_this ] );
		}
	}
	return $attributes;
}

This kept my records small enough, which mattered because I had some math to do.

That same attributes callback also needed to promote certain posts over others, so I added in a switch using some data I already saved:

// Add data for individual ranking (this sits inside the callback, before the return).
switch ( $post->post_type ) {
	case 'post_type_shows':
		// Base score on show score + 50. Post meta comes back as a string, so cast it first.
		$attributes['score'] = round( (float) get_post_meta( $post->ID, 'lezshows_the_score', true ), 2 );
		$attributes['score'] = 50 + (int) $attributes['score'];
		break;
	case 'post_type_characters':
	case 'post_type_actors':
		// Characters and actors share the same flat score.
		$attributes['score'] = 150;
		break;
	default:
		$attributes['score'] = 0;
		break;
}

This adds an attribute of ‘score’ based on post type. I could weigh them up or down as I wanted.
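
If you want to poke at the trimming and scoring without a WordPress install, the two steps can be sketched as one plain function. This is an illustration rather than the code running on the site: the function name is hypothetical, and the $show_score argument stands in for the ‘lezshows_the_score’ post meta value.

```php
<?php
// Sketch only: trims unused attributes and adds a score, with no WordPress calls.
// $show_score is a stand-in for the value get_post_meta() would return.
function my_prefix_algolia_build_attributes( array $attributes, string $post_type, $show_score = 0 ): array {
	// Drop the attributes we never search or display.
	$remove     = array( 'taxonomies_hierarchical', 'post_excerpt', 'post_modified', 'comment_count', 'menu_order', 'taxonomies', 'post_author', 'post_mime_type' );
	$attributes = array_diff_key( $attributes, array_flip( $remove ) );

	switch ( $post_type ) {
		case 'post_type_shows':
			// Whole-number part of the show score, plus a base of 50.
			$attributes['score'] = 50 + (int) round( (float) $show_score, 2 );
			break;
		case 'post_type_characters':
		case 'post_type_actors':
			$attributes['score'] = 150;
			break;
		default:
			$attributes['score'] = 0;
			break;
	}

	return $attributes;
}

$record = my_prefix_algolia_build_attributes(
	array(
		'post_title'  => 'One Day at a Time',
		'post_author' => 'someone',
	),
	'post_type_shows',
	'72.5' // Made-up score, passed as a string the way post meta arrives.
);
// $record keeps post_title, loses post_author, and gains a score of 122.
```

A plain blog post run through the same function gets a score of 0, which is exactly why it stops outranking the show.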

Then I went into Algolia’s admin, and this is where the search tool becomes a champ. Under Indices -> Configuration, I changed up the Ranking and Sorting:

Their default is: ["typo","geo","words","filters","proximity","attribute","exact","custom"]

Mine is: ["exact","score","post_title","attribute","post_type_label","typo","proximity","words","is_sticky","post_date"]

This actually handled 90% of what I needed without any custom tweaks or rules.
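
If you’d rather keep part of that in code than in the dashboard, here’s a rough sketch. It assumes the plugin’s algolia_searchable_posts_index_settings filter and Algolia’s customRanking setting, and the function name is made up. Note that by default Algolia applies custom ranking last, as a tie-breaker, so this alone won’t reproduce the ordering above.

```php
<?php
// Sketch: put the 'score' attribute first in the index's custom ranking.
// Assumes the settings array uses Algolia's 'customRanking' key.
function my_prefix_algolia_rank_by_score( array $settings ): array {
	$custom = isset( $settings['customRanking'] ) ? (array) $settings['customRanking'] : array();

	// Drop any earlier score rule so it is not listed twice, then put it first.
	$custom = array_values( array_diff( $custom, array( 'desc(score)' ) ) );
	array_unshift( $custom, 'desc(score)' );

	$settings['customRanking'] = $custom;

	return $settings;
}

// Only hook in when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'algolia_searchable_posts_index_settings', 'my_prefix_algolia_rank_by_score' );
}
```

Whether the plugin pushes these settings to an index that already exists depends on the plugin version, so check the dashboard after you re-index.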

But weight, there’s more!

Those ‘Attributes’ are the searchable parts of the attributes I was messing with in the refinements section. Most of what they’re used for is helping rank and sort the relevant data to make sure Sara Lance is on top. Which she always is. But. I always wanted to make some related data show up.

By default, the searchable attributes were title and content. I added in a new attribute called lwtv_meta and in it I added more data. When the index is built for a character (say), it adds a list of all the actors who play the character and all the shows they’re on into that meta attribute. Then I added that attribute to the search. This means if you look for “Legends of Tomorrow” you will see our girl Sara Lance.
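
A minimal sketch of how that meta attribute might be put together. The helper name and the input arrays are hypothetical, since how the site looks up a character’s actors and shows isn’t covered here. In the real plugin this would happen inside the attributes callback:

```php
<?php
// Sketch: flatten related names into one searchable string for a character record.
// $actors and $shows are hypothetical inputs; the lookups behind them are not shown.
function my_prefix_build_lwtv_meta( array $actors, array $shows ): string {
	$names = array_merge( $actors, $shows );

	// Tidy whitespace, drop empty entries, and avoid repeating a name.
	$names = array_filter( array_map( 'trim', $names ), 'strlen' );

	return implode( ', ', array_unique( $names ) );
}

$attributes              = array( 'post_title' => 'Sara Lance' );
$attributes['lwtv_meta'] = my_prefix_build_lwtv_meta(
	array( 'Caity Lotz' ),
	array( 'Arrow', 'Legends of Tomorrow' )
);
```

The attribute does nothing until it’s also listed as searchable in the index configuration, which is the “added that attribute to the search” step.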

That has a small side effect though… Where’s the show!?

So I still have some kinks to work out, but the point is that with a couple tweaks and some extra data, I got everything set up in 3 days. Bonus? The plugin came with templates I quickly tweaked to match my theme. And I’m bad at design!

Algolia? Let’s be demented together!