Lesbians Eat Data

Debugging why the data on my site killed my server was astounding.

The original title was “Lesbians Broke Jetpack” but it turned out to be even more complicated than all that. And thankfully more rare.

This concerns three things.

1) The website lezwatchtv.com
2) Jetpack for WordPress
3) ElasticSearch

On Sept 6th, Jetpack released a new version, 4.3, and I promptly upgraded. When I did, I started getting weird emails from my server warning of a “Suspicious process running under user lezwatchtv,” and the content looked like this:

Executable:

/usr/bin/php-cgi

Command Line (often faked in exploits):

/usr/bin/php-cgi /home/lezwatchtv/public_html/wp-cron.php


Network connections by the process (if any):

tcp: MYIP:39734 -> JETPACKIP:443

Being a proper code-nerd, I backed out a few things and tried again. Same error. I went into my process watcher and saw five processes calling wp-cron.php for that domain, but no others on the server. I killed the processes and turned off WordPress cron. Everything was fine. Then I installed WP Crontrol and manually kicked off cron jobs until it happened again.

The culprit was a ‘runs every minute’ job by Jetpack, which struck me as bewildering.

		if ( ! wp_next_scheduled( 'jetpack_sync_cron' ) ) {
			// Schedule a job to send pending queue items once a minute
			wp_schedule_event( time(), '1min', 'jetpack_sync_cron' );
		}
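For context, WordPress only ships a handful of built-in cron intervals (hourly, twicedaily, daily and the like), so a recurrence named '1min' has to be registered through the cron_schedules filter before wp_schedule_event() will accept it. Here is a minimal sketch of what such a registration could look like. The function name lwtv_add_one_minute_schedule and the display label are made up for illustration, and this is not Jetpack's actual code.

```php
<?php
/**
 * Hypothetical callback: adds a once-a-minute interval to the cron schedules.
 * Only the '1min' key comes from the snippet above; the rest is an assumption.
 */
function lwtv_add_one_minute_schedule( array $schedules ): array {
	if ( ! isset( $schedules['1min'] ) ) {
		$schedules['1min'] = array(
			'interval' => 60,             // seconds between runs
			'display'  => 'Every minute', // label shown in tools like WP Crontrol
		);
	}
	return $schedules;
}

// Only hook in when WordPress is loaded, so this file also runs on its own.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'cron_schedules', 'lwtv_add_one_minute_schedule' );
}
```

Leaving an existing '1min' entry alone means a plugin that already defines the interval keeps its own settings.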

The sync job is meant to update your data on Jetpack’s servers, which makes sense, and running every minute will copy up everything that changed in that minute. It seemed a little heavy to me, and disabling it stopped my runaway cron jobs. That meant the sync itself was what was failing. I reached out to a Jetpack tech and explained the situation. He re-ran the sync manually and it stalled.

We determined the likely issue was that the job was, for some reason, hanging and unable to finish, so it would just stay active forever. And ever. And since it would see that the sync had never finished, it would start up all over again until, finally, my server killed the five (yes, five) processes and sent me an angry text about it. Yes, my server texts me.
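One plausible reading of the pile-up, offered as an assumption rather than a diagnosis: WordPress guards wp-cron.php with a lock that goes stale after WP_CRON_LOCK_TIMEOUT seconds, which is 60 by default and so the same length as this job's interval. If a run hangs, its lock expires just as the next minute's firing arrives, and a fresh process starts next to the stuck one. The toy model below counts how many runs would start in a window if every run hangs. The function name and parameters are hypothetical, and the lock is a plain variable, not WordPress's real one.

```php
<?php
/**
 * Counts how many runs start inside a time window when every run hangs
 * and never releases its lock. All names here are hypothetical.
 *
 * @param int $lock_ttl Seconds before a held lock is treated as stale.
 * @param int $interval Seconds between cron firings.
 * @param int $window   Length of the observation window in seconds.
 */
function lwtv_count_hung_runs( int $lock_ttl, int $interval, int $window ): int {
	if ( $interval <= 0 ) {
		throw new InvalidArgumentException( 'Interval must be positive.' );
	}
	$lock_expires_at = 0;
	$started         = 0;
	for ( $now = 0; $now < $window; $now += $interval ) {
		if ( $now < $lock_expires_at ) {
			continue; // lock still held by a hung run, so this firing is skipped
		}
		$lock_expires_at = $now + $lock_ttl; // take over the stale (or free) lock
		$started++;                          // this run starts, then hangs
	}
	return $started;
}

echo lwtv_count_hung_runs( 60, 60, 300 ), "\n";  // 5
echo lwtv_count_hung_runs( 300, 60, 300 ), "\n"; // 1
```

With a 60 second lock, five minutes of firings leave five hung processes. A longer timeout would only slow the pile-up, since the run that hangs still has to be fixed.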

At this point I emailed support with full details and got a very insightful reply from Brandon Kraft:

I’m interested in if there’s an issue with the server connecting with WP.com (seems unlikely given your other sites sound fine), if there’s a large amount of postmeta or something like that that is throwing a wrench into the system, or something to that effect. We’ve isolated some odd cases where when there is either a lot of postmeta or something yet undetermined in postmeta breaks things in a way similar to what you saw.

DING!

See, there are 40 posts, 22 pages, and then 1,246 custom post type entries on LezWatchTV.

906 of those are ‘characters’, and every character has three separate taxonomies, two plain text post-meta values, and two serialized ones. The other 340 are ‘shows’, each with two taxonomies, three plain text post-meta values, three integer (stored as plain text) post-meta values, one true/false, six HTML, and one serialized value.
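To put a rough number on that, here is a back-of-the-envelope tally. It assumes every field is filled in on every post and stored as one post-meta row each, which may overstate things, and it ignores the meta WordPress adds on its own. The function and array names are illustrative only.

```php
<?php
// Counts taken from the paragraph above. Taxonomies are left out because
// WordPress stores them in the term tables, not in post meta.
$post_types = array(
	'characters' => array( 'posts' => 906, 'meta_fields' => 2 + 2 ),             // plain text + serialized
	'shows'      => array( 'posts' => 340, 'meta_fields' => 3 + 3 + 1 + 6 + 1 ), // text, integer, true/false, HTML, serialized
);

/** Sums posts and (posts x fields per post) across all post types. */
function lwtv_estimate_meta_rows( array $post_types ): array {
	$totals = array( 'posts' => 0, 'meta_rows' => 0 );
	foreach ( $post_types as $type ) {
		$totals['posts']     += $type['posts'];
		$totals['meta_rows'] += $type['posts'] * $type['meta_fields'];
	}
	return $totals;
}

$totals = lwtv_estimate_meta_rows( $post_types );
printf( "%d posts, up to %d meta values\n", $totals['posts'], $totals['meta_rows'] );
// prints "1246 posts, up to 8384 meta values"
```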

So if I was going to point at “a site with a lot of weird post meta” I would pick this site.

I spent a few hours on the 7th (the day after the release) beta testing their 4.3.1 version. We tried a patch for the bug where full sync wasn’t giving up on wp error. That helped a little, but the error kept happening, limiting itself to two or three processes. I pointed to a special API, I ran some weird wp shell commands, and all we came up with was that at 190 or so ‘chunks’ out of 443, my server would stop sending messages to Jetpack’s servers.

Eventually I zipped up a copy of the theme and plugins and a sanitized DB (all secret information removed) and sent it over for them to play with. And they reproduced it! That was good. It meant it wasn’t my server, but it was my setup and the way Jetpack’s sync worked.

Like everything that has to sync, Jetpack plays the game between ‘sync it all super fast’ and ‘don’t kill the server.’ The way they sync the posts, they apply filters to render the content, including embeds. Rendering an embed triggers update_post_meta to refresh the _oembed_time_{md5_hash} value, which records when the embed was last fetched so WordPress knows when to refresh the cached embed code.

Wasn’t I just talking about post meta the other day? Why yes! I was talking about optimizing post meta for search! The interesting thing about that is, since I’m using ElasticPress, it scans all my post meta for updates so it knows what to save as searchable data. That means when Jetpack triggers the update, it triggers ElasticPress, and all hell breaks loose.
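The chain reaction can be sketched in a few lines. This is a simplified model, not code from Jetpack, ElasticPress, or WordPress: LwtvMiniHooks is a hypothetical stand-in for the action system, the listener stands in for any plugin that re-indexes a post when its meta changes, and the hook is fired with fewer arguments than the real updated_post_meta action passes.

```php
<?php
/** Tiny stand-in for WordPress's action system; the class is hypothetical. */
final class LwtvMiniHooks {
	private $listeners = array();

	public function on( string $hook, callable $callback ): void {
		$this->listeners[ $hook ][] = $callback;
	}

	public function fire( string $hook, ...$args ): void {
		foreach ( $this->listeners[ $hook ] ?? array() as $callback ) {
			$callback( ...$args );
		}
	}
}

/**
 * Pretends to run a full sync over $posts (post ID => content) and returns
 * the IDs a meta-watching indexer would re-index as a side effect.
 */
function lwtv_simulate_full_sync( array $posts ): array {
	$hooks     = new LwtvMiniHooks();
	$reindexed = array();

	// Stand-in for a search plugin that re-indexes a post when its meta changes.
	$hooks->on(
		'updated_post_meta',
		function ( int $post_id, string $meta_key ) use ( &$reindexed ) {
			$reindexed[] = $post_id;
		}
	);

	foreach ( $posts as $post_id => $content ) {
		// Stand-in for rendering: a URL alone on a line is treated as an embed,
		// and refreshing its cache timestamp counts as a post meta update.
		// The hash suffix on the key is illustrative.
		if ( preg_match( '#^https?://\S+$#m', $content, $match ) ) {
			$hooks->fire( 'updated_post_meta', $post_id, '_oembed_time_' . md5( $match[0] ) );
		}
	}

	return $reindexed;
}

$posts = array(
	101 => "Intro\nhttps://example.com/video\nOutro",
	102 => 'No embeds here.',
	103 => 'https://example.com/clip',
);
echo implode( ',', lwtv_simulate_full_sync( $posts ) ), "\n"; // 101,103
```

The sync only meant to read the posts, yet every post with an embed ends up re-indexed. Multiply that by a full sync over every post on the site and the load adds up quickly.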

But why did this happen now? Because I turned on “Sitemaps” for Jetpack. And when you enable (or disable) a Jetpack Module, it triggers a full sync. This happened to be the first time I’d done that since installing ElasticPress.

I did what any responsible person would do, and wrote this all up and submitted a bug report to ElasticPress. Sadly, for now I’ve disabled ElasticPress until this can be resolved. I can probably turn it back on safely, since I won’t be triggering a full sync any time soon, but since I don’t want to accidentally crash things, I’ve left it off.

And how was your week?

2 replies on “Lesbians Eat Data”

Mika – I’m not anywhere close to being as smart as you are and much of this is over my head, BUT this post speaks to me!

I’ve been having issues with my site since I migrated to a new host. The host tells me I use up the resources. We’ve been going back and forth for months now with tech support. They can’t tell me what it is, just that I’m drawing large amounts of resources. I have Jetpack with a few modules active. I don’t have ElasticPress so that’s not my problem. I’ve got a bunch of plugins and have disabled them all and activated one at a time and still haven’t found the issue.

Not that you are my own personal tech support, but might you happen to have an idea that could point me in the right direction?

The trick is to read the logs. If you’re using a lot of resources then there is a record somewhere of what’s too high. Server error logs, access logs, and even normal stats can all tell you where things are being used. You’ll have to figure out why, though 🙁 If it’s WP, then things like WP Crontrol will tell you what’s running on that end. Good luck.
