Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: debug

  • Custom Columns Search: Not unique table/alias

    Custom Columns Search: Not unique table/alias

    After making Custom Sortable Columns everything was awesome, right?

    Well… Mostly.

    When I tried to sort a search, I got this:

    WordPress database error: [Not unique table/alias: 'wp_postmeta']
    SELECT SQL_CALC_FOUND_ROWS DISTINCT wp_posts.ID 
    FROM wp_posts INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id ) 
    LEFT JOIN wp_postmeta ON wp_posts.ID = wp_postmeta.post_id WHERE 1=1 
    AND (((wp_posts.post_title LIKE '%kate%') OR ( (wp_postmeta.meta_key 
    IN ( 'lezchars_actor', 'lezshows_worthit_details', 'lezshows_plots', 
    'lezshows_episodes', 'lezshows_realness_details', 'lezshows_quality_details', 
    'lezshows_screentime_details' ) ) AND (wp_postmeta.meta_value LIKE '%kate%') ) 
    OR (wp_posts.post_excerpt LIKE '%kate%') OR (wp_posts.post_content LIKE '%kate%'))) 
    AND ( wp_postmeta.meta_key = 'lezchars_type' ) AND wp_posts.post_type = 'post_type_characters' 
    AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR 
    wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR 
    wp_posts.post_status = 'private') GROUP BY wp_posts.ID ORDER BY 
    wp_postmeta.meta_value DESC LIMIT 0, 20
    

    Not good.

    And not actually (really) related to the sortability! No no, I did this to myself by adding in optimized post-meta search. You see, that runs on the front and the back end of WordPress.

    I knew it had to be the search tweaks I’d made when I saw this:

    OR ( (wp_postmeta.meta_key IN ( 'lezchars_actor', 'lezshows_worthit_details', 
    'lezshows_plots', 'lezshows_episodes', 'lezshows_realness_details', 
    'lezshows_quality_details', 'lezshows_screentime_details' ) )
    

    Right away I realized it had to be my custom queries both calling from wp_postmeta and, in doing so, stemming all over each other. Not good at all.

    The fix was relatively simple. All I had to do was move the filters to run only if we’re not in wp-admin.

    /**
     * Only run if we're NOT in the admin screen!
     */
    
    if ( ! is_admin() ) {
    	add_filter( 'posts_join', 'lezwatch_search_join' );
    	add_filter( 'posts_where', 'lezwatch_search_where' );
    	add_filter( 'posts_distinct', 'lezwatch_search_distinct' );
    }
    

    This meant that the internal post search wouldn’t query those fields, but in the grand scheme of things, that’s okay. Generally when I’m looking for a post in the search, I’m looking for it by title. If I wanted to be more particular, I could have it not use my custom joins when I’m on a post listing page, but allow it if I’m editing a post (so adding a new link would search all that content too).

    For now, this is what I need.

  • Lesbians Eat Data

    Lesbians Eat Data

    The original title was “Lesbians Broke Jetpack” but it turned out to be even more complicated than all that. And thankfully more rare.

    This concerns three things.

    1) The website lezwatchtv.com
    2) Jetpack for WordPress
    3) ElasticSearch

    On Sept 6th, Jetpack released a new version – 4.3 – and I promptly upgraded. When I did, I started getting weird emails from my server of a “Suspicious process running under user lezwatchtv” and the content looked like this:

    Executable:
    
    /usr/bin/php-cgi
    
    Command Line (often faked in exploits):
    
    /usr/bin/php-cgi /home/lezwatchtv/public_html/wp-cron.php
    
    
    Network connections by the process (if any):
    
    tcp: MYIP:39734 -> JETPACKIP:443
    

    Being a proper code-nerd, I backed out a few things and tried again. Same error. I went into my process watcher and saw five processes calling wp-cron.php for that domain, but no others on the server. I killed the processes and turned off WordPress cron. Everything was fine. Then I installed WP Crontrol and manually kicked off cron jobs until it happened again.

    The culprit was a ‘runs everyone one minute’ job by Jetpack, which struck me as bewildering.

    		if ( ! wp_next_scheduled( 'jetpack_sync_cron' ) ) {
    			// Schedule a job to send pending queue items once a minute
    			wp_schedule_event( time(), '1min', 'jetpack_sync_cron' );
    		}
    

    The sync job is meant to update your data on Jetpack’s servers, which makes sense, and running every minute will copy up everything that changed in each minute. It seemed a little heavy to me, and disabling it stopped my run-away cron jobs. That meant the sync was failing. I reached out to a Jetpack tech and explained the situation. He re-ran the sync manually and it stalled.

    We determined the likely issue was that the job was, for some reason, hanging and unable to finish, so it would just stay active forever. And ever. And since it would see that the sync had never done, it would start up all over again until, finally, my server killed the five (yes, five) processes and sent me an angry text about it. Yes, my server texts me.

    At this point I emailed support with full details and got a very insightful reply from Brandon Kraft:

    I’m interested in if there’s an issue with the server connecting with WP.com (seems unlikely given your other sites sound fine), if there’s a large amount of postmeta or something like that that is throwing a wrench into the system, or something to that effect. We’ve isolated some odd cases where when there is either a lot of postmeta or something yet undetermined in postmeta breaks things in a way similar to what you saw.

    DING!

    See there are 40 posts, 22 pages and then 1246 Custom Post posts on LezWatchTV.

    906 posts are ‘characters’ and all characters have three separate taxonomies, two plain text post-meta values, and two serialized. 340 posts are ‘shows’ with two taxonomies, three plain text post-meta values, three integer (plain text) post-meta values, one true/false, six HTML, and one serialized data.

    So if I was going to point at “a site with lot of weird post meta” I would pick this site.

    I spent a few hours on the 7th (the day after the release) beta testing their 4.3.1 version. We tried a patch for the bug where full sync wasn’t giving up on wp error. That helped a little, but the error kept happening, limiting itself to two or three processes. I pointed to a special API, I ran some weird wp shell commands, and all we came up with was that at 190 or so ‘chunks’ out of 443, my server would stop sending messages to Jetpack’s servers.

    Eventually I zipped up a copy of the theme and plugins and a sanitized DB (all secret information removed) and sent it over for them to play with. And they reproduced it! That was good. It meant it wasn’t my server, but it was my setup and the way Jetpack’s sync worked.

    Like everything that has to sync, Jetpack plays the game between ‘sync it all super fast’ and ‘don’t kill the server.’ The way they sync the posts, they apply filters to render the content, including embeds. Because it does that with embeds, it triggers update_post_meta to update the _oembed_time_{long_base64_string} value, so it can know when to update the embed code for best caching.

    Wasn’t I just talking about post meta the other day? Why yes! I was talking about optimizing post meta for search! The interesting thing about that is, since I’m using ElasticPress, it scans all my post meta for updates so it knows what to save as searchable data. That means when Jetpack triggers the update, it triggers ElasticPress, and all hell breaks loose.

    But why did this happen now? Because I turned on “Sitemaps” for Jetpack. And when you enable (or disable) a Jetpack Module, it triggers a full sync. This happened to be the first time I’d done that since installing ElasticPress.

    I did what any responsible person would do, and wrote this all up and submitted a bug report with ElasticPress. Sadly for now I’ve disabled ElasticPress until this can be resolved. I can probably turn it back on safely, since I won’t be triggering a full sync any time soon, but since I don’t want to accidentally crash things, I’ve left it off.

    And how was your week?

  • Debugging cPanel’s Default Webpage

    Debugging cPanel’s Default Webpage

    It started with a weird email from someone complaining that a 5 year old link was broken. They were trying to go to tech.ipstenu.org. I don’t, and haven’t used that since maybe 2011 or so. That was when I bought halfelf.org you see. I knew the domain should be forwarding, I set that up a million years ago, but for some reason it wasn’t. I told him the right URL and went back to puttering around.

    But it bugged me, you know?

    And later that day, half my domains started spazzing. It turned out they were still pointing to the ‘temporary’ name servers, ns3 and ns4. I cleaned up my DNS zones and rebuilt them (thank you Dan E. from Liquidweb) but for some reason it was still derping.

    Now… as you know, I set up AutoSSL and Let’s Encrypt, like a good internet monkey.

    In the middle of all this shit, I thought to myself ‘Self, I should fix having a subdomain as an add-on which I don’t need anymore now that we have this set up!’ I deleted store.halfelf.org as an add-on and put it back properly as a named subdomain.

    Then I went and properly re-ran the AutoSSL check…

    Errors:

    3:43:30 AM WARN The domain “store.halfelf.org” has failed domain control validation (The system failed to fetch the <abbr title="Domain Control Validation">DCV</abbr> file at “<a href="http://store.halfelf.org/3712.BIN_AUTOSSL_CHECK_PL__.MREaLFbJJfusZuQX.tmp">http://store.halfelf.org/3712.BIN_AUTOSSL_CHECK_PL__.MREaLFbJJfusZuQX.tmp</a>” because of an error: The system failed to send an <abbr title="Hypertext Transfer Protocol">HTTP</abbr> “GET” request to “http://store.halfelf.org/3712.BIN_AUTOSSL_CHECK_PL__.MREaLFbJJfusZuQX.tmp” because of an error: SSL connection failed for store.halfelf.org: hostname verification failed .). at bin/autossl_check.pl line 449.
    

    I read down and saw I had this error for ALL the bad domains. Coincidence? I think not. And neither do you, right? Right.

    I did what you do and Googled and Googled and came across people saying that it was Sucuri (nope) or some other CloudFlare type firewall (nope), and then I thought about the crux of the error. “SSL connection failed” is a pretty distinct error, I felt. And of course the SSL connection failed, there wasn’t a certificate yet! So why was it trying to get to SSL right away?

    And then I remembered … I have this in my .htaccess

    # Force non WWW and SSL for everyone.
    <IfModule mod_rewrite.c>
            RewriteEngine On
    
            RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
            RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    
            RewriteCond %{HTTPS} off
            RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    <IfModule mod_rewrite.c>
    

    Which MEANS when it goes to http://store.halfelf.org, and doesn’t get the proper reply, it redirects to https which is the bad page that cPanel always does.

    Oh yes.

    Deleted those lines, re-ran AutoSSL, and it works.

    Picard, Riker, and Worf facepalm.

    Okay, smarty, what’s the real fix? Because as much as I want to leave this in place, I’ll have to remember to turn it off every time I add a new domain or subdomain to the system, and while that’s rare, it’s the rare cases that cause the most problems (thank you Herbert Hecht).

    I looked back at the error and recognized the pattern being repeated: .BIN_AUTOSSL_CHECK_PL__. I saw it all over the place. I also knew that the folder AutoSSL puts down for LE is .well-known/acme-challenge (it’s in your web root). And I also knew this extra thing… I knew .htaccess

    My new rule:

    # Force non WWW and SSL for everyone.
    <IfModule mod_rewrite.c>
    	RewriteEngine On
    
    	RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    	RewriteCond %{REQUEST_URI} !^/\d+\.BIN_AUTOSSL_CHECK_PL__\.\w+\.tmp$
    	RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/
    	RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    
    	RewriteCond %{HTTPS} off
    	RewriteCond %{REQUEST_URI} !^/\d+\.BIN_AUTOSSL_CHECK_PL__\.\w+\.tmp$
    	RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/
    	RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    <IfModule mod_rewrite.c>
    

    Ironically, once I sorted all that out and understood I needed to whitelist things for AutoSSL and LE, I was able to Google and find an answer. cPanel knows about the issue and has a case open to fix it for everyone.

    Still, I’m leaving that code in place for the one account that tends to add subdomains often enough that I would need this, and not-often enough that I’d remember.

  • Backtrack to Clean Code

    Backtrack to Clean Code

    I was watching The Bletchley Circle, about four women who were part of the code breakers in World War II, and how they stumbled upon a serial killer because only they could see the patterns. In the third episode of the first season, the main character is trying to explain why understanding the killer, Crowley, from before he started killing, and she says the following:

    At Bletchley, when we came across corrupted data, we had to backtrack till we hit clean code. That’s how you find an error in the pattern. All Crowley’s giving us is corrupted data.We need to backtrack to before he was killing. We need to start from there. That’s how we’ll find him.

    I’d never thought of it in those words, but that’s exactly right.

    When we debug code, when we find errors, we always backtrack to clean code. Most of us aren’t trying to find psychopaths and serial killers, of course. What we’re trying to do is find the patterns and understand what went wrong. And many times, we’re trying to find patterns when the telling of the breaking doesn’t lend itself to any patterns.

    Think about how you describe a situation, how you explain what’s broken. You start with your part. “I was trying to do X.” Then you explain what you expected to happen. “Normally that makes the color blue.” Next you say what did happen. “Instead, it made the color red.”

    That’s all well and good, except there’s a great deal missing. Some of it will be pertinent and some won’t. Some will be overkill and useless signal to noise, and some minutiae will be just what is needed to solve a problem. The difficulty is that you may not know what happened that is important. If all you know is ‘I upgraded WordPress’ for example, then you may not be aware of all the changes that went into the WP_Http API. You may not know about the new Multisite functions.

    If you’re not a developer, reading the field guide for WordPress 4.6 RC1, and all the linked posts, and did a compare of 4.5.3 to 4.6-RC1, then maybe you’d be surprised when your plugin breaks. And while you thought well of yourself for testing on the release candidate, you’re stunned at how much changed, and not sure what on earth happened.

    So you backtrack. You know that the magic sauce is in the requests sent to the server. And you know you’re using wp_remote_request() to do it. So you look at anything related to that. What does it call? Did that change? You step back and back until you find as much as you can, and when you’ve determined it’s ‘something,’ you reach out for help.

    In WordPress, this is why we tell people to switch to default themes or disable plugins. We’re asking people to backtrack to code we know is clean. We can’t read minds and know the little things. So we ask people to backtrack in the most obvious ways. “Does it happen with all the other plugins off?”

    Backtracking to clean code.

  • Debugging Unexpected Output

    Debugging Unexpected Output

    It’s going to happen one day. You’re going to get that weird error when you activate your plugin to test it and you will have no idea what it means.

    What error?

    The plugin generated 306 characters of unexpected output during activation...

    Unexpected output.

    The problem is WordPress doesn’t tell you what did it, where, or why, not even if you have debug turned on. You do have debug turned on, right? Well I was stumped on this one. I checked all my PHP calls to make sure I didn’t have whitespace, I looked for any files that were accidentally saved UTF-8, and I checked and double checked my diffs between the good and bad versions.

    What I ended up doing was writing a function that saved the error and output it.

    /* For debugging only */
    if (defined('WP_DEBUG') && true === WP_DEBUG) { 
        function myplugin_activated_plugin_error() {
            update_option( 'myplugin_error',  ob_get_contents() );
        }
        function myplugin_deactivated_plugin_error() {
            delete_option( 'myplugin_error' );
        }
        add_action( 'activated_plugin', 'myplugin_activated_plugin_error' );
        add_action( 'deactivated_plugin', 'myplugin_deactivated_plugin_error' );
        
    	function myplugin_message_plugin_error() {
    	    ?>
    	    <div class="notice notice-error">
    	        <p><?php echo get_option( 'myplugin_error' ); ?></p>
    	    </div>
    	    <?php
    		}
    	if( get_option( 'myplugin_error' ) ) {
    		add_action( 'admin_notices', 'myplugin_message_plugin_error' );	
    	}
    }
    

    That results in a pretty kind of output:

    A pretty version of my errors!

    I like to use the pretty display so I can easily read what, exactly I messed up and how.

  • Mailbag: Why Do Cannonical URLs Mess Me Up?

    Mailbag: Why Do Cannonical URLs Mess Me Up?

    This came from a DreamHoster who said it was okay to blog it if I didn’t mention them by name. They were embarrassed about the mistake.

    Why was my site having a redirect loop, and why did changing the domain to force www in panel fix it?

    The ‘panel’ being mentioned here is DreamHost’s Panel. We don’t have cPanel, we have PANEL, and it’s our special, 100%, tool. In there, you can edit your domain settings. One of the settings has to do with your domain settings. Simply it asks “Do you want the www in your URL?”

    DreamHost Panel: Do you want www or not?

    In general I check “no” because I’m a no-www kind of person. I don’t like it.

    The issue this person was faceing was that you’d go to http://www.wordpress.dev and everything was fine, but his site went all over the place weird when you went to

    In order to fix that person’s website, I changed “Leave it alone” to “Add WWW” (their preference, not mine).

    Why did that work? Let’s understand the issue first. What was happening was that your server generally doesn’t care if you visit http://www.wordpress.dev or http://wordpress.dev because it knows that they both mean /home/user/wordpress/public_html/ (or whatever). But! Some servers (like DreamHost) let you prioritize and redirect. If they don’t, you can use nginx/htaccess to force www or not. As far as the server cares, none of this matters. If you force no-www and someone goes to www, they get redirected and everyone’s happy.

    Except WordPress. WordPress is picky. WordPress has special settings:

    Your home and site URLs in WP

    Recognize that? If you have WordPress set to use www and you go to the non-www URL, it won’t redirect you. It will, however, force all internal links and generated paths to have the www. The same thing happens if you don’t force https for your site. WordPress will still serve http://wordpress.dev but all the links on that page would be to https://wordpress.dev and so will your stylesheets and javascript.

    This may make you think that WordPress doesn’t care. You’d be wrong. WordPress cares deeply, and it shows up when you go a URL like http://wordpress.dev/i-am-fake/ – assuming you don’t have a page named that. If you’ve forced www, WordPress will look for that page and do a 404 redirect. Suddenly you’ll have www!

    Except in some cases, what happens is WordPress accepts the URL you gave it and shows without the www. Then it remembers it should have the www and redirects to that. Only it knows you told it non-www and redirects to that. And you get an endless 301 redirect loop. And you cry.

    If you force www in DreamHost’s Panel and you force it in htaccess and you set it your settings, this still may not be enough! You may have to do a search and replace to change all the non-www urls to www!

    Amusingly, when WordPress 4.4.1 dropped, we found a rare race condition with http and https and WooCommerce, born from a simple mistake.

    Woo lets you force http on post-checkout pages

    Everyone I know who looked at that went “Oh, right, if my site has https in the home and site URLs, maybe I shouldn’t try to force http here!” But Woo lets you shoot yourself in the foot. Or at least they did. There’s a ticket open where I suggested perhaps they prevent that.

    And for what it’s worth, I totally get how these things happen. You set up your domain and your WordPress site and your http/https settings at different times. It’s understandable that in 2013 you didn’t have SSL, but you added it in 2014. And at that time, you only put SSL on your checkout pages (right?) and since you wanted your cache to work better, you said “no https for checkout’ed!” That’s perfectly sane and logical. But since LetsEncrypt and HTTPS Everywhere became big in 2015, you changed the whole site over and simply forgot about that one, teensy, toggle…

    At least, until someone got that horrible 301 redirect on checkout.

    Like I told the people I fixed, don’t kick yourself over this. You have a lot of moving parts and the secret is understanding how each bit works. If you know that you set Panel properly, and .htaccess properly, and site/home URLs properly, and it only happens on checkout, you can zero in on what’s left and debug that.