Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: cron

  • Automate Your Site Checks with Cron (and WPCron)

    I have a self-hosted healthchecks.io instance (mentioned previously), and I use it to make sure all the needful cron jobs for my site actually run. I have it installed via Docker, so it’s not super complex to update, and that’s how I like it.

    The first cron jobs I monitored were the ones I have set up in my crontab on the server:

    1. Run WP ‘due now’
    2. Set daily random ‘of the day’
    3. Download an iCal file
    4. Run a nightly data validity check

    I used to run these with WP Cron, but it’s a little too erratic for my needs. This is important. Remember it for later; it’ll come back up.

    Once I added in those jobs, I got to thinking about the myriad WP Cron jobs that WordPress sets up on its own.

    In fact, I have a lot of them:

    +------------------------------------------------+---------------------+-----------------------+---------------+
    | hook                                           | next_run_gmt        | next_run_relative     | recurrence    |
    +------------------------------------------------+---------------------+-----------------------+---------------+
    | rediscache_discard_metrics                     | 2025-04-25 17:51:15 | now                   | 1 hour        |
    | wp_privacy_delete_old_export_files             | 2025-04-25 18:16:33 | 20 minutes 38 seconds | 1 hour        |
    | wp_update_user_counts                          | 2025-04-25 20:30:03 | 2 hours 34 minutes    | 12 hours      |
    | recovery_mode_clean_expired_keys               | 2025-04-25 22:00:01 | 4 hours 4 minutes     | 1 day         |
    | wp_update_themes                               | 2025-04-26 04:57:57 | 11 hours 2 minutes    | 12 hours      |
    | wp_update_plugins                              | 2025-04-26 04:57:57 | 11 hours 2 minutes    | 12 hours      |
    | wp_version_check                               | 2025-04-26 04:57:57 | 11 hours 2 minutes    | 12 hours      |
    [...]
    +------------------------------------------------+---------------------+-----------------------+---------------+
    

    While I could manually add them all to my tracker, the question is how to add the ping to the end of each command.

    The Code

    I’m not going to break down the code here; it’s far too long, and a lot of it is dependent on my specific setup.

    In essence, what you need to do is:

    1. Hook into schedule_event
    2. If the event isn’t recurring, just run it
    3. If it is recurring, see if there’s already a ping check for that event
    4. If there’s no check, add it
    5. Now add the ping to the end of the actual cron event
    6. Run the event

    I actually built out code like that using Laravel recently, for a work-related project, so I had the structure already in my head and I was familiar with it. The problem, though, is that WP Cron is nothing like ‘real’ cron.

    Note: If you really want to see the code, the beta code can be found in the LWTV GitHub repository. It has an issue with getting the recurrence, which is why I made this post.

    When CRON isn’t CRON

    From Wikipedia:

    The actions of cron are driven by a crontab (cron table) file, a configuration file that specifies shell commands to run periodically on a given schedule. The crontab files are stored where the lists of jobs and other instructions to the cron daemon are kept. 

    Which means crontab runs on the server time. When the server hits the time, it runs the job. Adding in jobs with the ping URL is quick:

    */10 * * * * /usr/bin/wp cron event run --due-now --path=/home/username/html/ && curl -fsS -m 10 --retry 5 -o /dev/null https://health.ipstenu.com/ping/APIKEY/due-now-every-10

    This job relies on the server being up and available, so it’s a decent metric. It always runs every ten minutes.

    But WP Cron? The ‘next run’ time (GMT) is weirdly more precise, but less reliable. 2025-04-25 17:51:15 doesn’t mean it’ll run at 5:51pm and 15 seconds GMT. It means that the next time WP Cron is triggered after that timestamp, it will attempt to run the command.
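
    To put numbers on that, here is a minimal sketch. It assumes the only thing triggering WP Cron is a real cron job firing on fixed boundaries (every 600 seconds here) and that no visitor triggers it sooner. The function name next_tick is made up for illustration, and times are Unix timestamps in GMT.

```shell
# Hypothetical helper: round a due timestamp up to the next runner tick.
# $1 = timestamp at which the event becomes due
# $2 = seconds between runner ticks (600 = every ten minutes)
next_tick() (
    due=$1
    every=$2
    echo $(( (due + every - 1) / every * every ))
)

# 2025-04-25 17:51:15 GMT is 1745603475 as a Unix timestamp.
next_tick 1745603475 600   # prints 1745604000, which is 18:00:00 GMT
```

    The event is due at 17:51:15 but, under those assumptions, nothing attempts to run it until the 18:00 tick.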

    Since I have a scheduled ‘due now’ caller every ten minutes, if no one visits the site at 5:52pm (rounding up), then it won’t run until 6pm. That’s generally fine, but HealthChecks.io doesn’t really understand that. More to the point, I’m guesstimating when

    HealthChecks.io has three ways to check time: Simple, Cron, and OnCalendar. In general, I use Cron because while it’s cryptic, I understand it. That said, there’s no decent library to convert seconds (which is what WP uses to store the interval timing) into a cron expression, which means you end up with a mess of if checks.

    A Mess of Checks

    First, pick a decent ‘default’ (I picked every hour).

    1. If the interval is less than 60 seconds, run every minute.
    2. If the interval in seconds is not a multiple of 60, use the default.
    3. Divide seconds by 60 to get minutes.
    4. If the interval is less than an hour (1 to 59 minutes), run every X minutes.
    5. If the interval in minutes is not a multiple of 60, use the default.
    6. Divide minutes by 60 to get hours.
    7. If the interval is less than a day (1 to 23 hours), run every X hours.
    8. If the interval in hours is not a multiple of 24, use the default.
    9. Divide hours by 24 to get days.
    10. If the interval is less than a week (1 to 6 days), run every X days.
    11. If the interval in days is not a multiple of 7, use the default.
    12. Divide days by 7 to get weeks.
    13. If the interval is a week, run every week on ‘today’ at 00:00.

    You see where this is going.
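
    Roughly, those checks boil down to something like the following. It’s a minimal sketch in shell, not the code from the repository. The names interval_to_cron, cron_step and DEFAULT_CRON are made up, and at each level it tests the ‘less than’ case before the ‘not a multiple’ case, so that a 15 minute interval isn’t sent to the default.

```shell
DEFAULT_CRON='0 * * * *'   # assumed default: every hour

# Print '*' for a step of 1, otherwise '*/N'.
cron_step() {
    if [ "$1" -eq 1 ]; then echo '*'; else echo "*/$1"; fi
}

# Hypothetical helper: map an interval in seconds to a cron expression.
interval_to_cron() (
    s=$1
    [ "$s" -lt 60 ] && { echo '* * * * *'; return; }              # under a minute
    [ $(( s % 60 )) -ne 0 ] && { echo "$DEFAULT_CRON"; return; }
    m=$(( s / 60 ))
    [ "$m" -lt 60 ] && { echo "$(cron_step "$m") * * * *"; return; }    # every X minutes
    [ $(( m % 60 )) -ne 0 ] && { echo "$DEFAULT_CRON"; return; }
    h=$(( m / 60 ))
    [ "$h" -lt 24 ] && { echo "0 $(cron_step "$h") * * *"; return; }    # every X hours
    [ $(( h % 24 )) -ne 0 ] && { echo "$DEFAULT_CRON"; return; }
    d=$(( h / 24 ))
    [ "$d" -lt 7 ] && { echo "0 0 $(cron_step "$d") * *"; return; }     # every X days
    [ "$d" -eq 7 ] && { echo "0 0 * * $(date +%w)"; return; }           # weekly, on today's weekday
    echo "$DEFAULT_CRON"                                                # anything longer
)

interval_to_cron 43200   # twice daily: prints 0 */12 * * *
interval_to_cron 86400   # daily: prints 0 0 * * *
```

    Two caveats with this sketch: cron steps restart at each hour, day or month boundary, so a step that doesn’t divide evenly (every 7 minutes, every 5 hours) gives uneven gaps, and anything longer than one week simply falls back to the default.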

    And then there’s the worse part. After you’ve done all this, you have to tweak it.

    Tweaking Timing

    Why do I have to tweak it? Well for example, let’s look at the check for expired transients:

    if ( ! wp_next_scheduled( 'delete_expired_transients' ) && ! wp_installing() ) {
    	wp_schedule_event( time(), 'daily', 'delete_expired_transients' );
    }
    

    This runs every day. Okay, but I don’t know exactly when it’ll run, just that I expect it to run daily. Using my logic above, the cron time would be 0 0 * * * which means … every day at midnight server time.

    But, like I said, I don’t actually know if it’ll run at midnight. In fact, it probably won’t! So I have to set up a grace period. Since I don’t know when in 24 hours something will run, I set it to 2.5 times the interval. If the event runs every day, then I consider it a fail if it hasn’t run in two days and change.
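
    The arithmetic, as a tiny sketch. The function name grace_seconds is made up, the result is in seconds, and shell integer maths rounds down for odd intervals.

```shell
# Hypothetical helper: grace period = 2.5 x the interval, in whole seconds.
grace_seconds() (
    echo $(( $1 * 5 / 2 ))
)

grace_seconds 86400   # daily event: prints 216000, i.e. 2 days 12 hours
grace_seconds 3600    # hourly event: prints 9000, i.e. 2 hours 30 minutes
```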

    I really hate that, but it’s the best workaround I have at the moment.

    Should You Do This?

    Honestly?

    No.

    It’s positively ridiculous to have done in the first place, and I consider it more of a Proof of Concept than anything else. With the way WP handles cron and scheduling, too, it’s just a total pain in the backside to make this work without triggering alerts all the time!

    But at the same time, it does give you a lot more insight into what your site is doing, and when it’s not doing what it should be doing! In fact, this is how I found out that my Redis cache had held on to cron jobs from plugins long since removed!

    There are benefits, but most of the time this is nothing anyone needs.

  • Zap a Daily Tweet

    Last week I told you how I made a random post of the day. Well, now I want to Tweet that post once a day.

    Now there are a lot (a lot) of possibilities to handle something like that in WordPress, and a lot of plugins that purport to Tweet old posts. The problem with all of them was that they used WordPress.

    There's nothing wrong with WordPress

    Obviously. But at the same time, asking WP to do 'things' that aren't its business, like Tweeting random posts, is not a great idea. WordPress is the right tool for some jobs, but not all jobs, after all.

    What is WordPress' job is generating a random post and setting a tracker (transient) to store for a day. And it's also WordPress' job to output that data how I want in a JSON format.

    The rest, we turn to a service. Zapier.

    A Service?

    Like many WordPressers, I like to roll my own whenever humanly possible. In this case, I could have added an OAuth library and scripted a cron job, but that puts a maintenance burden on me and could slow my site down. Since I have the JSON call, all I need is 'something' to do the following:

    1. Every day, at a specific time, do things
    2. Visit a specific URL and parse the JSON data
    3. Craft a Tweet based on the data in 2

    I dithered and kvetched for days (Monday and Tuesday) before complaining to Otto on Tuesday night. He pointed out he'd written those scripts. On Wednesday, he and I bandied about ideas, and he said I should use IFTTT. Even using IFTTT's Maker code, though, the real tool needed is one that lets me code logically.

    Zapier

    The concept of IFTTT is just "If This, Then That." If one thing is true, then do another. It's very simple logic. Too simple. Because what I needed was "If this, then do that, and tell another that." There wasn't an easy way I could find to do it with IFTTT so I went with the more complicated tool.

    Example of what the flow looks like - Trigger is every day, action is GET, final action is tweet

    Three steps. Looks like my little three item'd list, doesn't it?

    The first step is obvious. Set a specific time to run the zap. It's a schedule. The second step is just a web hook saying 'Get the data from URL.' And the third step is aware!

    Showing the example of the tweet, with placeholders for the name and URL

    Pretty nice. If you click on the 'add field' box in the message content (upper right), it knows how to grab the variables from the previous steps and insert them. Which is damn cool.

  • Cron Caching

    WordPress' relationship with cron is touchy. It has its own version, wp-cron, which isn't so much cron as a check, made when people visit your site, for things your site needs to do. The problem is that if no one visits your site… nothing runs. That's why you sometimes have posts that miss schedules.

    One possible solution is to use what we call 'alternate cron' to trigger your jobs. That works pretty well as it means I can tell a server "Every 10 minutes, ping my front page and trigger events."

    But in this case, I didn't want that. I receive enough traffic on this site that I felt comfortable trusting in WP cron, so what I wanted was every hour for a specific page to be visited. This would prompt the server to generate the cached content if needed (if not, it just loads a page).

    WordPress Plugin

    I'm a huge proponent of doing things the WordPress way for WordPress. This method comes with a caveat of "Not all caching plugins will work with this."

    I'm using Varnish, and for me this will work, so I went with the bare simple code:

    class LWTV_Cron {
    
    	public $urls;
    	
    	/**
    	 * Constructor
    	 */
    	public function __construct() {
    
    		// URLs we need to prime the pump on a little more often than normal
    		$this->urls = array(
    			'/statistics/',
    			'/statistics/characters/',
    			'/statistics/shows/',
    			'/statistics/death/',
    			'/statistics/trends/',
    			'/characters/',
    			'/shows/',
    			'/show/the-l-word/',
    			'/',
    		);
    
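    		// Warm the cache whenever the scheduled event fires.
    		add_action( 'lwtv_cache_event', array( $this, 'varnish_cache' ) );
    
    		// Schedule the hourly event once; skip it if one is already queued.
    		if ( ! wp_next_scheduled( 'lwtv_cache_event' ) ) {
    			wp_schedule_event( time(), 'hourly', 'lwtv_cache_event' );
    		}
    	}
    
    	/**
    	 * Request each URL so the cache regenerates the page if needed.
    	 */
    	public function varnish_cache() {
    		foreach ( $this->urls as $url ) {
    			wp_remote_get( home_url( $url ) );
    		}
    	}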
    
    }
    
    new LWTV_Cron();

    Yes it's that site. This very simple example shows that I have a list of URLs (slugs really) I know need to be pinged every hour to make sure the cache is cached. They're the slowest pages on the site (death can take 30 seconds to load) so making sure the cache is caught is important.
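
    If your caching setup doesn't get along with doing this inside WordPress, roughly the same cache-warming loop could be run from a real cron job instead. This is only a sketch, not the plugin above: the function name warm_cache, the FETCH variable and example.com are all placeholders.

```shell
# Command used to request a page. Set FETCH=echo for a dry run.
FETCH=${FETCH:-curl -fsS -o /dev/null}

# Hypothetical helper: request every path under a base URL to prime the cache.
# $1 = base URL without a trailing slash, remaining arguments = paths
warm_cache() (
    base=$1
    shift
    for path in "$@"; do
        # $FETCH is unquoted on purpose so its flags split into separate words.
        $FETCH "${base}${path}" || echo "failed: ${path}" >&2
    done
)

# Example call (uncomment to run against your own site):
# warm_cache https://example.com /statistics/ /characters/ /shows/ /
```

    Nothing is fetched until warm_cache is actually called, and a failed request is reported on stderr without stopping the loop.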

  • MySQL – my.cnf

    This is a fairly rare file, and one I never would have found had I not needed to run a standard SQL process via cron.

    Names have been changed to protect the innocent.

    As the story goes, no matter what I did, I could not get this one app to stop spewing out ‘smart’ quotes. You know the fancy apostrophes and quotes that curl? Well, that’s not normally a problem, like in WordPress I’d just filter it out, but in this locked down system, I didn’t have that option. I called the vendor, and they said “Make sure you don’t paste in smart quotes.”

    That was all fine and dandy for me but I’m not the master of the universe like that. Well, not all the time. I had people to input data for me! They were going to have to manually take the forms (Word Docs), filled in by non-techs, and copy the data into the right places in the app. And you want me to tell them they have to fix this for the non-techs? I thought about how much time that would take, and decided the best fix was to change the forms! Right?

    If you’ve ever worked for a major company, you know why this was about as effective as aspirin for a root canal. No deal. So I decided to get inventive.

    The only time this was a problem, these ugly quotes, was when we ran our weekly reports. This was how I found out about it, a manager complained that there was garbage instead of quotes on the form titles. Ergo: All I need to do is script something to clean them out!

    Enter SQL!

    # REPLACE SMART QUOTES WITH STUPID ONES
    # FIRST, REPLACE UTF-8 characters.
    UPDATE `secretapp_table` SET `formtitle` = REPLACE(`formtitle`, 0xE2809C, '"');
    UPDATE `secretapp_table` SET `formtitle` = REPLACE(`formtitle`, 0xE2809D, '"');
    # NEXT, REPLACE their Windows-1252 equivalents.
    UPDATE `secretapp_table` SET `formtitle` = REPLACE(`formtitle`, CHAR(147), '"');
    UPDATE `secretapp_table` SET `formtitle` = REPLACE(`formtitle`, CHAR(148), '"');
    

    In my testing, if I ran that on formtitle, it cleaned it up for the report. This was a default report in the app, by the way, not something I had any control to change. And you wonder why I love open source? Anyhow, once I knew how this would work, I set about scripting it. I couldn’t hook into any triggers on the app, though, because they don’t like to make it easy.

    Fine, I decided. A crontab time it is! I made this simple script to run at midnight, every night, and clean up the DB:

    #! /bin/bash
    
    mysql -h "dbname-secretapp" "secretapp_db" < "quotecleaner.sql"
    

    It worked when I ran it by hand, but it failed when cron’d. This took me some headbanging, but after reading up on how SQL works, I realized it worked when I ran it as me because I’m me! But cron is not me. I have permissions to run whatever I want in my database. Cron does not. Nor should it! So how do I script it? I don’t want the passwords sitting in that file, which would be accessible by anyone with the CMS to update it.

    I went around the corner to my buddy who was a DB expert, and after explaining my situation (and him agreeing that the cron/sql mashup was the best), he asked a simple question. “Who has access to log in as you?” The answer? Just me and the admins. The updating tool for our scripts was all stuff we ran on our PCs that pushed out to the servers, so no one but an admin (me) ever logged in directly.

    He grinned and wrote this down on a sticky: “.my.cnf”

    Google and a Drupal site told me that it was a file that was used to give the mysql command line tools extra information. You shove it in the home directory of the account, and, well, here’s ours:

    # Secret App user and password
    [client]
    user=secretapp_user
    password=secretapp_password
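
    Since the file holds a password in plain text, it’s worth making sure only the owning account can read it. A minimal sketch, assuming the options belong under a [client] group (which the mysql command line tools read); the function name write_my_cnf is made up.

```shell
# Hypothetical helper: write a client option file readable only by its owner.
# $1 = file path, $2 = database user, $3 = password
write_my_cnf() (
    file=$1
    user=$2
    password=$3
    umask 077   # files created from here on get no group/other permissions
    printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$file"
)

# Example call (uncomment to run):
# write_my_cnf "$HOME/.my.cnf" secretapp_user secretapp_password
```

    The umask only affects newly created files, so for an existing .my.cnf a chmod 600 does the same job.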
    

    The only reason I even remembered all this was because an ex-coworker said he ran into the documentation I left explaining all of this, and was thankful. He had to have it scan the body of the form now, because the managers wanted that in the report too!

  • wp-cron it up

    If you’ve spent much time mastering *nix servers, you’ve run into cron. I love cron. Cron is a special command that lets you schedule jobs. I use it all the time for various commands I want to run regularly, like the hourly check for new posts in RSS on TinyRSS or RSS2Email, or the nightly backups, or any other thing I want to happen repeatedly. Cron is perfect for this, it runs in the background and it’s always running.

    My first thought, when I heard about wp-cron, was that it clearly tapped into cron! If I scheduled a post, it must write a one-time cron job for that. I was wrong. Among other reasons, not everyone has cron on their servers (Windows, I’m looking at you), but more to the point, cron is a bit overkill here. Instead WordPress has a file called wp-cron.php which it calls under special circumstances. Every time you load a page, WordPress checks if there’s anything for WP-Cron to run and, if so, calls wp-cron.php.

    Some hosts get a little snippy about that, claiming it overuses resources, and ask that you disable wp-cron. One might argue that checking on every pageload is overkill, but realistically, with all the other checks WordPress puts in for comments and changes, that seems a bit off. In the old WordPress days, this actually was a little true. If you had a server getting a lot of traffic, it was possible to accidentally loop multiple calls at the same time. This was fixed quite a bit in WordPress 3.3 with better locking (so things only run once). Multisite was a hassle for a lot of people, but seems to have been sorted out.

    We already know that WordPress only calls wp-cron if it needs to, so if you disable it, you have to run that manually (which, ironically, you could do via a cron job; read How to Disable WordPress WP-Cron for directions on how to disable it and then use real cron). To disable wp-cron, just toss this in your wp-config.php:

    define('DISABLE_WP_CRON', true);

    Now, remember, you still want to run a ‘cron’ job to fire off anything scheduled, so you have to add in your own cron job for this, which I actually do! I have wp-cron turned off, and a ‘real’ cron job runs at 5 and 35 past the hour to call wp-cron with a simple command:

    curl http://example.com/wp-cron.php
    

    Now since I have multiple domains on this install, I actually have that for every site on my Multisite. After all, wp-cron.php on Ipstenu.org doesn’t run for photos.ipstenu.org and so on. Doing this sped up my site, and put the strain back where I felt it should be. The server.
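
    Put together, the pieces could look roughly like this. The 5 and 35 schedule is the one described above; the script path, the function name run_wp_cron and the list of sites are placeholders.

```shell
# Crontab entry (assumed script path), firing at 5 and 35 past every hour:
#   5,35 * * * * /home/username/bin/run-wp-cron.sh

# Command used to call each site. Set FETCH=echo for a dry run.
FETCH=${FETCH:-curl -s -o /dev/null}

# Hypothetical helper: hit wp-cron.php once for every site passed in.
run_wp_cron() (
    for site in "$@"; do
        # $FETCH is unquoted on purpose so its flags split into separate words.
        $FETCH "http://${site}/wp-cron.php"
    done
)

# Example call (uncomment to run):
# run_wp_cron example.com photos.example.com
```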

    By the way, wp-cron does more than schedule posts. That’s what checks for plugin, theme and core updates, and you can write code to hook into it to schedule things like backups. Remember before how I said that WP checks to see if there’s anything scheduled? This is a good thing, since if you have your server run a long job (like backing up a 100meg site), you don’t want to wait for that to be done before your page loads, right?

    For most people, on most sites, this is all fine to leave alone and let it run as is. For the rest, if you choose to use cron instead, keep in mind how things are scheduled. Like I set my cron to run at 5 and 35, so I tend to schedule posts for 30 past and on the hour. That means my posts will go up no later than 5 minutes after. That’s good, but if I use my DreamObjects plugin, the ‘Backup ASAP’ button schedules a backup to run in 60 seconds (so as not to stop you from doing anything else, eh?), which means I’d have to manually go to example.com/wp-cron.php in order to kick it off. Clearly there are issues doing this for a client.
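
    A quick sketch of that waiting game, assuming the 5 and 35 schedule and nothing else calling wp-cron.php in between. The function name minutes_until_run is made up.

```shell
# Hypothetical helper: minutes until the next real cron run at :05 or :35.
# $1 = minute past the hour (0-59) at which the event becomes due
minutes_until_run() (
    now=$1
    for tick in 5 35 65; do          # 65 stands for :05 of the next hour
        if [ "$now" -le "$tick" ]; then
            echo $(( tick - now ))
            return
        fi
    done
)

minutes_until_run 30   # a post scheduled for half past: prints 5
minutes_until_run 36   # ‘Backup ASAP’ clicked at :35, due at :36: prints 29
```

    So a well-timed post waits five minutes, while an unlucky one-off job can sit for almost half an hour.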