Half-Elf on Tech

Thoughts From a Professional Lesbian



    Bad Habits, Bad Dates

    First of all, the migration from MediaWiki to Jekyll went fine. I binge-watched “Person of Interest” and converted things with some clever use of grep and regex. Once I got to the point of converting templated wiki pages to Jekyll, it got a lot easier. The hardest part was date conversion, and it started with some bad filenames.

    MediaWiki let me use whatever I wanted in (almost) whatever way I wanted, which is a problem. Also a problem is MediaWiki’s flat structure. Every page sat at the same level in the URL, so you had http://example.com/wiki/NAME and, for the most part, that worked out okay. The problem I ran into was how I chose to name files.

    You see, I used the logical names “Interview Source (dd M yyyy)” for the interviews. That became the URL http://example.com/wiki/Interview_Source_(dd_M_yyyy), which is nice and descriptive, if long. And it worked great right up until my subject had seven interviews in one day, two of them with the same source.

    Take this example. If you have an interview with the CBS morning news and the CBS evening news, on the same CBS local station, do you name the files “CBS Morning News (28 October 2015)” or “CBS News (28 October 2015)”? Obviously you have to go by the unique name (or the more unique one) to avoid name collisions. And for a time that worked out just fine. Except. I also had news articles. So if the CBS Morning News put out a news article on the same date as the interview, I was screwed. I ended up with multiple stupid filenames like “CBS Morning News (28 October 2015 b)” and so on. It was annoying.

    This could have been ‘avoided’, or at least mitigated, if I’d used the subpage hierarchy for articles, making the URLs http://example.com/wiki/Interview/Interview_Source_(dd_M_yyyy) and http://example.com/wiki/News/Interview_Source_(dd_M_yyyy) instead. And certainly I could have moved everything.

    But for whatever reason, subpages aren’t very popular on MediaWiki, at least not on the self-managed installs I’ve seen. They take a level of awareness that not everyone has. You can’t ‘see’ the subpages easily, not the way you can see categories in WordPress or collections in Jekyll. And that means people just don’t use them. How do you train everyone to know how to do everything?

    By contrast, this naming issue isn’t a problem with WordPress, because there has always been a clear delineation between URL and page name. That’s even more true when you use plugins like Yoast SEO, which let you remove ‘stopwords’ like ‘a’ and ‘the’ from your URL strings. Sadly, that would look ‘wrong’ on MediaWiki, where the descriptive page title is the URL.

    In the move to Jekyll, I renamed everything. First I made folders for each year, and then I moved every file with that year in its name into the right folder. Since that mixed a few ‘extra’ files in there, I checked each file for {{InterviewTemplate or {{NewsTemplate in its content and sorted it into /interviews/year/ or /news/year/ as appropriate. That was easy.
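    A rough Python sketch of that sorting step (not the script actually used here) might look like this. It assumes the exported pages are .txt files sitting in one folder and named in the Interview_Source_(dd_M_yyyy) style; the file extension and the helper names destination() and sort_files() are hypothetical.

```python
import re
import shutil
from pathlib import Path
from typing import Optional

# Matches the "(dd_M_yyyy" part of names like "CBS_News_(28_October_2015).txt"
# and captures the year. The pattern assumes the wiki naming convention.
YEAR_RE = re.compile(r"\(\d{1,2}_[A-Za-z]+_(\d{4})")


def destination(filename: str, text: str) -> Optional[str]:
    """Return 'interviews/<year>' or 'news/<year>', or None if unsortable."""
    match = YEAR_RE.search(filename)
    if match is None:
        return None  # no date in the name: leave it for a human
    year = match.group(1)
    if "{{InterviewTemplate" in text:
        return f"interviews/{year}"
    if "{{NewsTemplate" in text:
        return f"news/{year}"
    return None  # neither template: also leave it alone


def sort_files(src: Path, dest: Path) -> None:
    """Move every sortable .txt file from src into dest/<kind>/<year>/."""
    for path in sorted(src.glob("*.txt")):  # sorted() snapshots the listing first
        target = destination(path.name, path.read_text(encoding="utf-8"))
        if target is None:
            continue
        folder = dest / target
        folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
```

    Splitting the decision (destination) from the file moving (sort_files) makes it easy to dry-run the classification before anything actually moves.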

    To rename the files, I used my favorite tool, Name Mangler, to convert the filenames from Interview_Source_(dd_M_yyyy) to interview-source: nice and short. The ‘gotcha’ there was, of course, multiple posts from the same source in a given year, all thanks to that stupid naming convention. I would have to sort out some kind of script to bulk-rename them into something I could then re-rename in order.
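    Name Mangler is a GUI app, so as a stand-in, here is a minimal Python sketch of the same rename rule: drop the (dd_M_yyyy) suffix (and any b-style tiebreaker), lowercase, and hyphenate. The slugify() helper is hypothetical and only handles names that follow this convention.

```python
import re

# "_(28_October_2015)" or "_(28_October_2015_b)" at the very end of the name.
DATE_SUFFIX = re.compile(r"_?\(\d{1,2}_[A-Za-z]+_\d{4}(?:_[a-z])?\)$")


def slugify(stem: str) -> str:
    """Turn 'Interview_Source_(28_October_2015)' into 'interview-source'."""
    base = DATE_SUFFIX.sub("", stem)  # drop the date (and any tiebreaker)
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", base.lower()).strip("-")


print(slugify("CBS_Morning_News_(28_October_2015_b)"))  # cbs-morning-news
```

    Note that CBS_News_(28_October_2015) and CBS_News_(3_March_2015) both come out as cbs-news, which is exactly the same-source, same-year collision.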

    And then I remembered something…

    'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'.

    Not that. I remembered that the post slug didn’t matter. It could represent the date of the post, or just the order in which the posts were created. Either way, the slugs didn’t matter in the slightest, which meant I could batch rename.
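    Since the slug only needs to be unique, one way to batch rename is to number the posts from each source in date order. The sketch below is a hypothetical Python planner (plan_renames() is made up for illustration). It parses the old (dd_M_yyyy) names, assumes English month names, and returns a map from old name to new name rather than touching any files.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable

# Old wiki names look like "CBS_News_(28_October_2015)" or "..._2015_b)".
NAME_RE = re.compile(
    r"^(?P<source>.+?)_\((?P<date>\d{1,2}_[A-Za-z]+_\d{4})(?:_(?P<tie>[a-z]))?\)$"
)


def plan_renames(stems: Iterable[str]) -> Dict[str, str]:
    """Map each old name to '<slug>-<n>', numbering same-slug posts by date."""
    groups = defaultdict(list)
    for stem in stems:
        m = NAME_RE.match(stem)
        if m is None:
            continue  # not in the dated naming convention; skip it
        slug = re.sub(r"[^a-z0-9]+", "-", m["source"].lower()).strip("-")
        when = datetime.strptime(m["date"], "%d_%B_%Y")  # English month names
        groups[slug].append((when, m["tie"] or "", stem))

    plan = {}
    for slug, posts in groups.items():
        # Sort by date, then by tiebreaker letter, so a plain name
        # comes before its "_b" twin from the same day.
        for n, (_, _, stem) in enumerate(sorted(posts), start=1):
            plan[stem] = f"{slug}-{n}"
    return plan
```

    Returning a plan instead of renaming directly makes it easy to eyeball the result before committing to it.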

    Furthermore, my date convention led to a massive annoyance inside the content. Jekyll wanted my dates as yyyy-mm-dd, and there was no really easy way to take yyyy-M-dd (with M as the month’s name) and convert it. No plain regex replacement can do that, since turning a month name into a number needs a lookup. In the end, I converted dd M yyyy into yyyy-M-dd (which regex can do nicely), then searched all files for date: (\d{4})-January-(\d{2}), replaced it with date: \1-01-\2, and repeated that for every month.

    Annoying, but it worked.
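    For the record, here is a Python sketch of both halves of that date fix. It assumes front matter lines shaped like date: 28 October 2015, two-digit days, and English month names; the function names are hypothetical. reorder() and month_names_to_numbers() mirror the reorder-then-one-replacement-per-month approach. convert_in_one_pass() swaps in a different technique, a replacement function, which can do the month lookup that a plain replacement string can't.

```python
import calendar
import re

# Assumed front matter shape: "date: 28 October 2015" (two-digit day, English month).


def reorder(text):
    """Step 1, pure regex: 'date: 28 October 2015' -> 'date: 2015-October-28'."""
    return re.sub(r"^date: (\d{2}) ([A-Za-z]+) (\d{4})$",
                  r"date: \3-\2-\1", text, flags=re.MULTILINE)


def month_names_to_numbers(text):
    """Step 2, one replacement per month: '2015-October-28' -> '2015-10-28'."""
    for number in range(1, 13):
        name = calendar.month_name[number]
        text = re.sub(rf"^date: (\d{{4}})-{name}-(\d{{2}})$",
                      rf"date: \g<1>-{number:02d}-\g<2>", text, flags=re.MULTILINE)
    return text


# Alternative: a replacement *function* looks up the month number,
# so one pass over the original "dd Month yyyy" format is enough.
MONTHS = {name: number for number, name in enumerate(calendar.month_name) if name}
ONE_PASS = re.compile(r"^date: (\d{2}) (" + "|".join(MONTHS) + r") (\d{4})$",
                      re.MULTILINE)


def convert_in_one_pass(text):
    return ONE_PASS.sub(
        lambda m: f"date: {m.group(3)}-{MONTHS[m.group(2)]:02d}-{m.group(1)}", text)
```

    The two-step version is closer to what a search-and-replace dialog can do; the one-pass version needs a scripting language, but only has to run once.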