Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: mediawiki

  • Mailbag: But You Can’t Post Mobile!

    This has been marked as the biggest downside to using Jekyll. Once I started telling people I was moving the Wiki to Jekyll, a great many of them cautioned me about this issue.

    How do you post from your phone or iPad?

    The answer is that I don’t.

    I won’t lie. Mobile posting is a pain. Since I don’t have Jekyll running on my server, I can’t edit a file there and regenerate. If I had that, it would all be a lot easier. In my case, since I’m using it as a non-blog, it’s never the place I need to post mission-critical things. Besides, if you’ve ever tried to keep your pretty formatted WordPress site updated when you want a custom crafted excerpt and a featured image, from your iPad, I gotta tell you … it sucks.

    And that’s WordPress, something that has a dedicated, usable app for iOS. WordPress is also pretty okay in mobile. People like Ryan Boren spend a great deal of time caring about mobile usage. WordPress has gotten slower on the admin side in the last decade, but it’s gotten more responsive and agile at the same time.

    MediaWiki not so much. Editing should be a lot easier, seeing as there’s no ‘admin’ back end to mess with things, but for whatever reason MediaWiki was always terrible on my iPad. It was next to unusable on my iPhone. Even with their default themes (remember, with MediaWiki you see the front end theme on page edits) it was dodgy.

    Furthermore, what did I need to post? This is a wiki-type documentation site. It is rarely, if ever, updated on the fly. It houses long form news articles. There are recaps of TV episodes, explanations of humanitarian events, and reports of events. There is no live blogging. There is no quick off the cuff journaling. It’s storytelling.

    So here’s my ‘mobile’ workflow.

    1. Write the content in something, probably Byword
    2. Email it to myself
    3. Probably rewrite the whole thing in longer form, with design and fancy things
    4. Post

    I could probably streamline that better if I saved from Byword to Dropbox and had that automatically copy over (suggestions welcome), but I don’t really write from my iPad that much. I usually send myself an email with six or seven links and a note to ‘Import these things…’
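    For what it’s worth, the Dropbox idea could be sketched roughly like this. Everything here is an assumption for illustration: the function name sync_drafts, the folder paths, and the idea that drafts are saved as .md files. It only copies drafts that are new or newer than the copy already in place, and something like cron would have to call it to make it ‘automatic’.

```shell
# Hypothetical helper: copy new or updated Markdown drafts from a synced
# folder (e.g. "$HOME/Dropbox/Byword") into a drafts folder.
sync_drafts() {
    src=$1    # folder the writing app saves into (assumed)
    dest=$2   # drafts folder in the Jekyll working copy (assumed)
    mkdir -p "$dest"
    for f in "$src"/*.md; do
        [ -f "$f" ] || continue            # nothing matched the glob
        target="$dest/$(basename "$f")"
        # Copy only when the draft is missing or newer than the existing copy.
        if [ ! -e "$target" ] || [ "$f" -nt "$target" ]; then
            cp "$f" "$target"
        fi
    done
}
```

    Calling it would look like sync_drafts "$HOME/Dropbox/Byword" "$HOME/site-jekyll/_drafts", with both paths swapped for wherever things really live.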

  • Bad Habits, Bad Dates

    First of all, the migration of MediaWiki to Jekyll went fine. I binge watched “Person of Interest” and converted things with the clever use of grep and regex. Once I got to the point where I was converting templated files from Wiki to Jekyll, it got a lot easier. The hardest part was date conversion, and it started with some bad filenames.

    MediaWiki let me use whatever I wanted in (almost) whatever way I wanted, which is a problem. Also a problem is MediaWiki’s flat-level structure. Everything was the same level for the URLs, so you had http://example.com/wiki/NAME and, for the most part, that worked out okay. The problem I ran into was how I chose to name files.

    You see, I used the logical names “Interview Source (dd M yyyy)” for the interviews. That converted to the URL of http://example.com/wiki/Interview_Source_(dd_M_yyyy) which is nice and descriptive, if long. And it worked great right up until my subject had seven interviews on one day, two with the same source.

    Take this example. If you have an interview with the CBS morning news and the CBS evening news, on the same CBS local station, do you name the files “CBS Morning News (28 October 2015)” or “CBS News (28 October 2015)”? Obviously you have to go by the unique name (or the more unique one) to avoid name collisions. And for a time that worked out just fine. Except. I also had news articles. So if the CBS Morning News put out a news article on the same date as the interview, I was screwed. I ended up with multiple stupid filenames like “CBS Morning News (28 October 2015 b)” and so on. It was annoying.

    This could have been ‘avoided’ or at least mitigated more if I’d used the subpage hierarchy for articles, making things http://example.com/wiki/Interview/Interview_Source_(dd_M_yyyy) and http://example.com/wiki/News/Interview_Source_(dd_M_yyyy) instead. And certainly I could have moved everything.

    But for whatever reason, subpages aren’t really super popular with MediaWiki. At least not the self-managed ones I’ve seen. They take a level of awareness that not everyone has. You can’t ‘see’ the subpages easily, not like categories with WordPress, or collections with Jekyll. And that means people just don’t use them. How do you train everyone to know how to do everything?

    Conversely, this naming issue isn’t a problem with WordPress because there has always been a clear delineation between URL and page name. This is even more true when you use plugins like Yoast SEO, which allows you to remove ‘stopwords’ like ‘a’ and ‘the’ from your URL strings. This looks ‘wrong’ on MediaWiki, sadly, which is used to making pretty URLs that are descriptive.

    In the move to Jekyll, I renamed everything. First I made folders for each year and then I moved all files with that year in the name into the right folder. Since that muddled a few ‘extra’ files in there, I checked each file for the content {{InterviewTemplate or {{NewsTemplate and sorted them into /interviews/year/ or /news/year/ as appropriate. That was easy.
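    The sorting step above can be sketched in shell. This is a rough illustration, not the exact commands I ran: the function name sort_by_template and the folder layout (one source folder per year, one destination root) are assumptions. It checks each file for the template marker and moves it into interviews/year/ or news/year/.

```shell
# Hypothetical helper: sort one year's worth of exported wiki files by the
# template they contain. Files with neither template are left alone.
sort_by_template() {
    src_dir=$1     # folder named after the year, e.g. ./2015 (assumed layout)
    dest_root=$2   # where interviews/ and news/ should be created
    year=$(basename "$src_dir")
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        # -F treats the braces as plain text rather than regex syntax.
        if grep -qF '{{InterviewTemplate' "$f"; then
            kind=interviews
        elif grep -qF '{{NewsTemplate' "$f"; then
            kind=news
        else
            continue   # one of the 'extra' files, sort it by hand later
        fi
        mkdir -p "$dest_root/$kind/$year"
        mv "$f" "$dest_root/$kind/$year/"
    done
}
```

    Running sort_by_template ./2015 ./sorted once per year folder would do the whole lot.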

    To rename the files, I used my favorite tool Name Mangler to convert the filenames from Interview_Source_(dd_M_yyyy) to interview-source – nice and short. The ‘gotcha’ with that was, of course, multiple posts from the same source in a given year. And that was a problem because of that stupid naming convention. I would have to sort out some kind of script to rename things in bulk to convert the names into something I could then re-rename in order.

    And then I remembered something…

    'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'.

    Not that. I remembered that the post-slug didn’t matter. It could represent the date of the post, but also possibly the order in which the posts were created. Which meant they didn’t matter in the slightest and I could batch rename.
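    I used Name Mangler for this, but the same batch rename can be sketched in shell. Treat it as an illustration under assumptions: the function names slugify and rename_to_slugs are made up, the files are assumed to end in .md, and the numeric suffix for collisions is just one way to keep same-source files apart, since the slug itself doesn’t matter.

```shell
# Hypothetical helper: turn "CBS_Morning_News_(28_October_2015)" into
# "cbs-morning-news" by dropping the trailing "(...)" date part,
# lowercasing, and replacing runs of other characters with a hyphen.
slugify() {
    printf '%s\n' "$1" |
        sed -e 's/_*([^)]*)$//' |
        tr '[:upper:]' '[:lower:]' |
        sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: rename every .md file in a folder to its slug.
# A name that is already taken gets -2, -3, and so on appended.
rename_to_slugs() {
    dir=$1
    for f in "$dir"/*.md; do
        [ -f "$f" ] || continue
        slug=$(slugify "$(basename "$f" .md)")
        target="$dir/$slug.md"
        n=2
        while [ -e "$target" ] && [ "$target" != "$f" ]; do
            target="$dir/$slug-$n.md"
            n=$((n + 1))
        done
        [ "$target" = "$f" ] || mv "$f" "$target"
    done
}
```

    Two CBS Morning News files from the same year end up as cbs-morning-news.md and cbs-morning-news-2.md, and the real date stays in the front matter where it belongs.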

    Furthermore, my date convention led to a massive annoyance inside the content. Jekyll wanted my name convention to be yyyy-mm-dd and there was no really easy way to take yyyy-M-dd and convert it. There is no single regex that turns a month name into its number. In the end, I converted dd M yyyy into yyyy-M-dd (which regex can do nicely) and then ran a search on all files for date: (\d{4})-January-(\d{2}) to replace with date: \1-01-\2, and repeated that for every month.

    Annoying, but it worked.
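    In hindsight, the twelve search-and-replace passes could be looped. Here is a minimal sketch, assuming the front matter holds a line like date: 28 October 2015 with a two-digit day; the function name convert_dates is made up, and it goes straight from dd M yyyy to yyyy-mm-dd instead of using the intermediate step I took.

```shell
# Hypothetical helper: rewrite "date: dd Month yyyy" lines in one file to
# "date: yyyy-mm-dd". One sed pass per month, because a regex cannot map a
# month name to a number on its own.
convert_dates() {
    file=$1
    n=1
    for month in January February March April May June July \
                 August September October November December; do
        mm=$(printf '%02d' "$n")   # 1 -> 01, 10 -> 10
        sed "s/^date: \([0-9][0-9]\) $month \([0-9]\{4\}\)\$/date: \2-$mm-\1/" \
            "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        n=$((n + 1))
    done
}
# Example (not run here): for f in _interviews/*/*.md; do convert_dates "$f"; done
```

    Single-digit days would need zero-padding first, which this sketch doesn’t attempt.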

  • Jekyll Layouts vs Wiki Templates

    One of the things I was doing in MediaWiki was using a lot of templates. A lot. The way a template works in MediaWiki, you have a special page called Template:NAME and you can embed it with {{NAME}} in any post. You can even embed a template in a template. They’re basically static ‘blurbs.’ You can make them dynamic, but I have found that even after ten years of using MediaWiki, it’s still a bit of a mystery.

    With Jekyll, that gets thrown out the window.

    Let’s take, for example, my list of interviews. I have 14 or so years of interviews, broken up into a separate page by year and internally sorted by date. Manually. I also have a template {{Interviews}} which outputs a pretty formatted link to each year. Also made manually. For every new interview, I edited at least two pages (the interview itself and the year). And for every year I had to update the main interviews page and the template.

    My end goal was to do the following:

    1. Each year index would dynamically list the posts for that year
    2. The interview main page would list links to all the available years
    3. The interview ‘template’ would be output on every page
    4. The interview year page would list everything from that year
    5. All those things would dynamically update when I added a new item

    Oh and I also wanted a layout to be intelligent enough to show a special header with specific information on the individual interview pages.

    Love Collections

    To convert this, I first made use of collections, making one for _interviews and within that I have a folder for each year with the interview as a flat file and an index.md to make the main index. I don’t have to do this. I could have the index anywhere I wanted, but this was easier for me.

    There is a big gotcha here, though. Subfolders, collections, and sorting by date don’t work together the way you’d think they would. I could make it easily sort by title, and I could reverse it, but sorting by date proved to be a killer. Eventually I figured this out:

    1. All the pages have to have a date, even if you’re not going to sort that page (see my index)
    2. You can’t sort in a for loop

    The final code looks like this:

    {% assign posts = site.interviews | sort: 'date' %}
    <ul>
    	{% for post in posts %}
    		{% if post.topic != 'index' and post.tags contains page.year %}
    			<li><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }} ({{ post.date | date: "%d %B" }})</a></li>
    		{% endif %}
    	{% endfor %}
    </ul>
    

    Front Matters

    This is funnier if you know that the ‘header’ of a Jekyll file is called the front matter. Here’s an example of mine:

    ---
    title: Interviews
    author: Mika E.
    layout: interview
    permalink: /interviews/
    date: 2001-01-01
    topic: index
    tags:
      - 2001
    ---
    

    Everything except topic: index is a default variable. I made the topic, and what that does is tell me “This page is an index page” and what year things are. There are reasons for this down the line. Now I also want to sort by year, but I can parse the date for that.

    Design the Layout

    I designated my layout as ‘interview’ in the first example, so I made a file called interview.html in layouts and made it a child of my default layout. In there, I have this code:

    <ul>
    {% for post in site.interviews %}
    	{% if post.topic == 'index' %}
    		<li><a href="{{ site.baseurl }}{{ post.url }}">{{ post.date | date: "%Y" }}</a></li>
    	{% endif %}
    {% endfor %}
    </ul>
    

    That says “if a page is an index, list it.” Now when I want a new year, I just add in a new folder with an index file.

    I’ve gone even further, taking the logic from some WordPress themes I’ve seen, and the layout file has all the code for both the index view and the per-item view, allowing me to format my interviews with custom headers and footers around the content.

    Does it Work?

    Yes it does! Mostly.

    The problem with this, and yes there’s a problem, is that the interview layout page doesn’t regenerate itself. I have to go and re-save the layout for interviews in order to regenerate any lists I have on that page.

    I can get away with typing this in shell: touch _content/_jekyll/layouts/interview.html && jekyll build but it is a little annoying. Even running a manual jekyll build won’t do it because the layout doesn’t realize it has a change yet. I do understand why, though. It may be worth moving that somewhere else, though I have a feeling even if I make it a template it would have the same problem, since that template file wouldn’t know to update until it was edited.

    It took me a while to find that the magic sauce is a bit of code called regenerate: true. This is not something you should use everywhere! I use it on my interviews index pages because those pages get updated when a new item is added to their folder. It actually lets my index pages be totally blank except the YAML headers, which is nice and simple.
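    Since a new year is just a new folder with a header-only index file, that step can be scripted. This is a sketch under assumptions: the function name new_year_index, the per-year title, and the per-year permalink are made up for illustration, while layout, topic, tags, and regenerate mirror the front matter described above.

```shell
# Hypothetical helper: create <collection>/<year>/index.md containing only
# front matter, with regenerate: true so the list is rebuilt on each build.
new_year_index() {
    root=$1   # path to the interviews collection folder (assumed)
    year=$2
    mkdir -p "$root/$year"
    cat > "$root/$year/index.md" <<EOF
---
title: Interviews $year
layout: interview
permalink: /interviews/$year/
date: $year-01-01
topic: index
regenerate: true
tags:
  - $year
---
EOF
}
```

    Something like new_year_index _interviews 2016 would then set up the folder and the blank index in one go.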

  • Mailbag: Why Jekyll?

    Why didn’t you convert your site to WordPress? You said you had to import it from Mediawiki to WordPress already.

    I had this conversation with my wife, too.

    WordPress is awesome at being a dynamic website. To be a static ‘wiki’ style website, it sucks. It’s not meant to be static like that. It’s not intended to be static. Even if you turn off comments on your site, you mean for WordPress to generate index pages and categories and the like.

    With WordPress, all that work is done on the server. When you visit a page, it’s generated for the first time. I may have a cache that lets reader number 2 see that page, but always the page, the HTML, is being dynamically built on-demand. MediaWiki works the same way. In contrast, Jekyll is dynamically built on my laptop and deployed as an in-situ static site. Each HTML page is a real HTML page on the server. No extra work has to happen. It’s small, it’s light, and it’s fast, because all that processing was done by me on my laptop before putting it on the server.

    And that actually illustrates the problem with WordPress, and why we struggle with things like Varnish and nginx and caching. We want our sites to do more and be faster. We need flexibility and posting to Twitter and dynamic page generation when we make an edit, because we’re constantly making changes.

    Except I didn’t. I don’t. Not the particular site I was working on, anyway. The site has about 1000 pages (probably closer to 600 once I decided not to import some of the things) and they’re pretty static. At most I updated them once a week for half the year. WordPress would be overkill. Hell, the Wiki was overkill and the only reason I kept using it was technological debt. I didn’t want to add to the debt. I didn’t want to make things even weirder and harder to use. I didn’t want to put a site more at risk with software I didn’t want to upkeep (MediaWiki, not WordPress).

    So it was clearly time to dig myself out with a little sweat equity and decide what I really wanted. I made a list of what I needed, what I wanted, and what I could live without. When I did that, Jekyll started looking more and more like a viable option. I would have spent as much time removing the aspects of WordPress I don’t need as I would have learning a new theme system and language.

    Also in the end I didn’t use the WordPress import. I manually copy/pasted content. The content was what I wanted, and I needed it text only, and MediaWiki made that damn hard to get at. Of course the Jekyll exporter for WordPress was pretty freaking cool. If I was pure WordPress to Jekyll, I’d be fine. I guess there just aren’t a lot of people doing MediaWiki exports.

  • Bye Wiki, Hello Jekyll

    I’m trying to make life less messy by learning an entirely new system.

    I have a Wiki with 1000 or so pages and it’s running MediaWiki. And it’s overkill. I don’t update it often enough to need all the bells and whistles. I need it to be fast, I need it to be simple. I need it to work for one editor (hi). Oh and I need it to be secure.

    Create a Git Repository

    There’s a reason for this. My plan is to commit my changes for Jekyll to a git repo and then have it auto-copy the proper files up to the folder on my webserver. My git repository is private and on the same server owned by the same account, so I can do this. Once I had my bare git repo, I ran this in my local repository folder on my laptop:

    git clone ipstenu@example.com:/home/ipstenu/repositories/jekyll.git site-jekyll
    

    And I got a warning: warning: You appear to have cloned an empty repository.

    Which I knew. But that’s fine. I wanted it empty.
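    For anyone who doesn’t have the bare repository yet, creating one on the server is a one-liner. A minimal sketch, where the function name make_bare_repo is made up and the path is whatever you plan to clone from:

```shell
# Hypothetical helper: create an empty bare repository to push to.
make_bare_repo() {
    repo_path=$1   # e.g. "$HOME/repositories/jekyll.git"
    mkdir -p "$(dirname "$repo_path")"
    git init --bare --quiet "$repo_path"
}
```

    A bare repository has no working copy, which is why the first clone of it comes back empty.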

    Install Jekyll On Your Computer

    Full stop. This is where I got confused before.

    $ brew install ruby
    $ gem install jekyll
    

    That’s it. That’s how to get it started.

    Create Your Site

    I was still in that other folder, so I ran an install:

    jekyll new . --force
    

    The reason for the force was that I did have some git files in there and a readme. Then I spent a few hours trying to figure out how to write posts and pages in Jekyll. Posts are ‘easy’ in that you create a file named yyyy-mm-dd-PostName.md and it will generate a post with that name. You can read up on Writing Posts for more.

    But. I’m converting a Wiki and pretty much the whole thing is going to be ‘pages’. To be honest, Jekyll’s idea of pages is ugly. The Writing Pages directions want me to put it all in the same folder and I didn’t like that. I thought I’d rather write a mess of posts in the _posts folder and then let Jekyll generate on the fly.

    To do that was relatively easy. I set up permalinks:

    # Outputting
    permalink: "/:title"
    

    After I did that, I realized I would still have to name things that ugly way, so I added this to my _config.yaml file:

    # Pages
    include: ['_pages']
    

    Then I made a folder called _pages and put my files in there, named CSI_Crime_Scene_Investigation_(season_1).html and so on, with headers like this:

    ---
    layout: default
    title:  "CSI: Crime Scene Investigation (season 1)"
    permalink: "/CSI_Crime_Scene_Investigation_(season_1)/"
    categories: television
    tags: csi
    ---
    

    Yeah. It’s starting to make sense. I could change the permalink to "/:title/" and get the same result, where it would match the filename. But for now, the basic idea is enough.

    Theming

    It was harder than expected. I had to convert a lot of random PHP includes into Jekyll includes (pity I can’t just say ‘include this file, yes, I know it’s PHP…’). Then I wanted to add some features like a table of contents, like I had from MediaWiki, which was a little tricky. But. Once I sorted out the way you do includes and how I could do them, it was all a bit easier.

    Importing MediaWiki

    This proved to be incredibly hard. Like table flipping, teeth gnashing, up at night, wondering why the universe was created this way hard. It was so hard, I exported the wiki to XML (easy), converted that to WordPress XML via Perl (hard because of dependencies), edited all instances of <wp:post_type>wiki</wp:post_type> to be a post, imported it into a WordPress site (easy), and then …

    Then I spent a long time going through the import, fixing the pages, formatting things, uploading images properly, etc. The wiki I was importing was old. It happens to be the oldest part of the website it’s on, and I was using a lot of templates. In a way that was great. But in another way it was really a terrible idea because it locked me in.

    So a lot of things had to happen. First, I had to rebuild all my templates. The wonderful thing with this is that I was using a lot of templates to list things like episodes and I could convert those to yml (or csv) and then have Jekyll run a loop to display them. Once I realized that, it meant I had a lot more freedom with content.

    I ended up not importing everything. A lot of what was on that Wiki was never looked at by anyone but me, and fifteen plus years of cruft leads to a lot of messy things. Between Jekyll collections and data, I was able to break things out into sanity again. But that’s a whole post on its own.

    Pushing To My Server

    I’m using Git, and it’s set to auto-deploy when I push. But this time I did it a little differently. Normally I’d run Jekyll on the server, but in this case I don’t have the option, so I went with adding my _site folder to the git repo (which meant editing .gitignore) and then writing this:

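    #!/bin/bash -l
    # Deploy the pre-built _site folder from the bare repo to the web root.
    GIT_REPO="$HOME/repositories/jekyll.git"
    TMP_GIT_CLONE="$HOME/tmp/git/repositories"
    PUBLIC_WWW="$HOME/www/jekyll"

    # Clear any leftover clone first, otherwise git clone refuses to run.
    rm -rf "$TMP_GIT_CLONE"
    git clone "$GIT_REPO" "$TMP_GIT_CLONE"
    cp -r "$TMP_GIT_CLONE"/_site/* "$PUBLIC_WWW"
    rm -rf "$TMP_GIT_CLONE"
    exit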
    

    This is not what I would consider a great idea. I’d rather run Jekyll on the box, but Ruby has been misbehaving there, and this actually lets me use the code on a shared box too.
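    For the script to fire on every push, one common approach is to install it as the bare repository’s post-receive hook. The sketch below assumes that is how it’s wired up; the function name install_hook and both paths are illustrative.

```shell
# Hypothetical helper: install a deploy script as the post-receive hook of
# a bare repository. Git only runs hooks that are marked executable.
install_hook() {
    script=$1      # the deploy script, saved somewhere on the server
    bare_repo=$2   # e.g. "$HOME/repositories/jekyll.git"
    mkdir -p "$bare_repo/hooks"
    cp "$script" "$bare_repo/hooks/post-receive"
    chmod +x "$bare_repo/hooks/post-receive"
}
```

    After that, a git push from the laptop should trigger the copy into the web folder without any extra steps.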

  • Static Content Subdomain

    I use a lot of different tools to run my websites, and over time I’ve learned what I want is to have my static content, the files that are uploaded and are images, stored separately from my apps. So while I have the basic folders on my domain (wordpress, wiki, gallery) I have a special subdomain called static.example.com for all those images and videos.

    There are a few reasons I do this. First, I like having my images separate. Second, it allows me to establish a cookie-free subdomain for images and that shuts up YSlow’s check.

    Create The Subdomain

    Do this however your host allows. Keep in mind that some don’t allow you to traverse domain folders. If your host creates your domain as /home/user/example.com and subdomains as /home/user/static.example.com you may have to fight a little more with things depending on your setup. If possible, I prefer to put the subdomain folder inside the main web root.

    If you’re using cPanel, by default you get your static subdomain installed at /home/user/public_html/static which is how I like it. This is perfectly accessible by all things but it’s also browsable at example.com/static/ and we don’t want that. Applying a little .htaccess magic will solve this.

    # CDN
    <If "%{HTTP_HOST} == 'example.com' ">
            RedirectMatch ^/static/(.*)$ http://static.example.com/$1
    </If>
    

    Now we’re ready to go!

    Move WordPress Uploads

    This used to be really easy. Go to Settings -> Media and change things. But we removed that to stop people from blowing themselves up. Now there are a couple ways about it. I jumped right over to editing the options by going to wp-admin/options.php and looking for upload_path and upload_url_path.

    Setting image location options

    I change upload_path to /home/example/public_html/static/wordpress which is where I’ve moved all my images. Then upload_url_path becomes http://static.example.com/wordpress and I’m done except for fixing my old posts. It’s actually pretty neat that once I put those paths in, the Media Settings page lists them as editable.

    Fixing the old posts takes a little trick though, and you’ll have to search/replace your posts via the database:

    UPDATE wp_posts SET post_content = REPLACE(post_content,'http://example.com/wp-content/uploads/','http://static.example.com/wordpress/');
    

    Or in wp-cli:

    wp search-replace http://example.com/wordpress/wp-content/uploads http://static.example.com/wordpress
    

    The gotcha here is that since I use SSL for my administration, I had to set up a new certificate for the static domain. Not a big deal right now since I can set up a self-signed, or use StartSSL until Let’s Encrypt is off the ground. It is something to consider though.

    Move ZenPhoto Uploads

    I have to start by warning you that Zenphoto doesn’t like this. When you install it, it puts your images in an albums folder, in the Zenphoto gallery install. This isn’t so bad, but you actually can move it around. You have to look in your zenphoto.cfg.php file (found in zp-data). The default location for your albums is defined by this:

    $conf['album_folder'] = '/albums/';
    $conf['album_folder_class'] = 'std';
    

    Since I want it in the static location, I tell it my folder path based on ‘web root’ and that it’s ‘in_webpath’ (which tells ZenPhoto to look in the root and not relative), by changing that section to this:

    $conf['album_folder'] = '/static/gallery/albums/';
    $conf['album_folder_class'] = 'in_webpath';
    

    But that means my URLs for images become http://example.com/static/gallery/albums... and I wanted http://static.example.com/gallery/albums... instead. Thankfully the .htaccess rule I used at the beginning of all this covers me there. Looking into this, I understand this is the case because unlike MediaWiki or WordPress, ZenPhoto only has one ‘location’ setting. The other two have path and URL.

    MediaWiki

    This was … weird. Technically all you have to do is set up the folders and change the following values in LocalSettings.php:

    $wgUploadPath       = "/static/wiki";
    $wgUploadDirectory  = "/home/example/public_html/static/wiki/";
    

    The thing that’s weird is that the documentation says you can do this:

    $wgUploadPath       = "http://static.example.com/wiki";
    

    And when you do, the image URLs properly call from the domain name. They just won’t load. When you dig deeper, it turns out that it’s caused by the settings for responsive images. The way it puts in srcset doesn’t seem to like this. So for now I’ve disabled it and my setup is this:

    $wgUploadPath       = "http://static.example.com/wiki";
    $wgUploadDirectory  = "/home/example/public_html/static/wiki/";
    $wgResponsiveImages = false;
    

    End Result?

    All my uploaded content is on my ‘static’ subdomain, separate from everything else, which makes version control even easier. Also now if I ever decide to move things off to a CDN, I’m pretty well set up.

    The real reason I do this is that while some of my content is uploaded via the content management systems I use (WordPress, ZenPhoto, etc), the majority is not. ZenPhoto, for example, is faster to FTP up a gig of images than it is to use a PHP tool. Ditto videos. And because of that, it’s nice to have a separate location I can give access to without allowing someone full rights on all my tools.