Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: coding

  • Quick Notes on Ruby and Jekyll

    Quick Notes on Ruby and Jekyll

    I feel like I should be writing about Once Upon A Time at this point…

    Let’s take a moment to talk about our stack here.

    • Ruby is a dynamic, reflective, object-oriented, general-purpose programming language.
    • Ruby libraries are bundled into gems.
    • Jekyll is a gem that can publish static websites.
    • Bundler lets you list all your dependencies required for the project you’re working on.
    • A Gemfile is a file in which we can list gems for the aforementioned dependencies.

    Still with me?

    This matters because you can use a Gemfile to define your standard libraries for a Jekyll site. The general idea is that you install Bundler:

    $ gem install bundler

    Then you make a Gemfile in your Jekyll folder:

    source 'https://rubygems.org'
    
    gem 'jekyll', '>= 3.0.0.pre.beta9'
    gem 'jekyll-oembed', :require => 'jekyll_oembed'
    gem 'jekyll-last-modified-at', :require => 'jekyll-last-modified-at'
    

    What this does is it defines what version of Jekyll I want to use and some of the gems I want to use. For example, if I wanted to add Jekyll Compose to all the users of my Git repository, I would add this:

    group :jekyll_plugins do
      gem 'jekyll_compose'
    end
    

    Now all they have to do is run bundle after their git pull, and they get the new requirement.

  • Jekyll Table of Contents

    Jekyll Table of Contents

    One of the cool things about MediaWiki, about the only thing I missed, was the built in table of contents display. With Jekyll, since this was a static sort of site, you would have to tell it ‘When you build this page, include a table of contents.’

    And I found a brilliant Jekyll plugin, Jekyll ToC Generator, that added in a beautiful jquery based table of contents with a click back to top feature. But there was a problem. When I installed it and ran a build to test it, my Jekyll site took nine times as long to build.

    In order to generate a Jekyll site, you run the command jekyll build and, on Jekyll 3.0, that gives you the output “done in 8.493 seconds” or so on. Now, if I do a full build on a site I’ve cleaned (there’s a clean command you can run to scrub the site and ensure you get a clean rebuild), it generally takes about a minute. That’s to build a thousand files with a lot of weird trickery. If I’m just rebuilding the changed files, it takes about 10 seconds. Much more reasonable.

    With the ToC Plugin, it took 98 to 100 seconds, every single time. Right away, I knew why. I had included a plugin that had to check every single page on the site on that rebuild, see if it needed a table of contents, and then build the page. Of course that took a long time!

    I’m always talking about needs and wants when I work on websites. It’s a basic principle my high school drilled into me. Understand what you need and how it’s different from your wants. Don’t compromise on needs. Well, I knew that I didn’t need a table of contents, not on every page, so it clearly had to be a ‘mostly want.’

    By contrast, having my site build quickly was a need. A fast build ensured less overhead, less weight, and less time spent. Time is a massive factor in websites. The rendered site has to be fast for users, this much is obvious to everyone. But having your build be fast means you, the site maintainer, spends less time on the parts that don’t make the site better, freeing you to develop and write.

    Also speed is something Jekyll wants to work on. The build different between the 2.3 version of Jeykll and the 3.0-pre beta, is incredible. In 2013, a site with “362 articles with 660 words in average” took around 10 minutes for a full build. I have double the articles, about the same amount of words, and it’s a minute for a full build. It’s faster on my faster laptop (duh).

    The decision tree for Jekyll is more obvious on build than the same one is for WordPress (or MediaWiki) on render. The basic concept is simple. The more complex your site, the longer it takes to generate pages. For WordPress (or MediaWiki, or Drupal, or Joomla, etc etc), the render happens when someone visits your site. For Jekyll and other static site generators, the render occurs when you build the site. That means with Jekyll I can see right away, before I get close to deployment, which means I make the decisions well before the ‘stage’ step of my deployment process.

    What’s more important? The complexities that make your site personal or a fast build?

    Here’s an example from a Jekyll discussion on the matter. Someone had the following code in the templates, which made the date output rather pretty:

    <span>{% assign d = page.date | date: "%d" | plus:'0' %}
        {{ page.date | date: "%B" }} 
        {% case d %}
            {% when 1 or 21 or 31 %}{{ d }}st
            {% when 2 or 22 %}{{ d }}nd
            {% when 3 or 23 %}{{ d }}rd
            {% else %}{{ d }}th
        {% endcase %}, 
        {{ page.date | date: "%Y" }}
    </span>
    

    But that had to run on page builds, which naturally was going to make a site slower. One of the ways Jekyll has improve this was to introduce incremental updates. Only update the pages that need updating. That is a big “baboom!” moment and it let me run the plugin jekyll-last-modified-at (which spits out a last modified date on pages) without any performance hit except on the clean build. Since that only gets called when a page is built, and a page is only built when it changes, it’s a massive improvement for me.

    What does all this have to do with the table of contents?

    Once I pushed it into a ‘want’ and not a ‘need’ I opened my mind to other possibilities. I stopped looking for a Jekyll or Ruby or Liquid based table of contents, and I asked myself “Can Markdown make a table of contents?”

    Markdown is a ‘language’ like HTML that is actually faster to write in than raw HTML but can be read, rendered, and output as HTML. I’m a big fan of HTML and part of why I picked WordPress back in the beginning was that I stumbled on a post where Matt Mullenweg talked about how he didn’t like bbCode and didn’t want it in WordPress. HTML was something we knew. Why make people learn something new?

    It wasn’t until I started blogging more on my iPad and phone that Markdown made sense. Now I’m quite the fan. But I knew that I didn’t know a whole lot about Markdown. I did know that Jekyll used a flavor called ‘kramdown’ (all lowercase) so I read up on that and found that kramdown has a built in built in table of contents generator that was incredibly easy to implement.

    * TOC will be output here
    {:toc}
    

    It’s not something I want (or need) on every page, so I just put that on a few pages. No real overhead added and it’s easy enough to style with CSS. Suddenly I have my cake and I can eat it too.

  • Bad Habits, Bad Dates

    Bad Habits, Bad Dates

    First of all, the migration of MediaWiki to Jekyll went fine. I binge watched “Person of Interest” and converted things with the clever use of grep and regex. Once I got to the point where I was converting templated files from Wiki to Jekyll, it got a lot easier. The hardest part was date conversion, and it started with some bad filenames.

    MediaWiki let me use whatever I wanted in (almost) whatever way I wanted, which is a problem. Also a problem is MediaWiki’s flat-level structure. Everything was the same level for the URLs, so you had http://example.com/wiki/NAME and, for the most part, that worked out okay. The problem I ran into was how I chose to name files.

    You see, I used the logical names “Interview Source (dd M yyyy)” for the interviews. That converted to the URL of http://example.com/wiki/Interview_Source_(dd_M_yyyy) which is nice and descriptive, if long. And it worked great right up until my subject had seven interviews on one day, two with the same source.

    Take this example. If you have an interview with the CBS morning news and the CBS evening news, on the same CBS local station, do you name the files “CBS Morning News (28 October 2015)” or “CBS News (28 October 2015)”? Obviously you have to go by the unique name (or the more unique one) to avoid name collisions. And for a time that worked out just fine. Except. I also had news articles. So if the CBS Morning News put out a news article on the same date as the interview, I was screwed. I ended up with multiple stupid filenames like “CBS Morning News (28 October 2015 b)” and so on. It was annoying.

    This could have been ‘avoided’ or at least mitigated more if I’d had used the subpage hierarchy for articles, making things http://example.com/wiki/Interview/Interview_Source_(dd_M_yyyy) and http://example.com/wiki/News/Interview_Source_(dd_M_yyyy) instead. And certainly I could have moved everything.

    But for whatever reason, subpages aren’t really super popular with MediaWiki. At least not the self-managed ones I’ve seen. They take a level of awareness that not everyone has. You can’t ‘see’ the subpages easily, not like categories with WordPress, or collections with Jekyll. And that means people just don’t use them. How do you train everyone to know how to do everything?

    Conversely, this naming issue isn’t a problem with WordPress because there has always been a clear delineation between URL and page name. This is made more-so when you use plugins like Yoast SEO, which allows you to remove ‘stopwords’ like ‘a’ and ‘the’ from your URL strings. This looks ‘wrong’ on MediaWiki, sadly, which is used to making pretty URLs that are descriptive.

    In the move to Jekyll, I renamed everything. First I made folders for each year and then I moved all files with that year in the name into the right folder. Since that muddled a few ‘extra’ files in there, I checked each file for the content {{InterviewTemplate or {{NewsTemplate and sorted them into /interviews/year/ or /news/year/ as appropriate. That was easy.

    To rename the files, I used my favorite tool Name Mangler to convert the filenames from Interview_Source_(dd_M_yyyy) to interview-source – nice and short. The ‘gotcha’ with that was, of course, multiple posts from the same source in a given year. And that was a problem because of that stupid naming convention. I would have to sort out some kind of script to rename things in bulk to convert the names into something I could then re-rename in order.

    And then I remembered something…

    'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'.

    Not that. I remembered that the post-slug didn’t matter. It could represent the date of the post, but also possibly the order in which the posts were created. Which meant they didn’t matter in the slightest and I could batch rename.

    Furthermore, my date convention lead to a massive annoyance inside the content. Jekyll wanted my name convention to be yyyy-mm-dd and there was no really easy way to take yyyy-M-dd and convert it. There is no regex that does that. In the end, I converted dd M yyyy into yyyy-M-dd (which regex can do nicely) and then a search on all files for date: /d{2}-January-/d{4} to replace with date: /1-01-/2 and repeated for every year.

    Annoying, but it worked.

  • Jekyll Layouts vs Wiki Templates

    Jekyll Layouts vs Wiki Templates

    One of the things I was doing in Mediawiki was using a lot of templates. A lot. The way a template works in Mediawiki, you have a special page called Template:NAME and you can embed it with {{NAME}} in any post. You can even embed a template in a template. They’re basically static ‘blurbs.’ You can make them dynamic, but I have found that even after ten years of using Mediawiki, it’s still a bit of a mystery.

    With Jekyll, that gets thrown out the window.

    Let’s take, for example, my list of interviews. I have 14 or so years of interviews, broken up into a separate page by year and internally sorted by date. Manually. I also have a template {{Interviews}} which outputs a pretty formatted link to each year. Also made manually. For every new interview, I edited at least two pages (the interview itself and the year). And for every year I had to update the main interviews page and the template.

    My end goal was to do the following:

    1. Each year index would dynamically list the posts for that year
    2. The interview main page would list links to all the available years
    3. The interview ‘template’ would be output on every page
    4. The interview year page would list everything from that year
    5. All those things would dynamically update when I added a new item

    Oh and I also wanted a layout to be intelligent enough to show a special header with specific information on the individual interview pages.

    Love Collections

    To convert this, I first made use of collections, making one for _interviews and within that I have a folder for each year with the interview as a flat file and an index.md to make the main index. I don’t have to do this. I could have the index anywhere I wanted, but this was easier for me.

    There is a big gotcha here, though. Subfolders and collections and sorting by date doesn’t work the way you’d think it would. I could make it easily sort by title, and I could reverse it, but sorting by date proved to be a killer. Eventually I figured this out:

    1. All the pages have to have a date, even if you’re not going to sort that page (see my index)
    2. You can’t sort in a for loop

    The final code looks like this:

    {% assign posts = site.interviews | sort: 'date' %}
    <ul>
    	{% for post in posts %}
    		{% if post.topic != 'index' and post.tags contains page.year %}
    			<li><a href="{{ site.baseurl }}{{ post.url }}">{{ post.title }} ({{ post.date | date: "%d %B" }})</a></li>
    		{% endif %}
    	{% endfor %}
    </ul>
    

    Front Matters

    This is funnier if you know that the ‘header’ of a Jekyll file is called the Font matter. Here’s an example of mine:

    ---
    title: Interviews
    author: Mika E.
    layout: interview
    permalink: /interviews/
    date: 2001-01-01
    topic: index
    tags:
      - 2001
    ---
    

    Everything except topic: index is a default variable. I made the topic, and what that does is tell me “This page is an index page” and what year things are. There are reasons for this down the line. Now I also want to sort by year, but I can parse the date for that.

    Design the Layout

    I designated my layout as ‘interview’ in the first example, so I made a file called interview.html in layouts and made it a child of my default layout. In there, I have this code:

    <ul>
    {% for post in site.interviews %}
    	{% if post.topic == 'index' %}
    		<li><a href="{{ site.baseurl }}{{ post.url }}">{{ post.date | date: "%Y" }}</a></li>
    	{% endif %}
    {% endfor %}
    </ul>
    

    That says “if a page is an index, list it.” Now when I want a new year, I just add in a new folder with an index file.

    I’ve gone even further, taking the logic from some WordPress themes I’ve see, and the layout file has all the code for both the index view and the per-item view, allowing me to format my interviews with custom headers and footers around the content.

    Does it Work?

    Yes it does! Mostly.

    The problem with this, and yes there’s a problem, is that the interview layout page doesn’t regenerate itself. I have to go and re-save the layout for interviews in order to regenerate any lists I have on that page.

    I can get away with typing this in shell: touch _content/_jekyll/layouts/interview.html && jekyll build but it is a little annoying. Even running a manual jekyll build won’t do it because the layout doesn’t realize it has a change yet. I do understand why, though. It may be worth moving that somewhere else, though I have a feeling even if I make it a template it would have the same problem, since that template file wouldn’t know to update until it was edited.

    It took me a while to find the magic sauce is a bit of code called regenerate: true – This is not something you should use everywhere! I use it on my interviews index pages because those pages get updated when a new item is added to their folder. It actually lets my index pages be totally blank except the yaml headers which is nice and simple.

  • Jekyll Collections

    Jekyll Collections

    Early on, Jekyll’s developers said that if someone was using posts for non-blog content, they were doing something wrong. That left one other avenue open, the first time I looked at Jekyll, which was pages. They’re nice, but they’re not what I wanted.

    Enter Jekyll collections. These are ‘arbitrary’ groups of related content which you put in their own folder. I had 15 years of interviews collected, so for me this seemed like a perfect idea. I read up on Ben Balter’s – Explain Jekyll Collections like I’m 5 and it helped me sort out what I wanted.

    Configure

    This is easy. You just add the collection code to _config.yaml

    # Collections
    collections:
      interviews:
        output: true
    

    Having the output set to true means that when I run jekyll build the pages are generated. That’s pretty simple. They don’t get auto-generated when you run a jekyll serve and you’re testing locally, however. Which sucks. I upgraded to Jekyll 3.0 beta and it started working, though, and I’m okay with running a beta.

    Create A Folder

    Also easy. Make a folder called _interviews in the main Jekyll folder. I will note, this gave me a fit. I wish I could put all my collections in a subfolder, because now I have this:

    _data
    _includes
    _interviews
    _layouts
    _pages
    _posts
    _sass
    _site
    

    It’s messy, and if I didn’t know that some of those folders are special (like _includes) I could easily be confused. The _site folder makes some sense, that’s where my site is output. But even if I use the source setting to move all my source pages into a folder (called _source in my case), I still can’t separate the code from the content. What I would like is this:

    _assets – Store all of my ‘code’ like layouts, plugins, css, etc here.
    _content – Store all my post content, collections, pages, etc here.

    Still this is a little better for me. Less insane. I will note, I was able to move my folders by defining the directories in my configuration file like this:

    # Moving Folders
    source:       _content
    plugins_dir:  _jekyll/plugins
    layouts_dir:  _jekyll/layouts
    includes_dir: _jekyll/includes
    

    So now my main folder has two folders _site and _content which is a lot easier for me to work with. I feel less muddled. Inside the content folder is a _jekyll folder which is my ‘wp-content’ folder, and a _data folder, which has some data files. More on that later.

    NB: This only works on Jekyll 3.0 and up!

    Create Files

    All I had to do was make my files in my _interviews folder and I was done. Well. Not really. I needed a way for Jekyll to link through everything, and I really didn’t think making manual pages was smart. I tossed in this code to my interviews post file and it cleverly looped through everything it found, generating the page on the fly:

    <ul>
    {% for topic in site.interviews %}
    	<li><a href="{{ site.baseurl }}{{ topic.url }}">{{ topic.title }}</a></li>
    {% endfor %}
    </ul>
    

    If you’re familiar with WordPress loops, this is the same thing as saying “For all posts in a category…”

    Customize the Hell Out of It

    Of course you know that’s what I did next. I went and made it super-complex by putting my interviews in year subfolders and then making the main interview page a list of all the years, with links to those pages, and loops back and … well. That’s another post.

  • Changing Git History

    Changing Git History

    Working on a group project in Git, I did the smart thing with my code. I made a branch and proceeded to edit my files. I also did a dumb thing. I made four commits.

    The first was for the first, ugly, functional version of the code. The second was a less ugly, kind of broken version. The third was the rewrite and the fourth was the working version. When I wanted to submit my changes for a review, it was going to be ugly. I did not need or want people looking at four commits They only wanted the one.

    Now I’m a weird person for how I do commits. I add a new feature like a new function to parse things, and I commit that. Then I change my CSS and commit that. And so on and so on. This means I can look through my commit history and see exactly when I made a change. When I’m ready to do my release, I document all the changes based on that commit log and have it as my message.

    But when you’re working with a team, and all they want is one clean commit? Well I’m their worst nightmare. There is a cure for this, though! You can squash your commits, merging them all into one.

    Squash

    Actually it’s rebase. It can be squash too, though. I ran the following command which says to rebase my last 4 commits:

    git rebase -i HEAD~4
    

    That opens up another editor

    pick b17617p Crap I need to do this thing!
    pick 122hdla Added feature HUMAN to autogenerate a humans.txt file
    pick nw9v88a Changed comment avatar size to 96px
    pick 8jsdy1m Updated CSS for comment avatars to make them a circle
    
    # Rebase b17617p..8jsdy1m onto b17617p
    #
    # Commands:
    #  p, pick = use commit
    #  r, reword = use commit, but edit the commit message
    #  e, edit = use commit, but stop for amending
    #  s, squash = use commit, but meld into previous commit
    #  f, fixup = like "squash", but discard this commit's log message
    #  x, exec = run command (the rest of the line) using shell
    #
    # If you remove a line here THAT COMMIT WILL BE LOST.
    # However, if you remove everything, the rebase will be aborted.
    #
    

    Now here’s where it’s weird. The first one, b17617p is the one I have to merge everything into. And it has the worst commit message, doesn’t it? Oh and I was totally not using the right formatting for how the company wants me to format my commits. They want the comment to be “Feature: Change” so I would have “Humans: Added new feature to autogenerate humans.txt”

    Since I knew I wanted to merge it all and totally rewrite the commit, I just did this:

    pick b17617p Crap I need to do this thing!
    squash 122hdla Added feature HUMAN to autogenerate a humans.txt file
    squash nw9v88a Changed comment avatar size to 96px
    squash 8jsdy1m Updated CSS for comment avatars to make them a circle
    

    Which, once saved and exited, gave me this:

    # This is a combination of 4 commits.
    # The first commit's message is:
    
    Crap I need to do this thing!
    
    # This is the 2nd commit message:
    
    Added feature HUMAN to autogenerate a humans.txt file
    
    # This is the 3rd commit message:
    
    Changed comment avatar size to 96px
    
    # This is the 4th commit message:
    
    Updated CSS for comment avatars to make them a circle
    
    # Please enter the commit message for your changes. Lines starting
    # with '#' will be ignored, and an empty message aborts the commit.
    # Explicit paths specified without -i nor -o; assuming --only paths...
    # Not currently on any branch.
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #	new file:   LICENSE
    #	modified:   README.textile
    #	modified:   Rakefile
    #	modified:   bin/jekyll
    #
    

    Since everything with a # is ignored, I deleted it and made it this:

    Humans: New Feature -- Humans.txt is now autogenerated
    Comments: Changed avatar size to 96px and edited CSS to make it a circle
    

    Yeah, that’s it. Admittedly, these should be two separate changes, but they’re all a part of the same project in this case so it’s okay.

    Of course, at the end of this, I looked at my code on our web tool and swore, because I’d left a debug line in. My hero Mike said “Don’t worry! ammend!”

    I made my change, instead of a normal git commit -a -m "These are my changes" I ran a git add FILENAME and git commit --ammend to fix up your most recent commit.

    It lets you combine staged changes with the previous commit instead of committing it as an entirely new snapshot. It can also be used to simply edit the previous commit message without changing its snapshot.

    And yes, it’s pretty awesome. Use it wisely.