Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: deployment

  • A Simpler Hugo Deploy

    A Simpler Hugo Deploy

    I have a Hugo site that I’ve been deploying by running Hugo on the server. But this isn’t the only way to go about it.

    If your Git repository is on the same server as your site, and owned by the same user, there’s a remarkably easy way to deploy without running Hugo on the server at all.

    First make sure the public folder in your Hugo repository is being tracked. Yes, this can make your repository a little large, but that’s not something to worry about too much; space is cheap, or it should be. Next make a folder in ~/tmp – I called mine library – to store the Git output in.

    The new post-update code then looks like this:

    #!/bin/sh
    
    # Where the checked-out site lives, and where the live site is served from.
    SRC_DIR=$HOME/tmp/library/public/
    DST_DIR=$HOME/public_html/library/
    
    # Check the repository's files out into the temp work tree.
    export GIT_WORK_TREE=$HOME/tmp/library/
    git checkout -f
    
    # Mirror the generated site into the web root; --delete removes anything
    # in the destination that no longer exists in the source.
    rsync -a --delete $SRC_DIR $DST_DIR
    
    exit
    

    What this does is check out the Git repository and then copy it over. The --delete flag on the rsync removes anything in the destination that’s no longer in the source. Done.

    The benefit of this method is that you don’t need to install GoLang or Hugo on your server; everything is plain and simple Git and a copy, and rsync is a delightful way to move everything over. You can delete the temp folder when you’re done, but the checkout process handles things for you. Another nice trick is that you can specify which branch to check out, so if you have a special one for publishing, just use that (a sketch of that variant is below).
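    For instance, assuming a dedicated publishing branch named publish (the branch name here is purely illustrative), the checkout in the hook becomes:

    export GIT_WORK_TREE=$HOME/tmp/library/
    git checkout -f publish   # deploy the publishing branch instead of master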

    But could this be even easier? Yes and no. You see, what I’m doing is checking out the whole thing and then copying over folders. What if I could tell Git to just check out the code in that one folder?

    There’s a thing called a ‘sparse checkout,’ wherein I can tell Git “Only check out this folder.” Then all I have to do is go into that folder and grab the content I wanted. The problem there is that it literally checks out the folder ‘public,’ and what I wanted was the content of the public folder. Which means while it’s ‘easier’ in that I’ve only checked out the code I need, I can’t check it out directly into where I want. I will always need a little extra move.

    To set up my folder, I did this:

    cd ~/tmp/library/
    git init
    # Add the bare repo on this server as the remote and fetch it.
    git remote add -f origin ~/repositories/library.git
    # Tell Git to only materialize the paths listed in sparse-checkout.
    git config core.sparseCheckout true
    echo public/ >> .git/info/sparse-checkout
    git checkout master
    

    And then my script remains much the same. But! This is going to be a faster checkout, since Git only ever has to export and look at the folder it needs.
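    One wrinkle worth spelling out: the sparse clone in ~/tmp/library has its own .git, and the sparse-checkout setting lives there rather than in the bare repository, so the hook arguably wants to update the clone itself instead of using GIT_WORK_TREE. A minimal sketch of that variant, assuming the layout above:

    #!/bin/sh
    
    SRC_DIR=$HOME/tmp/library/public/
    DST_DIR=$HOME/public_html/library/
    
    # Update the sparse clone. GIT_DIR is set while a hook runs, so unset it,
    # or Git keeps talking to the bare repository instead of the clone.
    cd $HOME/tmp/library/ || exit 1
    unset GIT_DIR
    git pull origin master
    
    rsync -a --delete $SRC_DIR $DST_DIR
    
    exit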

  • Deploying with Hugo

    Deploying with Hugo

    One of my headaches with Jekyll was the deployment. It wasn’t until I found Octopress that I had any success. Hugo wants you to use Wercker for automated deployment. I don’t want to, since I don’t host on GitHub.

    Deploying (v1)

    The first method I could use is a script like I have with Octopress, which runs an rsync, but instead I went with a git post-update hook:

    #!/bin/sh
    
    GIT_REPO=$HOME/repositories/my-repo.git
    TMP_GIT_CLONE=$HOME/tmp/my-repo
    PUBLIC_WWW=$HOME/public_html/
    
    # Clone the repo to a temp folder, copy the pre-built site out of public/
    # into the web root, then clean up after ourselves.
    git clone $GIT_REPO $TMP_GIT_CLONE
    rm -rf $PUBLIC_WWW/*
    cp -r $TMP_GIT_CLONE/public/* $PUBLIC_WWW
    rm -Rf $TMP_GIT_CLONE
    exit
    

    And that copies things over nice and fast. I could do this for Jekyll as well, since now I’m version controlling my site content (in public) as well as my source code. There are downsides to this: I don’t really want to version the generated site folder, since it’s going to get large, messy, and annoying.

    The workaround for Jekyll people is to install Jekyll on the server, and that didn’t work for me.

    Install on the Server

    But Wait! There’s More!

    I actually like Hugo better than Jekyll. It feels slicker and faster and more … app-like. Also, since it’s not Ruby, I was able to install it properly on my server. Yes, unlike Ruby, which was a crime, GoLang was incredibly easy to install on CentOS 6.

    First install Go and its dependencies:

    $ yum install golang
    $ yum install hg
    

    Now install Hugo. No, you can’t yum it up:

    $ export GOPATH=/usr/local/go
    $ go get -v github.com/spf13/hugo
    

    This puts it in /usr/local/go/bin/hugo and I can run Hugo commands natively.
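    As a quick sanity check, you can put the Go bin folder on your PATH and ask Hugo for its version:

    $ export PATH=$PATH:/usr/local/go/bin
    $ hugo version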

    Which brings me to …

    Deployment (v2)

    Change the aforementioned script to this:

    #!/bin/sh
    
    GIT_REPO=$HOME/repositories/my-repo.git
    TMP_GIT_CLONE=$HOME/tmp/my-repo
    PUBLIC_WWW=$HOME/public_html/
    
    # Clone the source, let Hugo build it straight into the web root,
    # then throw the temporary clone away.
    git clone $GIT_REPO $TMP_GIT_CLONE
    /usr/local/go/bin/hugo -s $TMP_GIT_CLONE -d $PUBLIC_WWW
    rm -Rf $TMP_GIT_CLONE
    exit
    

    Works perfectly. Unlike Ruby which is just a brat.

    Now the Magic

    When I’m ready to write a new post, I can spin up my Hugo site locally with hugo server, write my post, make sure it looks nice, and then commit my changes to git and push.

    cd ~/Development/my-repos/hugo-site/
    hugo new content/some-file.md
    vi content/some-file.md
    git add content/some-file.md
    git commit -m "revised some-file"
    git push deploy master
    

    And it all works.
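    The one piece that workflow assumes is a remote named deploy pointing at the bare repository on the server that holds the post-update hook. Setting it up is one line (the user, host, and path here are placeholders):

    $ git remote add deploy ssh://user@example.com/home/user/repositories/my-repo.git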

    Massive hat tip to Andrew Codispiti for detailing his work, which made mine easier.

  • Deploy Jekyll Without Ruby

    Deploy Jekyll Without Ruby

    This is a bit of a lie. I do have Ruby, it’s just on my laptop.

    I don’t have it on my server, though I do have my git repo there. I could install it, but there are a couple of reasons I don’t, and because I don’t, I can’t just run a jekyll build on my server. I’m not the only one with this particular issue. A lot of people on shared hosts can’t do it, for example, and people on cloud-based tools can’t really either.

    Option one, which is very common, is what I have been doing. I added _site (the Jekyll output folder) to Git, and I copy that over on a post-commit hook. For what it was, that worked just fine. It only ran when I did a git commit, and if I wanted to work on a version I could totally do that in a branch, edit it, and bring it back in without accidentally deploying things.

    But option two would be rsync and that appealed to me more.

    I found the gem I was looking for eventually. It’s called (simply) Jekyll deploy and you add it to your Gemfile with this:

    group :jekyll_plugins do
      gem 'jekyll_deploy'
    end
    

    Then run the bundle command:

    $ bundle

    Now you have a new command called deploy, which first runs a build and then deploys based on the configuration options you put in. In my case it’s an rsync deploy, but you can do Git too. There was just one problem with it: because it ran a build every time, every file in my site was regenerated every time, which meant the rsync always saw the whole site as new, and that was more traffic than I really wanted.
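    The underlying issue is that rsync decides what to transfer by modification time and size by default, and a full rebuild touches every file. If you did want to keep the build step, one hedge would be rsync’s -c flag, which compares checksums instead of timestamps, at the cost of reading every file on both ends (the destination here is a placeholder):

    $ rsync -az -c --delete _site/ user@example.com:public_html/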

    So I did what you always do here and I made a fork of Jekyll Deploy and changed my Gemfile to this:

    group :jekyll_plugins do
      gem 'jekyll_deploy', :git => 'https://github.com/ipstenu/jekyll_deploy.git', :branch => 'develop'
    end
    

    Now my deploy only runs a deploy.

    A better solution would have been to put in some options and create jekyll deploy --build to allow me to run a build first, but I actually kind of like having them separate.

    The only question left was whether I should keep _site under version control. I decided that I should, since the git repository would keep the file dates under control, assuring me that only the files I changed would be pushed with a deploy.

    I will note that the only reason this is so simple for me is that I have passwordless SSH set up, so I don’t have to type passwords when I connect from a trusted computer. Since I only have this set up on a trusted server, and since without it I’d need a password to reach the git repo anyway, I felt it was safe.
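    If you’ve never set up passwordless SSH, the usual recipe is a key pair plus ssh-copy-id (the host is a placeholder):

    $ ssh-keygen -t ed25519
    $ ssh-copy-id user@example.com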

  • A Monstrous Regiment of Content

    A Monstrous Regiment of Content

    Any similarities to website configurations, current or defunct, are probably somewhat intentional. But I’m not naming names here in this tale of woe.

    The situation:

    You have your website: domain.com

    You have a staging version where you test all your code: stage-domain.com

    Your coders edit the theme code, install plugins, configure them, etc. At the same time, your editors edit content, make posts, etc. The code (plugins etc) is managed by Git because you know you should be using versioning. The posts are not because… WordPress.

    The Problem:

    In order to post content, you have to push code and content up to the production server. This means that if you’re trying to upgrade WordPress, you have to put a hold on content until the upgrade is done. How do you keep the two in sync without goobering your content?

    An Answer:

    Decouple content and code.

    First let’s version control the php files. This means your themes and plugins will all be edited on staging, and when you’re done, you check the code in and then check it out on production.

    Next we attack the DB. Have a job that, when you check out files on production, copies up only the following tables:

    _options
    _users
    _usermeta
    

    This means that all your code configuration (plugin settings, theme settings, etc.) gets pushed live, and all your content remains isolated. If you have extra tables, you may need to make allowances for them (like, say, _podpress or _badbehavior), but you can find them quickly by looking at the current DB. You’ll want to add these as necessary, or not. I’d count _podpress as ‘content’ and leave it out (we’ll get to what to do with it in a second), and _badbehavior as transient data.
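    As a sketch of that push job, assuming the standard wp_ table prefix, databases named staging_db and prod_db, and credentials in ~/.my.cnf (all hypothetical), it can be as blunt as a dump and load of just those tables:

    #!/bin/sh
    
    # Push the code/configuration tables from staging to production.
    # Database names and the table prefix are hypothetical.
    TABLES="wp_options wp_users wp_usermeta"
    
    mysqldump staging_db $TABLES > /tmp/config-tables.sql
    mysql prod_db < /tmp/config-tables.sql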

    On production, you have your content makers be Editors. They edit the content live (because you trust them). If you want to be extra secure, lock down the server via IP rules. At the end of every day, run a reverse sync, where the staging DB’s posts and content are replaced by the live site’s, thus ensuring everyone has the ‘live’ data. Obviously you’d want to script in a serialization-safe search/replace after every sync (and have an auto-backup taken before any messing about starts).

    It’s in the copy back that we decide what to do with the other tables. Either make an exclude list or an include list, whichever makes you feel safer.

    Includes would be:

    _commentmeta
    _comments
    _postmeta
    _posts
    _terms
    _term_relationships
    _term_taxonomy

    And then anything else you want like _podpress (see? I told you I’d get back to this).

    Excludes would be anything I have in my push script from before, so _options etc. To this I’d also add _badbehavior and anything else transient; I don’t really need it.
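    Pulling that together, a sketch of the nightly reverse sync under the same hypothetical names, leaning on WP-CLI’s search-replace command for the serialization-safe swap mentioned above:

    #!/bin/sh
    
    # Reverse sync: copy the content tables from production back to staging.
    # Database names, the path, and both domains are hypothetical.
    TABLES="wp_commentmeta wp_comments wp_postmeta wp_posts wp_terms wp_term_relationships wp_term_taxonomy"
    
    # Auto-backup before any messing about starts.
    mysqldump staging_db > /tmp/staging-backup.sql
    
    mysqldump prod_db $TABLES | mysql staging_db
    
    # Serialization-safe search/replace so staging links point at staging.
    wp --path=/home/user/staging search-replace 'domain.com' 'stage-domain.com'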

    Pitfalls:

    Content isn’t just posts and pages. What if your ‘editors’ need to edit widgets?

    What if they don’t? StudioPress’s Genesis Framework has a totally awesome widget called “Featured Page” which lets you pick a page, check options, and off you go. What if you had a page called ‘Header Widget’ and in your widget area for Header, put that widget, pointed it to that page, and checked ‘Show Page Content’? Now your editors just need to edit that page!

    What if they still do? Grab a plugin like Members and make a special role for “Super Editors” to let them make widget changes. Of course, now you have to uncouple _options and you’ve lost some of the magic here. This is a bad idea; you want to decouple content and code. Using widgets for this is a cool idea, but what if you did it another way entirely? What if instead you used custom post types, and just had the ‘spot’ for that header show the most recent post? Say you wanted an ad to show at the top of the page. Make your CPT for ‘header ads’ and write a post. Then schedule another for Wednesday at 3pm. Boom, your ad gets replaced! This also makes it easy to go a step further, make a Role for Marketing or Advertising, and now only they (and admins) can mess with ads. The downside there is you may end up with a lot of CPTs.

    What about when WordPress updates? Well, that’s something to be careful about. If you update staging to (say) 3.5.1, but production is on 3.5, the tables will get updated. Thankfully, WordPress rarely messes with the posts and comments tables. The one that normally gets poked by an update is _options, and we’re syncing that anyway. Still, you should set aside some time to test in staging before you go ahead and push. Since we’re only posting content on production, this will have no impact on your editors.

    Alternatives:

    Put the content tables on another database. While WP doesn’t make it as easy as it does for custom user and meta tables, certainly you should be able to fiddle with that. Then just sync the rest of the tables. Code like HyperDB already lets you split the DB for load balancing, so you could fork that.

    Or … Sync the whole database. But that defeats the purpose.

  • Deploying With Git

    Deploying With Git

    I’ve been banging my head on this for a while. It really did take me a year and reading lots of things to begin to understand that I was totally wrong. As with many things, I have to sit down and use them for a while to understand what I’m doing wrong, and what I need to learn. I finally had my git breakthrough. It’s very possible (no, likely) that I got some of this wrong, but I feel like I now understand more about git and how it should be used, and that made me more confident in what I’m doing with it.

    Speaking as a non-developer (hey, sometimes I am!), I just want a command-line upgrade for my stuff. This code also lacks a WordPress-esque click-to-upgrade, so I have to do a multi-step tango of download, unpack, copy, delete in order to upgrade. (By the way, more software should have one-click upgrades like that; it would make life easier for everyone. I do know that the backend support is non-trivial, so I would love to see a third party act as a deployment hub, much like GitHub is a repository hub.) The more steps I have, the more apt I am to make an error. So in the interests of reducing my errors and my overhead, I wanted to find a faster and safer way to deploy. (My previous job was all about deployment. We had a lot of complicated scripts to take our code, compile it, compress it, move it to a staging site, and then email that it was ready. From there, we had more scripts to ‘move to test’ and ‘move to prod,’ which made sense.)

    I already think that automating and simplifying deployment is good, and all I want to do is get one version, the ‘good’ version, of the code and be able to update it easily. One or two lines is best. Simple, reliable, and easy to use. That’s what I want.

    Recently, Ryan Hellyer pointed out git archive, which he claims is faster than clone. I’d believe it if I could get it to work. When I tried using HTTPS, I got this: fatal: Operation not supported by protocol. So I tried using ssh and got Could not resolve hostname… instead. Basically I had all these problems. Turns out GitHub turned off git archive --remote, so I’m dead in the water there for any code hosted there, which is most of my code.

    I kicked various permutations of this around for a couple of afternoons before finally throwing my hands up, yet again, and looking into something else, including Capistrano, which Mark Jaquith uses in WP Stack. It’s something I’m personally interested in for work-related reasons. Capistrano is a Ruby app, and vulnerability fears aside, it’s not very user friendly. At my old job we used Ant a lot to deploy, though there don’t seem to be Ant tasks for Git yet. The problem with both of those is that they require you to pull down the whole hunk ‘o code, and I’m trying to avoid that in this use case. Keep it simple, stupid. Adding more layers of code and complication onto a project that doesn’t need it is bad.

    Finally I went back to git and re-read how the whole distributed deployment works. I know how to clone a repository, which essentially gets me ‘trunk.’ And I know that a pull does a fetch followed by a merge, in case I’d done any edits, and it saves my edits. Hence merge, and why I dig it for dev. At length it occurred to me that what I wanted was to check out the git repo without downloading the code at first. Well I know how to do that:

    $ git clone --no-hardlinks --no-checkout https://github.com/wp-cli/wp-cli.git wp-cli
    Cloning into 'wp-cli'...
    remote: Counting objects: 10464, done.
    remote: Compressing objects: 100% (3896/3896), done.
    remote: Total 10464 (delta 6635), reused 10265 (delta 6471)
    Receiving objects: 100% (10464/10464), 1.20 MiB | 1.04 MiB/s, done.
    Resolving deltas: 100% (6635/6635), done.
    

    That brings down just a .git folder, which is small. And from there, I know how to get a list of tags:

    $ git tag -l
    v0.3.0
    [...]
    v0.8.0
    v0.9.0
    

    And now I can check out version 8!

    $ git checkout v0.8.0
    Note: checking out 'v0.8.0'.
    
    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.
    
    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:
    
      git checkout -b new_branch_name
    
    HEAD is now at 8acc57d... set version to 0.8.0
    

    Well damn it, that was simple. But I don’t want to be in a detached HEAD state, as that means it’s a little weird to update. I mean, I could do it with a switch back to master, a pull, and a checkout again (that dance is sketched below), but then I thought about local branches. Even though I’m never making changes to core code (ever), let’s be smart.
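    For the record, that dance is only three commands; it just gets old when you run it for every release (the tag name is whatever the project published):

    $ git checkout master    # reattach to a branch
    $ git fetch --tags       # grab any new tags
    $ git checkout v0.9.0    # detach again onto the newer tag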

    One codebase I use has the master branch as their current version, which is cool. Then there’s a 1.4.5 branch where they’re working on everything new, so when a new version comes out, I can git pull and be done. (In this moment, I kind of started to get how you should be using git. In SVN, trunk is where you develop, and you check into tags (for WordPress at least) to push finished versions. In git, you make your own branch, develop there, and merge back into master when you’re ready to release. Commence head-desking.)

    One conundrum was that there are tags and branches, and people use them as they see fit. While some of the code I use defines branches so I can check out a branch, others just use tags, which are treeish. Thankfully, you can make your own branch off a tag, which is what I did.

    I tried it again with a non-GitHub slice of code: MediaWiki. (MediaWiki, compared to wp-cli, is huge, and took a while to run on my laptop. It was a lot faster on my server: 419,236 objects vs 10,464. I’m just saying someone needs to rethink the whole ‘Clone is faster!’ argument, since it’s slow now or slow later when downloading large files. Large files is large.) Now we have a new issue. MediaWiki’s .git folder is 228.96 MiB… Interestingly, my MediaWiki install is about 155 MiB in and of itself, and disk space is cheap. If it’s not, you’ve got the wrong host. Still, it’s a drawback and I’m not really fond of it. Running a repack makes it a little smaller. Running garbage collection made it way smaller, but it’s not recommended. This, however, is recommended:

    git repack -a -d --depth=1 --window=1

    It doesn’t make it super small, but hey, it worked.
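    If you’re curious what the repack actually bought you, measure the repository before and after:

    $ du -sh .git    # run before and after the repack to compare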

    Speaking of worked, since the whole process worked twice, I decided to move one of my installs (after making a backup!) over to this new workflow. This was a little odd, but for MediaWiki it went like this:

    # Clone just the repository data (no working files) next to the install.
    git clone --no-hardlinks --no-checkout https://gerrit.wikimedia.org/r/p/mediawiki/core.git wiki2
    # Drop the .git folder into the existing install and remove the shell.
    mv wiki2/.git wiki/
    rmdir wiki2
    cd wiki
    # Make the index match HEAD; our working files are already in place.
    git reset --hard HEAD
    

    Now we’re cloning the repo, moving our files, resetting where HEAD is, and I’m ready to set up my install to use the latest tag, and this time I’m going to make a branch (mysite-1.20.3) based on the tag (1.20.3):

    git checkout -b mysite-1.20.3 1.20.3
    

    And this works great.

    The drawback to pulling a specific tag is that when I want to update to a new tag (1.20.4, let’s say), I have to update everything and then check out the new tag in order to pull down the files. Now, unlike svn, I’m not making a full copy of my base code with every branch or tag; it’s all handled by refs, so there’s no harm in keeping these older versions. If I want to delete them, it’s a simple git branch -D mysite-1.20.3 call and I’m done. No code changes (save themes and .htaccess), no merging needed. And if there’s a problem, I can switch back really fast to the old version with git checkout mysite-1.20.3. The annoyance is that I just want to stay on the 1.20 branch, don’t I? Update the minors as they come, just like the WP minor-release updater only updates changed files.

    Thus, I asked myself if there was a better way and, in the case of MediaWiki, there is! In the world of doing_it_right(), MediaWiki has branches and tags (so does WP, if you look at trac), and they use branches called ‘REL’. If you’re not sure what branches your repo uses, type git remote show origin and it will list everything. There I see REL1_20, and since I’m using version 1.20.3 here, I surmised that I can actually do this instead:

    git checkout -b mysite-REL1_20 origin/REL1_20 
    

    This checks out my branch and says “This branch follows along with REL1_20,” so when I want to update my branch it’s two commands:

    git fetch --all
    git pull
    

    The fetch downloads the changesets and the pull applies them. It looks like this in the real world (where I’m using REL1_21, since I wanted to test some functionality on the alpha version):

    $ git fetch --all
    Fetching origin
    remote: Counting objects: 30, done
    remote: Finding sources: 100% (14/14)
    remote: Getting sizes: 100% (17/17)
    remote: Total 14 (delta 10), reused 12 (delta 10)
    Unpacking objects: 100% (14/14), done.
    From https://gerrit.wikimedia.org/r/p/mediawiki/core
       61a26ee..fb1220d  REL1_21    -> origin/REL1_21
       80347b9..431bb0a  master     -> origin/master
    $ git pull
    Updating 61a26ee..fb1220d
    Fast-forward
     includes/actions/HistoryAction.php | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)
    

    This doesn’t work on all the repos, as not everyone follows the same code practices. Like one repo I use only uses tags. Still, it’s enough to get me fumbling through to success in a way that doesn’t terrify me, since it’s easy to flip back and forth between versions.

    Fine. I’m sold. git’s becoming a badass. The only thing left is to protect myself with .htaccess:

     # SVN and GIT protection
     RewriteRule ^(.*/)?(\.svn|\.git)/ - [F,L]
     ErrorDocument 403 "Access Forbidden"
    

    Now no one can look at my svn or git files.
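    It’s worth checking that the rule actually took; a request for anything under .git should now be refused (the domain is a placeholder):

    $ curl -I https://example.com/.git/config    # expect HTTP 403 Forbidden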

    And to figure out how to get unrelated instances of git in subfolders to all update (MediaWiki lets you install extensions via git, but then you don’t have a fast or easy way to update them…):

    #!/bin/sh
    
    # Run 'git pull' in every directory one level down; each extension
    # is its own git checkout.
    for i in `find ./ -maxdepth 1 -mindepth 1 -type d`; do
            cd "$i" || continue
            git pull
            cd ..
    done
    

    And I call that via ./git.sh, which lives in /wiki/extensions and works great.

    From here out, if I wanted to script things, it’s pretty trivial, since it’s a series of simple if/else checks, and I’m off to the races. I still wish every app had a WordPress-esque updater (and plugin installer, hello!) but I feel confident now that I can use git to get to where my own updates are faster.
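    As a parting sketch of that scripting, here’s the sort of if/else check I mean: only pull when the tracking branch has actually moved (the path is hypothetical):

    #!/bin/sh
    
    # Update a git-deployed app only when the remote branch has new commits.
    cd /path/to/app || exit 1
    
    git fetch --all
    
    LOCAL=$(git rev-parse @)
    REMOTE=$(git rev-parse @{u})
    
    if [ "$LOCAL" = "$REMOTE" ]; then
            echo "Already up to date."
    else
            echo "New commits found, updating..."
            git pull
    fi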