If you use Git and it’s on the same server as your site, and owned by the same user, it’s remarkably easy to do this.
First make sure the public folder in your Hugo repository is being tracked. Yes, this can make your repository a little large, but that's not something to worry about too much. Space is cheap, or it should be. Next, make a folder in /tmp (I called mine library) to store the Git output in.
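The script itself is only a couple of lines. A minimal sketch, assuming the repo is a bare one at $HOME/repos/hugo-site.git and the site is served from $HOME/public_html (both paths are placeholders):

#!/bin/sh
# Check the site repo out into the temp folder:
git --git-dir="$HOME/repos/hugo-site.git" --work-tree=/tmp/library checkout -f master
# Copy the generated site over; --delete removes anything not in the source:
rsync -az --delete /tmp/library/public/ "$HOME/public_html/"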
What this does is check out the Git repository into the temp folder and then copy it over. The sync is set to delete anything not found in the source. Done.
The benefit of this method is that you don't need to install GoLang or Hugo on your server; everything is plain Git and a copy. Rsync is a delightful way to copy everything over as well. You can delete the temp folder when you're done, but the checkout process handles that for you. Another nice trick is that you can specify which branch to check out, so if you have a special one for publishing, just use that.
But could this be even easier? Yes and no. You see, what I'm doing is checking out the whole thing and then copying over folders. What if I could tell Git to check out just the code in that one folder?
There's a thing called a 'sparse checkout,' wherein I can tell Git "Only check out this folder." Then all I have to do is go into that folder and grab the content I wanted. The problem there is that it literally checks out the folder 'public', and what I wanted was the content of the public folder. Which means that while it's 'easier' in that I've only checked out the code I need, I can't check it out directly into where I want it. I will always have a little extra move.
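For reference, the old-style sparse checkout recipe goes roughly like this (the repo path is a placeholder, matching the sketch above):

cd /tmp/library
git init
git remote add origin "$HOME/repos/hugo-site.git"
git config core.sparseCheckout true
echo "public/" >> .git/info/sparse-checkout
git pull origin master

You end up with /tmp/library/public/ rather than the contents of public sitting in /tmp/library itself, hence the extra move.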
And then my script remains the same. But! This is going to be a faster checkout, since it's only ever exporting and seeing the folders it needs.
One of my headaches with Jekyll was deployment. It wasn't until I found Octopress that I had any success. Hugo wants you to use Wercker for automated deployment. I don't want to, since I don't host on GitHub.
Deploying (v1)
The first method I could use is a script like I have with Octopress, which runs an rsync, but instead I went with a git post-update hook:
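(A sketch; the real hook lives at hooks/post-update inside the bare repo, and the paths here are placeholders.)

#!/bin/sh
# Inside a hook, GIT_DIR already points at the bare repo,
# so force a checkout of the site into the temp folder:
GIT_WORK_TREE=/tmp/library git checkout -f master
# Then sync the generated site into the web root:
rsync -az --delete /tmp/library/public/ "$HOME/public_html/"

Remember to make the hook executable (chmod +x), or Git will ignore it.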
And that copies things over nice and fast. I could do this for Jekyll as well, since now I’m version controlling my site content (in public) as well as my source code. There are downsides to this. I don’t want to sync my site folder. It’s going to get very large and messy and annoying.
The workaround for Jekyll people is to install Jekyll on the server, and that didn't work for me.
Install on the Server
But Wait! There’s More!
I actually like Hugo better than Jekyll. It feels slicker and faster and more… app-like. Also, since it's not Ruby, I was able to install it properly on my server. Yes, unlike Ruby, which was a crime, GoLang was incredibly easy to install on CentOS 6.
First install Go and its dependencies:
$ yum install golang
$ yum install hg
Now install Hugo. No, you can't yum it up:
$ export GOPATH=/usr/local/go
$ go get -v github.com/spf13/hugo
This puts it in /usr/local/go/bin/hugo and I can run Hugo commands natively.
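One assumption baked into 'natively': /usr/local/go/bin has to be on your PATH. If it isn't, one line fixes that:

$ export PATH=$PATH:/usr/local/go/bin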
Works perfectly. Unlike Ruby which is just a brat.
Now the Magic
When I’m ready to write a new post, I can locally spin up my hugo site with hugo server, write my post, make sure it looks nice, and then commit my changes to git and push.
cd ~/Development/my-repos/hugo-site/
hugo new content/some-file.md
vi content/some-file.md
git add content/some-file.md
git commit -m "revised some-file"
git push deploy master
And it all works.
Massive hat tip to Andrew Codispiti for detailing his work, which made mine easier.
This is a bit of a lie. I do have Ruby, it’s just on my laptop.
I don't have it on my server, though; I do have my git repo there. I could install it, but there are a couple of reasons I don't, and because I don't, I can't just run a jekyll build on my server. I'm not the only one with this particular issue. A lot of people on shared hosts can't do it, for example, and people on cloud-based tools can't really either.
Option one, which is very common, is what I have been doing: I added _site (the Jekyll output folder) to Git, and I copy that over on a post-commit hook. For what it was, that worked just fine. It only ran when I did a git commit, and if I wanted to work on a version I could do that in a branch, edit it, and bring it back in without accidentally deploying things.
But option two would be rsync and that appealed to me more.
I found the gem I was looking for eventually. It’s called (simply) Jekyll deploy and you add it to your Gemfile with this:
group :jekyll_plugins do
gem 'jekyll_deploy'
end
Then run the bundle command:
$ bundle
Now you have a new command called deploy, which first runs a build and then deploys based on the configuration options you put in. In my case it's an rsync deploy, but you can do Git too. There was just one problem with it: because it ran a build every time, every file in my site got regenerated, which meant the rsync would always push 100% of the files, and that was more traffic than I really wanted. So I forked the gem to split the build out of the deploy, and pointed my Gemfile at the fork:
group :jekyll_plugins do
gem 'jekyll_deploy', :git => 'https://github.com/ipstenu/jekyll_deploy.git', :branch => 'develop'
end
Now my deploy only runs a deploy.
A better solution would have been to put in some options and create jekyll deploy --build to allow me to run a build first, but I actually kind of like having them separate.
The only question left was whether I should keep _site under version control. I decided that I should, since the git repository would keep the file dates under control, assuring me that only the files I changed would be pushed with a deploy.
I will note that the only reason this is so simple for me is that I have passwordless SSH set up, so I don't have to type my password when I connect from a trusted computer. Since I only have this installed on a trusted server, and I'd need a password to get at the git repo otherwise, I felt it was safe.
Any similarities to website configurations, current or defunct, are probably somewhat intentional. But I'm not naming names in this tale of woe.
The situation:
You have your website: domain.com
You have a staging version where you test all your code: stage-domain.com
Your coders edit the theme code, install plugins, configure them, etc. At the same time, your editors edit content, make posts, etc. The code (plugins etc) is managed by Git because you know you should be using versioning. The posts are not because… WordPress.
The Problem:
In order to post content, you have to push code and content up to the production server. This means that if you're trying to upgrade WordPress, you have to put a hold on content until the upgrade is done. How do you keep the two in sync without goobering your content?
An Answer:
Decouple content and code.
First, let's version control the PHP files. This means your themes and plugins will all be edited on staging, and when you're done, you check the code in and then check it out on production.
Next we attack the DB. Have a job that, when you check out files on production, copies up only the following tables (a sketch of the job follows the list):
_options
_users
_usermeta
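A minimal sketch of that job, assuming a standard wp_ table prefix and that both databases are reachable from the same box (the database names are placeholders):

# Dump the settings tables from staging and load them into production:
mysqldump staging_db wp_options wp_users wp_usermeta > /tmp/settings.sql
mysql production_db < /tmp/settings.sql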
This means that all your code (plugin settings, theme settings, etc.) gets pushed live, and all your content remains isolated. If you have extra tables, you may need to make allowances for them (like, say, _podpress or _badbehavior), but you can find them quickly by looking at the current DB. You'll want to add these as necessary, or not. I'd count _podpress as 'content' and leave it out (we'll get to what to do with it in a second), and _badbehavior as transient data.
On production, you have your content makers be Editors. They edit the content live (because you trust them). If you want to be extra secure, lock down the server via IP rules. At the end of every day, run a reverse sync, where the staging DB's posts and content are replaced by the live site's, thus ensuring everyone has the 'live' data. Obviously you'd want to script in a serialization-safe search/replace after every sync (and have an auto-backup taken before any messing about starts).
It's in the copy back that we decide what to do with the other tables. Either make an exclude list or an include list, whichever makes you feel safer.
Includes would be the content tables (posts, comments, and their meta), and then anything else you want, like _podpress (see? I told you I'd get back to this).
Excludes would be anything in my push script from before, so _options and company. To this I'd also add _badbehavior or anything transient; I don't really need it.
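A sketch of the nightly reverse sync, exclude-list style; the table names, database names, and path are all placeholders, and wp-cli stands in for whatever serialization-safe search/replace you prefer:

#!/bin/sh
# (Take the auto-backup of staging before any of this runs.)
# Tables that hold code and settings stay out of the copy-back:
EXCLUDES="wp_options wp_users wp_usermeta wp_badbehavior"
CONTENT=""
for T in $(mysql -N -e 'SHOW TABLES' production_db); do
  case " $EXCLUDES " in
    *" $T "*) ;;                  # skip the settings tables
    *) CONTENT="$CONTENT $T" ;;   # everything else is content
  esac
done
# Replace staging's content with production's:
mysqldump production_db $CONTENT > /tmp/content.sql
mysql staging_db < /tmp/content.sql
# Serialization-safe search/replace so URLs point at staging again:
wp --path=/path/to/staging search-replace 'domain.com' 'stage-domain.com'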
Pitfalls:
Content isn’t just posts and pages. What if your ‘editors’ need to edit widgets?
What if they don’t? StudioPress’s Genesis Framework has a totally awesome widget called “Featured Page” which lets you pick a page, check options, and off you go. What if you had a page called ‘Header Widget’ and in your widget area for Header, put that widget, pointed it to that page, and checked ‘Show Page Content’? Now your editors just need to edit that page!
What if they still do? Grab a plugin like Members and make a special role for "Super Editors" to let them make widget changes. Of course, now you have to uncouple _options and you've lost some of the magic here. This is a bad idea. You want to decouple content and code. Using widgets for this is a cool idea, but what if you did it another way entirely? What if instead you used custom post types, and just had the 'spot' for that header show the most recent post? Say you wanted an ad to show at the top of the page. Make your CPT for 'header ads' and write a post. Then schedule another for Wednesday at 3pm. Boom, your ad gets replaced! This also makes it easy to go a step further: make a Role for Marketing or Advertising, and now only they (and admins) can mess with ads. The downside is you may end up with a lot of CPTs.
What about when WordPress updates? Well, that's something to be careful about. If you update staging to (say) 3.5.1 but production is on 3.5, the tables will get updated. Thankfully, WordPress rarely messes with the posts and comments tables. The one that normally gets poked by an update is _options, and we're syncing that anyway. Still, you should set aside some time to test in staging before you push this live. Since we're only posting content on production, this will have no impact on your editors.
Alternatives:
Put the content tables on another database. While WP doesn't make this as easy as it does for custom user and meta tables, you should certainly be able to fiddle with it. Then just sync the rest of the tables. Code like HyperDB already lets you split the DB for load balancing, so you could fork that.
Or … Sync the whole database. But that defeats the purpose.
I’ve been banging my head on this for a while. It really did take me a year and reading lots of things to begin to understand that I was totally wrong. As with many things, I have to sit down and use them for a while to understand what I’m doing wrong, and what I need to learn. I finally had my git breakthrough. It’s very possible (no, likely) that I got some of this wrong, but I feel like I now understand more about git and how it should be used, and that made me more confident in what I’m doing with it.
Speaking as a non-developer (hey, sometimes I am!), I just want a command line upgrade for my stuff. This code also lacks a WordPress-esque click-to-upgrade, so I have to do a four-step tango of download, unpack, copy, delete in order to upgrade. (By the way, more software should have one-click upgrades like that; it would make life easier for everyone. I do know that the backend support is non-trivial, so I would love to see a third party act as a deployment hub, much like GitHub is a repository hub.) The more steps I have, the more apt I am to make an error. So in the interests of reducing my errors and my overhead, I wanted to find a faster and safer way to deploy. (My previous job was all about deployment. We had a lot of complicated scripts to take our code, compile it, compress it, move it to a staging site, and then email that it was ready. From there, we had more scripts to 'move to test' and 'move to prod', which made sense.)
I already think that automating and simplifying deployment is good, and all I want to do is get one version, the 'good' version, of the code and be able to easily update it. One or two lines is best. Simple, reliable, and easy to use. That's what I want.
Recently, Ryan Hellyer pointed out git archive, which he claims is faster than clone. I'd believe it if I could get it to work. When I tried using HTTPS, I got this: fatal: Operation not supported by protocol. So I tried using SSH and got Could not resolve hostname… instead. Basically I had all these problems. Turns out GitHub turned off git archive --remote, so I'm dead in the water there for any code hosted there, which is most of my code.
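For the record, this is the sort of invocation that kept failing (wp-cli standing in for any GitHub-hosted repo):

$ git archive --remote=git://github.com/wp-cli/wp-cli.git master | tar -x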
I kicked various permutations of this around for a couple of afternoons before finally throwing my hands up, yet again, and looking into something else, including Capistrano, which Mark Jaquith uses in WP Stack. It's something I'm personally interested in for work-related reasons. Capistrano is a Ruby app, and vulnerability fears aside, it's not very user friendly. At my old job we used Ant a lot to deploy, though there don't seem to be Ant tasks for Git yet. The problem with both of those is that they require you to pull down the whole hunk o' code, and I'm trying to avoid that in this use case. Keep it simple, stupid. Adding more layers of code and complication onto a project that doesn't need it is bad.
Finally I went back to git and re-read how the whole distributed deployment works. I know how to clone a repository, which essentially gets me 'trunk.' And I know that a pull does a fetch followed by a merge, which saves any edits I'd made. Hence merge, and why I dig it for dev. At length it occurred to me that what I wanted was to check out the git repo without downloading the code at first. Well, I know how to do that:
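(wp-cli here, since that's the codebase I tested this on; the flag is the same one used for the MediaWiki clone below.)

$ git clone --no-checkout https://github.com/wp-cli/wp-cli.git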
That brings down just a .git folder, which is small. And from there, I know how to get a list of tags:
$ git tag -l
v0.3.0
[...]
v0.8.0
v0.9.0
And now I can check out version 0.8.0!
$ git checkout v0.8.0
Note: checking out 'v0.8.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at 8acc57d... set version to 0.8.0
Well damn it, that was simple. But I don't want to be in a detached HEAD state, as that makes it a little weird to update. I mean, I could do it with a switch back to master, a pull, and a checkout again, but then I thought about local branches. Even though I'm never making changes to core code (ever), let's be smart. One codebase I use has the master branch as their current version, which is cool. Then there's a 1.4.5 branch where they're working on everything new, so when a new version comes out, I can git pull and be done. (In this moment, I kind of started to get how you should be using git. In SVN, trunk is where you develop, and you check into tags (for WordPress at least) to push finished versions. In git, you make your own branch, develop there, and merge back into master when you're ready to release. Commence head-desking.)
One conundrum was that there are tags and branches, and people use them as they see fit. While some of the code I use defines branches so I can check out a branch, others just use tags, which are treeish. Thankfully, you can make your own branch off a tag, which is what I did.
I tried it again with a non-GitHub slice of code: MediaWiki. (MediaWiki, compared to wp-cli, is huge, and took a while to clone on my laptop. It was a lot faster on my server: 419,236 objects vs 10,464. I'm just saying someone needs to rethink the whole 'Clone is faster!' argument, since it's slow now or slow later when downloading large files. Large files is large.) Now we have a new issue. MediaWiki's .git folder is 228.96 MiB… Interestingly, my MediaWiki install is about 155MiB in and of itself, and disk space is cheap. If it's not, you've got the wrong host. Still, it's a drawback and I'm not really fond of it. Running a repack makes it a little smaller. Running garbage collection made it way smaller, but that's not recommended. This, however, is recommended:
git repack -a -d --depth=1 --window=1
It doesn’t make it super small, but hey, it worked.
Speaking of things that worked, since the whole process worked twice, I decided to move one of my installs (after making a backup!) over to this new workflow. This was a little odd, but for MediaWiki it went like this:
git clone --no-hardlinks --no-checkout https://gerrit.wikimedia.org/r/p/mediawiki/core.git wiki2
mv wiki2/.git wiki/
rmdir wiki2
cd wiki
git reset --hard HEAD
Now we're cloning the repo without checking anything out, moving the .git folder into my existing wiki folder, and resetting HEAD so git treats the files already on disk as the working copy. From there I'm ready to set up my install to use the latest tag, and this time I'm going to make a branch (mysite-1.20.3) based on the tag (1.20.3):
git checkout -b mysite-1.20.3 1.20.3
And this works great.
The drawback to pulling a specific tag is that when I want to update to a new tag (1.20.4, let's say), I have to fetch everything and then check out the new tag in order to pull down the files. Now, unlike SVN, I'm not making a full copy of my base code with every branch or tag; it's all handled by refs, so there's no harm in keeping these older versions around. If I want to delete one, it's a simple git branch -D mysite-1.20.3 and I'm done. No code changes (save themes and .htaccess), no merging needed. And if there's a problem, I can switch back really fast to the old version with git checkout mysite-1.20.3. The annoyance is that I just want to stay on the 1.20 branch, don't I? Update the minors as they come, just like the WP minor-release updater only updates changed files.
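When 1.20.4 does land, the tag dance would look something like this (tag and branch names assumed to follow the scheme above):

$ git fetch --all
$ git checkout -b mysite-1.20.4 1.20.4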
Thus, I asked myself if there was a better way and, in the case of MediaWiki, there is! In the world of doing_it_right(), MediaWiki has branches and tags (so does WP, if you look at trac), and they use branches called 'REL'. If you're not sure what branches your repo uses, type git remote show origin and it will list everything. There I see REL1_20, and since I'm using version 1.20.3 here, I surmised that I can actually do this instead:
git checkout -b mysite-REL1_20 origin/REL1_20
This checks out my branch and says "this branch follows along with REL1_20," so when I want to update my branch it's two commands:
git fetch --all
git pull
The fetch downloads the changesets and the pull applies them. It looks like this in the real world (where I'm using REL1_21, since I wanted to test some functionality on the alpha version):
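(A sketch of that session, with the branch named per my scheme above:)

$ git checkout mysite-REL1_21
$ git fetch --all
$ git pull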
This doesn't work on all repos, as not everyone follows the same code practices; one repo I use only has tags. Still, it's enough to get me fumbling through to success in a way that doesn't terrify me, since it's easy to flip back and forth between versions.
Fine. I’m sold. git’s becoming a badass. The only thing left is to protect myself with .htaccess:
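(A one-liner along these lines does it, assuming Apache with mod_alias; anything under a .git folder returns a 404:)

RedirectMatch 404 /\.git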
And to figure out how to get unrelated instances of git in subfolders to all update (MediaWiki lets you install extensions via git, but then you don't have a fast/easy way to update them…):
#!/bin/sh
# Pull the latest changes in every git checkout one level down.
for i in `find ./ -maxdepth 1 -mindepth 1 -type d`; do
  cd "$i"
  git pull
  cd ..
done
And I call that via ./git.sh which lives in /wiki/extensions and works great.
From here out, if I wanted to script things, it’s pretty trivial, since it’s a series of simple if/else checks, and I’m off to the races. I still wish every app had a WordPress-esque updater (and plugin installer, hello!) but I feel confident now that I can use git to get to where my own updates are faster.