With all the kerfuffle about the new Lifehacker (and the 2.0.1 redesign), one thing stood out for me as the most interesting fallout: a rush of articles about the dreaded Hashbang.
Unix heads can skip this paragraph. For the rest of the world, a hashbang (aka a shebang) is the two-character sequence #! (a number sign followed by an exclamation point). You see it a lot at the beginning of scripts, where it’s used to define which ‘interpreter’ you’re using. Basically you’re telling Unix which program to use to parse the rest of the script. It’s more complicated than that, but this is not the place for goobley tech speak.
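To make it concrete, here’s what that looks like in a tiny (hypothetical) Node.js script; the shebang is the very first line:

```js
#!/usr/bin/env node
// The first line is the shebang: Unix reads it and hands the rest of this
// file to whatever `node` resolves to on your PATH.
console.log('This script picked its own interpreter via the shebang.');
```

Save it as hello.js, run chmod +x hello.js, and ./hello.js works on its own; without the shebang you’d have to type node hello.js yourself.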
A hashbang used in a web URL is also an interpreter, but of a different sort. The hashbang is a fake tool used to get URLs generated by JavaScript and AJAX picked up by Google. Great, what does that mean? The # tells the browser that the rest of the URL is an ‘identifier’: everything after the # refers to a part of the current page and is never sent to the server. So in the case of Lifehacker, URLs went from http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker to http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker.
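Under the hood, that #! version leans entirely on client-side script. The following is just a sketch of the technique, not Lifehacker’s actual code, and the /ajax/ endpoint and the content element are made up:

```js
// Sketch of hashbang routing (not Lifehacker's actual code). Everything
// after the # stays in the browser, so a script has to read it and fetch
// the matching content itself.
function loadFromHash() {
  var hash = window.location.hash;        // e.g. "#!5753509/hello-world-..."
  if (hash.indexOf('#!') !== 0) return;   // no hashbang, nothing to do
  var path = hash.slice(2);               // "5753509/hello-world-..."

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/ajax/' + path);       // '/ajax/' endpoint is invented for this sketch
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
}

window.addEventListener('hashchange', loadFromHash);
loadFromHash();
```

If that script never runs, nothing past the # is ever resolved, and the reader is left staring at an empty shell of a page.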
Most people missed the change and concentrated on the new layout. The #! in there manages to pull up the right page, so why should we worry? After all, AJAX is the ‘future’ of the web. AJAX, by the way, is not the dim-witted but incredibly strong Greek hero. AJAX is shorthand for Asynchronous JavaScript and XML, though it has very little to do with XML. Instead it’s used to create interactive web apps. It’s exceptionally powerful, but a little dim-witted. Oh, okay, maybe my Iliad joke wasn’t so far off after all!
AJAX has a whole lot of problems, the most important (to me) being that it’s not easily read by screen readers, which means your blind visitors get the shaft a lot from poorly written AJAX sites. You also can’t use the back button most of the time, nor are these sites easily read by web crawler bots, which means you don’t end up in Google results. Your page isn’t ‘real’, therefore it’s not followed. Of course, the folks at Google recognized this problem and came up with a weird solution: the Hashbang! That’s right, Google introduced it as a way to get sites dependent on AJAX into their results. (A proposal for making AJAX crawlable – Google Webmaster Central, Wednesday, October 07, 2009 at 10:51 AM)
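The proposal boils down to a URL rewrite: when the crawler hits a #! URL, it asks your server for the same content under a special _escaped_fragment_ query parameter, and you’re on the hook to answer that request with a plain HTML snapshot. A rough sketch of the rewrite (escaping details simplified):

```js
// What Google's proposal does with a hashbang URL: the crawler can't run
// your JavaScript, so it rewrites the #! part into a query-string form and
// expects your server to answer that request with an HTML snapshot.
function escapedFragmentUrl(hashbangUrl) {
  var parts = hashbangUrl.split('#!');
  if (parts.length < 2) return hashbangUrl;   // not a hashbang URL, nothing to rewrite
  var base = parts[0];
  var joiner = base.indexOf('?') === -1 ? '?' : '&';
  return base + joiner + '_escaped_fragment_=' + encodeURIComponent(parts[1]);
}

console.log(escapedFragmentUrl(
  'http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker'
));
// -> http://lifehacker.com/?_escaped_fragment_=5753509%2Fhello-world-this-is-the-new-lifehacker
```

Your server then has to recognize that query string and render the snapshot itself, which is where the ‘weird’ part comes in.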
Sounds great, though rather complicated and a little kludgey. Now you can have the best of both worlds. There are still drawbacks, of course, and I hear this refrain over and over:
The main problem is that LifeHacker URLs now don’t map to actual content. (Breaking the Web with hash-bangs – Mike Davies (Isolani.co.uk) Tuesday, February 08, 2011)
This initially perplexed me. After all, WordPress (which runs this site) is really a single-page site (index.php) with a million identifiers, and it’s not like there’s an actual, physical page on my server called do-pretty-urls-still-matter. What’s the difference? Turns out it’s all in how the URL is generated. The hashbang method is prone to error, as Lifehacker has already found out: all you have to do is kill JavaScript and your entire site goes down. That I knew, but as I sat down to write this I got myself in a bind. After all, in order for my pretty URLs here to work, I need .htaccess (which means my httpd.conf stuff is all in a row), PHP and SQL. Of course, all that is server side, and the user only needs the most basic of browsers to get to my site. But how were those failure points any less than a JavaScript-induced set?
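To make the comparison concrete, this is the shape of the chain my pretty URLs ride on. A hedged sketch in Node, purely for illustration; WordPress actually does it with an .htaccess rewrite to index.php, PHP and a database query:

```js
// Front-controller sketch behind pretty URLs: one entry point looks up the
// slug and hands back finished HTML. Every link in this chain lives on the
// server; the slug table here stands in for the database lookup.
var http = require('http');

var posts = {
  '/2011/do-pretty-urls-still-matter': '<h1>Do Pretty URLs Still Matter?</h1>'
};

http.createServer(function (req, res) {
  var body = posts[req.url];   // ignoring query strings to keep the sketch short
  res.writeHead(body ? 200 : 404, { 'Content-Type': 'text/html' });
  res.end('<html><body>' + (body || '<h1>Not Found</h1>') + '</body></html>');
}).listen(8080);
```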
I appealed to Twitter and got a couple fast responses from @JohnPBloch:
JohnPBloch: The difference is that one (js) depends on millions of different configurations working correctly; the other depends only on one […] for example, I’ve had chrome extensions that crashed JS on non-jQuery pages before. (JohnPBloch)
Thanks to John, my brain kicked back into gear. There is no browser dependency in my URLs. The only way for them to fail is if I screw up the backend (that could be my site setup or my server). Either way, the critical failure will always be mine, and never my user’s. That’s as it should be. We’re so often told that our customers are always right. As a corollary, your customers should never be at fault for a problem (as long as they’re using the products as intended). There are good reasons to use AJAX. Using AJAX and JavaScript allows for a responsive and truly interactive experience for your users in situations where small parts of the interface are changing (think Gmail, Flickr and Google Maps). Gmail is actually a perfect example. That’s something where I neither want nor need a ‘pretty’ URL past http://gmail.com, because I’ll never share the URL.
The rest of the time, though, URLs are important. If I want someone to visit my site, I tell them ‘Go to https://halfelf.org’ and they know what to do. Back in the Trojan War days, you’d hear “Login to AOL, keyword ‘Ipstenu’.” instead of a URL, and we’ve moved away from that to allowing people to claim their own domains and their own presence online. We shouldn’t be reliant on AOL, Google, or anything else to generate our representation. So, regardless of the problems with the hashbang (and there are a lot), your URL is important.
We all know the difference between a ‘pretty’ URL and an ugly one on sight. https://halfelf.org/?p=1358 is ugly and https://halfelf.org/2011/do-pretty-urls-still-matter is pretty! Simple. Lifehacker (and Twitter for that matter) have always had ‘kind of’ pretty URLs, with a weird number string in there, but enough to make you go “Okay, this Lifehacker post is about monkeys” or “This tweet is by Ipstenu.” With the change to hashbangs, both sites (temporarily) broke the cardinal rule of websites: never change your URLs. By now, both sites will redirect a hashbangless URL to a hashbanged one, which is as it should be. You never get rid of your old URLs, which is why, on older sites, you have a really freakish .htaccess with miles and miles of regexp. AJAX makes you jump through hoops for pretty URLs.
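The hashbang side of that redirect is typically just a few lines of script; something in this shape (a guess at the technique, not either site’s actual code) bounces the old-style URL to the hashbanged one:

```js
// A guess at the shape of the redirect shim, not Lifehacker's or Twitter's
// real code: a visitor lands on the old-style /5753509/hello-world-... URL
// and is bounced to /#!5753509/hello-world-... so the script-driven page
// can take over.
(function () {
  var path = window.location.pathname;
  if (path !== '/' && window.location.hash.indexOf('#!') !== 0) {
    window.location.replace('/#!' + path.slice(1) + window.location.search);
  }
})();
```

Which, of course, is one more thing that only works when the JavaScript does.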
But that raises the question: do pretty URLs actually matter anymore, or is this just me being an old stick in the mud? Much as Google has minor issues with AJAX/JavaScript, it has minor issues with dynamic URLs. Quickie explanation: static URLs are pretty, dynamic ones aren’t. Google can and does crawl dynamic URLs, but static ones are preferred. (Dynamic URLs vs. static URLs – Google Webmaster Tools, Monday, September 22, 2008 at 3:20 PM) On the other hand, we know that shorter is better, and you can’t get much shorter than https://halfelf.org/?p=1358 when you get down to it.
I would posit that, since the web is based on look and feel, the design of your site still relies, in part, on how easily someone can understand the URL.
Comments
The thing is, Google’s proposal was never designed to be adopted the way it was. It was a way for sites that already adopted this kind of JS-driven architecture to allow their sites to be crawlable.
The real future isn’t in hashbang URLs, but that doesn’t mean we’re stuck with static and slow pretty URLs. There’s now history management in the latest modern browsers, in the form of pushState, replaceState, and the popstate event. This allows a site to have real URLs — no hash, no bang — and move from page to page with JS, updating the URL in the address bar as you go.
Dynamic, pretty URLs that leverage XHR/AJAX, but don’t at all require it: This is the future. Gawker, Twitter, and so many others are doing it wrong, and the web is paying dearly for it.
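In practice that looks roughly like the sketch below; the content element, the data-spa attribute and the ?fragment=1 endpoint are all made up for illustration:

```js
// pushState navigation sketch: real URLs in the address bar, no hashbang,
// and plain old links still work if any of this fails or isn't supported.
function navigate(path, push) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', path + '?fragment=1');         // made-up endpoint returning just the article HTML
  xhr.onload = function () {
    document.getElementById('content').innerHTML = xhr.responseText;
    if (push) history.pushState({ path: path }, '', path);  // the URL stays real and shareable
  };
  xhr.onerror = function () { window.location.href = path; }; // fall back to a full page load
  xhr.send();
}

// Enhance links only when the History API exists; otherwise they behave normally.
if (window.history && window.history.pushState) {
  Array.prototype.forEach.call(document.querySelectorAll('a[data-spa]'), function (link) {
    link.addEventListener('click', function (event) {
      event.preventDefault();
      navigate(link.getAttribute('href'), true);
    });
  });

  // Re-render the right content when the reader uses the back button.
  window.addEventListener('popstate', function (event) {
    if (event.state && event.state.path) navigate(event.state.path, false);
  });
}
```

Old browsers, crawlers, and readers with JS disabled simply follow the ordinary hrefs.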
I don’t think it’s ‘the’ future, but it seems to be a popular one that two heavy hitters are after, which means a lot of startups will make these bad choices. Thankfully HTML 5 is coming, and not soon enough for me! As we move away from the kludge, hopefully the future will be true dynamism!
Very impressive post. I wondered how the new Twitter URLs worked, because I knew through frustrated experience that everything after the # was never sent to the server. It never occurred to me that it was all done client side through Ajax. This was a very thorough and spot-on article, and I’d better stop now or I’ll sound too much like an anonymous spammer!
The Twitter URL pissed me off since I was copy/pasting and then wondering why the damn things broke if I wasn’t logged in. They didn’t ever reply to my help ticket about that, but they did fix it… (Twitter’s lack of support makes me beat my head in sometimes).
Further reading:
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch
Hash-Bang is a hack for AJAX apps to let Google query for content without JavaScript, when there is no sensible map of the content. It basically requires a server-side (PHP, for example) implementation of the JavaScript query handling… which is exactly backwards from the right way to do it. Google doesn’t recommend using hashbangs unless you’ve already painted yourself into a corner. Non-hashed URLs (like pretty URLs) with progressively enhanced AJAX are actually Google’s real recommendation.
Yeah, basically it’s Google’s ‘You’re doing it wrong, but okay…’ which is causing the problem. People are using the kludgy hack as the ‘best’ way to do something and are hurting us all.
Hopefully, peeps like you saying “You’re doing it wrong” often enough will convince people that they shouldn’t just be following without understanding. (I mean, even Google’s help docs for implementing the thing say HashBangs are doing it wrong…)
A post by a WordPress dev might go far in discouraging any WordPress dev thinking of adding a HashBang “Feature.” 😉
*points up at Nacin* 🙄 The NacinBot has spoken.
Makes me happy to know it’ll never happen in WP.
“is the fact that it’s not easily read by screen-readers”
Not true: we have WAI-ARIA, and Single Page Interface web sites can run with JavaScript disabled.
“You also can’t use the back button most of the time”
Yes you can, with no page reload
“nor are these sites easily read by web crawler bots”
By using “dual links” any web crawler can traverse your site.
Take a look at this demo:
http://itsnatsites.appspot.com/
It is not only Twitter; it is Facebook, it is Lifehacker/Gizmodo/Gawker, it is… Google Search (have you tried http://www.google.com with a modern browser?).
Yes, at some point we can get rid of hashbangs with HTML 5; however, the current hashbang approach is okay in a myriad of browsers, new and old, including mobile browsers with AJAX (and JS disabled can be an option).
Modern Browser. Heh. I use Safari and Chrome pretty much exclusively. So yes, I think I hit Google with a ‘modern’ browser.
I disagree with your points, because in order for all of those to work, you have to duplicate the effort on your back-end. There’s no need to have everything subjected to a browser-dependent hashbang, when it can be done in other, more sustainable, ways.
You miss the point 🙂 Google Search is also a Single Page Interface using hashbangs when you open it with Firefox, Chrome, Safari… not the same with Internet Explorer; for instance, in MSIE 6, Google Search is shown as a conventional paged application (I don’t know the reason, but I think it has nothing to do with hashbangs, which work; it may be more related to bugs in DHTML visual behavior in old MSIE browsers).
That said, I must say SPI in web sites is a challenging “new” way of web development, but in my opinion it isn’t just the future: the future is already here when three of the most visited sites in the world are SPI hashbang-based. It seems something is “radically” changing in web development…
I detest ‘You miss the point.’ You missed mine, which is that the hashbang makes this more complicated for the dev who writes the code and the reader who visits the site. Yes, it can work, but even Google says it’s a hack. This stuff wasn’t meant to be used the way it’s being used.
IMO, Single Page Interface is fine, provided that even the URL remains user friendly. Using #! in your URLs is a dirty way to handle SPI and not something we should advocate. A pretty URL allows people to know what they’re getting into before they click.
And for the record, I use Chrome, and I never see a Hashbang in Google Search. Ever.
http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=kldjfsdfsd
I excluded Gmail (and I’ll exclude Google Reader) from my examples, because those are specifically instances where SPI DOES NOT need a pretty URL. You’d never paste it in an email.
“makes this more complicated for the dev who writes the code”
YES
“and the reader who visits the site”
NO, the reader just feels the site is more responsive; everything works as it always has.
“Using #! in your URLs is a dirty way to handle SPI and not something we should advocate”
There is no impact for end users; URLs can be very similar to normal ones. The one exception is opening a hashbang bookmark with JS disabled: yes, the default page is shown because the hashbang is ignored, but I think that is the only significant glitch.
“And for the record, I use Chrome, and I never see a Hashbang in Google Search. Ever.”
Umm, I see hashbangs in Firefox 3.6, Chrome and Safari. The first page is not SPI; search for something and the hashbang appears.