With all the kerfuffle about the new Lifehacker (and the 2.0.1 redesign), one thing stood out for me as the most interesting fallout: a rush of articles about the dreaded Hashbang.
Unix heads can skip this paragraph. For the rest of the world, a hashbang (aka a shebang) is the character sequence consisting of a number sign and an exclamation point: #!. You see it a lot at the beginning of scripts, where it defines which ‘interpreter’ the script uses. Basically you’re telling Unix which program to use to parse the rest of the script. It’s more complicated than that, but this is not the place for gobbledygook tech speak.
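If you’ve never actually seen one, here’s a two-line illustration (a Node.js script, purely as an example; the same trick works for bash, Perl, Python and friends):

```javascript
#!/usr/bin/env node
// The line above is the shebang: when you run this file directly
// (./hello.js after a chmod +x -- the filename is just an example),
// Unix reads the #! and hands the rest of the file to the named
// interpreter (here, node) instead of trying to execute it itself.
console.log('Hello from whatever interpreter the shebang picked.');
```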
A hashbang used in a web URL is also an interpreter, but of a different sort. The hashbang here is a workaround used to get URLs generated by JavaScript and AJAX picked up by Google. Great, what does that mean? The # tells the browser that the rest of the URL is an ‘identifier’: everything after the # refers to a part of the current page and is never sent to the server. So in the case of Lifehacker, URLs went from http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker to http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker.
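To make that concrete, here’s a rough sketch of what a hashbang page has to do on every single load. This is not Lifehacker’s actual code; the /ajax/ endpoint and the content element are stand-ins I made up for illustration:

```javascript
// Rough sketch of client-side hashbang routing. The server only ever
// sees "http://lifehacker.com/" -- everything after the # stays in the
// browser, so a script has to fetch the real article itself.
window.onload = function () {
  var hash = window.location.hash;          // "#!5753509/hello-world-..."
  if (hash.indexOf('#!') !== 0) {
    return;                                 // no hashbang, nothing to do
  }
  var path = hash.substring(2);             // "5753509/hello-world-..."
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/ajax/' + path, true);   // hypothetical content endpoint
  xhr.onload = function () {
    // hypothetical container element for the article body
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
  // If this script never runs -- blocked, broken, or just slow -- the
  // visitor is left staring at an empty shell instead of the article.
};
```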
Most people missed the change and concentrated on the new layout. The #! in there still gets you to the right page, so why should we worry? After all, AJAX is the ‘future’ of the web. AJAX, by the way, is not the dim-witted but incredibly strong Greek hero. AJAX is shorthand for Asynchronous JavaScript and XML, though it has very little to do with XML. Instead it’s used to create interactive web apps. It’s exceptionally powerful, but a little dim-witted. Oh, okay, maybe my Iliad joke wasn’t so far off after all!
AJAX has a whole lot of problems, the most important (to me) being that it’s not easily read by screen readers, which means your blind visitors get the shaft a lot from poorly written AJAX sites. You also can’t use the back button most of the time, nor are these sites easily read by web crawler bots, which means you don’t end up in Google results. Your page isn’t ‘real’, therefore it’s not followed. Of course, the folks at Google recognized this problem and came up with a weird solution: the Hashbang! That’s right, Google introduced it as a way to get sites dependent on AJAX into their results. (A proposal for making AJAX crawlable – Google Webmaster Central, Wednesday, October 07, 2009 at 10:51 AM)
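The gist of the proposal is a URL-translation trick: the crawler can’t run JavaScript, so when it sees #! it requests an ‘_escaped_fragment_’ version of the URL instead, and your server is expected to answer that request with real, static HTML. Roughly like this (the helper below is mine, just to show the mapping, not anything Google ships):

```javascript
// Roughly how a hashbang URL maps to the URL the crawler actually
// requests under the AJAX-crawling proposal. Your server then has to
// recognize _escaped_fragment_ and return a plain HTML snapshot.
function toCrawlerUrl(url) {
  var parts = url.split('#!');
  if (parts.length < 2) {
    return url; // no hashbang, nothing to translate
  }
  return parts[0] + '?_escaped_fragment_=' + encodeURIComponent(parts[1]);
}

// Prints:
// http://lifehacker.com/?_escaped_fragment_=5753509%2Fhello-world-this-is-the-new-lifehacker
console.log(toCrawlerUrl('http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker'));
```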
Sounds great, though rather complicated and a little kludgey. Now you can have the best of both worlds. There are still drawbacks, of course, and I hear this refrain over and over:
The main problem is that LifeHacker URLs now don’t map to actual content. (Breaking the Web with hash-bangs – Mike Davies (Isolani.co.uk) Tuesday, February 08, 2011)
This initially perplexed me. After all, WordPress (which runs this site) is really a single-page site (index.php) with a million identifiers, and it’s not like there’s an actual, physical page on my server called do-pretty-urls-still-matter. What’s the difference? Turns out it’s all in how the URL is generated. The hashbang method is prone to error, as Lifehacker has already found out: all you have to do is kill JavaScript and the entire site goes down. That I knew, but as I sat down to write this I got myself in a bind. After all, in order for my pretty URLs here to work, I need .htaccess (which means my httpd.conf stuff is all in a row), PHP and SQL. Of course, all of that is server side, and the user only needs the most basic of browsers to get to my site. But how were those failure points any fewer than the ones JavaScript introduces?
I appealed to Twitter and got a couple fast responses from @JohnPBloch:
JohnPBloch: The difference is that one (js) depends on millions of different configurations working correctly; the other depends only on one […] for example, I’ve had chrome extensions that crashed JS on non-jQuery pages before. (JohnPBloch)
Thanks to John, my brain kicked back into gear. There is no browser dependency on my URLs. The only way for them to fail is if I screw up the backend (that could be my site setup or my server). Either way, the critical failure will always be mine, and never my user’s. That’s as it should be. We’re so often told that our customers are always right; as a corollary, your customers should never be at fault for a problem (as long as they’re using the products as intended). There are good reasons to use AJAX. AJAX and JavaScript allow for a responsive and truly interactive experience for your users in situations where small parts of the interface are changing (think Gmail, Flickr and Google Maps). Gmail is actually the perfect example: it’s something where I neither want nor need a ‘pretty’ URL past http://gmail.com, because I’ll never share the URL.
The rest of the time, though, URLs are important. If I want someone to visit my site, I tell them ‘Go to https://halfelf.org’ and they know what to do. Back in the Trojan War days, you’d hear “Login to AOL, keyword ‘Ipstenu’.” instead of a URL, and we’ve moved away from that to allowing people to claim their own domains and their own presence online. We shouldn’t be reliant on AOL, Google, or anything else to generate our representation. So, regardless of the problems with the hashbang (and there are a lot), your URL is important.
We all know the difference between a ‘pretty’ URL and an ugly one on sight. https://halfelf.org/?p=1358 is ugly and https://halfelf.org/2011/do-pretty-urls-still-matter is pretty! Simple. Lifehacker (and Twitter, for that matter) have always had ‘kind of’ pretty URLs, with a weird number string in there, but enough to make you go “Okay, this Lifehacker post is about monkeys” or “This tweet is by Ipstenu.” With the change to hashbang, both sites (temporarily) broke the cardinal rule of websites: never change your URLs. By now, both sites redirect a hashbangless URL to a hashbanged one, which is as it should be. You never get rid of your old URLs, which is why, on older sites, you end up with a really freakish .htaccess with miles and miles of regexp. AJAX makes you jump through hoops for pretty URLs.
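The redirect half of that is, ironically, usually just another bit of client-side script, something along these lines (a sketch, not either site’s actual code):

```javascript
// Sketch of the old-URL-to-hashbang redirect: anyone arriving on a
// pre-redesign path gets bounced to the hashbang equivalent so that
// old links keep working.
(function () {
  var path = window.location.pathname;      // e.g. "/5753509/hello-world-..."
  var hash = window.location.hash;
  if (path.length > 1 && hash.indexOf('#!') !== 0) {
    window.location.replace('/#!' + path.substring(1));
  }
})();
```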
But that raises the real question: do pretty URLs actually matter anymore, or is this just me being an old stick in the mud? Much like Google’s minor issues with AJAX/JavaScript, they have minor issues with dynamic URLs. Quickie explanation: static URLs are pretty, dynamic ones aren’t. Basically, Google can and does crawl dynamic URLs, but static ones are preferred. (Dynamic URLs vs. static URLs – Google Webmaster Tools, Monday, September 22, 2008 at 3:20 PM) On the other hand, we know that shorter is better, and you can’t get much shorter than https://halfelf.org/?p=1358 when you get down to it.
I would posit that, since the web is based on look and feel, the design of your site still relies, in part, on how easily someone can understand your URLs.
Now that you know all about the hashbang and why your URLs matter, let’s turn to the other thing Google has been making noise about lately: content farms and duplicate content.
“Content farms” are the wave of the future, and Google calls them sites with “shallow or low-quality content.” The definition is vague, but it basically means a content farm is a website that trawls the internet, takes good data from other sites, and reproduces it on its own. Most content farms provide automatically inserted data. There is no man behind the scenes manually scanning the internet for related topics and copy/pasting them into the site. Instead, this is all done via software known as content scrapers. The reasons why they do this I’ll get to in a minute, but I think Google’s statement that they’re going to spend 2011 burning down the content farms is what’s got people worried about duplicate content again.
What Google is doing is not only laudable, but necessary. They are adapting to the change in how spam is delivered, and doing so in a way that should not impact your site. The only way I can see this affecting ‘innocent’ sites is if they’re blogs that use RSS feed scrapers to populate their content. This is why, anytime someone asks me how to do that, I either tell them don’t or I don’t answer at all. While I certainly use other news articles to populate my site, I do so by quoting them and crafting my own, individual posts. In that manner I both express my own creativity and promote the high quality of my own site. I make my site better. And that is the only way to get your site well ranked. Yes, it is work, and yes, it is time consuming. Anything worth doing is going to take you time, and the sooner you accept that, the happier you will be.

I am not an SEO expert. In fact, there are only a handful of people who I feel can claim that title without making me roll my eyes so hard I get a migraine. Anyone who tells you they have all the answers to getting your site listed well in Google is a liar, because there’s only one good answer: make a good site. That’s really it. How, then, do all those spam sites get listed in Google, Bing and Yahoo to begin with, and will the techniques the search engines are using to eradicate those sites hurt you?
You write a blog post and the content is stored in the database, along with any tags, categories, or metadata you put in. When someone goes directly to the blog post, they see it. However, they can also see the post if they go to a list of posts in that category, with that tag, on that date, in that year, and so on and so forth. So the question a lot of new webfolks ask is “Is that duplicate content?” No. It’s not. Nor is having two addresses for your site (with and without the www, say) point to the same page. In fact, that’s good for your site. The more valid ways you have of providing your user with information, the easier it is for them to find what they want, and the happier they are. Happy users means repeat users, which means profit (in that oh-so-nebulous “web = profit” theory).
Google also makes the claim that since CMSs generally don’t handle duplicate content ‘well’ (their word, not mine), non-malicious duplication is common and fairly harmless, though it will affect search results. Here’s where things get sticky. Personally, I disagree with Google’s claim that CMSs handle duplicate content poorly. A well-written CMS, knowing that no two people think the same way, takes that into consideration when presenting a site. You want an index, but if you know someone looks for things by subject matter or by year, you need a way to provide that information to the reader. Google’s problem is that in doing so, you have also provided it to the GoogleBots who patrol your site and pull in the data for searches, which creates the dreaded duplicate content.
The answer is the same one I give when I’m asked why WordPress gets so much spam: because it’s used a lot! The more prevalent your product is (i.e. the more successful it is), the higher the likelihood that some jerk with a kiddie script will try to attack it. This is just a fact of life, and I’m not going to get into how to solve it.
Every now and then I see someone ask ‘How do I disable right-clicking on images on my site?’ My answer, invariably, is ‘You don’t.’ The real question I suppose is ‘How do I stop people from ripping off my work on the net?’ and the answer to that is still ‘You don’t.’