Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: seo

  • Introducing HEO

    We all know that SEO is ‘Search Engine Optimization.’ I humbly suggest we pay better attention to HEO – Human Experience Optimization.

    After you spend hours and hours optimizing your site for search engines, you should sit back and think about how the humans who read your site will experience it. This should be blindingly obvious to everyone, but more and more we hear about how you should make your URLs SEO-friendly, or your post excerpts/slugs/format/meta-data the best to get highly ranked in Google. At a certain point, you’re missing the goal of a website.

    A website is not for search engines, a website is for humans.

    Humans like to be able to find what they want relatively painlessly. They like to know when something was written (or when whatever it’s about took place). They like to be able to search, sort, surf and select. They like to know weird things. It’s your job to make sure that when a user hits your site, they stay.

    Fonts

    I’ve mentioned before that font choices matter on your site. Perhaps the most important thing to remember about fonts is that people have to be able to read them. A lot of sites make their fonts very small, which forces viewers to hit Ctrl-+. This is one of Jakob Nielsen’s pet peeves. Users should be able to control their font size, but you should also set your starting font size to something legible.

    Imagine my surprise when I went to a site and saw this:
    [Screenshot: a site with teeny tiny text]

    I had to zoom in to read it. That font is set to font: 11px/13px "Lucida Grande". Just by changing it to 12px/20px it was easier to read, but to make it a perfect starting point, it should really be 14px/20px. You’ll need to balance your font choice with the size, though, as too-thick and too-thin fonts are equally painful for people to read.
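
    If you want to see that in CSS terms, it’s a one-line change. A rough sketch (the selector and font stack are just an illustration, not that site’s actual stylesheet):

        /* Before: body text far too small to read comfortably */
        /* body { font: 11px/13px "Lucida Grande", sans-serif; } */

        /* A more legible starting point: 14px text on a 20px line height */
        body {
            font: 14px/20px "Lucida Grande", Verdana, sans-serif;
        }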

    Colors

    I’m in my mid-thirties with the best worst vision you’ll find before someone gets classified as legally blind (that said, I have fantastic night vision). I cannot read white text on a black background for more than a few seconds without getting after-images, and I’m not in some tiny minority here. There’s a reason books, eReaders, newspapers and magazines tend to print dark text on light backgrounds, and it’s not just the cost: more people can read that setup. On top of that, don’t use background images. The busier the background, the more difficult the text is to read, and you’ll draw attention away from it.

    The colors on your site need to be easy to read, and not strain the eyes.

    Layout

    Did you know that users tend to read to the left? This sort of flow makes sense when you consider that most languages are read left-right. Jakob Nielsen points out that people spend “more than twice as much time looking at the left side of the page as they did the right.” (Jakob Nielsen’s Alertbox, April 6, 2010: Horizontal Attention Leans Left) Not only that, but people actually tend to read pages in a pretty distinct F-shaped pattern. (Jakob Nielsen’s Alertbox, April 17, 2006: F-Shaped Pattern For Reading Web Content)

    So how do you best lay out your website? I tend to think people read content better if it’s on the left, so I put the body of my text left and the sidebars right. I also take into account that newspapers and magazines break text up into columns for readability, so I set a fixed width for my site. That choice is somewhat controversial among my friends, but I like to look at the iPad and Kindle as examples of why you don’t want to allow forever-width pages. Monitors are big, browser windows can be huge, but in the human head, eyes are spaced in a certain way. Making your page’s content too wide is a drain.
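
    If you want to cap line length the same way, the CSS is trivial. A minimal sketch, assuming you have some kind of wrapper around your post content (the selector and width here are made up):

        /* Hypothetical content wrapper: cap the measure so lines stay readable
           even on enormous monitors, and keep the column centered. */
        #content {
            max-width: 640px;
            margin: 0 auto;
        }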

    Page Length

    There used to be a concept of ‘The fold’, which was basically that people didn’t scroll down on webpages in the early days of the web, so if they didn’t see your important content on the top half of your page (i.e. above the fold), they weren’t going to see it at all. It’s 2011. People know to scroll down a page.(Jakob Nielsen’s Alertbox, March 22, 2010: Scrolling and Attention) But you still need to make sure your site has the most important content ‘above’ the fold.

    Where’s the fold these days, though? Monitor size is a lot more variable today than it was in 1995, and the break-point on a page is getting pretty difficult to figure out. Unlike a newspaper, where the ‘fold’ is pretty obvious (unless you’re the Chicago Sun Times), you have to take a pretty good guess at where the ‘top’ of your site is. Oddly, this is a lot easier with the iPad, which currently is my benchmark for ‘the fold.’

    Keeping that in mind, page length matters! I try to keep each post to no more than 1,200 words, because human attention spans are only so long. If I happen to run longer, I’ll consider breaking the post into multiples.

    Permalinks/URLS

    Samuel Wood (aka Otto) said it simply:

    Humans care about dates. Leaving a date identifier (like the year) out of the URL is actually de-optimizing the site for humans.

    Not everything should have a date, mind you. Resources like Wikipedia, or other sites that act as repositories for static, timeless material (like a book), certainly do not need date stamps. Deciding whether your site needs to include the year in the URL (like I do here) or not at all (like I do elsewhere) is something you need to think long and hard about. If you’re making a ‘traditional’ blog, or a newspaper, or some site that acts as a repository for time-based information, the answer is simple: yes, you do.

    In addition to sorting out whether or not you need dates on your site, you have to think about the post format. I’m a huge proponent of pretty URLs, so I tend to lean toward custom-crafted URLs. On WordPress, I always review the permalink and, if I think it would be better shorter, I shorten it. MediaWiki defaults to whatever you name the page and puts that in as your page title(Oddly, you can only override this with {{DISPLAYTITLE:Custom title}}, which has weird results in searches.), but WordPress uses the ‘title’ of your post and makes that your page title.
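
    For the record, the WordPress permalink structures I keep talking about live under Settings → Permalinks, and the two flavors look roughly like this (a sketch, not a prescription):

        # With the year, for a traditional time-based blog (what I use here):
        /%year%/%postname%/

        # Without any date at all, for timeless reference material:
        /%postname%/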

    [Screenshot: permalink example]

    This is pretty easy to change, though. Just click on edit and make it shorter (which I strongly suggest you do in most cases).

    What else?

    I could go on and on. Like how you shouldn’t use too many ads (and whatever you use, they shouldn’t be bigger than your post content!), don’t use flashing images/text, and keep in mind your audience! What are your hot-button topics for making your site human friendly?

  • What’s Your Net Worth?

    I get a lot of requests from people to link to their sites.  Back in the day, we all used to have massive link pages where we just listed all the cool sites we knew about.  On a fansite, I actually still have one where I list all the related sites, organized by how they’re related, separated by language, etc etc.  Here, though, you see a list on the right of links, broken down into “Websites” and “WordPress” and that’s pretty much it.

    The reason is that I subscribe to the belief of contextual links.  If a link, by itself, has no context, my reader cannot determine the inherent value of the link.  When I write a blog post, I try to put links that make sense inside my post.  On my fansite, where I have a moderately sized wiki, I link from the related page to the related site.

    Still, when people ask me to link to their site (or to friend them on Twitter/Facebook whatever) my knee-jerk reaction is “Why?” and it should be yours too!  You should always ask that when someone wants to network.  What’s in it for me?  What good will this bring me?  Do you write good content?  If you’re asking someone to link to you, you had better be bringing something good to the table, otherwise you’re just an unsolicited request, and no one likes those.

    Perhaps this flies in the face of my SEO advice (which is to network), but networking doesn’t mean you should cold-call everyone with a related site and ask for attention.  Sometimes networking is linking to people, but it’s also tweeting and working the community.  If you have a site about dog biscuits, hang out on the Milk Bone forum and talk to people.  If someone has a question about the best biscuits for an old dog missing teeth, and you know you wrote a great post about it, link to it.  “Hi, Bob.  My dog is 16 and he’s got no teeth on the right side, I know your pain!  I spent a lot of time researching this problem, and hopefully this will help you. Link.”

    Look at that!  You were nice, polite, and helpful!  It’s even better if you stick around and talk to Bob some more, if he needs it.  You’re building your reputation in a productive and constructive way.(Yes, it’s a lot of work.  If you haven’t caught on to that yet, I also have a bridge for sale …)  The most important part is that you told Bob why your link was going to help him.  You put up some cred and you didn’t make it too long.

    When you think about it, the best way to get people to link to you is to get them interested in your site. The best way to get them interested in your site is to make content of value. Part of having a site with perceived value is having a site that attracts myriad walks of life. It’s a vicious circle. You have to get that foot in the door for people to notice you, and that’s what makes you popular.

    How do you get the foot in the door if you don’t want to spend all your time on related sites?

    You don’t.

    Look. If this was a brick and mortar company, you’d be advertising, wouldn’t you? You’d know you had to network your vegan dog biscuits to all the hippies and dog lovers out there, and you wouldn’t think twice about it. You’d hire that idiot kid to stand on the corner in a gorilla outfit handing out coupons, or spin a sign while dressed as a sandwich. You would spend money and time to introduce the world to your brand.

    The Internet is the exact same way. So when you cold-email someone and say ‘Hi, I really like your stuff! Will you link to my site?’ you need to bring your A Game. You need to sell your work, explain to me why you’re worth space on my site, and why I should read your blog. Just saying ‘I, too, am a blog about vegan dog food!’ doesn’t cut it for the bigger sites. You can’t expect people to spend all their time checking out people they should link to, especially if you’re not already linking to them. Think of it like coming up with a good cover letter for your resume. You want people to read that page and go “Yeah, this cat is cool!”

    Your links make or break you, but more important than who links to you is who, and how, you link to others. If you link to every dog site in the world, links from you are worthless. If you’re discerning and link only to the ones that mean the most to you, or are the most like your own site, then you’ve shown the ability to tell the difference between any old site and one of value. You’ve made yourself worth something.

    And when you’re there, you won’t need to ask people to link to you any more. That’s when you’ve made it.

    Just don’t think it’ll happen all in one day.

  • SEO “Experts” Are Lying To You (About Backlinks)

    Stop me if you’ve heard this one.

    “For just $19.95, we offer hundreds of certified backlinks!”
    “Quality backlinks for your site!”
    “In just one week, we can get your site into Google searches!”

    I see people ask, a lot, what the best WordPress plugin is to generate backlinks. And I always reply something like this: “The best way to get backlinks is to write good posts that people will link to and share.”

    But what is a backlink anyway? As obvious as it sounds, a backlink is a link from someone else’s site back to yours. So when I say things like “Yoast’s explanation on how BlogPress SEO Plugin generates spam is an invaluable resource”, I’ve linked back to his site and made a backlink. If he has pings on, he’ll see my remark and link, and it’ll show up on his site in the comments section.(I actually turn pings off, because of the high number of spammers and the low value it was giving me. If the only reason you’re linking to someone is to get the link BACK to your site, you’re doing something wrong, but that’s another blog post.) Backlinks, honest ones between two good sites, are great. I love getting linked to from CNN (it happened once) or other sites who like my writing. It’s a great compliment.

    However, people seem to think that backlinks are going to ‘generate SEO.’ First off, they’re not using the words correctly. SEO stands for ‘Search Engine Optimization.’ My first grown-up job, where I wasn’t just fiddle-farting around on the computer, was to optimize meta-data for sites to get them ranked first on AltaVista, so yes, I do know what I’m talking about here. Due to that early work, I’ve got pretty awesome Google-Fu, because I used to spend hours going over the specs for search engines, and reading up on how they worked, what their algorithmic engines were, and how to get legitimately good results for my key words. I also learned what keywords are useless.

    Back in the day, search engines would rate your site based solely on your self-contained content. One of the ways we would promote our sites would be to use hidden text or meta keywords that only the search engine would see. We’d list all the keywords related to our site about dog biscuits, and awesomely, we’d get rewarded. Naturally some people would shove totally irrelevant keywords in, to game the system for other searches. Which is why sometimes you’d search for ‘free range catnip’ and get a link for ‘wetriffs.com'(Note: wetriffs.com is NOT SAFE FOR WORK!). Today, no search engine relies on keyword meta data because of that (though most sites still include it).
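
    To make that concrete, here’s a rough reconstruction of the sort of thing we used to do (the keywords and styling are invented, and please don’t do this today):

        <!-- Circa-1999 meta keywords: search engines ignore these now. -->
        <meta name="keywords" content="dog biscuits, dog treats, puppy snacks, cheap dog biscuits">

        <!-- Hidden keyword stuffing: white text on a white background, invisible
             to humans but visible to the crawler. Modern engines penalize this. -->
        <div style="color: #ffffff; background-color: #ffffff;">
          dog biscuits best dog biscuits free dog biscuits dog treats
        </div>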

    Nothing can ‘generate’ SEO, because by its nature, optimization isn’t something you generate. It’s something you can leverage and build on, but you don’t generate it. Backlinks are, certainly, a component in getting your site highly ranked on Google for your keywords, but if you think about it, the only way you really optimize your site for backlinks is by making good posts. Maybe I’m splitting hairs, but your page rank (i.e. how cool Google thinks you are) is going to be built on a few things, and while backlinks are one of them, they’re not everything.

    Here’s how you make a good site that’s highly ranked in Google:

    1. Write good content
    2. Include decent meta data in your site’s HTML (If you need help with that, check out Google’s page on Meta Tags, and see the sketch right after this list.)
    3. Network with other (related) sites to share links
    4. Advertise
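
    For item two, here’s a minimal sketch of the kind of meta data I mean; the title and description are obviously invented:

        <head>
          <title>Vegan Dog Biscuits – Recipes and Reviews</title>
          <!-- A short, honest description; search engines often use it as the result snippet. -->
          <meta name="description" content="Recipes, reviews and baking tips for vegan dog biscuits.">
          <!-- Tell crawlers to index the page and follow its links (the default, made explicit). -->
          <meta name="robots" content="index, follow">
        </head>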

    So why do people get all fired up about backlinks? Google themselves say:

    Your site’s ranking in Google search results is partly based on analysis of those sites that link to you. The quantity, quality, and relevance of links count towards your rating. The sites that link to you can provide context about the subject matter of your site, and can indicate its quality and popularity.(Link Schemes – Google.com)

    Sounds great, doesn’t it? If a lot of people link back to me, like WikiPedia, then my content is proven to be good, and I win! You knew it wasn’t that simple, right? Google’s smart. They actually care about the quality and relevance of people linking to you! Heck, Google actually agrees with me when we both say the best way to get a good page ranking is to make good content. More to the point, those get-backlink-quick tools are going to engage in what basically amounts to spam, which will adversely impact your page ranking.

    Of course, there are good backlinks. Like mine to Yoast’s (not that he needs the ‘link juice'(The term ‘link juice’ is what we call the ‘value’ of a link coming back to our site. If I link to you, I give you ‘juice’ which boosts your page rank. In Yoast’s case, he doesn’t need any help, but I give it anyway.)). But the best way to get those is to get yourself known in your arena. People don’t link to new sites because they don’t know about them, so you need to get out there and get known. Talk to a site you admire (or people you admire) and ask them if they’ll read and review your site. Post your articles on twitter/facebook/digg/whatever and basically put in the sweat equity to make your site shine. And if that sounded like a lot of work for you, then you’re right. It is work. It’s hard work.

    The obvious question now is: if these so-called experts are telling you they can generate hundreds of backlinks, what are they actually doing? They’re ripping you off. There’s no automatic, auto-generated way to create backlinks. So if someone tells you they can do it for $19.95, they’re not lying, but they are cheating you out of money and giving you something useless. If you’ve fallen for one of those scams, I’d cancel that credit card ASAP. I have a horror story about a guy who got scammed and then ripped off for a couple grand.

    The lessons learned from this are pretty simple: There is no quick fix, no magic bullet, no perfect tool that will make you popular. You have to find your audience and pitch good content to them. You have to work hard and yes, this takes a lot of time and effort. Anyone who says differently is selling something. Of course, optimizing the hell out of your site (with caching software and minification and CDN) is a great thing to speed your site up, but at the end of the day, all advice in the world boils down to this: If there’s nothing here for people to read and find beneficial, your site is useless.

    Before you get depressed and think there’s nothing you can do to improve your site, I refer back to Joost de Valk. When people tell me they’re an SEO expert, I compare their website and work to Yoast, because in my opinion, he’s the example of what an SEO expert looks like and he doesn’t call himself an expert. He says he’s an ‘SEO and online marketer.’ Sounds to me like a guy with his head on straight. Pretty much everyone else I ignore. And he’s written the Ultimate SEO Guide and it’s free for anyone to use.

    SEO Folks I Would Hire (culled from my ‘Folks I’d Hire’ list):

  • Do Pretty URLs Still Matter?

    With all the kerfuffle about the new Lifehacker (and the 2.0.1 redesign), one thing stood out for me as the most interesting fallout: a rush of articles about the dreaded Hashbang.

    Unix heads can skip this paragraph. For the rest of the world, a hashbang (aka a shebang) is the character sequence consisting of a number sign and an exclamation point: #!. You see it a lot at the beginning of scripts, where it’s used to define which ‘interpreter’ you’re using. Basically, you’re telling Unix which program to use to parse the rest of the script. It’s more complicated than that, but this is not the place for goobley tech speak.
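
    If you’ve never noticed one, the shebang is the very first line of a typical script. A tiny sketch:

        #!/bin/sh
        # The '#!' line above tells Unix to hand the rest of this file to /bin/sh.
        echo "Hello from a shell script"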

    A hashbang used in a web URL is also an interpreter, but of a different sort. The hashbang is a fake tool used to get URLs generated by JavaScript and AJAX picked up by Google. Great, what does that mean? The # tells the browser that the rest of the URL is an ‘identifier’, and everything after that # refers to a part of the current page. So in the case of Lifehacker, URLs went from http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker to http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker.

    Most people missed the change and concentrated on the new layout. The #! in there manages to parse the right page, so why should we worry? After all, AJAX is the ‘future’ of the web. AJAX, by the way, is not the dim-witted but incredibly strong Greek hero. AJAX is shorthand for Asynchronous JavaScript and XML, but has very little to do with XML. Instead it’s used to create interactive web-apps. It’s exceptionally powerful, but a little dim-witted. Oh, okay, maybe my Iliad joke wasn’t so far off, after all!

    AJAX has a whole lot of problems, the most important (to me) being that it’s not easily read by screen readers, which means your blind visitors get the shaft a lot from poorly written AJAX sites. You also can’t use the back button most of the time, nor are these sites easily read by web crawler bots, which means you don’t end up in Google results. Your page isn’t ‘real’, therefore it’s not followed. Of course, the folks at Google recognized this problem and came up with a weird solution: The Hashbang! That’s right, Google introduced it as a way to get sites dependent on AJAX into their results. (A proposal for making AJAX crawlable – Google Webmaster Central, Wednesday, October 07, 2009 at 10:51 AM)

    Sounds great, though rather complicated and a little kludgey. Now you can have the best of both worlds. There are still drawbacks, of course, and I hear this refrain over and over:

    The main problem is that LifeHacker URLs now don’t map to actual content. (Breaking the Web with hash-bangs – Mike Davies (Isolani.co.uk) Tuesday, February 08, 2011)

    This initially perplexed me. After all, WordPress (which runs this site) is really a single page site (index.php) with a million identifiers, and it’s not like there’s an actual, physical, page on my server called do-pretty-urls-still-matter. What’s the difference? Turns out it’s all in how the URL is generated. The hashbang method is prone to error; as Lifehacker already found out, all you have to do is kill JavaScript and your entire site goes down. That I knew, but as I sat down to write this I got myself in a bind. After all, in order for my pretty URLs here to work, I need .htaccess (which means my httpd.conf stuff is all in a row), PHP and SQL. Of course, all of that is server side, and the user only needs the most basic of browsers to get to my site. But how were those failure points any less than a JavaScript-induced set?
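
    For the curious, the .htaccess piece I’m talking about is more or less the stock rewrite block WordPress writes for itself:

        # BEGIN WordPress
        <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteBase /
        RewriteRule ^index\.php$ - [L]
        # If the request isn't a real file or directory on disk, hand it to
        # index.php, which looks up the pretty URL and serves the right post.
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
        </IfModule>
        # END WordPress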

    I appealed to Twitter and got a couple fast responses from @JohnPBloch:

    JohnPBloch: The difference is that one (js) depends on millions of different configurations working correctly; the other depends only on one […] for example, I’ve had chrome extensions that crashed JS on non-jQuery pages before.(JohnPBloch) (JohnPBloch)

    Thanks to John, my brain kicked back into gear. There is no browser dependency on my URLs. The only way for them to fail is if I screw up the backend (that could be my site setup or my server). Either way, the critical failure will always be me, and never my user. That’s as it should be. We’re so often told that our customers are always right. As a corollary, your customers should never be at fault for a problem (as long as they’re using the products as intended). There are good reasons to use AJAX. Using AJAX and JavaScript allows for a responsive and truly interactive experience for your users in situations where small parts of the interface are changing (think Gmail, Flickr and Google Maps). Gmail is actually the perfect example: it’s something where I neither want nor need a ‘pretty’ URL past http://gmail.com, because I’ll never share the URL.

    The rest of the time, though, URLs are important. If I want someone to visit my site, I tell them ‘Go to https://halfelf.org’ and they know what to do. Back in the Trojan War days, you’d hear “Login to AOL, keyword ‘Ipstenu’.” instead of a URL, and we’ve moved away from that to allowing people to claim their own domains and their own presence online. We shouldn’t be reliant on AOL, Google, or anything else to generate our representation. So, regardless of the problems with the hashbang (and there are a lot), your URL is important.

    We all know the difference between a ‘pretty’ URL and an ugly one on sight. https://halfelf.org/?p=1358 is ugly and https://halfelf.org/2011/do-pretty-urls-still-matter is pretty! Simple. Lifehacker (and Twitter for that matter) have always had ‘kind of’ pretty URLs, with a weird number string in there, but enough to make you go “Okay, this Lifehacker post is about monkeys” or “This tweet is by Ipstenu.” With the change to hashbang, both sites (temporarily) broke the cardinal rule of websites: never change your URLs. By now, both sites will redirect a hashbangless URL to a hashbanged one, which is as it should be. You never get rid of your old URLs, which is why on older sites, you have a really freakish .htaccess with miles and miles of regexp. AJAX makes you jump through hoops for pretty URLs.
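
    And when you do have to move a URL, the decent thing is a permanent redirect so the old address keeps working. A minimal .htaccess sketch, with made-up paths:

        # Hypothetical example: an old page moved to a new, prettier address.
        # The 301 tells browsers and search engines the move is permanent.
        Redirect 301 /old-post.html https://example.com/2011/new-post-slug/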

    But that raises the question: do pretty URLs actually matter anymore, or is this just me being an old stick in the mud? Much like Google’s minor issues with AJAX/JavaScript, they have minor issues with dynamic URLs. Quickie explanation: static URLs are pretty, dynamic ones aren’t. Basically, they can and do crawl dynamic URLs, but static ones are preferred. (Dynamic URLs vs. static URLs – Google Webmaster Tools, Monday, September 22, 2008 at 3:20 PM) On the other hand, we know that shorter is better, and you can’t get much shorter than https://halfelf.org/?p=1358 when you get down to it.

    I would posit that, since the web is based on look and feel, the design of your site still relies, in part, on the ease of someone in understanding the URL.

  • Google vs Splogs – Part 2

    Now that you know all about the myth of the duplicate content penalty, we can look into spam.

    This year, Google got slammed because the quality of their search was being degraded by spammers. Mostly splogs, I will admit, but Google rightly points out that their ability to filter out spam and splogs in all languages is actually much better than it was five years ago. (Google search and search engine spam – 1/21/2011 09:00:00 AM) No, Google isn’t getting worse, there are just more spammers out there. They also take the time to differentiate between “pure webspam” and “content farms.”

    “Pure webspam” is what you see in a search result when a website uses meta data or hidden content to bully its way into being highly ranked in unrelated searches, or just basically game the system. A decade ago, this was horrific. Now it’s nearly negligible. This type of spam grew pretty organically out of people trying to understand the algorithm behind search engines and manipulate it legally. As we gained greater understanding of meta keywords and in-context content, we came up with more and more tricks to legitimately make our sites more popular. There was a point in time when having hidden text stuffed with as many keywords related to your site as possible was not only commonplace, but lauded. It didn’t last long, as shortly after the good guys sorted that out, the bad guys did too.

    “Content farms” are the wave of the future, and Google calls them sites with “shallow or low-quality content.” The definition is vague, and basically means a content farm is a website that trawls the internet, takes good data from other sites, and reproduces it on its own site. Most content farms use automatically inserted data: there is no man behind the scenes manually scanning the internet for related topics and copy/pasting them into the site. Instead, this is all done via software known as content scrapers. The reasons why they do this I’ll get to in a minute, but I think Google’s statement that they’re going to spend 2011 burning down the content farms is what’s got people worried about duplicate content again.

    A content farm is (partly) defined as a website that exists by duplicating content. Your site’s activity feed/archives/post tags pages are duplicating content for the users. Does that mean your site will be adversely affected because of this?

    No. It will not.

    Google’s algorithm is targeting sites of low content quality. While your stolen post is a beautifully written piece of art on its own, it’s the site as a whole that is used to generate a search ranking. As I’ve been touting for a decade, the trick to getting your site promoted in Google searches is to make a good site. Presuming you made a good site, with good content, and good traffic, and it’s updated regularly, there is very little risk that Google will peg your site as being of “low content quality.” Keep that phrase in mind and remember it well. Your site isn’t highly ranked because of low content, remember! It’s the reverse. If you’re being ranked for good behavior, good content, and good work, you will continue to be rewarded. In a weird way, content farms are actually helping Google refine their search so that it can tell the difference between good sites and bad! (Why The Web Needs Content Farms – by Eric Ward on February 16, 2011)

    The next Google algorithm update will focus on cleaning content farms from positions of unfair advantage in our index. This will likely affect websites with considerable content copied from other online sources. Once this update is complete, preference will be given to the originators of content. We expect this to be in effect in no less than 60 days. (Google search and search engine spam – 1/21/2011 09:00:00 AM)

    What Google is doing is not only laudable, but necessary. They are adapting to the change in how spam is delivered, and doing so in a way that should not impact your site. The only way I can see this affecting ‘innocent’ sites is those blogs that use RSS feed scrapers to populate their sites. This is why anytime someone asks me how to do that, I either tell them don’t or I don’t answer at all. While I certainly use other news articles to populate my site, I do so by quoting them and crafting my own, individual, posts. In that manner I both express my own creativity and promote the high quality of my own site. I make my site better. And that is the only way to get your site well-ranked. Yes, it is work, and yes, it is time consuming. Anything worth doing is going to take you time, and the sooner you accept that, the happier you will be.

    For most small to medium sites, there’s not a thing you need to do in order to maintain your ranking. There are no magic bullets or secret SEO tricks to manipulate your site into a better ranking. In point of fact, trying to manipulate it can be seen as gaming the system and can downgrade your results! Once again: make a good site and you will be rewarded. Certainly, as I said yesterday, optimizing your robots.txt file and getting a good sitemap will help, and I really do suggest a Google Webmaster Tools account to help you with that. In 2011, Google is still king, so once you get your site well listed within Google’s machine, you’re pretty much going to be tops everywhere.

    Why do splogs and content farms game the system in order to get highly ranked? Profit. Some do it to get their domain highly ranked and then sell it for a lot of money, others do it to infect your computer with a virus, and then there’s the rare hero who thinks this will get them money because of the ads on their site. Sadly, this still works often enough to generate just enough profit to keep the splogs going. This is also true of spam emails. Yes, that means your grandmother and Carla Tallucci are still falling for the Nigerian Princess scam emails. The only way to stop all of that is to stop those methods from being productive money makers for the spammers, and that is something that will take us all a very long time and a great deal of education of the masses.

    Your takeaways are pretty simple. Make a good site with good content. Update it regularly. Use a sitemap to teach search engines what’s important. You’ll be fine. Don’t sweat internal duplication.

  • Google vs Splogs – Part 1

    I am not an SEO expert. In fact, there are only a handful of people whom I feel can claim that title without making me roll my eyes so hard I get a migraine. Anyone who tells you they have all the answers to get your site listed well in Google is a liar, because there’s only one good answer: Make a good site. That’s really it. How then do all those spam sites get listed in Google, Bing and Yahoo to begin with, and will the techniques the search engines are using to eradicate those sites hurt you?

    Everyone’s done that search for something and been nailed by a bunch of splogs. Like you look for ‘Laurence Fishburne’ news and you get 100 websites that claim to have news, but really it’s just total, useless crap? Those are splogs, a neologism for spam blogs, and they suck. Splogs are blogs where the articles are fake, and are only created for search engine spamming. They sometimes steal your hard work and use it, by scraping RSS feeds or who knows what else, and generating fake content. Why? Some people do it to infect your PC with a virus, and others do it to trick you into clicking on their ads.

    The problem is spam blogs are so prevalent that they’re adversely affecting search engines, making it harder and harder for you to find real, factual content. This year, rumors started flying that Google was going to go on the warpath against search engine spam, and in doing so, would downgrade perfectly valid sites with ‘duplicate content.’ Having read and re-read the articles posted by Google on the matter, I’m quite certain that, yet again, people are playing Chicken Little. Nowhere, in any of the articles I’ve read, has there been any discussion of the intent to penalize legitimate, valid, websites for containing internally duplicated content.

    In order to understand the duplicate content penalty problem, and yes, it is a problem, you need to understand how most content management systems (CMS – this includes sites like Drupal, Joomla and WordPress) display their data to the users.

    You write a blog post and the content is stored in the database, along with any tags, categories, or meta data you put in. When someone goes directly to the blog post, they see it. However, they can also see the post if they go to a list of posts in that category, with that tag, on that date, in that year, etc etc and so on and so forth. So the question a lot of new webfolks ask is “Is that duplicate content?” No. It’s not. Nor is having more than one URL (say, ipstenu.org with and without the www) point to the same page. In fact, that’s good for your site. The more, valid, ways you have of providing your user with information, the easier it is for them to find what they want, and the happier they are. Happy users means repeat users, which means profit (in that oh so nebulous “web = profit” theory).

    So what is this mysterious duplicate content penalty?

    Let’s take this from the horse’s mouth (or at least Google):

    Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.(Demystifying the “duplicate content penalty” – Friday, September 12, 2008 at 8:30 AM)

    Google goes on to outright state that so long as the intention is well meaning (like making it easier for people to find your valid and useful content), then you will receive no adverse effects in searches for your blog. That means 99.999% of you out there can relax and walk away. What about people who use things like WordPress MU Sitewide Tags Pages (which takes the excerpts of all posts on a WordPress MultiSite installation and duplicates them onto another site), or BuddyPress’s activity stream (which records everything in multiple places)? Again, the answer is the same. You’re doing this to make the site more available and accessible, ergo no harm ergo no foul.

    Google also makes the claim that since CMSs generally don’t handle duplicate content ‘well’ (their word, not mine), non-malicious duplication is common and fairly harmless, though it will affect search results. Here’s where things get sticky. Personally, I disagree with Google’s claim that CMSs handle duplicate content poorly. A well written CMS, knowing that no two people think the same way, takes that into consideration when crafting a site. You want an index, but if you know someone looks for things by subject matter or year, you need to have a way to provide that information for the reader. Google’s problem is that in doing so, you have also provided it for the GoogleBots who patrol your site and pull in the data for searches, which makes the dreaded duplicate content.

    Perhaps Google has forgotten (or not made the connection) that they do the exact same thing. They want to show you what you want to see, and while I may search for “Laurence Fishburne actor” and you might look for “Morpheus Actor”, in the end, we both want to see sites about this actor guy named Laurence Fishburne. How do you make sure we get the right information? You have the content sortable in myriad manners. Does that make it duplicate content? Of course not (unless you’re Bing, which is a whole different subject). Google points out:

    Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy. (Demystifying the “duplicate content penalty” – Friday, September 12, 2008 at 8:30 AM)

    Thankfully, you can eliminate redundancy by providing Google with a sitemap of your website.(About Sitemaps – Google Webmaster Central) With a good sitemap, you can tell search engines how to weigh your site’s content. Which pages are more important, which can be ignored, etc etc. With WordPress and a good plugin, this can be done automatically by making a few choices in an admin interface. You also want to spend a little time understanding your robots.txt file. Perishable Press has a great article on optimizing it for WordPress.
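
    A bare-bones robots.txt along those lines might look like this (the paths are placeholders; yours will depend on your CMS):

        # Hypothetical robots.txt: keep crawlers out of the admin area and
        # point them at the sitemap so they know where the real content is.
        User-agent: *
        Disallow: /wp-admin/

        Sitemap: https://example.com/sitemap.xml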

    Now that you know about the myth behind the duplicate content penalty, tomorrow we can get into content farms!