Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Obfuscation Obtuseness

    I’m not getting into the ethics of free vs pay and theft or any of that. Comments on that matter will be deleted on sight.

    Earlier in the year I remarked on how right-click protection doesn’t work.   Right-click disabling is a form of obfuscation protection.   By hiding the normal methods to perform actions, you are ‘protected.’  You’re not.  I told you so then, and I tell you now.  If it’s on the internet, it can and will get ‘stolen.’

    Pretty recently I got into a ‘tiff’ with the guy who wrote a WordPress plugin to disable right click. A user had a problem and bitched (generically). I told him that the plugin worked well for what it was, but that disabling doesn’t really do much. The plugin author accused me of trolling (incorrect, I was attempting to start a conversation with the user to sort out why he needed the plugin, and then help determine a solution – which is my volunteer job). Of course, that meant I wanted to see how this guy’s product worked. I went and grabbed WP Protect and determined that you can block right-click, image drag and text selection. He doesn’t block CTRL-A, though, so I could easily select all and move on. (Actually you CAN block ctrl-A, per Disable ctrl + n and other ctrl + key combinations in JavaScript, but remember that on a Mac it’s Apple-A and you have to take all that into consideration.)
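    To be clear about how little is actually going on under the hood, here’s a minimal sketch of how this style of plugin generally works (the function name is mine, and this is not WP Protect’s actual code): it just prints a little JavaScript into your page that cancels a few browser events.

    <?php
    // Minimal sketch (hypothetical, not WP Protect's actual code): print a
    // bit of JavaScript that cancels the context menu, text selection, and
    // drag events on every front-end page.
    function hypothetical_block_right_click() {
            ?>
            <script type="text/javascript">
            document.oncontextmenu = function() { return false; }; // right-click
            document.onselectstart = function() { return false; }; // text selection
            document.ondragstart   = function() { return false; }; // image drag
            </script>
            <?php
    }
    add_action( 'wp_footer', 'hypothetical_block_right_click' );
    ?>

    Turn JavaScript off, read the page source, or use a browser that ignores those handlers, and the ‘protection’ is gone.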

    My point remains valid that the technical code for doing these things is not complete, doesn’t work for all situations, and puts a burden on you. And it doesn’t work! If you print up a newspaper, it’s easy for people to copy your work. We have a copy machine, a scanner, and scissors. If you send out a DVD, we can rip it at home and pass it around to our friends (which is legal actually, so long as you’re not selling it as your own work – see mix tapes, yo). Why is stopping copying bad for your users? If I want to send someone a link to your article, there are two things I want to do. First, I copy your URL. Second, I want to copy your title (and maybe an excerpt to illustrate a point). By killing right click, you’ve made it a royal pain to SHARE your work. And if you’re online, you want people to share. The same goes for DVDs, mix tapes, etc. Sharing is how we tell people “I really like this!” You’ve shot that down and will lose customers.

    Apropos of all this, the New York Times decided they needed to charge for media.   This makes sense, as they charge for a newspaper.  I have no objection to paying for media (traditional or not) at all.   What did The NYT do?  They put up a paywall.  You now have to pay to get in. Kind of.

    Come March 28, you’ll only be able to read 20 articles per month for free. After that, you’ll need a digital subscription, which costs $3.75 per week for Web and mobile phone access, $5 per week for Web and tablet access and $8.75 per week for access on the web, phones, tablets, TimesReader and the Chrome Web app. Print subscribers get all this stuff at no extra charge. (New York Times Paywall: A Small Change That Seems Big – By Jared Newman, PCWorld Mar 17, 2011 12:00 PM)

    And then there’s a catch.   If you go to their site via Google, Facebook or Twitter, it’s ‘free’.

    Essentially, the New York Times doesn’t want to charge you for its content. It wants to charge you for the delivery mechanism, whether it’s through the Website, the iPhone app, the tablet app or the TimesReader software. That’s the best approach, because content is abundant on the Internet. An elegant tablet app is worth more than the individual stories within.

    The problem for the Times, and the reason a lot of people should shrug off the paywall, is that people don’t necessarily need major media gatekeepers to provide the delivery mechanism. A recent study by Pew Internet found that 75 percent of people who find news online get it through e-mail or social networks. (Here’s a fitting anecdote: a friend alerted me to the Times’ paywall announcement by e-mail.) (New York Times Paywall: A Small Change That Seems Big – By Jared Newman, PCWorld Mar 17, 2011 12:00 PM)

    Not to mention that you can overcome the technical aspects of their wall pretty easily.  The Times built their paywall with some simple Javascript, which can be tweaked pretty fast.  And it cost them $40 million. (New Media Barbarians Breach New York Times Paywall in Hours – By Erik Sherman | March 22, 2011)

    What the NYT is learning pretty quickly is that hiding your content, or putting up barriers, isn’t effective. Nor will their plan succeed, for the same reason that we still have people out there who can crack your DVDs and DVD players. If you build it, we can unbuild it and share it, and there you go. If you don’t want people to read your articles for free, you stop putting them online. That’s it. If you email me an article, I can email it to my friends (copy & paste) after all.

    I hate to say it, but once it’s out there, it’s done. Hiding it doesn’t help, because you either chase off the people who can’t find it (and who would have paid) or you make the people who steal it smarter (and they wouldn’t have paid anyway). We need a culture shift to make this all ‘profitable’ as well as something the end users are willing to pay for. We’re not there yet. I’m not entirely sure we’re all that different from where we were when I was in high school, though. The only major change is how easy it is to find the information to steal. That’s why I think the problem isn’t the technology, but the mentality.

    It’s hard to see a future where we can run a business like a newspaper, make it profitable, and convince people to pay for it. Yes, part of this is our own fault for providing information for free all this time, but the other part is the abject denial that this was an issue until it was too late. I certainly cannot advocate cable TV’s method of charging, but perhaps we need a restructuring of sites, so that you can only see domain foo.com if your ISP pays a fee to foo.com (and you get access by telling your ISP you want to ‘see’ foo.com as well). Lifting that model wholesale strikes me as a terrible idea, but the idea of charging the provider, not the end user, makes some sense.

    After all, basic network TV is ‘free’ in the US, isn’t it? Well, that’s another post. The point here is that the methods we’re using now to stop ‘theft’ aren’t working. So how do you protect your intellectual property and attract readers?

  • Limitations on Sortable Columns

    As of WordPress 3.1 you can add new columns to admin pages and sort them. You could always add them, but being able to sort the columns is new! This was very much welcomed by pretty much everyone who makes extra columns, for whatever reason. As someone who likes them (they make sorting so much easier), I pinged out there to ask how it was done.

    My Twitter friends bailed me out and found these very helpful links:

    These both told you how to add columns to the posts table. But what I wanted to do was add a column to the USERS table. When you see what I wanted to add, I’m sure it’ll be obvious:

    1. Show the DATE someone registered in the users menu
    2. Allow the column to be sortable

    Adding in the date was really painless, but I could not, for the life of me, get it to sort! Small problem. Again, I appealed to Twitter, and bless his little black heart, Otto bailed me out and explained to me why that one could work, but others would not.
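    If you’re curious, the painless half – adding and filling the column – boils down to two filters on the Users screen. Here’s a simplified sketch (the function names are mine, not the exact code from my plugin):

    <?php
    // Add a 'Registered' column to the Users screen.
    function myplugin_users_columns( $columns ) {
            $columns['registerdate'] = __('Registered');
            return $columns;
    }
    add_filter( 'manage_users_columns', 'myplugin_users_columns' );

    // Fill the column with each user's registration date.
    function myplugin_users_custom_column( $value, $column_name, $user_id ) {
            if ( 'registerdate' == $column_name ) {
                    $user  = get_userdata( $user_id );
                    $value = date_i18n( get_option( 'date_format' ), strtotime( $user->user_registered ) );
            }
            return $value;
    }
    add_filter( 'manage_users_custom_column', 'myplugin_users_custom_column', 10, 3 );
    ?>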

    What I learned

    Unless the data behind your column header is stored in the database, you cannot sort by it. End of story. No further discussion needs to happen. This makes sense, as you cannot sort by dynamically generated content.

    Unlike the wp_posts table, you cannot sort by user generated headers in the User Table because of how it draws from the database. Your plugin would have to write to the wp_users table (making new columns instead of using wp_usermeta, which is preferred) and even then, the sortables are hard coded. That one broke my head a little. The sortables in the posts table are pluggable (that is, you can make your own). This just isn’t the same because WordPress knows what columns are in wp_users. After all, we add stuff to wp_usermeta. This created a circle of ‘Auuuuugh!’ for me.

    Thankfully, Otto pointed out that since ‘registered’ is listed for MultiSite, you can leverage that on SingleSite and MultiSite (and also explained why my tweaks worked on MultiSite!).
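    Putting Otto’s tip to work, the sortable half turns out to be one more filter: map the custom column to the built-in ‘registered’ key, which the users query already knows how to order by. Again, a sketch rather than the plugin’s exact code:

    <?php
    // Make the custom column (the same 'registerdate' key as above) sortable
    // by mapping it to the built-in 'registered' orderby key, which maps to
    // user_registered in the database.
    function myplugin_users_sortable_columns( $columns ) {
            $columns['registerdate'] = 'registered';
            return $columns;
    }
    add_filter( 'manage_users_sortable_columns', 'myplugin_users_sortable_columns' );
    ?>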

    In the end, I was able to take all that and tweak my “Recently Registered” plugin for WordPress 3.1, and made it look hella cool. If you’re using WordPress MultiSite, you don’t need this at all.

  • Show Site ID Columns in MultiSite

    This is totally a use at your own risk sort of thing. I really have no plans to do anything with it, and no, you cannot sort by ID due to limitations in how the columns work in WordPress.

    Installation Instructions
    Make a file called show-site-ids.php, put this code in it, and upload to your mu-plugins folder.

    <?php
    /*
    Plugin Name: Show Site ID
    Plugin URI: https://halfelf.org/hacks/site-id-columns-multisite/
    Description: Show Site ID in Sites Column for Multisite
    Version: 1.0
    Author: Mika 'Ipstenu' Epstein
    Author URI: http://ipstenu.org/
    
            This plugin is free software; you can redistribute it and/or modify
            it under the terms of the GNU General Public License as published by
            the Free Software Foundation; either version 2 of the License, or
            (at your option) any later version.
    
            This plugin is distributed in the hope that it will be useful,
            but WITHOUT ANY WARRANTY; without even the implied warranty of
            MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
            GNU General Public License for more details.
    
    */
    
    // Output the blog ID in our custom column.
    function siteid_columns( $column, $blog_id ) {
            if ( $column == 'site_id' ) {
                    echo $blog_id;
            }
    }
    
    // Add in a column header
    function site_id( $columns ) {
            $columns['site_id'] = __('ID', 'site_id');
            return $columns;
    }
    
    // Hook the header into the sites/blogs list tables, and the output into
    // their custom column actions (which pass the column name and blog ID).
    add_filter( 'wpmu_blogs_columns', 'site_id' );
    add_action( 'manage_sites_custom_column', 'siteid_columns', 10, 2 );
    add_action( 'manage_blogs_custom_column', 'siteid_columns', 10, 2 );
    ?>
    
  • WordPress MultiSite – New Dashboards

    Back in the WordPress MU and the recent WordPress Multisite 3.0.x days, we had something called a ‘Dashboard Blog.’ This was the ‘main’ site of your install, and ostensibly was the default blog to sign users up to and control them from. This was also where you, the admin, had the Super Admin menu. So what were those things for and why were they moved? After all, a lot of people will tell you they worked just fine.

    The simplest answer is that it’s considered good design to separate the ‘user’ interface from the ‘admin’ interface. That’s why, when a regular user with the lowest role possible logs in to a regular (non-MultiSite) WordPress install, they see a very limited site. They see a dashboard, their profile, and that’s it. You want to keep the subscribers out of your meat and potatoes. Pursuant to that, there are plugins like WP Hide Dashboard that kick users to just their profile. I love that plugin, because it hides the man behind the curtain. If the Dashboard of WordPress is not a part of your desired experience (and really, it only is for the people who run the site), then you keep Dorothy, Toto, the Scarecrow, the Tin Man and the Cowardly Lion out, Ruby Slippers or not.

    When WordPress 3.0 came out, it was a bit of a chimera. We’ve got all sorts of weird parts where we call things blogs instead of sites, and from the back end, it’s really confusing. The sad thing is we cannot declare fiat, fix it all, and move on, because that would break backwards compatibility. Did you know WordPress is backwards compatible, nearly all the way to the start of WordPress 1? (17 Reasons WordPress is a Better CMS than Drupal – Mike Schinkel, Dec 1st, 2010) In order to be able to upgrade from WordPress MU (which was a fork – i.e. a totally separate version – of WordPress), the fold-in of MU to regular WordPress was a lot of work and duplication. There are some things I’m sure the devs would have chosen to do differently in a perfect world, but they decided the headache for them was worth it because it was beneficial to the users. For that alone, I laud them and owe them beers and coffee.

    One of the many drawbacks of that mentality is the users are very much used to getting what they ‘want.’ The users think ‘This worked before, it will always work, therefore, it’s cool to do it now.’ Take (as a not-so-random example) the issue with the /blog/ folder in the main site of any subfolder install. (Switching to WordPress MultiSite Breaks Links – Mika Epstein, 14 July, 2010) Back in the 3.0 days, we had a work-around to fix this, but that was a ‘bug.’ We were all taking advantage of a flaw in the system, and that flaw was plugged (mostly) in 3.1. Of course, fixing the flaw meant breaking things, and those people who were not up to speed on the dev channels (which in this instance included me) went ‘Hey, what the hell!?’ We were angry, we were upset, and then Ron told me that it was a bug and I backed down.

    A lot of people are still annoyed by this, and while there is still a buggy workaround, it’s not something I would generally suggest be used for my clients (myself, yes). Then again, the original tweak wasn’t something I considered using for clients, since I was always aware that WordPress’s stated intent was to make that /blog/ slug customizable. And I hope they do.

    What does this have to do with the new dashboards? It’s another change WordPress implemented to ‘fix’ things people didn’t see as broken. The people are wrong.

    Now don’t get all het up, thinking I’m drinking the WordPress Kool-Aid. There’s a vast difference between believing ‘WordPress is always right, WordPress can do no wrong’ and accepting that what WordPress did was done for a good, understandable reason. In software development, I’ve learned to distance myself from the all too personal feelings of investment in my product. Many times, the product needs to be designed in a certain way to work better for the majority of people, and many times, I am not that person. Look at JetPack. This is a fantastic plugin for people moving off WordPress.com and onto self-hosted WordPress. It has absolutely no meaning to me, and I won’t be using it. But it’s great for the target audience. I accept that I am not that audience, and I look at the product with as unbiased an eye as is possible.

    I have to look at the Network Admin and User Dashboard the same way.

    The Network Admin was moved from a Super-Admin sidebar menu to its own section, in order to provide a clearer delineation between the Site Admin (in charge of one site) and the Network Admin (in charge of all sites). (Network Admin – Trac Ticket) (Network Admin – WordPress MustUse Tutorials, October 21, 2010) This is a basic, normal, every-day bit of separation in my everyday life. For one app I use, I even have a totally separate ‘Admin App’ to use when I want to control the whole network, versus just one part of it. It’s done for security, but also to kick our brains over and go ‘Hey, moron, you’re in the Network Admin section!’ Our brains need that kick, and it lessens the human errors. In doing this, we also found the plugin management was separate. Per-site admins saw the non network-activated plugins only. The Network Admin had to go to the Network Admin section to see the network-activated plugins and the must-use plugins, though many plugins needed to be recoded to handle this move. (Adding a menu to the new network admin – WordPress Must Use Tutorials, November 30, 2010) While this is annoying and takes a little time to get used to, this is good, sound UI/UX. It’s called “Separation of Duties” in the buzzwords game, and it’s really a blessing.
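    To give a flavor of the recoding involved, here’s a hypothetical sketch (the function names are mine, purely for illustration): a plugin that wants a settings page in both places now registers it twice, once on the regular admin_menu hook and once on the new network_admin_menu hook.

    <?php
    // Hypothetical sketch: register a settings page in the per-site admin
    // and in the new Network Admin (WordPress 3.1+).
    function myplugin_site_menu() {
            add_options_page( 'My Plugin', 'My Plugin', 'manage_options', 'myplugin', 'myplugin_settings_page' );
    }
    add_action( 'admin_menu', 'myplugin_site_menu' );

    function myplugin_network_menu() {
            add_submenu_page( 'settings.php', 'My Plugin', 'My Plugin', 'manage_network_options', 'myplugin', 'myplugin_settings_page' );
    }
    add_action( 'network_admin_menu', 'myplugin_network_menu' );

    function myplugin_settings_page() {
            echo '<div class="wrap"><h2>My Plugin</h2></div>';
    }
    ?>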

    Once they moved the Network Admin, the devs took a shot at getting rid of the Dashboard Blog. (Personal Dashboard – trac ticket) Once you’ve moved the super admins off to their own Network Admin, there’s no need to sign users up to a main blog. I assume this was originally done because you had to hook them in somewhere with 3.0, to make them be a ‘user.’ Well, now WordPress.org Multisite behaves like WordPress.com. You sign up for a blog, but unless you get assigned a role on the blog, you’re not a ‘member’ of the blog. And you know… that’s sensible. You have no real role as a pseudo-subscriber. Nor do you need one.

    As I pointed out, part of the goal with moving the menus to Network Admin is that the whole ‘Dashboard Blog’ concept was a massive annoyance to everyone code-wise and UI wise. Having to say “Oh yeah, the main site is the master site and it’s where I control the universe” is logistically unsound. Much like you cannot in-line edit posts, you should not be mixing up Admin and User areas. So to further that separation, your users are not assigned to any site when they register. I find I need to repeat, a lot, that in most cases, this has no effect on usability. It doesn’t affect my BuddyPress site at all, because the users are the users. They just don’t have blog access. They can comment, which is all they need to do for me, and they’re happy. If they need to make posts, I can add them if I want to. But now I have security, knowing they can’t accidentally get in and poke around.

    Like it or not, it’s not going away. And most of us won’t need it to come back. I do know that some people do need it, and are struggling to find a way to auto-assign users a role on their main site at ID creation, so if you know of a fix for 3.1, please share it!
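    For anyone hunting, one possible direction – and this is a rough, untested sketch with a made-up function name, not a confirmed fix – would be to hook user creation and activation and add the new user to your main site yourself:

    <?php
    // Rough, untested sketch: give every newly created user a role on the
    // main site, since 3.1 no longer does this for you.
    function myplugin_auto_assign_user( $user_id ) {
            add_user_to_blog( 1, $user_id, 'subscriber' ); // blog ID 1 = the main site
    }
    add_action( 'wpmu_new_user', 'myplugin_auto_assign_user' );
    add_action( 'wpmu_activate_user', 'myplugin_auto_assign_user' );
    ?>

    Dropped into mu-plugins, something along those lines ought to approximate the old behavior, but treat it as a starting point, not a fix.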

  • Do Pretty URLs Still Matter?

    With all the kerfuffle about the new Lifehacker (and the 2.0.1 redesign), one thing stood out for me as the more interesting fallout: a rush of articles about the dreaded Hashbang.

    Unix heads can skip this paragraph. For the rest of the world, a hashbang (aka a shebang) is the character sequence consisting of the number sign and exclamation point: #!. You see it a lot at the beginning of scripts, where it’s used to define which ‘interpreter’ you’re using. Basically you’re telling Unix which program to use to parse the rest of the script: a script that starts with #!/bin/bash, for example, gets handed to bash. It’s more complicated than that, but this is not the place for goobley tech speak.

    A hashbang used in a web URL is also an interpreter, but of a different sort. The hashbang is a fake tool used to get URLs generated by JavaScript and AJAX picked up by Google. Great, what does that mean? The # tells the browser that the rest of the URL is an ‘identifier’ and everything after that # refers to a part of the current page. So in the case of Lifehacker, URLs went from http://lifehacker.com/5753509/hello-world-this-is-the-new-lifehacker to http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker.

    Most people missed the change and concentrated on the new layout. The #! in there manages to parse the right page, so why should we worry? After all, AJAX is the ‘future’ of the web. AJAX, by the way, is not the dim-witted but incredibly strong Greek hero. AJAX is shorthand for Asynchronous JavaScript and XML, but has very little to do with XML. Instead it’s used to create interactive web-apps. It’s exceptionally powerful, but a little dim-witted. Oh, okay, maybe my Iliad joke wasn’t so far off, after all!

    AJAX has a whole lot of problems, the most important (to me) being the fact that it’s not easily read by screen-readers, which means your blind visitors get the shaft a lot from poorly written AJAX sites. You also can’t use the back button most of the time, nor are these sites easily read by web crawler bots, which means that you don’t end up on Google results. Your page isn’t ‘real’, therefore it’s not followed. Of course, the folks at Google recognized this problem and came up with a weird solution: The Hashbang! That’s right, Google introduced it as a way to get sites dependent on AJAX into their results. (A proposal for making AJAX crawlable – Google Webmaster Central, Wednesday, October 07, 2009 at 10:51 AM)

    Sounds great, though rather complicated and a little kludgey. Now you can have the best of both worlds. There are still drawbacks, of course, and I hear this refrain over and over:

    The main problem is that LifeHacker URLs now don’t map to actual content. (Breaking the Web with hash-bangs – Mike Davies (Isolani.co.uk) Tuesday, February 08, 2011)

    This initially perplexed me. After all, WordPress (which runs this site) is really a single page site (index.php) with a million identifiers, and it’s not like there’s an actual, physical, page on my server called do-pretty-urls-still-matter. What’s the difference? Turns out it’s all in how the URL is generated. The hashbang method is prone to error, as Lifehacker has already found out: all you have to do is kill JavaScript and the entire site goes down. That I knew, but as I sat to write this I got myself in a bind. After all, in order for my pretty URLs here to work, I need .htaccess (which means my httpd.conf stuff is all in a row), PHP and SQL. Of course, all that is server side, and the user only needs the most basic of browsers to get to my site. But how were those failure points any less of a risk than a JavaScript-induced set?

    I appealed to Twitter and got a couple fast responses from @JohnPBloch:

    JohnPBloch: The difference is that one (js) depends on millions of different configurations working correctly; the other depends only on one […] for example, I’ve had chrome extensions that crashed JS on non-jQuery pages before.(JohnPBloch) (JohnPBloch)

    Thanks to John, my brain kicked back into gear. There is no browser dependency on my URLs. The only way for them to fail is if I screw up the backend (that could be my site setup or my server). Either way, the critical failure will always be me, and never my user. That’s as it should be. We’re so often told that our customers are always right. As a corollary, your customers should never be at fault for a problem (as long as they’re using the products as intended). There are good reasons to use AJAX. Using AJAX and JavaScript allows for a responsive and truly interactive experience for your users, in situations where small parts of the interface are changing (think Gmail, Flickr and Google Maps). Gmail actually would be the perfect example. That’s something where I neither want nor need a ‘pretty’ URL past http://gmail.com because I’ll never share the URL.

    The rest of the time, though, URLs are important. If I want someone to visit my site, I tell them ‘Go to https://halfelf.org’ and they know what to do. Back in the Trojan War days, you’d hear “Login to AOL, keyword ‘Ipstenu’.” instead of a URL, and we’ve moved away from that to allowing people to claim their own domains and their own presence online. We shouldn’t be reliant on AOL, Google, or anything else to generate our representation. So, regardless of the problems with the hashbang (and there are a lot), your URL is important.

    We all know the difference between a ‘pretty’ URL and an ugly one on sight. https://halfelf.org/?p=1358 is ugly and https://halfelf.org/2011/do-pretty-urls-still-matter is pretty! Simple. Lifehacker (and Twitter for that matter) have always had ‘kind of’ pretty URLs, with a weird number string in there, but enough to make you go “Okay, this Lifehacker post is about monkeys” or “This tweet is by Ipstenu.” With the change to hashbang, both sites (temporarily) broke the cardinal rule of websites: never change your URLs. By now, both sites will redirect a hashbang-less URL to a hashbanged one, which is as it should be. You never get rid of your old URLs, which is why on older sites, you have a really freakish .htaccess with miles and miles of regexp. AJAX makes you jump through hoops for pretty URLs.

    But that really begs the question of whether or not pretty URLs actually matter anymore, or is this just me being an old stick in the mud? Much like Google’s minor issues with AJAX/Javascript, they have minor issues with Dynamic URLs. Quickie explanation: static URLs are pretty, dynamic aren’t. Basically, they can and do crawl dynamic URLs, but static are preferred. (Dynamic URLs vs. static URLs – Google Webmaster Tools, Monday, September 22, 2008 at 3:20 PM) On the other hand, we know that shorter is better, and you can’t get much shorter than https://halfelf.org/?p=1358 when you get down to it.

    I would posit that, since the web is based on look and feel, the design of your site still relies, in part, on the ease of someone in understanding the URL.

  • Google vs Splogs – Part 2

    Now that you know all about the myth of the duplicate content penalty, we can look into spam.

    This year, Google got slammed because the quality of their search was being degraded by spammers. Mostly splogs, I will admit, but Google rightly points out that their ability to filter out spam and splogs in all languages is actually much better than it was five years ago. (Google search and search engine spam – 1/21/2011 09:00:00 AM) No, Google isn’t getting worse, there are just more spammers out there. They also take the time to differentiate between “pure webspam” and “content farms.”

    “Pure webspam” is what you see in a search result when a website uses meta data or hidden content in order to bully its way into being highly ranked in unrelated searches, or just basically game the system. A decade ago, this was horrific. Now it’s nearly negligible. This type of spam grew pretty organically out of people trying to understand the algorithm behind search engines and manipulate it legally. As we gained greater understanding of meta keywords and in-context content, we came up with more and more tricks to legitimately make our sites more popular. There was a point in time when stuffing a page with hidden text containing as many keywords related to your site as possible was not only commonplace, but lauded. It didn’t last long, as shortly after the good-guys sorted that out, the bad-guys did too.

    “Content farms” are the wave of the future, and Google calls them sites with “shallow or low-quality content.” The definition is vague, and basically means a content farm is a website that trawls the internet, takes good data from other sites, and reproduces it on their own. Most content farms provide automatically inserted data. There is no man behind the scenes manually scanning the internet for related topics and copy/pasting them into their site. Instead, this is all done via software known as content scrapers. The reasons why they do this I’ll get to in a minute, but I think that Google’s statement that they’re going to spend 2011 burning down the content farms is what’s got people worried about duplicate content again.

    A content farm is (partly) defined as a website that exists by duplicating content. Your site’s activity feed/archives/post tags pages are duplicating content for the users. Does that mean your site will be adversely affected because of this?

    No. It will not.

    Google’s algorithm is targeting sites of low content quality. While your stolen post is a beautifully written piece of art on its own, it’s the site as a whole that is used to generate a search ranking. As I’ve been touting for a decade, the trick to getting your site promoted in Google searches is to make a good site. Presuming you made a good site, with good content, and good traffic, and it’s updated regularly, there is very little risk that Google will peg your site as being of “low content quality.” Keep that phrase in mind and remember it well. Your site isn’t highly ranked because of low content, remember! It’s the reverse. If you’re being ranked for good behavior, good content, and good work, you will continue to be rewarded. In a weird way, content farms are actually helping Google refine their search so that it can tell the difference between good sites and bad! (Why The Web Needs Content Farms – by Eric Ward on February 16, 2011)

    The next Google algorithm update will focus on cleaning content farms from positions of unfair advantage in our index. This will likely affect websites with considerable content copied from other online sources. Once this update is complete, preference will be given to the originators of content. We expect this to be in effect in no less than 60 days. (Google search and search engine spam – 1/21/2011 09:00:00 AM)

    What Google is doing is not only laudable, but necessary. They are adapting to the change in how spam is delivered, and doing so in a way that should not impact your site. The only way I can see this affecting ‘innocent’ sites is with those blogs that use RSS feed scrapers to populate their sites. This is why anytime someone asks me how to do that, I either tell them don’t or I don’t answer at all. While I certainly use other news articles to populate my site, I do so by quoting them and crafting my own, individual, posts. In that manner I both express my own creativity and promote the high quality of my own site. I make my site better. And that is the only way to get your site well-ranked. Yes, it is work, and yes, it is time consuming. Anything worth doing is going to take you time, and the sooner you accept that, the happier you will be.

    For most small to medium sites, there’s not a thing you need to do in order to maintain your ranking. There are no magic bullets or secret SEO tricks to manipulate your site into a better ranking. In point of fact, trying can be seen as gaming the system and can downgrade your results! Once again. Make a good site and you will be rewarded. Certainly, as I said yesterday, optimizing your robots.txt file and getting a good sitemap will help, and I really do suggest a Google Webmaster Tools account to help you with that. In 2011, Google is still king, so once you get your site well listed within Google’s machine, you’re pretty much going to be tops everywhere.

    Why do splogs and content farms game the system in order to get highly ranked? Profit. Some do it to get their domain highly ranked and then sell it for a lot of money, others do it to infect your computer with a virus, and then there’s the rare hero who thinks this will get them money because of the ads on their site. Sadly, this still works just enough to generate a profit and keep the splogs going. This is also true of spam emails. Yes, that means your grandmother and Carla Tallucci are still falling for the Nigerian Princess scam emails. The only way to stop all of that is to stop those methods from being productive money makers for the spammers, and that is something that will take us all a very long time and a great deal of education to the masses.

    Your takeaways are pretty simple. Make a good site with good content. Update it regularly. Use a sitemap to teach search engines what’s important. You’ll be fine. Don’t sweat internal duplication.