Half-Elf on Tech

Thoughts From a Professional Lesbian

Category: How It Works

  • Google vs Splogs – Part 1

    I am not an SEO expert. In fact, there are only a handful of people who I feel can claim that title without making me roll my eyes so hard I get a migraine. Anyone who tells you they have all the answers to getting your site listed well in Google is a liar, because there’s only one good answer: make a good site. That’s really it. How, then, do all those spam sites get listed in Google, Bing, and Yahoo to begin with, and will the techniques the search engines are using to eradicate those sites hurt you?

    Everyone’s done that search for something and been nailed by a bunch of splogs. You look for ‘Laurence Fishburne’ news and you get 100 websites that claim to have news, but really it’s just total, useless crap. Those are splogs, a neologism for spam blogs, and they suck. Splogs are blogs where the articles are fake, created only for search engine spamming. They sometimes steal your hard work by scraping RSS feeds (or who knows what else) and generating fake content from it. Why? Some people do it to infect your PC with a virus, and others do it to trick you into clicking on their ads.

    The problem is that spam blogs are so prevalent they’re adversely affecting search engines, making it harder and harder for you to find real, factual content. This year, rumors started flying that Google was going to go on the warpath against search engine spam, and in doing so, would downgrade perfectly valid sites with ‘duplicate content.’ Having read and re-read the articles posted by Google on the matter, I’m quite certain that, yet again, people are playing Chicken Little. Nowhere, in any of the articles I’ve read, has there been any discussion of an intent to penalize legitimate, valid websites for containing internally duplicated content.

    In order to understand the duplicate content penalty problem (and yes, it is a problem), you need to understand how most content management systems (CMS – this includes software like Drupal, Joomla, and WordPress) display their data to users.

    You write a blog post and the content is stored in the database, along with any tags, categories, or metadata you put in. When someone goes directly to the blog post, they see it. However, they can also see the post if they go to a list of posts in that category, with that tag, on that date, in that year, and so on and so forth. So the question a lot of new webfolks ask is “Is that duplicate content?” No. It’s not. Nor is having ipstenu.org and www.ipstenu.org point to the same page. In fact, that’s good for your site. The more valid ways you have of providing your user with information, the easier it is for them to find what they want, and the happier they are. Happy users means repeat users, which means profit (in that oh so nebulous “web = profit” theory).
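
    To make that concrete, here’s a hypothetical sketch (the example.com URLs are invented) of one post on a stock WordPress install being reachable four different ways:

        http://example.com/2011/03/my-post/          (the permalink)
        http://example.com/category/how-it-works/    (category archive)
        http://example.com/tag/google/               (tag archive)
        http://example.com/2011/03/                  (monthly archive)

    On the single-post view, WordPress (since 2.9) also prints a canonical link in the page head, telling search engines which of those URLs is the preferred one:

        <link rel="canonical" href="http://example.com/2011/03/my-post/" />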

    So what is this mysterious duplicate content penalty?

    Let’s take this from the horse’s mouth (or at least Google):

    Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that. (Demystifying the “duplicate content penalty” – Friday, September 12, 2008 at 8:30 AM)

    Google goes on to outright state that so long as the intention is well-meaning (like making it easier for people to find your valid and useful content), you will see no adverse effects in searches for your blog. That means 99.999% of you out there can relax and walk away. What about people who use things like WordPress MU Sitewide Tags Pages (which takes the excerpts of all posts on a WordPress MultiSite installation and duplicates them onto another site), or BuddyPress’s activity stream (which records everything in multiple places)? Again, the answer is the same. You’re doing this to make the site more available and accessible, ergo no harm, ergo no foul.

    Google also makes the claim that since CMSs generally don’t handle duplicate content ‘well’ (their word, not mine), non-malicious duplication is common and fairly harmless, though it will affect search results. Here’s where things get sticky. Personally, I disagree with Google’s claim that CMSs handle duplicate content poorly. A well-written CMS, knowing that no two people think the same way, takes that into consideration when crafting a site. You want an index, but if you know someone looks for things by subject matter or year, you need a way to provide that information for the reader. Google’s problem is that in doing so, you have also provided it for the GoogleBots who patrol your site and pull in the data for searches, which makes the dreaded duplicate content.

    Perhaps Google has forgotten (or not made the connection) that they do the exact same thing. They want to show you what you want to see, and while I may search for “Laurence Fishburne actor” and you might look for “Morpheus Actor”, in the end, we both want to see sites about this actor guy named Laurence Fishburne. How do you make sure we get the right information? You have the content sortable in myriad manners. Does that make it duplicate content? Of course not (unless you’re Bing, which is a whole different subject). Google points out:

    Most search engines strive for a certain level of variety; they want to show you ten different results on a search results page, not ten different URLs that all have the same content. To this end, Google tries to filter out duplicate documents so that users experience less redundancy. (Demystifying the “duplicate content penalty” – Friday, September 12, 2008 at 8:30 AM)

    Thankfully, you can eliminate redundancy by providing Google with a sitemap of your website. (About Sitemaps – Google Webmaster Central) With a good sitemap, you can tell search engines how to weigh your site’s content: which pages are more important, which can be ignored, and so on. With WordPress and a good plugin, this can be done automatically by making a few choices in an admin interface. You’ll also want to spend a little time understanding your robots.txt file; Perishable Press has a great article on optimizing it for WordPress.
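
    As an illustration (the URLs are hypothetical; the tags come from the standard sitemaps.org protocol), a sitemap entry lets you suggest how often a page changes and how much weight it deserves relative to the rest of your site:

        <?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
          <!-- A post you care about: crawl often, weigh heavily -->
          <url>
            <loc>http://example.com/2011/03/my-post/</loc>
            <changefreq>weekly</changefreq>
            <priority>0.8</priority>
          </url>
          <!-- A date archive: duplicate-ish content, weigh it lightly -->
          <url>
            <loc>http://example.com/2011/03/</loc>
            <changefreq>monthly</changefreq>
            <priority>0.2</priority>
          </url>
        </urlset>

    Search engines treat changefreq and priority as hints rather than commands, but this is exactly the “which pages matter, which can be ignored” signal described above.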

    Now that you know about the myth behind the duplicate content penalty, tomorrow we can get into content farms!

  • Has your site been exploited or victimized?

    Nothing frosts my lizard more than someone saying ‘WordPress has been hacked!’ and I’ve finally decided it’s just a case of ignorance.

    I’ve been using WordPress since around the famous 2004 MovableType bait’n’switch, when they decided to go paywall. That was not what made me switch to WP. In fact, I had already installed a b2 site in order to allow anyone in my family to post a story about my grandmother (and I should really put that back online one day…). It was a lot of little things that made me switch, and I don’t really regret my choice. MT is very useful, very cool, and very impressive, but it wasn’t what I wanted or needed.

    Yesterday, Byrne Reese posted about how WordPress Won the Blog War. He’s a former Movable Type project manager, so I presume he knows what he’s talking about. As a former member of the MT community (under a non-Ipstenu handle) and a current heavy user of WordPress, I find it very entertaining to hear a behind-the-scenes view of the ‘war.’ I never saw it as a war, and as anyone who knows me can attest, I’ve never been a rabid fanboi for one OS or another, or one product or another, simply because of who makes it. I like my iPad, but if it doesn’t work for you, I’m more than happy to help you find an alternative. I believe in finding the product that works for you.

    What really caught my attention in the post were the comments. The very first had this gem by Matt Haughey:

    Now that they’ve won the battle, I think the biggest problem for WP now is two-fold: One is the constant threat of exploits with your own WP install. It’s crazy and like running Windows 95 without patches. Everyone I know with a self-hosted WP has been exploited in the last year or two and worries about it regularly.

    Two facts:
    1) My WordPress install has never been hacked in the 7 years I’ve had it.
    2) I do worry about it constantly.

    About a year ago, my server was hacked. Ironically it came three days after I’d posted about WordPress security. How was I hacked? Because I followed insecure practices. I’ve touted, for a while now, that security is a tripod:

    • The Web Host is responsible for making sure the server itself is up to date with the latest patches, etc., and that the server is configured in a safe way.
    • Web-apps are responsible for not unleashing needless insecurities to the system.
    • The end-user is responsible for not doing something that violates security out of ignorance (and we pray to the flying spaghetti monster that they haven’t).

    I was hacked because I violated security, which left my server open to attack, and which thankfully resulted in my Web Host bailing me out (have I mentioned I love them?). I went to a website on a non-virus-protected PC (yes, Windows), I got what looked like a suspicious pop-up in IE from a site I knew and trusted, and while the pop-up was there, I opened an FTP (not secure FTP!) connection to my server. I seriously could not have been stupider. Thankfully it was easy to fix, and I’ve since turned off FTP (it’s SFTP or nothing). Actually, I also wiped Windows XP off my computer, but previously it was required for my work.

    On Byrne’s post, Mark Jaquith (a WP developer) remarked:

    I haven’t seen an up-to-date WordPress install get directly exploited in around five years. Seriously.

    I thought about this for a moment, and had to nod. This is true for me as well. Every WordPress install I’ve seen with problems has been due to the web-host or the end-user being insecure. Even when that end-user is me, I’ve yet to have WordPress itself hacked. This does not mean I think WordPress can’t be hacked, just that it’s actually a pretty secure little tool by itself.

    Then Mark went on to say this:

    All of the large scale instances of WordPress being compromised lately were because of web hosts who don’t prevent users on one account from accessing files on another account. In these cases, WordPress wasn’t exploited so much as it was victimized due to a lower level security issue on the server.

    He was far more succinct than I’ve been able to be on the matter. I’ve touted for a long time that the problem involves WordPress, but it’s not WordPress’s fault. Ask anyone in IT why Windows has more viruses than a Mac, and most of us will tell you it’s because Windows is more popular. More people use it, so more hackers/spammers/crackers target it. I wouldn’t say, in 2011, that Windows 7 is more vulnerable than OS X, but I would feel comfortable saying that it is targeted more.

    The answer is the same when I’m asked why WordPress gets so much spam. Because it’s used a lot! The more prevalent your product is (i.e. the more successful it is), the higher the likelihood is that some jerk with a kiddie script will try to attack it. This is just a fact of life, and I’m not going to get into how to solve it.

    What I feel we need to be aware of is the education of the user base for any product. My father once gave a memorable lecture I caught when I was about six or seven, about our expectations with computers and why AI was never going to be like we saw on Star Trek. “Ignore the man behind the curtain!” he said to the crowd. Back then, I had no idea what he meant. Today I realize that it was two-fold. On the one hand, we think ‘Automate everything! Make it all just work!’ That’s the magic box theory of computers. It all just works and we don’t have to do anything. The reality is that there is always a man behind the curtain, making the magic happen.

    The ‘two-fold’ meaning is that (1) we want everything to work perfectly without manual intervention, and that’s just not possible and (2) we don’t want to have to learn WHY it all works, just make it magically work.

    My savvy readers are, at this point, thinking “But if I don’t know why it works, how can I fix it?” To them I shrug and agree that you cannot be expected to fix anything you do not understand. Furthermore, the less you understand something, the more likely you are to inaccurately blame someone/something. Which brings us back to why I hate when people say ‘WordPress has been hacked!’ Actually, I hate it when they say anything has been hacked (Drupal, Joomla, WordPress, MovableType, etc etc etc).

    We have a few choices at this point. We can stop ignoring the man behind the curtain and learn how the levers work ourselves, or we can accept that we’re not clever enough and hire someone. Either way, we should always take the time to sort out what’s wrong. When my cat was recently in the kitty ER for bladder stones (she’s fine now), racking up a $1000+ bill for services, I wanted to know all about what caused them, why the food worked, and so on. I’m not a vet. I would never make it through medical school (I don’t like blood). But I know how to use my brain. As my professor, Dr. Lauer, told me in high school, “We’re teaching you how to think, how to talk to adults while you’re a child, so you know how to be a person.”

    Teach people how to think. You’d never take your Mercedes-Benz to Jiffy Lube for an overhaul, so why are you trusting a $5/month webhost without a phone number to support your business? You wouldn’t take your child to a back-alley doctor, so why are you hiring some guy with blink tags on his site to fix your website? Use your brain. If your webhost tells you ‘Sorry, we can’t help you,’ then take your money someplace else. Website support should always include backups taken at least daily (you may only get yesterday’s backup, but they should still have ’em). A good host will help you when you ask specific questions.

    My host (there’s a link on the top right) will answer the phone 24/7. They helped me craft a backup strategy, undo the hack on my server, trace down what was using up so much CPU, beat mod_security into submission … the list goes on and on. My point here is not that you should use them (though if you do, tell them I sent you!), but that you should find a host who supports you to the level you need. The brunt of what you pay for hosting is an insurance policy. You’re paying them to bail you out when (yes, when) you need help, and if you’re only paying $5 a month, then you should only expect that level of return.

    Educate yourself, educate your host, but have realistic expectations.

  • Don’t Bother Disabling Right-Click

    Every now and then I see someone ask ‘How do I disable right-clicking on images on my site?’ My answer, invariably, is ‘You don’t.’ The real question I suppose is ‘How do I stop people from ripping off my work on the net?’ and the answer to that is still ‘You don’t.’

    Is it online? Yes? Then it can, and will, be stolen. Does that matter? Kind of. The only way to make your works un-steal-able is to never publish them anywhere.

    When the last Harry Potter book came out, some diligent nerd spent countless hours photographing every page of the book, uploaded it online, and oh look, we all knew how it ended. That did not stop everyone from buying the book, though, and in the end, it was pretty much an amusing footnote in the saga of Harry Potter. And anyone who thought Harry wouldn’t defeat Voldemort was clearly not paying attention.

    When I put my dad’s stuff up online, I told him it would be better to convert his PDFs to easily readable HTML. He didn’t want to because they could be stolen. I pointed out that the PDFs are actually easier to rip (download, done), and the HTML was just being generated by copy/pasting from the PDF anyway, so he was already there. The point was to make his work more accessible.

    Does this mean you shouldn’t protect your data? Of course not! But the point is that if you publish it, online or offline, it can, and will, be stolen. The only thing online media does is make it ‘easier’ to steal and re-publish under someone else’s name. Without getting into the myriad laws of copyright, I want to point out that your published work is only part of your brand. If someone steals a post I make here, yes, they’re taking away from my audience, but realistically, they’re not hurting my bottom line. The reason you’re here, reading my posts, is because I’ve reached you. Either you read my social media outlets, my RSS feed, or you know me and follow me directly. The people who are going to read this, you, are here because of work I’ve already done. The work I continue to do keeps you here, and you become my promoters. The only thing the thieves do is hurt my search engine rankings, and not even that in my experience. A brand is more than just your work. It’s you, your image, your representation. Spending all your time worrying about your SEO ranking means you’re missing the point. Of course a high result on a Google search is important, but that’s just one piece of the pie.

    Someone is bound to tell me that all of this is nice and dandy, but why, technically, is it a bad idea to try and protect your media/data?

    Disabling right-clicks supposedly prevents people from downloading your media. But if I view the page source, I get the URL of your image, load that into a new browser window, and download your stuff. If you disable view-source, I can drag-and-drop the image to my desktop. Those don’t work? Why not check the cache of what my browser automatically downloaded when I visited your page? Or how about a screenshot of what’s active on my screen right now? That’s all stuff I can do without even needing to be code-savvy.
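
    For context, the ‘protection’ in question is usually nothing more than a snippet like this (a hypothetical example of the sort of JavaScript people paste into a theme), and not one of the workarounds above even notices it:

        // The classic right-click "blocker". It only suppresses the context
        // menu; the image file has already been sent to every visitor's
        // browser, so view-source, the cache, drag-and-drop, and screenshots
        // all still work fine.
        document.oncontextmenu = function () {
            return false; // swallow the right-click, annoy the visitor
        };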

    Google for “download image when right click is disabled” and you’ll get millions of hits. There’s no way to protect your media once it’s online. The best you can do is brand it in such a way that even if someone does get a copy, it is permanently marked as ‘yours’. Watermarks are the normal way to do this, and probably the only ‘good’ way, as they tend to be the hardest thing to remove.
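
    If you want to roll your own, here’s a minimal sketch of stamping a watermark onto a photo with PHP’s GD extension (the file names are made up, and a real version would want error handling):

        <?php
        // Load the photo and a transparent PNG watermark (hypothetical files).
        $photo = imagecreatefromjpeg( 'photo.jpg' );
        $stamp = imagecreatefrompng( 'watermark.png' );

        // Position the stamp 10px in from the bottom-right corner.
        $margin = 10;
        $x = imagesx( $photo ) - imagesx( $stamp ) - $margin;
        $y = imagesy( $photo ) - imagesy( $stamp ) - $margin;

        // Copy the stamp onto the photo, honoring the PNG's transparency.
        imagecopy( $photo, $stamp, $x, $y, 0, 0, imagesx( $stamp ), imagesy( $stamp ) );

        // Save the branded copy; serve this version, never the original.
        imagejpeg( $photo, 'photo-watermarked.jpg', 90 );
        imagedestroy( $photo );
        imagedestroy( $stamp );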

    Don’t bother disabling right-click, or trying to stop people from downloading/stealing your stuff. Don’t put it online unless you’re willing to have it nicked. Make your brand identifiable with your site, and people will know to come to you.

  • WordPress 3.1 and Network Menu

    This one’s really fast, but the word should get out there. It’s extremely important if you’ve built a BuddyPress plugin, because the BuddyPress menu has been moved. If you don’t update your plugin accordingly, your BuddyPress menus will vanish, and you will have very angry users.

    WordPress 3.1 has moved the admin menu for MultiSite. Instead of having a Super Admin menu available in the sidebar of your admin section all the time, there’s a new link in the header bar for Network Admin. That’s right, it’s its own page! The problem is that a lot of plugins aren’t ready for that, and because of the move, their admin menu links vanished.

    Thankfully it’s an easy fix!

    The WP Codex Doc on Admin Menus tells us to use “admin_menu” when adding menus. Well, if you want your plugin to show up on the Network Admin page, you have to use “network_admin_menu” instead. (Trac #14435)

    See? I said it was easy.

    // Hook your menu function onto the Network Admin menu.
    // ('MENU_FUNCTION' is your callback; '##' is an optional priority number.)
    add_action( 'network_admin_menu', 'MENU_FUNCTION', ## );
    

    There’s one catch to this. What if your plugin is for both MultiSite and non-MultiSite installs? Honestly, I don’t know if this matters, but just to be safe, I would do this:

    // On MultiSite, register the menu on the Network Admin screen;
    // otherwise, fall back to the regular admin menu.
    if ( is_multisite() ) { 
         add_action( 'network_admin_menu', 'MENU_FUNCTION', ## );
    } else {
         add_action( 'admin_menu', 'MENU_FUNCTION', ## );
    }
    

    Or this:

         // Register on both hooks; 'network_admin_menu' simply never
         // fires on a non-MultiSite install, so nothing breaks.
         add_action( 'network_admin_menu', 'MENU_FUNCTION', ## );
         add_action( 'admin_menu', 'MENU_FUNCTION', ## );
    

    That’s right! If it’s there and not needed, it does no harm! I’m not 100% certain right now if you need to do this for the non-menu calls people make (like calling a function on admin_menu to check for installation), but I’ve been adding it in with no ill effects. I figure the BuddyPress core code does it, so it can’t hurt!
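
    For the curious, here’s a minimal sketch of what that MENU_FUNCTION callback might actually do. All the names here are hypothetical, but the hooks and the 'settings.php' parent slug are stock WordPress:

        <?php
        // Hypothetical example: add a submenu under Network Admin > Settings.
        function my_plugin_network_menu() {
            add_submenu_page(
                'settings.php',           // parent: the Network Admin Settings menu
                'My Plugin',              // page title
                'My Plugin',              // menu title
                'manage_network_options', // capability Super Admins have
                'my-plugin',              // menu slug
                'my_plugin_settings_page' // callback that renders the page
            );
        }
        add_action( 'network_admin_menu', 'my_plugin_network_menu' );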

  • Common IT Answers

    I actually have this sitting on my desk at work. It’s so old that the fluid has evaporated enough that it doesn’t work right anymore. But I keep it and use it. What is it? A Magic 8-Ball of tech support! Many moons ago, our CDW vendor gave my boss a Magic 8-Ball for programmers, showing the answer “IT’S NOT A BUG – IT’S A FEATURE”. The top of the ball says “For your most commonly asked IT requests.” Some of the answers are blatant CDW adverts, but the rest are answers I know I’ve used at least once:

    • Did you press the right button?
    • I can’t test everything
    • It worked yesterday
    • It works like I programmed it
    • It works on my machine
    • It’ll be fixed in the next release
    • It’s a Beta – What did you expect?
    • It’s an unlikely coincidence
    • It’s just an isolated incident
    • It’s not a bug, it’s a feature
    • It’s not supposed to do that
    • Please submit a formal request
    • Plug it in
    • Program works. Must be user error
    • Reboot
    • Someone changed my code.

    Sadly, the thing is dying. I may have to learn how to make a new one, since right now it’s stuck showing me a corner instead of a face.

    I’m not the only person who uses these, though. Eric Mack was using one back in 2004!

  • Blocking IPs – Don’t

    Here’s the thing. I don’t think blocking an IP address is a good idea.

    Will it prevent spammers from registering on your site? Yes. But much like CAPTCHA, I think it does more harm than good.

    See, IP addresses are numerical labels assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication. Big words. Shorter version: the IP address is the phone number, and DNS is the phone book. When you dial ‘Home’ on your cell phone, the phone translates that into a number and dials, right? Well, on the internet, you say “I want to go to ipstenu.org” and it’s DNS that says “Okay, my big bad directory says that’s 67.227.208.52, so here you go!” (Actually it says “You want the ipstenu.org extension at 67.227.208.52,” if I can stretch that metaphor.)
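
    You can watch that lookup happen yourself. Here’s a tiny sketch using PHP’s built-in resolver functions (the hostname really will be resolved, so the number you get back depends on the DNS of the day):

        <?php
        // Ask DNS for the "phone number" behind a hostname.
        $ip = gethostbyname( 'ipstenu.org' );
        echo "ipstenu.org answers at: $ip\n";

        // And the reverse: given a number, who claims it?
        $host = gethostbyaddr( $ip );
        echo "$ip calls itself: $host\n";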

    The IP address for websites is pretty static. Just as most of us don’t want to change our phone numbers and teach all our parents new contact info, we don’t want to have to update all the DNS servers in the world with our new IP address. It’s a pain in the ass, and it takes up to 72 hours to propagate to everyone (usually less), which means there’s a period of time where people can’t get to your website.

    This is all over-simplified, but you get the gist.

    Here’s where it gets weird. In order for the internet to know who YOU are and send you back the website you asked for, it has to assign you an IP address. And this changes. A lot. Most ISPs (the people who give you access to the net for money) have a ‘range’ of IP addresses that numbers fewer than the number of people who pay them for internet access. In order to make sure everyone can get on, when you connect to the net, you get a new IP. Back in the day of dial-up, every time you dialed in, you had a new IP. This was normal, and was one of the many reasons no one bothered to block by IP. All the idiot had to do was reconnect. These days, my IP changes about once a week or so, and I have no idea when or why, but it does. That doesn’t bother me.

    Lately, I’ve gotten complaints and requests to make Ban Hammer or Register IP MS block people by IP address. After playing with that a bit, I’ve decided I won’t. Not because I can’t (it’s really not that hard, and actually, Ban Hammer already does it for single-site WordPress by accident), but because I don’t think it adds any value. The IP address can change too easily for this to be a useful tool, and the odds are I’d accidentally block someone who should be able to access the site!

    This is not to say I don’t block IP addresses at all, just that I don’t do it the way people seem to want to go about it. Maintaining my own blacklist of IPs is insane and stupid. I don’t need to waste my time clicking and banning spammers or auto-register bots. Instead, I block IPs using one of two tools that were designed to look for bad behavior. I detailed all this in Spam / Splog Wars. That’s how I stop spammers, and it works.

    So no, I will not be wasting my time telling you how to edit my plugins to block people from the IP level. There are perfectly good ways of doing this that work without you having to field complaints from innocent users.

    Block the bots, not the people.