Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: seo

  • You’re not the boss of me

    After having my domains on three different servers for a long time, I mathed it out that it’d cost me the same to put ’em all on one VPS (virtual private server). I called up my host (the fanfreakintastic LiquidWeb) and they had me all moved over without me having to fuss! Combine two shared accounts into one VPS? Sure, done. I suspect my next bill will look … weird, but that’s okay. I’m sure that even if it’s all messed up, I can call them and get it sorted out.

    The first thing I did was make sure everything was running and then I left it alone for a day. Did anyone notice? No? Good, the fix was in!

    Then I started fiddling. I didn’t know a lot about running a VPS, having only mucked about with a Red Hat distro before, and LiquidWeb provided me with cPanel and WHM, neither of which I’d used before. They also had the very familiar shell world for me to jump into. Google being what it is, I quickly found a VPS Optimization Guide that gave me some ideas to start.

    What I’ve Done So Far
    My memory usage, with one beefy site and two baby sites, was hitting 50% which, in my mind, was bad. Now, the beefy site runs off WordPress, which is known to have these issues. My CPU load was barely passing 0.01 (yes, that’s right) though, so that was good. My first thought was to try WP-Super-Cache again, except last time I did that, CPU went through the roof and stayed there. Also, you lose dynamic feeds, etc. (unless you use AJAX). I’ve heard great things about WP-Super-Cache, but the fact that it’s not a locked-in part of WP has always made me wonder about its viability. If it really was that good, or the only solution, it would be built in. Not to knock it, but I consider it only one option.

    While I know I need to optimize WP, my first stab was to optimize the server. Except that I didn’t, not really. What I actually did was switch from Zend to APC. Now, I’m not really sure if that was the best thing to do. I find a lot of people clamoring that APC is better, and since I’d had weird issues with Zend before (outright borking MediaWiki if not configured specially), I decided to give APC a shot. If someone has benchmarks or a good link explaining why APC is better than other PHP cache tools, I’d like to see it.

    Then I removed Clamd (and ClamAV). Yes, I know it’s virus scan software, but I’ve never actually seen it catch anything. What I run on the server, and what my ONE (yes one) resold client will run, aren’t going to get caught by it. We run the same stuff. So call it a calculated risk. I also turned off EntropyChat (never gonna use it), Mailman (resource hog), Analog Stats and Webalizer (leaving AWStats, though personally I use Woopra and Google for stats). Gave the server a bounce after all that and my memory usage dropped from around 50% to the 30s. I consider that a success.

    My only issue is that my phpinfo page looks weird… No idea what happened there.

  • Google’s Blog Search is Irrelevant

    Google is a great search tool to find a website or general information about a topic, but quite frankly I’ve come to despise their blog search engine and I’m seeing serious flaws in their ranking app. Specifically, they now index blog links (aka the blogroll), so when you search blogs for a topic, you get unrelated posts.

    If you search for Laurence Fishburne because you saw him on an episode of M*A*S*H recently as a soldier with a racist CO, Google gives you two hits for IMDb, one for Wikipedia, one for news (Google News, that is), and then, finally, his official website. While Google claims they don’t adjust rankings (that is, they don’t give more or less weight to a website on their own) and allow their PageRank algorithm to sort all this out, it seems to me that any official website should be ranked first. Also, IMDb shouldn’t be listed twice. But that depends on what people are looking for and what Google offers.

    We stand alone in our focus on developing the “perfect search engine,” defined by co-founder Larry Page as something that, “understands exactly what you mean and gives you back exactly what you want.”

    With that in mind, as I look at their tech overview for people who aren’t super geeky, I think that they come to the process a little flawed. PageRank is a great idea, don’t get me wrong. The more pages that link to a site, the higher the site is ranked (in essence). Okay, that’s great! Until you have those damn splogs. You know the ones. Spam blogs that promise you information about a person/place/thing, but are nothing more than a ton of links and 100 popups.
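
    To make the “more links in, higher rank” idea concrete, here is a toy sketch of the textbook PageRank power iteration. It is an illustration only and not Google’s actual ranking code: the page names, the damping factor of 0.85, and the iteration count are all assumptions I picked for the example. It also shows why splogs matter, since a handful of spam pages pointing at one target is enough to lift that target over a real page nobody links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank.

    links maps each page name to the list of pages it links out to.
    Returns a dict of page name -> rank, with ranks summing to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three splogs all point at one spam target.
web = {
    "fan-blog": ["official-site"],
    "news-site": ["official-site", "imdb"],
    "imdb": ["official-site"],
    "official-site": ["imdb"],
    "splog-a": ["splog-target"],
    "splog-b": ["splog-target"],
    "splog-c": ["splog-target"],
    "splog-target": [],
}

ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:15s} {ranks[page]:.4f}")
```

    In this made-up graph the official site does well because real pages link to it, but the spam target also outranks the honest fan blog purely on the strength of its three splog links.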

    Why would I search blogs? Easy, a lot of news sites are using blogs these days, and I want to read those too. It’s not rocket surgery, it’s how news is disseminated in 2009, folks. And I, personally, like to search by ‘date’ because I want to know what’s newest.

    Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user’s query.

    This looks like it should take care of spam blogs, but if you’ve ever done a search on blogs about someone (let’s use Mr. Fishburne again), you know it’s a crap-shoot.

    A news search is actually pretty helpful. I get some articles of interest right up front. If I flip the bit and sort by date it’s still pretty useful. When I go to blog search (which is a sidebar link off news), it’s still mostly beneficial.

    But I dare you, I dare you, to make sense of the articles when you click sort by date. Three posts on that first page might actually be something worth reading. Good luck finding them, and I hope they actually are what you want. But at the end of the day, those spam blogs aren’t the problem that makes me hate the blog-search.

    No, the problem, as I see it, is posts like this:
    [Image: splog-1]

    That bit I circled for you means that the ‘label’ (tag, category, whatever) for ‘Laurence Fishburne’ has been used 4 times. Go to that post and you will not find a single thing of use on the page. 99.999% of these blogs are on Blogspot and, while I don’t begrudge them their posts, they’re getting false promotion! And the post that you lovingly crafted about how totally amazing Fishburne is, and how he acted the hell out of that scene last night, is now 10th on the list and bound for page 2 any second now.

    The only official Google response I can find on the matter is a post by Jeremy Hylton in their Google forums (dated November 2008).

    We expected some problems from blogroll matches, but may have underestimated the impact on searches using the link: operator or where the query matches a blog or blogger’s name. We do expect to fix the problem you’re seeing. We’ll use the full page content, but exclude the content that isn’t really part of the post. I’m not sure if we’ll be able to make the change before the end of the year, but we are working on it and are pretty confident that it can be solved. We’ll post an update here when we’ve got a solution.

    And no, there is no update to that post.
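
    For what it’s worth, the fix Hylton describes (match on the post, not on the stuff around it) is easy to picture. Below is a rough sketch of the difference, assuming a page has already been split into a post body and sidebar items such as blogroll links and label counts. The `BlogPage` class, its field names, and the sample pages are all hypothetical, and real blog search obviously has to do the hard part of working out which text belongs to the post in the first place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlogPage:
    """Hypothetical pre-parsed blog page: the post itself plus sidebar text."""
    title: str
    post_body: str
    sidebar: List[str] = field(default_factory=list)  # blogroll, labels, etc.


def matches_full_page(page: BlogPage, query: str) -> bool:
    """Match anywhere on the page, sidebar included (the behavior I'm complaining about)."""
    needle = query.lower()
    if needle in page.post_body.lower():
        return True
    return any(needle in item.lower() for item in page.sidebar)


def matches_post_only(page: BlogPage, query: str) -> bool:
    """Match only against the text of the post itself."""
    return query.lower() in page.post_body.lower()


# Made-up sample pages for illustration.
real_post = BlogPage(
    title="Last night's episode",
    post_body="Laurence Fishburne acted the hell out of that scene.",
)
label_only = BlogPage(
    title="Cheap watches",
    post_body="Buy cheap watches today!",
    sidebar=["Laurence Fishburne (4)", "Celebrity links"],
)

pages = [real_post, label_only]
query = "Laurence Fishburne"

full_page_hits = [p.title for p in pages if matches_full_page(p, query)]
post_only_hits = [p.title for p in pages if matches_post_only(p, query)]
print(full_page_hits)
print(post_only_hits)
```

    With full-page matching both sample pages come back for the query. With post-only matching, the page that merely carries the label in its sidebar drops out.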

    The hoopla from other blog sites has died down, but as this is still a prevalent problem on the blog search, I would really like to see it heat up again. Google’s blog search is pretty much dead useless to me if I can’t find information I want. As finding what I want is the whole point of Google (they said it first), they’ve made themselves irrelevant.