Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Plug It In, Plug It In

    I am not a great programmer by any means. I can hack around and muddle my way through with the best of the great net scapegraces. I’m not the genius who invents a brand new way of doing things. That said, I do, eventually, get annoyed with things enough that I force myself to learn how to code.

    Yesterday I was pissed off at WordPress because of its user management tools, and no plugin really did what I wanted. See, I have open registration. It lets me sync my blog and forum and lets people post. Where it fails is that I can’t set users as ‘banned’ in WordPress. This is a simple thing, I feel: a user role that has no rights and is just banned from commenting. They can read all they want, but no commenting. I’ve tried just about every tool out there, but they never work. On top of that, spammers sign up to my blog.

    Since creating a ‘bozo’ user role is outside my ability, I decided what I wanted was a plugin to prevent people from registering if they were on my blacklist, similar to how I can prevent them from commenting on my comment blacklist. At first I was using TimesToCome Stop Bot Registration, which (among other things) uses StopForumSpam’s list of spammers as a stop-gap.

    The problem with TTC is that if you register with a bad email (jane132@gmail.com instead of jane123@gmail.com) and then try to register with the RIGHT email, it notes that the IP is the same and bans both emails and the IP. Which caused a couple people no end of problems on my site. It had to go.

    From there, I tried No Disposable Email, which checks against a list of known baddies. That was nice, but it was a text file list that you had to update by hand. But it got me thinking.

    I quickly converted it into Ban Hammer, which allowed me to update and edit the text file from a submenu inside my admin session. But that wasn’t enough. Why did I have to keep my jerk list in two places? If someone was on my WordPress Comment Blacklist, I didn’t want them to comment. That implies they’re just not welcome at all. So why not make Ban Hammer pull from that list? Which I did.
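
    The substring-style matching Ban Hammer borrows from the WordPress comment blacklist is easy to sketch. Here’s a rough illustration of the logic in Python (the actual plugin is PHP, and the function names here are hypothetical), just to show the idea:

```python
# Rough sketch of blacklist-style registration filtering. This is Python
# for illustration only -- Ban Hammer itself is a PHP plugin reading
# WordPress's comment blacklist. All names here are made up.

def is_banned(email, ip, blacklist):
    """Return True if any blacklist entry appears in the email or IP.

    The WordPress comment blacklist does simple substring matching,
    one entry per line, so a line like 'spammer.ru' blocks every
    address at that domain.
    """
    for entry in (line.strip() for line in blacklist.splitlines()):
        if not entry:
            continue  # skip blank lines
        if entry in email or entry in ip:
            return True
    return False

blacklist = """
spammer.ru
jerk@example.com
"""

print(is_banned("bob@spammer.ru", "203.0.113.9", blacklist))      # True
print(is_banned("jane123@gmail.com", "198.51.100.7", blacklist))  # False
```

    One list, one kind of match: if an entry shows up anywhere in the registrant’s email or IP, they’re out, same as they’d be bounced from commenting.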

    I still have things I want to do to the code, like put in an option to use StopForumSpam’s list, and a way to edit the error message. But for now, Ban Hammer sits by my other plugins, Recently Registered (lists the last 25 registrations) and my bbPress plugin Spoiler Bar (adds in spoiler ‘code’ to bbPress) on my Google Code site. It’s not for ‘public’ release, but it’s there so my friends who have been helping me test out my ideas can easily download. What? I have nerd friends!

  • You’re not the boss of me

    After having my domains on three different servers for a long time, I mathed it out that it’d cost me the same to put ’em all on one VPS (virtual private server). After calling up my ISP (the fanfreakintastic LiquidWeb) they had me all moved over without me having to fuss! Combine two shared accounts into one VPS? Sure, done. I suspect my next bill will look … weird, but that’s okay. I’m sure that even if it’s all messed up, I can call them and get it sorted out.

    The first thing I did was make sure everything was running and then I left it alone for a day. Did anyone notice? No? Good, the fix was in!

    Then I started fiddling. I didn’t know a lot about VPS, having only mucked about with a RedHat distro before, and LiquidWeb provided me with cPanel and WHM, which I’d never used before. They also had the very familiar shell world for me to jump into. Google being what it is, I quickly found a VPS Optimization Guide that gave me some ideas to start.

    What I’ve Done So Far
    My memory usage, with one beefy site and two baby sites, was hitting 50%, which, in my mind, was bad. Now the beefy site runs off WordPress, which is known to have these issues. My CPU was barely passing 0.01 (yes, that’s right) though, so that was good. My first thought was to try WP-Super-Cache again, except last time I did that, CPU went through the roof and stayed there. Also, you lose dynamic feeds and the like (unless you use AJAX). I’ve heard great things about WP-Super-Cache, but the fact that it’s not a locked-in part of WP has always made me wonder about its viability. If it really was that good, or the only solution, it would be built in. Not to knock it, but I consider it only one option.

    While I know I need to optimize WP, my first stab was to optimize the server. Except that I didn’t. I switched from Zend to APC. Now, I’m not really sure if that was the best thing to do. I find a lot of people clamoring that APC is better and since I’d had weird issues with Zend before (outright borking MediaWiki if not configured specially), I decided to give APC a shot. If someone has info on some benchmarks or a good link to why APC is better than other PHP cache tools, I’d like to see them.

    Then I removed Clamd (and ClamAV). Yes, I know it’s virus scan software, but I’ve never actually seen it catch anything. What I run on the server, and what my ONE (yes, one) resold client will run, aren’t going to get caught by it. We run the same stuff. So call it a calculated risk. I also turned off EntropyChat (never gonna use it), MailMan (resource hog), Analog Stats and Webalizer (leaving AWStats, though personally I use Woopra and Google for stats). Gave the server a bounce after all that, and my memory dropped from the 50th percentile to the 30s. I consider that a success.

    My only issue is that my phpinfo page looks weird… No idea what happened there.

  • Google’s Blog Search is Irrelevant

    Google is a great search tool to find a website or general information about a topic, but quite frankly I’ve come to despise their blog search engine, and I’m seeing serious flaws in their ranking algorithm. Specifically, they now search blog links (aka the blogroll), so when you search blogs about a topic, you get unrelated posts.

    If you search for Laurence Fishburne because you saw him on an episode of M*A*S*H recently as a soldier with a racist CO, Google gives you two hits for IMDb, one for Wikipedia, one for news (Google News, that is), and then, finally, his official website. While Google claims they don’t adjust rankings (that is, they don’t give more or less weight to a website on their own) and let their PageRank algorithm sort all this out, it seems to me that any official website should be ranked first. Also, IMDb shouldn’t be listed twice. But that depends on what people are looking for and what Google offers.

    We stand alone in our focus on developing the “perfect search engine,” defined by co-founder Larry Page as something that, “understands exactly what you mean and gives you back exactly what you want.”

    With that in mind, as I look at their tech overview for people who aren’t super geeky, I think that they come to the process a little flawed. PageRank is a great idea, don’t get me wrong. The more pages that link to a site, the higher the site is ranked (in essence). Okay, that’s great! Until you have those damn splogs. You know the ones. Spam blogs that promise you information about a person/place/thing, but are nothing more than a ton of links and 100 popups.
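
    That “more inbound links, higher rank” idea fits in a few lines. Here’s a toy power-iteration sketch in Python (a made-up link graph, nothing like Google’s real implementation) showing how a pair of splogs linking to each other can out-rank a genuine post that nobody links to:

```python
# Toy PageRank power iteration -- a minimal sketch of the idea, not
# Google's actual algorithm. The link graph below is invented.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline, plus shares from its inlinks.
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# Two splogs link only to each other; the fan post links out but has
# no inbound links at all.
web = {
    "official": ["imdb"],
    "imdb": ["official"],
    "fanpost": ["official", "imdb"],
    "splog": ["splog2"],
    "splog2": ["splog"],
}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

    The splog farm keeps recirculating rank among its own pages, while the lovingly crafted fan post, with no inbound links, sinks to the bottom. That’s the hole splogs exploit.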

    Why would I search blogs? Easy, a lot of news sites are using blogs these days, and I want to read those too. It’s not rocket surgery, it’s how news is disseminated in 2009, folks. And I, personally, like to search by ‘date’ because I want to know what’s newest.

    Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user’s query.

    This looks like it should take care of spam blogs, but if you’ve ever done a search on blogs about someone (let’s use Mr. Fishburne again), you know it’s a crap-shoot.

    A news search is actually pretty helpful. I get some articles of interest right up front. If I flip the bit and sort by date it’s still pretty useful. When I go to blog search (which is a sidebar link off news), it’s still mostly beneficial.

    But I dare you, I dare you, to make sense of the articles when you click sort by date. Three posts on that first page might actually be something worth reading. Good luck finding them, and I hope they actually are what you want. But at the end of the day, those spam blogs aren’t the problem that makes me hate the blog-search.

    No, the problem, as I see it, is posts like this:
    [Screenshot: a splog hit in blog search, with the label count circled]

    That bit I circled for you means that the ‘label’ (tag, category, whatever) for ‘Laurence Fishburne’ has been used 4 times. Go to that post and you will not find a single thing on the page of use. 99.999% of these blogs are blogspot and, while I don’t begrudge them their posts, they’re getting false promotion! And your post that you lovingly crafted about how totally amazing Fishburne is, and how he acted the hell out of that scene last night is now 10th on the list, and bound for page 2 any second now.

    The only official Google response I can find on the matter is a post by Jeremy Hylton in their Google forums (dated November 2008).

    We expected some problems from blogroll matches, but may have
    underestimated the impact on searches using the link: operator or
    where the query matches a blog or blogger’s name. We do expect to fix
    the problem you’re seeing. We’ll use the full page content, but
    exclude the content that isn’t really part of the post. I’m not sure
    if we’ll be able to make the change before the end of the year, but we
    are working on it and are pretty confident that it can be solved.
    We’ll post an update here when we’ve got a solution.

    And no, there is no update to that post.

    The hoopla from other blog sites has died down, but as this is still a prevalent problem on the blog search, I would really like to see it heat up again. Google’s blog search is pretty much dead useless to me if I can’t find information I want. As finding what I want is the whole point of Google (they said it first), they’ve made themselves irrelevant.

  • Woop(ra)! There it is!

    A couple of months ago, I stumbled onto this statistics site called Woopra, and signed up to be a Beta tester. I already use things like Google Analytics and SiteMeter, which let me see how much traffic a site gets a day, based on about twelve different interpolations of the metrics.

    Basically, I’ve learned I can tweak my results to make it look like I get a lot of traffic or a little, which serves no purpose. But I can also compare my sites to previous days, which I actually do find helpful. I can learn what days my sites are heavily hit, which days are good days to upgrade code because no one’s there, and so on and so forth. What you have to figure out is why you need stats. Statistics are meaningless for a site like ipstenu.org, because there’s no money to be made here. For jorjafox.net, I find that they help me understand trends and as that site averages about $.75 a day in ads, it’s beneficial.

    Google Analytics and SiteMeter are both ‘yesterday’ code, however. I don’t get to see the current status of my site until the day after. Most of the time that’s okay. If I really am desperate for pageviews and such, I have other tools on my server to figure that out (and Google Adsense can be brute forced into helping). But sometimes you want to watch what people are doing as they’re doing it, in real time.

    Enter Woopra.

    With Woopra, I can sit and watch people ping the heck out of my sites and see what they do as they do it. It’s a little Big Brother, but honestly, if you didn’t know that someone can tell who you are when you visit their website, it’s too late for you. Woopra lets me watch as people from different countries sneak in and out, where they come from and where they go to when they leave. Like I find that the majority of my hits come from the Gallery (200 pageviews an hour, give or take), and most of the referrers are from the main site or the wiki. This is all stuff I knew, but it’s nice to see them in live tracking.

    Do you need this stuff to run a good website? No, not at all. But if you’re starting to move your site from ‘good’ to ‘moneymaking’, then these are things you have to start to study and understand. Like that it’s okay to have an 11% drop in pageviews at noon, because the average at the end of the day will balance out. Or that you get a lot of traffic at 3pm from YouTube. All these things help you better understand the Venn diagram that is your website, and the more you know …
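
    That “11% drop at noon” business is just one hour measured against the running daily average. A throwaway Python sketch, with made-up pageview numbers, to show the arithmetic:

```python
# Toy illustration: compare one slow hour against the running daily
# average of pageviews. All numbers here are invented.

hourly_views = [125] * 11 + [110]  # 11 steady hours, then a slow noon

average = sum(hourly_views) / len(hourly_views)
noon = hourly_views[-1]
drop = (average - noon) / average * 100

print(f"Daily average so far: {average:.0f} views/hour")
print(f"Noon hour: {noon} views ({drop:.0f}% below average)")
```

    One soft hour barely moves the daily average, which is exactly why it’s nothing to panic over.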

    Well there you are, then, aren’t you?

  • Caveman Tech Support Redux

    Many moons ago I mentioned how Caveman Tech Support was no different than the stuff I do. This is still the case. I will now caveman up my recent phone call.

    Grog: This fire help. Me Grog
    Lorto: Me Lorto. Help. Firekit wrong.
    Grog: You receive Firekit 2.0?
    Lorto: Ugh.
    Grog: Box include sticks, stone, flint?
    Lorto: Ugh.
    Grog: No broke sticks, stone, flint?
    Lorto: Ugh. All here. All good. Firekit wrong.
    Grog: (sigh) If all there, what wrong?
    Lorto: No spark, no fire, me confused. Not like directions.
    Grog: *sigh* You missing items?
    Lorto: I missing nothing.
    Grog: You sure?
    Lorto: Me sure. No missing.
    Grog: You sure?
    Lorto: Me have one thing missing. Box say ‘include flint’ but Lorto no have flint. This not make Lorto no have fire, ugh?

  • Cross Compatible

    One of the things about the net that I love and hate is the development of freedom of expression. Don’t get me wrong, I’m a socialist at heart, and I love the fact that people can say what they want, how they want, in the USA. Well, mostly. Illegality being what it is.

    But I digress!

    The Browser Wars ended with a weird stalemate, and it wasn’t by choice of the users. The World Wide Web Consortium (W3C) and their modifications of what was and was not okay in the Web (yes, there are guidelines to web design) managed to reward early adopters for things like HTML 2 and XHTML. They, without ever enforcing rules, and without ever making a browser of their own, managed to finagle enough ‘power’ such that website developers wanted to proudly display their compatibility. No longer did we create sites like ‘Best viewed on IE’, but we aimed for these standards, and coerced our websites to look ‘Okay’ on IE and Firefox, Windows XP and Mac OS X.
    IE 7 looks pretty good!
    But unless you have three computers with multiple boot sectors and multiple browser versions, either physically or virtually, how do you know what your site looks like?

    My personal website I know is ‘okay’ on most browsers. It looks perfect, just as I want, on OS X in Safari and Firefox. It looks good on Windows in Firefox. And then there’s IE. I hate it. I hate it. It’s not safe, it ignores the W3C, and it just doesn’t do what I think it should. Browse Happy is a site dedicated to reminding people about the alternatives. Like Firefox, yes, which is my Windows XP browser of choice. But I can’t just ignore IE, even if I hate it. Oh, I ignore IE 6 and older, but 7 and 8 I need to pay attention to. So what do I do?

    I hit up sites like BrowserShots, where they will go and snag a screenshot of what my site looks like in a freakishly vast array of browser/OS combinations. It’s not perfect; sometimes it hits weird errors where things I know look fine suddenly don’t. But if you want a quick shot to see what CSS stupidity certain browsers entertain, well, it’s good and free.