Half-Elf on Tech

Thoughts From a Professional Lesbian

Author: Ipstenu (Mika Epstein)

  • Plug It In, Plug It In

    I am not a great programmer by any means. I can hack around and muddle my way through with the best of the great net scapegraces. I’m not the genius who invents a brand new way of doing things. That said, I do, eventually, get annoyed with things enough that I force myself to learn how to code.

    Yesterday I was pissed off at WordPress because of its user management tools, and no plugin really did what I wanted. See, I have open registration. It lets me sync my blog and forum and lets people post. Where it fails is that I can’t set users as ‘banned’ in WordPress. This is a simple thing, I feel: a user role that has no rights and is just banned from commenting. They can read all they want, but no commenting. I’ve tried just about every tool out there, but they never work. On top of that, spammers sign up to my blog.

    Since creating a ‘bozo’ user role is outside my ability, I decided what I wanted was a plugin to prevent people from registering if they were on my blacklist, similar to how I can prevent them from commenting on my comment blacklist. At first I was using TimesToCome Stop Bot Registration, which (among other things) uses StopForumSpam’s list of spammers as a stop-gap.

    The problem with TTC is that if you register with a bad email (jane132@gmail.com instead of jane123@gmail.com) and then try to register with the RIGHT email, it notes that the IP is the same and bans both emails and the IP. Which caused a couple people no end of problems on my site. It had to go.

    From there, I tried No Disposable Email, which checks against a list of known baddies. That was nice, but it was a text file list that you had to update by hand. But it got me thinking.

    I quickly converted it into Ban Hammer, which allowed me to update and edit the text file from a submenu inside my admin session. But that wasn’t enough. Why did I have to keep my jerk list in two places? If someone was on my WordPress Comment Blacklist, I didn’t want them to comment. That implies they’re just not welcome at all. So why not make Ban Hammer pull from that list? Which I did.
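
    The substring-style matching Ban Hammer borrows from the WordPress comment blacklist is easy to sketch. Here’s a rough illustration of the logic in Python (the actual plugin is PHP, and the function names here are hypothetical), just to show the idea:

```python
# Rough sketch of blacklist-style registration filtering. This is Python
# for illustration only -- Ban Hammer itself is a PHP plugin reading
# WordPress's comment blacklist. All names here are made up.

def is_banned(email, ip, blacklist):
    """Return True if any blacklist entry appears in the email or IP.

    The WordPress comment blacklist does simple substring matching,
    one entry per line, so a line like 'spammer.ru' blocks every
    address at that domain.
    """
    for entry in (line.strip() for line in blacklist.splitlines()):
        if not entry:
            continue  # skip blank lines
        if entry in email or entry in ip:
            return True
    return False

blacklist = """
spammer.ru
jerk@example.com
"""

print(is_banned("bob@spammer.ru", "203.0.113.9", blacklist))      # True
print(is_banned("jane123@gmail.com", "198.51.100.7", blacklist))  # False
```

    One list, one kind of match: if an entry shows up anywhere in the registrant’s email or IP, they’re out, same as they’d be bounced from commenting.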

    I still have things I want to do to the code, like put in an option to use StopForumSpam’s list, and a way to edit the error message. But for now, Ban Hammer sits by my other plugins, Recently Registered (lists the last 25 registrations) and my bbPress plugin Spoiler Bar (adds in spoiler ‘code’ to bbPress) on my Google Code site. It’s not for ‘public’ release, but it’s there so my friends who have been helping me test out my ideas can easily download. What? I have nerd friends!

  • You’re not the boss of me

    After having my domains on three different servers for a long time, I mathed it out that it’d cost me the same to put ’em all on one VPS (virtual private server). After calling up my ISP (the fanfreakintastic LiquidWeb) they had me all moved over without me having to fuss! Combine two shared accounts into one VPS? Sure, done. I suspect my next bill will look … weird, but that’s okay. I’m sure that even if it’s all messed up, I can call them and get it sorted out.

    The first thing I did was make sure everything was running and then I left it alone for a day. Did anyone notice? No? Good, the fix was in!

    Then I started fiddling. I didn’t know a lot about VPS, having only mucked about with a RedHat distro before, and LiquidWeb provided me with cPanel and WHM, which I’d never used before. They also had the very familiar shell world for me to jump into. Google being what it is, I quickly found a VPS Optimization Guide that gave me some ideas to start.

    What I’ve Done So Far
    My memory usage, with one beefy site and two baby sites, was hitting 50%, which, in my mind, was bad. Now the beefy site runs off WordPress, which is known to have these issues. My CPU was barely passing 0.01 (yes, that’s right) though, so that was good. My first thought was to try WP-Super-Cache again, except last time I did that, CPU went through the roof and stayed there. Also, you lose dynamic feeds and the like (unless you use AJAX). I’ve heard great things about WP-Super-Cache, but the fact that it’s not a locked-in part of WP has always made me wonder about its viability. If it really was that good, or the only solution, it would be built in. Not to knock it, but I consider it only one option.

    While I know I need to optimize WP, my first stab was to optimize the server. Except that I didn’t. I switched from Zend to APC. Now, I’m not really sure if that was the best thing to do. I find a lot of people clamoring that APC is better and since I’d had weird issues with Zend before (outright borking MediaWiki if not configured specially), I decided to give APC a shot. If someone has info on some benchmarks or a good link to why APC is better than other PHP cache tools, I’d like to see them.

    Then I removed Clamd (and ClamAV). Yes, I know it’s virus scan software, but I’ve never actually seen it catch anything. What I run on the server, and what my ONE (yes, one) resold client will run, aren’t going to get caught by it. We run the same stuff. So call it a calculated risk. I also turned off EntropyChat (never gonna use it), MailMan (resource hog), Analog Stats and Webalizer (leaving AWStats, though personally I use Woopra and Google for stats). Gave the server a bounce after all that, and my memory dropped from the 50th percentile to the 30s. I consider that a success.

    My only issue is that my phpinfo page looks weird… No idea what happened there.

  • Google’s Blog Search is Irrelevant

    Google is a great search tool to find a website or general information about a topic, but quite frankly I’ve come to despise their blog search engine, and I’m seeing serious flaws in their ranking algorithm. Specifically, they now search blog links (aka the blogroll), so when you search blogs about a topic, you get unrelated posts.

    If you search for Laurence Fishburne because you saw him on an episode of M*A*S*H recently as a soldier with a racist CO, Google gives you two hits for IMDb, one for Wikipedia, one for news (Google News, that is), and then, finally, his official website. While Google claims they don’t adjust rankings (that is, they don’t give more or less weight to a website on their own) and let their PageRank algorithm sort all this out, it seems to me that any official website should be ranked first. Also, IMDb shouldn’t be listed twice. But that depends on what people are looking for and what Google offers.

    We stand alone in our focus on developing the “perfect search engine,” defined by co-founder Larry Page as something that, “understands exactly what you mean and gives you back exactly what you want.”

    With that in mind, as I look at their tech overview for people who aren’t super geeky, I think that they come to the process a little flawed. PageRank is a great idea, don’t get me wrong. The more pages that link to a site, the higher the site is ranked (in essence). Okay, that’s great! Until you have those damn splogs. You know the ones. Spam blogs that promise you information about a person/place/thing, but are nothing more than a ton of links and 100 popups.
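
    That “more inbound links, higher rank” idea fits in a few lines. Here’s a toy power-iteration sketch in Python (a made-up link graph, nothing like Google’s real implementation) showing how a pair of splogs linking to each other can out-rank a genuine post that nobody links to:

```python
# Toy PageRank power iteration -- a minimal sketch of the idea, not
# Google's actual algorithm. The link graph below is invented.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline, plus shares from its inlinks.
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# Two splogs link only to each other; the fan post links out but has
# no inbound links at all.
web = {
    "official": ["imdb"],
    "imdb": ["official"],
    "fanpost": ["official", "imdb"],
    "splog": ["splog2"],
    "splog2": ["splog"],
}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

    The splog farm keeps recirculating rank among its own pages, while the lovingly crafted fan post, with no inbound links, sinks to the bottom. That’s the hole splogs exploit.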

    Why would I search blogs? Easy, a lot of news sites are using blogs these days, and I want to read those too. It’s not rocket surgery, it’s how news is disseminated in 2009, folks. And I, personally, like to search by ‘date’ because I want to know what’s newest.

    Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user’s query.

    This looks like it should take care of spam blogs, but if you’ve ever done a search on blogs about someone (let’s use Mr. Fishburne again), you know it’s a crap-shoot.

    A news search is actually pretty helpful. I get some articles of interest right up front. If I flip the bit and sort by date it’s still pretty useful. When I go to blog search (which is a sidebar link off news), it’s still mostly beneficial.

    But I dare you, I dare you, to make sense of the articles when you click sort by date. Three posts on that first page might actually be something worth reading. Good luck finding them, and I hope they actually are what you want. But at the end of the day, those spam blogs aren’t the problem that makes me hate the blog-search.

    No, the problem, as I see it, is posts like this:
    [Screenshot: a splog hit in blog search, with the label count circled]

    That bit I circled for you means that the ‘label’ (tag, category, whatever) for ‘Laurence Fishburne’ has been used 4 times. Go to that post and you will not find a single thing on the page of use. 99.999% of these blogs are blogspot and, while I don’t begrudge them their posts, they’re getting false promotion! And your post that you lovingly crafted about how totally amazing Fishburne is, and how he acted the hell out of that scene last night is now 10th on the list, and bound for page 2 any second now.

    The only official Google response I can find on the matter is a post by Jeremy Hylton in their Google forums (dated November 2008).

    We expected some problems from blogroll matches, but may have
    underestimated the impact on searches using the link: operator or
    where the query matches a blog or blogger’s name. We do expect to fix
    the problem you’re seeing. We’ll use the full page content, but
    exclude the content that isn’t really part of the post. I’m not sure
    if we’ll be able to make the change before the end of the year, but we
    are working on it and are pretty confident that it can be solved.
    We’ll post an update here when we’ve got a solution.

    And no, there is no update to that post.

    The hoopla from other blog sites has died down, but as this is still a prevalent problem on the blog search, I would really like to see it heat up again. Google’s blog search is pretty much dead useless to me if I can’t find information I want. As finding what I want is the whole point of Google (they said it first), they’ve made themselves irrelevant.

  • Woop(ra)! There it is!

    A couple of months ago, I stumbled onto this statistics site called Woopra, and signed up to be a Beta tester. I already use things like Google Analytics and SiteMeter, which let me see how much traffic a site gets a day, based on about twelve different interpolations of the metrics.

    Basically, I’ve learned I can tweak my results to make it look like I get a lot of traffic or a little, which serves no purpose. But I can also compare my sites to previous days, which I actually do find helpful. I can learn what days my sites are heavily hit, which days are good days to upgrade code because no one’s there, and so on and so forth. What you have to figure out is why you need stats. Statistics are meaningless for a site like ipstenu.org, because there’s no money to be made here. For jorjafox.net, I find that they help me understand trends and as that site averages about $.75 a day in ads, it’s beneficial.

    Google Analytics and SiteMeter are both ‘yesterday’ code, however. I don’t get to see the current status of my site until the day after. Most of the time that’s okay. If I really am desperate for pageviews and such, I have other tools on my server to figure that out (and Google Adsense can be brute forced into helping). But sometimes you want to watch what people are doing as they’re doing it, in real time.

    Enter Woopra.

    With Woopra, I can sit and watch people ping the heck out of my sites and see what they do as they do it. It’s a little Big Brother, but honestly, if you didn’t know that someone can tell who you are when you visit their website, it’s too late for you. Woopra lets me watch as people from different countries sneak in and out, where they come from and where they go to when they leave. Like I find that the majority of my hits come from the Gallery (200 pageviews an hour, give or take), and most of the referrers are from the main site or the wiki. This is all stuff I knew, but it’s nice to see them in live tracking.

    Do you need this stuff to run a good website? No, not at all. But if you’re starting to move your site from ‘good’ to ‘moneymaking’, then these are things you have to start to study and understand. Like that it’s okay to have an 11% drop in pageviews at noon, because the average at the end of the day will balance out. Or that you get a lot of traffic at 3pm from YouTube. All these things help you better understand the Venn diagram that is your website, and the more you know …
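
    That “11% drop at noon” business is just one hour measured against the running daily average. A throwaway Python sketch, with made-up pageview numbers, to show the arithmetic:

```python
# Toy illustration: compare one slow hour against the running daily
# average of pageviews. All numbers here are invented.

hourly_views = [125] * 11 + [110]  # 11 steady hours, then a slow noon

average = sum(hourly_views) / len(hourly_views)
noon = hourly_views[-1]
drop = (average - noon) / average * 100

print(f"Daily average so far: {average:.0f} views/hour")
print(f"Noon hour: {noon} views ({drop:.0f}% below average)")
```

    One soft hour barely moves the daily average, which is exactly why it’s nothing to panic over.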

    Well there you are, then, aren’t you?

  • Caveman Tech Support Redux

    Many moons ago I mentioned how Caveman Tech Support was no different than the stuff I do. This is still the case. I will now caveman up my recent phone call.

    Grog: This fire help. Me Grog
    Lorto: Me Lorto. Help. Firekit wrong.
    Grog: You receive Firekit 2.0?
    Lorto: Ugh.
    Grog: Box include sticks, stone, flint?
    Lorto: Ugh.
    Grog: No broke sticks, stone, flint?
    Lorto: Ugh. All here. All good. Firekit wrong.
    Grog: (sigh) If all there, what wrong?
    Lorto: No spark, no fire, me confused. Not like directions.
    Grog: *sigh* You missing items?
    Lorto: I missing nothing.
    Grog: You sure?
    Lorto: Me sure. No missing.
    Grog: You sure?
    Lorto: Me have one thing missing. Box say ‘include flint’ but Lorto no have flint. This not make Lorto no have fire, ugh?

  • Cross Compatible

    One of the things about the net that I love and hate is the development of freedom of expression. Don’t get me wrong, I’m a socialist at heart, and I love the fact that people can say what they want, how they want, in the USA. Well, mostly. Illegality being what it is.

    But I digress!

    The Browser Wars ended with a weird stalemate, and it wasn’t by choice of the users. The World Wide Web Consortium (W3C) and their modifications of what was and was not okay in the Web (yes, there are guidelines to web design) managed to reward early adopters for things like HTML 2 and XHTML. They, without ever enforcing rules, and without ever making a browser of their own, managed to finagle enough ‘power’ such that website developers wanted to proudly display their compatibility. No longer did we create sites like ‘Best viewed on IE’, but we aimed for these standards, and coerced our websites to look ‘Okay’ on IE and Firefox, Windows XP and Mac OS X.
    IE 7 looks pretty good!
    But unless you have three computers with multiple boot sectors and multiple browser versions, either physically or virtually, how do you know what your site looks like?

    My personal website I know is ‘okay’ on most browsers. It looks perfect, just as I want, on OS X in Safari and Firefox. It looks good on Windows in Firefox. And then there’s IE. I hate it. I hate it. It’s not safe, it ignores the W3C, and it just doesn’t do what I think it should. Browse Happy is a site dedicated to reminding people about the alternatives. Like Firefox, yes, which is my Windows XP browser of choice. But I can’t just ignore IE, even if I hate it. Oh, I ignore IE 6 and older, but 7 and 8 I need to pay attention to. So what do I do?

    I hit up sites like BrowserShots, where they will go and snag a screenshot of what my site looks like in a freakishly vast array of browser/OS combinations. It’s not perfect; sometimes it hits weird errors where things I know look fine suddenly don’t. But if you want a quick shot to see what CSS stupidity certain browsers entertain, well, it’s good and free.