Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: administration

  • Hotlink Protection

    Hotlinking is embedding a file from someone else's website (an image, a video, whatever) directly in your own pages. This is also called bandwidth theft: when someone views your page, the file is served from the other site's server, so they pay for the bandwidth. If you use an IMG tag to show a picture from someone else's site on your blog, forum post, or website, that's hotlinking. You're stealing their bandwidth.

    There is a case in which this sort of 'theft' is ethically permissible, though some webhosts don't like it. If you have multiple Yahoo! sites, and one is low on bandwidth, you can shuttle some of your content to the other site, and thus split up the bandwidth. This isn't always a good idea: if it's against the Terms of Service on your host, they can kill your account. Which is why you should always back up your websites on your own computer. If you own your own domains (like I do) and have multiple 'subdomains,' then it's okay to share an image. ipstenu.org is considered a different website than photos.ipstenu.org, so I have to tell my server it's okay to share between the two. But that's code geeky.

    What the common websurfer needs to know is this: direct linking to a picture, movie file, or any other content on someone else’s site, unless it’s a simple URL link to that site, is bad form, ethically asinine, and impolite. It’s akin to stealing electricity from your neighbor by plugging into their outlets.

    But what do you do when someone's hotlinking to your server? Most of us find out about this via a nastygram from our webhost saying we're using too much bandwidth. Bandwidth is both how fast you can pull data from the net at home and, on the hosting side, how much data a website is allowed to transfer to the world each month. Having more bandwidth is always better, but forcing visitors to burn through it with image-heavy sites and poorly coded web pages is not cool. Still, sometimes you have a moderate site and one image becomes super popular.

    This is where you need to learn about hotlink protection. The most basic code is this:

     
    # Simple Hotlink Protection
    
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com(/)?.*$                  [NC]
    RewriteRule \.(gif|jpe?g?|png)$                             - [F,NC,L]
    
    

    This basically says 'If the request has a referer, the referer isn't yourdomain.com, and you're asking for an image, go away.' The [F] flag sends back a 403 Forbidden; requests with no referer at all (that's the first condition) are left alone. Sometimes I make that last line something like this:

    RewriteCond %{REQUEST_URI} !hotlink\.gif$
    RewriteRule \.(gif|jpe?g|png)$ http://mydomain.com/hotlink.gif         [NC,L]
    

    Which shows them a 'No, don't do that' image instead. The extra RewriteCond keeps the replacement image itself from matching the rule; without it, the hotlinked request for hotlink.gif would just redirect to itself in a loop. If you're going to do this, use a SMALL image, since serving it still uses up some of your bandwidth.

    For most people, that works just fine, but I’ve run into a couple situations that were weird.

    Multiple Subdomains

    If you’re using a lot of subdomains (like, say, with WordPress MultiSite) you’ll find pretty quickly that the normal hotlink protection rule will block subdomain.yoursite.com from getting images from www.yoursite.com and we don’t want that! For one subdomain, it’s an easy fix:

     
    # Simple Hotlink Protection
    
    RewriteEngine on
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com(/)?.*$                  [NC]
    RewriteCond %{HTTP_REFERER} !^http://(subdomain\.)?yourdomain\.com(/)?.*$            [NC]
    RewriteRule \.(gif|jpe?g?|png)$                             - [F,NC,L]
    
    

    But what about multiple sites? At 12 subdomains, you don't want to have to add these lines in manually every time! Thankfully, the geniuses at Perishable Press have created the Ultimate htaccess Anti-Hotlinking Strategy. You can read the whole post for the details, but here's the basic code:

     
    # ultimate hotlink protection
    
     RewriteEngine on
     RewriteCond %{HTTP_REFERER}     !^$
     RewriteCond %{REQUEST_FILENAME} -f
     RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g?|png)$           [NC]
     RewriteCond %{HTTP_REFERER}     !^https?://([^.]+\.)?domain\. [NC]
     RewriteRule \.(gif|jpe?g?|png)$                             - [F,NC,L]
    
    

    Simple. Elegant. Genius. All you have to do is change domain to whatever your domain is. Notice there’s no .com or .net in there? There doesn’t need to be. This is the one I use for this site:

     
    # ultimate hotlink protection
    
     RewriteEngine on
     RewriteCond %{HTTP_REFERER}     !^$
     RewriteCond %{REQUEST_FILENAME} -f
     RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g?|png)$           [NC]
     RewriteCond %{HTTP_REFERER}     !^https?://([^.]+\.)?ipstenu\. [NC]
     RewriteRule \.(gif|jpe?g?|png)$                             - [F,NC,L]
    
    

    That’s it. Just change domain to ipstenu and I’m done.

    Letting Other Sites Use Your Images

    The other major gotcha: what about other sites that you do want to let link to your files? For example, I have a LiveJournal site (I know) that's a mirror of another blog. To take care of that, I added this as my last condition:

     RewriteCond %{HTTP_REFERER}     !^http://ipstenu.livejournal\.   [NC]
    

    Here I specified the URL a little more, since I don't want all of livejournal nabbing my images. Of course, ironically enough, the line where I call ipstenu has the funny side effect of allowing any domain that starts with 'ipstenu.' (ipstenu.anything.com, say) to hotlink my images. Which is a risk I accept right now.

    If you're using my first example, the simple protection, then just like you added in a subdomain, you add in your other URLs:

     
    RewriteCond %{HTTP_REFERER} !^http://ipstenu\.livejournal\.com(/)?.*$               [NC]
    

    This will save you some headaches down the road, but just remember which one you're using. Otherwise, like me when I made a new subdomain, you'll sit there wondering why the heck the images are broken!
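
    One more tip: you can sanity-check the rules from the command line before anyone complains. This is just a sketch; swap in your own domain, and the image path here is only an example:

     # Pretend to be a hotlinker; you should get a 403 (or your hotlink image) back
     curl -I -e "http://not-your-site.example/" "http://yourdomain.com/images/test.jpg"

     # Pretend the request came from your own site; you should get a 200
     curl -I -e "http://www.yourdomain.com/" "http://yourdomain.com/images/test.jpg"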

  • The dangers of an unchecked MultiSite?

    Blogetery was shut down, mysteriously, over the weekend. It was a WP MultiSite setup with around 70k blogs. It's not terribly abnormal to have an install that big; the thing is, an unnamed law enforcement agency shut them down. Details, such as they were, were posted at ReadWriteWeb: 70,000 Blogs Shut Down by U.S. Law Enforcement. Their shutdown reminded me of the hazards of running a website where anyone can register and make their own site, and how important it is to be vigilant about what shows up on your website.

    Discussion of the situation spun up on Web Hosting Talk, where it was determined that Blogetery had been accused of hosting inappropriate content before. That probably meant they were hosting torrents or other illegal-but-not-shutdown-worthy material. Copyright infringement. The site owner claimed that every copyright violation was removed within 24 hours. By the way, if you ever get slapped with a DMCA notice (i.e. a notice that your site has content copyrighted by someone else), the main thing you have to do to stay out of a lawsuit is take the material down promptly. Done.

    So what on earth would cause BurstNET, their host, to shut down the site without warning or notice? And it really was without notice: the owner had to ask 'What happened to my site?' and was told it was shut down, terminated, and here's his money back.

    Turns out he had a link.

    From BurstNET’s statement:

    “It was revealed that a link to terrorist material, including bomb-making instructions and an al-Qaeda “hit list”, had been posted to the site. “

    That's it. A link. One link. But it was enough for a warrant, which led to this:

    “Upon review, BurstNET® determined that the posted material, in addition to potentially inciting dangerous activities, specifically violated the BurstNET® Acceptable Use Policy. This policy strictly prohibits the posting of “terrorist propaganda, racist material, or bomb/weapon instructions”. Due to this violation and the fact that the site had a history of previous abuse, BurstNET® elected to immediately disable the system.”

    Now the previous 'abuse' was copyvio, which was all handled legally, but clearly BurstNET was feeling the pinch. They probably got slapped with a warrant and did the legal thing: they shut it down.

    Regardless of whether it was fair to the other 69,999 sites hosted by Blogetery, it brings up the inherent problems of running an unchecked MultiSite. Anyone can make a blog/site, anyone can update it, and anyone can get you in trouble.

    It's been a few weeks, but finally news is coming out about the whole story. CNET's article was provocatively titled Bomb-making tips, hit list behind Blogetery closure. That said, it explained things in more detail, which let everyone get a grip on what was actually going on.

    I'm not going to get into the ethics of free speech and how it does (and doesn't) apply to your website. Instead I want to use this as a reminder of the trouble you can get into hosting websites. I host four sites: three are 'mine' and one is a site I like and visit pretty often. I'm very much aware of what's going on on all of these sites and I monitor them frequently. This is not just to my benefit, but to everyone else's on my servers. My host would be 100% within their rights to say "Ipstenu's got a site that has kiddie porn! Kill her account!" and that would shut down everyone on my server.

    As I mentioned before, WordPress MultiSite makes it a lot easier for someone to host a thousand blogs, unchecked, but that also means it’s a lot easier for someone to post questionable content. For copyvio cases, you’re covered when you remove the material in question, but for porn and terrorism, it’s not actually under the same purview. Again. I’m NOT going to get into the why of this, nor the right or wrong about it. If you have a website, you have to accept that your host really has no interest in being involved with a legal dispute regarding kiddie porn or terrorism.

    This means it's down to you to constantly and consistently monitor your site for sub-sites and domains that are questionable. For me, if a site I host gets one Cease and Desist about copyvio, I take down the material, explain to the person who runs the site why, and ask them not to do it again. At this point, it's their job to monitor their site. Should they fail to do so a second time, I give them a final warning of 'If you can't keep tabs on your site and your visitors, you can't stay here.' Third time and I close their account, refund them what's left on their time, and offer to give them a copy of their site and database, intact.

    For the rest, though, it's a no-warning termination, specifically because porn and terrorism are hot button topics. I'm within my rights to do so (I own the server, I make the rules) and I owe it to the other people. My ISP is within their rights to do similar, because they own the … land my server is on. If that makes sense.

    If all this sounds like too much work for you, then you shouldn’t be running an open, anyone-can-register-and-blog, multisite. Or you should hire some staff. Multisite is not a quick money scheme, it’s a job, and you have to take it seriously.

    This is not a problem endemic to WordPress, but with the advent of MultiSite becoming mainstream, it's something that's going to start coming up more and more. Don't say you weren't warned.

  • MediaWiki – All Powerful, All Annoying

    Don't get me wrong, I love MediaWiki. It's 'overkill' for what I need, but then again, I wanted a stand-alone 'encyclopedia' where primarily text-based articles were listed, without the ability to comment. And until someone can trim WordPress to run as fast as MediaWiki, I'm sticking with it. Well, that and they'd need an 'import from MediaWiki' tool, because at 700-odd pages, I'm not doing it by hand. It's a static website, and it does its job well.

    But right now, and every time I need to update it, I hate it.

    I don’t mind using command line to wget the latest version and unzip it, overlaying the new files atop the old ones. What I mind is having to manually visit the pages for all my extensions, and determine if I need to upgrade or not. It makes me wish for WordPress with the happy ‘Hey, that plugin needs updating!’
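
    For reference, the core upgrade itself really is only a few commands. This is a rough sketch; the version number, download URL, and paths are placeholders for whatever your own install uses:

     cd ~/public_html
     wget http://download.wikimedia.org/mediawiki/1.16/mediawiki-1.16.0.tar.gz
     tar -xzf mediawiki-1.16.0.tar.gz
     cp -R mediawiki-1.16.0/* wiki/          # overlay the new files atop the old install
     cd wiki && php maintenance/update.php   # then run the schema updater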

    See, there's no admin 'side' to MediaWiki, like there is for WordPress, or ZenPhoto, or anything else I run on my sites. MediaWiki is for the hardcore people who don't mind getting their hands dirty. And as a user, I think this is the real problem with the whole thing. Until they make a user-friendly admin side, MediaWiki will remain used by the nerdy, the geeky and the techie, rather than the whole world. Part of why WordPress became so popular is they made it, if not easy, then easier to run your own blog. It's still got problems, sure, but they made it so you could easily learn how to manage your own site.

    And then there’s MediaWiki.

    MediaWiki sucks to admin. Like today I found out I could turn on File Caching. That's great news, I think! I use it for my gallery and my blogs (they run faster, among other things). Except that, unlike WordPress (where Donncha's freakin' amazing WP Super Cache can clear out files on a scheduled basis) or ZenPhoto (where it runs once a day, or whenever I press 'clear!'), MediaWiki has no cache expiry. That blew my mind, but seeing as MediaWikis are 'mostly' static content, it makes a little sense.
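
    Turning it on is only a couple of lines in LocalSettings.php. A sketch, not gospel: the cache directory is whatever writable path you point it at, and on the versions I've used the file cache also wants $wgShowIPinHeader switched off:

     $wgUseFileCache = true;               # serve anonymous visitors static HTML copies
     $wgFileCacheDirectory = "$IP/cache";  # any directory the web server can write to
     $wgShowIPinHeader = false;            # the file cache and this setting don't mix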

    So I turned it on and ran php maintenance/rebuildFileCache.php, which force-caches everything. All at once. This is awesome to get your site 'started,' and all told, it took up a moderate, but not huge, bit of space.

    Also, I was told 'When you edit a page, the cache is refreshed,' except I edited one and it wasn't. Then I was told 'Add this to your page URL and it will prompt you to recache' (this being ?action=purge), except that didn't work either. If I was logged in, it did nothing. If I was logged out, it purged, but then I went back and it was still the old page. Finally I sorted out that the cached pages had to be owned by 'nobody:nobody' (this isn't too weird, BTW). The problem NOW is that if they were owned by that, then the rebuildFileCache.php script didn't work!

    So, great, it now works; it now flushes when I edit and save a page. If I run the rebuild command, I'll have to manually go in and chown the files to nobody, which annoys me, but I have godlike access to the server and I can always fix it. But what if I want to delete everything in the cache? Basically I have to dump the entire folder. Which is annoying, but at least it's working now.
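
    So my rebuild routine has grown an extra step. A sketch of it; the paths and the 'nobody' user are specific to my server, not a universal rule, and the chown needs root:

     cd ~/public_html/wiki
     php maintenance/rebuildFileCache.php   # force-cache everything (runs as my shell user)
     chown -R nobody:nobody cache/          # hand the cached files back to the web server user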

    Why would I have to flush the whole cache? Say I make a site-wide formatting change. Also, I have advertising on my sites; how do cached pages affect that?

    In the end, I’m going to keep the cache running for a month, see how it goes. But it still annoys me how much of this is lacking because of no admin ‘dashboard.’

    Then again, that’s MediaWiki. Function over form. All powerful, all annoying.

  • But If, Baby, I’m The Bottom, You’re The TOP

    Earlier this month I talked about how my server was acting wonky and how I fixed it using, among other tools, TOP.

    This week I was chatting with a fellow about CPU usage and his site. He runs a rather large WordPress blog and the database is about 500 megs. As a comparison, this site, with about 500 posts, is under 5 megs, and my big site, with thousands of posts, comments, and a forum, is 10 megs. The biggest site I run on my server is 850 megs (just down from 910 after some clean up). The difference between his site and mine is that his is slow and he knows it. As we discussed ways to speed it up, I had some thoughts on WordPress and how, at a certain point, you’re going to need to dig into the guts of your server and learn TOP.
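
    As an aside: if you're curious where your own databases land on that scale, MySQL will tell you. A sketch, assuming you have the privileges to read information_schema:

     mysql -e "SELECT table_schema AS db,
                      ROUND(SUM(data_length + index_length)/1024/1024, 1) AS size_mb
               FROM information_schema.tables
               GROUP BY table_schema;"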

    The 'problem' with most 'How do I make my WordPress site run faster?' tutorials, as I've seen it, is that they address surviving the Digg effect. That is, they talk about how to deal with a high volume of traffic on your site, and for the most part you can get by with just adding caching plugins.

    Once your site gets ‘big’ or ‘popular’ you’re going to have to move off shared/cloud hosting and over to your own server. For most of us, the first step is a VPS (Virtual Private Server). Shared Hosting means ‘You have an account on a server with a hundred other people.’ It’s great for small sites, inexpensive and easy to use. The problem is you could have terrible neighbors, who use up all the CPU. Think of it like those old New York apartments where someone’s a jerk at 5am and uses up the hot water so you, at 7am, have none. Yeah, it’s kind of like that. That’s the day you think ‘I want a house!’

    Only, well, we're not all up for houses just yet. A house would be a dedicated server, where it's just you. Cloud hosting, which I touched on earlier, would be the college dorms of webhosting. It has a lot of benefits for the really small sites, and actually some for large sites, but I'm not sold on their overall usefulness yet, so I'll talk about them some other time. What I want to talk about are Virtual Private Servers, the condo-sub-leasing (or rent-to-own, maybe) of website hosting, and how the new VPS user should really get on TOP of things (sorry, bad pun) to make their lives easier.

    TOP. Well ‘top’ really. Unix commands are generally all lower case like that.

    The top command is a system monitor tool that outputs a list of processes. Have you ever seen Task Manager in Windows? It's kind of like that tab for 'Processes' that you look at and run away from. The default view of top sorts by percentage of CPU usage, so the "top" CPU users are listed first. See? The name made sense. You can also see how much processing power is being used, who the memory hogs are, and other cool things. Most modern Unix systems let you sort the list, colorize it, etc., though you have to be command-line savvy.
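
    A few of the invocations and keystrokes I lean on, for the curious. This is the common procps flavor of top; the exact flags vary from system to system:

     top            # interactive: press P to sort by CPU, M to sort by memory, q to quit
     top -b -n 1    # one batch-mode snapshot, handy for piping into grep or a log file
     top -u nobody  # only show one user's processes (the web server user, on my box)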

    Here’s what top looked like for me about an hour ago.

    top - 12:44:44 up 126 days, 23:13,  1 user,  load average: 0.12, 0.17, 0.17
    Tasks:  91 total,   1 running,  90 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
    Mem:    524288k total,   358248k used,   166040k free,        0k buffers
    Swap:        0k total,        0k used,        0k free,        0k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    15616 nobody    15   0 94540  65m  20m S  1.0 12.8   0:00.27 httpd
    12261 ipstenu   16   0  1908 1012  780 R  0.4  0.2   0:00.30 top
    [...]
    28630 root      16   0  107m  86m 1096 S  0.0 16.8   1:06.30 /usr/sbin/clamd

    I wanted to point out clamd, which has been the bane of my existence. The thing won't DIE. I ended up going into /etc/exim.conf, manually commenting out the clamd line, and restarting the service to finally get rid of it.
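
    For anyone fighting the same fight, this is roughly what that looked like. The socket path and service names are from my box and will differ on yours:

     # in /etc/exim.conf, comment out the antivirus hook:
     #   av_scanner = clamd:/var/run/clamav/clamd.sock

     service exim restart   # pick up the config change
     service clamd stop     # and put the thing out of its misery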

    But top, as you can see, has a freakishly large amount of information. My server is doing fine at this point, so I don't have a whole lot to show you, but I can still tell at a glance what's going on. Right now there's a 'nobody' process. That just means someone's accessing my website. No, really! That's good! Its CPU and memory usage seem high, but they vanish in a second. Basically, someone rang my doorbell and for that brief moment, electricity was used. The next thing I see is the top command, which is run by me (hi!), and down the line is that idiot, clamd.

    I actually scan top a lot at work these days, trying to understand what’s causing issues. It’s good for ‘right now!’ things, but not so much if I want to see what started a strange spike a couple hours (weeks) ago. For that you need a whole mess of tools.
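
    top won't answer the 'what happened at 3am two Tuesdays ago?' question, but one example of a tool that will is sar, from the sysstat package, assuming it's installed and collecting. A quick taste:

     sar -u    # CPU usage, sampled through the day
     sar -r    # memory usage
     sar -q    # load average and run queue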

  • I Haven’t Got Time For The Pain!

    (That's Carly Simon, and you should get the joke here.) Two months ago (give or take) I mused over photo gallery options for my sites. For Ipstenu, I'm now using WordPress and treating it like a photoblog. For JFO, however, I couldn't answer it that easily.

    I really do like the Gallery project. I do! I learned a great deal about photography from it, and I'm thankful for it. But. I needed to move on as a user, a developer and a photographer. On that last one, I'm not a professional, I'm just a goofy girl with a camera who likes to remember where she's been. As a user, Gallery2 did the job well and without major issues. As a developer, it made me want to cry. Many times. Once I had to log into my friend's server to fix his install. That just whomps.

    Even the developers admit that Gallery2 suffered from bloat:

    The code base is too complex and over-engineered because it was designed to fix every single thing that was wrong with Gallery 1 (Second System Effect) leaving its scope hazy and broad.

    The whole idea of it was "Your photos, your website." And personally I love that. I hate having Flickr or Picasa in charge of MY photos. Let alone Facebook. I have a blog on my domain for that same reason. But Gallery2 was too much. I never used half of it, and it was 16+ megs at its slimmest install. That the developers agreed with my feelings delighted me. And the Feature List was also exciting. As soon as G3 popped out, I grabbed a copy and started playing.

    With each version of Gallery3's beta releases, I would get excited and then disappointed. Excited for the new toys, and disappointed by how the overall effect felt. It just felt wrong for me. It wasn't really Web 2.0, even though technically it was, and the usage felt off. It didn't make as much intuitive sense as G2, though it was still far better than Coppermine (which frankly I hate, and I know more people who argue with it than anything). At first I thought it was because I was so used to G1 and G2, but then I realized that over the last 10 years, I've used so many different systems that I'm fine with subtle differences. I'm savvy, I'm smart, I can code, so why did G3 feel wrong to me?

    It was too hard. Too much was built in and not pluggable. Too much was hard-coded. Theming was impossible in the first release, and way too hard in the third. Understanding the theme system in G2 was easy, though implementing it was hard. Understanding it in G3 was hard and implementing it was horrific. And before someone reminds me, AGAIN, that this isn't even a beta product but an alpha, quite frankly that's not an excuse. The basic things you need to be able to do with a first public release (be it beta, alpha or whatever) are to use it: upload photos, change options, theme. That's it. Those are the three things, at its most basic, that photo gallery software has to have, or you may as well be using an off-site solution.

    And while I may sound like I'm ranting, I'm not. I'm sad and frustrated and … you know, I really like Gallery! I really do. But it was starting to feel like Movable Type. They made a big shift and suddenly I wanted to know who peed in my coffee. The code felt wrong, it felt clunky, it felt raw. It was like starting over, and I didn't like where it was going. And I realized that I was going to have to say goodbye to an old friend.

    Personally I’m all about the simplest, best, tool for the job. I wanted a way to update news on JFO and, when that was ALL I needed, I used CuteNews. When I realized the site was going to need something more, I weighed my options, tested software, and decided that while WordPress was a bit of overkill, I knew how to support it and customize it to be what I needed. In the end, that proved to be a perfect choice. When I had a forum (the first time around), it was IPB, which I liked, but it always felt too big. Now I use the very basic bbPress and it’s what I need and nothing more.

    If WordPress had PhotoPress, I'd probably have snagged that. Instead, I shopped around. I installed Coppermine, again, to test. I put up G3-alpha3 and then 4. I went to Wikipedia, dug out the comparison charts, and ended up in a head-to-head battle between ZenPhoto and Gallery3.

    ZenPhoto won by feeling better.

    Seriously, it's aesthetics at this point. There are only two features I miss: being able to re-upload a picture and keep its metadata, and having 'new' images show up with a different background color. But I can live without those.

  • Woop(ra)! There it is!

    A couple months ago, I stumbled onto a statistics site called Woopra, and signed up to be a Beta tester. I already use things like Google Analytics and SiteMeter, which let me see how much traffic a site gets a day, based on about twelve different interpolations of the metrics.

    Basically, I’ve learned I can tweak my results to make it look like I get a lot of traffic or a little, which serves no purpose. But I can also compare my sites to previous days, which I actually do find helpful. I can learn what days my sites are heavily hit, which days are good days to upgrade code because no one’s there, and so on and so forth. What you have to figure out is why you need stats. Statistics are meaningless for a site like ipstenu.org, because there’s no money to be made here. For jorjafox.net, I find that they help me understand trends and as that site averages about $.75 a day in ads, it’s beneficial.

    Google Analytics and SiteMeter are both ‘yesterday’ code, however. I don’t get to see the current status of my site until the day after. Most of the time that’s okay. If I really am desperate for pageviews and such, I have other tools on my server to figure that out (and Google Adsense can be brute forced into helping). But sometimes you want to watch what people are doing as they’re doing it, in real time.

    Enter Woopra.

    With Woopra, I can sit and watch people ping the heck out of my sites and see what they do as they do it. It’s a little Big Brother, but honestly, if you didn’t know that someone can tell who you are when you visit their website, it’s too late for you. Woopra lets me watch as people from different countries sneak in and out, where they come from and where they go to when they leave. Like I find that the majority of my hits come from the Gallery (200 pageviews an hour, give or take), and most of the referrers are from the main site or the wiki. This is all stuff I knew, but it’s nice to see them in live tracking.

    Do you need this stuff to run a good website? No, not at all. But if you're starting to move your site from 'good' to 'moneymaking', then these are things you have to start to study and understand. Like that it's okay to have an 11% drop in pageviews at noon, because the average at the end of the day will balance out. Or that you get a lot of traffic at 3pm from YouTube. All these things help you better understand the Venn diagram that is your website, and the more you know …

    Well there you are, then, aren’t you?