Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: htaccess

  • Switching to WordPress MultiSite Breaks Links

    As of WordPress 3.1 you can NO LONGER remove the blog slug via the method described in this post! – Feb 28 2011

    Warning: WordPress MultiSite is not for the newbie!

    Most people I know who are using MultiSite (formerly WPMU) are people who have always been using that fork of WordPress. With the advent of MultiSite, a built in way for people to move from single ‘blog’ installs of WordPress to a multiple site structure, more and more users want to move to it, which is great. But. There are some weird ‘problems’ with MultiSite that cause a lot of users to think it’s broken. It’s not, it’s just different.

    My friend Andrea wrote a nice tutorial about how to create a network, and generally when I need help, I scan WPMU Tutorials first. Or I tweet at her. One of the things we’ve talked about is the way WordPress MultiSite changes your blog links.

    When I changed https://ipstenu.org from single to multi, my links were changed from https://ipstenu.org/YYYY/MM/PostName to https://ipstenu.org/blog/YYYY/MM/PostName, which I did not want! Thankfully, I knew a lot of people had this problem beforehand and I fixed it quickly, which I will explain here, but first, here’s why it’s done that way.

    Normally, when we use pretty permalinks (that’s the part of your URL that looks ‘pretty’ instead of code-like) on a blog, all we have to worry about is not conflicting with post names and page names. If you have an about page, generally the URL is https://ipstenu.org/about. Since WordPress has to translate https://ipstenu.org/?p=2 into that to be pretty, you can’t also have a blog post with the same URL. How would the software know WHICH page you wanted? Simple enough. For details on that, you can read Otto’s post on Category in Permalinks Considered Harmful.

    When you add in MultiSite, suddenly the code has to check for something more: is this new Site named something that breaks my existing links?

    There are two kinds of MultiSite, and I use them both. You have SubFolders and SubDomains. Subfolders gets you URLs like https://ipstenu.org/code and SubDomains gets you https://halfelf.org/. If you use SubFolders, WordPress takes an extra step to make sure that pages named ‘code’ don’t break when you have a new blog with the same name. To help do that, it kicks your blog URLs from your main blog (i.e. https://ipstenu.org/) up a level to a new URL: https://ipstenu.org/blog/. For those of us who move from Single to Multi, this breaks our old links.

    Thankfully there are a couple fixes which can tide you over until WordPress 3.1 is released. The developers plan on having this ‘fixed’ or at least editable by then, so at the worst, you can go from having a blog slug to ‘news’ or ‘pasta’ or whatever.

    .htaccess

    If you want to keep the blog folder, but you want to redirect your old posts, .htaccess is your friend.

    If I wanted to redirect all YYYY/MM/DD posts to /blog/… I would add this:

     
    # In .htaccess the pattern sees the path without its leading slash
    RewriteRule ^([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$ https://ipstenu.org/blog/$1/$2/$3/$4 [L,R=301]
    

    What I actually did on another site was this:

    RewriteRule ^([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$ http://domain.com/blog/$4 [L,R=301]
    

    And I changed my permalinks to just post name. This was risky, in a way, but it matches how I use the site better.
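
    If you'd rather see what those two rules do to a URL before touching a live site, here's a rough Python sketch. It is not how Apache runs them, just the same pattern fed through Python's re module. It assumes the path arrives without a leading slash, which is how Apache hands paths to rules in a .htaccess file. The function name and the sample post path are made up for illustration.

```python
import re

# The date pattern from the RewriteRule lines above, with no leading slash:
# year / month / day / everything else.
OLD_POST = re.compile(r"^([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$")


def redirect_target(path, base, keep_date=True):
    """Return the URL an old dated post would be redirected to, or None."""
    match = OLD_POST.match(path)
    if match is None:
        return None  # not a dated post, so the rule leaves it alone
    year, month, day, rest = match.groups()
    if keep_date:
        # First rule: same date structure, moved under /blog/
        return f"{base}{year}/{month}/{day}/{rest}"
    # Second rule: drop the date and keep only the post name
    return f"{base}{rest}"


print(redirect_target("2010/06/15/my-post/", "https://ipstenu.org/blog/"))
print(redirect_target("2010/06/15/my-post/", "http://domain.com/blog/", keep_date=False))
print(redirect_target("about/", "https://ipstenu.org/blog/"))
```

    Anything that doesn't start with a four-digit year comes back as None, which is why pages like /about are left alone by the redirect.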

    Plugins

    There are a couple plugins that can help you here. Thinking in WordPress made a plugin for WPMU, but I’ve heard it doesn’t quite work yet.

    Andrea (and her husband Ron) have a Remove Blog plugin that can also help you.

    I’m sure there are others, but I mostly recommend Andrea and Ron’s. And not just because I like them.

    Hack

    If you’re dead-set on doing this, and you don’t want to use a plugin, there is a manual way to do this. Remember, by doing this you will BREAK the ability of WordPress to check for conflicts between your main site’s URLs and any new site. If you’re using SubDomain MultiSite, this should be fine. Hell, I’m doing it. If you’re using SubFolder, I really don’t recommend this at all.

    You ready?

    • Log in to your admin site.
    • Go to: Super Admin > Sites > Edit
    • Scroll down to “Permalink Structure” and remove the “/blog” part
    • Save

    That’s it.

  • Hotlinking

    There are two questions I need to spell out for folks before I get into the code bits, and I’ll try to keep this as light-tech as possible. If you run a website, or have a free site, or just want to post your pictures on the web, you need to know this. If you have your own domain, you need to know this. If you post pictures to a bulletin board, you need to know this. Basically, if you use the internet at all, read this. I’ll let you know when you need to stop reading.

    Things everyone should know
    Things every webmaster should know
    Things every ISP should know


    Things everyone should know

    Bandwidth

    Bandwidth means, for computer users, the data transfer rate, or how much data can be transferred in a given time period. The easiest example here is how you access the net. If you use a modem for dial up (and I feel for you), then you use 14.4, 28.8, 33.6 or 56 kilobits (kb) per second. To give you an idea how small a kb is, one letter (that is ‘a’ for example) is a byte, or eight bits, so a kilobit is roughly 125 letters. That’s not an exact science, but it’ll give you a rough idea. At 14.4, your email downloads at roughly 1,800 letters a second at best, and a single photo can be hundreds of thousands of letters big. Which is why dial-up sucks. In the world of computers, bigger bandwidth is better. The more bandwidth, the faster you can download the preview of the new Batman movie.

    In addition to speed, bandwidth also means how much data you can transfer in a given time period. This website has an allocation of 30 gigabytes of data per month, and we average about 5. My other website has the same allocation and averages 18. If I go over my data transfer for a given month, I can either pay out the nose for extra bandwidth, or I can let the site be shut down till the next month. The reason this is important to know is that if you run a website, every time a page loads, you use bandwidth. On a site like Yahoo! GeoCities, you get 3 GB/month. Yeah, you think that’s great, but it really sucks if you want to post things like a blog and people click here a lot. This aspect of bandwidth is the reason why most sites I design are low on the graphics. More graphics means more data transferred means more bandwidth used. In the case of data transfer allocation, a bigger site does not equal a better one, though bigger bandwidth is king.

    Then again, the bigger your site, the longer it takes to download, and the less time it takes for people on 56k to get pissed and tell you that you suck. Finding a web design that’s a balance between your dream design and speed is why people like me have jobs.

    In summation: Bandwidth controls how fast you can view the net from your home, as well as how much data a website can share with the world each month. Having more bandwidth is better all the time, but forcing users to use more bandwidth with image heavy sites and poorly coded web pages is not cool.
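
    Both meanings of bandwidth come down to simple arithmetic, so here's a minimal Python sketch of each. The modem speeds and the 30 GB allocation come from the text above. The 200,000-byte page is a number I made up for illustration, and real dial-up lines rarely hit their rated speed, so treat the times as best cases.

```python
def download_seconds(size_bytes, line_kbps):
    """Seconds to move size_bytes over a line rated in kilobits per second."""
    return (size_bytes * 8) / (line_kbps * 1000)


def page_loads_per_month(quota_gb, page_bytes):
    """How many full page loads fit in a monthly transfer allocation."""
    return (quota_gb * 1000**3) // page_bytes


PAGE_BYTES = 200_000  # hypothetical page weight, not a measured figure

# Speed: how long one page takes on each modem
for speed in (14.4, 28.8, 33.6, 56):
    print(f"{speed} kbps: {download_seconds(PAGE_BYTES, speed):.0f} seconds")

# Allocation: how many times that page can be served in a month
print(page_loads_per_month(30, PAGE_BYTES), "page loads fit in 30 GB")
```

    With those assumptions, one page takes about half a minute on a 56k modem, and 30 GB covers 150,000 loads of it. Halve the page weight and both numbers get twice as friendly.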

    Hotlinking

    Hotlinking is putting a link to someone else’s webpage’s graphic on your site. This is also called bandwidth theft. Directly linking to a website’s files (images, video, etc.) means that when someone accesses your website, they draw bandwidth from another. If you use an <IMG> tag to show a picture from someone else's page on your blog, forum post, or website, that's hotlinking. You're stealing their bandwidth. There is a case in which this sort of 'theft' is ethically permissible, though some webhosts don't like it. If you have multiple Yahoo! sites, and one is low on bandwidth, you can shuttle some of your content to the other site, and thus split up the bandwidth. This isn't always a good idea, because if it's against the Terms of Service on your host, they can kill you. Which is why you should always back up your websites on your own computer. If you own your own domains (like I do) and have multiple 'subdomains,' then it's okay to share an image. ipstenu.org is considered a different website than ipstenu.org/blog, so I have to tell my server it's okay to share between the two. But that's code geeky. What the common websurfer needs to know is this: direct linking to a picture, movie file, or any other content on someone else's site, unless it's a simple URL link to that site, is bad form, ethically asinine, and impolite. It’s akin to stealing electricity from your neighbor by plugging into their outlets.

    In summation: Hotlinking is stealing bandwidth from someone else’s website, and is considered to be unethical.

    Things every webmaster should know

    Now that you’ve gotten this far, we’re going into heavy geekitude. I have actually once had my site nearly shut down because someone was hotlinking to an image, and I had to figure out how to prevent it. This is the knowledge I share with you.

    Hotlink Prevention for Apache

    Apache is the de facto webserver for Unix. I don’t like IIS (Windows webserver) and so few people use Netscape’s webserver, I won’t even consider that anymore. Pretty much, I use Apache and if you don’t, I haven’t a clue how to help you.

    On Apache (and in theory this works on IIS, but as I said, I don’t use it), there is a file in the root of your html folder called .htaccess. This is an Apache directives file, or a config file, that controls how Apache handles the folders in the same folder as the .htaccess file. Your website has a folder, usually called public_html. Inside that folder you have things like a file named index.shtml and a folder named cgi-bin. Below is an example of what my webserver’s root public_html folder might look like.

        .htaccess    blog      index.shtml    images      cgi-bin   robots.txt
        folder1      folder2   foldern

    The .htaccess file controls how the subfolders (blog, cgi-bin, folder1, folder2, and foldern) are handled. If I look at my .htaccess file, and you can open it up in your text editor of choice, I see this at the very bottom:

        RewriteEngine on
        # Let requests with no referer through (typed URLs, some proxies)
        RewriteCond %{HTTP_REFERER} !^$
        # Let my own pages through, over http or https
        RewriteCond %{HTTP_REFERER} !^https?://ipstenu\.org      [NC]
        # Anyone else asking for an image gets a 403 Forbidden
        RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
    

    This means that I’m telling Apache to turn on the rewrite engine (‘RewriteEngine on’) and to only permit my webpage (the HTTP_REFERER) to access the images. The images I list are in the ‘RewriteRule.’ I could use a pattern like ‘jpe?g’ to cover both spellings, but I know what the file extensions are for the files on my server, and I cheat that way. If I wanted to be really mean, and didn’t worry so much about my bandwidth, I’d change the last line to RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ images/nohotlink.gif [L] so that when you try and link to /images/jojo.jpg, you’d get some witty image about how hotlinking is wrong.

    I actually do that on my other server, but the gif I use is 2k so it’s not something I worry about. It also makes it easy for me to later go back and see who’s been hitting that particular GIF and find the mean people. Yes, I have been known to send nasty notes to them.

    Keep in mind, as with any .htaccess rewrites, you may block some legitimate traffic (such as users behind proxies or privacy firewalls) using these techniques.
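
    If the rewrite syntax makes your eyes cross, the decision those rules make can be sketched in a few lines of Python. This is only a model of the logic, not how Apache implements it, and the function name and sample paths are invented for the example.

```python
import re

# Rough Python model of the hotlink rules above; not Apache's implementation.
IMAGE = re.compile(r".*\.(jpg|jpeg|gif|png|bmp)$", re.IGNORECASE)
OWN_SITE = re.compile(r"^https?://ipstenu\.org", re.IGNORECASE)


def is_blocked(path, referer):
    """True when the request would be refused with a 403 Forbidden."""
    if not IMAGE.match(path):
        return False  # the rule only looks at image files
    if referer == "":
        return False  # no referer sent: typed URL, or a tool that strips it
    if OWN_SITE.match(referer):
        return False  # my own pages are allowed to show my images
    return True       # any other site embedding the image is hotlinking


print(is_blocked("images/jojo.jpg", "http://example.com/forum/thread"))
print(is_blocked("images/jojo.jpg", "https://ipstenu.org/blog/"))
print(is_blocked("images/jojo.jpg", ""))
```

    The empty-referer case is the one to think about. Letting it through keeps direct visits working, but a proxy that rewrites the referer to something else instead of removing it will land in the blocked branch.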

    Now here’s the big problem. Not all ISPs let you use the Rewrite mod! Half the reason I switched to my current provider was hotlinking (the other was SQL). The rewrite mod (module, don’t you know?) “provides a rule-based rewriting engine to rewrite requested URLs on the fly.” It’s totally magic, and I secretly adore it. It’s complex as fuck, though, and I still don’t really get all that it does. I do know that it works.

    Things every ISP should know

    You’re a fucking bastard if you don’t let your users use mod_rewrite.

    Was that harsh? Sorry, I meant to say ‘You don’t give a rat’s ass about bandwidth if you don’t let your users use this.’ I’m well aware there are security ‘concerns’ about what mean people can do with it, but let’s face it, if someone’s smart enough to figure out everything you can do with mod_rewrite, then you’re in trouble anyway. There is a performance hit as every request is checked against the rewrite rules, so if you’re running an image intensive site, this can suck. But the trade-off between performance and bandwidth is, to me, minimal.

    Look, if a user has a website with images, and some dickhead out there is hotlinking to that user’s images, then you, the ISP, have to handle the bandwidth crisis, and the pissy user asking you why he can’t use this feature to stop the dickheads.

    And speaking of security, I can’t find any hack for it. So if the fear is ‘really smart, but really evil people utilizing my server for nefarious purposes,’ I think that should be pretty low on the list. I’d put ‘spammer’ and ‘virus distributor’ ahead of it.

    Hotlinking can act like a DDoS attack, and if there’s ever a way to prevent it, by G-d, do it! The mod takes five fucking minutes to install.

    SimpleNet, I’m looking at you.