Half-Elf on Tech

Thoughts From a Professional Lesbian

Category: How It Works

  • I Don’t Understand CloudFlare

    If you know the answer to all this, I’d love to hear it, because I can’t figure this out. What’s the real point of CloudFlare?

    Fairly recently I was reading Tony Perez’s post about CloudFlare vs Incapsula vs ModSecurity. As regular readers may know, I am frenemies with Mod_Security. I often want to kill it with fire, but I never disable it entirely because it protects my site from hackers. By using Mod_Security I limit my chances of having Bobby Tables kill my site.

    Using Mod_Security gives you some protection not only from simple SQL injections but also from XSS attacks. You can integrate it with things like Project Honeypot. As they put it:

    ModSecurity™ is an open source, free web application firewall (WAF) Apache module. With over 70% of all attacks now carried out over the web application level, organizations need all the help they can get in making their systems secure. WAFs are deployed to establish an external security layer that increases security, detects and prevents attacks before they reach web applications. It provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring and real-time analysis with little or no changes to existing infrastructure.

    And you know what? It really does all that.

    So what’s CloudFlare? It’s an intermediary between your site and the world which caches your site, compresses data, and gives people the fastest version of your site. In the event your site is down, they’ll serve cached versions. They even give you a pretty picture.

    [Image: CloudFlare]

    The first time I heard about this, I arched my eyebrows in surprise and confusion. I’m going to make my site faster by putting more layers between the reader and my content? That means instead of just relying on my server and host to be fast, serve compressed pages, keep the lights on, keep a speedy connection to the Intertubes, and do all the things that need to happen for the magic pipe between my website and you guys to work, I’m doing all that and trusting someone new to help me do it better. Interesting, Captain. How are they doing this?

    CloudFlare has a few tricks to do this: CDN (content loads faster if it’s stored local to the people visiting the site), content optimization (minimizes and compresses page content), security (protecting you from DDoS and SQL injection), and analytics.

    Except when I look at that list I think that I already use mod_pagespeed to minimize and compress my content, mod_security to protect me (also Config Server Firewall for the DDoS stuff), and analytics is done by my server or Google. For me, that means the only thing they’re offering that I don’t have is a CDN. I read up on CloudFlare’s CDN, and they tout not having the weight of 15 years of legacy crap. That’s a tricky edge to dance on, since they also don’t have the experience of those 15 years, or the network. In fact, looking at their network map, they have nothing in South America. Guess what the number two location is for people visiting my sites? Brazil.

    And this, my children, is why you study your stats to understand who is visiting your site, where from, why, and with what browsers. Right away I can see that CloudFlare, while interesting, doesn’t seem to have any benefit for me. If I decide that I want a CDN, it’ll probably cost me around $30 more a month, minimum, for my sites and what they have on them today. Oh but wait, you say, CloudFlare is free?

    Yeaaaah. I don’t trust free services very much. A free app, once I download it and put it on my server, I keep. A free service is hosted on someone else’s server, at their whimsy, and is supported as they see fit. Yes, this means I don’t trust Facebook or Twitter. A free service is interesting only in that it lets me try it before I buy it, and for that, I approve of how CloudFlare does it. But the problem is today I went to a website and saw this:

    [Image: CloudFlare DDoS protection screen]

    What did I do? I didn’t visit this website. They can brag about the whole 30ms response time all they want, but if I go to a website and hit a barrier like that, I stop, because it’s getting in the way of my surfing. That was my initial quandary about CloudFlare after all. How can it provide all these awesome things without getting in the way? And it can’t for everyone. At first I thought it was because I was going through bit.ly and it worried I was a spammer (okay, fair enough), so I tried manually, and it was the same problem. I just went to the page normally now, and it’s been well more than “5 seconds” and the site still hasn’t loaded.

    I fundamentally dislike anything that causes my users to do ‘more’ to get to my content. I think that it’s more harmful than a slow site, and it’s more harmful than letting these bad eggs visit my site. The right place to block a naughty person is when they’re doing something naughty. If my IP is in a range of DDoS attackers, that’s one thing. You shouldn’t be detecting as the page loads, delaying me almost 30 seconds, and then loading the page. This delay is supposedly for my protection (me the site runner, not the visitor). Okay then, what are they protecting me from?

    Part of CloudFlare’s service is something called a Web Application Firewall (WAF), which is fancy-speak for saying their computer looks at what people are coming to your site to do, what data they’re sending, and tries to figure out if they’re nice visitors (which it should let through) or naughty hackers (which it should block). (From WP Shine Cloudflare: Early Reports Question Effectiveness as Website Security Tool)

    WAF came up before, with Mod_security. And at this moment, I go to a picture. Here’s what Tony parsed from the data:

    [Image: screenshot of the comparison data, Screen Shot 2013-03-20 at 10.10.03 AM]

    He asked on Google+ what we took from that article, and my reply was “That the months I spent mastering mod_security was totally worth it.” If you don’t trust Tony’s numbers, you can read the full report on slideshare for yourself. Tony has the same feelings about Captcha as I do, by the way, though less strongly. I despise it more than I hate hotlinkers, and I hate hotlinking. Captchas are the worst barrier between content and consumer that was ever invented. They don’t work, they’re not accessibility friendly, and they are rarely implemented well. Hotlinking may be theft, but Captchas are shouting “No soup for you!”

    Which brings me to my point.

    What is CloudFlare doing? In plain English, can someone explain to me how it would benefit me? Ignoring the CDN aspect, the only WAF part I can see benefiting me is that CloudFlare (and Incapsula for that matter) essentially crowdsource the list of people who are ‘bad’ and shouldn’t access my site. Which is cool, and that I certainly like. It’s sort of like a Project Honeypot for baddies (and by the way, that would be a nice feature). Having the world bring in the list of bad people, as well as their patterns, and sharing that back out is a great way to keep everyone up to date quickly and seamlessly.

    I really just can’t see why I’d ever want to use CloudFlare. It would certainly be a cheap and easy way to put some possible gain on my site, but in the long run I feel that managing these things myself (or hiring someone to do it) would be a better business solution. It saves me from the dread blackbox spam killer, which means I always know what’s going on. Now I know not everyone is capable of handling all this themselves, but from what I’ve seen, most webhosts already have mod_security running. So let’s drop the WAF argument from the table, and we come down to the best thing CloudFlare’s doing is acting as a CDN and compressing content. That’s not good enough for me. At that point, you may as well use Google’s PageSpeed Service.

    I’m sure there are great reasons for using CloudFlare, but I just can’t see it.

    Quick ETA… Talking to a coworker, it occurred to us that I may just not be their audience. I’m too big already and I took care of most of what they do. I can look at this and think “If I just have a small site and I want to speed it up on a shared server where I have no root nothings” then it looks way more reasonable. But I’m not.

  • Version Control

    I’ve noticed a lot of people don’t really ‘get’ what version control is, or how it works. This is most evident when you watch the WP plugin repository.

    Around 400 commits are made daily to the repository, which doesn’t seem like a lot until you compare it to core trac commits. Now WordPress Core never commits until they’re sure the version in trunk is functional (since many of us use trunk on live sites), something most devs don’t have to worry about unless they use trunk as ‘live’… Let’s not talk about that.

    Instead of going into code today, I’m going to explain a little theory and talk about what version control is, why you want to use it, and how to use it.

    If you’ve ever saved a document, had MS Word crash, and come back to be asked what version of the document you want to restore, then you actually already have an idea of what version control is. You’ve possibly just never had to do it yourself. Version control is a very simple concept: saving versions of a file so you can go back to the older ones. Sounds easy right? Yes and no. Most people are accustomed to a different way of saving.

    Traditionally, we’re taught about saving as something you do regularly and often. After all, we’ve all lost that crucial document by forgetting to save. Apple has been changing this on us, having a much more robust auto-save process, and a better restore. It coaxes you into a place where you don’t think about saving anymore. There’s no save button on my iPad, and I actually use it to write a lot of the time. That freaks me out still, but so far it’s worked really well.

    In contrast, I have my coding. I sit and write, save a lot, test a lot, and when I’ve got a change that works, I commit my code. You see, version control is like a second type of saving. One save is me working, and one save is me ready for someone else to test. But the best thing about version control is how it saves changes. You check in code, and SVN (or Git) checks the changes and records what changed in a way that is easy to read. Here’s an example picked at random from the hundreds I skimmed today:

    [Image: an example of a diff. Caption: KatePhiz’s checkin of changed code]

    See how beautiful that is? Right away you can see what changed between versions! You, and anyone else who wants to troll a revision log, can see what you changed, which makes it easier for people to write a change log for you (if you happen to work that way).
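
    To see the same idea without SVN or Git in the way, here is a minimal sketch using the plain diff tool. The function name show_changes and the two throwaway readme files are made up for illustration; they are not from any real plugin. A version control tool records something very much like this output every time you check in.

```shell
# Print a unified diff between two versions of a file.
# diff exits with status 1 when the files differ, which is expected here,
# so the function always returns success.
show_changes() {
    diff -u "$1" "$2" || true
}

# Build two throwaway versions of a hypothetical readme to compare.
workdir=$(mktemp -d)
printf 'Plugin Name: Example\nVersion: 1.0\n' > "$workdir/readme-old.txt"
printf 'Plugin Name: Example\nVersion: 1.1\n' > "$workdir/readme-new.txt"

show_changes "$workdir/readme-old.txt" "$workdir/readme-new.txt"
```

    Lines starting with a minus were removed and lines starting with a plus were added. Everything else is context, which is why nobody needs the whole file replaced to see what happened.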

    Where I see the most people making mistakes with version control is missing out on the two rules you need to adhere to if you want to use it effectively.

    1. Never Delete Files
    2. Trust The Tool

    That’s it. That’s all you have to do. I see a lot of people correct files by deleting them and uploading new versions. This is flat out doing it wrong, but I can see why people think it’s what they should do. You have to trust your tool, and know that it will see the new file, tell the differences and record that. If you delete a file, it stops you from being able to compare it to earlier versions and that’s a problem. Bug tracking becomes much harder, as you no longer have a quick and easy way to see what changed. Sure you can compare two files, but you’ve now put up a barrier for the next guy who picks up your code. What if she doesn’t have that previous version? What if she has no way to get to it, and you were hit by a bus and are in a coma? Or not. But still, building barriers between yourself and the next person is lacking foresight. Worse, you build barriers for yourself.

    It’s hard to trust the tool, and just as it took me months to get used to the iPad’s way of saving files, you have to get used to how version control saves things. This isn’t FTP where you’re always replacing a whole file, you’re making incremental changes. A check in is not a delete and replace (unless you’re using some of the horrific tools I’ve had the dubious pleasure of using in the past). Getting past that one hurdle will make your life much better with any tool you use.

    As for never deleting, I come to this from working at a bank. We never deleted anything. Ever. Not once. You always kept all versions of your code, even the broken stuff, because legally you had to. But in the end, I feel that was a great practice to get into. Disk space is cheap and you won’t ‘run out’ of space any time soon. I keep a copy of every plugin’s latest version on my hard drive, and I’m not out of space. (I don’t have SVN checkouts, that would be a little much.) Also there’s no point to deleting your older code, since it’ll just show up in SVN anyway. It remembers everything you checked in. Ever.

    Now, I cannot give you advice on the best way to tag or branch code, as each person comes up with their own methodology. I will say this: Don’t use ‘trunk’ as your stable releases on SVN. Git’s a little different, but I wouldn’t use it there either. Instead, use trunk as your playground. When you have a functional change, commit that minor change. When you have a major change and it’s ready to go, tag it and release it. In that way, your beta testers can safely use trunk, and your normal users never get hurt. By the way, making frequent commits may feel like a smart move, but you don’t need to. Check in a working copy. Oh and don’t go back and edit your tagged versions, unless it’s a very minor change. Someone is sure to have downloaded it, just tag a new version and go forward. As I always explained to people at the bank “You don’t go backwards with code, you go forward.”

    I don’t expect this to be a perfect primer on how to use version control, nor was it meant to be, but I hope I demystified it a little. If you have great primers for people on SVN or Git, aimed at first time users, please share them!

  • Dumping ms-files

    NOTE: You do not, under any circumstances, have to do this to continue using WordPress Multisite. I just wanted to see if I could.

    I have been toying around with this. Since WP 3.5 doesn’t use ms-files anymore, I wanted to see how much of a pain this would be. While I was at the DMV (I now have a California Drivers License), I started sketching out the methods and possibilities for how one might do this as safely as possible, without bolluxing up your site. I got about 90% successful with it, so this is something I would never suggest you do on a massive, live site. If you compare it to what I did to move images from uploads to blogs.dir for the main site, it’s far more complex and annoying.

    Do you have to do this? Again, no! Everything will work just fine. Can you do it? Sure. Why did I? Support. I already know all the mishegas with blogs.dir, but the new location, and its lack of ms-files, promises other, weird, errors. I want to repeat, you don’t need to do anything. The Core Team was fantastic, and the work they did to turn off ms-files for fresh 3.5 and up installs is nothing short of phenomenal. This was just to assuage my curious nature, and learn more about the way things work.

    You ready? Here we go.

    I decided to move the images at http://test.ipstenu.org/ to start with, so everything will use that as my example. This is after I played with it on a local site and got my steps mostly solid. I knew that live is always different than test, so I backed up the DB first (always, always, backup first!) and went to town.

    Move the images

    This is obvious. Move your images from blogs.dir/SITENUM/files/ to /uploads/sites/SITENUM/ (or make an alias). I went and did it via command line in the /uploads/sites/ folder, doing this:

    $ cp -r ~/public_html/wp-content/blogs.dir/10/files .
    $ mv files 10
    

    Lather, rinse, repeat. I could have scripted it, but I was working out the kinks until I had two left.
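
    For anyone who would rather script the repetition, a rough sketch follows. It assumes the standard layout described above (blogs.dir/SITENUM/files under wp-content); the function name copy_ms_files is made up for this example. It copies rather than moves, so the old folders stay put until you’re sure everything works.

```shell
# Copy every blogs.dir/SITENUM/files folder to uploads/sites/SITENUM.
# Usage: copy_ms_files /path/to/wp-content
copy_ms_files() {
    content_dir=$1
    mkdir -p "$content_dir/uploads/sites"
    for src in "$content_dir"/blogs.dir/*/files; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$src" ] || continue
        site_num=$(basename "$(dirname "$src")")
        dest="$content_dir/uploads/sites/$site_num"
        # Leave already-copied sites alone so a re-run cannot nest folders.
        if [ -e "$dest" ]; then
            echo "skipping site $site_num: $dest already exists"
            continue
        fi
        # Copy rather than move, so blogs.dir stays put as a fallback.
        cp -r "$src" "$dest"
        echo "copied site $site_num"
    done
}

# Example call, commented out so pasting this block changes nothing:
# copy_ms_files ~/public_html/wp-content
```

    Try it on a copy of the site first. As with everything else here, back up before you run it.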

    Edit all sites

    The fields to change are Upload Path, Upload URL Path and Fileupload Url. You can blank them all out. (Corrected thanks to Nacin.)

    [Image: Blank me out]

    Since you’re blanking it out for everyone you can probably do this via SQL, but since I was doing the sites one at a time, I did them one at a time.

    Fix the Database

    Search/replace each posts table for each site, changing /files/ to /wp-content/uploads/sites/SITENUM/.

    UPDATE wp_10_posts SET post_content = REPLACE (
    post_content,
    '="http://test.ipstenu.org/files/',
    '="http://test.ipstenu.org/wp-content/uploads/sites/10/');
    

    Why did I do it that way? Because of this blog. I talk a lot about code here, and I know I’ve talked about moving files around before. If you don’t do that, you’re okay with a less precise search, but why bother? This works, it’s safe, and I’d use it again.

    That got annoying really fast. I went and grabbed my favorite Search And Replace for WordPress (and any other database) tool. Seriously I love that. I used that to follow up, change everything, and it actually worked really well for me.
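
    If you would rather stay in SQL, the per-site statement above only differs in the site number and the site URL, so a small shell function can print one for each site. This is only a sketch: the function name posts_update_sql is made up, it assumes the default wp_ table prefix, and it does not cover the main site, whose posts table is plain wp_posts.

```shell
# Print the search/replace UPDATE for one site number.
# Usage: posts_update_sql SITENUM SITE_URL
posts_update_sql() {
    site_num=$1
    site_url=$2
    printf "UPDATE wp_%s_posts SET post_content = REPLACE (\n" "$site_num"
    printf "post_content,\n"
    printf "'=\"%s/files/',\n" "$site_url"
    printf "'=\"%s/wp-content/uploads/sites/%s/');\n" "$site_url" "$site_num"
}

# Example: the site from this post. Each subsite has its own URL,
# so the number and the URL are passed together.
posts_update_sql 10 http://test.ipstenu.org
```

    The output is just text. Read it over before pasting it into your SQL client.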

    Another DB Fix!

    One of the changes in 3.5 was turning off rewriting. This took me forever and a day to find. After I did that, my images showed up fine, but the little buggers kept uploading to /files! Turns out it was all because of the site option ms_files_rewriting.

    The way I got around this was by putting the following in my wp-config.php file:

    define( 'UPLOADBLOGSDIR', 'wp-content/uploads/sites' );

    And then I ran this in SQL to turn off ms_files_rewriting. Or so I thought. More in a second.

    INSERT INTO `my_database`.`wp_sitemeta` (`meta_id`, `site_id`, `meta_key`, `meta_value`) VALUES (NULL, '1', 'ms_files_rewriting', '0');
    

    I came up with that after reading through /wp-includes/functions.php line 1515.

    For most sites, this worked, but in my later work, I determined that it actually wasn’t working. It was ignoring this. I don’t know why, but every test I did merrily ignored this setting, so I finally growled and wrote this mu-plugin function:

    function stupid_ms_files_rewriting() {
            $url = '/wp-content/uploads/sites/' . get_current_blog_id();
            define( 'BLOGUPLOADDIR', $url );
    }
    add_action('init','stupid_ms_files_rewriting');
    

    It’s stupid simple, it’s probably not a good idea, but it works for the three sites that have the stupids.

    Finish up .htaccess.

    In .htaccess, remove the ms-files.php line, or comment it out. This is pretty simple.

    Why not move the main site to /uploads/?

    Because of the way I fixed the uploadblogsdir. It defaulted everyone to /sites/ and after an hour I said fuck it.

    Any weird problems?

    Yeah, two sites (this one and my grandmother’s) decided that they wanted to be repetitious and spat out URLs like this: wp-content/uploads/sites/8/sites/8/

    Since that wasn’t right at all, and I was a little too eggnoggy to parse why, I did this:

    RewriteCond %{HTTP_HOST} ^taffys\.org
    RewriteRule ^wp-content/uploads/sites/8/sites/8/(.*) /wp-content/uploads/sites/8/$1 [L,R=301]
    

    I swear I have no idea why three sites got stuck with /files/ and two more decided to double down, Vegas style, but frankly I’m pleased I got through this far on my own.

    I can’t stress enough that you do not have to do this!

  • Subdomain vs Domain

    When two words are very similar, it’s easy to get confused. Which witch is which? Whether, weather, and wether. Affect vs effect… Okay, you know, English sucks. We have way too many words that will drive you to drink, and if you know anyone who’s learned English as a second language, please take time to tell them how amazing they are. My father’s wife is Japanese, bilingual in French, and is learning English. I know a smattering of French. Our conversations are fantastically amusing and thankfully we have great senses of humor.

    Because I’m that familiar with the crazy of my native language, I’m not surprised that people get subdomains and domains confused. Here’s the basic statement:

    A subdomain is not the same as a domain.

    That’s it. But since I don’t expect everyone to know what the heck I just said, I’m going to explain. Remember, don’t think you’re stupid for not knowing this! You can’t magically know everything, you have to learn it, and there are people like me who want to help you. Where the confusion kicks in isn’t that we call it a ‘subdomain’ but that the official definition is “a subdomain is a domain that is part of a larger domain.” So we’ve just said a domain of a domain, and yet here I am pushing you and saying that a subdomain isn’t a domain when it clearly is.

    It is and it isn’t.

    • A domain is pretty simple: elftest.net is a domain. It’s the solid basis that all websites are built on.
    • A subdomain is a subset of the domain: tools.elftest.net is a subdomain on elftest.net.

    Notice how ‘tools.’ is in front of elftest.net? That extra period between tools and elftest is how we know this is a subdomain. The .net part is called the ‘Top Level Domain’ and any time you see www, that isn’t a regular subdomain, but a special case… You know what, let’s break this down with a picture.

    [Image: Domain Example]

    You can ignore protocol for now (we can get into that another time). What we’re looking at is this:

    URL: http://www.example.com/index.html
    Top-level domain name: com
    Second-level domain name: example.com
    Host name: www.example.com OR example.com

    Why is www special? It has to do with a lot of boring history, but suffice to say that used to be how we knew it was a webpage! Now we use http:// to say ‘This will be a webpage’ so many of us (myself included) feel that www is unnecessary and just makes URLs longer. However because of history, http://www.elftest.net and http://elftest.net will forever point you to the same place. This actually means that www is a subdomain, but it’s a very special one that points to the same place as no www at all. In very rare cases, a fancy website will redirect www and non-www to different places, but this is the exception, not the rule. Good SEO practices are to have the www and non-www point to the exact same place.

    The meat of the matter is that most of the time, when someone asks ‘What’s your domain?’ they really mean ‘What’s your host name?’ My host name is elftest.net (or ipstenu.org or halfelf.org… I have a lot of domains).

    Now let’s look at a subdomain.

    [Image: Subdomain Example]

    They look shockingly similar, except that instead of www in front, I have sub. So what’s the deal here? Well because I’m using something other than www, I’ve designated sub.example.com as a subset of example.com, and thus a subdomain. Yes, it’s backwards. Sub should be below or behind, but remember, we’re calling .com the top level domain, so right-to-left this makes more sense.
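
    The right-to-left reading can be sketched in a few lines of shell. The function name split_host is made up, and the logic is deliberately naive: it assumes the host name has at least one dot and a one-label TLD such as com or net, so it will get something like example.co.uk wrong.

```shell
# Split a host name into its parts, reading right to left.
# Naive on purpose: it assumes a one-label TLD such as com or net,
# so it gets hosts like example.co.uk wrong.
split_host() {
    host=$1
    tld=${host##*.}            # everything after the last dot
    rest=${host%.*}            # everything before the last dot
    name=${rest##*.}           # the label just left of the TLD
    domain="$name.$tld"
    if [ "$rest" = "$name" ]; then
        sub=""                 # nothing sits in front of the domain
    else
        sub=${rest%.*}         # whatever sits in front of the domain
    fi
    echo "tld=$tld domain=$domain sub=$sub"
}

split_host example.com
split_host sub.example.com
split_host www.example.com
```

    Note that www comes out as the sub part, the same as any other label in front of the domain. The shell has no idea it’s special; that part is pure history.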

    I know. It’s all clear as mud. Even writing this I sat there and muttered “This stuff is nuts.” I know all this didn’t explain everything as clearly as I could wish, but I’ll break it down into the simplest terms that, while not 100% technically accurate, will tell every decent web tech what you mean:

    When someone asks “What’s the subdomain?” you answer ‘sub.example.com’

    If someone asks “What’s the domain?” you say ‘example.com’ (sometimes they’ll ask “What’s the main domain?”)

    If you’re on Multisite and someone asks “What’s the mapped domain, and what subdomain does it point to?” you say ‘mappeddomain.com and it points to mapped.example.com’

    And never ever use domain mapping plugins for your subdomains. Those are for grownup domains only, not your subdomains.

    For extra credit: Third level domains are what you get when you see things like example.co.uk – example isn’t a subdomain here, it’s the main domain. co.uk acts as the TLD (strictly speaking, uk is the top level and co sits at the second level, which is what pushes example down to the third). Why third? Well, we’d already used sub and second, and we needed some way to say that this is part of the primary URL, and not a subset. Also geeks love to confuse people.

  • Cloudy With a Chance of Upgrades

    We’re all being seduced by the cloud. Amazon’s AWS has become so popular and so, seemingly, inexpensive, that people are looking at it to run a website, instead of traditional hosting. Understanding the cloud actually was the first post on this site, and while two years ago I struggled to comprehend it, today I find myself at a loss many times when asked ‘Do I need the Cloud?’

    I don’t need the cloud for webhosting, and you probably don’t either.

    Yes, I have a semi-cloudlike host on LiquidWeb right now, for seven of the ten domains I manage (the other three are all on their own hosts, one of which is Elf Dreams, where I talk about DreamHost stuff). I don’t plan on moving those seven simply because it’s a massive effort. It’s easier to move yourself across country than it would be to move all my sites, re-build the server as I need it, and get the code for the new OS (CentOS vs whatever I move to). Heck, I’m dreading upgrading to CentOS6 because of how annoying that is.

    The point is that I do know and understand what the cloud is and does. It’s very cool, if you’re big enough to need it. Most indie sites are not.

    How the cloud works, in general, is based on the shared resources principle. Everyone shares all the resources in the cloud at all times. If you think back to shared hosting, everyone shares all the resources on that server. The difference between the cloud and the shared is that the cloud has infinite expandability (kind of, but you get the idea), whereas a shared host is limited to what it is physically. But when we were back on shared hosting, we used to have a problem with bad neighbors. You know, the other guy on your server who got tweeted by Felicia Day or Wil Wheaton, and suddenly all the sites on your server went down like a bad quiche.

    That can happen on the cloud too.

    It’s not exactly the same, but when you’re on a cloud, you’re on a server with a lot of virtual machines (VM). A VM is a “completely isolated guest operating system installation within a normal host operating system” which is a confusing concept. The reason they’re good is that a VM isolates your hardware in a funky way, making reboots faster, while letting your VM instances share a bunch of hardware and become faster. The flip side to this is that you’re still on a real server. Instead of everyone sharing the same CPU/RAM, you get partitions so I can only use X amount and you can only use Y, and thus we don’t kill each other, until we start slaughtering input/output.

    Disk I/O (input/output) can be best explained if you’re an older computer user. Remember when we used to play “Where in the World is Carmen Sandiego?” and we’d press a button to ‘travel’ and that old floppy drive would grind like a cheap espresso brewer? That’s I/O. The disk is being read and the data is being input from the disk and output to your Apple IIe. The basic concept of this still exists, and it makes sense when you remember we have to read the data off somewhere. So if you happen to be on a box that is getting a lot of traffic, and using a lot of disk I/O, then you’re going to be slow.

    There are ways to mitigate this, of course, but that isn’t the problem. The problem is that you don’t always know that you need to, or how you should. And even if you do know, it’s not always very easy to do it, so the little guy, who doesn’t have the resources to do it themselves, or the money to hire someone, is left hoping that they’re okay, and often ends up paying more than they need to.

    And this is why you and I don’t need the cloud yet. The cloud is something I might need, but for now, there are no benefits I don’t already have with a well optimized server and a well built site. If I was a bigger site, or a company, I’d be looking at it as my next upgrade, and studying my past growth. It took me almost 10 years to grow to need a VPS, and it will likely take me at least 5 more before I need to seriously consider cloud. By then, something new will be the big thing, so the best thing I can do is study nginx and keep a finger on the pulse of web technology.

    Of course, the one major advantage to the cloud, and the reason I still see it as being around for a while, is the ability to scale up and down. A webhost provider can cannily utilize that to provide scalable shared hosting, so if you have a bad neighbor, they can be scaled up with less impact to you. But so far as I know, we’re only there with VPS demi-cloud providers right now. Give it time, and the cloud as we know it today will be tomorrow’s low-end hosting.

  • Subdomains Back Where They Belong

    In my last post, I talked about how I did something dumb with my subdomains.

    How did I install subdomains?

    Stupidly. Or rather, stupid for DreamHost. See, many other hosts, when you make a subdomain, come up with this structure for your files:

    /home/ipstenu/public_html/
                             /subdomain1
                             /subdomain2
                             /addon.com
    

    So when I made my subs, I made them similar to that. What DreamHost does is this:

    /home/elftest/elftest.net/
                 /db.elftest.net/
                 /addon.com
    

    I suggest you do the default! If you didn’t, however, there is a way to fix it. It’s a two step process.

    1) Move the files.

    I found it easier to do this in Unix: $ mv elftest.net/trunk trunk.elftest.net (this moves and renames all in one). If you wanted to do it via FTP, just drag and drop, then rename.

    2) Change the location in Panel.

    Go into your panel, edit the domain, and change elftest.net/trunk:

    [Image: before: elftest.net/trunk]

    To trunk.elftest.net:

    [Image: after: trunk.elftest.net]

    Give it 5 to 10 minutes, and you’re done!

    tl;dr to the whole thing is this: Trust the tool! (Sharon, that was for you!)