Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: performance

  • Blocking Referrer Spam Server Wide Sucks

    A while back I talked about Referrer Spam in Google Adsense and I mentioned how you could block referrer spam with some .htaccess calls. That’s cool, but when you have 12 sites on a server, making this one more thing to manage per site is a pain in the ass. Well okay, what can we do constructively? And sadly the answer is “Not much.”

    First of all, forget the idea of using a robots.txt file. If they were real SEO crawlers, they would honor it. They don’t, and that’s how I know they’re evil.

    Secondly, this will only work if you have server-wide access. That should be obvious, but server-wide settings need server-wide access, and that’s just how it is. I say that it sucks because it can be a little complicated and messy to figure out where things go.

    If you have your own server, like I do, then you can make a custom VirtualHost template.

    Since I’m using Apache 2.4, I made local templates:

    $ cd /var/cpanel/templates/apache2_4/
    $ cp ssl_vhost.default ssl_vhost.local
    $ cp vhost.default vhost.local
    

    If you’re using 2.2 then the files are in /var/cpanel/templates/apache2_2/ instead. In each file, I added this to the top of the VirtualHost settings.

      RewriteEngine On
      RewriteOptions Inherit
    

    What that does is tell Apache to inherit rewrite rules from the main server. That means each virtual host (i.e. each website) will abide by any rewrite rules defined in the main server configuration.

    Where you put this in that file can be weird. I ended up looking for this section, and putting it right below:

    [% IF !vhost.hascgi -%]
      Options -ExecCGI -Includes
      RemoveHandler cgi-script .cgi .pl .plx .ppl .perl
    [% END -%]
    

    Put that in both files (because you use HTTPS, right?). Then you need to rebuild httpd.conf:

    /scripts/rebuildhttpdconf

    Since I’m using WHM, the next step is to go into the Apache Configuration section and open the Include Editor. Then you want to add your blocking directive to ‘Pre-Virtual Host Include’ for All versions. If you don’t use WHM, you’ll want to edit /usr/local/apache/conf/includes/pre_virtualhost_global.conf directly and bounce Apache after.
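
    If you’re going the manual route, the whole dance is roughly this (a sketch; the include path is the standard cPanel one and /scripts/restartsrv_httpd is cPanel’s restart script, so adjust if your setup differs):

    # Add your blocking rules to the global pre-VirtualHost include
    vi /usr/local/apache/conf/includes/pre_virtualhost_global.conf

    # Rebuild the generated httpd.conf, then bounce Apache
    /scripts/rebuildhttpdconf
    /scripts/restartsrv_httpd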

    As you can see, I have some content in there already.

    My pre_virtualhost_global.conf file

    I added this below:

    <IfModule mod_rewrite.c>
      RewriteEngine on
      RewriteCond %{HTTP_REFERER} spammerseocompany\.com [NC,OR]
      RewriteCond %{HTTP_REFERER} keywords-monitoring-your-success\.com [NC]
      RewriteRule .* - [F,L]
    </IfModule>
    

    Does it work? Yes. It blocks ‘spammerseocompany’ from all the domains on my server. I put in the other URL since that’s the one they’re currently using to spam the heck out of my stuff. There are other options with Apache 2.4 (the [F] flag above is what sends them the 403), so you should read up on using mod_rewrite to control access and pick the method you find most sustainable. For example, you could combine the conditions into a single line:

      RewriteCond %{HTTP_REFERER} (spammerseocompany|keywords-monitoring-your-success)\.com [NC]
    

    I find that a bit clunky.
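
    For comparison, here’s what a non-mod_rewrite version could look like. This is only a sketch, using the same two spam domains, and note that Require wants directory context, so wrap it in a <Location> or <Directory> block (or put it in a per-site .htaccess without the wrapper) rather than leaving it bare in the include file:

    <Location "/">
      SetEnvIfNoCase Referer "spammerseocompany\.com" spam_ref
      SetEnvIfNoCase Referer "keywords-monitoring-your-success\.com" spam_ref
      <RequireAll>
        # Let everyone in except requests flagged as referrer spam
        Require all granted
        Require not env spam_ref
      </RequireAll>
    </Location>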

    If you’re using nginx, you’ll want this I believe:

    if ($http_referer ~* "keywords-monitoring-your-success\.com|spammerseocompany\.com") {
        return 403;
    }
    

    A big note of caution here. If your list gets too long, you’ll end up slowing your server down. A lot. So keep it as simple as you can. I find that CSF does a dandy job of blocking most of my troublemakers, and I only need this for the unnamed spammerseocompany because they don’t abide by the common rules of robots.

    If, one day, they do, I will stop blocking them and allow their robots. As it stands, they’re idiots and need to go away.

  • He Ain’t Heavy, He’s My Website

    I get asked this a lot from a more technical perspective.

    All the myriad reasons to use, or not use, Multisite aside, the question that causes the most confusion is how to balance ease of support against cost. You see, a lot of the time people consider Multisite because they’re on a webhost that only allows one domain per account. You may be able to use add-on domains, but those may be limited too, and the easiest way to run a hundred sites on one hosting plan is Multisite.

    I have to remind everyone here, I would never, ever, in the history of ever, even remotely consider running Multisite on anything less than a VPS. Yes. I said it. Keep in mind that when I run a Multisite, I’m always doing it to run a network of heavy-duty sites. If I were just running a tiny private network, my goals would be different. But more than that, I keep in mind the limitations of shared hosting. Shared hosting is tiny. Multisite is big. Match ’em up and you’ll be happier.

    What does this have to do with the heft of a network? Well if I have 100 separate sites and 100 sites on a multisite network, what’s the real, practical difference?

    • 100 separate logins vs 1 login on 100 sites
    • 100 separate sites to update vs 1 site

    That’s pretty much it. Yes, there are a dozen little things (like it being easier to restrict access to a single site) that come into play here, but when you start looking at the server itself, the practical differences in things like disk space, memory usage, process utilization, and email are negligible as far as your site’s performance goes.

    Yeah, 100 separate sites and 100 sites on a network will run pretty much the same on the same server, assuming the exact same level of traffic and use of plugins. That’s a pretty big assumption most of the time but in this one case, it’s safe. We’re trying to compare apples and apples, with only one difference: Multisite or Not.

    Of course, there are specific situations where a multisite will cause more damage to a server than a single site, especially if you’re doing a lot of cross-content manipulation (like including the RSS from one site into the sidebar of another). But it all really depends on if you have a lot of traffic. Yes, one Multisite blog getting hammered will hurt the others on the network, but it shouldn’t cause a significant CPU spike any more than two separate really massive single sites would on the same server.

    Foggy photo warning of heavy fog

    And there is one place where Multisite very much would do more work on the server than a single site, and that’s ms-files.php. That old magic made your image URLs look like domain.com/files/2014/09/image.jpg. The way it worked was to pass image requests via .htaccess through the ms-files.php script, which then served the image. Yes, that caused more load. It’s part of why we don’t do it anymore, and why I suggest never trying it again.
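
    For the curious, the old rule looked more or less like this (from memory, so treat it as a sketch), and it meant every single image request spun up PHP:

    # Legacy ms-files.php handling: anything under /files/ goes through PHP
    RewriteRule ^files/(.+) wp-includes/ms-files.php?file=$1 [L]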

    It’s funny, though. I’ve seen one host say that Multisite will use less by way of PHP processes because it’s one install, while another said it would use more because ‘Multisite’, and a third said there’s no difference.

    Obviously I don’t think there’s a difference.

  • La Vitesse 2: Cruise Control

    Now you know all about caching and how it works. But what about speeding up the server itself?

    Near the end of the previous post, I mentioned that all the caching in the world didn’t really speed up the server. This is somewhat of a lie, since if, say, you’re using Varnish to cache your site, then most visitors won’t be hitting your WordPress install at all, which frees the server up to do other work. But it’s not the full picture.

    WordPress is big, and it’s getting bigger and more complex and more impressive. So is Drupal and … well, pretty much everything else. In order to make your site do more, like all those super fancy layout transformations, we have to upgrade and innovate. But then you start extending these apps, like using custom fields and extra meta values to store more information so you can change search results in more impressive ways! Your site scrolls and changes backgrounds! Your site dynamically changes what products are available based on checkboxes, without reloading!

    What did that have to do with caching? Well … how do you cache things that aren’t static?

    A Cruise Ship

    My coworker, Mike, likes to talk about things that should be cached and things that should never be cached. Things that have to be dynamic and run without a page refresh, like Ajax and JavaScript, can be cached to an extent, since caching plugins and Varnish will just keep that code inline, which means it’ll still run. But when you start looking at dynamic things like shopping carts, we hit a new world and a new wall. I’m not even talking about that level of caching, though. I’m talking about going back a layer, to the part where WordPress (or any app) has PHP query the database. If we speed that up by caching the safe content, can’t we speed the whole thing up? You bet we can!

    A few years ago I talked about APC and how I was using it to speed up PHP by having it cache things. Then less than a year later, I switched to Zend and memcached. I did those things because I decided that it would be better to have my server, a workhorse, do the hard work instead of asking WordPress to do it. And this was, in general, a pretty good idea.

    Memcached is an add-on for your server that acts as “an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.” In English? It remembers what you did and can retrieve the results faster because of it. Of course, you have to tell your apps to play well with it, but once you do, the back end of your site starts to speed up because, hey, caching! The Illustrated Memcache’d story is kind of an awesome explanation (the images on the archive page are broken, but the links work). And yes, I do use memcached and ZendOptimizer+ on my server, because it really does make things faster, even when two of the ten domains are having 10k-pageview days.

    I keep telling everyone my server isn’t overkill….
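
    If you’re wondering what “telling your apps to play well with it” looks like for WordPress, it’s usually an object-cache.php drop-in from a memcached plugin. Roughly this, as a sketch (assuming the Memcached Object Cache plugin and memcached listening on its default port):

    # Make sure memcached is actually answering on the default port
    echo stats | nc -w 1 127.0.0.1 11211 | head -n 3

    # Copy the plugin's drop-in into place so WordPress uses memcached
    # for its object cache (path assumes the Memcached Object Cache plugin)
    cp wp-content/plugins/memcached/object-cache.php wp-content/object-cache.php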

    The point of that, though, is that the other end of speed is totally separate from your application code. When you install WordPress, you know it runs SQL and PHP, so if you can make those faster, WordPress will be faster. The same applies to speeding up Apache, maybe by putting nginx in front of it, or maybe by tuning the hard-to-understand settings in httpd.conf to make sure it elegantly handles the 300 people who come running to your site at once.
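
    To give you an idea of the httpd.conf knob-twiddling I mean, the prefork MPM settings are the usual suspects. The numbers below are purely illustrative, not recommendations; they depend entirely on your memory and traffic:

    <IfModule mpm_prefork_module>
      # Idle children Apache keeps warm, and the ceiling on simultaneous
      # requests before visitors start queueing
      StartServers             5
      MinSpareServers          5
      MaxSpareServers         10
      MaxRequestWorkers      150
      MaxConnectionsPerChild 10000
    </IfModule>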

    But unlike plugins, this aspect of server speed is the hard stuff. When you look at WP Super Cache, you think it’s complicated until you see W3 Total Cache. Both are amazing and awesome, but they’re giving you a ton of options and asking you to make decisions. The same is true of the servers, but now you have even more options. The world is your oyster, and we haven’t even looked at hardware.

    For me, it boils down to how I can make my server do its job quickly and efficiently. If that means I hire someone to install software that caches better, or I pay for more memory, then that’s what I do. I also trim the fat and get rid of everything I’m not using and don’t need, so my server doesn’t have to do more than it needs to. And one day, one day, I’ll be looking at nginx for everything.

  • La Vitesse

    A little bit ago I talked about Varnish, how to install and configure it, and why I’m not using it at the moment. The actual goal of all this stuff is to speed up a website. Site speed is an insanely fickle beast, and measuring it without going insane is nigh impossible.

    When we talk about site speed, we don’t just mean how fast the site loads. We mean how well it performs on the front end and the back end. Does it load everything it needs to be a page in a non-jumpy way? Does it load and then magically change to another format because you’re on an iPad? Does it hang and then load? We mean all those aspects that go into a site and make it zippy.

    Which brings us to caching. The goal of caching is blindingly simple: serve up webpages faster without putting extra load on the server. The how of caching is crazy.

    When I talked about Multisite Caching, I brought up the different types and why and where I saw each one being used. But I didn’t really explain why very well. In order to understand it, you need to understand why we need to cache.

    If your website was all plain, static HTML files, it would be really fast. The web was (initially) built for that stuff, and it was all basic. “I want this page.” And the server would reply, “Okay, here it is, and some images.” When we start adding in stuff like dynamic PHP (WordPress), we put more load on the server, because every time someone visits your site, the server has to ask WordPress “Do you have this page?” and WordPress has to check, generate the output, and only then does the page load. The more complex the site is, the more big images and fancy JavaScript stuff, the slower the site gets.

    Logical stuff, right? You’re also going to be limited by how fast your server is and how much of it you can use. If you’re on a dedicated server, the limit is your hardware and bandwidth pipe. If you’re on shared, though, the limit is lower, and really varied and complicated. While I mention a ‘bandwidth pipe’ and we techs always joke about the sturdy internet tubes, it’s not a fully accurate analogy, and even with all the bandwidth available in the world, the speed of your server is going to limit you far more.

    People sledding

    There’s a phenomenon called the “noisy neighbor” that impacts people on shared hosts a lot, and it’s a big part of why people get confused about the bandwidth thing. You see, if you’re on shared servers, you share services. If one of your neighbors uses a lot of memory, there’s less available for you. This makes perfect sense, and hosts combat this by limiting how much you can do. I know a lot of companies say that you have ‘unlimited’ space and bandwidth, and while that’s true, it doesn’t mean you get to use all the power available to the server. Basically, on shared servers, when you see ‘unlimited’ you should read it as ‘unlimited until you start making other people’s sites run worse.’

    What does this have to do with caching? It’s the reason why we cache! WordPress does not make static HTML pages at all. If you look on your server for a file named ‘about’ you won’t find one. Instead, WordPress uses the .htaccess file to magically run your request for example.com/about/ through the index.php file, which then checks the database and pulls the content for that page. It’s entirely dynamic, and every single page request is run through the database. And yeah, that gets slow over time. The dynamism is fantastic though, and that’s why things like comments magically update the rendered page right away.
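
    Those are the standard WordPress rewrite rules, which for a single site look like this: anything that isn’t a real file or directory on disk gets handed to index.php.

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress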

    Thus, in order to make our super dynamic websites run by WordPress run faster, we turn to methods to generate static file caches. Converting a WordPress page from the PHP queries to a static file is complicated, and in essence every single tool has to generate that dynamic page, copy the output, and save it to a location where it can be pulled from directly. At the same time, it has to alter the server in some way to say “If I have a static file, use that instead.” When you use a plugin, generally it does this via your .htaccess file.
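
    The .htaccess half of that trick boils down to something like this. It’s a heavily simplified sketch in the spirit of WP Super Cache’s rules, not the real thing: if a pre-built HTML copy of the request exists and you’re not logged in, Apache serves that file and PHP never runs.

    <IfModule mod_rewrite.c>
      RewriteEngine On
      # Only plain GET requests with no query string and no login cookie
      RewriteCond %{REQUEST_METHOD} GET
      RewriteCond %{QUERY_STRING} ^$
      RewriteCond %{HTTP_COOKIE} !wordpress_logged_in [NC]
      # If a pre-generated copy exists on disk, serve it directly
      RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}%{REQUEST_URI}index.html -f
      RewriteRule ^(.*)$ /wp-content/cache/supercache/%{HTTP_HOST}%{REQUEST_URI}index.html [L]
    </IfModule>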

    The obvious problem with this is that while the page may be faster for visitors, you’re still putting load on your server by having it generate these HTML files and serve them. And you, the logged-in user, generally won’t get the cached page, not even with something as cool as Varnish, so we still have to consider the rest of the server.

    Speaking of Varnish … the simplest explanation I can give you is this: instead of having WordPress use a plugin to generate the page, Varnish lets WordPress load the page once, takes a snapshot of the result, and saves it somewhere else. That means that in between your visitor and the WordPress install sits the Varnish-cached page, which takes even more load off your server! WordPress doesn’t even have to serve the HTML page; Varnish does it, and does it a little faster. You’ll still want a plugin that lets WordPress tell Varnish to delete pages, but things can run significantly faster.
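
    That “tell Varnish to delete pages” bit is usually just the plugin firing an HTTP PURGE request at the URL that changed, something like this (a sketch; it only works if your Varnish VCL is set up to accept PURGE from your own server):

    # Ask Varnish to drop its cached copy of a single page
    curl -X PURGE http://example.com/about/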

    But … what about the server speed itself? Is there a way to cache that and speed it up too? There is! But that’s a longer post, all its own.

  • Nginx Proxy

    Will you all quit nagging me now? I kid. Very few of you actually got on my case about nginx.

    Nginx is an HTTP server and reverse proxy, which is a really fancy way of saying “it runs websites.” Most people still use Apache, and nginx is perceived as being faster at serving static files. My issues with it are twofold:

    1. I don’t just run WordPress here
    2. I need my .htaccess for those other things

    Yes, there are older webapps that don’t support nginx. But at the same time, what if I could have my cake and eat it too? What if I could use nginx to serve up the static files and Apache for the rest? That would mean installing nginx on my CentOS 6 box as a reverse proxy.

    It’s actually not that hard, but let me explain why this is a good idea. Apache takes up a lot of server memory; nginx does not. Nginx is awesome at static files but not the best at dynamic content, and you’ll need something like php-fpm for that. But… one of the sites on this server has a gallery that takes up 50% of the webspace of all accounts on the server. Stop and ponder that for a moment. While I do have server-side caching (memcached and ZendOptimizerPlus) running, adding nginx up front means my images would be served faster.
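
    To give you an idea of what a reverse-proxy setup looks like under the hood, here’s a hand-written sketch of a server block, not nginxCP’s actual output (the docroot and the port Apache listens on are assumptions):

    server {
        listen 80;
        server_name example.com;

        # nginx serves the static stuff straight off the disk
        location ~* \.(jpe?g|png|gif|ico|css|js)$ {
            root /home/example/public_html;
            expires 30d;
        }

        # Everything else gets handed back to Apache on another port
        location / {
            proxy_pass http://127.0.0.1:8081;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }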

    Speaking of faster, putting nginx in front means only proper HTTP requests get passed on to Apache, which helps protect against DDoS and other brute-force attacks before they ever reach it. I still have (and use) mod_security, of course, as well as ConfigServer Firewall. More on that later. Let’s get this sucker installed!

    Install Nginx

    yum install nginx
    

    Nginx Community (background image enhanced)

    Oh, sorry? Was that supposed to be harder? It’s not. This is pretty much the simple part.

    Install nginxCP

    Normally my next step would be to configure nginx (see Ben Tasker’s CentOS: Using NGinx to serve static files and Apache for dynamic for details) and Apache. Unlike Ben, I have WHM on my server, which means when I build Apache it’s with EasyApache, so any edits I make to my httpd.conf file get lost when I rebuild, which I do whenever I need to upgrade PHP. That’s not super rare, nor is it super common. Still, I don’t really want to mess with it more than I have to.

    So I took the time to research my options and came up with nginxCP and cpnginx. The real difference is that cpnginx comes with a support service (yes, it’s paid). Since I’m an open source woman, I went with nginxCP. The cost for cpnginx ($55 a year) wasn’t a deal breaker, since they have a trial.

    The install directions are quick:

    cd /usr/local/src
    wget http://nginxcp.com/latest/nginxadmin.tar 
    tar xf nginxadmin.tar
    cd publicnginx
    ./nginxinstaller install 
    

    A restart of the httpd service, and now all my non-SSL sites are served up by nginx!
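
    For the record, the restart itself is just the usual cPanel script (service httpd restart does the same thing):

    /scripts/restartsrv_httpd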

    Configure nginxCP and Apache

    I love when I can just slap it on and go, but I run ConfigServer Firewall to save my ass from DDoS, so I knew I’d be getting a slew of ‘Suspicious process’ alerts from my firewall once I added in a new feature. The fix is to add this to csf.pignore:

    # nginxCP
    exe:/usr/local/sbin/nginx
    

    That was all I needed! Emails from CSF, by the way, have plummeted. For a while I was worried I wasn’t getting any emails or logs from my server at all, so I re-checked my logs to see what was happening. See, I used to get a lot of emails like this:

    Failures: 5 (mod_security)
    Interval: 300 seconds
    Blocked:  Permanent Block
    

    They stopped, because nginx doesn’t let those requests get as far as Apache (and therefore CSF). Mind. Blown.

    Results?

    Some server stats, showing a 0.00 load average for 1 min

    Everything worked out of the gate, so I sat and watched my server load. As expected, since an httpd restart flushes my PageSpeed cache, the load spiked at .52 (still low). Then it dropped to .31, then to .20, and it pretty much stuck around the .10 area. Memory, meanwhile, dropped for a while. That’s good! As things settled in, I made sure to post something new on a news/fan site, and watched the load barely move.

    Nothing is weirder than watching your load drop to 0. The 93% memory is not a bad thing. On this server setup, I want to see somewhere around that at any point in time. The first time I saw it, though, I flipped out and my friend Benny (who is a cPanel goddess) calmed me down and explained how the ‘storm’ server worked. Real memory usage is about 40%, when load is above .40.

    Graph of server load, showing a big spike and then settling down

    The big spike was installing. The normal spikes at 14:00 and 14:30 were posts made in WordPress before nginx was installed. The ones at 16:00 and just after 17:00 are WP posts as well. That’s a nice change.

    Problems?

    SSL. Wouldn’t you know? You can use nginx as an SSL reverse proxy, but that would mean bypassing my cPanel add-on, which right now seems a little smarter for long-term sustainability than doing it 100% manually. The less I have to remember when I’m rebuilding Apache, after all, the better.

    Still, part of why I’m looking at nginx seriously is to speed up my SSL site, which by its nature gets less caching. Weighing the pros and cons, I decided to stick with nginxCP; my theory is that by speeding up everything else, the SSL site will in turn be faster, since less memory is being sucked up. Oh, and I did check whether cpnginx handles SSL, and they don’t support it either. That’s alright for now. I’m sure that will change.

  • Cloud Experiment

    While I’ve mentioned that I don’t ‘get’ CloudFlare, I took the time to buttonhole the guys at DreamCon and explain my issues. Many thanks to Maria Karaivanova for her presentation at DreamCon, too, which helped me a lot.

    Now, insofar as a ‘traditional’ CDN (where they host my images) goes, I don’t need it, but as a cloud proxy, I both understand and like it! The deal with a proxy is pretty simple: it’s an extra firewall between your server and your users. Why is this faster? Because they have more servers than I do, which means they can handle a DDoS better than pretty much anything on a shared server ever could.

    Keep in mind, a VPS can handle a lot of this on its own. I could install Varnish and use that for caching, but it wouldn’t give me the ability to have multiple servers serving my content, and that’s what I’m looking for with my experiment here.

    It would be remiss of me not to note the more well-known alternatives: Incapsula (starts at $19.99/month), Sucuri’s Cloud Proxy (starts at $9.99/month), MaxCDN (starts at $9.99/month), and Fastly (starts at $50/month). CloudFlare starts at ‘Free’, but its first paid offering is $20 a month for one website, plus $5 for each additional.

    On My Server

    I use ConfigServer Firewall (CSF), so I had to add the CloudFlare IPs to csf.allow and csf.ignore. Yes, both, otherwise I got weird alerts. This is pretty easy, though.
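
    Here’s a rough sketch of how you could script that, pulling CloudFlare’s published IPv4 list and feeding it to CSF; double-check the csf flags and paths on your own box, and don’t forget the IPv6 list if you use it:

    # Allow and ignore every published CloudFlare IPv4 range, then restart CSF
    for ip in $(curl -s https://www.cloudflare.com/ips-v4); do
        csf -a "$ip" cloudflare            # appends to csf.allow
        echo "$ip # cloudflare" >> /etc/csf/csf.ignore
    done
    csf -r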

    Next I installed mod_cloudflare, because I wanted to preserve visitors’ real IP addresses without having to muck with plugins on everything. This particular site is my ‘not all WordPress’ site, after all. The catch is that if I do it all manually, I have to redo it every time I upgrade via EasyApache (don’t judge me). I already have to do that for PageSpeed. That said, cPanel suggested I read Installing mod_cloudflare on cPanel, so I did that and then ran EasyApache:

    Screen Shot of EasyApache

    So that was easy! By the way, TL Tech is one of my standard resources. They have a lot of tricks, and I’ve bookmarked ’em.

    Finally, I checked whether there were issues with PageSpeed and CloudFlare. CloudFlare says no, but notes that some of the work is redundant. That’s okay. As an extra step, I told PageSpeed not to modify caching headers, since that’s something we did for DreamHost and Varnish (DreamPress!). Just add this to the PageSpeed section of your .htaccess:

    ModPagespeedModifyCachingHeaders off
    

    On CloudFlare

    This was straightforward. Follow their directions and it’s fine. I went for free and fiddled with my Security Settings a lot. I hate captchas. And I know, I knooooow, the users of this site will cry if they get hit by one, so I turned my security to “Essentially Off”, which is the only way to get rid of the captcha. Sad panda. I also turned “Browser integrity check” on for now.

    In Performance Settings, I set the Caching level to “Simplified” and left the rest at the defaults. Then I set up Page Rules for WordPress and my other apps. You only get three rules on the free plan (and 20 with the first paid level), so I made sure to use them to keep the admin tools uncached.

    On my webapps

    Last up: tackling cache purging. I’m far more familiar with this now, as I support a Varnish plugin that does much the same thing (and I did consider installing Varnish). The official CloudFlare plugin, for some reason, only serves the same purpose as mod_cloudflare, in that it restores IP addresses. But what I really want is a way to purge my cache when I publish a new post. Pretend I’m saying this in that voice I used at WCSF… there’s a plugin for that: CloudFlare Cache Purge.

    Sadly there isn’t a similar plugin/extension for my other apps. And this is why I ended up at my current conclusion…

    Current Conclusion

    Sadly, even after letting it bake for a few days I determined it wasn’t quite right for me. Everything worked, and if my site was more static, it would be perfect. But this brought up the same problem I’ve had with all caches: my dynamic content gets hurt.

    What is static that I can and should cache? JS, CSS, font files, images. What is not static? Blog posts. Comments that are happening all the time, fast and furious. A gallery that needs to update. A wiki that has a deadline. Worst of all, it prevented two of my apps from being able to build their own ‘static’ cache in the background. Now, really, that means I shouldn’t have to make my own static cache at all, but it brought up another issue. Coordinated pushes of content, where four separate apps each update 1-3 pages at the same time, mean I need to be able to purge those pages right away. And right now, there aren’t extensions to do that.

    Of note: I noticed the exact same problem with Fastly and Varnish, so it’s not just CloudFlare, it’s a function of how these things are supposed to work.

    What would I need to make these desirable? Basically, I need a way to purge my cache on the proxy efficiently, quickly, and selectively. Now that I work on the Varnish cache at DreamHost, I’ve seen how deep this rabbit hole goes, and I know full well how hard it is. Proxy caching is not for everyone. When you have dynamic content that changes for logged-in users on the fly, it’s a pain. I mean, I use PageSpeed to compress and cache CSS and JS, and I have to flush it when I update my site design. Caching your caching is always going to be tricky, unless there’s a simple, one-click way to say “I’ve updated these pages, please purge them.”

    We’re not there yet.

    Recommendation

    CloudFlare is pretty awesome, actually. If you’re ‘just’ running a blog on shared hosting, I would seriously consider using it, especially in light of the various DDoS attacks out there. A cloud proxy will help you if you don’t have server-level access to tweak mod_security. The fact that CloudFlare gives you a free option to test with, without having to give anyone your credit card info, makes it great for experimentation and puts it above the other proxies right now.

    But as with all things, keep in mind your personal usage. It’s not just “Does this make my site run faster?” but also “Does this make my usage of my site better?” For me, they win on the first and fail on the second. Maybe one day I’ll change my workflow so that a cloud proxy, or Varnish, can be the answer, but that’s not today.