Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: apache

  • CLI Like a Webhost

    For the last ten years, I worked for DreamHost, which meant I had access to a lot of awesome commands that everyone ran to diagnose things.

    Well now I’m gone and I’m still a webadmin for my domains. And I have, as you all know, a weird guy who keeps going after me. I also have been running fandom sites for longer than WordPress has existed. I’ve had to learn a lot of tricks to sort out ‘Is this person so-and-so again?!’

    Now… I’m going to tell you a secret. You ready? Okay, most of those scripts hosts run? They’re just cleaned up shell commands you run on the server via command line (aka command line interface aka CLI). And those commands? They’re actually pretty common, well known, and public.

    So here are some of the ones I use and why!

    Before You Begin…

    I have to step back a moment.

    Do you know where your log files are? DreamHost posted in their KB how you find them there, but you will want to check your own host’s documentation.

    There are three caveats, and I know one is weird.

    1. Logs rotate

    Server space matters, so logs are regularly deleted to prevent your data from killing things.

    Right now I see this:

    -rw-r--r-- 1 root      root      3.5M Sep 16 09:08 access.log
    lrwxrwxrwx 1 root      root        21 Sep 16 00:49 access.log.0 -> access.log.2022-09-15
    -rw-r--r-- 1 username  server    1.6M Sep 12 00:51 access.log.2022-09-11.gz
    -rw-r--r-- 1 username  server    1.4M Sep 13 00:54 access.log.2022-09-12.gz
    -rw-r--r-- 1 username  server    1.6M Sep 14 00:11 access.log.2022-09-13.gz
    -rw-r--r-- 1 root      root      9.7M Sep 15 00:21 access.log.2022-09-14
    -rw-r--r-- 1 root      root       11M Sep 16 00:49 access.log.2022-09-15
    

    Tomorrow I’ll lose the 9-11 log.

    2. You need to know what your logs look like

    Every host tweaks the format of Apache logs in a different way. You’ll see I use things like print $1 in my code, and for me I know that means “$1 is the IP address.” But that may not be what your host does.

    Look at the logs:

    192.0.114.84 - - [16/Sep/2022:00:49:05 -0700] "GET /wp-content/uploads/2019/10/Pure.jpg HTTP/1.1" 200 257552 "-" "Photon/1.0"

    And then count things. The IP is $1, the URL is $7, and so on.

    It can be a pain so please feel free to experiment and mess with it to get exactly what you want.
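
    If counting by eye gets old, here is a minimal sketch that does it for you. The helper name number_fields is made up for illustration, and it assumes awk’s default whitespace splitting, which is what the commands in this post rely on.

    ```shell
    # number_fields: print each whitespace-separated field of the first
    # line on stdin, prefixed with the $N that awk would use for it.
    # (The function name is hypothetical, not a standard tool.)
    number_fields() {
        awk 'NR == 1 { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i; exit }'
    }

    # Usage: head -n 1 access.log | number_fields
    ```

    Run against the sample line above, it lists $1 as the IP address and $7 as the requested URL. Your host’s format may number things differently.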

    3. You may need to use http logs for everything

    This is specific to DreamPress (the managed WP hosting) and is the weird one: you always have to use the http folder, even if you use https.

    Why? Well that has to do with how the server processes traffic. DreamPress (as of the time of this post) uses Varnish to cache and Nginx as an SSL proxy. That means when you go to https://example.com, Nginx handles the HTTPS stuff and passes the request to Apache, which runs HTTP. Those logs are your Apache logs, not your Nginx ones.

    Can you view the Nginx logs? Not at this time. Also they really are pass-throughs, so you’re not missing much. If you think you are, please open a ticket and tell them what you’re looking for in specific. Those help-desk folks are awesome, but the more clear you are about exactly what you’re looking for, the better help you get.

    Okay! On with the show!

    Top IPs

    Sometimes your site is running super slow and you want to know “Who the heck is hitting my site so much!?”

    awk '{ print $1}' access.log | sort | uniq -c | sort -nr | head -n 10
    

    This command will list the top 10 IPs that hit your site. I find this one super helpful when used in conjunction with an IP lookup service like IPQualityScore, because it tells me sometimes “Hey, did you know Amazon’s bots are hitting the heck out of your site!?”

    You can change that 10 to whatever number of top IPs you want to look for. That tends to be enough for me.

    If you know you have a lot of ‘self’ lookups (like you wrote something that has your server do a thing) you’ll want to try something like this to exclude them:

    awk '{print $1}' access.log | grep -vE '^(127\.0\.0\.1|192\.168\.100\.)' | sort | uniq -c | sort -rn | head -10
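
    Since logs rotate, one day’s access.log may not tell the whole story. Here is a rough sketch for counting top IPs across several logs at once, compressed or not. The function name top_ips is hypothetical, and it assumes GNU gzip, where gzip -cdf decompresses .gz files and passes plain files through untouched. It also assumes the IP is field 1, same as above.

    ```shell
    # top_ips: show the busiest IPs across one or more log files.
    # $1 = how many IPs to show; remaining arguments = log files.
    # (Hypothetical helper; assumes GNU gzip and that the IP is field 1.)
    top_ips() {
        count="${1:-10}"
        if [ "$#" -gt 0 ]; then shift; fi
        # -c writes to stdout, -d decompresses, -f lets plain files pass through
        gzip -cdf -- "$@" | awk '{ print $1 }' | sort | uniq -c | sort -nr | head -n "$count"
    }

    # Usage: top_ips 10 access.log access.log.2022-09-14 access.log.2022-09-13.gz
    ```

    List the files by name rather than globbing access.log*: in the directory listing above, access.log.0 is a symlink to a dated log, so a glob would count that day twice.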
    

    Sometimes you just want to know what pages are being hit, right?

    Remember how I said you actually need to know what your log looks like? For me, $7 is the 7th ‘item’ in my access log:

    192.0.114.84 - - [16/Sep/2022:00:49:05 -0700] "GET /wp-content/uploads/2019/10/Pure.jpg HTTP/1.1" 200 257552 "-" "Photon/1.0"

    Counting is weird, I know, but the 7th is ‘/wp-content/uploads…’ so I know that the command has to use $7. BTW Photon there just means I use WordPress’s image stuff via Jetpack.

    awk '{print $7}' access.log | grep -ivE '(mod_status|favico|crossdomain|alive\.txt)' | grep -ivE '\.(gif|jpg|png|js|css)(\?|$)' | \
     sed 's/\/$//' | sort | \
     uniq -c | sort -rn | head -25
    

    That returns each URL with its hit count:

        862 /xmlrpc.php
        539 /wp-admin/admin-ajax.php
        382 /wp-login.php
         75 /wp-cron.php?doing_wp_cron

    And it’s not a shock those are the high hits. Nice try folks. I use things to protect me. But before we get into that…

    IPs Hitting a Specific Page

    Now let’s say you’re trying to figure out what numb nut is hitting a specific page on your site! For example, I have a page called “electric-boogaloo” and I’m pretty sure someone specific is hammering that page. I’ll do this:

    awk -F'[ "]+' '$7 == "/electric-boogaloo/" { ipcount[$1]++ }
        END { for (i in ipcount) {
            printf "%15s - %d\n", i, ipcount[i] } }' access.log
    

    That spits out a short list:

       12.34.56.789 - 3
      1.234.567.890 - 4

    It’s okay that the command spans multiple lines. Check those IPs and you might find your culprit.
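
    If the list gets long, it helps to have the heaviest hitters on top. A minimal sketch of the same idea with the output sorted by count: the name hits_by_ip is made up, and it assumes the same log layout as above, where the path lands in $7 once you split on spaces and quotes.

    ```shell
    # hits_by_ip: count requests per IP for one exact path, busiest first.
    # $1 = request path to match exactly, $2 = log file.
    # (Hypothetical helper; assumes the path is field 7 with this separator.)
    hits_by_ip() {
        awk -F'[ "]+' -v page="$1" '
            $7 == page { ipcount[$1]++ }
            END { for (ip in ipcount) printf "%d %s\n", ipcount[ip], ip }
        ' "$2" | sort -rn
    }

    # Usage: hits_by_ip /electric-boogaloo/ access.log
    ```

    The match is exact, so a request with a query string tacked on the end of the path will not be counted.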

    What ModSecurity Rule Hates Me

    I have a love/hate relationship with ModSecurity. My first WP post (not question) in the forums was about it. It’s great and protects things, especially when you tie it into IPTables and have it auto-ban people… Until you accidentally block your co-editor-in-chief. Whoops!

    For this one, you’ll need to ask the person impacted for their IPv4 address. Then you can run this:

    zgrep --no-filename IPADDRESS error.log*|grep --color -o "\[id [^]]*\].*\[msg [^]]*\]"|sort -h|uniq -c|sort -h
    

    That will loop through all the error logs (on DreamHost they’re in the same location as the access logs) and tell you what rules someone’s hitting. Then you can tweak the rules.

    Of course, if you’re not the root admin, you’ll want to ping your support reps with “Hey, found this, can you help?” They usually will.

    Don’t feel bad about this, and don’t blame the reps for this. ModSecurity is constantly changing, because jerks are constantly trying to screw with your site for funzies and profit (I guess). Every decent host out there is hammering the heck out of their rules constantly. They update and tweak and change. Sometimes when they do that, it reveals that a rule is too restrictive. Happens all the time.

    Long Running Requests

    Another cool thing: the answer to “What’s making my site slow?” often comes from “What requests are taking too long?”

    awk  '{print $10,$7}' access.log | grep -ivE '\.(gif|jpg|jpeg|png|css|js)(\?|$)'  | awk '{secs=0.000001*$1;req=$2;printf("%.2f minutes req time for %s\n", secs / 60,req )}' | sort -rn | head -50
    
    

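    That gets me the top 50 URLs by request time. It assumes field 10 holds the request time in microseconds, which depends on your host’s log format. In the sample line shown earlier, field 10 looks like the response size in bytes, so count your fields first. For me it happened to list MP4s, so I added that to my little exclusion list where .gif etc. are listed.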

    Who’s Referring?

    Checking referrers is basically asking “What site sent people here?”

    host="example.com" # your own domain
    awk '{print $11}' access.log | \
     grep -vE "^\"-\"\$|//(www\.)?${host}" | \
     sort | uniq -c | sort -rn | head -25
    

    This one is a little weird to look at. Here it is with the filter left off:

      15999 "-"
         31 "www.google.com"
          8 "example.com"
          4 "binance.com"

    The ‘example.com’ means “People came here from here” which always confuses me. More impressive is that top one. It means “People came here directly.” Except I know I’m using Nginx as a proxy, so that’s likely to be a little wonky.

    What are your favourite CLI tools?

    Do you have one? Drop a line in the comments! (Be wary about posting code, it can get weird in comments).

  • The Revolution of .htaccess and Multiple Domains

    I mentioned this in a post about jiggering Google and Multisite WordPress, and my buddy Jan went “WOW!” So I’ll start with the best thing ever in Apache, to me at least. Apache 2.4 allows for real if/else statements in .htaccess.

    In 2012, I wrote about how I did a lot of request header detection in order to make myriad blocks of checks to sort out my .htaccess anarchy. As I explained here, I have to do a check for the domain for each and every rewrite rule:

    Why did I duplicate the RewriteCond? Typically, you cannot use multiple RewriteRule statements following a single RewriteCond. That means for every call I make to a domain, I can use but one rewrite rule. There are ways around that, but none of them worked well for me.

    It’s ugly on Apache 2.2. Since that time, however, I’ve moved to Apache 2.4, and the world is vastly different thanks to If/Else calls!

    Let’s take this old section I had from my move (about 6 years ago) from blog.ipstenu.org to ipstenu.org, as well as some permalink changes and a change to my uploads folder:

    # Ipstenu Moves
    RewriteCond %{HTTP_HOST} ^blog\.ipstenu\.org
    RewriteRule ^(.*) https://ipstenu.org/$1 [L,R=301]
    RewriteCond %{HTTP_HOST} ^ipstenu\.org
    RewriteRule ^blog/([0-9]{4})/([0-9]{2})/(.*)$ https://ipstenu.org/$1/$3 [L,R=301]
    RewriteCond %{HTTP_HOST} ^ipstenu\.org
    RewriteRule ^blog/(.*)$ https://ipstenu.org/$1 [L,R=301]
    

    Taking one block at a time, the first was easy:

    <If "%{HTTP_HOST} == 'blog.ipstenu.org'">
        RewriteRule ^(.*) https://ipstenu.org/$1 [L,R=301]
    </If>
    

    Pretty simple. If the host is blog.ipstenu.org, redirect. And it mostly worked. Except where http://blog.ipstenu.org/dsfasfsdf sent me to https://ipstenu.org/home/ipstenu/public_html/dsfasfsdf … Which isn’t good! The fix here is that you use RedirectMatch instead of RewriteRule, which gives us this!

    <If "%{HTTP_HOST} == 'blog.ipstenu.org'">
    	# RedirectMatch defaults to a 302; 301 keeps it permanent like the old R=301
    	RedirectMatch 301 (.*) https://ipstenu.org$1
    </If>
    

    Excellent. Then on to the second one, which becomes this:

    <If "%{HTTP_HOST} == 'ipstenu.org'">
    	RedirectMatch 301 ^/blog/([0-9]{4})/([0-9]{2})/(.*) https://ipstenu.org/$1/$3
    	RedirectMatch 301 ^/blog/(.*) https://ipstenu.org/$1
    </If>
    

    What’s the difference here? Well, RewriteRule is handled by Apache’s mod_rewrite, while RedirectMatch is handled by mod_alias. I know I have mod_rewrite on, but I don’t know why it insists on tossing in the path statement.

    However I can assure you that these ifs work perfectly. I’m using them right here on this site. My whole .htaccess is wrapped with them.
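
    If you want to check your own redirects without a browser cache getting in the way, curl can fetch just the response headers. This is a minimal sketch: show_redirect is a made-up helper name, and all it does is pull the status line and the Location header out of whatever headers you pipe into it.

    ```shell
    # show_redirect: read raw HTTP response headers on stdin and print
    # only the status line and the Location header.
    # (Hypothetical helper; strips the carriage returns HTTP headers carry.)
    show_redirect() {
        tr -d '\r' | awk 'NR == 1 || tolower($1) == "location:"'
    }

    # Usage: curl -sI http://blog.ipstenu.org/dsfasfsdf | show_redirect
    ```

    If the Location line comes back with a filesystem path in it, like the /home/ipstenu/public_html one above, the rule is leaking the path and needs another look.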

  • Apache 2.4 Kiboshed SPDY

    I have a store running on SSL for security reasons. I mean, you kind of have to, right? The problem is you don’t really want to cache SSL pages, as I reminded myself lately. At best, I was able to work around PageSpeed’s idiosyncrasies and compress the HTML and JS somewhat, but still I know that there has to be a better way.

    Everyone told me to look at SPDY. Now… this came with some issues. I needed Apache 2.2.4 (I was on 2.2.2):

    	httpd >= 2.2.4 is needed by mod-spdy-beta-0.9.4.1-397.x86_64
    	mod_ssl >= 2.2 is needed by mod-spdy-beta-0.9.4.1-397.x86_64
    

    What’s an elf to do? Well… what about Apache 2.4? After all, it’s the latest and greatest. This is when my eyebrows jumped. There’s no support for Apache 2.4. And the mod release only supports SPDY 2 when the protocol itself is on SPDY 3.1? What on earth is Google doing!? Apparently giving up on mod_spdy, which is horrible. Love the open source community though. Patrick Buckley forked it. I cannot stress enough the need to check into some random stranger before you just download and use their code. Especially when we’re talking servers! Sadly, looking into his code I saw it would upgrade Apache and SSL.

    Well. No. It’s not that I don’t trust this guy, the code looks okay. It’s trying to install HTTPD 2.4.7 which is not the latest and greatest for my server’s OS (currently 2.4.9). Not to mention some research on cPanel showed issues with mod_spdy and CentOS (including the note that Patrick’s code caused random coredumps). However. The odds are that when, eventually, the stars align and there is mod_spdy (or some alternative) for Apache, it’ll be for 2.4.x so I may as well put the effort into updating today.

    Sidebar. Yes I know about nginx. Yes I’m aware of the package for CentOS. Yes I know it’s faster for static files and CSS and JS (and arguably even for PHP). Yes I know it’s easier to use default nginx than to tune Apache. But. I like having my .htaccess file to edit, and I’m not ready to do a total switch yet since this is not my server for me alone. Eventually yes, I will. Today is not that day.

    So Apache 2.4! There aren’t a lot of Apache 2.4 issues, but the ones there are were major enough for me to sit up and pay attention. For example, MPM-itk is no longer provided as an easy install from cPanel; they wanted me to use mod_ruid2, which isn’t compatible with memcache. I really hate that. However. Many people informed me you can still use memcached, and besides which, Apache 2.4 doesn’t support Memcache. I still find it amusing that cPanel outright says mod_ruid2 is just as dangerous as MPM-itk, but would rather use the one that’s less compatible. It’s not that I can’t install it on my own, of course; it’s that the more effort it takes to work around a problem, the less pleased I am with that as a solution. Work smarter. By the way, mod_ruid2 is available on Apache 2.4. I learned a lot when I installed it myself, now I’ll learn more.

    There was a catch in things of course. I’d set up mpm.conf files in /usr/local/apache/conf/userdata/std/2/ and had to roll those back, as they borked deployment. Took me an hour to sort that out. Remember to read the complete errors, folks. Of course, once Apache 2.4 was up, I tested things before going any further to make sure all my modules etc. were still running. I was lucky, I only had to configure pagespeed for Apache 2.4. Everything else worked out of the box. Since I was using MPM Prefork already (worker is not available due to mod_ruid2) I didn’t have to edit anything there.

    What did I notice? Memory and load stayed the same. And you’d think that meant this was for nothing. I should mention this happened to be on the same day I got nailed by a 60% bump in traffic on my busiest site. So … that would be better then.

    I’m bummed that SPDY isn’t being actively developed for Apache right now, though. For folks who are pushing the HTTP 2.0 world, they seem intent on ignoring or not committing to getting others up to speed. While nginx is awesome, there will always be a reason for people to use other server types. I hope to either see mod_spdy get picked up and loved again, or for someone else (Microsoft’s HTTP S&M?) to pick up the thread and remember that abandonment doesn’t move things forward as fast as you’d think.