Half-Elf on Tech

Thoughts From a Professional Lesbian

Tag: risk

  • Whose Responsibility Is It?

    When WordPress 4.0.1 came out, a small number of sites broke.

    For a while, we’ve been touting that minor releases to WordPress core, the ones we auto-upgrade for you, are very safe, very tested, and very important. While all this is true, it has brought a few people complaining to me that obviously I was wrong.

    It’s true that the 4.0.1 release broke people. It was an object lesson in why I tell people not to reinvent the wheel. But this upgrade situation does not mean the upgrades aren’t safe, secure or smart. It does bring thoughts to mind, like what my friend David talks about when he considers WordPress at the enterprise level. I know people who are using this failed upgrade scenario as a reason to tout that WordPress isn’t ready for big business, but I think they’re looking at it from the wrong perspective.

    Let’s step back.

    I used to work for The Man. I’m well aware of the machinations you go through to upgrade anything at a massive enterprise. One of the things they do is a code review. Every single upgrade is checked and tested and a dry-run of the upgrade is run to ensure everything works the way they think it should. By allowing WordPress auto-upgrades, you remove that ability. For a massive corporation? I would turn off the auto-update.

    But at the same time, this mythical major company running WordPress would have at least one person who knew WordPress. They would have someone whose job it was to review every single bit of code that went into their WordPress site. Each plugin would be checked, tested, evaluated for security, and only installed if that WordPress Checker said it was good. Because that’s exactly what you must do in any and all enterprise situations.

    David’s viewpoint is that the vetting of a site should be delegated.

    My gut reaction to say that they should know better has to be tempered with the fact that no, they should not have to. It’s the job of every site owner to vet their system, but to make a platform that is truly global, that vetting should be delegated. Web hosts and security analysts should vet code for collisions and bugs. Theme and plugin shops should ensure that their products adhere to best practices. Putting accountability for the full stack on each site owner is not only inefficient, but impractical. Inherent trust should exist that code in the official repository maintains a baseline level of code, trust that is eroded when the problems that occurred with a subset of sites on this update occur.

    And here, he and I disagree somewhat.

    It’s the job of everyone who uses software to be aware of what they’re doing. Vetting the software before it goes into your system has to be someone’s job. WordPress core does an amazing job of this for you. WordPress core is safe. The 45k plugins and themes in the world don’t always meet the same level of robust checking. Which means when you introduce WordPress to your environment, you absolutely have to seriously review those third-party odds and sods you want to use because they’re so shiny and cool.

    Web hosts vet code for collisions, sure. We do at DreamHost. That’s part of my and Mike’s jobs! We know what’s going into WordPress and if it’s going to blow things up at DreamHost for our customers. But like the site owner who found her site down one morning because we’d upgraded her from PHP 5.2 to 5.4 and it broke her WordPress 2.5 site, we cannot account for everything.

    I think there’s a need for security specialists to review plugins, in a public forum, and point out who’s not doing things in the best way. I also think that there’s a need for developers to remember there’s a reason why we do things a certain way, and while it’s fine not to, you have to keep in mind that it’s now your responsibility to keep a close eye on anything that changes in core that might cause your code not to work as well.

    For example, if I wrote a plugin that worked around the shortcode API for whatever reason, I would have a custom query on trac for any ticket related to Shortcodes and have it as an RSS feed to monitor. Or I might even subscribe to the trac firehose and use a filter to pull out anything that so much as mentioned the word. Because I’ve now made a change that I know might be a problem someday.
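    If you wanted to automate that kind of watch, a minimal sketch might look like the following. It assumes the feed is ordinary RSS 2.0, and the sample feed, its ticket titles, and the example.com links are all invented for illustration. It is not a real trac export.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a ticket feed. Every item here is made up.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Hypothetical ticket feed</title>
  <item><title>Shortcode attributes lose quotes</title><link>https://example.com/ticket/1</link></item>
  <item><title>Media modal layout glitch</title><link>https://example.com/ticket/2</link></item>
  <item><title>Nested shortcodes parsed twice</title><link>https://example.com/ticket/3</link></item>
</channel></rss>"""


def matching_items(feed_xml, keyword):
    """Return (title, link) pairs for items whose title mentions the keyword."""
    root = ET.fromstring(feed_xml)
    needle = keyword.lower()
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Case-insensitive substring match, so "Shortcode" and "shortcodes" both count.
        if needle in title.lower():
            hits.append((title, link))
    return hits


if __name__ == "__main__":
    for title, link in matching_items(SAMPLE_FEED, "shortcode"):
        print(f"{title}: {link}")
```

    Matching on the title alone is a deliberate simplification. A real filter would probably also want to look at the ticket description and comments.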

    Every business owner should know the risks of all the software they use, be it website or desktop. This responsibility is the cost of doing business. The size of the business and the importance of the software will change what resources you can afford to allocate to that part of your business, but you absolutely cannot ignore it.

    While I really want to say that because WordPress core does due diligence you don’t have to, I would be a lying liar who lies. Even if we do as David suggests and have everyone in the world making sure things are vetted and checked and stamped, it still requires that the owners of a site listen to that information and not use the code that’s less optimal. Enforcing that would be impossible unless you wanted to suggest that WordPress outright deactivate code that doesn’t use the proper APIs. That would put a lot of weight on WordPress and slow it down and be pretty annoying for people who are legitimately using non-standard methods of development and implementation.

    No matter what, at the end of the day, the person who is responsible for the code quality is the person who wrote and maintains it. But the person responsible for their site is the person using the code. You have to know what code you’re putting into the site and be aware of the risks you’re introducing to your environment by doing so. If your website is your entire business, you cannot afford to be cavalier about these things.

    Disasters happen. Understanding the risks will prepare you for dealing with them when they do.

  • Forget 100%

    Can I tell you a secret?

    I hate the five nines. The Six Sigma Stigma has me wishing that everyone who tells me they’re a ‘black belt’ would please die in a fire. It’s not that I don’t think that the process can work for some people, or that it’s useless as a whole, but that I think too many people treat it like an MBA. “I did this thing for a few months, I am now an expert.” I had a bunch of coworkers who did that. I hated them. I got to the point that if you said “We need five-nine reliability” I had a Pavlovian reaction that involved me rolling my eyes and tuning out.

    Now this doesn’t mean that I don’t think 100% ‘uptime’ in anything is a nice goal, but I see it as a lofty goal. Look, you know this stuff already. I will not walk down the stairs successfully 100% of the time. I will not have the next key I strike on my keyboard function as I expect 100% of the time.

    So moving this off and saying “I don’t need 100% uptime, I need 99.999% uptime.” doesn’t actually change anything. In fact, I’m willing to bet that a lot of people look at the five 9s and think that it’s so close to 100% that they should never see or notice an outage. Thanks, Six-Sigma people, you just made 99.999% synonymous with 100%.
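    To put a number on how close to 100% the five nines really are, here is a quick back-of-the-envelope calculation. It assumes a 365-day year and treats every minute as equally important, which real service agreements may not.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent):
    """Minutes per year a service can be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes down per year")
```

    Five nines works out to a little over five minutes a year, which is less time than a single slow reboot.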

    That wasn’t the secret, though.

    My secret is that I don’t care about 100% uptime in anything.

    I worked in deployment for too long to believe that there is any such thing as 100% uptime for anything. There are ways to minimize and mitigate downtime, and there are ways to make sure it causes as little impact as humanly possible, but there’s no way to avoid it. Ever reboot your computer? Of course! Ever upgrade WordPress? You have then experienced downtime. It’s a fact of life. I expect it, I don’t sweat it.

    So if I don’t care about 100% uptime, what do I care about?

    Reliability, accountability, responsibility, and timeliness.

    I was reaching for the words that end in “ibl”, but really the root of them, “able”, is my concern above all else. Are they able to handle it when things go pear-shaped? Are they able to fix problems quickly, correctly, and efficiently? Are they able to prevent the exact same error from happening again? Are they able to own up to their mistakes?

    I don’t expect anyone to do all that 100% of the time, but I expect them to care about the things that are important to them as an entity. My webhost should care about the servers not being on fire and serving up webpages. My bank should care that my money is safe and available. My government should care that it’s … Too soon? Anyway, the point is that you should care about what you do, and provide the best service you can. Now, if 50% uptime is your best, maybe I’ll look for someone else. I am reasonable about these things. If email goes down, how fast did you get it back up? But to me 50% isn’t reliable unless I’m looking for something that, intentionally, only works half the time.

    All this said, I don’t actually look at the uptime numbers all that much, unless I feel that the reliability is sub-par. The actual numbers, the metrics, the absolute “This service is up 100% of the time or your money back” is not really what I count on. My friend Pippin said it wisely the other day:

    I have far more faith in a company that encounters occasional problems but responds incredibly promptly than one that has fewer issues but doesn’t respond half as well

    He happened to be talking about a brief (like 10 minute) outage on his site when all databases were inaccessible.

    Don’t bank on the percentage, bank on the ability to react and come back.

  • Two Factor Authentication

    This is something that Tony Perez and Sam “Otto” Wood both recommend, so you know I have to look at it seriously!

    I think I need to point out that I’m willing to accept that I’m wrong about things. After all, I can’t know everything, and I am well aware of that. But one of the things I work hard to do is learn, adapt, grow and get better at all this. The whole reason I started talking about tech on this site was I was trying to understand cloud hosting back in August of 2010. (A lot of tech posts were ported over from Ipstenu.org after the fact.)

    The point is I do this site because I want to learn, and when I learn, even if I don’t understand all of a thing, I want to share what I’ve learned specifically because I know people will come and correct me. Next to answering people’s questions, this is the fastest way I know of to really understand things.

    So.

    I didn’t mention Two Factor Authentication in my security post. Using it certainly would have mitigated the brute-force attack, though not the DDoS implications of it, and that remains why I am a fan of ModSecurity. That doesn’t mean I didn’t just add another tool to my arsenal, or that I’m not willing to try something out.

    I am now using Two Factor Authentication.

    Two-factor authentication (aka multi-factor authentication, or TFA, T-FA, or 2FA) is a way to verify your authenticity by providing two (or more) of the following factors:

    1. Something the user has – aka a possession factor
    2. Something the user knows – aka a knowledge factor
    3. Something the user is – aka an inherence factor

    For most of us, we authenticate only via knowledge – that would be your standard username and password. You “know” your password, thus you pass the knowledge factor. A PIN (like for your bank card) is the same thing. This is simple, it’s easy, and most of us can remember a password.

    Something you have is easy to explain if you’ve ever worked for a company and had an RSA ID or a keyfob with a randomly generated string. That’s the possession factor at work. In fact, your bank card (again!) is one of these too! It’s something else, something physical that you must have to prove you are actually you.

    Inherence factors are things like biometrics, so a fingerprint or retina scan. That’s all you need to know about that. Arguably it’s something you have, but it’s a part of you, something you always have with you, so it’s inherent or innate to your very person. Latin. You’re welcome.

    It’s pretty obvious that a strong password only goes so far. If I can’t log into my laptop without a USB keyfob, then my site is super secure. This is better than using the picture and keyphrase that a lot of banks use right now, but it’s also harder. It’s very easy for a company to have you pick a photo, a sentence, and a password and make you verify them when you log in. But to instead make sure you have a specific device with you that verifies who you are and that you’re you in this very second?

    How, exactly, they work depends on which methods you’re using. There are myriad different methods of possession factors you could use, and how each one works is a little different. But we like multiple factors because if you needed (say) my retina scan and a password to log in and a titanium ring, and another person with those three items, then I’ve just described the plot of Charlie’s Angels: Full Throttle. I’ve also described a pretty tough nut to crack if you’re not Drew Barrymore.

    The issue with these methods is they’re not (yet) practical for the common man, and that’s really a large part of why I don’t like TFA very much.

    The knowledge factor is the easiest to hack. We’ve seen that. That’s the whole reason we want to use two or more factors to authenticate. I’m not arguing that. The possession factor is the easiest to break (lose your keyfob or be out of cell phone range). Unless there’s some backup to let me in even if I don’t have the second factor, I’m SOL in a lot of ways. Of course, once you have a backup method, then that’s vulnerable. The inherence factor is the least reliable so far and the hardest to implement correctly. There’s a whole Mythbusters on how easy it is to make a fake fingerprint. It’s not that this is easy to hack, it’s that it’s hard to protect.

    Okay, so what should we do?

    The Google Authenticator Plugin for WordPress comes recommended by my man Otto and I know I’m not Google’s biggest fan, but this is one instance where I think they did it right.

    The plugin uses open source code for Google Authenticator, which is not something Google really invented so much as perfected. In fact, my old keyfob at work did the same thing.

    Here’s how it works. The site you visit generates a string of characters called your Secret Key. This key can be a string (like hE337tusCFxE) or a QR code embedded with all the information from your site (like site name and so on). You enter the data into the app on your phone, and that uses the secret string, plus the date and current time, to generate another random number string you use when you log into the site.

    It’s like a password that always changes, and since your phone and your (say) blog have clocks running, they know what time it is, parse the math on login, and off you go. So yes, this will work if you’ve got no cell reception. But no, it won’t work if you’ve lost your phone (which remains an issue for me). Since each site has a unique key and time is always changing, the code is never the same twice. No two users or sites will have the same key either. There’s more math to it, and you can read what Otto commented about it.
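    For anyone who wants to see that math, here is a minimal sketch of the general time-based one-time password scheme described in RFC 6238, which is the kind of scheme these authenticator apps use. It is an illustration only, not the plugin’s code. The function name and parameters are my own, and it takes the secret as raw bytes, whereas real apps usually show it to you base32-encoded.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Return the one-time code for a shared secret at a given Unix time."""
    if at is None:
        at = time.time()
    # Both sides count how many 30-second windows have passed since 1970.
    counter = int(at // step)
    # HMAC-SHA1 of the counter (as an 8-byte big-endian integer), keyed by the secret.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte slice.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep only the last few decimal digits, left-padded with zeros.
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # A made-up secret, purely for demonstration.
    print(totp(b"not-a-real-secret"))
```

    Because the only inputs are the shared secret and the clock, the phone never has to talk to the site, which is why this works without cell reception. It also means both clocks need to roughly agree.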

    Now to log in to my blog I need the username and password, plus a random number I can only get at if I have my cellphone and know the passcode there too. In my case, if I lose my phone, I can’t get into my site. This is, most of the time, okay. If I’m on a strange computer, I need the phone anyway to get the password out of 1Password, and I tend not to log on when I’m not on my own computer or my iPad (which requires the use of an app password, less secure all around, but needed).

    To me, it’s not risk versus reliability, or even risk versus vulnerability. It’s risk versus risk. So far, the risk of losing my phone is less than the risk of what happens if I lose my website. After all, my website is my life.

  • The Dangers of Being Uneducated

    This post is dedicated to Rachel Baker, who donated to help me get to WCSF. In lieu of Coke (and a sincere promise of no heckling), thank you, Rachel.

    Like many of these posts, it started with a tweet.

    Just six months ago, a WordPress plugin named RePress, hosted by all4xs, came on the scene. This is hosted at WordPress.org, see WordPress Plugin – RePress, and at the time it showed up, I was seriously worried about it.

    The plugin itself is made of awesome. It’s a proxy service, so if you happen to live in a place where freedom of speech is an unknown quantity, you can use your site to serve up pages from other domains and read them, even if they’re blocked. Essentially, instead of going directly to wikipedia.org, you go to yourdomain.com/wikipedia.org, and the content from Wikipedia is requested by your server, not your local IP, so if your ISP is blocking the content, you can still see it. If you’re visual, it’s like this:

    How RePress Works
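    In code terms, the URL rewriting part of that idea could be sketched roughly like this. This is not RePress’s actual code. The function name and the allowlist are hypothetical, and the sketch only shows how a request path maps to the upstream site, without fetching anything.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this site agrees to proxy.
ALLOWED_HOSTS = {"wikipedia.org"}


def upstream_url(request_url):
    """Map https://yourdomain.com/<host>/<path> to https://<host>/<path>.

    Expects a full URL including the scheme. Returns None when the
    requested host is not on the allowlist.
    """
    path = urlsplit(request_url).path.lstrip("/")
    # The first path segment names the site being proxied.
    host, _, rest = path.partition("/")
    host = host.lower()
    if host not in ALLOWED_HOSTS:
        return None
    return f"https://{host}/{rest}"


if __name__ == "__main__":
    print(upstream_url("https://yourdomain.com/wikipedia.org/wiki/Proxy_server"))
    print(upstream_url("https://yourdomain.com/blocked.example/page"))
```

    The allowlist is the part worth noticing. Your server makes the upstream request, so whatever you agree to proxy is, for practical purposes, something your server is serving.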

    This relies on two important pieces to work, however. First, wherever your site is hosted has to have access to where you’re trying to get (that is, if my webhost blocked Wikipedia, this won’t work). Second, you need to know what you’re doing.

    It’s that second point that worries me to no end.

    Look, I firmly believe in freedom of information. Once something has been invented, people are going to figure it out, so giving it to the world to improve upon it is sensible. Patents are just a weird concept to me. To say ‘I invented a thing, and no one else can invent the same thing, and you can only use the thing as I’ve made it!’ just blows my mind. We need to crowdsource our intelligence, share, and improve. It’s the only way to evolve.

    But that’s beside the point. The point is I worry like you don’t know about people being uneducated as to what this plugin does. Regardless of if it’s a good idea or not, it’s a dangerous thing because it has a great deal of power.

    I have a slightly selfish reason for worrying about it. I work for a company where using a proxy to get to websites they’ve blocked is grounds for being fired. I’m not the only person who has this concern. The worst part about this is if I went to a site that used a proxy, without telling me, I could get ‘caught’ and fired. Oh sure, I could argue ‘I didn’t know!’ but the fact remains that my job is in jeopardy. This is part of why I hate short-links I can’t trace back. A proxy being ‘right’ or ‘wrong’ doesn’t matter, what matters is the contract I signed that says I will not circumvent the office firewall knowingly. Now I have to be even more careful with every link I click, but the uneducated who don’t know anything about this are at a huge risk.

    As Otto would say, we worry about the evil people, the ones who use this proxy to send you to virus infected sites, or places they could hack you. I really don’t worry about them very much. Evil is evil, and people are always going to be malicious. They know what these plugins do and how to use them, so again, my fear is for the uneducated who don’t understand. The people who still open those attachments from usps.com are the people who will be hurt by this. The rest of us will just deal with ‘You work on computers? Mine’s acting funny, can you look at it?’

    My main fear is for the people who don’t really understand how the plugin is dangerous to have on their own site. RePress, in order to prove that their plugin worked, hosted a proxy to The Pirate Bay, a popular torrent site. Near the end of June, BREIN told them to remove the proxy to The Pirate Bay. BREIN, to those of you who are wondering who they are, is the RIAA of the Netherlands. Essentially they’re a Dutch anti-piracy group, and they think that the proxy service to Pirate Bay is breaking the law. It may be. Greenhost, the hosting company behind RePress, and their webhost, is in the Netherlands, and it does fall under that law. (It’s nearly impossible to keep up with all this, but Wikipedia has a nice list of everyone who’s blocking The Pirate Bay, and their status. That’s a real Wikipedia link. In the US, so far only Facebook and Microsoft will edit your links to The Pirate Bay, and only on their services.) As of July 9th, all4xs/Greenhost lost the argument. A court order came in and now there is no more hosting on their site.

    It’s important to understand this Court order only impacts the proxies at Greenhost. There is no action against the plugin itself, and none at any other website using it.

    So why does it worry me?

    I do a lot of forum support, and I can easily envision people getting cease-and-desist orders from the Courts, telling them to remove their proxies. I can see webhosts shutting down sites because they don’t want to deal with the hassle, or because their servers happen to be located in a country where the site being proxied is blocked. And without any effort at all, I can see the users, who don’t understand the risk they’re getting into by running this proxy, screaming their heads off and blaming WordPress because they are uneducated. They’re not stupid, and they’re not evil, they just don’t see the big picture.

    It’s like when I had little sympathy for Blogetery, when it was shut down in June of 2010. They were running an open, unchecked, Multisite, and allowed anyone in the world to make a site, and didn’t monitor their users. Thus, after multiple copyvio issues, and now a terrorism claim, Blogetery’s webhost decided enough was enough and shut them down, impacting around 14,000 people (give or take, I wasn’t able to get the number of splogs on that site sorted out). The point there is that Blogetery screwed up by not taking care of their site. It’s your responsibility to do that, and the less people know about what they’re doing, the more likely they are to screw up.

    I’d be a lot happier if RePress’s plugin page explained the risks. Until they do, I give you my own:

    RePress will let your server act as a proxy to any website you choose, allowing visitors who would be otherwise blocked by their country or ISP to visit those sites. Please investigate the laws of your country, as well as those of your webhosting company, to ensure you are not violating them. Also remember to review the terms of use for your webhost, and do not provide proxy service to any site (or type of site) that you aren’t permitted to host yourself. If your hosting company doesn’t permit porn, don’t proxy a porn site. While this plugin makes every effort to prevent cross-site scripting, you are expected to monitor the sites you proxy and be aware of their intention. Remember: If you put it on your server, you are responsible for what it does.

    (If RePress wants to copy that and use it as is, or edit it, they have my permission to do so. And they don’t even need to credit me if they don’t want to.)

  • Risk vs Transparency

    This was written without any special insider knowledge. I’ve simply watched, paid attention, and kept track for the last two years. Often when I report a plugin, Mark and Otto are nice enough to explain things to me, and I’ve listened.

    Occasionally a plugin vanishes from the WordPress repository. As a forum mod I tend to see people freak about this more often than not, and the question that inevitably comes up is ‘Why doesn’t WordPress publicize these things?’

    Let’s go down the list for why a plugin is removed first. This list is very short, and boils down to three:

    1. It breaks the rules
    2. It has a security exploit
    3. The author asks for it to be removed

    That’s pretty much it. The rules cover a lot, though Otto and I have been known to sum it up with ‘Don’t be a spamming dick.’ I actually had the chance to talk to folks about this before the ‘expanded guidelines’ went live, and I think I have a pretty good understanding of what the rules are. The majority of plugins that I see removed are removed for the most obvious reasons:

    • Phoning home (i.e. sending the author data about you without your permission)
    • Forward facing links (i.e. opt OUT links on the front of your site when you use the plugin)
    • Affiliate links (i.e. the author gets revenue from the plugin without disclosure)
    • Obfuscated code

    None of those are security reasons, and most of them are ‘fixed’ by us reporting the plugin, the plugin repo mods contacting the author, the author making the fix, and all is well. When the author doesn’t reply, or in the case of a ‘phone home’, often the plugin is yanked from the repo pending review. So where are these ‘security reasons’ to yank a plugin, and why should WordPress disclose them? Phoning home is, sometimes, a security reason, depending on what’s actually being transmitted. Usually it’s a vulnerability or an outright backdoor that would be a reason to pull a plugin.

    There’s an argument that ‘Trust requires transparency’ when it comes to security (see Verisign’s recent rigmarole) and that would mean WordPress needs to publish things like ‘This month, these plugins were removed for this reason.’ Except WordPress doesn’t, and in fact, if you look, rarely do companies do this until they have a fix. The ‘problem’ with WordPress is they don’t do the fix, the plugin devs do, and a surprisingly high number of times, the plugin author fucks off like a monkey.

    On the other side of this argument is FUD (Fear, Uncertainty and Doubt), which is something you never want to feed. Look at the plugin “ToolsPack,” helpfully shown up on Sucuri. Now that was never hosted on WordPress.org, but if it had been, it would have been removed for exploitation. But once the offending plugin is removed, should WP go ahead

    In October of 2010, WordPress.org ‘introduced’ a kill switch for plugins. Not really, but kind of. BlogPress SEO was spam. Yoast, one of the few true ‘SEO experts’ I know of, caught it and decided to fix it the best way he knew how. See, this plugin was never on the WordPress repository and so WP could do little about it. Yoast registered a plugin with the same name, gave it a newer version of the plugin, and everyone saw that as an ‘update’ and thus were saved. Sort of. Now, even Yoast admits this is abuse of the system, and I’ll leave the coulda/woulda/shoulda to someone else.

    The reason I bring it up is this shows there is a way to handle bad plugins. But it’s not very efficient, it’s not very friendly, and it doesn’t promise that it will work. First off, not enough people run updates, and secondly it’s putting a lot of work on a very small group of people. While the theme reviewers have a lot of folks helping out, the plugins do not. Should they? Yes, but the number of people who understand all the code that could be in a plugin is far smaller than for a theme. I suppose it’s saying ‘plugins are harder than themes.’ I may be wrong, but it’s how I feel.

    To fix all this, you’d need to basically reboot the plugins directory, turn them all off, review each of the 18,000+ plugins, and turn them back on. Then you need an Otto or Nacin going through each one to make sure every check in is okay, every update and every change isn’t spamming. Oh yes, that’s what happens to theme devs, didn’t you know? All releases are approved before they go live. Can you see the plugin developers agreeing to that? That’s a nonsense complaint of mine, actually. If tomorrow the rules changed, maybe half the plugins in the repo would vanish and never come back, but most of the rest would be fine. Of course, we would need a dedicated team of people to do nothing but review and approve plugins to keep up with the traffic.

    So accepting what we have today, the wild west, why isn’t there a running list of all plugins yanked from the repo, and why? The list itself isn’t a bad idea. Having a list to say ‘This plugin was disabled on this date’ would be nice for a lot of us, and more so, having the plugin page show ‘This was disabled.’ would be nice. I can even think of a couple code ways to do it, but all of them need a full time person to go through the ‘removals’ and put up a splash page with ‘If you used this plugin, please consider alternatives: .’ and ‘If you wrote this plugin, please contact plugin support.’ Also, this would increase emails to the plugins support account, not from the authors, but from people who want to know why a plugin was removed. And what about a day when a plugin is removed because of a bad thing, but the authors fix it? Did we create a false feeling of doubt in a plugin that had a typo?

    On paper, it all sounds like we should be keeping a public list for this still, though. Put it all up there for the public, disclose everything.

    Every time I write that sentence, I wince.

    It sounds nice on paper, and all I can think is about the people who will cry foul, complain, and want to know more. “Why was this plugin removed and not that one?” Well, most of the time it’s because no one mentioned that plugin. Right now, the plugins that get yanked are ones people stumble across or report.

    But why worry about a simple list of removed plugins? Because the first thing I would do, if I was a nefarious hacker, would be to script a pull from that list and scan the web looking for sites that use the plugins, thus implementing a vector for attack. See, we already know people don’t update plugins as often as they should (which is why Yoast’s ‘fix’ isn’t as good an idea as we’d hope), but now not only are we leaving people at risk, we’re opening them to even more risk. If we email them to tell them the plugin’s risky, we have the same problem.

    There’s no safe way to inform people without putting anyone who’s not up to date at risk. Given that the most dangerous day to have an unpatched system is the day of disclosure, the only way WordPress, or anyone, could keep a list like that would be if, like Chrome, WP auto-pushed updates right away, forcing anyone who opened the site to upgrade. And that’s fraught with its own issues.

    Until then, I can’t advocate anyone keeping a list of removed plugins. It’s too risky.

  • How Likely Is It That My Upgrade Will Fail?

    My father, Woody, is a risk analyst. So I asked him, knowing my math skills, where should I start learning about how to analyse and assess risk. With the personal commentary removed, here’s his answer.

    Math is not very important, at least not at the beginning. Risk assessment is really just thinking hard about answers to the 3 fundamental questions: what can go wrong; how likely is it; what are the consequences?

    Look at what you do at work. How can good answers to the three questions mitigate the (bad) consequences of poor decisions?

    Do a pilot study with an up-coming decision.

    Remember that what can go wrong? means an analysis from a choice or initiating event (like a 3-day power black-out in Chicago) of the sequence of events and failures of systems to control the events, bad human decisions, etc. Each sequence ends up in a bad situation or an ok situation. How likely is it? is just the likelihood of that sequence occurring, usually measured by a probability for each event in the sequence, either through data or expert judgement. What are the consequences? means that for every bad ending of a sequence, what are the consequences of that bad state.

    Make a decision-tree or event tree to enumerate the sequences. Each branching point (or top event) can have a fault tree to represent how that branch point fails or succeeds, or just expert judgement.

    Represent the likelihood of failure as a number between 0 and 1 (then success will equal 1-failure).

    Choose an end state for each sequence. Multiply the numbers for each branch point to get the likelihood of the sequence.

    Add up all the sequence likelihoods for the same end-state.

    That’s all there is to it.

    When you put it that way, it does look pretty simple.
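    To see just how simple, here is a small sketch of the arithmetic he describes: give each branch point a failure likelihood between 0 and 1, multiply along each sequence, then add up the sequences that land in the same end state. The branch names and numbers are made up for illustration, and the sketch assumes the branch points fail independently of each other.

```python
from collections import defaultdict
from itertools import product

# Hypothetical failure likelihoods for each branch point (made-up numbers).
BRANCH_FAILURE = {
    "upgrade tool": 0.01,
    "user skill": 0.01,
    "plugin or theme": 0.02,
}


def end_state_likelihoods(branch_failure):
    """Enumerate every sequence and total the likelihood of each end state."""
    totals = defaultdict(float)
    names = list(branch_failure)
    # Each sequence is one choice of failed/succeeded per branch point.
    for outcome in product((False, True), repeat=len(names)):
        likelihood = 1.0
        for name, failed in zip(names, outcome):
            p_fail = branch_failure[name]
            # Success is 1 minus failure, as in the recipe above.
            likelihood *= p_fail if failed else 1.0 - p_fail
        # Any single failure is treated as a bad ending in this sketch.
        end_state = "broken site" if any(outcome) else "ok"
        totals[end_state] += likelihood
    return dict(totals)


if __name__ == "__main__":
    for state, likelihood in end_state_likelihoods(BRANCH_FAILURE).items():
        print(f"{state}: {likelihood:.4f}")
```

    A real event tree would usually distinguish several bad end states with different consequences. Lumping them into one keeps the example short.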

    So I went through a proof of concept process. This is my first time making a fault tree, and I didn’t bounce it off my father.

    Fault Tree of a WP Upgrade

    As you can see, this is pretty basic. What can go wrong? A lot actually, and I wasn’t really doing more than picking the common problems. But this is a fault tree, not a decision tree. Are they different? They are! A fault tree is basically what you use to suss out why things go wrong. With a decision tree, though, we make a list of decisions and spin out what the likelihood of a failure is. So my decision here is “How should I upgrade WP? Stable or Pre-Release?”

    WP Upgrade - Decision Tree

    Here you’ll see this is similar enough, but wait! I have funny numbers! That’s my guesstimate at how likely these are to cause problems. See, if you don’t have high tech skills, using SVN to upgrade is higher risk. In this world, you want a lower number. Like if you look at the stable release, you’d see that it adds up to a .4 failure, or a 1% chance that it’ll fail because of the upgrade tool or the user’s tech skills, but a higher 2% for ‘breaks’ (by which I mean you have a crappy plugin or theme).

    Now I left off things like for SVN/Nightly/Beta/RC you get the cool toys early, mostly for space and since this is a proof of concept. It’s clear that SVN is something only experienced people should play with, but it’s very possible I’ve scored Beta/RC too high. They’re sort of a break-even point, though. While Stable will always be recommended, I did a quick revamp of Nightly and Beta/RC. Nightlies are riskier because you run a risk of getting an incomplete build (that is, some of my bored maniac friends may be checking in code, and not be 100% done when you run your update – a common weird issue with SVN and why I always svn up before I consider reporting a bug). But a Beta/RC is a ‘very nearly done’ cake, just missing the icing.

    WP Upgrade - Decision Tree Take 2

    Version two is, you can see, very similar. Personally I consider this a ‘start’ to understanding the risks inherent in a WordPress upgrade. If you held a gun on me and demanded I explain where I got the numbers, I would call them educated guesses, based on the forums, the mailing lists and my personal experience. Dad would say ‘Expert Judgement.’

    My next steps are to read up more on the process of using decision trees, directly in relation with software. While I certainly will also be looking into how a tornado in downtown Chicago would impact my office (can I get to work? No? Okay, so VPN. Can it take 5,000 people at once? Based on Snowmageddon last year, no. etc etc and so on), understanding the logic trees behind the forms is always my first step.

    To my WordPress friends, please let me know if I scored things too high or low in this one! To the rest of you, let me know if you use these sorts of things in your jobs and, if so, how. I’d love to see some real-world applications outside the financial world!