My father, Woody, is a risk analyst. So, knowing my math skills, I asked him where I should start if I wanted to learn how to analyse and assess risk. With the personal commentary removed, here’s his answer.
Math is not very important, at least not at the beginning. Risk assessment is really just thinking hard about the answers to three fundamental questions: What can go wrong? How likely is it? What are the consequences?
Look at what you do at work. How can good answers to the three questions mitigate the (bad) consequences of poor decisions?
Do a pilot study with an upcoming decision.
Remember that “What can go wrong?” means analysing, from a choice or initiating event (like a 3-day power blackout in Chicago), the sequence of events that follows: failures of the systems meant to control those events, bad human decisions, and so on. Each sequence ends in either a bad situation or an OK situation. “How likely is it?” is just the likelihood of that sequence occurring, usually measured by assigning a probability to each event in the sequence, either from data or from expert judgement. “What are the consequences?” means: for every sequence that ends badly, what are the consequences of that bad state?
Make a decision tree or event tree to enumerate the sequences. Each branch point (or top event) can have a fault tree to represent how that branch point fails or succeeds, or you can just use expert judgement.
Represent the likelihood of failure as a number between 0 and 1 (then success will equal 1-failure).
Choose an end state for each sequence. Multiply the numbers for each branch point to get the likelihood of the sequence.
Add up all the sequence likelihoods for the same end-state.
That’s all there is to it.
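Woody’s recipe translates almost line for line into code. Below is a minimal Python sketch, assuming the branch points are independent. The branch points, their failure probabilities, the end-state rule, and the `event_tree`/`end_state` names are hypothetical illustrations, not anything from his answer.

```python
from collections import defaultdict
from itertools import product

# Hypothetical branch points (top events) with guessed failure probabilities.
# Illustrative only: these are not numbers from the post.
BRANCH_POINTS = [
    ("backup taken", 0.10),
    ("upgrade completes", 0.01),
    ("plugins and theme still work", 0.02),
]

def end_state(outcomes):
    """Hypothetical rule mapping a sequence of successes (True) and
    failures (False) to an end state."""
    backup, upgrade, plugins = outcomes
    if upgrade and plugins:
        return "ok"
    if backup:
        return "broken, restorable"
    return "broken, no backup"

def event_tree(branch_points, classify):
    """Enumerate every success/failure sequence, multiply the branch
    probabilities along it, and add up sequences with the same end state."""
    totals = defaultdict(float)
    for outcomes in product([True, False], repeat=len(branch_points)):
        likelihood = 1.0
        for (_, p_fail), succeeded in zip(branch_points, outcomes):
            # success = 1 - failure
            likelihood *= (1.0 - p_fail) if succeeded else p_fail
        totals[classify(outcomes)] += likelihood
    return dict(totals)

if __name__ == "__main__":
    for state, p in event_tree(BRANCH_POINTS, end_state).items():
        print(f"{state}: {p:.5f}")
```

Running it prints the total likelihood of each end state, and the totals add up to 1. The sketch enumerates every combination of outcomes, including branches that no longer matter after an earlier failure. A hand-drawn event tree would usually prune those, but the per-end-state totals come out the same.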
When you put it that way, it does look pretty simple.
So I went through a proof-of-concept process. This is my first time making a fault tree, and I didn’t bounce it off my father.
As you can see, this is pretty basic. What can go wrong? A lot, actually, and I wasn’t really doing more than picking the common problems. But this is a fault tree, not a decision tree. Are they different? They are! A fault tree is basically what you use to suss out why things go wrong. With a decision tree, though, you list the decisions and spin out the likelihood of failure for each. So my decision here is “How should I upgrade WP? Stable or Pre-Release?”
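A fault tree works backwards from one unwanted top event, breaking it down through AND/OR gates into basic events. Here is a minimal Python sketch of how such a tree turns into a single number, assuming the basic events are independent. The events, their probabilities, and the `BasicEvent`/`Gate`/`probability` names are hypothetical, not the tree from my figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod
from typing import Union

@dataclass
class BasicEvent:
    name: str
    p: float  # probability the event happens (expert judgement)

@dataclass
class Gate:
    kind: str           # "AND" or "OR"
    inputs: list[Node]  # child events or gates

Node = Union[BasicEvent, Gate]

def probability(node: Node) -> float:
    """Probability of a node, assuming all basic events are independent."""
    if isinstance(node, BasicEvent):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":  # every input must happen
        return prod(ps)
    if node.kind == "OR":   # at least one input happens
        return 1.0 - prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")

# Hypothetical tree: why might "the site breaks after upgrading"?
site_breaks = Gate("OR", [
    BasicEvent("upgrade tool fails", 0.01),
    Gate("AND", [
        BasicEvent("plugin not updated for the new version", 0.10),
        BasicEvent("plugin relies on a changed function", 0.20),
    ]),
    BasicEvent("theme conflict", 0.005),
])

if __name__ == "__main__":
    print(f"P(site breaks) = {probability(site_breaks):.4f}")
```

The result could then serve as the failure probability of one branch point in a decision or event tree, which is how the two kinds of tree fit together.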
Here you’ll see it’s similar enough, but wait! I have funny numbers! Those are my guesstimates of how likely each of these is to cause problems. See, if you don’t have strong tech skills, using SVN to upgrade is higher risk. In this world, you want a lower number. For example, if you look at the stable release, you’ll see that it adds up to a .04 failure: a 1% chance each that it’ll fail because of the upgrade tool or the user’s tech skills, and a higher 2% for ‘breaks’ (by which I mean you have a crappy plugin or theme).
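Adding up the individual failure chances, as with the stable-release total, is a good approximation when the numbers are small. The exact chance that at least one cause happens (if the causes are independent) is slightly lower. A minimal Python sketch, assuming the stable-release numbers are 1% for the upgrade tool, 1% for the user’s tech skills and 2% for ‘breaks’ (my reading of the figure, which isn’t reproduced here):

```python
from math import prod

# Assumed reading of the stable-release figure (not reproduced in the post).
stable_failures = {"upgrade tool": 0.01, "tech skills": 0.01, "breaks": 0.02}

def added_up(ps):
    """Simple sum of failure probabilities (the rare-event approximation)."""
    return sum(ps)

def at_least_one(ps):
    """Exact chance that at least one cause occurs, if causes are independent."""
    return 1.0 - prod(1.0 - p for p in ps)

ps = stable_failures.values()
print(f"added up:     {added_up(ps):.4f}")
print(f"at least one: {at_least_one(ps):.4f}")
```

With numbers this small, the two answers differ by less than half a tenth of a percentage point, so adding is fine. With larger guesses the simple sum overstates the risk and can even go above 1.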
Now, I left off things like the fact that with SVN/Nightly/Beta/RC you get the cool toys early, mostly for space and since this is a proof of concept. It’s clear that SVN is something only experienced people should play with, but it’s very possible I’ve scored Beta/RC too high. They’re sort of a break-even point, though. While Stable will always be recommended, I did a quick revamp of Nightly and Beta/RC. Nightlies are riskier because you run the risk of getting an incomplete build (that is, some of my bored maniac friends may be checking in code and not be 100% done when you run your update; it’s a common weird issue with SVN, and why I always svn up before I consider reporting a bug). But a Beta/RC is a ‘very nearly done’ cake, just missing the icing.
Version two is, you can see, very similar. Personally, I consider this a ‘start’ to understanding the risks inherent in a WordPress upgrade. If you held a gun to my head and demanded I explain where I got the numbers, I would call them educated guesses, based on the forums, the mailing lists and my personal experience. Dad would say ‘Expert Judgement.’
My next steps are to read up more on the process of using decision trees, specifically as applied to software. While I certainly will also be looking into how a tornado in downtown Chicago would impact my office (Can I get to work? No? Okay, so VPN. Can it take 5,000 people at once? Based on Snowmageddon last year, no. Etc. etc. and so on), understanding the logic trees behind the forms is always my first step.
To my WordPress friends, please let me know if I scored things too high or too low in this one! To the rest of you: do you use these sorts of things in your jobs and, if so, how? I’d love to see some real-world applications outside the financial world!
Comments
6 responses to “How Likely Is It That My Upgrade Will Fail?”
I’d say you scored the risk of running ‘unstable’ versions too high, especially SVN. My website has been running on trunk for a while now (just under a year, I think) and I’ve never noticed a significant problem with the stability of the code. Occasionally there is a plugin conflict here and there (like JJJ’s snack menu overwriting the ‘My Sites’ menu in the 3.3 beta), but nothing crazy.
What’s most unstable about using the pre-release versions in my experience is developing on top of them, since many of the new functions, APIs, etc. may be changing rather drastically before final release; sometimes the old ones too.
I’m also entirely open to the possibility that my experience is wildly out of touch with reality.
😉
I scored SVN that high because of the number of Trac tickets I’ve opened due to little things like ‘No full screen option in HTML view’ or ‘can’t add any links.’ I run trunk, sync’d four times a day, but I do run into problems and oddities that need fixing. Most of the time a follow-up ‘up’ fixes them, but the very fact that I have to go in and do that manually raises the risk. Basically, by svn’ing, I’m in a constant state of ‘this could break at any moment.’
Stability doesn’t mean ‘My site doesn’t crash’, here, it means ‘My site works front and back without any issues.’ And that simply cannot be said of SVN (or nightly for that matter). It mostly works, but it does have problems. Otherwise we’d not have 4 betas before RC-1, eh?
I seem to have buggered my own post…
There’s a reason we rely more on automated code deployments than anything manual (not that I don’t think WP manual upgrades are safe as houses). Basically by using svn, I’m in a constant state of ‘this could break at any moment.’ And by factoring in the extra software (svn) we again raise risks. The more you SVN, the less risky it becomes though, which is a totally separate post.
How much does the complexity of the site matter? Does it have a place in your assessment?
Not as of yet, since I was taking a ‘standard single site install of WP’ as my benchmark. Though the basic tenets stay the same: The more complex the site, the more chances for breakdowns. But the basic risk of the upgrade (in this case ‘should I use SVN, Stable, etc when I upgrade’) would remain the same – Use Stable if your system’s complex.
Well… really ‘Test a lot if your system’s complex.’