One of the biggest challenges of project management is dealing with risks and opportunities. How do I build a work breakdown structure if I don’t know what’s going to be a problem down the road?
How do I build my team if the challenges are unknown? How do I effectively leverage good news? The difference between success and failure can come down to proper management of your risks and opportunities.
First, it’s important to understand the difference between an issue and a risk. A risk is something that might happen, while an issue is something that has happened. For example, climate change is an issue; an extinction-level asteroid strike in the next 100 years is a risk.
The first step to managing risks is to identify them early and take actions that reduce them, known as mitigations. A key tool to help you manage risk is a Failure Mode and Effects Analysis (FMEA). You use this to find and prioritize your risks.
There are four stages to building your FMEA:
- What are the risks that might arise during the project?
- How important are these risks?
- How and when should we respond to these risks?
- Mitigate and monitor
What are the risks that might arise during the project?
Gather a group of people: some who understand the project well, but also a couple of pairs of fresh eyes. All the disciplines that are involved in the project should be represented, possibly including engineering, manufacturing, sales, support, and marketing.
Once you’ve gathered the group, it’s time for some brainstorming. Just ask, “How could something go wrong (or right) that is a deviation from our plan?”
I like to break the problem into categories, like mechanical and electrical, to help make sure every area of the project is touched upon. No risk is too small, and everything is added to the list without filtering. There’s just enough discussion that the team agrees what the risk is and the text is unambiguous.
Let everything sit for a few days. It’s not unusual for someone to email you with more ideas after the meeting. That’s great; it’s a sign that the team is taking the exercise seriously. Just add everything to the list.
As an example, if you’re building an electrically powered device with a pressure vessel, your risks might include:
| Category | Risk |
| --- | --- |
| Mechanical | Pressure vessel fails |
| Electrical | Power demand exceeds requirements |
| Thermal | Temperature exceeds material limits |
| Mechanical | Weight exceeds requirement |
Table 1: List of risks from brainstorming session
How important are these risks?
After a few days, convene the group again. Add any ideas for risks that have occurred to people since the last meeting, but don’t restart the brainstorming.
Then it’s time to get to the hard part, quantifying the risks for severity and probability. First, set the ground rules. Each risk should be rated on a 1 to 5 scale for severity and probability. Some teams like a 1 to 3 or 1 to 10 scale, but I find 1 to 5 provides the right amount of precision.
The exact definitions can vary, so it’s important to define them for the project you’re scoring. You should have something like this:
| Severity | Definition |
| --- | --- |
| 1 | Minor impact to lifespan of product or functionality. Probably not noticed by end-user. |
| 2 | Some impact to lifespan of product or functionality. May be noticed by end-user, but not sufficient to create a poor impression. |
| 3 | Impacts the lifespan of product or functionality. Will be noticed by end-user and may lead to poor reviews. |
| 4 | Significant impact to the lifespan of product or functionality. Will lead to warranty returns and poor reviews. |
| 5 | Creates a safety hazard to the end-user. |
Table 2: Severity will be different if the worst case is triggering a Richter 7 earthquake or having to reboot your computer. Severity 5 is the worst thing that can happen, and the scale works down from there.
| Probability | Frequency |
| --- | --- |
| 1 | Happens 1/1,000,000 per unit per year |
| 2 | Happens 1/100,000 per unit per year |
| 3 | Happens 1/10,000 per unit per year |
| 4 | Happens 1/1,000 per unit per year |
| 5 | Happens 1/100 per unit per year (if you have 100 units in the field, you’d expect to see this once per year) |
Table 3: The probability should be scaled based on the number of units you expect to deploy. A probability of 1 should be rare, but not never, while a probability of 5 should be often, but not all the time. The above example would be appropriate for a device that sells about 100,000 per year.
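To sanity-check a probability scale against your own volumes, it can help to turn each rating into an expected number of events per year across the fleet. Here is a minimal Python sketch of that arithmetic. The rates come from Table 3; the fleet size of 100,000 units (one year of sales) and the names `RATE_PER_UNIT_YEAR` and `expected_events_per_year` are my own assumptions for illustration.

```python
from fractions import Fraction

# Probability ratings from Table 3, expressed as events per unit per year.
RATE_PER_UNIT_YEAR = {
    1: Fraction(1, 1_000_000),
    2: Fraction(1, 100_000),
    3: Fraction(1, 10_000),
    4: Fraction(1, 1_000),
    5: Fraction(1, 100),
}


def expected_events_per_year(rating: int, units_in_field: int) -> Fraction:
    """Expected number of occurrences per year across all units in the field."""
    return RATE_PER_UNIT_YEAR[rating] * units_in_field


# Assumption: roughly one year of sales (100,000 units) is in the field.
UNITS_IN_FIELD = 100_000

for rating in sorted(RATE_PER_UNIT_YEAR):
    events = expected_events_per_year(rating, UNITS_IN_FIELD)
    print(f"Probability {rating}: about {float(events):g} events per year")
```

With 100,000 units in the field, a rating of 1 works out to about one event every ten years and a rating of 5 to about 1,000 events per year. If those extremes don’t feel like "rare, but not never" and "often, but not all the time" for your product, rescale the table.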
Some teams include Discoverability (i.e. the difficulty of determining if the risk has happened). I find that adds more complexity, but not a lot of value. It also lowers the relative RPN of risks that are easy to discover, which might lower the priority of a critical risk.
You now go through each risk and rate how likely you think it is to happen and the severity (positive or negative) if it does.
Everyone needs to understand that these are Scientific Wild-Ass Guesses (not to be confused with a Stupid Wild-Ass Guess): Don’t spend a lot of time arguing whether it should be a 3 or a 4. Just put down a consensus number and keep moving forward.
When in doubt, go with the higher number. You also need to calculate the Risk Priority Number (RPN) for each risk, which is Severity × Probability. When you’re done, sort the risks by RPN.
Do not discuss how to solve the problems at this meeting.
| Category | Risk | Sev | Prob | RPN |
| --- | --- | --- | --- | --- |
| Thermal | Temperature exceeds material limits | 5 | 2 | 10 |
| Mechanical | Pressure vessel fails | 5 | 2 | 10 |
| Electrical | Power demand exceeds requirements | 2 | 3 | 6 |
| Mechanical | Weight exceeds requirement | 3 | 1 | 3 |
Table 4: The list of risks after you’ve rated the Severity and Probability of each one. The RPN is a measure of how important each risk is.
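If you keep your risk list in a script rather than a spreadsheet, the scoring and sorting step might look like the sketch below. The `Risk` class and its field names are hypothetical, chosen for this example; the ratings are the ones from Table 4, entered in the original brainstorming order.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    category: str
    description: str
    severity: int      # 1 to 5, per the severity definitions
    probability: int   # 1 to 5, per the probability definitions

    @property
    def rpn(self) -> int:
        # Risk Priority Number = Severity x Probability
        return self.severity * self.probability


risks = [
    Risk("Mechanical", "Pressure vessel fails", 5, 2),
    Risk("Electrical", "Power demand exceeds requirements", 2, 3),
    Risk("Thermal", "Temperature exceeds material limits", 5, 2),
    Risk("Mechanical", "Weight exceeds requirement", 3, 1),
]

# Highest RPN first. Python's sort is stable, so risks with equal RPN
# keep the order in which they were entered.
ranked = sorted(risks, key=lambda r: r.rpn, reverse=True)

for r in ranked:
    print(f"{r.rpn:>3}  {r.category:<10}  {r.description}")
```

The two risks with an RPN of 10 tie, so their relative order is arbitrary; what matters is that both land at the top of the list.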
How and when should we respond to these risks?
After a few more days of letting things settle, bring the group together again. Your rules should state a threshold, and every risk with an RPN above that number must have a mitigation intended to move the RPN below the threshold. Looking at our example, we’ll create mitigations for risks with an RPN of more than 5.
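The threshold rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `needs_mitigation` is hypothetical, and the ratings are the ones from Table 4.

```python
RPN_THRESHOLD = 5  # from the example: mitigate any risk with an RPN of more than 5

# (risk, severity, probability), taken from Table 4
rated_risks = [
    ("Temperature exceeds material limits", 5, 2),
    ("Pressure vessel fails", 5, 2),
    ("Power demand exceeds requirements", 2, 3),
    ("Weight exceeds requirement", 3, 1),
]


def needs_mitigation(severity: int, probability: int,
                     threshold: int = RPN_THRESHOLD) -> bool:
    """True when Severity x Probability is above the agreed threshold."""
    return severity * probability > threshold


to_mitigate = [name for name, sev, prob in rated_risks
               if needs_mitigation(sev, prob)]

for name in to_mitigate:
    print("Needs a mitigation:", name)
```

Three of the four example risks clear the bar; only the weight risk, with an RPN of 3, is left alone.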
Temperature exceeds material limits
It’s hard to imagine a situation where the temperature of the device exceeding your material limit is not very bad, so we’re better off attacking probability than severity. A solution to this might be to select a material that can tolerate very high temperatures or add a design element that effectively limits the maximum temperature.
Pressure vessel fails
Like the temperature risk, we’re better off going for probability. One possible approach is to do a finite-element analysis (FEA), which is a computer simulation of the forces on the device. If the FEA says our design is safe with a margin of 100%, we can set the probability to 1.
Power demand exceeds requirements
Our initial calculations suggest that the power demand is likely to exceed what our marketing team wants by 20%. The marketing team says that 97% of the market can provide 30% more power than the current requirement and they’re okay with changing the requirement. Often the mitigation is changing a requirement, not the design.
Weight exceeds requirement
The RPN here is below our threshold, so there’s no need to create a mitigation plan.
When you’re done, you should have something like this:
| Category | Risk | Sev | Prob | RPN | Mitigation |
| --- | --- | --- | --- | --- | --- |
| Thermal | Temperature exceeds material limits | 5 | 2 | 10 | Explore material choice and mechanism to limit maximum temperature |
| Mechanical | Pressure vessel fails | 5 | 2 | 10 | FEA |
| Electrical | Power demand exceeds requirements | 2 | 3 | 6 | Increase power requirement |
| Mechanical | Weight exceeds requirement | 3 | 1 | 3 | None needed (RPN below threshold) |
Table 5: It’s common to include the Sev and Prob numbers you expect after mitigation in this chart.
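To show what those post-mitigation numbers look like, here is a small Python sketch for the one risk where the example gives them: the pressure vessel, whose probability drops to 1 if the FEA shows a 100% margin. I’m assuming that an RPN equal to the threshold of 5 counts as acceptable, since only risks with an RPN of more than 5 required a mitigation; your own rules may be stricter. The helper name `rpn` is hypothetical.

```python
RPN_THRESHOLD = 5  # assumption: an RPN of 5 or less needs no further mitigation


def rpn(severity: int, probability: int) -> int:
    """Risk Priority Number: Severity x Probability."""
    return severity * probability


# Pressure vessel fails: ratings before and after the FEA mitigation.
before = rpn(severity=5, probability=2)
after = rpn(severity=5, probability=1)  # probability set to 1 if the FEA shows a 100% margin

print(f"Pressure vessel RPN: {before} -> {after}")
print("Acceptable after mitigation:", after <= RPN_THRESHOLD)
```

The severity stays at 5 because a failed pressure vessel is still a safety hazard; only the probability moves. The expected numbers for the other mitigations would come out of your own team’s discussion.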
Mitigate and Monitor
It’s now time for the team to implement the mitigations. After an agreed amount of time, the same team should meet and review the situation:
- Have we selected a material or a control mechanism to lower the maximum temperature?
- What did the FEA show about the forces on the pressure vessel?
- Have we updated the requirements document with the new power requirements?
- Have we learned of any new risks that we neglected previously?
In my example, I’ve focused on how things can unexpectedly go wrong, but the same process works for things that go better than expected. Capturing opportunities is almost as important as resolving risks.
The intention is to reduce risks and uncertainty early, because some of the mitigations might force a significant redesign, which is a lot easier early in the project. The FMEA is just the starting point. Go forth and reduce uncertainty and successfully deliver to your stakeholders’ delight!