“We’ve totally lost control of the plane! We don’t understand at all!”
– David Robert, co-pilot, Air France flight 447, June 2009
“I’ve had the stick back the whole time!”
– Pierre-Cedric Bonin, co-pilot
A few weeks ago, the final report (PDF) on the tragedy of Air France flight 447 – the plane that disappeared inexplicably over the Atlantic in 2009 – was released. The key factors leading to the disaster were the failure of the air speed indicator instruments, followed by pilot error – “the pilots were overwhelmed.”
But CBS News, interviewing famous former US Airways pilot Chesley Sullenberger, notes an apparent design decision that appears to have been a critical contributing factor: the cockpit design is different on an Airbus aircraft (Flight 447 was an Airbus A330) than on Boeing aircraft, with a tragic result.
This video is well worth 8 minutes of your life, as it illustrates how design affects decision-making under stress. Go ahead, I’ll wait:
Note how the Boeing aircraft, simply by the design of its mechanically linked yokes, forces the two pilots to always be aware of the yoke’s position – and therefore the pitch input being commanded. From the flight data recorder, it is apparent that neither of the crewmembers in the Airbus-built cockpit realizes that one of them is inadvertently pulling back on his side-stick – raising the nose of the airplane and stalling it – until it is too late. The A330’s side-sticks are not linked, and there is no other direct indicator of the angle of attack in the cockpit, so there was no way for Mr. Robert to know that Mr. Bonin had been inadvertently stalling the airplane the whole time.
Poka yoke is a term from the Toyota Production System that means “mistake-proofing” – or designing tools or processes in such a way as to make errors less likely. Many examples are startlingly simple:
- 3-prong electrical outlets that only allow plugs to be inserted in one way.
- The memory cards in most phones and digital cameras are shaped so that they can only be inserted one way, preventing the “error” of inserting them upside down.
- The safety bar on most lawnmowers today turns off the mower when the bar is released.
That simplicity, in retrospect, seems obvious. But somewhere along the way, an engineer or a designer made the decision to intentionally choose a design that makes the error impossible. That decision is taken for granted by users of those products.
Usually, poka yoke is seen in consumer devices – as above – or in manufacturing processes where human error is likely (e.g. on an assembly line, packaging screws into one-time-use kits for each product, making it easy to see if parts are left over). But it also applies to other lines of work where complex decision-making under stress may lead to errors.
In many professions, the assumption is that humans are trained – like pilots – to handle complex situations. Design decisions are probably made under the assumption that the user is already highly trained and experienced. You don’t need to “idiot proof” everything when your user is an expert, do you?
But Flight 447 illustrates that this assumption can fail when the “expert” is under severe stress – say, when a critical instrument like the air speed indicator stops working – and the expert cannot comprehend why the complex system is not responding as he believes it should. Boeing’s design can compensate for such failures, at least in part. Its decision to keep the yokes mechanically linked, physically in front of both pilots and visible to both, makes sense in retrospect. Airbus’ design, in contrast, uses independent side-sticks, so the actions of one pilot are harder for the other to perceive and must be communicated verbally. That adds a crucial step which, in this case at least, seems to have contributed to a tragic disaster.
This principle holds true in other highly trained professions. Expert surgeons have been known to make egregious errors like amputating the wrong leg from a patient. These errors can largely be prevented by simple intentional actions, like using a marker to clearly identify which leg is to be removed, or taking a surgical “timeout” to confirm everything before the procedure begins.
Cultural Impediments to Effectively Using Poka Yoke
What’s tricky about poka yoke isn’t the design decision itself. In retrospect, it’s pretty obvious that an electrical outlet should only fit one way, right? It’s obvious you should double check before you cut off someone’s leg, right? It’s obvious that if the airplane’s stall warning goes off 75 times that you’re probably pulling back on the yoke, right?
No, what’s tricky is to recognize that the situation calls for “idiot proofing” at all. This is where culture comes in. Too often, we don’t even think that design decisions to reduce errors are necessary. We’re dealing with highly trained experts, right?
Here are five ways cultural issues can interfere with preventing errors through design decisions:
- Assuming people won’t make mistakes. Your organizational culture should choose design decisions that, where possible, take risk out of the equation entirely – even if the user is an expert. Too often a professional will take offense at an attempt to “design away” a mistake they might make. You must fight this tendency.
- Arrogance by team members. Everyone on your team must be humble enough to accept that anyone on the team is susceptible to errors. People who see themselves as lone-gun heroes – as experienced or brilliant as they may be – are the most dangerous members of a team if they can’t accept that they, too, can and will make mistakes. If they can’t accept that, they will usually fight against possible poka yoke attempts to design away the risk.
- Toxic environments that prevent communication about errors. Your organizational culture must allow anyone to call out mistakes – or the risk of mistakes – by anyone on the team, regardless of rank or experience. Fear of reprisal or fear of offending a senior team member can lead to catastrophic error, like airplane crashes or surgical mistakes. Nurses, for example, may be hesitant to call out an expert surgeon in a toxic culture, fearing ridicule.
- Performance improvement separate from the work itself. Many knowledge workers make the assumption that performance improvement is something that happens outside the work team. For example, if you have an external process improvement group, other team members may not see it as their responsibility. This is a critical failure, and should be addressed by putting performance improvement – and error prevention – into the cycle of work the team itself is already doing. In software engineering, this is done using retrospectives after each iteration.
- Too much reliance on “process,” “training,” or Standard Operating Procedures to prevent errors. All of those are valid and useful tools, but your organizational culture should not reach for an SOP as the default error-prevention tool when it is possible to design away the risk entirely. It is easier to simply write a procedure than to truly examine a process – but easier doesn’t mean more effective. Writing SOPs for everything simply adds to the training burden of your organization and does little to prevent the error from occurring. Take the time to consider whether the procedure should even be necessary, and whether there is a simpler, mistake-proof design instead. For example: sure, we could train the world to insert plugs into outlets only one way – or we could simply design the problem away entirely. Which is the better option?
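The same trade-off shows up directly in code. Here is a minimal, hypothetical Python sketch (the `Meters`/`Feet` types and `set_altitude` function are invented for illustration) of designing away a unit-mixing mistake that an SOP – “always pass meters, never feet” – could only discourage:

```python
from dataclasses import dataclass

# Hypothetical example: instead of a procedure telling callers which unit
# to use, encode the unit in a distinct type so the mistake cannot occur.

@dataclass(frozen=True)
class Meters:
    value: float

@dataclass(frozen=True)
class Feet:
    value: float

    def to_meters(self) -> "Meters":
        return Meters(self.value * 0.3048)

def set_altitude(altitude: Meters) -> str:
    # Only a Meters value is accepted; a bare float or a Feet value is
    # rejected outright, so the unit error is impossible, not just forbidden.
    if not isinstance(altitude, Meters):
        raise TypeError("set_altitude requires Meters")
    return f"altitude set to {altitude.value:.1f} m"

# A caller holding feet is forced through an explicit conversion:
print(set_altitude(Feet(10000).to_meters()))  # → altitude set to 3048.0 m
```

The poka yoke here is structural: the design removes the opportunity for the error, rather than documenting how to avoid it.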
It is perhaps worth noting that Airbus has not, at this time, made a decision to change its cockpit configuration, relying instead on additional training for pilots in handling the loss of the air speed indicators during flight and managing stalls at high speed.
Application to Software Engineering at Geonetric
The various examples of errors in this article clearly demonstrate that being an expert surgeon or a highly trained pilot does not mean you can’t make a mistake. This holds true in software development as well. In prior posts I’ve been talking about how teams need to be responsible for performance improvement – and Kevin recently posted two examples of poka yoke design decisions that Geonetric’s software engineering team uses to help us prevent errors:
- Clothespins: Yes, wooden clothespins – to prevent too much Work In Process from being accepted into the system, reducing work velocity.
- Sprint Confidence Rating tool: A bunch of smiley faces and numeric indicators used daily to force clear communication between team members, improving the predictability of our engineering speed.
These, too, are startlingly simple solutions. Kevin’s explanations make them seem obvious and intuitive, and yet the problems they address are common on software engineering teams. Such seemingly minor improvements – added up over years of continual improvement – can and do make processes more reliable and less error prone.
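The clothespin idea can be made concrete in code. Below is a hypothetical sketch (the `WorkQueue` class is invented for illustration, not Geonetric’s actual tooling) in which the WIP limit works like the physical clothespins: accepting work beyond the limit is simply impossible, with no procedure or discipline required:

```python
class WorkQueue:
    """A work-in-process queue with a hard WIP limit.

    Like a fixed supply of clothespins: when none are free,
    new work physically cannot be accepted.
    """

    def __init__(self, wip_limit: int):
        self.wip_limit = wip_limit
        self.in_process = []

    def accept(self, item):
        """Try to pull an item into process; refuse if no 'clothespin' is free."""
        if len(self.in_process) >= self.wip_limit:
            return False  # limit reached: the item must wait
        self.in_process.append(item)
        return True

    def finish(self, item):
        """Completing an item frees a 'clothespin' for the next one."""
        self.in_process.remove(item)

# Usage: with a limit of 2, a third item is refused until one finishes.
q = WorkQueue(2)
q.accept("story-1")   # True
q.accept("story-2")   # True
q.accept("story-3")   # False – no clothespin free
q.finish("story-1")
q.accept("story-3")   # True
```

The design choice mirrors the physical token: the constraint lives in the mechanism itself, not in a rule the team must remember to follow.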
We’re fighting hard to make sure that we have a culture that actively seeks performance excellence, through poka yoke techniques and a culture where performance improvement is an expected and regular part of everyone’s roles. If you’re interested in working with a team like that, we have open positions listed on our website.