Friday, March 16, 2012

The "Blame Spiral": How a blame culture destroys Projects and what to do about them.

James T. Reason has a very well developed model of the "Blame Cycle", e.g. "Diagnosing “vulnerable system syndrome”: an essential prerequisite to effective risk management" (Qual Health Care 2001;10:ii21-ii25, doi:10.1136/qhc.0100021) and "Managing the Risks of Organizational Accidents" (1997).

It is based on:
  • The Fundamental Attribution Error: misidentifying the root cause of an event (a person who chose to do it, rather than a multi-factorial Organisational Error).
  • A "Person Model", not an "Organisation Model", of errors, and
  • the belief that, if informed, people will just stop making mistakes.
  • [and there is much more to it than this]
The remedy to the "Blame Cycle" is creating a "Safety Culture", in which Deming's exhortation to "Drive out Fear" is conscientiously and consistently practised.

All of which is correct, but doesn't explain three things:
  • Why, after around 25 years of writing, research and implementations by Reason and Perrow, and around 80 years since H.W. Heinrich's "Industrial Accident Prevention, A Scientific Approach" (1931), are Blame Cultures still the norm rather than the exception, even in High Safety environments like Healthcare? Aviation and space flight (e.g. NASA) seem to be leaders in the implementation and practice of the "Safety Culture" approach.
  • After more than a century of definitive, proven Management Science theories, why does the Default Management Style, of which the "Blame Culture" is one aspect, still prevail? It isn't just that better techniques/systems aren't known or aren't practiced, but that organisations revert from their good practices. World leaders, like Kodak and General Motors, stop their successful practices and go back to known worst practices and suffer terminal decline. How can this be so in a rational, well-informed world?
  • Individuals in teams and projects start out with good intentions and high hopes, only to end up mired in the tarpits of Blame. How can this happen over and over again? What is the common, systematic element, or where are the payoffs?

Here is my description of "The Blame Spiral", how things work in the Real World with the Default Management Style in I.T. Projects.

Jim McCarthy in "Dynamics of Software Development" (1995, 2006) made many pertinent observations about the software development process (positing 54 'laws'), which I've seen confirmed repeatedly in many settings. Two of them are:
  • The Team is the Software, the Software is the Team. All of the assumptions, biases and limitations of the individuals in the team and the dynamics/relationships of the Team show up in the Software. To have successful Software produced, it is first necessary to have a successful Team.
  • "Don't flip the Bozo Bit". If there is an error or an individual seems to do something stupid ('be a Bozo'), do NOT blame them, ever. It is toxic to the team and because the Team is the Software, it will destroy the project.
The effects of Blame operate at four levels:
  • identifying and fixing the real cause of the error, fault or failure,
  • the blamed individual(s),
  • those who've avoided blame (this time around), and
  • within the Team Context or the "management" view.

ROUND 1. The First Blaming.

The first individual to be assigned blame, rightly or wrongly, will feel bad in some way: humiliated or guilty perhaps, resentful definitely. They will more than likely end up with low self-esteem (worse if not resilient) and are guaranteed to be disengaged, disaffected and ambivalent towards the Organisation, their Manager and "management" in general, the Project/Product and possibly all or some of the other team members. Special vitriol and even hatred will be felt, even manifested, towards the person perceived to have "fingered" them, to have identified them as blameworthy.

Externally, the First Blamed may be stoic, even accepting of the situation. Internally, they will have at least withdrawn their commitment and may be actively seeking, if not revenge, then 'satisfaction' through either passive-aggressive acts or active undermining and sabotage, depending on their proficiency at and predilection for "playing politics". They will not be contributing their best work and, if not ostracised, will be spreading dissatisfaction and ill-will.

The others, non-Blamed Persons, in the team will feel a mixture of Relief (it wasn't me!) and Caution (I need to protect myself). They've learnt the workplace is not a completely Safe Environment and messages must be 'tempered' with a little self-interest.

The Management view is that "the squeaky wheel has been fixed", that a Problem (the Blamed) has been identified and swiftly and effectively dealt with before it had a chance to escalate and create real harm. Perhaps the proactive Project Manager has publicly berated the First Blamed to "show what happens when you mess up around here". Senior Managers will view the Project Manager as decisive and effective.

Meanwhile, the root cause has not been identified nor corrected.
The initial problem has been papered over, ready to come back to haunt the team and undermine the project in every successive round.

ROUND 2. The Second Blaming.

Because the root cause has not been diagnosed and corrected, another Blameworthy event must arise (an Error, Fault, Failure, Deadline Overrun or Delivery/Feature Shortfall).
It is unlikely to be seen as caused by the First Blamed, who should by now have been re-assigned to other duties.
The Second Blamed will go from feeling pleased with themselves to feeling worse than the First Blamed initially felt - they know what is coming and will experience dread at the thought. They experience a greater drop in self-esteem than the First Blamed - self-doubt, self-criticism and disaffection/disengagement will be greater in this individual.

Now the First Blamed has an ally and someone to commiserate with. They will form a mutual admiration society, bitching together about their 'team' mates, their Managers, the Organisation and everyone they've felt wronged by... The chance of escalating to active undermining and sabotage grows.

The rest of the 'team members' will probably feel a little more superior, they've dodged a bullet twice now, but their level of Caution/Concern over becoming Blamed will escalate, at some point to real Fear.  With difficult creative cognitive tasks, Fear immobilises and robs them of their abilities.
In this round, the 'Team' is now dissolved - it's "every man for himself", there is an underground "anti-management" faction, almost everyone is deliberately not doing their best work and every individual's creative potential is seriously compromised.

Truth is the first victim of Blame.

No project member will now be open and honest about problems they are experiencing, so they cannot openly ask for help and guidance. They will deliberately fudge their numbers/estimates and deliberately hide any evidence of problems in their code, to the point of active deception (faking test results, cooked interface responses, lying about work completed).

The Project Manager (PM) will see a very busy group, "just humming along" without any interruptions. On every measure, including notional progress, they will appear to be The Perfect Team comprised solely of World's Best Coders. The PM will be exceedingly happy with the effect his "tough hard-nosed approach" has created and will crow about his prowess to anyone that will listen, especially his superiors. Bonuses will be considered, the Project Group will be held up as the epitome of performance, quality and success and overall everyone but those on the Project will be patting themselves on the back admiring how clever they all are.

That the original problem was never found and fixed is long forgotten.
It has spawned a dozen work-arounds and a legion of smaller problems that nobody can or will identify. Each of these spawning more... The pace of work, frightened atmosphere and rush-to-deadline mean that all deep inspection and significant Quality reviews will be forgotten, avoided, faked or circumvented. Every day more faults are introduced and undetected, quickly rising to the point that the daily committed future work, in the form of time needed for future bug-fixes, exceeds the daily rate of progress.

The Project has entered the "Nett Negative Progress" zone. Every day of "production" pushes the delivery date further off, taking the "Team", the Software and Organisation farther away from its Goal every day... But all the metrics, estimates and reports being "fed up the line", tell a different story, a wonderful fantasy land of glowing results, outstanding progress and wonderful Zero Defect Software.

The Team, and hence the Software, has disintegrated and turned Toxic; it's only downhill from now on. The individuals are quite rationally protecting themselves as best they can and saying anything but the Truth.

ROUND 3. The Rest of the Blamings.

The project is now an official "Death March". Everybody at the coalface knows it is dead and irretrievable. They are all going through the motions, faking progress or pushing their problems "over the fence" to Integration Testing (an ideal Cover-Your-Ass (CYA) ploy: "It was perfect when it left here"). Everyone is running scared, looking for ways out or up, and cliques are forming for mutual self-protection. On the walls you're likely to see the age-old adage:
 "The Floggings will continue until Morale Improves!"

Meanwhile, management will have its Golden Haired Favourites, "super-programmers" it can wheel into any crisis and who will beat any and all problems into submission in a trice. But this hacking and fudging only makes the Software worse as a whole, pushing back further any possible completion date because old bugs are hidden or moved to unlikely places, or new, very subtle bugs are introduced. The Golden Ones are "teflon coated": nothing can ever be laid at their feet, even if fully documented and proven.

Management at some stage has secured extra funding and employed more people in an effort to "push this important project out the door". [The Standish Group reports that abandoned projects consume on average 200-300% of original budget before being cancelled]. The "Team" has grown significantly and has been broken into multiple groups working on 'sub-projects', usually led by the original members of the team. That "Brooks's Law" ("adding more people to a late project makes it later") has been documented since 1975 is ignored. Somehow it doesn't apply to this group or The Project.

All cliques and power-groupings will still be submitting unrealistic schedules and estimates. This is a game called "Schedule Chicken": whoever has their lies uncovered first, or can't effectively shift Blame onto others, loses. The best players, not coders, are the ones who can fudge their numbers so they never have to admit overshooting their deadlines. When another group is declared the loser ("look, you can't meet your deadline!"), all the other groups then use the extra time to work towards their deadlines. Of course, none of this appears in the project's reports and metrics.

A wonderful side-effect is that overheads (communications) increase super-linearly. To double the output of any group, you have to triple the number of people in the group. With every hierarchical level added to the Project, the ratio of communications overhead increases. More meetings are needed, more decisions need to be explained, more "Compliance and Good Governance" steps are needed and the proportion of productive time spent programming declines...

Which, when you're in the "Nett Negative Progress" zone, is A Good Thing: it actually slows the rate at which the deadline is pushed back every day.
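
To put rough numbers on that rule of thumb: if tripling the headcount only ever doubles the output, then output grows as headcount to the power log 2 / log 3 (about 0.63). The Python sketch below assumes the rule holds at every scale, which is a simplification; the function names and the baseline group of 5 people are purely illustrative.

```python
import math

# Assumption: "to double the output, triple the people" holds at every scale,
# so output grows as headcount ** k, where 3 ** k == 2.
SCALING_EXPONENT = math.log(2) / math.log(3)  # about 0.63


def relative_output(headcount: float, base_headcount: float = 1) -> float:
    """Group output relative to a group of base_headcount people."""
    return (headcount / base_headcount) ** SCALING_EXPONENT


def output_per_person(headcount: float, base_headcount: float = 1) -> float:
    """Average output per person, relative to a person in the base group."""
    growth = headcount / base_headcount
    return relative_output(headcount, base_headcount) / growth


# Hypothetical project that starts with 5 people and keeps tripling.
for people in (5, 15, 45, 135):
    print(people,
          round(relative_output(people, 5), 2),
          round(output_per_person(people, 5), 2))
```

Under these assumptions a group of 45 produces about four times what the original 5 did, so the output per person has fallen to under half.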

For "creatives", like programmers, stress and exhaustion, the inevitable result of organisational pressure to perform and long work hours, have a triple-whammy effect:
  • The absolute rate of production slows. The number of Lines of Code per person-day falls, often dramatically.
  • "Creatives don't do good, let alone their best, work when 'stressed and tired'". The "degree of difficulty" of problems that individuals can solve when chronically tired and stressed declines, as is well known in many areas where "Human Factors" are seriously studied, like NASA (think Apollo 13 astronauts at the end of the trip) or in various war-fighting specialities. The project might notionally have a band of high-performing Senior Programmers, but they've been "derated" to Ordinary or Junior capability.
  • The Undetected Error Rate (causing Rejects, Rework and Returns) increases super-linearly ('exponentially' is the colloquialism). Every extra hour worked per week increases the average fault-rate for every hour worked. Every 5-10 additional hours worked at least doubles the average fault-rate. [At 35 hours, fault rate is 1 unit/hr, so 35 units/week/coder. At 40 hrs, 80 units/week/coder. At 45 hrs, 180 units/week/coder. At 50 hrs, 400 units/week/coder...]
The Undetected Error Rate creates committed future work of fault correction (rework). All rework also has at least the same error injection rate, often much higher due to "distance" effects. Tired and Stressed coders are less capable of finding and fixing faults, with the added problem of unintentionally making things worse, often with rookie mistakes.
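
The bracketed figures above follow a simple doubling rule, sketched here in Python. It assumes, as those figures do, a baseline of 1 unit/hr at a 35-hour week and a doubling of the rate for every extra 5 hours (the steeper end of the 5-10 hour range). The names are illustrative, and the numbers are this post's examples, not measurements.

```python
# Assumptions taken from the illustrative figures in the text:
BASELINE_HOURS = 35   # hours/week at which the fault rate is 1 unit/hr
DOUBLING_HOURS = 5    # extra weekly hours that double the hourly fault rate


def fault_rate_per_hour(hours_per_week: float) -> float:
    """Average fault rate (units/hr) across every hour worked that week."""
    return 2 ** ((hours_per_week - BASELINE_HOURS) / DOUBLING_HOURS)


def faults_per_week(hours_per_week: float) -> float:
    """Fault units injected per coder per week."""
    return hours_per_week * fault_rate_per_hour(hours_per_week)


for hours in (35, 40, 45, 50):
    print(hours, faults_per_week(hours))
```

Going from 35 to 50 hours is about 43% more time at the desk, but under this rule it produces more than eleven times the faults.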

No single fault in any commercial programming environment takes less than half a programmer-day to fix (ancillary time to check, document and track faults is routine administration and should not be on the Project's critical path).

Rephrased, commercial programmers can pick up, analyse, find, fix, document and pass-on at best 2 faults per day. This becomes the (feedback) loop stability criterion for Software Development:
 The faults introduced per coder-day must be less than the Fault Fix Rate per coder-day for the Project to ever deliver.
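
The stability criterion can be written down in a few lines. This Python sketch assumes constant rates, the fix rate of 2 faults per coder-day given above, and (optimistically) that rework itself introduces no new faults. The function names are made up for illustration.

```python
FIX_RATE_PER_CODER_DAY = 2.0  # "at best 2 faults per day", from the text


def net_progress_per_coder_day(faults_injected_per_coder_day: float,
                               fix_rate: float = FIX_RATE_PER_CODER_DAY) -> float:
    """Days of real progress gained per coder-day worked.

    A day of work notionally adds one day of progress, but every fault
    injected that day commits 1 / fix_rate days of future rework.
    """
    return 1.0 - faults_injected_per_coder_day / fix_rate


def can_ever_deliver(faults_injected_per_coder_day: float,
                     fix_rate: float = FIX_RATE_PER_CODER_DAY) -> bool:
    """The loop stability criterion: injection rate must stay below fix rate."""
    return faults_injected_per_coder_day < fix_rate


# Hypothetical injection rates, in faults per coder-day.
for injected in (0.5, 1.0, 2.0, 3.0):
    print(injected,
          net_progress_per_coder_day(injected),
          can_ever_deliver(injected))
```

A result at or below zero is the "Nett Negative Progress" zone: each day worked commits at least a full day of future rework.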

The Project Manager will be working 120 hour weeks, barking at everyone and threatening "the direst of consequences" for anyone found not to be "pulling their weight". His Senior Managers will be mightily impressed with both his dedication and forceful 'control' of the project, lining him up for Bigger and Better Projects and possibly a path into Senior Management itself.

The ultimate Blamings, Firings, may now start.
The hard-thrusting PM will be wanting to "make an example of them for others" and "to show we are serious about achieving our targets".

Senior Management will be exceedingly impressed with this "standing up to the Unions/Workers" approach.

Those on the coalface will become more disengaged, fearful and disenfranchised. Many will give up trying because "what's the point? I work 80-hour weeks for terrible pay and can do nothing to please anyone".

Once the purges have started, those that can leave (the best, brightest and most knowledgeable) will leave first. The Project is now getting rid of exactly those people that can save it...

ROUND N. The End Game.

Finally, the Death March is ended, The Great Project is somehow wound up.
Senior Management have had many gut-wrenching meetings and "taken an extremely hard decision" to cancel, postpone or "Reset" (code for start-over, completely afresh) The Project.

Sometimes, in Bureaucratic Double-Speak, the Project is declared finished and "A Great Success". The Project Office is shut down, awards and bonuses are given, promotions made and contractors and temps 'let go', with the software never being deployed or going into full-scale production. A variation is that some small functional subset, usually created in the first 6-12 months, is put into production and declared to be The Project Deliverable.

The bean-counters will have been consulted to create the best fiscal-reporting effect, especially if the "Great Success" route was taken.

The Project Manager will be feted and covered in praise.
Senior Management, who funded and allowed this rolling disaster, will do a Project Review and decide that "we encountered problems much harder than we anticipated", "there were severe technical issues we were unable to resolve" and "we need tighter processes in future to avoid the same pitfalls", etc...

None of which comes near the reality: Blame is toxic, it will kill every Project it is allowed to enter.

Which has led to this widespread cynical view from the trenches:
The five project phases:
  • Unbridled enthusiasm: Unrealistic promises 
  • Disillusionment
  • Panic
  • Investigation: Dodge blame, Search for scape-goats
  • Punish the innocent, Promote the guilty, Reward the uninvolved

There are two topics left to cover:
  • The promise in the title, "Blame Spirals ... and what to do about them.", and
  • "Blame Spirals" elsewhere.

What to do about Blame Spirals?

Follow Deming's exhortation to "Drive out Fear". If that were easy and simple, it would be the norm, not the exception. Generally, this behaviour can only flow from the top...
  • If you're on the coalface, you can work to become a Golden Haired Favourite, find a better project to work on, play Office Politics better than everyone else or take an extremely risky option: try to float above it all by demonstrating Open, Honest Communication and refusing to buy into the Blame Game. The downside is you'll set yourself up as a target.
  • If you're the Project Manager, you need to model Open, Honest Communication and actively try to engender Trust within the Team. This isn't simple or easy and generally takes courage and dedication. My best advice: get expert help and assistance, it does exist and unfortunately needs more than a one-day talk or reading a book.
  • If you're somewhere above the Project Manager, if your Projects don't reliably come in on-time and on-budget, especially if your Project teams have high churn, then you've got a problem. Again, get expert help and assistance, especially someone that can offer an on-going Mentoring/Coaching service.

"Blame Spirals" in non-Project contexts.

Projects are different to "Business as Usual" operations:
  • The first challenge is "can we complete this as designed at all, or even close enough"?
  • they are condensed, high-pressure and by definition non-routine, not "fully specified" and mostly ill-defined.
  • All required people/skills may not be available when needed, or at all, and necessary resources/tools may need to be built or created. The scheduling task runs opposite to routine production. Something not discussed at all outside the Industry is the 1000:1 (thousand-fold) performance variability in individuals and "nett negative producers", people who every day do more harm than good.
  • Unlike "Business As Usual" (BAU), there is no working process to start with. Projects are building the deliverable (product or process), constructing it in layers (dependencies). The effects of underlying problems are magnified due to consequential problems.
  • Deadlines are aspirational because the design and build task is undefined and uncertain, not because 
  • Every project is different and the exact solution path unknown, with challenges waiting to be discovered.  By definition, it's a voyage of discovery, otherwise it's a defined, repeatable "Business As Usual" task. Projects are about dealing with uncertainty, 'discovery', challenges and the unknown.
I.T. Projects suffer all these constraints, along with:
  • the deliverables are intangible, invisible and often unmeasurable/unquantifiable. Outsiders can't turn up on site and see progress, making the normal large I.T. Project methodology, "Big Bang" (a single-event deliverable of 'everything', versus frequent small/incremental releases), doubly wrong. Problems and delays are only visible at the end, when it's too late to do anything about them: redefine goals/deadlines or correct course.
Projects are about keeping Promises that usually someone else has made on your behalf.
They aren't just full of Rumsfeld's "Known Unknowns" but are guaranteed to be riddled with "Unknown Unknowns".

What is essential to Project 'success' is the antithesis of the simplistic model:
"Plan, Schedule, Control, Deliver" (guaranteed to fail in the face of even minor challenges).
To navigate the shoals of "Unknown Unknowns" inherent in I.T. Projects, the Project Manager, the Project Sponsor, the team and the Methodology must be:
  • flexible,
  • adaptable,
  • responsive,
  • innovative,
  • truthful and courageous.
Being rigid and inflexible, unable to request assistance or clarifications, or unable to negotiate changes is the kiss of death for any I.T. Project.

So how does the "Blame Spiral" look on a Production Line, in an office or when providing routine services?

Most work environments, including I.T., cannot suffer the extreme breakdown experienced in I.T. Projects because some, or all, of the pre-conditions are missing. If you know how to reliably and economically deliver products, services and tasks on-time and to-specification, the consequential dependencies required for the "Blame Spiral" are not possible.

But the Blaming component of the "Default Management Style" still exists. It will express itself in the less extreme "Blame Cycle" of James Reason and/or institutionalised workplace bullying.

BAU tasks are not without variability and uncertainty, but it is constrained and completely new situations are extremely rare - experienced staff can anticipate and correct for "process deviations". An oil refinery needs to be constantly adjusting its process to the feed-stock, the products made and the various maintenance/upgrade activities. The focus is on maximising Plant performance, economics and meeting delivery schedules rather than "just make it work".

There are two obvious Industries where the full "Blame Spiral" can develop, Aviation and Healthcare.
They have aspects in common with I.T. Projects and a few of their own:
  • There are intangible, undefined Outcome Measures: "Safety" and "Quality of Care/Service"
  • "Not twice the same." No two cases or services are identical, hence cookie-cutter solutions will turn deadly for a significant minority of people.
  • "Unknowns", both "Known" and "Unknown" are endemic and coping easily and flexibly with them is central to success.
  • "Success" is ill-defined and a slippery concept.
  • "Efficient and Effective" performance is impossible to recognise in the absence of precise data collection and careful analysis - all of which will be resisted as "a waste of time" if the system is in overload. If data is collected and analysed, the delay means inefficient temporary staff (necessary when overloaded) won't be detected in a timely manner.
  • Problems don't present simply. Correct diagnosis is difficult because there is no simple, consistent mapping of symptoms to diagnosis, and treatment can be time-critical. In both Aviation and Healthcare a trivial problem can become deadly very quickly if just one usual constraint is changed, e.g. an aircraft doesn't have room to manoeuvre or a patient is highly allergic to the usual drugs.
  • There is no "finish line". If staff are efficient and effective, and hence create some "discretionary" (vs committed "reactionary") time to improve systems, management will "for efficiency reasons" cut hours/resources until staff are again overloaded (100% committed to "reacting").
  • Rewards are perverse. Inefficient areas, constantly overloaded and in continual crisis are given more management attention and increased, but insufficient, resources - taken from efficient areas. It's a management sin to underbid the yearly budget, it is doubly wrong for a manager to underspend in any year - not only do they lose the money in that year, but in every future year.
  • Heroic performances are lauded and praised, whilst the unglamorous act of incident-free service from good planning and preparation is dismissed as "you had it easy".
James Reason's "Blame Cycle" is detailed, correct and useful, but misses two important points clearly seen in I.T. Projects:
  • The interaction of Blame with the non-rational, uninformed "Default Management Style", and 
  • the psychological dimension: the predictable reaction of individuals, groups and organisations to Blaming in circumstances that can spiral out of control.
Simplistic Safety and Quality systems, based on formulaic, inflexible action/response "protocols" not only cannot cope with the complex, variable everyday challenges of systems with intangible, undefined Outcomes, but push the organisation down the "Blame Spiral" into Toxic collapse and overwhelm.

Deming's exhortation to "Drive out Fear" is the solution, but must be imposed from the top down. This requires determination and consistency of purpose all through the management chain, along with the identification and elimination of perverse incentives and outcomes.
