Performance Based Regulation (PBR) and Detecting the Pathogens
At a time when Performance Based Regulation (PBR) is a hot topic in the aviation industry, a series of rail accidents in North America helps demonstrate the type of poor performance that PBR must successfully detect. These accidents were what James Reason, Professor Emeritus at the University of Manchester, described as ‘organisational accidents’ in his classic 1997 book Managing the Risks of Organizational Accidents. Reason explained that:
Organizational accidents have multiple causes involving many people operating at different levels of their respective companies.
Such accidents result from ‘latent organisational failures’ that are, according to Reason, like pathogens that have infected the organisation. A key challenge for an organisation’s Safety Management System (SMS) is to detect latent pathogens before they cause harm. PBR needs to give the regulator assurance that the organisation’s SMS is vigilant and effective at doing that.
As we recently discussed, the US National Transportation Safety Board (NTSB) published a special investigation report into the organisational factors that emerged after five accidents at Metro-North (discussed at a special hearing). Metro-North is the second largest commuter railroad, and one of the busiest, in the United States. Between May 2013 and March 2014 Metro-North had five significant accidents resulting in 6 fatalities, 126 injuries and more than $28 million in damages. In 2012 the Federal Railroad Administration (FRA) issued a Notice of Proposed Rule Making (NPRM) that would require a ‘system safety program’, which the NTSB likens to an SMS in other industries. The NPRM states:
Since most of these are procedures, processes, and programs railroads should already have in place, the railroads would most likely only have to identify and describe such procedures, processes, and programs to comply with the regulation.
Similar statements have been made in rule-making initiatives in other industries. They help defuse potential complaints about extra red tape but do raise the question: ‘so is there a real benefit to the proposed regulation?’ The prime benefit is of course ensuring that organisations that would not operate an effective SMS voluntarily at least have to justify their SMS performance to an independent regulator. The NTSB observe that:
Metro-North has for many years had an SSPP [System Safety Program Plan] that presumably will fulfill the proposed regulatory requirement for such a program. However, while the NTSB investigations found Metro-North had a written SSPP, its implementation was very limited and represented little more than a paperwork exercise. Few Metro-North employees even knew the program existed. The identified deficiencies in the Metro-North SSPP implementation provide a cautionary example to FRA as it finalizes the proposed regulation.
They also note that:
A management systems approach will require cultural change at the [regulator] as well as in the industry.
The US Federal Aviation Administration (FAA) has introduced its own Part 5 Safety Management System (SMS) requirement for Part 121 carriers (as we discussed in June 2015), so the NTSB have issued a timely reminder. We have however previously expressed concerns that the FAA’s fondness for fines may undermine that implementation.
The Transportation Safety Board of Canada (TSB), in its final report on the crude oil train derailment and fire that killed 47 people on 6 July 2013 at Lac-Megantic, Quebec expressed concerns about how the regulator, Transport Canada (TC) dealt with SMS regulation (emphasis added):
…the first SMS audit to assess the effectiveness of the company’s safety management processes took place in 2010, which was 7 years after the company was found to be in compliance with the SMS Regulations. During this audit, inspectors were informed that the SMS had not yet been implemented because the company was awaiting regulatory approval. TC then clarified with MMA [Montreal, Maine & Atlantic Railway] that TC does not approve a railway’s SMS.
A second SMS audit was conducted in 2012, and focused on a very limited subset of SMS elements. …many of the deficiencies in MMA’s SMS that came to light through the audit process were never resolved. For example, weaknesses in MMA’s risk assessment process were identified during TC’s pre-audit in 2003. The 2010 audit found that risk assessments were being conducted only for major operational changes. Since that time, very few risk assessments had been conducted…
The absence of an internal audit procedure at MMA was first identified during TC’s pre-audit in 2003, and again in the 2010 SMS audit. An internal audit procedure had not been developed, and no internal SMS audits had taken place at MMA. Other weaknesses in MMA’s SMS, including the fact that the toll-free number for reporting safety concerns was not being used…
Although TC inspections identified problems at MMA between 2003 and 2010, and it was clear to TC that MMA’s SMS was not effective, no SMS audits were conducted in that time frame. The 2010 TC audit determined that MMA had not implemented its SMS. The limited number and scope of SMS audits that were conducted by TC Quebec Region, as well as the absence of a follow-up procedure to ensure MMA’s corrective action plans had been implemented, contributed to the fact that systemic weaknesses in MMA’s SMS remained unaddressed.
An organization with a strong safety culture is generally proactive when it comes to addressing safety issues. MMA was generally reactive. There were also significant gaps between the company’s operating instructions and how work was done day to day. This and other signs in MMA’s operations were indicative of a weak safety culture—one that contributed to the continuation of unsafe conditions and unsafe practices, and significantly compromised the company’s ability to manage risk.
When the investigation looked carefully at MMA’s operations, it found that employee training, testing, and supervision were not sufficient, particularly when it came to the operation of hand brakes and the securement of trains. Although MMA had some safety processes in place and had developed a safety management system in 2002, the company did not begin to implement this safety management system until 2010—and by 2013, it was still not functioning effectively.
TSB identified 18 distinct causes and contributing factors, “many of them influencing one another” and many of which should have been detectable by an observant regulator.
This isn’t the only example of a stalled regulatory audit programme. We previously discussed one at a Canadian air operator: Culture + Non Compliance + Mechanical Failures = DC3 Accident. In that accident report TSB comment:
While a move towards SMS has great potential to enhance safety by encouraging operators to put in place a systemic approach to proactively manage safety, the regulator must also have assurances of compliance with existing regulations, particularly for operators that have demonstrated a reluctance to exceed minimum regulatory compliance.
In order to assess regulatory compliance, and hence whether risks are sufficiently mitigated, inspectors must have appropriate processes and carry out detailed inspections of actual operating procedures and practices.
The current approach to regulatory oversight, which focuses on an operator’s SMS processes almost to the exclusion of verifying compliance with the regulations, is at risk of failing to address unsafe practices and conditions.
If TC does not adopt a balanced approach that combines inspections for compliance with audits of safety management processes, unsafe operating practices may not be identified, thereby increasing the risk of accidents.
Of course the organisations discussed above should be the rare exception. A key to PBR is directing regulatory attention to the weaker organisations with marginal compliance, poor systems and weak cultures before an accident.
Prof Sidney Dekker comments on the danger that an SMS can become a “self-referential system”: a system that just exists for itself and is a sponge for data but one from which intelligence never emerges.
For more on the general topic of PBR see this 2002 paper from the Harvard John F Kennedy School of Government: Performance-Based Regulation: Prospects and Limitations in Health, Safety and Environmental Protection
Also see this piece on lessons from the formation of the UK Military Aviation Authority (MAA): Regulatory Reflections & Resisting the Seduction of the Risk Management Process
UPDATE 9 Sept 2015: The Buffalo accident was used as an SMS case study at a European Aviation Safety Agency (EASA) Workshop in Cologne. It was stated that EASA would address the lessons from this accident by:
- Phased approach
- Stakeholder involvement
- Maintain compliance backstops
- Balance the split between rules and AMCs
- Combine safety management system assessments with audits for regulatory compliance
We have also subsequently discussed other examples of possibly lax regulatory oversight in Canada:
- UPDATE 19 June 2016: HEMS Black Hole Accident: “Organisational, Regulatory and Oversight Deficiencies”
- UPDATE 17 July 2016: Fatal Flaws in Canadian Medevac Service
- UPDATE 19 August 2016: Canadian KA100 Fuel Exhaustion Accident. This accident highlights important human factors, competence and regulatory oversight issues.
UPDATE 28 August 2016: We look at an EU research project that recently investigated the concepts of organisational safety intelligence (the safety information available) and executive safety wisdom (in using that to make safety decisions) by interviewing 16 senior industry executives: Safety Intelligence & Safety Wisdom. They defined these as:
Safety Intelligence: the various sources of quantitative information an organisation may use to identify and assess various threats.
Safety Wisdom: the judgement and decision-making of those in senior positions who must decide what to do to remain safe, and how they also use quantitative and qualitative information to support those decisions.
UPDATE 31 December 2016: TC has imposed C$409k of civil penalties in 2016.
UPDATE 17 January 2017: We discuss new UK CAA guidance: Performance Based Oversight: Accountable Manager Meetings (CAP1508)
UPDATE 19 January 2018: All 3 MMA rail workers acquitted in Lac-Mégantic disaster trial: Locomotive driver and 2 others found not guilty of criminal negligence causing 47 deaths
UPDATE 26 January 2018: The night the Swiss Cheese holes lined up
After nearly three months of emotional courtroom proceedings and nine days of jury deliberations, three former employees of Ed Burkhart’s now-defunct Montreal, Maine & Atlantic Railway were found not guilty of all charges arising from the 2013 Lac-Mégantic oil train disaster.
One of the best assessments of the Crown’s weak case came from Railroad Workers United (RWU). In its celebratory news release after the not guilty verdict, RWU stated, “While the prosecution focused largely on a single event —the alleged failure of the locomotive engineer to tie enough handbrakes—they were tripped up at every turn by their own witnesses—government, company, ‘expert’ and otherwise—who, by their testimony, incriminated the company and the government regulators rather than the defendants.”
On the night of the disaster, it is likely that if only one of the management decisions had been different, or if only one of the equipment conditions had not been present, the trio of railroaders would have never been in a courtroom.
The mechanical condition of lead locomotive no. 5017 leads to one of the event’s many “If Only” situations:
- If only operations manager Dematrie had not dismissed the July 4, 2013 report by engineer Francois Daigle that 5017 was belching black exhaust plumes.
- If only 5017 had not been rewired in a way that violated TC safety regulations.
- If only 5017 had not caught fire.
- If only an MM&A employee with air brake system knowledge had been sent to check on the train after 5017’s fire had been extinguished.
MM&A’s “generally reactive” approach to safety, rather than a proactive one, was a major construct of the shoddy safety culture identified by TSB, which also found, “There were significant gaps between the company’s operating instructions and how the work was done day to day.”
The testimony of MM&A’s former safety and training supervisor provided an unflattering portrait of MM&A management. Michael Horan was on the witness stand for six days. During one particularly rigorous cross-examination, Horan told the court he had no formal training in safety education, had no budget, and needed prior authorization to use his company credit card.
MM&A had implemented its SMS in 2002. However, TSB stated TC never audited MM&A’s SMS until 2010, and that other prior inspections showed “clear indications” the SMS was not working properly. One of those clear indications was the discovery by Canadian investigators of another improperly secured MM&A oil train on July 8, 2013, while Lac-Megantic’s downtown was still smoldering.
UPDATE 5 February 2018: However MMA and a number of workers were convicted on lesser charges: MMA and former employees plead guilty, fined $1.25 million in Lac-Megantic case
UPDATE 8 February 2018: The UK Rail Safety and Standards Board (RSSB) say: Future safety requires new approaches to people development. They say that in the future rail system “there will be more complexity with more interlinked systems working together”:
…the role of many of our staff will change dramatically. The railway system of the future will require different skills from our workforce. There are likely to be fewer roles that require repetitive procedure following and more that require dynamic decision making, collaborating, working with data or providing a personalised service to customers. A seminal white paper on safety in air traffic control acknowledges the increasing difficulty of managing safety with rule compliance as the system complexity grows: ‘The consequences are that predictability is limited during both design and operation, and that it is impossible precisely to prescribe or even describe how work should be done.’
Since human performance cannot be completely prescribed, some degree of variability, flexibility or adaptivity is required for these future systems to work.
- Invest in manager skills to build a trusting relationship at all levels.
- Explore ‘work as done’ with an open mind.
- Shift focus of development activities onto ‘how to make things go right’ not just ‘how to avoid things going wrong’.
- Harness the power of ‘experts’ to help develop newly competent people within the context of normal work.
- Recognise that workers may know more about what it takes for the system to work safely and efficiently than your trainers and managers.
UPDATE 16 February 2018: Quebec’s Director of Criminal and Penal Prosecutions will not appeal the not-guilty verdicts reached by the jury on the three rail workers in the accident.
UPDATE 3 April 2018: Lac-Mégantic disaster: No criminal charges will be filed against MMA. It appears difficult to pursue a company under Canadian law if no employee has been convicted, though in this case only front line staff were prosecuted.
UPDATE 28 July 2018: Performance-based regulations have changed oversight responsibilities: When it comes to regulating the aviation industry, focusing on an organisation’s performance can pay large safety dividends, says Stephanie Shaw, the UK CAA’s Head of PBR.
UPDATE 9 February 2019: Meeting Your Waterloo: Competence Assessment and Remembering the Lessons of Past Accidents. No one was injured in this low-speed derailment after signal maintenance errors, but investigators expressed concern that the lessons from the fatal triple collision at Clapham in 1988 may have been forgotten.