
Updating Legal Presumptions for Computer Reliability

TL;DR: The legal presumption that computers are reliable must be updated if we are to have justice!

Background

The ‘Horizon’ Scandal in the UK was a major miscarriage of justice:

Between 1999 and 2015, over 900 sub postmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other sub postmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides.

Wikipedia, British Post Office scandal

‘Horizon’ was a faulty computer system, produced by Fujitsu.  The Post Office had lobbied the British Government to reverse the burden of proof so that courts assumed that computer systems were reliable until proven otherwise.  This made it very difficult for sub-postmasters – small-business franchise owners – to defend themselves in court.

A 1984 act of parliament ruled that computer evidence was only admissible if it could be shown that the computer was used and operating properly. But that act was repealed in 1999, just months before the first trials of the Horizon system began. When post office operators were accused of having stolen money, the hallucinatory evidence of the Horizon system was deemed sufficient proof. Without any evidence to the contrary, the defendants could not force the system to be tested in court and their loss was all but guaranteed.

Alex Hern writing in The Guardian in January 2024.

This shocking miscarriage of justice was based on an equally shocking presumption: one that anyone with a background in software development would find ridiculous.

Introduction 

Legal experts warn that failure to immediately update laws regarding computer reliability could lead to a recurrence of scandals like the Horizon case. Critics argue that the current presumption of computer reliability shifts the burden of proof in criminal cases, potentially compromising fair trials.

The Presumption of Computer Reliability

English and Welsh law assumes computers to be reliable unless proven otherwise, a principle criticized for reversing the burden of proof. Stephen Mason, a leading barrister specializing in electronic evidence, emphasizes the unfairness of this presumption, stating that it impedes individuals from challenging computer-generated evidence.

It is also patently unrealistic.  As I explain in my article on the Principles of Safe Software Development, there are numerous examples of computer systems going wrong:

  • Drug Infusion Pumps,
  • The NASA Mars Polar Lander,
  • The Airbus A320 accident at Warsaw,
  • Boeing 777 FADEC malfunction,
  • Patriot Missile Software Problem in Gulf War II, and many more…

Making software dependable or safe requires enormous effort and care.

Historical Context and the Horizon Scandal

The presumption dates back to an old common law principle that mechanical devices are presumed to be working properly, and the UK Post Office lobbied to have that principle applied to digital systems. The implications of this change became evident during the Horizon scandal, where flawed computer evidence led to wrongful accusations against post office operators. The repeal of a 1984 act further weakened safeguards against unreliable computer evidence, exacerbating the issue.

International Influence and Legal Precedents

The influence of English common law extends internationally, perpetuating the presumption of computer reliability in legal systems worldwide. Mason highlights cases from various countries supporting this standard, underscoring its global impact.

“[The Law] says, for the person who’s saying ‘there’s something wrong with this computer’, that they have to prove it. Even if it’s the person accusing them who has the information.”

Stephen Mason

Modern Challenges and the Rise of AI

Advancements in AI technology intensify the need to reevaluate legal presumptions. Noah Waisberg, CEO of Zuva, warns against assuming the infallibility of AI systems, which operate probabilistically and may lack consistency.

With a traditional rules-based system, it’s generally fair to assume that a computer will do as instructed. Of course, bugs happen, meaning it would be risky to assume any computer program is error-free…Machine-learning-based systems don’t work that way. They are probabilistic … you shouldn’t count on them to behave consistently – only to work in line with their projected accuracy…It will be hard to say that they are reliable enough to support a criminal conviction.

Noah Waisberg

This poses significant challenges in relying on AI-generated evidence for criminal convictions.
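To make that concrete, here is a minimal sketch in Python (all the names, weights, and thresholds are invented purely for illustration). A rules-based check returns the same answer for the same inputs every time; a machine-learning-style check returns a score, and the ‘verdict’ depends on a chosen threshold and on the model’s average accuracy, which says nothing about any individual case.

# Minimal sketch contrasting a deterministic rule with a probabilistic score.
# All names, numbers, and thresholds are invented for illustration only.

def rules_based_shortfall(declared: float, recorded: float) -> bool:
    # Deterministic rule: flags a shortfall whenever the ledger disagrees.
    # Given the same inputs, it always returns the same answer (right or wrong).
    return recorded - declared > 0.0

def ml_style_shortfall(features, weights, threshold=0.9):
    # Probabilistic-style check: the 'evidence' is a confidence score.
    # The verdict depends on the threshold chosen, and any stated accuracy
    # only holds on average, not for any individual case.
    score = sum(f * w for f, w in zip(features, weights))
    score = max(0.0, min(1.0, score))          # clamp to [0, 1]
    return score >= threshold, score

# The deterministic check is repeatable; the score-based check is only as
# trustworthy as its threshold and its measured error rate.
flagged, confidence = ml_style_shortfall([0.4, 0.7, 0.9], [0.3, 0.3, 0.4])
print(flagged, round(confidence, 2))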

Proposed Legal Reforms

James Christie is a software consultant who co-authored recommendations for an update to the UK law. He proposes a two-stage reform to address the issue.

The first would require providers of evidence to show the court that they have developed and managed their systems responsibly, and to disclose their record of known bugs … If they can’t … the onus would then be on the provider of evidence to show the court why none of these failings or problems affect the quality of evidence, and why it should still be considered reliable.

James Christie

First, evidence providers must demonstrate responsible development and management of their systems, including disclosure of known bugs. Second, if unable to do so, providers must justify why these shortcomings do not affect the evidence’s reliability.

The Reality of Software Development

First of all, we need to understand how mistakes made in software can lead to failures and ultimately accidents.

Errors in Software Development

This is illustrated well by the standard BS 5760. We see that, during development, people – either on their own or using tools – make mistakes. That’s inevitable, and there will be many mistakes in the software, as we will see. These mistakes can lead to faults, or defects, being present in the software. Again, inevitably, some of them get through.

BS 5760-8:1998. Reliability of systems, equipment and components. Guide to assessment of the reliability of systems containing software

If we jump over the fence, the software is now in use. All these faults are in the software, but they lie hidden – until, that is, some revealing mechanism comes along and triggers them. That revealing mechanism might be a change in the environment or operating scenario, or changing inputs that the software is seeing from sensors.

That doesn’t mean that a failure is inevitable, because lots of errors don’t lead to failures that matter. But some do. And that is how we get from mistakes, to faults or defects in the software, to run-time failures.
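As a trivial, hypothetical example of that chain (the function and the values are invented): a mistake made during development leaves a latent fault in the code, which does no harm for years until a change in the inputs – the revealing mechanism – triggers a run-time failure.

# Hypothetical illustration of the mistake -> fault -> failure chain.
# The fault below lies dormant until a particular input reveals it.

def average_flow_rate(total_volume: float, elapsed_seconds: float) -> float:
    # Mistake made during development: the developer assumed the sensor
    # would never report zero elapsed time, so no guard was written.
    return total_volume / elapsed_seconds    # latent fault

# For years of normal operation the fault stays hidden...
print(average_flow_rate(120.0, 60.0))        # 2.0, works fine

# ...until the operating scenario changes (say, a sensor resets mid-cycle)
# and the revealing mechanism triggers the failure.
print(average_flow_rate(120.0, 0.0))         # ZeroDivisionError at run time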

What Happens to Errors in Software Products?

A long time ago (1984!), a very well-known paper in the IBM Journal of Research and Development looked at how long it took faults in IBM operating system software to become failures for the first time. We are not talking about cowboys producing software on the web that may or may not work okay, or people in their bedrooms producing apps. We’re talking about a very sophisticated product that was in use all around the world.

Yet, what Adams found was that lots of software faults took more than 5,000 operating years to be revealed. He found that more than 90% of faults in the software would take longer than 50 years to become failures.

‘Optimizing Preventive Service of Software Products’ Edward N. Adams, IBM Journal of Research and Development, 1984, Vol 28, Iss. 1

There are two things that Adams’s work tells us.

First, in any significant piece of software, there is a huge reservoir of faults waiting to be revealed. So if people start telling you that their software contains no defects or faults, either they’re dumb enough to believe that or they think you are. What we see in reality is that even in a very high-quality software product, there are a lot of latent defects.

Second, many of them – the vast majority of them – will take a long, long time to reveal themselves. Testing will not reveal them. Using Beta versions will not reveal them. Fifty years of use will not reveal them. They’re still there.
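To see why that follows, here is a toy simulation in Python. This is not Adams’s data; it simply assumes, for illustration, that a product ships with a reservoir of latent faults whose mean times to reveal are spread over several orders of magnitude. Under that assumption, even decades of exposure uncover only part of the reservoir.

import random

# Toy model (not Adams's data): a product ships with a reservoir of latent
# faults whose mean times to reveal span several orders of magnitude.
random.seed(1)
mean_years_to_reveal = [10 ** random.uniform(-1, 4) for _ in range(1000)]

def fraction_revealed_within(years: float) -> float:
    revealed = sum(1 for m in mean_years_to_reveal
                   if random.expovariate(1.0 / m) <= years)
    return revealed / len(mean_years_to_reveal)

# Even generous amounts of testing and field use expose only a fraction of
# the reservoir; the rest stay latent, which is the pattern Adams observed.
for horizon in (0.5, 5, 50):
    print(f"{horizon:>5} years of exposure reveals "
          f"{fraction_revealed_within(horizon):.0%} of the latent faults")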

[This Section is a short extract from my course Principles of Safe Software Development.]

Conclusion

Legal experts stress the urgency of updating laws to reflect the fallibility of computers, which is crucial for ensuring fair trials and preventing miscarriages of justice. The UK Ministry of Justice acknowledges the need for scrutiny, pending the outcome of the Horizon inquiry, signaling a potential shift towards addressing computer reliability in the legal framework.

Hopefully, the legal profession will come to realize what software engineers have known for a long time: software reliability is difficult to achieve and must be demonstrated, not assumed.


Identify and Analyze Functional Hazards

So, how do we identify and analyze functional hazards? I’ve seen a lot of projects and programs. We’re great at doing the physical hazards, but not so good at the functional hazards.

Introduction: Identify and Analyze Functional Hazards

When I talk about physical and functional hazards, I think we’re probably all very familiar with the physical stuff. Physical hazards are all to do with energy and toxicity.

Physical Hazards

With energy, it might be fire, or it might be electric shock. There’s potential energy – the potential energy of someone at height, or of something falling – and the impact of kinetic energy. Then, in terms of toxicity, we’ve got hazardous chemicals, which we have to deal with, plus biological hazards, and smoke and toxic gases, often from fires or chemical reactions.

So those are your physical hazards. As I said, we tend to be good at dealing with those; most projects I’ve been on have been pretty good at identifying and analyzing them. Not so for functional hazards.

Functional Hazards

I’ve been on lots of projects, even today, where functional hazards are ignored completely or only dealt with partially. So let me explain what I mean by functional hazards. What we’re talking about is where a system is required to do something – to perform some function. For example, cars move: they start, they move, and they stop, hopefully.

Loss of Function

But what happens when those functions go wrong? What happens when we don’t get the function when we need it? The brakes fail on your car, for example. And so that’s a fairly obvious one. When functional hazards are looked at, it’s usually the functional failures that get attention.

But while that is the obvious failure mode, the less obvious failure modes tend to be more dangerous, and there are two of them.

Other Functional Failure Modes

So what happens if things work when they shouldn’t? What if you’re driving along on a road or the motorway, perhaps at high speed, and your brakes slam on for no apparent reason? Perhaps there is somebody behind you. Do you have a collision or do you lose control on the road and crash?

What if the function works, but it works incorrectly? For example, you turn the temperature down but instead, it goes up. Or you steer to the left, but instead, your vehicle goes to the right.

What if a display shows the wrong information? If you’re in a plane, maybe you’ve got an altimeter that tells you how high you are. It would be dangerous if the altimeter told you that you were level or climbing, but you were descending towards the ground. Yeah, we’ve had lots of that kind of accident.

So there’s an overview of what I mean by physical and functional hazards.
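A simple, hypothetical way to make that analysis systematic (the names are invented, and Python is used purely for illustration) is to take each function and walk it through a set of generic failure modes, recording the effect of each combination:

# Hypothetical sketch of a functional failure analysis: take each function
# and ask what each generic failure mode would mean for that function.
FAILURE_MODES = [
    "loss of function (does not work when required)",
    "unintended function (works when it should not)",
    "incorrect function (works, but wrongly)",
    "incorrect indication (displays the wrong information)",
]

functions = ["braking", "steering", "altitude display"]

for func in functions:
    for mode in FAILURE_MODES:
        # In a real analysis you would record the effect, the worst credible
        # consequence, and any candidate mitigations for each combination.
        print(f"Function '{func}': consider {mode}")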

The Webinar: Identify and Analyze Functional Hazards

See the whole webinar at the Safety Engineering Academy. (You can get discounts on membership by subscribing to my free emails.)

Course Curriculum

  1. Introduction
  2. Preliminary Hazard Identification (PHI)
  3. Functional Failure Analysis
  4. Functional Hazard Analysis (FHA)

There are 11 lessons with two-and-a-half hours of video content, plus other resources. See the Foundations of System Safety here.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

• Worked on aircraft, ships, submarines, ATMS, trains, and software;

• Worked on everything from tiny programs to some of the biggest (Eurofighter, Future Submarine);

• Worked in the UK and Australia, on US and European programs;

• Taught safety to hundreds of people in the classroom, and thousands online;

• Presented on safety topics at several international conferences.


Functional Safety

The following is a short, but excellent, introduction to the topic of ‘Functional Safety’ by the United Kingdom Health and Safety Executive (UK HSE). It is equally applicable outside the UK, and the British Standards (‘BS EN’) are versions of international ISO/IEC standards – e.g. the Australian version (‘AS/NZS’) is often identical to the British standard.

My comments and explanations are shown [thus].

[Functional Safety]

“Functional safety is the part of the overall safety of plant and equipment that depends on the correct functioning of safety-related systems and other risk reduction measures such as safety instrumented systems (SIS), alarm systems and basic process control systems (BPCS).

[Functional Safety is popular, in fact almost ubiquitous, in the process industry, where large amounts of flammable liquids and gasses are handled. That said, the systems and techniques developed by and for the process industry have been so successful that they are found in many other industrial, transport and defence applications.]

SIS [Safety Instrumented Systems]

SIS are instrumented systems that provide a significant level of risk reduction against accident hazards.  They typically consist of sensors and logic functions that detect a dangerous condition and final elements, such as valves, that are manipulated to achieve a safe state.
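[As a purely illustrative sketch of that chain of sensors, logic and final elements (the names, units, and trip limit below are invented, not taken from any standard):]

# Hypothetical sketch of the SIS chain: sensor -> logic solver -> final element.
# Names, units, and the trip limit are invented for illustration only.
TRIP_PRESSURE_BAR = 50.0    # dangerous condition: pressure above this limit

def read_pressure_sensor() -> float:
    # Stand-in for the field sensor; this would come from plant I/O in reality.
    return 52.3

def close_shutdown_valve() -> None:
    # Final element action that drives the process to its safe state.
    print("Shutdown valve closed: process driven to safe state")

def sis_logic() -> None:
    # Logic function: detect the dangerous condition and act on it.
    if read_pressure_sensor() > TRIP_PRESSURE_BAR:
        close_shutdown_valve()

sis_logic()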

The general benchmark of good practice is BS EN 61508, Functional safety of electrical/electronic/programmable electronic safety related systems. BS EN 61508 has been used as the basis for application-specific standards such as:

  • BS EN 61511: process industry
  • BS EN 62061: machinery
  • BS EN 61513: nuclear power plants

BS EN 61511, Functional safety – Safety instrumented systems for the process industry sector, is the benchmark standard for the management of functional safety in the process industries. It defines the safety lifecycle and describes how functional safety should be managed throughout that lifecycle. It sets out many engineering and management requirements, however, the key principles of the safety lifecycle are to:

  • use hazard and risk assessment to identify requirements for risk reduction
  • allocate risk reduction to SIS or to other risk reduction measures (including instrumented systems providing safety functions of low / undefined safety integrity)
  • specify the required function, integrity and other requirements of the SIS
  • design and implement the SIS to satisfy the safety requirements specification
  • install, commission and validate the SIS
  • operate, maintain and periodically proof-test the SIS
  • manage modifications to the SIS
  • decommission the SIS

BS EN 61511 also defines requirements for management processes (plan, assess, verify, monitor and audit) and for the competence of people and organisations engaged in functional safety.  An important management process is Functional Safety Assessment (FSA) which is used to make a judgement as to the functional safety and safety integrity achieved by the safety instrumented system.

Alarm Systems

Alarm systems are instrumented systems designed to notify an operator that a process is moving out of its normal operating envelope to allow them to take corrective action.  Where these systems reduce the risk of accidents, they need to be designed to good practice requirements considering both the E,C&I design and human factors issues to ensure they provide the necessary risk reduction.

In certain limited cases, alarm systems may provide significant accident risk reduction, where they also might be considered as a SIS. The general benchmark of good practice for management of alarm systems is BS EN 62682.

BPCS [Basic Process Control Systems]

BPCS are instrumented systems that provide the normal, everyday control of the process.  They typically consist of field instrumentation such as sensors and control elements like valves which are connected to a control system, interfaced, and could be operated by a plant operator.  A control system may consist of simple electronic devices like relays or complicated programmable systems like DCS (Distributed Control System) or PLCs (Programmable Logic Controllers).

BPCS are normally designed for flexible and complex operation and to maximize production rather than to prevent accidents.  However, it is often their failure that can lead to accidents, and therefore they should be designed to good practice requirements. The general benchmark of good practice for instrumentation in process control systems is BS 6739.”

[To be honest, I would have put this the other way around. The BPCS came first, although they were just called ‘control systems’, and some had alarms to get the operators’ attention. As the complexity of these control systems increased, cascading alarms became a problem and alarms had to be managed as a ‘thing’. Finally, the process industry used additional systems when the control system/alarm system combo became inadequate, and thus the terms SIS and BPCS were born.]

[It’s worth noting that for very rapid processes, where a human either cannot intervene fast enough or lacks the data to do so reliably, the SIS becomes an automatic protection system, as found in rail signaling systems or ‘autonomous’ vehicles. Also, for domains where there is no ‘fail-safe’ state, for example in aircraft flight control systems, the tendency has been to engineer multiple, redundant, high-integrity control systems, rather than use a BPCS/SIS combo.]

Copyright

The above text is reproduced under Creative Commons Licence from the UK HSE’s webpage. The Safety Artisan complies with such licensing conditions in full.

[Functional Safety – END]
