
Updating Legal Presumptions for Computer Reliability

TL;DR: We must update legal presumptions about computer reliability if we are to have justice.

Background

The ‘Horizon’ Scandal in the UK was a major miscarriage of justice:

Between 1999 and 2015, over 900 sub postmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other sub postmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides.

Wikipedia, British Post Office scandal

‘Horizon’ was a faulty computer system, produced by Fujitsu.  The Post Office had lobbied the British Government to reverse the burden of proof so that courts assumed that computer systems were reliable until proven otherwise.  This made it very difficult for sub-postmasters – small-business franchise owners – to defend themselves in court.

A 1984 act of parliament ruled that computer evidence was only admissible if it could be shown that the computer was used and operating properly. But that act was repealed in 1999, just months before the first trials of the Horizon system began. When post office operators were accused of having stolen money, the hallucinatory evidence of the Horizon system was deemed sufficient proof. Without any evidence to the contrary, the defendants could not force the system to be tested in court and their loss was all but guaranteed.

Alex Hern writing in The Guardian in January 2024.

This shocking miscarriage of justice was based on an equally shocking presumption, one that anyone with a background in software development would find ridiculous.

Introduction 

Legal experts warn that failure to immediately update laws regarding computer reliability could lead to a recurrence of scandals like the Horizon case. Critics argue that the current presumption of computer reliability shifts the burden of proof in criminal cases, potentially compromising fair trials.

The Presumption of Computer Reliability

English and Welsh law assumes computers to be reliable unless proven otherwise, a principle criticized for reversing the burden of proof. Stephen Mason, a leading barrister in electronic evidence, emphasizes the unfairness of this presumption, stating that it impedes individuals from challenging computer-generated evidence.

It is also patently unrealistic.  As I explain in my article on the Principles of Safe Software Development, there are numerous examples of computer systems going wrong:

  • Drug Infusion Pumps,
  • The NASA Mars Polar Lander,
  • The Airbus A320 accident at Warsaw,
  • Boeing 777 FADEC malfunction,
  • Patriot Missile Software Problem in Gulf War II, and many more…

Making software dependable or safe requires enormous effort and care.

Historical Context and the Horizon Scandal

The presumption dates back to an old common law principle that mechanical instruments are presumed to be in proper working order; the UK Post Office lobbied to have the same principle applied to digital systems. The implications of this change became evident during the Horizon scandal, where flawed computer evidence led to wrongful accusations against post office operators. The repeal of a 1984 act in 1999 further weakened safeguards against unreliable computer evidence, exacerbating the issue.

International Influence and Legal Precedents

The influence of English common law extends internationally, perpetuating the presumption of computer reliability in legal systems worldwide. Mason highlights cases from various countries supporting this standard, underscoring its global impact.

“[The Law] says, for the person who’s saying ‘there’s something wrong with this computer’, that they have to prove it. Even if it’s the person accusing them who has the information.”

Stephen Mason

Modern Challenges and the Rise of AI

Advancements in AI technology intensify the need to reevaluate legal presumptions. Noah Waisberg, CEO of Zuva, warns against assuming the infallibility of AI systems, which operate probabilistically and may lack consistency.

With a traditional rules-based system, it’s generally fair to assume that a computer will do as instructed. Of course, bugs happen, meaning it would be risky to assume any computer program is error-free…Machine-learning-based systems don’t work that way. They are probabilistic … you shouldn’t count on them to behave consistently – only to work in line with their projected accuracy…It will be hard to say that they are reliable enough to support a criminal conviction.

Noah Waisberg

This poses significant challenges in relying on AI-generated evidence for criminal convictions.

Proposed Legal Reforms

James Christie is a software consultant who co-authored recommendations for an update to the UK law. He proposes a two-stage reform to address the issue.

The first would require providers of evidence to show the court that they have developed and managed their systems responsibly, and to disclose their record of known bugs … If they can’t … the onus would then be on the provider of evidence to show the court why none of these failings or problems affect the quality of evidence, and why it should still be considered reliable.

James Christie

First, evidence providers must demonstrate responsible development and management of their systems, including disclosure of known bugs. Second, if unable to do so, providers must justify why these shortcomings do not affect the evidence’s reliability.

The Reality of Software Development

First of all, we need to understand how mistakes made in software can lead to failures and ultimately accidents.

Errors in Software Development

This is illustrated well by the standard BS 5760. We see that, during development, people, either on their own or using tools, make mistakes. That’s inevitable. And there will be many mistakes in the software, as we will see. These mistakes can lead to faults or defects being present in the software. Again, inevitably, some of them get through.

BS 5760-8:1998. Reliability of systems, equipment and components. Guide to assessment of the reliability of systems containing software

If we jump over the fence, the software is now in use. All these faults are in the software, but they lie hidden until, that is, some revealing mechanism comes along and triggers them. That revealing mechanism might be a change in the environment or operating scenario, or changing inputs that the software sees from its sensors.

That doesn’t mean that a failure is inevitable, because lots of errors don’t lead to failures that matter. But some do. And that is how we get from mistakes, to faults or defects in the software, to run-time failures.
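
Here is a minimal, hypothetical sketch of that chain (the function, threshold, and scenario are all invented for illustration): a fault lies dormant until a revealing mechanism, in this case a change in the sensor inputs, turns it into a run-time failure.

```python
# Hypothetical illustration: a latent fault that only fails when a
# 'revealing mechanism' (an unusual sensor input) triggers it.

def scale_sensor_reading(raw_counts: int) -> float:
    """Convert raw sensor counts to engineering units.

    Latent fault: the developer assumed raw_counts is always positive,
    so the guard below silently clips negative values to zero. In the
    original operating environment the sensor never went negative, so
    years of testing and use never revealed the mistake.
    """
    if raw_counts < 0:
        raw_counts = 0  # fault: masks a genuine below-range reading
    return raw_counts * 0.01  # counts-to-units conversion (assumed)


# Normal operating scenario: the fault stays hidden.
print(scale_sensor_reading(1500))   # 15.0, as expected

# Revealing mechanism: a new sensor variant reports below-range values
# as negative counts. The fault now produces a plausible but wrong
# output (0.0) instead of flagging the anomaly: a run-time failure.
print(scale_sensor_reading(-400))   # 0.0, the anomaly is silently lost
```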

What Happens to Errors in Software Products?

A long time ago (1984!), a very well-known paper in the IBM Journal of Research looked at how long it took faults in IBM operating system software to become failures for the first time. We are not talking about cowboys producing software on the web that may or may not work okay, or people in their bedrooms producing apps. We’re talking about a very sophisticated product that was in use all around the world.

Yet, what Adams found was that lots of software faults took more than 5,000 operating years to be revealed. He found that more than 90% of faults in the software would take longer than 50 years to become failures.

‘Optimizing Preventive Service of Software Products’ Edward N. Adams, IBM Journal of Research and Development, 1984, Vol 28, Iss. 1

There are two things that Adams’s work tells us.

First, in any significant piece of software, there is a huge reservoir of faults waiting to be revealed. So if people start telling you that their software contains no defects or faults, either they’re dumb enough to believe that or they think you are. What we see in reality is that even in a very high-quality software product, there are a lot of latent defects.

Second, many of them – the vast majority of them – will take a long, long time to reveal themselves. Testing will not reveal them. Using Beta versions will not reveal them. Fifty years of use will not reveal them. They’re still there.

[This Section is a short extract from my course Principles of Safe Software Development.]

Conclusion

Legal experts stress the urgency of updating laws to reflect the fallibility of computers, crucial for ensuring fair trials and preventing miscarriages of justice. The UK Ministry of Justice acknowledges the need for scrutiny, pending the outcome of the Horizon inquiry, signaling a potential shift towards addressing issues of computer reliability in the legal framework.

Hopefully, the legal profession will come to realize what software engineers have known for a long time: software reliability is difficult to achieve and must be demonstrated.


Functional Hazard Analysis with Mil-Std-882E

In this video, I look at Functional Hazard Analysis (FHA), which is Task 208 in Mil-Std-882E. FHA analyzes software, complex electronic hardware, and human interactions. I explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (I refer to other lessons for special techniques for software safety and Human Factors.)

This video, and the related webinar ‘Identify & Analyze Functional Hazards’, deal with an important topic. Programmable electronics and software now run so much of our modern world. They control many safety-related products and services. If they go wrong, they can hurt people.

I’ve been working with software-intensive systems since 1994. Functional hazards are often misunderstood, or overlooked, as they are hidden. However, the accidents that they can cause are very real. If you want to expand your analysis skills beyond just physical hazards, I will show you how.

This is the seven-minute demo; the full version is 40 minutes long.

Functional Hazard Analysis: Context

So how do we analyze software safety?

Before we even start, we need to identify those system functions that may impact safety. We can do this by performing a Functional Failure Analysis (FFA) of all system requirements that might credibly lead to human harm.

An FFA looks at functional requirements (the system should do ‘this’ or ‘that’) and examines what could go wrong. What if:

  • The function does not work when needed?
  • The function works when not required?
  • The function works incorrectly? (There may be more than one version of this.)

(A variation of this technique is explained here.)

If the function could lead to a hazard then it is marked for further analysis. This is where we apply the FHA, Task 208.
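
As a simple sketch of how such an analysis can be recorded (the function, wording, and data structure here are my own invention, not part of Task 208), each candidate function is crossed with the standard failure guide words:

```python
# Minimal, hypothetical Functional Failure Analysis (FFA) sketch.
# The function and consequence text are invented for illustration only.

GUIDE_WORDS = [
    "Function not provided when required",   # omission
    "Function provided when not required",   # commission
    "Function provided incorrectly",         # may have several variants
]

def ffa_rows(function_name: str, consequences: dict) -> list:
    """Pair a system function with each failure guide word and the
    credible consequence the analyst has assessed for it."""
    return [
        {
            "function": function_name,
            "failure_mode": gw,
            "consequence": consequences.get(gw, "TBD"),
            # Mark for further analysis (FHA) unless no credible harm.
            "further_analysis": consequences.get(gw, "") != "No credible harm",
        }
        for gw in GUIDE_WORDS
    ]

rows = ffa_rows(
    "Provide low-fuel warning to operator",
    {
        "Function not provided when required": "Fuel exhaustion hazard",
        "Function provided when not required": "Nuisance warning, distraction",
        "Function provided incorrectly": "Warning given too late",
    },
)
for row in rows:
    print(row)
```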

Functional Hazard Analysis: The Lesson

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyze the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. As you can see, Functional Hazard Analysis is Task 208 in Mil-Std-882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional HA Purpose #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? In classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, we use it to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. In the real world, lots of software-intensive systems have caused accidents that killed people, even when they were functioning as intended. That’s one of the shortcomings of this Military Standard – it focuses on failure. But even if something performs as specified, either:

  • The specification might be wrong, or
  • The system might do something that the human operator does not expect.

Mil-Std-882E just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – to identify what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behavior over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, and SRIs, and they’re supposed to be allocated or mapped to a system design architecture. The presumption, the assumption, in this task is that we’re doing this early (we’ll see that later) and that the system design, the system architecture, is still up for grabs. We can still influence it.

COTS and MOTS Software

Often that is not the case these days. This standard was written many years ago, when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen so much anymore in the military, and it certainly doesn’t happen in many other walks of life. But we’ll talk about how you deal with the realities later.

And they’re allocating these functions and these items of interest to hardware, software, and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, humans are complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong”, because those types of simple components are only really subject to random failure. That’s not what we’re talking about here.

We’re talking about complex stuff, where systematic failure dominates over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide is on purpose; so, we use the FHA to identify the consequences of malfunction, functional failure, or lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a system engineering process – that’s not always the case. We’ll talk about that at the end as well.

Also, we’re going to identify and document these functions and items, allocate them, and, as the standard says, partition them in the software design architecture. When we say partition, that’s jargon for separating them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation and this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done…

Then What?

Once the FFA has identified the required ‘Level of Rigor’, we need to translate that into a suitable software development standard. This might be:

  • RTCA DO-178C (also known as ED-12C) for civil aviation;
  • The US Joint Software System Safety Engineering Handbook (JSSEH) for military systems;
  • IEC 61508 (functional safety) for the process industry;
  • CENELEC-50128 for the rail industry; and
  • ISO 26262 for automotive applications.

Such standards use Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to enforce appropriate Levels of Rigor. You can learn about those in my course Principles of Safe Software Development.
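
For example, in civil aviation the Development Assurance Level (DAL) of a software function follows from the worst credible failure condition it can contribute to. A minimal sketch of that severity-to-DAL mapping is shown below; it is simplified, the real assignment is made through the ARP 4754 safety assessment process, and the standard’s own tables remain authoritative.

```python
# Simplified sketch of the DO-178C severity-to-DAL mapping.
# The real assignment is made through the ARP 4754 / ARP 4761 safety
# assessment process; this lookup is illustrative only.

FAILURE_CONDITION_TO_DAL = {
    "Catastrophic": "Level A",
    "Hazardous": "Level B",
    "Major": "Level C",
    "Minor": "Level D",
    "No Safety Effect": "Level E",
}

def required_dal(failure_condition: str) -> str:
    """Return the Development Assurance Level for the worst credible
    failure condition attributed to a software function."""
    return FAILURE_CONDITION_TO_DAL[failure_condition]

print(required_dal("Hazardous"))  # Level B
```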

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!


Principles of Software Safety Assurance

This is the first in a new series of blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards.

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

In this first of six blog posts, we introduce the subject and the First Principle.

Why Software Safety Principles?

I’ve been involved with industrial-scale software projects since 1994, and the concept of ‘principles’ is valuable for two reasons.

First, many technical people like detail and quickly bypass concepts to get to their comfort zone. This means that we often neglect the ‘big picture’ and are uncomfortable making high-level judgments. In turn, this makes it difficult for us to explain or justify our choices to management or other stakeholders.

The second reason is similar to the first. In the guts of a standard, we can think in terms of compliance with detailed requirements and our choices become simple – black and white. This is easy, but it does not equip us to choose between standards or help us to argue that an alternative means of compliance is valid. Thus, ‘Is this good enough?’ is not a question that we find easy to answer.

Introduction

Software assurance standards have increased in number along with the use of software in safety-critical applications. There are now several software safety standards, including the cross-domain ‘functional safety’ standard IEC 61508, the avionics standard DO-178B/C, the railway standard CENELEC EN 50128, and the automotive standard ISO 26262. (The last two are derivatives of IEC 61508.)

Unfortunately, there are significant discrepancies in vocabulary, concepts, requirements, and suggestions among these standards. It could seem like there is no way out of this.

However, the common software safety assurance principles that can be observed from both these standards and best practices are few (and manageable). These concepts are presented here together with their justification and an explanation of how they relate to current standards.

These ideas serve as the unchanging foundation of any software safety argument since they hold across projects and domains. Of course, accepting these principles does not exempt one from adhering to domain-specific norms. However, they:

  • Provide a reference model for cross-sector certification; and
  • Aid in maintaining comprehension of the “big picture” of software safety issues while analyzing and negotiating the specifics of individual standards.

Software Safety Principles

Principle 1: Requirements Validity

The first software safety assurance principle is:

Principle 1: Software safety requirements shall be defined to address the software contribution to system hazards.

‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York.

The evaluation and reduction of risks are crucial to the design of safety-critical systems. When specific environmental factors come into play, system-level hazards, such as unintended release of braking in cars or the absence of stall warnings in aircraft, can result in accidents. Although software is intangible, it can implement system control or monitoring functions that contribute to these risks (e.g. software implementing antilock braking or aircraft warning functions).

Typically, the system safety assessment process uses safety analysis methodologies like Fault Tree Analysis or Hazard and Operability (HAZOP) Studies to pinpoint how software, along with other components like sensors, actuators, or power sources, can contribute to risks.  The results of these methodologies ought to influence the formulation of safety requirements and their distribution among software components.

It is crucial to remember that, at this stage, software is treated as a black box, used to enable specific functions, with limited visibility into how those functions are implemented. The risk from some system hazards can rise to unacceptable levels if hazardous software failures are not identified and suitable safety requirements are not defined and applied.

Examples of software requirements not being adequately defined, and the effects thereof, were reported by the US Food and Drug Administration (FDA).

Simply put, software is a fundamental enabling technology employed in safety-critical systems. Assessing how software might increase system risks should be a crucial component of the overall system safety process, and we define software safety requirements to minimize the hazardous software contributions that this process discovers.

These contributions must be described in a clear and testable way, namely by identifying the exact types of software failures that can result in risks. If not, we run the risk of creating generic software safety requirements—or even just correctness requirements—that don’t take into account the specific hazardous failure modes that have an impact on the system’s safety.
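
As a hypothetical illustration (the wording, identifiers, and limits below are invented, and are not drawn from the FDA reports), compare a generic requirement with one that names the specific hazardous failure mode:

```python
# Hypothetical contrast between a generic requirement and a hazard-
# directed, testable software safety requirement. Wording and limits
# are invented for illustration.

generic_requirement = "The infusion pump software shall be reliable."

safety_requirement = {
    "id": "SSR-012",
    "text": ("The software shall not command an infusion rate greater "
             "than the clinician-set limit, and shall raise an audible "
             "alarm within 2 seconds of detecting an over-rate command."),
    "linked_hazard": "HAZ-03: Over-infusion of drug",
    "verifiable_by": ["unit test", "fault-injection test", "code review"],
}

# The second form identifies the hazardous failure mode (over-infusion),
# states a measurable behaviour, and can therefore be verified directly.
print(safety_requirement["id"], "traces to", safety_requirement["linked_hazard"])
```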

Principles of Software Safety Assurance: End of Part 1 (of 6)

I based this blog post on the paper ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I learned safety engineering from Tim Kelly, and others, at the University of York. I am so glad that I can share their valuable work in a more accessible format.

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Principles of Software Safety Training

Learn more about this subject in my course ‘Principles of Safe Software’ here. The next post in the series is here.

My course on Udemy, ‘Principles of Software Safety Standards’ is a cut-down version of the full Principles Course. Nevertheless, it still scores 4.42 out of 5.00 and attracts comments like:

  • “It gives me an idea of standards as to how they are developed and the downward pyramid model of it.” 4* Niveditha V.
  • “This was really good course for starting the software safety standareds, comparing and reviewing strengths and weakness of them. Loved the how he try to fit each standared with4+1 principles. Highly recommend to anyone that want get into software safety.” 4.5* Amila R.
  • “The information provides a good overview. Perfect for someone like me who has worked with the standards but did not necessarily understand how the framework works.” 5* Mahesh Koonath V.
  • “Really good overview of key software standards and their strengths and weaknesses against the 4+1 Safety Principles.” 4.5* Ann H.

Software Safety Principles Conclusions and References

Software Safety Principles Conclusions and References is the sixth and final blog post on Principles of Software Safety Assurance. In this series, we look at the 4+1 principles that underlie all software safety standards. (The previous post in the series is here.)

Read on to Benefit From…

The conclusions of this paper are brief and readable, but very valuable. It’s important for us – as professionals and team players – to be able to express these things to managers and other stakeholders clearly. Talking to non-specialists is something that most technical people could do better.

The references include links to the standards covered by the paper. Unsurprisingly, these are some of the most popular and widely used processes in software engineering. The other links take us to the key case studies that support the conclusions.

Content

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Conclusion

These six blog posts have presented the 4+1 model of foundational principles of software safety assurance. The principles strongly connect to elements of current software safety assurance standards and they act as a common benchmark against which standards can be measured.

Through the examples provided, it’s also clear that, although these concepts can be stated clearly, they haven’t always been put into practice. There may still be difficulties with their application by current standards. Particularly, there is still a great deal of research and discussion going on about the management of confidence with respect to software safety assurance (Principle 4+1).

[My own, informal, observations agree with this last point. Some standards apply Principle 4+1 more rigorously, but as a result, they are more expensive. As a result, they are less popular and less used.]

Standards and References

[1] RTCA/EUROCAE, Software Considerations in Airborne Systems and Equipment Certification, DO-178C/ED-12C, 2011.

[2] CENELEC, EN-50128:2011 – Railway applications – Communication, signaling and processing systems – Software for railway control and protection systems, 2011.

[3] ISO-26262 Road vehicles – Functional safety, FDIS, International Organization for Standardization (ISO), 2011

[4] IEC-61508 – Functional Safety of Electrical / Electronic / Programmable Electronic Safety-Related Systems. International Electrotechnical Commission (IEC), 1998

[5] FDA, Examples of Reported Infusion Pump Problems, Accessed on 27 September 2012,

http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedures/GeneralHospitalDevicesandSupplies/InfusionPumps/ucm202496.htm

[6] FDA, FDA Issues Statement on Baxter’s Recall of Colleague Infusion Pumps, Accessed on 27 September 2012, http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm210664.htm

[7] FDA, Total Product Life Cycle: Infusion Pump – Premarket Notification 510(k) Submissions, Draft Guidance, April 23, 2010.

[8] “Report on the Accident to Airbus A320-211 Aircraft in Warsaw on 14 September 1993”, Main Commission Aircraft Accident Investigation Warsaw, March 1994, http://www.rvs.unibielefeld.de/publications/Incidents/DOCS/ComAndRep/Warsaw/warsaw-report.html  Accessed on 1st October 2012.

[9] JPL Special Review Board, “Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions”, Jet Propulsion Laboratory, March 2000.

[10] Australian Transport Safety Bureau, In-Flight Upset Event 240 km North-West of Perth, WA, Boeing Company 777-200, 9M-MRG. Aviation Occurrence Report 200503722, 2007.

[11] H. Wolpe, General Accounting Office Report on Patriot Missile Software Problem, February 4, 1992, Accessed on 1st October 2012, Available at: http://www.fas.org/spp/starwars/gao/im92026.htm

[12] Y.C. Yeh, Triple-Triple Redundant 777 Primary Flight Computer, IEEE Aerospace Applications Conference pg 293-307, 1996.

[13] D.M. Hunns and N. Wainwright, Software-based protection for Sizewell B: the regulator’s perspective. Nuclear Engineering International, September 1991.

[14] R.D. Hawkins, T.P. Kelly, A Framework for Determining the Sufficiency of Software Safety Assurance. IET System Safety Conference, 2012.

[15] SAE. ARP 4754 – Guidelines for Development of Civil Aircraft and Systems. 1996.

Software Safety Principles: End of the Series

This blog post series was derived from ‘The Principles of Software Safety Assurance’, by RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Principles of Software Safety Training

Learn more about this subject in my course ‘Principles of Safe Software’ here.

My course on Udemy, ‘Principles of Software Safety Standards’ is a cut-down version of the full Principles Course. Nevertheless, it still scores 4.42 out of 5.00 and attracts comments like:

  • “It gives me an idea of standards as to how they are developed and the downward pyramid model of it.” 4* Niveditha V.
  • “This was really good course for starting the software safety standareds, comparing and reviewing strengths and weakness of them. Loved the how he try to fit each standared with4+1 principles. Highly recommend to anyone that want get into software safety.” 4.5* Amila R.
  • “The information provides a good overview. Perfect for someone like me who has worked with the standards but did not necessarily understand how the framework works.” 5* Mahesh Koonath V.
  • “Really good overview of key software standards and their strengths and weaknesses against the 4+1 Safety Principles.” 4.5* Ann H.

Software Safety Assurance and Standards

This post, Software Safety Assurance and Standards, is the fifth in a series of six blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards. (The previous post in the series is here.)

Read on to Benefit from…

In this post, we assess how well specific, popular standards apply the 4+1 Principles. In particular, I add some insights from my experience in large-scale software projects (since 1994) to give further commentary. My comments are [shown thus].

The perfect software safety standard doesn’t exist. Arguably, it never will, as standards must be generic to ensure that they are widely applicable, whereas software projects may have particular needs. However, if we understand these standards we can discover their weaknesses and tailor them, and/or add to them accordingly.

Content

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Relationship to Existing Software Safety Standards

The ideas of software safety assurance discussed in this article are not explicit in most software safety standards, though they are typically present. However, by concentrating only on adherence to the letter of these standards, software developers using these standards are likely to lose sight of the primary goals (e.g. through box-ticking). We look at manifestations of each of the Principles in some of the most popular software safety standards below – IEC 61508, ISO 26262, and DO 178C.

Principle 1

IEC 61508 and ISO 26262 both demonstrate how system-level hazard analysis and software safety requirements are linked. DO-178C requires that high-level requirements be defined to address the system requirements allocated to software to prevent system hazards. Particularly when used in conjunction with the companion standard ARP 4754, this addresses Principle 1.

[In military aviation, I’m used to seeing DO-178 used in conjunction with Mil-Std-882. This also links hazard analysis to software safety requirements, although perhaps not as thoroughly as ARP 4754.]

Principle 2

Traceability of software safety requirements is always required. The standards also place a strong emphasis on the iterative validation of the software requirements.

Specific examples of requirements decomposition models are provided by DO-178C and ISO 26262. Capturing the justification for the required traceability (a crucial aspect of Principle 2) is an area where standards frequently fall short.

What is particularly lacking is a focus on upholding the intent of the software safety requirements. This needs richer forms of traceability, which take the requirements’ purpose [intent] into account at the various phases of development, rather than purely syntactic links.
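
As a minimal sketch of what ‘richer’ traceability might record (the field names and content are invented for illustration), a trace link can carry the intent and assumptions behind a decomposition, not just a parent-child reference:

```python
# Hypothetical sketch of a traceability record that captures intent
# (rationale), not just a syntactic parent-child link between
# requirements. Field names and content are invented for illustration.

trace_record = {
    "parent": "SSR-012: No infusion rate above the clinician-set limit",
    "child": "LLR-104: Rate command clamped to stored limit register",
    "intent": ("Prevent over-infusion even if the dose-calculation "
               "module produces an out-of-range value"),
    "assumptions": ["Limit register is write-protected after setup"],
    "validation": "Reviewed against HAZ-03 at design review DR-2",
}

# A reviewer can now ask whether the child requirement still serves the
# recorded intent, rather than merely checking that a link exists.
for field, value in trace_record.items():
    print(f"{field}: {value}")
```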

Principle 3

Guidance on requirements satisfaction forms the basis of the software safety standards. Although there are distinct disparities in the advised methods of satisfaction, this principle is generally thoroughly addressed (for example, DO-178 traditionally placed a strong emphasis on testing).

[Def Stan 00-55 places more emphasis on proof, not just testing. However, this onerous software safety standard has fallen out of fashion.]

Principle 4

This principle requires that we demonstrate the absence of errors introduced during the software lifecycle. Aspects of this principle can be seen in the standards. However, software hazard analysis is the aspect that receives the least attention across all of the standards.

[N.B. The combination of Mil-Std-882E and the Joint Software Systems Safety Engineering Handbook places a lot of emphasis on this aspect.]

The standards treat safety analysis as a system-level process. Software development is expected to implement the requirements, including the safety requirements assigned to software, that the system-level processes produce, on the assumption that those requirements are correct. At later phases of the development process, these requirements are refined and put into practice without explicitly applying software hazard analysis.

There is no specific requirement in DO-178C to identify “emerging” safety risks during software development, but it does permit recognized safety issues to be fed back to the system level.

Principle 4+1

All standards share the idea of modifying the software assurance approach in accordance with “risk.” However, there are significant differences in how the software’s criticality is assessed. IEC 61508 establishes a Safety Integrity Level based on the amount of risk reduction required, DO-178B emphasizes the severity of failure conditions, and ISO 26262 adds the idea of the vehicle’s controllability. The suggested strategies and processes also vary greatly at the various levels of criticality.

[The Mil-Std-882E approach is to set a ‘level of rigor’ for software development. This uses a combination of mishap severity and the reliance placed on the software to set the level.]
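
To illustrate that approach: as I recall it, Mil-Std-882E combines mishap severity with a Software Control Category (how much authority the software has over the mishap) to give a Software Criticality Index, which then sets the level of rigor. The sketch below is illustrative only; the cell values are from memory, and the standard’s own Table V should be treated as authoritative.

```python
# Sketch of the Mil-Std-882E 'level of rigor' idea: mishap severity is
# combined with a Software Control Category (how much the software can
# influence the mishap) to give a Software Criticality Index (SwCI).
# The cell values below are illustrative only; consult Table V of the
# standard for the authoritative matrix.

SEVERITIES = ["Catastrophic", "Critical", "Marginal", "Negligible"]
CONTROL_CATEGORIES = ["Autonomous", "Semi-Autonomous",
                      "Redundant Fault Tolerant", "Influential"]

# Illustrative SwCI matrix: rows = control category, columns = severity.
SWCI_MATRIX = [
    [1, 1, 3, 4],   # Autonomous
    [1, 2, 3, 4],   # Semi-Autonomous
    [2, 3, 4, 4],   # Redundant Fault Tolerant
    [3, 4, 4, 4],   # Influential
]

def software_criticality_index(control_category: str, severity: str) -> int:
    row = CONTROL_CATEGORIES.index(control_category)
    col = SEVERITIES.index(severity)
    return SWCI_MATRIX[row][col]

# Example: software with full authority over a catastrophic-severity
# function attracts the most demanding level of rigor (SwCI 1).
print(software_criticality_index("Autonomous", "Catastrophic"))  # 1
```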

Software Safety Assurance and Standards: End of Part 5 (of 6)

This blog post is derived from ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Principles of Software Safety Training

Learn more about this subject in my course ‘Principles of Safe Software’ here. The next post in the series is here.

My course on Udemy, ‘Principles of Software Safety Standards’ is a cut-down version of the full Principles Course. Nevertheless, it still scores 4.42 out of 5.00 and attracts comments like:

  • “It gives me an idea of standards as to how they are developed and the downward pyramid model of it.” 4* Niveditha V.
  • “This was really good course for starting the software safety standareds, comparing and reviewing strengths and weakness of them. Loved the how he try to fit each standared with4+1 principles. Highly recommend to anyone that want get into software safety.” 4.5* Amila R.
  • “The information provides a good overview. Perfect for someone like me who has worked with the standards but did not necessarily understand how the framework works.” 5* Mahesh Koonath V.
  • “Really good overview of key software standards and their strengths and weaknesses against the 4+1 Safety Principles.” 4.5* Ann H.

Software Safety Assurance

Software Safety Assurance is the fourth in a series of six blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards. (The previous post in the series is here.)

Read on for These Benefits…

This post deals with some crucial software assurance topics: what is software assurance, and what does it mean in practice? I add [my comments] to further explain some key topics, based on my wide experience in the industry since 1994.

There are some important case studies here. They add depth and diversity to those already presented in previous posts. This post also addresses the crucial issue of diverse assurance techniques, as no one approach is likely to be adequate for safety-significant software.

Content

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Software Assurance = Justified Confidence

[The original authors referred to Principle 4+1 as ‘confidence’, but this term is not well recognized, so I have used ‘assurance’. The two terms are related. Both terms get us to ask: how much safety is enough? This is also the topic addressed in my blog post on Proportionality.]

Principle 4+1:

The confidence established in addressing the software safety principles shall be commensurate to the contribution of the software to system risk.

‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York.

All safety-related software systems must adhere to the four aforementioned principles. To prove that each of the guiding principles has been established for the software, evidence must be presented.

Depending on the characteristics of the software system itself, the dangers that are present, and the principle that is being shown, the proof may take many different forms. The strength and quantity of the supporting evidence will determine how confidently or assuredly the premise is established.

Therefore, it’s crucial to confirm that the level of trust developed is always acceptable. This is frequently accomplished by making sure that the level of confidence attained corresponds to the contribution the software makes to system risk. This strategy makes sure that the areas that lower safety risk the most receive the majority of attention (when producing evidence).

This method is extensively used today. Many standards employ concepts like Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to describe the amount of confidence needed in a certain software function. [And the ‘Level of Rigor’ required for its development.]
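
One concrete way this scaling shows up is in DO-178C’s structural coverage objectives: the higher the Development Assurance Level, the stronger the coverage evidence demanded. The sketch below covers only this one family of objectives, not the standard’s full objective set.

```python
# Simplified sketch of how required evidence scales with assurance
# level in DO-178C, using structural coverage as the example. This is
# only one of many objectives in the standard.

STRUCTURAL_COVERAGE_BY_DAL = {
    "Level A": ["Statement", "Decision", "MC/DC"],
    "Level B": ["Statement", "Decision"],
    "Level C": ["Statement"],
    "Level D": [],   # no structural coverage objective
}

for dal, coverage in STRUCTURAL_COVERAGE_BY_DAL.items():
    required = ", ".join(coverage) if coverage else "none"
    print(f"{dal}: structural coverage required = {required}")
```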

Examples

The flight control system for the Boeing 777 airplane is a Fly-By-Wire (FBW) system … The Primary Flight Computer (PFC) is the central computation element of the FBW system. The triple modular redundancy (TMR) concept also applies to the PFC architectural design. Further, the N-version dissimilarity issue is integrated into the TMR concept.

Details are given of a ‘special case procedure’ within the principles’ framework which has been developed specifically to handle the particular problem of the assessment of software-based protection systems. The application of this ‘procedure’ to the Sizewell B Nuclear Power Station computer-based primary protection system is explained.

Suitability of Evidence

Once the essential level of confidence has been established, it is crucial to be able to judge whether it has been reached. Several factors must be taken into account when determining the degree of confidence with which each principle is put into practice.

The suitability of the evidence should be taken into consideration first. The constraints of the type of evidence being used must be considered too. These restrictions will have an impact on the degree of confidence that can be placed in each sort of evidence with regard to a certain principle.

Examples of these restrictions include the degree of test coverage that can be achieved, the precision of the models employed in formal analysis approaches, or the subjectivity of review and inspection. Most techniques have limits on what they can achieve.

Due to these limitations, it could be necessary to combine diverse types of evidence to reach the required degree of confidence in any one of the principles. The trustworthiness of each piece of evidence must also be taken into account; that is, the degree of confidence that the item of evidence will perform as expected.

This is also frequently referred to as evidence rigor or evidence integrity. The rigorousness of the technique employed to produce the evidence item determines its reliability. The primary variables that will impact trustworthiness are Tools, Personnel, Methodology, Level of Audit and Review, and Independence.

The four software safety principles will never change. However, there is a wide range in the confidence with which those principles can be established. We now know that a determination must be made regarding the degree of assurance required to establish the principles for any given system. We now have our guiding principle.

Since it affects how the previous four principles are put into practice, this concept is also known as Principle 4+1.

Software Safety Assurance: End of Part 4 (of 6)

This blog post is derived from ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Principles of Software Safety Training

Learn more about this subject in my course ‘Principles of Safe Software’ here. The next post in the series is here.

My course on Udemy, ‘Principles of Software Safety Standards’ is a cut-down version of the full Principles Course. Nevertheless, it still scores 4.42 out of 5.00 and attracts comments like:

  • “It gives me an idea of standards as to how they are developed and the downward pyramid model of it.” 4* Niveditha V.
  • “This was really good course for starting the software safety standareds, comparing and reviewing strengths and weakness of them. Loved the how he try to fit each standared with4+1 principles. Highly recommend to anyone that want get into software safety.” 4.5* Amila R.
  • “The information provides a good overview. Perfect for someone like me who has worked with the standards but did not necessarily understand how the framework works.” 5* Mahesh Koonath V.
  • “Really good overview of key software standards and their strengths and weaknesses against the 4+1 Safety Principles.” 4.5* Ann H.

Software Safety Principle 4

Software Safety Principle 4 is the third in a new series of six blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards. (The previous post in the series is here.)

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Principle 4: Hazardous Software Behaviour

The fourth software safety principle is:

Principle 4: Hazardous behaviour of the software shall be identified and mitigated.

‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York.

Software safety requirements imposed on a software design can capture the high-level safety requirements’ intent. However, this does not ensure that all of the software’s potentially dangerous behaviors have been considered. Because of how the software has been created and built, there will frequently be unanticipated behaviors that cannot be understood through a straightforward requirements decomposition. These risky software behaviors could be caused by one of the following:

  1. Unintended interactions and behaviors brought on by software design choices; or
  2. Systematic mistakes made when developing software.

On 1 August 2005, a Boeing Company 777-200 aircraft, registered 9M-MRG, was being operated on a scheduled international passenger service from Perth to Kuala Lumpur, Malaysia. The crew experienced several frightening and contradictory cockpit indications.

This incident illustrates the issues that can result from unintended consequences of software design. Such incidents can only be foreseen through a methodical and detailed analysis of potential software failure mechanisms and their repercussions (both on the software itself and on external systems). Once potentially harmful software behavior has been identified, safeguards can be put in place to address it. However, doing so requires us to examine the potential impact of software design decisions.

Not all dangerous software behavior will develop as a result of unintended consequences of the software design. Dangerous behavior may also be seen as a direct result of mistakes made during the software design and implementation phases. Seemingly minor development mistakes can have serious repercussions, as the sketch below illustrates.
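
Here is a hypothetical example of such a ‘minor’ mistake (the scenario, names, and limit are invented): a single wrong comparison operator in a monitoring check.

```python
# Hypothetical illustration: a one-character implementation mistake
# with a potentially hazardous effect. Scenario and limits are invented.

OVER_TEMPERATURE_LIMIT_C = 85.0

def over_temperature(measured_c: float) -> bool:
    """Intended behaviour: return True when the measured temperature
    reaches or exceeds the limit, so that a shutdown is commanded."""
    # Mistake: '>' was written instead of '>='. At exactly the limit,
    # the check stays silent, so the boundary case the safety
    # requirement cares about is the one case that is missed.
    return measured_c > OVER_TEMPERATURE_LIMIT_C

print(over_temperature(84.9))  # False, correct
print(over_temperature(85.0))  # False, but the intent was True
print(over_temperature(85.1))  # True, correct
```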

It’s important to stress that this is not a problem of software quality in general. For software safety assurance, we focus exclusively on faults that could result in dangerous behavior. As a result, efforts can be concentrated on reducing systematic errors in the areas where they might have an impact on safety.

Since systematically establishing direct hazard causality for every error may not be possible in practice, it may be preferable, for now, to accept what is regarded as best practice. However, the justification for doing so ought, at the very least, to be founded on knowledge from the software safety community of how the particular problem under consideration has led to safety-related accidents.

To guarantee that adequate rigor is applied to their development, it is also crucial to identify the most critical components of the software design. Any software behavior that may be risky must be recognized and stopped if we are to be confident that the software will always behave safely.

Software Safety Principle 4: End of Part 3 (of 6)

This blog post is derived from ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Learn more about this subject in my course ‘Principles of Safe Software’ here. The next post in the series is here.


Software Safety Principles 2 and 3

Software Safety Principles 2 and 3 is the second in a new series of blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards. (The previous blog post is here.)

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Principle 2: Requirement Decomposition

The second software safety principle is:

Principle 2: The intent of the software safety requirements shall be maintained throughout requirements decomposition.

‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York.

As the software development lifecycle moves forwards, the requirements and design are gradually broken down, leading to a more detailed software design. The term “derived software requirements” refers to the requirements derived for that more detailed design. Once the software safety requirements have been established as complete and correct at the highest (most abstract) level of design, their intent must be upheld as they are decomposed.

An example of the failure of requirements decomposition is the crash of Lufthansa Flight 2904 at Warsaw on 14 September 1993.

In essence, the issue is one of ongoing requirements validation. How do we show that the requirements expressed at one level of design abstraction are equal to those defined at a more abstract level? This difficulty arises constantly during the software development process.

It is insufficient to only consider requirements fulfillment. The software safety requirements had been met in the Flight 2904 example. However, they did not match the intent of the high-level safety requirements in the real world.
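
To illustrate the gap in simplified form (the real A320 air/ground logic and its thresholds differ from this sketch), the derived requirement was, in essence, ‘only enable ground deceleration devices when the aircraft is unambiguously on the ground’, implemented through wheel-load and wheel-spin conditions:

```python
# Simplified, illustrative reconstruction of the kind of air/ground
# logic involved at Warsaw. The real A320 logic and numeric thresholds
# differ; this sketch only shows how a derived requirement can satisfy
# its letter while missing the intent of the high-level requirement
# ("enable braking devices when the aircraft has landed").

def ground_deceleration_enabled(left_strut_compressed: bool,
                                right_strut_compressed: bool,
                                wheel_speed_kt: float) -> bool:
    # Derived requirement (letter): treat the aircraft as 'on ground'
    # only if both struts are compressed, or the wheels are spinning fast.
    both_struts = left_strut_compressed and right_strut_compressed
    wheels_spun_up = wheel_speed_kt > 72.0   # illustrative threshold
    return both_struts or wheels_spun_up

# Intent case: firm touchdown on both gear, devices enabled. Good.
print(ground_deceleration_enabled(True, True, 120.0))   # True

# Warsaw-like case: crosswind landing on one gear plus aquaplaning, so
# neither condition is met, even though the aircraft is on the runway
# and the crew urgently needs the deceleration devices.
print(ground_deceleration_enabled(True, False, 40.0))   # False
```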

Human factors difficulties (a warning may be presented to a pilot as necessary, but that warning may not be noticed on the busy cockpit displays) are another consideration that may make the applicability of the decomposition more challenging.

Ensuring that all necessary details are included in the initial high-level requirement is one possible theoretical solution to this issue. However, it would be difficult to accomplish this in real life. It is inevitable that design choices requiring more specific criteria will be made later in the software development lifecycle. It is not possible to know this detail accurately until those design choices have been made.

The decomposition of safety requirements must always be managed carefully if the software is to be regarded as safe to use.

Principle 3: Requirements Satisfaction

The third software safety assurance principle is:

Principle 3: Software safety requirements shall be satisfied.

‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York.

Once a set of “valid” software safety requirements has been defined, it must be confirmed that they have been met. This set may comprise allocated software safety requirements (Principle 1), or refined or derived software safety requirements (Principle 2). A crucial prerequisite for their satisfaction is that these requirements are precise, well-defined, and actually verifiable.

The sorts of verification techniques used to show that the software safety requirements have been met will vary with the degree of safety criticality, the stage of development, and the technology being employed. Therefore, attempting to prescribe particular verification methodologies for producing verification evidence is neither practical nor wise.

Mars Polar Lander was an ambitious mission to set a spacecraft down near the edge of Mars’ south polar cap and dig for water ice. The mission was lost on arrival on December 3, 1999.

Given the complexity and safety-critical nature of many software-based systems, it is obvious that using just one type of software verification is insufficient. As a result, a combination of verification techniques is frequently required to produce the verification evidence. Testing and expert review are frequently used to produce primary or secondary verification evidence. However, formal verification is increasingly emphasized because it may demonstrate satisfaction of the software safety requirements more reliably.
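
As a small, invented example of test-based verification evidence, a precise software safety requirement (here, ‘never command an infusion rate above the set limit’; the names and limits are hypothetical) can be checked directly by a unit test:

```python
# Hypothetical sketch: verifying a precise software safety requirement
# ("never command an infusion rate above the set limit") by unit test.
# Names and limits are invented; real verification would combine this
# with review, analysis, and coverage evidence.
import unittest

def command_rate(requested_ml_per_hr: float, limit_ml_per_hr: float) -> float:
    """Clamp the commanded infusion rate to the clinician-set limit."""
    return min(requested_ml_per_hr, limit_ml_per_hr)

class TestOverRateProtection(unittest.TestCase):
    def test_rate_never_exceeds_limit(self):
        for requested in (0.0, 49.9, 50.0, 50.1, 500.0):
            self.assertLessEqual(command_rate(requested, 50.0), 50.0)

if __name__ == "__main__":
    unittest.main()
```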

The main obstacle to proving that the software safety requirements have been met is the inherent limitations of the evidence produced by the methods described above. The characteristics of the problem space are the root of the difficulties.

Given the complexity of software systems, especially those used to achieve autonomous capabilities, there are challenges with completeness for both testing and analysis methodologies. The underlying logic of the software can be verified using formal methods, but there are still significant drawbacks. Namely, it is difficult to provide assurance of model validity. Also, formal methods do not deal with the crucial problem of hardware integration.

Clearly, the capacity to meet the stated software safety requirements is a prerequisite for ensuring the safety of software systems.

Software Safety Principles 2 & 3: End of Part 2 (of 6)

This blog post is derived from ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Learn more about this subject in my course ‘Principles of Safe Software’ here. The next post in the series is here.