Categories
Mil-Std-882E Safety Analysis software safety

Functional Hazard Analysis

In this full-length (40-minute) session, The Safety Artisan looks at Functional Hazard Analysis, or FHA, which is Task 208 in Mil-Std-882E. FHA analyses software, complex electronic hardware, and human interactions. We explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (We refer to other lessons for special techniques for software safety and Human Factors.)

Functional Hazard Analysis: Context

So how do we analyze software safety?

Before we even start, we need to identify those system functions that may impact safety. We can do this by performing a Functional Failure Analysis (FFA) of all system requirements that might credibly lead to human harm.

An FFA looks at functional requirements (the system should do ‘this’ or ‘that’) and examines what could go wrong. What if:

  • The function does not work when needed?
  • The function works when not required?
  • The function works incorrectly? (There may be more than one version of this.)

(A variation of this technique is explained here.)

If the function could lead to a hazard then it is marked for further analysis. This is where we apply the FHA, Task 208.

Functional Hazard Analysis: The Lesson

This is the seven-minute demo; the full version is 40 minutes long.

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis

Click here for the Transcript

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyse the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is Task 208 in Mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional Purpose HA #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? And in classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, it’s used to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. The truth is in the real world, that lots of software-intensive systems have been involved in accidents that have killed lots of people, even when they’re functioning as intended. That’s one of the short-sightedness of this Mil. Standard is that it focuses on failure. The idea that if something is performing as specified, that either the specification might be wrong or there might be some disconnect between what the system is doing and what the human expects – The way the standard is written just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – for the purpose of identifying what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sort of a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behaviour over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people is the short answer.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The presumption in this – the assumption in this task is that we’re doing early – We’ll see that later – and that system design, system architecture, is still up for grabs. We can still influence it. Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen anymore so much in the military and it certainly doesn’t happen in many other walks of life – But we’ll talk about how you deal with the realities later. And they’re allocating these functions and these items of interest to hardware, software and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those type of simple components that are only really subject to random failure, that’s not what we’re talking about here. We’re talking about complex stuff where we’re talking about systematic failure dominating over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide on purpose; so we use the FHA to identify consequences of malfunction or functional failure, lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a systems engineering process – that’s not always the case. We’ll talk about that at the end as well. And we’re going to identify and document these functions and items and allocate and it says partition them in the software design architecture. When we say partition, that’s jargon for separate them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation in this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done.

(End of Transcription for the 7-minute demo.)

Then What?

Once the FFA has identified the required ‘Level or Rigor’, we need to translate that into a suitable software development standard. This might be:

  • RTCA DO-178C (also know as ED-12C) for civil aviation;
  • The US Joint Software System Safety Engineering Handbook (JSSEH) for military systems;
  • IEC 61508 (functional safety) for the process industry;
  • CENELEC-50128 for the rail industry; and
  • ISO 26262 for automotive applications.

Such standards use Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to enforce appropriate Levels of Rigor. You can learn about those in my course Principles of Safe Software Development.

End

Categories
Blog software safety

Software Safety Assurance and Standards

This post, Software Safety Assurance and Standards, is the fifth in a series of six blog posts on Principles of Software Safety Assurance. In it, we look at the 4+1 principles that underlie all software safety standards.

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Relationship to Existing Software Safety Standards

The ideas of software safety assurance discussed in this article are not explicit in most software safety standards, though they are typically present. However, by concentrating only on adherence to the letter of these standards, software developers using these standards are likely to lose sight of the primary goals (e.g. through box-ticking). We look at manifestations of each of the Principles in some of the most popular software safety standards below – IEC 61508, ISO 26262, and DO 178C.

Principle 1

IEC 61508 and ISO 26262 both demonstrate how hazard analysis at the system level and software safety criteria have been linked. High-level requirements that address system requirements assigned to software to prevent system risks must be defined, according to DO-178C. Particularly when used in conjunction with companion standard ARP 4754, this addresses Principle 1.

[In military aviation, I’m used to seeing Do-178 used in conjunction with Mil-Std-882. This also links hazard analysis to software safety requirements, although perhaps not as thoroughly as ARP 4754.]

Principle 2

Traceability in software needs is always required. The standards also place a strong emphasis on the software requirements’ iterative validation.

Specific examples of requirements decomposition models are provided by DO-178C and ISO26262. Capturing the justification for the required traceability is an area where standards frequently fall short (a crucial aspect of Principle 2).

What is particularly lacking is a focus on upholding the purpose of the software safety rules. Richer types of traceability that take the requirements’ purpose into account rather than just syntactic ones at various phases of development are needed for this.

Principle 3

The basis of the software safety standards is guidance on requirement satisfaction. Although there are distinct disparities in the advised methods of pleasure, this principle is generally thoroughly addressed (for example DO-178 traditionally placed a strong emphasis on testing).

[Def Stan 00-55 places more emphasis on proof, not just testing. However, this onerous software safety standard has fallen out of fashion.]

Principle 4

This requires that the absence of mistakes introduced during the software lifetime be demonstrated. Aspects of this principle can be seen in the standards. However, of all the standards, the software hazard analysis part receives the least attention.

[N.B. The combination of Mil-Std-882E and the Joint Software Systems Safety Engineering Handbook places a lot of emphasis on this aspect.]

The standards imply that system-level safety analysis is a process. The purpose of software development is to prove that requirements, including safety requirements assigned to software, as produced by system-level procedures, are correct. At later phases of the development process, these criteria are refined and put into practice without explicitly applying software hazard analysis.

There is no specific requirement in DO 178C to identify “emerging” safety risks during software development, but it does permit recognized safety issues to be transmitted back to the system level.

Principle 4+1

All standards share the idea of modifying the software assurance strategy in accordance with “risk.” However, there are significant differences in how the software’s criticality is assessed. IEC 61508 establishes a Safety Integrity Level based on the probability delta in risk reduction, DO-178B emphasizes severity, and ISO 26262 adds the idea of the vehicle’s controllability. At various levels of criticality, the suggested strategies and processes vary greatly as well.

[ The Mil-Std-882E approach is to set a ‘level of rigor’ for software development. This uses a combination of mishap severity and the reliance placed on the software to set the level.]

Software Safety Assurance and Standards: End of Part 5 (of 6)

This blog post is derived from ‘The Principles of Software Safety Assurance’, RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

If you found this blog article helpful then please leave a review, below. If you have a private question or comments then please connect here.

Categories
Mil-Std-882E Safety Analysis System Safety

How to Understand Safety Standards

Learn How to Understand Safety Standards with this FREE session from The Safety Artisan.

In this module, Understanding Your Standard, we’re going to ask the question: Am I Doing the Right Thing, and am I Doing it Right? Standards are commonly used for many reasons. We need to understand our chosen system safety engineering standard, in order to know: the concepts, upon which it is based; what it was designed to do, why and for whom; which kinds of risk it addresses; what kinds of evidence it produces; and it’s advantages and disadvantages.

Understand Safety Standards : You’ll Learn to

  • List the hazard analysis tasks that make up a program; and
  • Describe the key attributes of Mil-Std-882E. 
Understanding Your Standard

Topics:  Understand Safety Standards

Aim: Am I Doing the Right Thing, and am I Doing it Right?

  • Standards: What and Why?
  • System Safety Engineering pedigree;
  • Advantages – systematic, comprehensive, etc:
  • Disadvantages – cost/schedule, complexity & quantity not quality.

Transcript: Understand Safety Standards

Click here for the Transcript on Understanding Safety Standards

In Module Three, we’re going to understand our Standard. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we’re going to ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’

What and Why?

So, what are we going to do, and why are we doing it? First of all, the use of standards in safety is very common for lots of reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’. And we can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or across different organizations. And standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.

However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and is coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.

And they’re not all based on the same concept. Thus some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?

And for risk analysis standards, we need to understand what kind of risks it addresses. Because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money that might be acceptable to you. You might say, ‘I’m not worried about that 10% of my loans have gone south and all gone wrong. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say ‘It’s OK that I’ve killed a few people over here because all this a lot over here are still alive!’. It doesn’t work like that!

Also, what kind of evidence does the standard produce? Because in safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable. Some are not. You need to know which is for your area.

And then finally, let’s think about the pros and cons of the standard, what does it do well? And what does it do not so well?

System Safety Pedigree

We’re going to look at a standard called Military Standard 882E. Many decades ago, this standard developed was created by the US government and military to help them bring into service complex-cutting edge military equipment. Equipment that was always on the cutting edge. That pushed the limits of what you could achieve in performance.

That’s a lot of complexity. Lots of critical weapon systems, and so forth. And they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background from military systems. These days you find these principles used pretty much everywhere. So, all the approaches to System Safety that 882 introduced are in other standards. They are also in other countries.

It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety. It’s very much about identifying requirements. What do we need to happen to get safety? To do that, it produces lots of requirements. It performs analyses in all those requirements and generates further requirements. And it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages. We’re going to discuss these in the next few slides.

Comprehensive Analysis

Before we get to that, we need to look at the key feature of this standard. The strengths and weaknesses of this standard come from its comprehensive analysis. And the chart (see the slide) is meant to show how we are looking at the system from lots of different perspectives. (It’s not meant to be some arcane religious symbol!) So, we’re looking at a system from 10 different perspectives, in 10 different ways.

Going around clockwise, we’ve got these ten different hazard analysis tasks. First of all, we start off with preliminary hazard identification. Then preliminary hazard analysis. We do some system requirements hazard analysis. So, we identify the safety requirements that the system is going to meet so that we are safe. We look at subsystem and system hazard analysis. At operating and support hazard analysis – people working with the system. Number seven, we look at health hazard analysis – Can the system cause health problems for people? Functional hazard analysis, which is all about what it does. We’re thinking of sort of source software and data-driven functionality. Maybe there’s no physical system, but it does stuff. It delivers benefits or risks. System of systems hazard analysis – we could have lots of different and/or complex systems interacting. And then finally, the tenth one – environmental hazard analysis.

If we use all these perspectives to examine the system, we get a comprehensive analysis of the system. From this analysis, we should be confident that we have identified everything we need to. All the hazards and all the safety requirements that we need to identify. Then we can confidently deliver an appropriate safe system. We can do this even if the system is extremely complex. The standard is designed to deal with big, complex cutting-edge systems.

Advantages #1

In fact, as we move on to advantages, that’s the number one advantage of this standard. If we use it and we use all 10 of those tasks, we can cope with the largest and the most demanding programs. I spent much of my career working on the Eurofighter Typhoon. It was a multi-billion-dollar program. It cost hundreds of billions of dollars, four different nations worked together on it. We used a derivative of Mil. Standard 882 to look at safety and analyze it. And it coped. It was powerful enough to deal with that gigantic program. I spent 13 years of my life on and off on that program so I’d like to think that I know my stuff when we’re talking about this.

As we’ve already said, it’s a systematic approach to safety. Systems, safety, engineering. And we can start very early. We can start with early requirements – discovery. We don’t even need a design – we know that we have a need. So we can think about those needs and analyze them.

And it can cover us right through until final disposal. And it covers all kinds of elements that you might find in a system. Remember our definition of ‘system’? It’s something that consists of hardware, software, data, human beings, etc. The standard can cope with all the elements of a system. In fact, it’s designed into the standard. It was specifically designed to look at all those different elements. Then to get different insights from those elements. It’s designed to get that comprehensive coverage. It’s really good at what it does. And it involves, not just engineers, but people from all kinds of other disciplines. Including operators, maintainers, etc, etc.

I came from a maintenance background. I was either directly or indirectly supporting operators. I was responsible for trying to help them get the best out of their system. Again, that’s a very familiar world to me. And rigorous standards like this can help us to think rigorously about what we’re doing. And so get results even in the presence of great complexity, which is not always a given, I must say.

So, we can be confident by applying the standard. We know that we’re going to get a comprehensive and thorough analysis. This assures us that what we’re doing is good.

Advantages #2

So, there’s another set of advantages. I’ve already mentioned that we get assurance. Assurance is ‘justified confidence’. So we can have high confidence that all reasonably foreseeable hazards will be identified and analyzed. And if you’re in a legal jurisdiction where you are required to hit a target, this is going to help you hit that target.

The standard was also designed for use in contracts. It’s designed to be applied to big programs. We’d define that as where we are doing the development of complex high-performance systems. So, there are a lot of risks. It’s designed to cope with those risks.

Finally, the standard also includes requirements for contracting, for interfaces with other systems, for interfaces with systems engineering. This is very important for a variety of disciplines. It’s important for other engineering and technical disciplines. It’s important for non-technical disciplines and for analysis and recordkeeping. Again, all these things are important, whether it is for legal reasons or not. We need to do recordkeeping. We need to liaise with other people and consult with them. There are legal requirements for that in many countries. This standard is going to help us do all those things.

But, of course, in a standard everything has pros and cons and Mil. Standard 882 is no exception. So, let’s look at some of the disadvantages.

Disadvantages #1

First of all, a full system safety program might be overkill for the system that you want to use, or that you want to analyze.  The Cold War, thank goodness, is over; generally speaking, we’re not in the business of developing cutting-edge high-performance killing machines that cost billions and billions of dollars and are very, very risky. These days, we tend to reduce program risk and cost by using off-the-shelf stuff and modifying it. Whether that be for military systems, infrastructure in the chemical industry, transportation, whatever it might be. Very much these days we have a family of products and we reuse them in different ways. We mix and match to get the results that we want.

And of course, all this comprehensive analysis is not cheap and it’s not quick. It may be that you’ve got a program that is schedule-constrained. Or you want to constrain the cost and you cannot afford the time and money to throw a full 882 program at it. So, that’s a disadvantage.

The second family of problems is that these kinds of safety standards have often been applied prescriptively. The customer would often say, ‘Go away and go and do this. I’m going to tell you what to do based on what I think reduces my risk’. Or at least it covers their backside. So, contractors got used to being told to do certain things by purchasers and customers. The customers didn’t understand the standards that they were applying and insisting upon. So, the customers did not understand how to tailor a safety standard to get the result that they wanted. So they asked for dumb things or things that didn’t add value. And the contractors got used to working in that kind of environment. They got used to being told what to do and doing it because they wouldn’t get paid if they didn’t. So, you can’t really blame them.

But that’s not great, OK? That can result in poor behaviors. You can waste a lot of time and money doing stuff that doesn’t actually add value. And everybody recognizes that it doesn’t add value. So you end up bringing the whole safety program into disrepute and people treat it cynically. They treat it as a box-ticking exercise. They don’t apply creativity and imagination to it. Much less determination and persistence. And that’s what you need for a good effective system safety program. You need creativity. You need imagination. You need people to be persistent and dedicated to doing a good job. You need that rigor so that you can have the confidence that you’re doing a good job because it’s intangible.

Disadvantages #2

Let’s move onto the second kind of family of disadvantages. And this is the one that I’ve seen the most, actually, in the real world. If you do all 10 tasks and even if you don’t do all 10, you can create too many hazards. If you recall the graphic from earlier, we have 10 tasks. Each task looks at the system from a different angle. What you can get is lots and lots of duplication in hazard identification. You can have essentially the same hazards identified over and over again in each task. And there’s a problem with that, in two ways.

First of all, quality suffers. We end up with a fragmented picture of hazards. We end up with lots and lots of hazards in the hazard log, but not only that. We get fragments of hazards rather than the real thing. Remember I said those tests for what a hazard really is? Very often you can get causes masquerading as hazards. Or other things that that exacerbating factors that make things worse. They’re not a hazard in their own right, but they get recorded as hazards. And that problem results in people being unable to see the big picture of risk. So that undermines what we’re trying to do. And as I say, we get lots of things misidentified and thrown into the pot. This also distracts people. You end up putting effort into managing things that don’t make a difference to safety. They don’t need to be managed. Those are the quality problems.

And then there are quantity problems. And from personal experience, having too many hazards is a problem in itself.  I’ve worked on large programs where we were managing 250 hazards or thereabouts. That is challenging even with a sizable, dedicated team. That is a lot of work in trying to manage that number of hazards effectively. And there’s always the danger that it will slide into becoming a box-ticking exercise. Superficial at best.

I’ve also seen projects that have two and a half thousand hazards or even 4000 hazards in the hazard log. Now, once you get up to that level, that is completely unmanageable. People who have thousands of hazards in a hazard log and they think they’re managing safety are kidding themselves. They don’t understand what safety is if they think that’s going to work. So, you end up with all these items in your hazard log, which become a massive administrative burden. So people end up taking shortcuts and the real hazards are lost. The real issues that you want to focus on are lost in the sea of detail that nobody will ever understand. You won’t be able to control them.

Unfortunately, Mil. Standard 882 is good at generating these grotesque numbers of hazards. If you don’t know how to use the standard and don’t actively manage this issue, it gets to this stage. It can go and does go, badly wrong. This is particularly true on very big programs. And you really need clarity on big projects.

Summary of Module

Let’s summarize what we’ve done with this module. The aim was to help us understand whether we’re doing the right thing and whether we’ve done it right. And standards are terrific for helping us to do that. They help us to ensure we’re doing the right thing. That we’re looking at the right things. And they help us to ensure that we’re doing it rigorously and repeatedly. All the good quality things that we want. And Mil. Standard 882E that we’re looking at is a system safety engineering standard. So it’s designed to deal with complexity and high-performance and high-risk. And it’s got a great pedigree. It’s been around for a long time.

Now that gives advantages. So, we have a system safety program with this standard that helps us to deal with complexity. That can cope with big programs, with lots of risks. That’s great.

The disadvantages of this standard are that if we don’t know how to tailor or manage it properly, it can cost a lot of money. It can take a lot of time to give results which can cause problems for the program. And ultimately, you can accidentally ignore safety if you don’t deliver on time. And it can generate complexity. And it can generate a quantity of data that is so great that it actually undermines the quality of the data. It undermines what we’re trying to achieve. In that, we get a fragmented picture in which we can’t see the true risks. And so we can’t manage them effectively. If we get it wrong with this standard, we can get it really wrong. And that brings us to the end of this module.

This is Module 3 of SSRAP

This is Module 3 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.