Categories
Mil-Std-882E Safety Analysis software safety

Functional Hazard Analysis with Mil-Std-882E

In this video, I look at Functional Hazard Analysis with Mil-Std-882E (FHA, which is Task 208 in Mil-Std-882E). FHA analyses software, complex electronic hardware, and human interactions. I explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (I refer to other lessons for special techniques for software safety and Human Factors.)

This video, and the related webinar ‘Identify & Analyze Functional Hazards’, deal with an important topic. Programmable electronics and software now run so much of our modern world. They control many safety-related products and services. If they go wrong, they can hurt people.

I’ve been working with software-intensive systems since 1994. Functional hazards are often misunderstood or overlooked, as they are hidden. However, the accidents that they can cause are very real. If you want to expand your analysis skills beyond just physical hazards, I will show you how.

This is the seven-minute demo; the full version is 40 minutes long.

Functional Hazard Analysis: Context

So how do we analyze software safety?

Before we even start, we need to identify those system functions that may impact safety. We can do this by performing a Functional Failure Analysis (FFA) of all system requirements that might credibly lead to human harm.

An FFA looks at functional requirements (the system should do ‘this’ or ‘that’) and examines what could go wrong:

  • Does the function work when needed?
  • Does the function work when not required?
  • Does the function work incorrectly? (There may be more than one version of this.)

(A variation of this technique is explained here.)

If the function could lead to a hazard then it is marked for further analysis. This is where we apply the FHA, Task 208.

Functional Hazard Analysis: The Lesson

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyze the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is Task 208 in Mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional Purpose HA #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? In classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, we use it to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. In the real world, lots of software-intensive systems cause accidents that have killed people, even when they’re functioning as intended. That’s one of the shortcomings of this Military Standard – it focuses on failure. But even if something performs as specified, either:

  • The specification might be wrong, or
  • The system might do something that the human operator does not expects.

Mil-Std-882E just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – to identify what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behavior over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The presumption in this – the assumption in this task is that we’re doing early – We’ll see that later – and that system design, system architecture, is still up for grabs. We can still influence it.

COTS and MOTS Software

Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen anymore so much in the military and it certainly doesn’t happen in many other walks of life – But we’ll talk about how you deal with the realities later.

And they’re allocating these functions and these items of interest to hardware, software, and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those types of simple components are only really subject to random failure, that’s not what we’re talking about here.

We’re talking about complex stuff where we’re talking about systematic failure dominating over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide is on purpose; so, we use the FHA to identify the consequences of malfunction, functional failure, or lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a system engineering process – that’s not always the case. We’ll talk about that at the end as well.

Also, we’re going to identify and document these functions and items and allocate and it says to partition them in the software design architecture. When we say partition, that’s jargon for separating them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation in this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done….

Then What?

Once the FFA has identified the required ‘Level or Rigor’, we need to translate that into a suitable software development standard. This might be:

  • RTCA DO-178C (also know as ED-12C) for civil aviation;
  • The US Joint Software System Safety Engineering Handbook (JSSEH) for military systems;
  • IEC 61508 (functional safety) for the process industry;
  • CENELEC-50128 for the rail industry; and
  • ISO 26262 for automotive applications.

Such standards use Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to enforce appropriate Levels of Rigor. You can learn about those in my course, Principles of Safe Software Development.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been, for the last 25 years; I’ve worked in all kinds of domains, aircraft, ships, submarines, sensors, and command and control systems, and some work on rail air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Categories
Blog Mil-Std-882E Safety Analysis

How to do Preliminary Hazard Analysis with Mil-Std-882E

In this 45-minute session, I look at how to do a Preliminary Hazard Analysis with Mil-Std-882E. Preliminary Hazard Analysis, or PHA, is Task 202 in the Standard.

I explore Task 202’s aim, description, scope, and contracting requirements. There’s value-adding commentary, and I explain the issues with PHA – how to do it well and avoid the pitfalls.

Now, I have worked in System Safety since 1996, and I think that PHA is one of the three tasks that EVERY project needs to do. The other two are:

I look at these three tasks together in my course ‘Foundations of Safety Assessment’. This is one of five linked courses on Mil-Std-882E. They will teach you how to get the maximum benefits from your System Safety Program.

This is the seven-minute-long demo video. The full video is 45 minutes long.

Topics: How to do Preliminary Hazard Analysis

  • Task 202 Purpose;
  • Task Description;
  • Recording & Scope;
  • Risk Assessment (Tables I, II & III);
  • Risk Mitigation (order of preference);
  • Contracting; and
  • Commentary.

Transcript: How to do Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you’ll find professional, pragmatic, and impartial safety training resources. So, we’ll get straight on to our session, which is on the 8th of February 2020.

Preliminary Hazard Analysis

Now we’re going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It’s very widely used mostly on military equipment, but it does turn up elsewhere.  This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It’s one of the first things that you will do on a systems safety program and therefore one of the most informative. This session forms part of a series of lessons that I’m doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording, and scope. How we do risk assessments against Tables 1, 2, and 3. These tables describe severities, likelihoods, and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting, and then a short commentary from myself. I’m providing commentary all the way through. So, let’s crack on.

Task 202 Purpose

The purpose of Task 202, as it says, is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks, and identify potential mitigation measures. We’re going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I’ve highlighted some stuff that I particularly want to talk about.

It says “the contractor” [does this or that], but it doesn’t matter who is doing the analysis, and, the customer needs to do something to inform themselves, otherwise they won’t understand what they’re doing.  Whoever does it needs to perform and document a PHA. It’s about determining initial risk assessments. There’s going to be more work, more detailed work done later. But for now, we’re doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions we propose to introduce. That’s very important. We don’t need a design to do this. We can get in early when we have user requirements, functional requirements, and that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that’s another big issue. We’ll talk about that more later. It says to include mishap data as well, if accessible: American term mishap, means an accident, but we’re avoiding any kind of suggestion about whether it is accidental or deliberate.  It might be stupidity, deliberate, or whatever. It’s a mishap. It’s an undesirable event.

We look for accessible data from similar systems, legacy systems, and other lessons learned. I’ve talked about that a little bit in the Task 201 lesson, and there’s more on that today under commentary. We need to look at provisions, and alternatives, meaning design provisions and design alternatives to reduce risks and add mitigation measures to eliminate hazards. If we can all reduce associated risk, we need to include all of that. What’s the task description? That’s a good overview of the task and what we need to talk about.

Reading & Scope

First, recording and scope, as always, with these tasks, we’ve got to document the results of the PHA in a hazard tracking system. Now, a word on terminology; we might call a hazard tracking system; we might call it a hazard log; we might call it a risk register. It doesn’t matter what it’s called. The key point is it’s a tracking system. It’s a live document, as people say, it’s a spreadsheet or a database, something like that. It’s something relatively easy to update and change.

Also, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and refine them and change them as time goes on. Very important point.

That’s it for the Demo…

End: How to do Preliminary Hazard Analysis

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Mil-Std-882E Safety Analysis

Preliminary Hazard Identification with Mil-Std-882E

Want to know how to perform Preliminary Hazard Identification with Mil-Std-882E? (This is Task 201 under the standard.)

This is the first step in safety assessment.  We look at three classic complementary techniques to identify hazards and their pros and cons.  This includes all the content from Task 201, and also practical insights from my 25 years of experience with Mil-Std-882. 

You Will Learn to:

  • Conduct Preliminary Hazard Identification using diverse techniques for best results;
  • Define what Preliminary Hazard Identification is and does;
  • Record Preliminary Hazard Identification results correctly;
  • Contract for Preliminary Hazard Identification successfully; and
  • Apply it early enough to make a difference.
This is the seven-minute-long demo video.

Topics: Preliminary Hazard Identification with Mil-Std-882E

  • Task 201 Purpose & Task Description;
  • Historical Review;
  • Recording Results;
  • Contracting; and
  • Commentary:
    • Historical Data;
    • Hazard Checklists; and
    • Analysis Techniques.

Transcript: Preliminary Hazard Identification

Hello, everyone, and welcome to the Safety Artisan, where you will find instructional materials that are professional, pragmatic, and impartial because we don’t have anything to sell, and we don’t have an axe to grind. Let’s look at what we’re doing today, which is Preliminary Hazard Identification. We are looking at one of the first actual analysis tasks in Mil-Std-882E, which is a systems safety engineering standard from the US government, and it’s typically used on military systems, but it does turn up elsewhere.

Preliminary Hazard ID is Task 201

I’m recording this on the 2nd of February 2020, however, the Mil-Std has been in existence since May 2012 and it is still current, it looks like it is sticking around for quite a while, and this lesson isn’t likely to go out of date anytime soon.

Topics for this session

What we’re going to cover is, quoting from the task, first of all, we’re going to look at the purpose and the task description, where the task talks quite a lot about historical review (I think we’ve got three slides of that), recording results, putting stuff in contracts and then I’m adding some commentary of my own. I will be commenting all the way through, that’s the value add, that’s why I’m doing this, but then there’s some specific extra information that I think you will find helpful, should you need to implement Task 201. In this session, we’ve moved up one level from awareness and we are now looking at practice, at being equipped to actually perform safety jobs, to do safety tasks.

Preliminary Hazard Identification (T201)

The purpose of Task 201 is to compile a list of potential hazards early in development. two things to note here: it is only a list, it’s very preliminary. I’ll keep coming back to that, this is important. Remember, this is the very first thing we do that’s an analytical task. There are planning tasks in the 100 series, but actually, some of them depend on you doing Task 201 because you can’t work out how are you going to manage something until you’ve got some idea of what you’re dealing with. We’ll come back to that in later lessons.

It is a list of potential hazards that we’re after, and we’re trying to do it early in development. And I really can’t overemphasize how important it is to do these things early in development, because we need to do some work early on in order to set expectations, in order to set budgets, in order to set requirements and to basically get a grip, get some scope on what we think we might be doing for the rest of the program. this is a really important task and it should be done as early as possible, and it’s okay to do it several times. Because it’s an early task it should be quick, it should be fairly cheap. We should be doing it just as soon as we can when we’re at the conceptual stage when we don’t even have a proper set of requirements and then we redo it thereafter maybe. And maybe different organizations will do it for themselves and pass the information on to others. And we’ll talk about that later as well.

Task Description

This is the task description. It says the contractor shall – actually forget about who’s supposed to do it, lots of people could and should be doing this as part of their project management or program management risk reduction because as I said, this is fundamental to what we’re doing for the rest of the safety program and indeed maybe the whole project itself. So, what we need to do is “examine the system shortly after the material solution analysis begins and compile a Preliminary Hazard List (PHL) identifying potential hazards inherent in the concept”. That’s what the standard actually says.

A couple of things to note here. Saying that you start doing it after material solution analysis has begun might be read as implying you don’t do it until after you finish doing the requirements, and I think that’s wrong, I think that’s far too late. To my mind, that is not the correct interpretation. Indeed, if we look at the last four words in the definition, it says we’re “identifying potential hazards inherent in the concept”. That, I think, gives us the correct steer. we’ve got a concept, maybe not even a full set of requirements, what are the hazards associated with that concept, with that scope? And I think that’s a good way to look at it.

Historical Review

This task places a great deal of emphasis on the review of historical documentation, and specifically on reviewing documentation with similar and legacy systems. an old system, a legacy system that we are maybe replacing with this system but there might be other legacy systems around. We need to look at those systems. The assumption is that we actually have some data from similar and legacy systems. And that’s a key weakness really with this, is that we’re assuming that we can get hold of that data. But I’ll talk about the issues with that when I get to my commentary at the end.

We need to look at the following…

End: Preliminary Hazard Identification with Mil-Std-882E

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Mil-Std-882E System Safety

Learn How to Perform System Safety Analysis

In this ‘super post’, we will Learn How to Perform System Safety Analysis. I will show you thirteen lessons that explain each of the ten analysis tasks, the analysis process, and how to combine those tasks into a program!

Follow the links to sample and buy lessons on individual tasks.

Introduction

Military Standard 882, or Mil-Std-882 for short, is one of the most widely used system-safety standards. As the name implies, this standard is used on US military systems, but it has found its way, sometimes in disguise, into many other programs around the world. It’s been around for a long time and is now in its fifth incarnation: 882E.

Unfortunately, 882 has also been widely misunderstood and misapplied. This is probably not the fault of the standard and is just another facet of its popularity. The truth is that any standard can be applied blindly – no standard is a substitute for competent decision-making.

In this series of posts, we will: provide awareness of this standard; explain how to use it; and discuss how to manage, tailor, and implement it. Links to each training session and to each section of the standard are provided in the following sections.

Mil-Std-882E Training Sessions

System Safety Process, full post here

Photo by Bonneval Sebastien on Unsplash

In this full-length (50 minutes) video, you will learn to:

  • Know the system safety process according to Mil-Std-882E;
  • List and order the eight elements;
  • Understand how they are applied;
  • Skilfully apply system safety using realistic processes; and
  • Feel more confident dealing with multiple standards.

In System Safety Process, we look a the general requirements of Mil-Std-882E. We cover the Applicability of the 882E tasks; the General requirements; the Process with eight elements; and the application of process theory to the real world.

Design Your System Safety Analysis Program

Photo by Christina Morillo from Pexels

Learn how to Design a System Safety Program for any system in any application.

Learning Objectives. At the end of this course, you will be able to:

  • Define what a risk analysis program is;
  • List the hazard analysis tasks that make up a program;
  • Select tasks to meet your needs; and
  • Design a tailored risk analysis program for any application.

Analysis: 200-series Tasks

Preliminary Hazard Identification, Task 201

Identify Hazards.

In this video, we find out how to create a Preliminary Hazard List, the first step in safety assessment. We look at three classic complementary techniques to identify hazards and their pros and cons. This includes all the content from Task 201, and also practical insights from my 25 years of experience with Mil-Std-882.

You can buy the full video, plus lots of bonus material, here.

Preliminary Hazard Analysis, Task 202

See More Clearly.

In this 45-minute session, The Safety Artisan looks at Preliminary Hazard Analysis, or PHA, which is Task 202 in Mil-Std-882E. We explore Task 202’s aim, description, scope, and contracting requirements. We also provide value-adding commentary and explain the issues with PHA – how to do it well and avoid the pitfalls.

System Requirements Hazard Analysis, Task 203

Law, Regulations, Codes of Practice, Guidance, Standards & Recognised Good Practice.

In this 45-minute session, The Safety Artisan looks at Safety Requirements Hazard Analysis, or SRHA, which is Task 203 in the Mil-Std-882E standard. We explore Task 203’s aim, description, scope, and contracting requirements. SRHA is an important and complex task, which needs to be done on several levels to be successful. This video explains the issues and discusses how to perform SRHA well.

Sub-system Hazard Analysis, Task 204

Breaking it down to the constituent parts.

In this video lesson, The Safety Artisan looks at Sub-System Hazard Analysis, or SSHA, which is Task 204 in Mil-Std-882E. We explore Task 204’s aim, description, scope, and contracting requirements. We also provide value-adding commentary and explain the issues with SSHA – how to do it well and avoid the pitfalls.

System Hazard Analysis, Task 205

Putting the pieces of the puzzle together.

In this 45-minute session, The Safety Artisan looks at System Hazard Analysis, or SHA, which is Task 205 in Mil-Std-882E. We explore Task 205’s aim, description, scope, and contracting requirements. We also provide value-adding commentary, which explains SHA – how to use it to complement Sub-System Hazard Analysis (SSHA, Task 204) to get the maximum benefits for your System Safety Program.

Operating and Support Hazard Analysis, Task 206

Operate it, maintain it, supply it, dispose of it.

In this full-length session, The Safety Artisan looks at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. We explore Task 205’s aim, description, scope, and contracting requirements. We also provide value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the pitfalls to avoid. We refer to other lessons for specific tools and techniques, such as Human Factors analysis methods.

Health Hazard Analysis, Task 207

Hazards to human health are many and various.

In this full-length (55-minute) session, The Safety Artisan looks at Health Hazard Analysis, or HHA, which is Task 207 in Mil-Std-882E. We explore the aim, description, and contracting requirements of this complex Task, which covers: physical, chemical & biological hazards; Hazardous Materials (HAZMAT); ergonomics, aka Human Factors; the Operational Environment; and non/ionizing radiation. We outline how to implement Task 207 in compliance with Australian WHS. 

Functional Hazard Analysis, Task 208

Components where systemic failure dominates random failure.

In this full-length (40-minute) session, The Safety Artisan looks at Functional Hazard Analysis, or FHA, which is Task 208 in Mil-Std-882E. FHA analyses software, complex electronic hardware, and human interactions. We explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. 

System-Of-Systems Hazard Analysis, Task 209

Existing systems are often combined to create a new capability.

In this full-length (38-minute) session, The Safety Artisan looks at Systems-of-Systems Hazard Analysis, or SoSHA, which is Task 209 in Mil-Std-882E. SoSHA analyses collections of systems, which are often put together to create a new capability, which is enabled by human brokering between the different systems. We explore the aim, description, and contracting requirements of this Task, and an extended example to illustrate SoSHA. (We refer to other lessons for special techniques for Human Factors analysis.)

Environmental Hazard Analysis, Task 210

Environmental requirements in the USA, UK, and Australia.

This is the full, one-hour session on Environmental Hazard Analysis (EHA), which is Task 210 in Mil-Std-882E. We explore the aim, task description, and contracting requirements of this Task, but this is only half the video. We then look at environmental requirements in the USA, UK, and Australia, before examining how to apply EHA in detail under the Australian/international regime. This uses my practical experience of applying EHA. 

Categories
Mil-Std-882E Safety Analysis

System Safety Engineering Process

The System Safety Engineering Process – what it is and how to do it.

This is the full-length (50-minute) session on the System Safety Process, which is called up in the general requirements of Mil-Std-882E. I cover the Applicability of Mil-Std-882E tasks, the General Requirements, the Process with eight elements, and the Application of process theory to the real world. 

You Will Learn to:

  • Know the system safety process iaw Mil-Std-882E;
  • List and order the eight elements;
  • Understand how they are applied;
  • Skilfully apply system safety using realistic processes; and
  • Feel more confident dealing with this and other standards.
System Safety Process – this is the free demo.

Topics: System Safety Engineering Process

  • Applicability of Mil-Std-882E tasks;
  • General requirements;
  • Process with eight elements; and
  • Application of process theory to the real world

Transcript: Preliminary Hazard Identification

CLICK HERE for the Transcript

System Safety Process

Hi, everyone, and welcome to the Safety Artisan. I’m Simon, your host. Today I’m going to be using my experience with System Safety Engineering to talk you through the process that we need to follow to achieve success. Because to use a corny saying, ‘Safety doesn’t happen by accident’. Safety is what we call an emergent property. And to get it, we need to decide what we mean by safety, decide what our goals are, and then work out how we’re going to get there. It’s a planned systematic activity. Especially if we’re going to deal with very complex projects or situations. Times where there is a requirement to make that understanding and that planning explicit. Where the requirement becomes the difference between success and failure. Anyway, that’s enough of that. Let’s get on and look at the session.

Military Standard 882E, Section 4 General Requirements

Today we’re talking about System Safety Process. To help us do that, we’re going to be looking at a particular standard – the general requirements of that standard. And those are from Section Four of Military Standard 882E. But don’t get hung up on which standard it is. That’s not the point here. It’s a means to an end. I’ll talk about other standards and how we perform system safety engineering in different domains.

Learning Objectives

Our learning objectives for today are here. In this session, you will learn, or you’ll know, the system safety process in accordance with that Mil. Standard. You will be able to list and order the eight elements of the process. You will understand how to apply the eight elements. And you will be able to apply system safety with some skill using realistic processes. We’re going to spend quite a bit of time talking about how it’s actually done vs. how it appears on a sheet of paper. Also known as how it appears written in a standard. So, we’re going to talk about doing it in the real world. At the end of all that, you will be able to feel more confident dealing with multiple different standards.

The focus is not on this military standard, but on understanding the process. The fundamentals of what we’re trying to achieve and why. Then you will be able to extrapolate those principles to other standards. And that should help you to understand whatever it is you’re dealing with. It doesn’t have to be Mil. Standard 882E.

Contents of this Session

We’ve got four sets of contents in the session. First of all, I’m going to talk about the applicability of Military Standard 882E. From the standard itself and the tasks (you’ll see why that’s important) to understanding what you’re supposed to do. Then other standards later on. I’m going to talk about those general requirements that the standard places on us to do the work. A big part of that is looking at a process following the eight elements. And finally, we will apply that theory of how the process should work to the real world. And that will include learning some real-world lessons. You should find these useful for all standards and all circumstances.

So, it just remains for me to say thank you very much for listening. You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Mil-Std-882E Safety Analysis System Safety

How to Understand Safety Standards

Learn How to Understand Safety Standards with this FREE session from The Safety Artisan.

In this module, Understanding Your Standard, we’re going to ask the question: Am I Doing the Right Thing, and am I Doing it Right? Standards are commonly used for many reasons. We need to understand our chosen system safety engineering standard, in order to know: the concepts, upon which it is based; what it was designed to do, why and for whom; which kinds of risk it addresses; what kinds of evidence it produces; and it’s advantages and disadvantages.

Understand Safety Standards : You’ll Learn to

  • List the hazard analysis tasks that make up a program; and
  • Describe the key attributes of Mil-Std-882E. 
Understanding Your Standard

Topics:  Understand Safety Standards

Aim: Am I Doing the Right Thing, and am I Doing it Right?

  • Standards: What and Why?
  • System Safety Engineering pedigree;
  • Advantages – systematic, comprehensive, etc:
  • Disadvantages – cost/schedule, complexity & quantity not quality.

Transcript: Understand Safety Standards

Click here for the Transcript on Understanding Safety Standards

In Module Three, we’re going to understand our Standard. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we’re going to ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’

What and Why?

So, what are we going to do, and why are we doing it? First of all, the use of standards in safety is very common for lots of reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’. And we can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or across different organizations. And standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.

However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and is coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.

And they’re not all based on the same concept. Thus some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?

And for risk analysis standards, we need to understand what kind of risks it addresses. Because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money that might be acceptable to you. You might say, ‘I’m not worried about that 10% of my loans have gone south and all gone wrong. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say ‘It’s OK that I’ve killed a few people over here because all this a lot over here are still alive!’. It doesn’t work like that!

Also, what kind of evidence does the standard produce? Because in safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable. Some are not. You need to know which is for your area.

And then finally, let’s think about the pros and cons of the standard, what does it do well? And what does it do not so well?

System Safety Pedigree

We’re going to look at a standard called Military Standard 882E. Many decades ago, this standard developed was created by the US government and military to help them bring into service complex-cutting edge military equipment. Equipment that was always on the cutting edge. That pushed the limits of what you could achieve in performance.

That’s a lot of complexity. Lots of critical weapon systems, and so forth. And they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background from military systems. These days you find these principles used pretty much everywhere. So, all the approaches to System Safety that 882 introduced are in other standards. They are also in other countries.

It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety. It’s very much about identifying requirements. What do we need to happen to get safety? To do that, it produces lots of requirements. It performs analyses in all those requirements and generates further requirements. And it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages. We’re going to discuss these in the next few slides.

Comprehensive Analysis

Before we get to that, we need to look at the key feature of this standard. The strengths and weaknesses of this standard come from its comprehensive analysis. And the chart (see the slide) is meant to show how we are looking at the system from lots of different perspectives. (It’s not meant to be some arcane religious symbol!) So, we’re looking at a system from 10 different perspectives, in 10 different ways.

Going around clockwise, we’ve got these ten different hazard analysis tasks. First of all, we start off with preliminary hazard identification. Then preliminary hazard analysis. We do some system requirements hazard analysis. So, we identify the safety requirements that the system is going to meet so that we are safe. We look at subsystem and system hazard analysis. At operating and support hazard analysis – people working with the system. Number seven, we look at health hazard analysis – Can the system cause health problems for people? Functional hazard analysis, which is all about what it does. We’re thinking of sort of source software and data-driven functionality. Maybe there’s no physical system, but it does stuff. It delivers benefits or risks. System of systems hazard analysis – we could have lots of different and/or complex systems interacting. And then finally, the tenth one – environmental hazard analysis.

If we use all these perspectives to examine the system, we get a comprehensive analysis of the system. From this analysis, we should be confident that we have identified everything we need to. All the hazards and all the safety requirements that we need to identify. Then we can confidently deliver an appropriate safe system. We can do this even if the system is extremely complex. The standard is designed to deal with big, complex cutting-edge systems.

Advantages #1

In fact, as we move on to advantages, that’s the number one advantage of this standard. If we use it and we use all 10 of those tasks, we can cope with the largest and the most demanding programs. I spent much of my career working on the Eurofighter Typhoon. It was a multi-billion-dollar program. It cost hundreds of billions of dollars, four different nations worked together on it. We used a derivative of Mil. Standard 882 to look at safety and analyze it. And it coped. It was powerful enough to deal with that gigantic program. I spent 13 years of my life on and off on that program so I’d like to think that I know my stuff when we’re talking about this.

As we’ve already said, it’s a systematic approach to safety. Systems, safety, engineering. And we can start very early. We can start with early requirements – discovery. We don’t even need a design – we know that we have a need. So we can think about those needs and analyze them.

And it can cover us right through until final disposal. And it covers all kinds of elements that you might find in a system. Remember our definition of ‘system’? It’s something that consists of hardware, software, data, human beings, etc. The standard can cope with all the elements of a system. In fact, it’s designed into the standard. It was specifically designed to look at all those different elements. Then to get different insights from those elements. It’s designed to get that comprehensive coverage. It’s really good at what it does. And it involves, not just engineers, but people from all kinds of other disciplines. Including operators, maintainers, etc, etc.

I came from a maintenance background. I was either directly or indirectly supporting operators. I was responsible for trying to help them get the best out of their system. Again, that’s a very familiar world to me. And rigorous standards like this can help us to think rigorously about what we’re doing. And so get results even in the presence of great complexity, which is not always a given, I must say.

So, we can be confident by applying the standard. We know that we’re going to get a comprehensive and thorough analysis. This assures us that what we’re doing is good.

Advantages #2

So, there’s another set of advantages. I’ve already mentioned that we get assurance. Assurance is ‘justified confidence’. So we can have high confidence that all reasonably foreseeable hazards will be identified and analyzed. And if you’re in a legal jurisdiction where you are required to hit a target, this is going to help you hit that target.

The standard was also designed for use in contracts. It’s designed to be applied to big programs. We’d define that as where we are doing the development of complex high-performance systems. So, there are a lot of risks. It’s designed to cope with those risks.

Finally, the standard also includes requirements for contracting, for interfaces with other systems, for interfaces with systems engineering. This is very important for a variety of disciplines. It’s important for other engineering and technical disciplines. It’s important for non-technical disciplines and for analysis and recordkeeping. Again, all these things are important, whether it is for legal reasons or not. We need to do recordkeeping. We need to liaise with other people and consult with them. There are legal requirements for that in many countries. This standard is going to help us do all those things.

But, of course, in a standard everything has pros and cons and Mil. Standard 882 is no exception. So, let’s look at some of the disadvantages.

Disadvantages #1

First of all, a full system safety program might be overkill for the system that you want to use, or that you want to analyze.  The Cold War, thank goodness, is over; generally speaking, we’re not in the business of developing cutting-edge high-performance killing machines that cost billions and billions of dollars and are very, very risky. These days, we tend to reduce program risk and cost by using off-the-shelf stuff and modifying it. Whether that be for military systems, infrastructure in the chemical industry, transportation, whatever it might be. Very much these days we have a family of products and we reuse them in different ways. We mix and match to get the results that we want.

And of course, all this comprehensive analysis is not cheap and it’s not quick. It may be that you’ve got a program that is schedule-constrained. Or you want to constrain the cost and you cannot afford the time and money to throw a full 882 program at it. So, that’s a disadvantage.

The second family of problems is that these kinds of safety standards have often been applied prescriptively. The customer would often say, ‘Go away and go and do this. I’m going to tell you what to do based on what I think reduces my risk’. Or at least it covers their backside. So, contractors got used to being told to do certain things by purchasers and customers. The customers didn’t understand the standards that they were applying and insisting upon. So, the customers did not understand how to tailor a safety standard to get the result that they wanted. So they asked for dumb things or things that didn’t add value. And the contractors got used to working in that kind of environment. They got used to being told what to do and doing it because they wouldn’t get paid if they didn’t. So, you can’t really blame them.

But that’s not great, OK? That can result in poor behaviors. You can waste a lot of time and money doing stuff that doesn’t actually add value. And everybody recognizes that it doesn’t add value. So you end up bringing the whole safety program into disrepute and people treat it cynically. They treat it as a box-ticking exercise. They don’t apply creativity and imagination to it. Much less determination and persistence. And that’s what you need for a good effective system safety program. You need creativity. You need imagination. You need people to be persistent and dedicated to doing a good job. You need that rigor so that you can have the confidence that you’re doing a good job because it’s intangible.

Disadvantages #2

Let’s move onto the second kind of family of disadvantages. And this is the one that I’ve seen the most, actually, in the real world. If you do all 10 tasks and even if you don’t do all 10, you can create too many hazards. If you recall the graphic from earlier, we have 10 tasks. Each task looks at the system from a different angle. What you can get is lots and lots of duplication in hazard identification. You can have essentially the same hazards identified over and over again in each task. And there’s a problem with that, in two ways.

First of all, quality suffers. We end up with a fragmented picture of hazards. We end up with lots and lots of hazards in the hazard log, but not only that. We get fragments of hazards rather than the real thing. Remember I said those tests for what a hazard really is? Very often you can get causes masquerading as hazards. Or other things that that exacerbating factors that make things worse. They’re not a hazard in their own right, but they get recorded as hazards. And that problem results in people being unable to see the big picture of risk. So that undermines what we’re trying to do. And as I say, we get lots of things misidentified and thrown into the pot. This also distracts people. You end up putting effort into managing things that don’t make a difference to safety. They don’t need to be managed. Those are the quality problems.

And then there are quantity problems. And from personal experience, having too many hazards is a problem in itself.  I’ve worked on large programs where we were managing 250 hazards or thereabouts. That is challenging even with a sizable, dedicated team. That is a lot of work in trying to manage that number of hazards effectively. And there’s always the danger that it will slide into becoming a box-ticking exercise. Superficial at best.

I’ve also seen projects that have two and a half thousand hazards or even 4000 hazards in the hazard log. Now, once you get up to that level, that is completely unmanageable. People who have thousands of hazards in a hazard log and they think they’re managing safety are kidding themselves. They don’t understand what safety is if they think that’s going to work. So, you end up with all these items in your hazard log, which become a massive administrative burden. So people end up taking shortcuts and the real hazards are lost. The real issues that you want to focus on are lost in the sea of detail that nobody will ever understand. You won’t be able to control them.

Unfortunately, Mil. Standard 882 is good at generating these grotesque numbers of hazards. If you don’t know how to use the standard and don’t actively manage this issue, it gets to this stage. It can go and does go, badly wrong. This is particularly true on very big programs. And you really need clarity on big projects.

Summary of Module

Let’s summarize what we’ve done with this module. The aim was to help us understand whether we’re doing the right thing and whether we’ve done it right. And standards are terrific for helping us to do that. They help us to ensure we’re doing the right thing. That we’re looking at the right things. And they help us to ensure that we’re doing it rigorously and repeatedly. All the good quality things that we want. And Mil. Standard 882E that we’re looking at is a system safety engineering standard. So it’s designed to deal with complexity and high-performance and high-risk. And it’s got a great pedigree. It’s been around for a long time.

Now that gives advantages. So, we have a system safety program with this standard that helps us to deal with complexity. That can cope with big programs, with lots of risks. That’s great.

The disadvantages of this standard are that if we don’t know how to tailor or manage it properly, it can cost a lot of money. It can take a lot of time to give results which can cause problems for the program. And ultimately, you can accidentally ignore safety if you don’t deliver on time. And it can generate complexity. And it can generate a quantity of data that is so great that it actually undermines the quality of the data. It undermines what we’re trying to achieve. In that, we get a fragmented picture in which we can’t see the true risks. And so we can’t manage them effectively. If we get it wrong with this standard, we can get it really wrong. And that brings us to the end of this module.

This is Module 3 of SSRAP

This is Module 3 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.

Categories
Mil-Std-882E Safety Analysis

System Safety Risk Assessment

Learn about System Safety Risk Assessment with The Safety Artisan.

In this module, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

You Will Learn to:

  • Explain what a system safety approach is and does; and
  • Define what a risk analysis program is; 
System Safety Risk Analysis.

Topics: System Safety Risk Assessment

Aim: How do we deal with real-world complexity?

  • What is System Safety?
  • The Need for Process;
  • A Realistic, Useful, Powerful process:
    • Context, Communication & Consultation; and
    • Monitoring & Review, Risk Treatment.
  • Required Risk Reduction.

Transcript: System Safety Risk Assessment

Click here for the Transcript on System Safety Risk Assessment

In this module, on System Safety Risk Assessment, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

What is System Safety?

To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context. This wider context is operational effectiveness – We want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.

And what are we developing? We’re developing a system. An organization of hardware, (or software) material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work with the intention to produce specified results.

We’ve got three things there. We’ve got a system. We’ve got the operating environment within which it works- or designed to work. And we have the thing that it’s supposed to produce; its function or its application. Why did we buy it, or make, it in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?

That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.

We Need A Process

We’ve sorted our context. How is all this going to happen? We need a process. In the standard that we’re going to look at in the next module, we have an eight-element process. As you can see there, we start with documenting our approach. Then we identify and document hazards. We document everything according to the standard so forget that.

We assess risk. We plan how we’re going to mitigate the risk. We identify risk mitigation measures or controls as there are often known. Then we apply those controls to reduce risk. We verify and confirm that the risk reduction that we have achieved, or that we believe we will achieve. And then we got to get somebody to accept that risk. In other words, to say that it is an acceptable level of risk. That we can put up with this level of risk in exchange for the benefits that the system is going to give us. Finally, we need to manage risk through the entire lifecycle of the system until we finally get rid of it.

The key point about this is whatever process we follow, we need to approach it with rigor. We stick to a systematic process. We take a structured and rigorous approach to looking at our system.

And as you can see there from the arrows, every step in the eight-element sequence flows into the next step. Each step supports and enables the following steps. We document the results as we go. However, even this example is a little bit too simple.

A More Realistic Process

So, let’s get a more realistic process. What we’ve got here are the same things we’ve had before. We’ve established the context at the beginning. Next, there’s risk assessment. Risk assessment consists of risk identification, risk analysis, and risk evaluation. It asks ‘Where are we?’ in relation to a yardstick or framework that categorizes risk. The category determines whether a risk is acceptable or not.

After determining whether the risk is acceptable or not, we may need to apply some risk treatment. Risk Treatment will reduce the risk further. By then we should have the risk down to an acceptable level.

So, that’s the straight-through process, once through. In the real world, we may have to go around this path several times. Having treated the risk over a period of time, we need to monitor and review it. We need to make sure that the risk turns out, in reality, to be what we estimated it to be. Or at least no worse. If it turns out to be better- Well, that’s great!

And on that monitoring and review cycle, maybe we even need to go back because the context has changed. These changes could include using the system to do something it was not designed to do. Or modifying the system to operate in a wider variety of environments. Whatever it might be, the context has changed. So, we need to look again at the risk assessment and go round that loop again.

And while we’re doing all that, we need to communicate with other people. These other people include end-users, stakeholders, other people who have safety responsibilities. We need to communicate with the people who we have to work with. And we have to consult people. We may have to consult workers. We may have to consult the public, people that we put at risk, other duty holders who hold a duty to manage risk. That’s our cycle. That’s more realistic. In my experience as a safety engineer, this is much more realistic. A once-through process often doesn’t cut it.

Required Risk Reduction

We’re doing all this to drive risk down to an acceptable level. Well, what do we mean by that? Well, there are several different ways that we can do this, and I’ve got to illustrate it here. On the left-hand side of the slide, we have what’s usually known as the ALARP triangle. It’s this thing that looks a bit like a carrot where the width of the triangle indicates the amount of risk. So, at the top of the triangle, we’ve got lots of risks. And if you’re in the UK or Australia where I live, this is the way it’s done. So there will be some level of risk that is intolerable. Then if the risk isn’t intolerable, we can only tolerate it or accept it if it is ALARP or SFARP. And ALARP means that we’ve reduced the risk as low as reasonably practicable. And SFARP means so far as is reasonably practicable. Essentially, they’re the same thing – reasonably practical.

We must ensure that we have applied all reasonably practicable risk reduction measures. And once we’ve done so, if we’re in this tolerable or acceptable region, then we can live with the risk. The law allows us to do that.

That’s how it’s done in the UK and Australia. But in other jurisdictions, like the USA, you might need to use a different approach. A risk matrix approach as we can see on the right-hand side of this slide. This particular risk matrix is from the standard we’re about to look at. And we could take that and say, ‘We’ve determined what the risk is. There is no absolute limit on how much risk we can accept. But the higher the risk, the more senior level of sign-off from management we need’. In effect, you are prioritizing the risk. So you only bring the worst risks to the attention of senior management. You are asking  ‘Will you accept this? Or are you prepared to spend the money? Or will you restrict the operational system to reduce the risk?’. This is good because it makes people with authority consider risks. They are responsible and need to make meaningful decisions.

In short, different approaches are legal in different jurisdictions.

Summary of Module

In Module Two, we’ve asked ourselves, ‘How can we deal with real-world complexity?’. And one way that’s developed to do that is System Safety. System Safety is where we take a systematic approach to safety. This approach applies to both the system itself – the product – and the process of System Safety.

We address product and process. We need that rigorous process to give us confidence that what we’ve done is good enough. We have a realistic, useful and powerful process that enables us to put things in context. It helps us to communicate with everyone we need to, to consult with those that we have a duty to consult with. And also, we put around the basic risk process, this monitoring and review. And of course, we analyze risk to reduce it to acceptable levels. So we’ve got to treat the risk or reduce it or control it in some way to get it to those acceptable levels. In the end, it’s all about getting that required risk reduction to work. That reduction makes the risk acceptable to expose human beings to, for the benefit that it will give us.

This is Module 2 of SSRAP

This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.