Categories
Blog Safety Analysis Tools & Techniques

Five Ways to Identify Hazards

In my webinar ‘Five Ways to Identify Hazards’ I look at a mix of techniques. We need these diverse techniques to assure us (give justified confidence) that we have identified the full range of hazards associated with a system.

To do this I draw on my 25 years of experience (see ‘Meet the Author‘, below) and relevant standards. Here’s the introduction to the webinar.

Five Ways to Identify Hazards: Video Introduction

Webinar: ‘Five Ways to Identify Hazards’

Four Things to Remember

For hazard identification, we need to be aware of four things.

What we’re doing is we are imagining what could go wrong. And I want to emphasize, first of all, imagination. We need to be open to what could happen. That’s the mindset that we need, and we’re looking at what could go wrong, not what will go wrong. Think about possibilities, not certainties.

The second thing is that it’s very easy to dive down a rabbit hole and get into mega detail about one particular thing and spend lots of time, waste lots of time doing that. That’s not what we need to be doing. We need a broad approach. We need to go wide and think about as many different possible hazards as we can. Don’t dive deep that will come later, the deep analysis will come later.

Another aspect of that point is we’re talking about hazard identification. We’re just here to identify hazards. We’re not here to try to assess them yet.

Yet another mistake that people make is to try and jump straight to fixing the hazard. Many of us watching will be engineers. We love fixing problems. We like to solve problems, but we’re not here to solve the problem yet. We’re only here to identify it. So we’re going to avoid the temptation to jump in and try and come up with a solution. That’s not what we’re doing with hazard identification.

So those are four things to bear in mind.

Five Ways to Identify Hazards

Let’s move on. So I’ve said that this was entitled five ways to identify hazards.

There are, of course, many ways to identify hazards, but I just thought I’d pick on these five because there was a nice broad range of things and things that I can show you how to do straight away.

Those are the five things that we’ve got and we’ll have a slide on each one of those. First, we can ask the workers or end users or their representatives. Secondly, we can inspect the workplace, we can look around for hazards. And maybe we’ve got a real workplace that we can look at or maybe we’ve just got a representation, we can do both.

We can use a hazard identification checklist, we can survey historical data. So all the squiggly lines at the bottom of the screen, there’s an example of some historical data and we can conduct a number of analyses on that.

But the analysis I picked on (Number 5) is Functional Failure Analysis and we’ll see why in just a moment. So those are the five things that we will cover in the next hour. We’ll also have time for a Question and Answer session and then a worked example of how to do a simple Functional Failure Analysis…

There’s More!

This is just one of many webinars in my Safety Engineering Academy. You can see summaries of them all in this blog post.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Safety Analysis Tools & Techniques

Exploring Causal Analysis: Techniques and Insights

In this post, ‘Exploring Causal Analysis: Techniques and Insights’, I provide a quick summary of my recent webinar. You can see a short video introduction below, or access the full webinar at my Safety Engineering Academy.

Introduction:

Causal analysis is a vital aspect of system safety engineering, offering insights into the root causes of issues and guiding effective problem-solving strategies. In this webinar, we delve into various causal analysis techniques and discuss their practical applications in diverse domains.

Section 1: Introduction to Causal Analysis

Causal analysis involves understanding the sequence of events leading to an outcome and identifying the underlying factors contributing to it. We explore the fundamentals of causal analysis and its significance in safety engineering.

Section 2: Popular Causal Analysis Techniques

We examine eight popular causal analysis methods, including Pareto charts, Failure Mode and Effects Analysis (FMEA), 5 Whys, Ishikawa diagrams, Fault Tree Analysis (FTA), 8D reporting, DMAIC, and Scatter Diagrams. Each technique is analyzed for its strengths, limitations, and practical utility.

Section 3: Deeper Dive into Selected Techniques

We take a closer look at selected causal analysis techniques, exploring their application in real-world scenarios. Examples include using Pareto charts to identify dominant causes, leveraging FMEA for failure mode analysis, and utilizing Fault Tree Analysis for assessing complex system failures.

Section 4: Insights and Reflections

Drawing from years of experience in system safety engineering across diverse domains and international contexts, we share insights and reflections on the effectiveness of different causal analysis techniques. We emphasize the importance of choosing the right technique based on the specific objectives and available data.

Section 5: Resources and Next Steps

We provide attendees with valuable resources for further exploration of causal analysis techniques, including links to webinars, online courses, and templates. Additionally, we offer a glimpse into ongoing research and developments in the field of safety engineering.

Conclusion:

Causal analysis is a dynamic and evolving field that plays a crucial role in ensuring system reliability and safety. By employing a range of techniques and approaches, safety practitioners can gain deeper insights into the root causes of issues and implement effective risk management strategies.

Bonus: Q&A Sessions

Here are the two Q&A Sessions from the Webinar:

Causal Analysis – Q&A Session 1

Exploring Causal Analysis: Techniques and Insights – Get more here

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Mil-Std-882E Safety Analysis software safety

Functional Hazard Analysis with Mil-Std-882E

In this video, I look at Functional Hazard Analysis with Mil-Std-882E (FHA, which is Task 208 in Mil-Std-882E). FHA analyses software, complex electronic hardware, and human interactions. I explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (I refer to other lessons for special techniques for software safety and Human Factors.)

This video, and the related webinar ‘Identify & Analyze Functional Hazards’, deal with an important topic. Programmable electronics and software now run so much of our modern world. They control many safety-related products and services. If they go wrong, they can hurt people.

I’ve been working with software-intensive systems since 1994. Functional hazards are often misunderstood or overlooked, as they are hidden. However, the accidents that they can cause are very real. If you want to expand your analysis skills beyond just physical hazards, I will show you how.

This is the seven-minute demo; the full version is 40 minutes long.

Functional Hazard Analysis: Context

So how do we analyze software safety?

Before we even start, we need to identify those system functions that may impact safety. We can do this by performing a Functional Failure Analysis (FFA) of all system requirements that might credibly lead to human harm.

An FFA looks at functional requirements (the system should do ‘this’ or ‘that’) and examines what could go wrong:

  • Does the function work when needed?
  • Does the function work when not required?
  • Does the function work incorrectly? (There may be more than one version of this.)

(A variation of this technique is explained here.)

If the function could lead to a hazard then it is marked for further analysis. This is where we apply the FHA, Task 208.

Functional Hazard Analysis: The Lesson

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyze the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is Task 208 in Mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional Purpose HA #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? In classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, we use it to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. In the real world, lots of software-intensive systems cause accidents that have killed people, even when they’re functioning as intended. That’s one of the shortcomings of this Military Standard – it focuses on failure. But even if something performs as specified, either:

  • The specification might be wrong, or
  • The system might do something that the human operator does not expects.

Mil-Std-882E just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – to identify what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behavior over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The presumption in this – the assumption in this task is that we’re doing early – We’ll see that later – and that system design, system architecture, is still up for grabs. We can still influence it.

COTS and MOTS Software

Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen anymore so much in the military and it certainly doesn’t happen in many other walks of life – But we’ll talk about how you deal with the realities later.

And they’re allocating these functions and these items of interest to hardware, software, and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those types of simple components are only really subject to random failure, that’s not what we’re talking about here.

We’re talking about complex stuff where we’re talking about systematic failure dominating over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide is on purpose; so, we use the FHA to identify the consequences of malfunction, functional failure, or lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a system engineering process – that’s not always the case. We’ll talk about that at the end as well.

Also, we’re going to identify and document these functions and items and allocate and it says to partition them in the software design architecture. When we say partition, that’s jargon for separating them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation in this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done….

Then What?

Once the FFA has identified the required ‘Level or Rigor’, we need to translate that into a suitable software development standard. This might be:

  • RTCA DO-178C (also know as ED-12C) for civil aviation;
  • The US Joint Software System Safety Engineering Handbook (JSSEH) for military systems;
  • IEC 61508 (functional safety) for the process industry;
  • CENELEC-50128 for the rail industry; and
  • ISO 26262 for automotive applications.

Such standards use Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to enforce appropriate Levels of Rigor. You can learn about those in my course, Principles of Safe Software Development.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been, for the last 25 years; I’ve worked in all kinds of domains, aircraft, ships, submarines, sensors, and command and control systems, and some work on rail air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Categories
Blog Mil-Std-882E Safety Analysis

How to do Preliminary Hazard Analysis with Mil-Std-882E

In this 45-minute session, I look at how to do a Preliminary Hazard Analysis with Mil-Std-882E. Preliminary Hazard Analysis, or PHA, is Task 202 in the Standard.

I explore Task 202’s aim, description, scope, and contracting requirements. There’s value-adding commentary, and I explain the issues with PHA – how to do it well and avoid the pitfalls.

Now, I have worked in System Safety since 1996, and I think that PHA is one of the three tasks that EVERY project needs to do. The other two are:

I look at these three tasks together in my course ‘Foundations of Safety Assessment’. This is one of five linked courses on Mil-Std-882E. They will teach you how to get the maximum benefits from your System Safety Program.

This is the seven-minute-long demo video. The full video is 45 minutes long.

Topics: How to do Preliminary Hazard Analysis

  • Task 202 Purpose;
  • Task Description;
  • Recording & Scope;
  • Risk Assessment (Tables I, II & III);
  • Risk Mitigation (order of preference);
  • Contracting; and
  • Commentary.

Transcript: How to do Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you’ll find professional, pragmatic, and impartial safety training resources. So, we’ll get straight on to our session, which is on the 8th of February 2020.

Preliminary Hazard Analysis

Now we’re going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It’s very widely used mostly on military equipment, but it does turn up elsewhere.  This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It’s one of the first things that you will do on a systems safety program and therefore one of the most informative. This session forms part of a series of lessons that I’m doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording, and scope. How we do risk assessments against Tables 1, 2, and 3. These tables describe severities, likelihoods, and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting, and then a short commentary from myself. I’m providing commentary all the way through. So, let’s crack on.

Task 202 Purpose

The purpose of Task 202, as it says, is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks, and identify potential mitigation measures. We’re going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I’ve highlighted some stuff that I particularly want to talk about.

It says “the contractor” [does this or that], but it doesn’t matter who is doing the analysis, and, the customer needs to do something to inform themselves, otherwise they won’t understand what they’re doing.  Whoever does it needs to perform and document a PHA. It’s about determining initial risk assessments. There’s going to be more work, more detailed work done later. But for now, we’re doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions we propose to introduce. That’s very important. We don’t need a design to do this. We can get in early when we have user requirements, functional requirements, and that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that’s another big issue. We’ll talk about that more later. It says to include mishap data as well, if accessible: American term mishap, means an accident, but we’re avoiding any kind of suggestion about whether it is accidental or deliberate.  It might be stupidity, deliberate, or whatever. It’s a mishap. It’s an undesirable event.

We look for accessible data from similar systems, legacy systems, and other lessons learned. I’ve talked about that a little bit in the Task 201 lesson, and there’s more on that today under commentary. We need to look at provisions, and alternatives, meaning design provisions and design alternatives to reduce risks and add mitigation measures to eliminate hazards. If we can all reduce associated risk, we need to include all of that. What’s the task description? That’s a good overview of the task and what we need to talk about.

Reading & Scope

First, recording and scope, as always, with these tasks, we’ve got to document the results of the PHA in a hazard tracking system. Now, a word on terminology; we might call a hazard tracking system; we might call it a hazard log; we might call it a risk register. It doesn’t matter what it’s called. The key point is it’s a tracking system. It’s a live document, as people say, it’s a spreadsheet or a database, something like that. It’s something relatively easy to update and change.

Also, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and refine them and change them as time goes on. Very important point.

That’s it for the Demo…

End: How to do Preliminary Hazard Analysis

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Blog Safety Analysis

Preliminary Hazard Identification & Analysis Guide

Get your free Preliminary Hazard Identification & Analysis, PHIA Guide here!

Introduction

Hazard Identification is sometimes defined as: “The process of identifying and listing the hazards and accidents associated with a system.”

Hazard Analysis is sometimes defined as: “The process of describing in detail the hazards and accidents associated with a system and defining accident sequences.”

Preliminary Hazard Identification and Analysis (PHIA) helps you determine the scope of safety activities and requirements. You can identify the main hazards likely to arise from the capability and functionality being provided. Perform it as early as possible in the project life cycle. Thus, you will provide important early input to setting Safety requirements and refining the Project Safety Plan.

PHIA seeks to answer, at an early stage of the project, the question: “What Hazards and Accidents might affect this system and how could they happen?”

Aim

The PHIA aims to identify, as early as possible, the main Hazards and Accidents that may arise during the life of the system. It provides input to:

  1. Scoping the subsequent Safety activities required in any Safety Plan. A successful PHIA will help to gauge the proportionate effort that is likely to be required to produce an effective Safety Case, proportionate to risks.
  2. Selecting or eliminating options for subsequent assessment.
  3. Setting the initial Safety requirements and criteria.
  4. Subsequent Hazard Analyses.
  5. Initiate Hazard Log.

Description

Perform a PHIA as early as possible to obtain maximum benefit. Use it to understand what the Hazards and Accidents are, why, and how they might be realized. A PHIA is an important part of Risk Management, project planning, and requirements definition. It helps you to identify the main system hazards and helps target where a more thorough analysis should be undertaken.

Usually, PHIA is based on a structured brainstorming exercise, supported by hazard checklists. A structured approach helps to minimize the possibility of missing an important hazard. It also demonstrates that a comprehensive approach has been applied.

Get Your PHIA Guide as part of the FREE Learning Bundle

Front cover of PHIA Guide
Subscribe to The Safety Artisan Mailing List and get your Free Gift!

Find more on basic safety topics at Start Here.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Safety Analysis System Safety

Foundations of Safety Assessment

In this post on the Foundations of Safety Assessment, I’m going to look at the (few) things that we need to do in every System Safety Program.

Because we don’t always need to do everything. We don’t always need to throw everything at the problem. Some systems are simpler than others, and they don’t need the ‘whole nine yards’ in order to get a decent result. With that knowledge, we’re going to be able to design an analysis program for different applications or for different systems.

As an example, I’m going to use Military Standard 882E (Mil-Std-882E). Under that standard we would use these Tasks:

  • Task 201 – Preliminary Hazard Identification;
  • Task 202 – Preliminary Hazard Analysis; and
  • Task 203 – System Requirements Hazard Analysis.

(You will also find related material in my posts on Safety Analysis Techniques Overview and tailoring your Risk Analysis Program.)

Foundations of Safety Assessment – The Big Picture

I promised you we were going to look at the overview of the sequence.

And I think this is what pulls it all together and explains it powerfully. So the background to this is we’ve got, an accident or mishap sequence. Whatever you want to call it and we start with causes on the left and causes lead two a hazard, and then a has it can lead to multiple consequences.

Bowtie diagram showing five types of hazard analysis.
Bowtie showing the Foundations of System Safety

That is what the bowtie here is representing. It’s showing that multiple causes can lead to a single hazard, and a single hazard can lead to multiple consequences.

Don’t worry too much about the bow tie. I’m not pushing that in particular, it’s a useful technique, but it’s not the only one. We’ll come onto that – that’s the background.

This is the accident sequence we’re trying to discover and understand. I’m going to talk a lot about discovery and understanding.

Preliminary Hazard Identification

Typically, we will start by trying to identify hazards. There are techniques out there that will help us identify hazards associated with the system being used in a specific application, or purpose, in a specific operating environment.

Always bear in mind those three questions about the context, that help us to do this. What’s the system? What are we using it for? and in what environment?

And if we change any of those things, then probably the hazards will change. But we start off with preliminary hazard identification, which is intended to identify hazards. There’s a big, big arrow pointing at hazards, but also, inevitably, it will identify causes and consequences as well, because it’s not always clear. What is the hazard when you start? talking of discovery, we’re going to discover some stuff.

We may finally classify what we’re talking about later. we’re trying to discover hazards. In reality, we’re going to discover lots of stuff, but mainly we hope hazards, that’s stage one.

System Requirements Hazard Analysis

Now, then we’re actually going to step outside of the accident sequence itself. We’re going to do some requirements analysis, and the requirements analysis has to come after the PHIA because some safety requirements are driven by the presence of certain hazards.

If you’ve got a noise hazard somebody’s hearing might be affected, then regulations in multiple countries are going to require you to do certain things to monitor the noise. Let’s say or monitor the effect that it’s having on workers and put in place a program to handle that. The presence of certain hazards will drive certain requirements for safety controls or risk controls.

Then there are the broader requirements. Analysis of what the law requires, what the regulations require, codes of practice, etc. We’ll get onto that, and one of the things that requirements analysis is going to do is give us an initial stab of what we’ve got to have – certain controls because we’re required to. That’s a little bit of an aside in terms of the sequence, but it’s very, very important.

Preliminary Hazard Analysis

Thirdly, and, fourthly, once we’ve discovered some hazards, we’re going to need to understand what might cause those hazards and therefore how likely is the hazard to exist in particular circumstances, and then also think about the consequences that might arise from a hazard. And once we’ve explored those, we will be in a position to actually capture the risk.

 Because we will have some view on likelihood. And we would also have some view on the severity of consequences from considering the consequences. We’ll come onto that later.

Looking at Controls

Finally, having done all those other things, we will be in a position to take a much more systematic look at controls and say, we’ve got these causes. We’ve got these hazards. We’ve got these potential consequences.  What do I need to do to control this risk and prevent this accident sequence from playing out?

What I need to put in place to interrupt the accident sequence, and I’ve put the controls. The dashed lines indicate that we’ve got barriers to that accident sequence, and they are dashed because no control is perfect. (Other than gravity. But of course, if you turn your vehicle upside down, then gravity is working against you, so even gravity isn’t foolproof.)

No control is 100% effective. We need to just accept that and deal with that, and understand. There is your overview of the sequence, and I’ve spent a bit of time talking about that because it is absolutely fundamental to everything you’re going to do.

Well, That’s a Brief Summary of the Foundations of Safety Assessment

If you have any questions, please leave a comment below.

Categories
Mil-Std-882E Safety Analysis

Preliminary Hazard Identification with Mil-Std-882E

Want to know how to perform Preliminary Hazard Identification with Mil-Std-882E? (This is Task 201 under the standard.)

This is the first step in safety assessment.  We look at three classic complementary techniques to identify hazards and their pros and cons.  This includes all the content from Task 201, and also practical insights from my 25 years of experience with Mil-Std-882. 

You Will Learn to:

  • Conduct Preliminary Hazard Identification using diverse techniques for best results;
  • Define what Preliminary Hazard Identification is and does;
  • Record Preliminary Hazard Identification results correctly;
  • Contract for Preliminary Hazard Identification successfully; and
  • Apply it early enough to make a difference.
This is the seven-minute-long demo video.

Topics: Preliminary Hazard Identification with Mil-Std-882E

  • Task 201 Purpose & Task Description;
  • Historical Review;
  • Recording Results;
  • Contracting; and
  • Commentary:
    • Historical Data;
    • Hazard Checklists; and
    • Analysis Techniques.

Transcript: Preliminary Hazard Identification

Hello, everyone, and welcome to the Safety Artisan, where you will find instructional materials that are professional, pragmatic, and impartial because we don’t have anything to sell, and we don’t have an axe to grind. Let’s look at what we’re doing today, which is Preliminary Hazard Identification. We are looking at one of the first actual analysis tasks in Mil-Std-882E, which is a systems safety engineering standard from the US government, and it’s typically used on military systems, but it does turn up elsewhere.

Preliminary Hazard ID is Task 201

I’m recording this on the 2nd of February 2020, however, the Mil-Std has been in existence since May 2012 and it is still current, it looks like it is sticking around for quite a while, and this lesson isn’t likely to go out of date anytime soon.

Topics for this session

What we’re going to cover is, quoting from the task, first of all, we’re going to look at the purpose and the task description, where the task talks quite a lot about historical review (I think we’ve got three slides of that), recording results, putting stuff in contracts and then I’m adding some commentary of my own. I will be commenting all the way through, that’s the value add, that’s why I’m doing this, but then there’s some specific extra information that I think you will find helpful, should you need to implement Task 201. In this session, we’ve moved up one level from awareness and we are now looking at practice, at being equipped to actually perform safety jobs, to do safety tasks.

Preliminary Hazard Identification (T201)

The purpose of Task 201 is to compile a list of potential hazards early in development. two things to note here: it is only a list, it’s very preliminary. I’ll keep coming back to that, this is important. Remember, this is the very first thing we do that’s an analytical task. There are planning tasks in the 100 series, but actually, some of them depend on you doing Task 201 because you can’t work out how are you going to manage something until you’ve got some idea of what you’re dealing with. We’ll come back to that in later lessons.

It is a list of potential hazards that we’re after, and we’re trying to do it early in development. And I really can’t overemphasize how important it is to do these things early in development, because we need to do some work early on in order to set expectations, in order to set budgets, in order to set requirements and to basically get a grip, get some scope on what we think we might be doing for the rest of the program. this is a really important task and it should be done as early as possible, and it’s okay to do it several times. Because it’s an early task it should be quick, it should be fairly cheap. We should be doing it just as soon as we can when we’re at the conceptual stage when we don’t even have a proper set of requirements and then we redo it thereafter maybe. And maybe different organizations will do it for themselves and pass the information on to others. And we’ll talk about that later as well.

Task Description

This is the task description. It says the contractor shall – actually forget about who’s supposed to do it, lots of people could and should be doing this as part of their project management or program management risk reduction because as I said, this is fundamental to what we’re doing for the rest of the safety program and indeed maybe the whole project itself. So, what we need to do is “examine the system shortly after the material solution analysis begins and compile a Preliminary Hazard List (PHL) identifying potential hazards inherent in the concept”. That’s what the standard actually says.

A couple of things to note here. Saying that you start doing it after material solution analysis has begun might be read as implying you don’t do it until after you finish doing the requirements, and I think that’s wrong, I think that’s far too late. To my mind, that is not the correct interpretation. Indeed, if we look at the last four words in the definition, it says we’re “identifying potential hazards inherent in the concept”. That, I think, gives us the correct steer. we’ve got a concept, maybe not even a full set of requirements, what are the hazards associated with that concept, with that scope? And I think that’s a good way to look at it.

Historical Review

This task places a great deal of emphasis on the review of historical documentation, and specifically on reviewing documentation with similar and legacy systems. an old system, a legacy system that we are maybe replacing with this system but there might be other legacy systems around. We need to look at those systems. The assumption is that we actually have some data from similar and legacy systems. And that’s a key weakness really with this, is that we’re assuming that we can get hold of that data. But I’ll talk about the issues with that when I get to my commentary at the end.

We need to look at the following…

End: Preliminary Hazard Identification with Mil-Std-882E

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience, I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Mil-Std-882E Safety Analysis

System Safety Engineering Process

The System Safety Engineering Process – what it is and how to do it.

This is the full-length (50-minute) session on the System Safety Process, which is called up in the general requirements of Mil-Std-882E. I cover the Applicability of Mil-Std-882E tasks, the General Requirements, the Process with eight elements, and the Application of process theory to the real world. 

You Will Learn to:

  • Know the system safety process iaw Mil-Std-882E;
  • List and order the eight elements;
  • Understand how they are applied;
  • Skilfully apply system safety using realistic processes; and
  • Feel more confident dealing with this and other standards.
System Safety Process – this is the free demo.

Topics: System Safety Engineering Process

  • Applicability of Mil-Std-882E tasks;
  • General requirements;
  • Process with eight elements; and
  • Application of process theory to the real world

Transcript: Preliminary Hazard Identification

CLICK HERE for the Transcript

System Safety Process

Hi, everyone, and welcome to the Safety Artisan. I’m Simon, your host. Today I’m going to be using my experience with System Safety Engineering to talk you through the process that we need to follow to achieve success. Because to use a corny saying, ‘Safety doesn’t happen by accident’. Safety is what we call an emergent property. And to get it, we need to decide what we mean by safety, decide what our goals are, and then work out how we’re going to get there. It’s a planned systematic activity. Especially if we’re going to deal with very complex projects or situations. Times where there is a requirement to make that understanding and that planning explicit. Where the requirement becomes the difference between success and failure. Anyway, that’s enough of that. Let’s get on and look at the session.

Military Standard 882E, Section 4 General Requirements

Today we’re talking about System Safety Process. To help us do that, we’re going to be looking at a particular standard – the general requirements of that standard. And those are from Section Four of Military Standard 882E. But don’t get hung up on which standard it is. That’s not the point here. It’s a means to an end. I’ll talk about other standards and how we perform system safety engineering in different domains.

Learning Objectives

Our learning objectives for today are here. In this session, you will learn, or you’ll know, the system safety process in accordance with that Mil. Standard. You will be able to list and order the eight elements of the process. You will understand how to apply the eight elements. And you will be able to apply system safety with some skill using realistic processes. We’re going to spend quite a bit of time talking about how it’s actually done vs. how it appears on a sheet of paper. Also known as how it appears written in a standard. So, we’re going to talk about doing it in the real world. At the end of all that, you will be able to feel more confident dealing with multiple different standards.

The focus is not on this military standard, but on understanding the process. The fundamentals of what we’re trying to achieve and why. Then you will be able to extrapolate those principles to other standards. And that should help you to understand whatever it is you’re dealing with. It doesn’t have to be Mil. Standard 882E.

Contents of this Session

We’ve got four sets of contents in the session. First of all, I’m going to talk about the applicability of Military Standard 882E. From the standard itself and the tasks (you’ll see why that’s important) to understanding what you’re supposed to do. Then other standards later on. I’m going to talk about those general requirements that the standard places on us to do the work. A big part of that is looking at a process following the eight elements. And finally, we will apply that theory of how the process should work to the real world. And that will include learning some real-world lessons. You should find these useful for all standards and all circumstances.

So, it just remains for me to say thank you very much for listening. You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Mil-Std-882E Safety Analysis System Safety

How to Understand Safety Standards

Learn How to Understand Safety Standards with this FREE session from The Safety Artisan.

In this module, Understanding Your Standard, we’re going to ask the question: Am I Doing the Right Thing, and am I Doing it Right? Standards are commonly used for many reasons. We need to understand our chosen system safety engineering standard, in order to know: the concepts, upon which it is based; what it was designed to do, why and for whom; which kinds of risk it addresses; what kinds of evidence it produces; and it’s advantages and disadvantages.

Understand Safety Standards : You’ll Learn to

  • List the hazard analysis tasks that make up a program; and
  • Describe the key attributes of Mil-Std-882E. 
Understanding Your Standard

Topics:  Understand Safety Standards

Aim: Am I Doing the Right Thing, and am I Doing it Right?

  • Standards: What and Why?
  • System Safety Engineering pedigree;
  • Advantages – systematic, comprehensive, etc:
  • Disadvantages – cost/schedule, complexity & quantity not quality.

Transcript: Understand Safety Standards

Click here for the Transcript on Understanding Safety Standards

In Module Three, we’re going to understand our Standard. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we’re going to ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’

What and Why?

So, what are we going to do, and why are we doing it? First of all, the use of standards in safety is very common for lots of reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’. And we can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or across different organizations. And standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.

However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and is coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.

And they’re not all based on the same concept. Thus some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?

And for risk analysis standards, we need to understand what kind of risks it addresses. Because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money that might be acceptable to you. You might say, ‘I’m not worried about that 10% of my loans have gone south and all gone wrong. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say ‘It’s OK that I’ve killed a few people over here because all this a lot over here are still alive!’. It doesn’t work like that!

Also, what kind of evidence does the standard produce? Because in safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable. Some are not. You need to know which is for your area.

And then finally, let’s think about the pros and cons of the standard, what does it do well? And what does it do not so well?

System Safety Pedigree

We’re going to look at a standard called Military Standard 882E. Many decades ago, this standard developed was created by the US government and military to help them bring into service complex-cutting edge military equipment. Equipment that was always on the cutting edge. That pushed the limits of what you could achieve in performance.

That’s a lot of complexity. Lots of critical weapon systems, and so forth. And they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background from military systems. These days you find these principles used pretty much everywhere. So, all the approaches to System Safety that 882 introduced are in other standards. They are also in other countries.

It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety. It’s very much about identifying requirements. What do we need to happen to get safety? To do that, it produces lots of requirements. It performs analyses in all those requirements and generates further requirements. And it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages. We’re going to discuss these in the next few slides.

Comprehensive Analysis

Before we get to that, we need to look at the key feature of this standard. The strengths and weaknesses of this standard come from its comprehensive analysis. And the chart (see the slide) is meant to show how we are looking at the system from lots of different perspectives. (It’s not meant to be some arcane religious symbol!) So, we’re looking at a system from 10 different perspectives, in 10 different ways.

Going around clockwise, we’ve got these ten different hazard analysis tasks. First of all, we start off with preliminary hazard identification. Then preliminary hazard analysis. We do some system requirements hazard analysis. So, we identify the safety requirements that the system is going to meet so that we are safe. We look at subsystem and system hazard analysis. At operating and support hazard analysis – people working with the system. Number seven, we look at health hazard analysis – Can the system cause health problems for people? Functional hazard analysis, which is all about what it does. We’re thinking of sort of source software and data-driven functionality. Maybe there’s no physical system, but it does stuff. It delivers benefits or risks. System of systems hazard analysis – we could have lots of different and/or complex systems interacting. And then finally, the tenth one – environmental hazard analysis.

If we use all these perspectives to examine the system, we get a comprehensive analysis of the system. From this analysis, we should be confident that we have identified everything we need to. All the hazards and all the safety requirements that we need to identify. Then we can confidently deliver an appropriate safe system. We can do this even if the system is extremely complex. The standard is designed to deal with big, complex cutting-edge systems.

Advantages #1

In fact, as we move on to advantages, that’s the number one advantage of this standard. If we use it and we use all 10 of those tasks, we can cope with the largest and the most demanding programs. I spent much of my career working on the Eurofighter Typhoon. It was a multi-billion-dollar program. It cost hundreds of billions of dollars, four different nations worked together on it. We used a derivative of Mil. Standard 882 to look at safety and analyze it. And it coped. It was powerful enough to deal with that gigantic program. I spent 13 years of my life on and off on that program so I’d like to think that I know my stuff when we’re talking about this.

As we’ve already said, it’s a systematic approach to safety. Systems, safety, engineering. And we can start very early. We can start with early requirements – discovery. We don’t even need a design – we know that we have a need. So we can think about those needs and analyze them.

And it can cover us right through until final disposal. And it covers all kinds of elements that you might find in a system. Remember our definition of ‘system’? It’s something that consists of hardware, software, data, human beings, etc. The standard can cope with all the elements of a system. In fact, it’s designed into the standard. It was specifically designed to look at all those different elements. Then to get different insights from those elements. It’s designed to get that comprehensive coverage. It’s really good at what it does. And it involves, not just engineers, but people from all kinds of other disciplines. Including operators, maintainers, etc, etc.

I came from a maintenance background. I was either directly or indirectly supporting operators. I was responsible for trying to help them get the best out of their system. Again, that’s a very familiar world to me. And rigorous standards like this can help us to think rigorously about what we’re doing. And so get results even in the presence of great complexity, which is not always a given, I must say.

So, we can be confident by applying the standard. We know that we’re going to get a comprehensive and thorough analysis. This assures us that what we’re doing is good.

Advantages #2

So, there’s another set of advantages. I’ve already mentioned that we get assurance. Assurance is ‘justified confidence’. So we can have high confidence that all reasonably foreseeable hazards will be identified and analyzed. And if you’re in a legal jurisdiction where you are required to hit a target, this is going to help you hit that target.

The standard was also designed for use in contracts. It’s designed to be applied to big programs. We’d define that as where we are doing the development of complex high-performance systems. So, there are a lot of risks. It’s designed to cope with those risks.

Finally, the standard also includes requirements for contracting, for interfaces with other systems, for interfaces with systems engineering. This is very important for a variety of disciplines. It’s important for other engineering and technical disciplines. It’s important for non-technical disciplines and for analysis and recordkeeping. Again, all these things are important, whether it is for legal reasons or not. We need to do recordkeeping. We need to liaise with other people and consult with them. There are legal requirements for that in many countries. This standard is going to help us do all those things.

But, of course, in a standard everything has pros and cons and Mil. Standard 882 is no exception. So, let’s look at some of the disadvantages.

Disadvantages #1

First of all, a full system safety program might be overkill for the system that you want to use, or that you want to analyze.  The Cold War, thank goodness, is over; generally speaking, we’re not in the business of developing cutting-edge high-performance killing machines that cost billions and billions of dollars and are very, very risky. These days, we tend to reduce program risk and cost by using off-the-shelf stuff and modifying it. Whether that be for military systems, infrastructure in the chemical industry, transportation, whatever it might be. Very much these days we have a family of products and we reuse them in different ways. We mix and match to get the results that we want.

And of course, all this comprehensive analysis is not cheap and it’s not quick. It may be that you’ve got a program that is schedule-constrained. Or you want to constrain the cost and you cannot afford the time and money to throw a full 882 program at it. So, that’s a disadvantage.

The second family of problems is that these kinds of safety standards have often been applied prescriptively. The customer would often say, ‘Go away and go and do this. I’m going to tell you what to do based on what I think reduces my risk’. Or at least it covers their backside. So, contractors got used to being told to do certain things by purchasers and customers. The customers didn’t understand the standards that they were applying and insisting upon. So, the customers did not understand how to tailor a safety standard to get the result that they wanted. So they asked for dumb things or things that didn’t add value. And the contractors got used to working in that kind of environment. They got used to being told what to do and doing it because they wouldn’t get paid if they didn’t. So, you can’t really blame them.

But that’s not great, OK? That can result in poor behaviors. You can waste a lot of time and money doing stuff that doesn’t actually add value. And everybody recognizes that it doesn’t add value. So you end up bringing the whole safety program into disrepute and people treat it cynically. They treat it as a box-ticking exercise. They don’t apply creativity and imagination to it. Much less determination and persistence. And that’s what you need for a good effective system safety program. You need creativity. You need imagination. You need people to be persistent and dedicated to doing a good job. You need that rigor so that you can have the confidence that you’re doing a good job because it’s intangible.

Disadvantages #2

Let’s move onto the second kind of family of disadvantages. And this is the one that I’ve seen the most, actually, in the real world. If you do all 10 tasks and even if you don’t do all 10, you can create too many hazards. If you recall the graphic from earlier, we have 10 tasks. Each task looks at the system from a different angle. What you can get is lots and lots of duplication in hazard identification. You can have essentially the same hazards identified over and over again in each task. And there’s a problem with that, in two ways.

First of all, quality suffers. We end up with a fragmented picture of hazards. We end up with lots and lots of hazards in the hazard log, but not only that. We get fragments of hazards rather than the real thing. Remember I said those tests for what a hazard really is? Very often you can get causes masquerading as hazards. Or other things that that exacerbating factors that make things worse. They’re not a hazard in their own right, but they get recorded as hazards. And that problem results in people being unable to see the big picture of risk. So that undermines what we’re trying to do. And as I say, we get lots of things misidentified and thrown into the pot. This also distracts people. You end up putting effort into managing things that don’t make a difference to safety. They don’t need to be managed. Those are the quality problems.

And then there are quantity problems. And from personal experience, having too many hazards is a problem in itself.  I’ve worked on large programs where we were managing 250 hazards or thereabouts. That is challenging even with a sizable, dedicated team. That is a lot of work in trying to manage that number of hazards effectively. And there’s always the danger that it will slide into becoming a box-ticking exercise. Superficial at best.

I’ve also seen projects that have two and a half thousand hazards or even 4000 hazards in the hazard log. Now, once you get up to that level, that is completely unmanageable. People who have thousands of hazards in a hazard log and they think they’re managing safety are kidding themselves. They don’t understand what safety is if they think that’s going to work. So, you end up with all these items in your hazard log, which become a massive administrative burden. So people end up taking shortcuts and the real hazards are lost. The real issues that you want to focus on are lost in the sea of detail that nobody will ever understand. You won’t be able to control them.

Unfortunately, Mil. Standard 882 is good at generating these grotesque numbers of hazards. If you don’t know how to use the standard and don’t actively manage this issue, it gets to this stage. It can go and does go, badly wrong. This is particularly true on very big programs. And you really need clarity on big projects.

Summary of Module

Let’s summarize what we’ve done with this module. The aim was to help us understand whether we’re doing the right thing and whether we’ve done it right. And standards are terrific for helping us to do that. They help us to ensure we’re doing the right thing. That we’re looking at the right things. And they help us to ensure that we’re doing it rigorously and repeatedly. All the good quality things that we want. And Mil. Standard 882E that we’re looking at is a system safety engineering standard. So it’s designed to deal with complexity and high-performance and high-risk. And it’s got a great pedigree. It’s been around for a long time.

Now that gives advantages. So, we have a system safety program with this standard that helps us to deal with complexity. That can cope with big programs, with lots of risks. That’s great.

The disadvantages of this standard are that if we don’t know how to tailor or manage it properly, it can cost a lot of money. It can take a lot of time to give results which can cause problems for the program. And ultimately, you can accidentally ignore safety if you don’t deliver on time. And it can generate complexity. And it can generate a quantity of data that is so great that it actually undermines the quality of the data. It undermines what we’re trying to achieve. In that, we get a fragmented picture in which we can’t see the true risks. And so we can’t manage them effectively. If we get it wrong with this standard, we can get it really wrong. And that brings us to the end of this module.

This is Module 3 of SSRAP

This is Module 3 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.

Categories
Mil-Std-882E Safety Analysis

System Safety Risk Assessment

Learn about System Safety Risk Assessment with The Safety Artisan.

In this module, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

You Will Learn to:

  • Explain what a system safety approach is and does; and
  • Define what a risk analysis program is; 
System Safety Risk Analysis.

Topics: System Safety Risk Assessment

Aim: How do we deal with real-world complexity?

  • What is System Safety?
  • The Need for Process;
  • A Realistic, Useful, Powerful process:
    • Context, Communication & Consultation; and
    • Monitoring & Review, Risk Treatment.
  • Required Risk Reduction.

Transcript: System Safety Risk Assessment

Click here for the Transcript on System Safety Risk Assessment

In this module, on System Safety Risk Assessment, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

What is System Safety?

To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context. This wider context is operational effectiveness – We want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.

And what are we developing? We’re developing a system. An organization of hardware, (or software) material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work with the intention to produce specified results.

We’ve got three things there. We’ve got a system. We’ve got the operating environment within which it works- or designed to work. And we have the thing that it’s supposed to produce; its function or its application. Why did we buy it, or make, it in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?

That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.

We Need A Process

We’ve sorted our context. How is all this going to happen? We need a process. In the standard that we’re going to look at in the next module, we have an eight-element process. As you can see there, we start with documenting our approach. Then we identify and document hazards. We document everything according to the standard so forget that.

We assess risk. We plan how we’re going to mitigate the risk. We identify risk mitigation measures or controls as there are often known. Then we apply those controls to reduce risk. We verify and confirm that the risk reduction that we have achieved, or that we believe we will achieve. And then we got to get somebody to accept that risk. In other words, to say that it is an acceptable level of risk. That we can put up with this level of risk in exchange for the benefits that the system is going to give us. Finally, we need to manage risk through the entire lifecycle of the system until we finally get rid of it.

The key point about this is whatever process we follow, we need to approach it with rigor. We stick to a systematic process. We take a structured and rigorous approach to looking at our system.

And as you can see there from the arrows, every step in the eight-element sequence flows into the next step. Each step supports and enables the following steps. We document the results as we go. However, even this example is a little bit too simple.

A More Realistic Process

So, let’s get a more realistic process. What we’ve got here are the same things we’ve had before. We’ve established the context at the beginning. Next, there’s risk assessment. Risk assessment consists of risk identification, risk analysis, and risk evaluation. It asks ‘Where are we?’ in relation to a yardstick or framework that categorizes risk. The category determines whether a risk is acceptable or not.

After determining whether the risk is acceptable or not, we may need to apply some risk treatment. Risk Treatment will reduce the risk further. By then we should have the risk down to an acceptable level.

So, that’s the straight-through process, once through. In the real world, we may have to go around this path several times. Having treated the risk over a period of time, we need to monitor and review it. We need to make sure that the risk turns out, in reality, to be what we estimated it to be. Or at least no worse. If it turns out to be better- Well, that’s great!

And on that monitoring and review cycle, maybe we even need to go back because the context has changed. These changes could include using the system to do something it was not designed to do. Or modifying the system to operate in a wider variety of environments. Whatever it might be, the context has changed. So, we need to look again at the risk assessment and go round that loop again.

And while we’re doing all that, we need to communicate with other people. These other people include end-users, stakeholders, other people who have safety responsibilities. We need to communicate with the people who we have to work with. And we have to consult people. We may have to consult workers. We may have to consult the public, people that we put at risk, other duty holders who hold a duty to manage risk. That’s our cycle. That’s more realistic. In my experience as a safety engineer, this is much more realistic. A once-through process often doesn’t cut it.

Required Risk Reduction

We’re doing all this to drive risk down to an acceptable level. Well, what do we mean by that? Well, there are several different ways that we can do this, and I’ve got to illustrate it here. On the left-hand side of the slide, we have what’s usually known as the ALARP triangle. It’s this thing that looks a bit like a carrot where the width of the triangle indicates the amount of risk. So, at the top of the triangle, we’ve got lots of risks. And if you’re in the UK or Australia where I live, this is the way it’s done. So there will be some level of risk that is intolerable. Then if the risk isn’t intolerable, we can only tolerate it or accept it if it is ALARP or SFARP. And ALARP means that we’ve reduced the risk as low as reasonably practicable. And SFARP means so far as is reasonably practicable. Essentially, they’re the same thing – reasonably practical.

We must ensure that we have applied all reasonably practicable risk reduction measures. And once we’ve done so, if we’re in this tolerable or acceptable region, then we can live with the risk. The law allows us to do that.

That’s how it’s done in the UK and Australia. But in other jurisdictions, like the USA, you might need to use a different approach. A risk matrix approach as we can see on the right-hand side of this slide. This particular risk matrix is from the standard we’re about to look at. And we could take that and say, ‘We’ve determined what the risk is. There is no absolute limit on how much risk we can accept. But the higher the risk, the more senior level of sign-off from management we need’. In effect, you are prioritizing the risk. So you only bring the worst risks to the attention of senior management. You are asking  ‘Will you accept this? Or are you prepared to spend the money? Or will you restrict the operational system to reduce the risk?’. This is good because it makes people with authority consider risks. They are responsible and need to make meaningful decisions.

In short, different approaches are legal in different jurisdictions.

Summary of Module

In Module Two, we’ve asked ourselves, ‘How can we deal with real-world complexity?’. And one way that’s developed to do that is System Safety. System Safety is where we take a systematic approach to safety. This approach applies to both the system itself – the product – and the process of System Safety.

We address product and process. We need that rigorous process to give us confidence that what we’ve done is good enough. We have a realistic, useful and powerful process that enables us to put things in context. It helps us to communicate with everyone we need to, to consult with those that we have a duty to consult with. And also, we put around the basic risk process, this monitoring and review. And of course, we analyze risk to reduce it to acceptable levels. So we’ve got to treat the risk or reduce it or control it in some way to get it to those acceptable levels. In the end, it’s all about getting that required risk reduction to work. That reduction makes the risk acceptable to expose human beings to, for the benefit that it will give us.

This is Module 2 of SSRAP

This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.