Categories
Mil-Std-882E Safety Analysis System Safety

System Requirements Hazard Analysis

In this 45-minute session, I’m looking at System Requirements Hazard Analysis, or SRHA, which is Task 203 in the Mil-Std-882E standard. I will explore Task 203’s aim, description, scope, and contracting requirements.  SRHA is an important and complex task, which needs to be done on several levels to be successful.  This video explains the issues and discusses how to perform SRHA well.

This is the seven-minute demo video, the full version is 40 minutes’ long.

Topics: System Requirements Hazard Analysis

  • Task 202 Purpose;
  • Task Description:
    • Determine Requirements;
    • Incorporate Requirements; and
    • Assess the compliance of the System.
  • Contracting;
  • Section 4.2 (of the standard); and
  • Commentary.
Transcript: System Requirements Hazard Analysis

Introduction

Hello and welcome to the Safety Artisan, where you will find professional, pragmatic and impartial advice on all things system, safety and related.

System Requirements Hazard Analysis

And so today, which is the 1st of March 2020, we’re going to be talking about – let me just find it for you – we’ll be talking about system requirements, hazard analysis. And this is part of our series on Mil. Standard 882E (882 Echo) and this one a task 203. Task 203 in the Mil. standard. And it’s a very widely used system safety engineering standard and its influence is found in many places, not just on military procurement programs.

Topics for this Session

We’re going to look at this task, which is very important, possibly the most important task of all, as we’ll see. so in to talk about the purpose of the task, which is word for word from the task description itself. We’re going to talk about in the task description, the three aims of this task, which is to determine or work out requirements, incorporate them, and then assess the compliance of the system with those requirements, because, of course, it may not be a simple read-across. We’ve got six slides on that. That’s most of the task. Then we’ve just got one slide on contracting, which if you’ve seen any of the others in this series, will seem very familiar. We’ve got a little bit of a chat about Section 4.2 from the standard and some commentary, and the reason for that will become clear. So, let’s crack on.

System Requirements Hazard Analysis

Task 203.1, the purpose of Task 203 is to perform and document a System Requirements Hazard Analysis or SRHA. And as we’ve already said, the purpose of this is to determine the design requirements. We’re going to focus on design rather than buying stuff off the shelf – we’ll talk about the implications of that a little bit later. Design requirements to eliminate or reduce hazards and risks, incorporate those requirements, into a says, into the documentation, but what it should say is incorporate risk reduction measures into the system itself and then document it. And then finally, to assess compliance of the system with these requirements. Then it says the SRHA address addresses all life-cycle phases, so not just meant for you to think about certain phases of the program. What are the requirements through life for the system? And in all modes. Whether it’s in operation, whether it’s in maintenance or refit, whether it’s being repaired or disposed of, whatever it might be.

Task Description #1

First of six slides on the task description. I’m using more than one colour because there’s some quite a lot of important points packed quite tightly together in this description. We’re assuming that the contractor performs and documents this SRHA. The customer needs to do a lot of work here before ever gets near a contractor. More on that later. We need to determine system design requirements to eliminate hazards or reduce associated risks.

Two things here. By identifying applicable policies, regulations and standards etc. More on that later. And analysing identified hazards. So, requirements to perform the analysis as well as to simply just state ‘We want a system to do this and not to do that’. So, we need to put some requirements to say ‘Here’s what we want analysed maybe to what degree? And why.’ is always helpful.

Task Description #2

Breaking those breaking those two requirements down.

Part a. We’re going to identify applicable requirements by reviewing our military and industry standards and specs, historical documentation of systems that are similar or with a system that we’re replacing, perhaps. Look at, it’s assumed that the US Department of Defense is the customer, ultimate customer. So, the ultimate customer’s requirements, including whatever they’ve said about standard ways of mitigating certain common risks. System performance spec, that’s your functional performance spec or whatever you want to call it. Other system design requirements and documents- Bit of a catchall there. And applicable federal, military, state and local regulations. This is a US standard. It’s a federated system, much like Australia or indeed lots of modern states, even the UK. There are variations in law across England, Wales, Scotland and Ireland. They’re not great, but they do exist. And in the US and Australia, those differences are greater. And it says applicable executive orders. Executive orders, they’re not law, but they are what the executive arm of the U.S. government has issued, and international agreements. A lot of words in there- have a look at the different statements that are in that in white, blue and yellow. Basically, from international agreements right down to whatever requirements may be applicable, they all need to be looked at and taken account of. So, there’s a huge amount of work there for someone to do. I’ll come back to who that someone should be later.

End

Well, that is the end of the presentation. And it just remains for me to say thanks again for watching and do lookout for the next sessions in the series on 882 echo (882E). There are quite a few to go. We’re going to go through all the tasks and the general and specific requirements of the standard and the appendices. We will also talk about more advanced topics, about how we manage and apply all this stuff.

So, from The Safety Artisan, thanks very much and goodbye.

End: System Requirements Hazard Analysis

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Blog Mil-Std-882E Safety Analysis

How to do Preliminary Hazard Analysis

In this 45-minute session, The Safety Artisan looks at how to do Preliminary Hazard Analysis, or PHA, which is Task 202 in Mil-Std-882E. We explore Task 202’s aim, description, scope, and contracting requirements. We also provide value-adding commentary and explain the issues with PHA – how to do it well and avoid the pitfalls.

This is the seven-minute-long demo video. The full video is 45 minutes long.

Topics: How to do Preliminary Hazard Analysis

  • Task 202 Purpose;
  • Task Description;
  • Recording & Scope;
  • Risk Assessment (Tables I, II & III);
  • Risk Mitigation (order of preference);
  • Contracting; and
  • Commentary.

Transcript: How to do Preliminary Hazard Analysis

Transcript: Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you’ll find professional, pragmatic and impartial safety training resources. So, we’ll get straight on to our session and it is the 8th February 2020. 

Preliminary Hazard Analysis

Now we’re going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It’s very widely used mostly on military equipment, but it does turn up elsewhere.  This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It’s one of the first things that you will do on a systems safety program and therefore one of the most informative. This session forms part of a series of lessons that I’m doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording and scope. How we do risk assessments against Tables 1, 2 and 3. Basically, it is severity, likelihood and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting and then a short commentary from myself. In fact, I’m providing commentary all the way through. So, let’s crack on.

Task 202 Purpose

The purpose of Task 202, as it says, is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks and identify potential mitigation measures. We’re going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I’ve highlighted some stuff that I particularly want to talk about.

It says “the contractor” [does this or that], but it doesn’t really matter who is doing the analysis, and actually, the customer needs to do some to inform themselves, otherwise they won’t really understand what they’re doing.  Whoever does it needs to perform and document PHA. It’s about determining initial risk assessments. There’s going to be more work, more detailed work done later. But for now, we’re doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions that we’re proposing to introduce. That’s very important. We don’t need a design to do this. We can get in early when we have user requirements, functional requirements, that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that’s another big issue. We’ll talk about that more later. It says including mishap data as well, if accessible: American term mishap, it means accident, but we’re avoiding any kind of suggestion about whether it is accidental or deliberate.  It might be stupidity, deliberate, whatever. It’s a mishap. It’s an undesirable event. We look for accessible data from similar systems, legacy systems and other lessons learned. I’ve talked about that a little bit in Task 201 lesson about that, and there’s more on that today under commentary. We need to look at provisions, alternatives, meaning design provisions and design alternatives in order to reduce risks and adding mitigation measures to eliminate hazards. If we can all reduce associated risk, we need to include all of that. What’s the task description? That’s a good overview of the task and what we need to talk about.

Reading & Scope

First, recording and scope, as always, with these tasks, we’ve got to document the results of the PHA in a hazard tracking system. Now, a word on terminology; we might call hazard tracking system; we might call it hazard log; we might call it a risk register. It doesn’t really matter what it’s called. The key point is it’s a tracking system. It’s a live document, as people say, it’s a spreadsheet or a database, something like that. It’s something relatively easy to update and change. And, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and to refine them and change them as time goes on. Very important point.

That’s it for the Demo…

End: How to do Preliminary Hazard Analysis

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Mil-Std-882E Safety Analysis software safety

Functional Hazard Analysis

In this full-length (40-minute) session, The Safety Artisan looks at Functional Hazard Analysis, or FHA, which is Task 208 in Mil-Std-882E. FHA analyses software, complex electronic hardware, and human interactions. We explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (We refer to other lessons for special techniques for software safety and Human Factors.)

Functional Hazard Analysis: Context

So how do we analyze software safety?

Before we even start, we need to identify those system functions that may impact safety. We can do this by performing a Functional Failure Analysis (FFA) of all system requirements that might credibly lead to human harm.

An FFA looks at functional requirements (the system should do ‘this’ or ‘that’) and examines what could go wrong. What if:

  • The function does not work when needed?
  • The function works when not required?
  • The function works incorrectly? (There may be more than one version of this.)

(A variation of this technique is explained here.)

If the function could lead to a hazard then it is marked for further analysis. This is where we apply the FHA, Task 208.

Functional Hazard Analysis: The Lesson

This is the seven-minute demo; the full version is 40 minutes long.

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis

Click here for the Transcript

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyse the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is Task 208 in Mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional Purpose HA #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? And in classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, it’s used to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. The truth is in the real world, that lots of software-intensive systems have been involved in accidents that have killed lots of people, even when they’re functioning as intended. That’s one of the short-sightedness of this Mil. Standard is that it focuses on failure. The idea that if something is performing as specified, that either the specification might be wrong or there might be some disconnect between what the system is doing and what the human expects – The way the standard is written just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – for the purpose of identifying what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sort of a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behaviour over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people is the short answer.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The presumption in this – the assumption in this task is that we’re doing early – We’ll see that later – and that system design, system architecture, is still up for grabs. We can still influence it. Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen anymore so much in the military and it certainly doesn’t happen in many other walks of life – But we’ll talk about how you deal with the realities later. And they’re allocating these functions and these items of interest to hardware, software and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those type of simple components that are only really subject to random failure, that’s not what we’re talking about here. We’re talking about complex stuff where we’re talking about systematic failure dominating over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide on purpose; so we use the FHA to identify consequences of malfunction or functional failure, lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a systems engineering process – that’s not always the case. We’ll talk about that at the end as well. And we’re going to identify and document these functions and items and allocate and it says partition them in the software design architecture. When we say partition, that’s jargon for separate them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation in this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done.

(End of Transcription for the 7-minute demo.)

Then What?

Once the FFA has identified the required ‘Level or Rigor’, we need to translate that into a suitable software development standard. This might be:

  • RTCA DO-178C (also know as ED-12C) for civil aviation;
  • The US Joint Software System Safety Engineering Handbook (JSSEH) for military systems;
  • IEC 61508 (functional safety) for the process industry;
  • CENELEC-50128 for the rail industry; and
  • ISO 26262 for automotive applications.

Such standards use Safety Integrity Levels (SILs) or Development Assurance Levels (DALs) to enforce appropriate Levels of Rigor. You can learn about those in my course Principles of Safe Software Development.

End

Categories
Blog Safety Analysis

Failure Mode Effects Analysis

TL;DR This article on Failure Mode Effects Analysis explains this powerful and commonly-used family of techniques. It covers:

  • A description of the technique, including its purpose;
  • When it might be used;
  • Advantages, disadvantages and limitations;
  • Sources of additional information;
  • A simple example of an FMEA/FMECA; and
  • Additional comments.

I’ve added some ‘top tips’ of my own based on my personal experience in the industry.

Top Tip

In this article, I have used material from a UK Ministry of Defence guide, reproduced under the terms of the UK’s Open Government Licence.

A Description of the Technique, Including its Purpose

Failure modes and effects analysis (FMEA) was one of the first systematic techniques for failure analysis. It was developed in the United States military (Military Procedure MIL-P-1629, titled ‘Procedures for Performing a Failure Modes, Effects and Criticality Analysis’, November 9, 1949) as a reliability evaluation technique to determine the effect of system and equipment failures. Failures were classified according to their impact on mission success and personnel, equipment and safety. In the 1960’s it was used by the aerospace industry and NASA during the Apollo program. More and more industries – notably the automotive industry – have seen the benefits to be gained by using FMEAs to complement their design processes.

This qualitative technique helps identify failure potential in a design or process i.e. to foresee failure before it actually happens. This is done by defining the system which is under consideration to ensure system boundaries are established and then by following a procedure, which helps to identify design features or process operations that could fail. The procedure requires the following essential questions to be asked:

  • How can each component fail?
  • What might cause these modes of failure?
  • What could the effects be if these failures did occur?
  • How serious are these failure modes?
  • How is each failure mode detected?
  • What are the safeguards in place to protect against accidents resulting from the failure mode?

As always with safety analyses, the more precisely you can answer these questions (above), the better the results you will get.

Top Tip

As an aid in structuring the analysis and ensuring a systematic approach, results are recorded in a tabular format. Several different forms are in use, and the form design can be tailored-made to suit the particular requirements of a study. Examples of forms can be found in several standards (links below).

Make the form support the flow of the process, left-to-right, then top-down!

Top Tip

The FMEA analysis can be extended if necessary by characterizing the likelihood, severity and resulting levels of risk of failures. FMEAs that incorporate this criticality analysis (CA) are known as FMECAs. A FMECA is an analytical quantitative technique, which ranks failure modes according to their probability and consequences (i.e. the resulting effect of the failure mode on the system, mission and personnel). It is referred to as a “bottom-up approach” as it starts by identifying the potential failure modes of a component and analyzing their effects on the whole system. It can be quite complex depending on how the user drives the technique.

It is important to note that the FMECA does not provide a model by which system reliability can be quantified. Hence, if the objective is to estimate the probability of events, a technique that results in a logic model of the failure mechanisms must be employed, typically a fault tree and/or an event tree.

Reliability Block Diagrams, or for repairable systems, Markov Chains can also be used.

Top Tip

A FMEA or FMECA can be conducted on either a component or a functional level. A functional FMEA/FMECA only covers hardware aspects but a functional FMEA/FMECA can cover all aspects of a system. For either approach, the general principle remains the same.

When it Might be Used

FMEA is applicable for any well-defined system but is primarily used for reviews of mechanical and electrical systems. It can be used in many situations, for example, to assess the design of a product in terms of what could go wrong in manufacturing and in-service as a result of the weakness in the design. It can also be used to analyze failures in the manufacturing process itself and during service. It is effective for collecting information needed to troubleshoot system problems and improving maintenance and reliability of plant and equipment (defining and optimizing) as it focuses directly and individually on equipment failure modes.

It’s fair to say that you need a design, on which to perform a FMEA. Pre-design you could use Functional Failure Analysis (FFA) instead.

Top Tip

The FMECA technique is best suited for detailed analysis of system hardware, and should preferably be carried out by the designer in parallel with system development. This will not only speed up the analysis itself, but also force the design team to think systematically about the failure characteristics of the system. The primary use of the FMECA is in verifying that single component failures cannot cause a catastrophic system failure.

There are a number of areas today in which the use of FMECA has become mandatory to demonstrate system reliability. Examples of such requirements are in the classification of Dynamically Positioned (DP) vessels and in a number of US military applications for which MIL-STD documents apply.

Advantages, Disadvantages and Limitations

Advantages

  • It is widely-used and well-understood, and easy to understand and interpret
  • It can be performed by a single analyst, or more if required
  • Qualitative data about the causes and effects can be incorporated into the analysis
  • It is systematic and comprehensive, and should identify hazards with an electrical or mechanical basis
  • The level of detail incorporated can be varied to suit the analysis
  • It identifies safety-critical equipment where a single failure would be critical for the system
  • Even though the technique can be quite time consuming it can lead to a thorough understanding of the system being considered

Disadvantages

  • The technique adopts a bottom-up approach and if conducting a component level FMEA or FMECA this can be boring and repetitive
  • The benefit gained is dependent upon the experience of the analyst or the group.
  • It requires a hierarchical system drawing as the basis for the analysis, which the analyst usually has to develop before the FMEA process can start
  • It is optimised for mechanical and electrical equipment, and does not apply easily to Human Factor Integration, procedures or process equipment
  • It is difficult for the technique to cover multiple failures as equipment failures are generally analysed one by one therefore important combinations of equipment failures may be overlooked
  • Most accidents have a significant human or external influence contribution and these are not a usual failure mode with FMEA
  • More than one FMEA may be required for a system with multiple modes of operation
  • Due to its wide use there can be temptation to read across data from ARM or ILS projects where, for example, the fault-tree technique has been used. As a consequence, the safety perspective can be lost as human error has been excluded and the focus has been solely on determining faults and on not on more far-reaching safety issues
  • Perhaps the worst drawback of the technique is that all component failures are examined and documented, including those, which do not have any significant consequences.
  • For large systems, especially those with a fair degree of redundancy built into them, the amount of unnecessary documentation is a major disadvantage. Hence, the FMECA should primarily be used by designers of reasonably simple systems. It should however be noted that the concept of the FMECA form can be quite useful in other contexts, e.g. when reviewing an operation rather than a hardware system. Then the use of a form similar to the FMECA can provide a useful way of documenting the analysis. Suitable columns in the form could for example include; operation, deviation, consequence, correcting or reversing action, etc.

ARM = Availability, Reliability, Maintainability

ILS = Integrated Logistic Support (or logistics engineering

Top Tip

Sources of Additional Information, such as Standards, Textbooks and Websites

BS 5760: Part 5 Reliability of Systems, Equipment and Components: Part 5 Guide to Failure Modes, Effects and Criticality Analysis.

HSE Website – Marine Risk Assessment, Offshore Technology Report 2001/063

IEC 60812:2018 Failure modes and effects analysis (FMEA and FMECA)

As always, Understand your Standard (what it was desinged to do) to get the best out of it!

Top Tip

A Simple Example of an FMEA/FMECA

An example extract from an FMEA of a ballast system is shown below. This can be found in the HSE Marine Risk Assessment Report. The column headings are based on the US Military Standard Mil-Std 1629A, but with modifications to suit the particular application. For example, the failure mode and cause columns are combined. The criticality of each failure is ranked as minor, incipient, degraded, or critical.

An example of a FMEA Output Table

To properly understand these results you need to know how a Sea Chest works (see context here). Otherwise the example just shows what kind of output a FMEA can produce.

Top Tip

Additional comments

Failure Modes and Effects and Criticality Analysis (FMECA) is an analytical QRA technique, used by ARM and ILS systems engineers, most commonly and effectively at the late design, test and manufacture stage of a project. It requires the breakdown of the system into individual components and the identification of possible failure modes or malfunctions of each component, (such as too much flow through a valve). Referred to as a bottom-up approach, it starts by identifying the potential failure modes of a component and analyzing their potential effects on the whole system. Numerical levels can be assigned to the likelihood of the failure and the severity or consequence of the failure.

Note: It is important to recognize that FMEA/FMECA Standards have different approaches to criticality. Failure mode severity classes 1 – 5 for Standards MIL1629A and ARP926A go from Class 1 being the most severe (e.g. loss of life) to Class 5 being less severe (i.e. no effect), whereas BS 5760 deals with criticality in the opposite direction where Class 5 is the most severe.

Note that FMECA for ARM/ILS looks at availability or mission criticality, not safety criticality.  A FMECA for safety will have a different focus.

Top Tip

Software:

  • Isograph;
  • Reliability Work Bench;
  • Reliasoft;
  • Microsoft Excel.

These are not recommendations!

FMEA/FMECA tables for complex systems can run to hundreds of pages, so good tool support is essential.

Top Tip

Failure Mode Effects Analysis: have You Used this Technique?

Back to the Safety Assessment topic page.

Categories
Mil-Std-882E Safety Analysis

System Safety Engineering Process

The System Safety Engineering Process – what it is and how to do it.

This is the full-length (50-minute) session on the System Safety Process, which is called up in the general requirements of Mil-Std-882E. I cover the Applicability of Mil-Std-882E tasks, the General Requirements, the Process with eight elements, and the Application of process theory to the real world. 

You Will Learn to:

  • Know the system safety process iaw Mil-Std-882E;
  • List and order the eight elements;
  • Understand how they are applied;
  • Skilfully apply system safety using realistic processes; and
  • Feel more confident dealing with this and other standards.
System Safety Process – this is the free demo.

Topics: System Safety Engineering Process

  • Applicability of Mil-Std-882E tasks;
  • General requirements;
  • Process with eight elements; and
  • Application of process theory to the real world

Transcript: Preliminary Hazard Identification

CLICK HERE for the Transcript

System Safety Process

Hi, everyone, and welcome to the Safety Artisan. I’m Simon, your host. Today I’m going to be using my experience with System Safety Engineering to talk you through the process that we need to follow to achieve success. Because to use a corny saying, ‘Safety doesn’t happen by accident’. Safety is what we call an emergent property. And to get it, we need to decide what we mean by safety, decide what our goals are, and then work out how we’re going to get there. It’s a planned systematic activity. Especially if we’re going to deal with very complex projects or situations. Times where there is a requirement to make that understanding and that planning explicit. Where the requirement becomes the difference between success and failure. Anyway, that’s enough of that. Let’s get on and look at the session.

Military Standard 882E, Section 4 General Requirements

Today we’re talking about System Safety Process. To help us do that, we’re going to be looking at a particular standard – the general requirements of that standard. And those are from Section Four of Military Standard 882E. But don’t get hung up on which standard it is. That’s not the point here. It’s a means to an end. I’ll talk about other standards and how we perform system safety engineering in different domains.

Learning Objectives

Our learning objectives for today are here. In this session, you will learn, or you’ll know, the system safety process in accordance with that Mil. Standard. You will be able to list and order the eight elements of the process. You will understand how to apply the eight elements. And you will be able to apply system safety with some skill using realistic processes. We’re going to spend quite a bit of time talking about how it’s actually done vs. how it appears on a sheet of paper. Also known as how it appears written in a standard. So, we’re going to talk about doing it in the real world. At the end of all that, you will be able to feel more confident dealing with multiple different standards.

The focus is not on this military standard, but on understanding the process. The fundamentals of what we’re trying to achieve and why. Then you will be able to extrapolate those principles to other standards. And that should help you to understand whatever it is you’re dealing with. It doesn’t have to be Mil. Standard 882E.

Contents of this Session

We’ve got four sets of contents in the session. First of all, I’m going to talk about the applicability of Military Standard 882E. From the standard itself and the tasks (you’ll see why that’s important) to understanding what you’re supposed to do. Then other standards later on. I’m going to talk about those general requirements that the standard places on us to do the work. A big part of that is looking at a process following the eight elements. And finally, we will apply that theory of how the process should work to the real world. And that will include learning some real-world lessons. You should find these useful for all standards and all circumstances.

So, it just remains for me to say thank you very much for listening. You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Blog Safety Analysis

Preliminary Hazard Identification & Analysis Guide

Get your free Preliminary Hazard Identification & Analysis, PHIA Guide here!

Introduction

Hazard Identification is sometimes defined as: “The process of identifying and listing the hazards and accidents associated with a system.”

Hazard Analysis is sometimes defined as: “The process of describing in detail the hazards and accidents associated with a system and defining accident sequences.”

Preliminary Hazard Identification and Analysis (PHIA) helps you determine the scope of safety activities and requirements. You can identify the main hazards likely to arise from the capability and functionality being provided. Perform it as early as possible in the project life cycle. Thus, you will provide important early input to setting Safety requirements and refining the Project Safety Plan.

PHIA seeks to answer, at an early stage of the project, the question: “What Hazards and Accidents might affect this system and how could they happen?”

Aim

The aim of the PHIA is to identify, as early as possible, the main Hazards and Accidents that may arise during the life of the system. It provides input to:

  1. Scoping the subsequent Safety activities required in any Safety Plan. A successful PHIA will help to gauge the proportionate effort that is likely to be required to produce an effective Safety Case, proportionate to risks.
  2. Selecting or eliminating options for subsequent assessment.
  3. Setting the initial Safety requirements and criteria.
  4. Subsequent Hazard Analyses.
  5. Initiate Hazard Log.

Description

Perform a PHIA as early as possible in order to obtain maximum benefit. Use it to understand what the Hazards and Accidents are, why, and how they might be realized. A PHIA is an important part of Risk Management, project planning, and requirements definition. It helps you to identify the main system hazards and helps target where a more thorough analysis should be undertaken.

Usually, PHIA is based on a structured brainstorming exercise, supported by hazard checklists. A structured approach helps to minimize the possibility of missing an important hazard. It also demonstrates that a thorough and comprehensive approach has been applied.

Get Your Free PHIA Guide Here!

Find more on basic safety topics at Start Here.

Categories
Mil-Std-882E Safety Analysis System Safety

How to Understand Safety Standards

Learn How to Understand Safety Standards with this FREE session from The Safety Artisan.

In this module, Understanding Your Standard, we’re going to ask the question: Am I Doing the Right Thing, and am I Doing it Right? Standards are commonly used for many reasons. We need to understand our chosen system safety engineering standard, in order to know: the concepts, upon which it is based; what it was designed to do, why and for whom; which kinds of risk it addresses; what kinds of evidence it produces; and it’s advantages and disadvantages.

Understand Safety Standards : You’ll Learn to

  • List the hazard analysis tasks that make up a program; and
  • Describe the key attributes of Mil-Std-882E. 
Understanding Your Standard

Topics:  Understand Safety Standards

Aim: Am I Doing the Right Thing, and am I Doing it Right?

  • Standards: What and Why?
  • System Safety Engineering pedigree;
  • Advantages – systematic, comprehensive, etc:
  • Disadvantages – cost/schedule, complexity & quantity not quality.

Transcript: Understand Safety Standards

Click here for the Transcript on Understanding Safety Standards

In Module Three, we’re going to understand our Standard. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we’re going to ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’

What and Why?

So, what are we going to do, and why are we doing it? First of all, the use of standards in safety is very common for lots of reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’. And we can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or across different organizations. And standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.

However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and is coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.

And they’re not all based on the same concept. Thus some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?

And for risk analysis standards, we need to understand what kind of risks it addresses. Because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money that might be acceptable to you. You might say, ‘I’m not worried about that 10% of my loans have gone south and all gone wrong. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say ‘It’s OK that I’ve killed a few people over here because all this a lot over here are still alive!’. It doesn’t work like that!

Also, what kind of evidence does the standard produce? Because in safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable. Some are not. You need to know which is for your area.

And then finally, let’s think about the pros and cons of the standard, what does it do well? And what does it do not so well?

System Safety Pedigree

We’re going to look at a standard called Military Standard 882E. Many decades ago, this standard developed was created by the US government and military to help them bring into service complex-cutting edge military equipment. Equipment that was always on the cutting edge. That pushed the limits of what you could achieve in performance.

That’s a lot of complexity. Lots of critical weapon systems, and so forth. And they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background from military systems. These days you find these principles used pretty much everywhere. So, all the approaches to System Safety that 882 introduced are in other standards. They are also in other countries.

It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety. It’s very much about identifying requirements. What do we need to happen to get safety? To do that, it produces lots of requirements. It performs analyses in all those requirements and generates further requirements. And it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages. We’re going to discuss these in the next few slides.

Comprehensive Analysis

Before we get to that, we need to look at the key feature of this standard. The strengths and weaknesses of this standard come from its comprehensive analysis. And the chart (see the slide) is meant to show how we are looking at the system from lots of different perspectives. (It’s not meant to be some arcane religious symbol!) So, we’re looking at a system from 10 different perspectives, in 10 different ways.

Going around clockwise, we’ve got these ten different hazard analysis tasks. First of all, we start off with preliminary hazard identification. Then preliminary hazard analysis. We do some system requirements hazard analysis. So, we identify the safety requirements that the system is going to meet so that we are safe. We look at subsystem and system hazard analysis. At operating and support hazard analysis – people working with the system. Number seven, we look at health hazard analysis – Can the system cause health problems for people? Functional hazard analysis, which is all about what it does. We’re thinking of sort of source software and data-driven functionality. Maybe there’s no physical system, but it does stuff. It delivers benefits or risks. System of systems hazard analysis – we could have lots of different and/or complex systems interacting. And then finally, the tenth one – environmental hazard analysis.

If we use all these perspectives to examine the system, we get a comprehensive analysis of the system. From this analysis, we should be confident that we have identified everything we need to. All the hazards and all the safety requirements that we need to identify. Then we can confidently deliver an appropriate safe system. We can do this even if the system is extremely complex. The standard is designed to deal with big, complex cutting-edge systems.

Advantages #1

In fact, as we move on to advantages, that’s the number one advantage of this standard. If we use it and we use all 10 of those tasks, we can cope with the largest and the most demanding programs. I spent much of my career working on the Eurofighter Typhoon. It was a multi-billion-dollar program. It cost hundreds of billions of dollars, four different nations worked together on it. We used a derivative of Mil. Standard 882 to look at safety and analyze it. And it coped. It was powerful enough to deal with that gigantic program. I spent 13 years of my life on and off on that program so I’d like to think that I know my stuff when we’re talking about this.

As we’ve already said, it’s a systematic approach to safety. Systems, safety, engineering. And we can start very early. We can start with early requirements – discovery. We don’t even need a design – we know that we have a need. So we can think about those needs and analyze them.

And it can cover us right through until final disposal. And it covers all kinds of elements that you might find in a system. Remember our definition of ‘system’? It’s something that consists of hardware, software, data, human beings, etc. The standard can cope with all the elements of a system. In fact, it’s designed into the standard. It was specifically designed to look at all those different elements. Then to get different insights from those elements. It’s designed to get that comprehensive coverage. It’s really good at what it does. And it involves, not just engineers, but people from all kinds of other disciplines. Including operators, maintainers, etc, etc.

I came from a maintenance background. I was either directly or indirectly supporting operators. I was responsible for trying to help them get the best out of their system. Again, that’s a very familiar world to me. And rigorous standards like this can help us to think rigorously about what we’re doing. And so get results even in the presence of great complexity, which is not always a given, I must say.

So, we can be confident by applying the standard. We know that we’re going to get a comprehensive and thorough analysis. This assures us that what we’re doing is good.

Advantages #2

So, there’s another set of advantages. I’ve already mentioned that we get assurance. Assurance is ‘justified confidence’. So we can have high confidence that all reasonably foreseeable hazards will be identified and analyzed. And if you’re in a legal jurisdiction where you are required to hit a target, this is going to help you hit that target.

The standard was also designed for use in contracts. It’s designed to be applied to big programs. We’d define that as where we are doing the development of complex high-performance systems. So, there are a lot of risks. It’s designed to cope with those risks.

Finally, the standard also includes requirements for contracting, for interfaces with other systems, for interfaces with systems engineering. This is very important for a variety of disciplines. It’s important for other engineering and technical disciplines. It’s important for non-technical disciplines and for analysis and recordkeeping. Again, all these things are important, whether it is for legal reasons or not. We need to do recordkeeping. We need to liaise with other people and consult with them. There are legal requirements for that in many countries. This standard is going to help us do all those things.

But, of course, in a standard everything has pros and cons and Mil. Standard 882 is no exception. So, let’s look at some of the disadvantages.

Disadvantages #1

First of all, a full system safety program might be overkill for the system that you want to use, or that you want to analyze.  The Cold War, thank goodness, is over; generally speaking, we’re not in the business of developing cutting-edge high-performance killing machines that cost billions and billions of dollars and are very, very risky. These days, we tend to reduce program risk and cost by using off-the-shelf stuff and modifying it. Whether that be for military systems, infrastructure in the chemical industry, transportation, whatever it might be. Very much these days we have a family of products and we reuse them in different ways. We mix and match to get the results that we want.

And of course, all this comprehensive analysis is not cheap and it’s not quick. It may be that you’ve got a program that is schedule-constrained. Or you want to constrain the cost and you cannot afford the time and money to throw a full 882 program at it. So, that’s a disadvantage.

The second family of problems is that these kinds of safety standards have often been applied prescriptively. The customer would often say, ‘Go away and go and do this. I’m going to tell you what to do based on what I think reduces my risk’. Or at least it covers their backside. So, contractors got used to being told to do certain things by purchasers and customers. The customers didn’t understand the standards that they were applying and insisting upon. So, the customers did not understand how to tailor a safety standard to get the result that they wanted. So they asked for dumb things or things that didn’t add value. And the contractors got used to working in that kind of environment. They got used to being told what to do and doing it because they wouldn’t get paid if they didn’t. So, you can’t really blame them.

But that’s not great, OK? That can result in poor behaviors. You can waste a lot of time and money doing stuff that doesn’t actually add value. And everybody recognizes that it doesn’t add value. So you end up bringing the whole safety program into disrepute and people treat it cynically. They treat it as a box-ticking exercise. They don’t apply creativity and imagination to it. Much less determination and persistence. And that’s what you need for a good effective system safety program. You need creativity. You need imagination. You need people to be persistent and dedicated to doing a good job. You need that rigor so that you can have the confidence that you’re doing a good job because it’s intangible.

Disadvantages #2

Let’s move onto the second kind of family of disadvantages. And this is the one that I’ve seen the most, actually, in the real world. If you do all 10 tasks and even if you don’t do all 10, you can create too many hazards. If you recall the graphic from earlier, we have 10 tasks. Each task looks at the system from a different angle. What you can get is lots and lots of duplication in hazard identification. You can have essentially the same hazards identified over and over again in each task. And there’s a problem with that, in two ways.

First of all, quality suffers. We end up with a fragmented picture of hazards. We end up with lots and lots of hazards in the hazard log, but not only that. We get fragments of hazards rather than the real thing. Remember I said those tests for what a hazard really is? Very often you can get causes masquerading as hazards. Or other things that that exacerbating factors that make things worse. They’re not a hazard in their own right, but they get recorded as hazards. And that problem results in people being unable to see the big picture of risk. So that undermines what we’re trying to do. And as I say, we get lots of things misidentified and thrown into the pot. This also distracts people. You end up putting effort into managing things that don’t make a difference to safety. They don’t need to be managed. Those are the quality problems.

And then there are quantity problems. And from personal experience, having too many hazards is a problem in itself.  I’ve worked on large programs where we were managing 250 hazards or thereabouts. That is challenging even with a sizable, dedicated team. That is a lot of work in trying to manage that number of hazards effectively. And there’s always the danger that it will slide into becoming a box-ticking exercise. Superficial at best.

I’ve also seen projects that have two and a half thousand hazards or even 4000 hazards in the hazard log. Now, once you get up to that level, that is completely unmanageable. People who have thousands of hazards in a hazard log and they think they’re managing safety are kidding themselves. They don’t understand what safety is if they think that’s going to work. So, you end up with all these items in your hazard log, which become a massive administrative burden. So people end up taking shortcuts and the real hazards are lost. The real issues that you want to focus on are lost in the sea of detail that nobody will ever understand. You won’t be able to control them.

Unfortunately, Mil. Standard 882 is good at generating these grotesque numbers of hazards. If you don’t know how to use the standard and don’t actively manage this issue, it gets to this stage. It can go and does go, badly wrong. This is particularly true on very big programs. And you really need clarity on big projects.

Summary of Module

Let’s summarize what we’ve done with this module. The aim was to help us understand whether we’re doing the right thing and whether we’ve done it right. And standards are terrific for helping us to do that. They help us to ensure we’re doing the right thing. That we’re looking at the right things. And they help us to ensure that we’re doing it rigorously and repeatedly. All the good quality things that we want. And Mil. Standard 882E that we’re looking at is a system safety engineering standard. So it’s designed to deal with complexity and high-performance and high-risk. And it’s got a great pedigree. It’s been around for a long time.

Now that gives advantages. So, we have a system safety program with this standard that helps us to deal with complexity. That can cope with big programs, with lots of risks. That’s great.

The disadvantages of this standard are that if we don’t know how to tailor or manage it properly, it can cost a lot of money. It can take a lot of time to give results which can cause problems for the program. And ultimately, you can accidentally ignore safety if you don’t deliver on time. And it can generate complexity. And it can generate a quantity of data that is so great that it actually undermines the quality of the data. It undermines what we’re trying to achieve. In that, we get a fragmented picture in which we can’t see the true risks. And so we can’t manage them effectively. If we get it wrong with this standard, we can get it really wrong. And that brings us to the end of this module.

This is Module 3 of SSRAP

This is Module 3 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.

Categories
Mil-Std-882E Safety Analysis

System Safety Risk Assessment

Learn about System Safety Risk Assessment with The Safety Artisan.

In this module, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

You Will Learn to:

  • Explain what a system safety approach is and does; and
  • Define what a risk analysis program is; 
System Safety Risk Analysis.

Topics: System Safety Risk Assessment

Aim: How do we deal with real-world complexity?

  • What is System Safety?
  • The Need for Process;
  • A Realistic, Useful, Powerful process:
    • Context, Communication & Consultation; and
    • Monitoring & Review, Risk Treatment.
  • Required Risk Reduction.

Transcript: System Safety Risk Assessment

Click here for the Transcript on System Safety Risk Assessment

In this module, on System Safety Risk Assessment, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

What is System Safety?

To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context. This wider context is operational effectiveness – We want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.

And what are we developing? We’re developing a system. An organization of hardware, (or software) material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work with the intention to produce specified results.

We’ve got three things there. We’ve got a system. We’ve got the operating environment within which it works- or designed to work. And we have the thing that it’s supposed to produce; its function or its application. Why did we buy it, or make, it in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?

That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.

We Need A Process

We’ve sorted our context. How is all this going to happen? We need a process. In the standard that we’re going to look at in the next module, we have an eight-element process. As you can see there, we start with documenting our approach. Then we identify and document hazards. We document everything according to the standard so forget that.

We assess risk. We plan how we’re going to mitigate the risk. We identify risk mitigation measures or controls as there are often known. Then we apply those controls to reduce risk. We verify and confirm that the risk reduction that we have achieved, or that we believe we will achieve. And then we got to get somebody to accept that risk. In other words, to say that it is an acceptable level of risk. That we can put up with this level of risk in exchange for the benefits that the system is going to give us. Finally, we need to manage risk through the entire lifecycle of the system until we finally get rid of it.

The key point about this is whatever process we follow, we need to approach it with rigor. We stick to a systematic process. We take a structured and rigorous approach to looking at our system.

And as you can see there from the arrows, every step in the eight-element sequence flows into the next step. Each step supports and enables the following steps. We document the results as we go. However, even this example is a little bit too simple.

A More Realistic Process

So, let’s get a more realistic process. What we’ve got here are the same things we’ve had before. We’ve established the context at the beginning. Next, there’s risk assessment. Risk assessment consists of risk identification, risk analysis, and risk evaluation. It asks ‘Where are we?’ in relation to a yardstick or framework that categorizes risk. The category determines whether a risk is acceptable or not.

After determining whether the risk is acceptable or not, we may need to apply some risk treatment. Risk Treatment will reduce the risk further. By then we should have the risk down to an acceptable level.

So, that’s the straight-through process, once through. In the real world, we may have to go around this path several times. Having treated the risk over a period of time, we need to monitor and review it. We need to make sure that the risk turns out, in reality, to be what we estimated it to be. Or at least no worse. If it turns out to be better- Well, that’s great!

And on that monitoring and review cycle, maybe we even need to go back because the context has changed. These changes could include using the system to do something it was not designed to do. Or modifying the system to operate in a wider variety of environments. Whatever it might be, the context has changed. So, we need to look again at the risk assessment and go round that loop again.

And while we’re doing all that, we need to communicate with other people. These other people include end-users, stakeholders, other people who have safety responsibilities. We need to communicate with the people who we have to work with. And we have to consult people. We may have to consult workers. We may have to consult the public, people that we put at risk, other duty holders who hold a duty to manage risk. That’s our cycle. That’s more realistic. In my experience as a safety engineer, this is much more realistic. A once-through process often doesn’t cut it.

Required Risk Reduction

We’re doing all this to drive risk down to an acceptable level. Well, what do we mean by that? Well, there are several different ways that we can do this, and I’ve got to illustrate it here. On the left-hand side of the slide, we have what’s usually known as the ALARP triangle. It’s this thing that looks a bit like a carrot where the width of the triangle indicates the amount of risk. So, at the top of the triangle, we’ve got lots of risks. And if you’re in the UK or Australia where I live, this is the way it’s done. So there will be some level of risk that is intolerable. Then if the risk isn’t intolerable, we can only tolerate it or accept it if it is ALARP or SFARP. And ALARP means that we’ve reduced the risk as low as reasonably practicable. And SFARP means so far as is reasonably practicable. Essentially, they’re the same thing – reasonably practical.

We must ensure that we have applied all reasonably practicable risk reduction measures. And once we’ve done so, if we’re in this tolerable or acceptable region, then we can live with the risk. The law allows us to do that.

That’s how it’s done in the UK and Australia. But in other jurisdictions, like the USA, you might need to use a different approach. A risk matrix approach as we can see on the right-hand side of this slide. This particular risk matrix is from the standard we’re about to look at. And we could take that and say, ‘We’ve determined what the risk is. There is no absolute limit on how much risk we can accept. But the higher the risk, the more senior level of sign-off from management we need’. In effect, you are prioritizing the risk. So you only bring the worst risks to the attention of senior management. You are asking  ‘Will you accept this? Or are you prepared to spend the money? Or will you restrict the operational system to reduce the risk?’. This is good because it makes people with authority consider risks. They are responsible and need to make meaningful decisions.

In short, different approaches are legal in different jurisdictions.

Summary of Module

In Module Two, we’ve asked ourselves, ‘How can we deal with real-world complexity?’. And one way that’s developed to do that is System Safety. System Safety is where we take a systematic approach to safety. This approach applies to both the system itself – the product – and the process of System Safety.

We address product and process. We need that rigorous process to give us confidence that what we’ve done is good enough. We have a realistic, useful and powerful process that enables us to put things in context. It helps us to communicate with everyone we need to, to consult with those that we have a duty to consult with. And also, we put around the basic risk process, this monitoring and review. And of course, we analyze risk to reduce it to acceptable levels. So we’ve got to treat the risk or reduce it or control it in some way to get it to those acceptable levels. In the end, it’s all about getting that required risk reduction to work. That reduction makes the risk acceptable to expose human beings to, for the benefit that it will give us.

This is Module 2 of SSRAP

This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.

Categories
Blog Safety Analysis

SSRAP Module 1 – Hazard and Risk Basics

Learn Hazard and Risk basics with The Safety Artisan.

So what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it?

In this free session, I explain the basic terms and show how they link together, and how we can break them down to perform risk analysis. I understand risk and that allows me to explain it in simple terms. I’ve used all my 20+ years in the business to help me unpack the jargon and focus on what’s really important.  

You Will Learn to:

  • Describe fundamental risk concepts.
Recap: Risk Basics

Topics: Hazard and Risk Basics

  • Risk & Mishap;
  • Probability & Severity;
  • Hazard & Causal Factor;
  • Mishap (accident) sequence; and
  • Hazards: Tests & Example

Transcript: Hazard and Risk Basics

Click here for the Transcript on Risk Basics

Let’s get started with Module One. We’re going to recap on some Risk basics to make sure that we have a common understanding of risk. And that’s important because risk analysis is something that we do every day. Every time you cross the road. Every time you buy something expensive. Every time you decide whether you’re going to travel to something, or look it up online, instead. You’re making risk analysis decisions all the time without even realizing it. But we need something a little bit more formal than the instinctive thinking of our risk that we do all the time. And to help us do that, we need a couple of definitions to get us started.

What is Risk?

First of all, what is Risk? It’s a combination of two things. First, the severity of a mishap or accident. Second, the probability that that mishap will occur. So it’s a combination of severity and probability. We will see that illustrated in the next slide.

We’ll begin by talking about ‘mishap’. Well, what is a mishap? A mishap is an event – or a series of events -resulting in unintentional harm. This harm could be death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.

The particular standard we’re looking at today is covering a range of different harms. That’s why we’re focused on safety. And the term ‘mishap’ will also include negative environmental impacts from planned events. So, even if the cause is a deliberate event, we will include that as a mishap.

Probability and Severity

I said that the definition of risk was a combination of probability and severity. Here we got a little illustration of that.

Probability is; how likely is this thing to go wrong? How likely is this thing to happen?

And severity is; How significant is this event? This can vary in seriousness. From death to injury, illness, property damage or equipment loss, damage to the environment, or monetary loss.

And to be honest, we can apply or define risk any way we want. It doesn’t have to be a Safety Risk. We could be thinking about Financial Risk, Reputational Risk, whatever it might be. But what you see there with the little matrix is we measure risk. And whether we say the risk is high, medium or low, or whatever scheme we use. A combination of high severity and high likelihood is going to result in high risk. At the opposite end of the scale, the low probability that a low impact event is going to happen, we would call a low risk.

That’s what we mean by this combination of probability and severity. We put them together and we can measure risk in, to be honest, whatever way we choose to do so. This is a very simple example.

Safety Risks: Hazards

In safety, we have another concept. One that gives us a much finer degree of control over how we’re thinking about risk. We have this concept of hazards. As it says, a hazard is a real or potential condition that could lead to a mishap. It’s not the mishap, it’s a sort of an intermediate stage, as we will see. And the mishap can result in death, injury, property damage, damage to the environment.

Then there’s also this thing called causal factors, or causes. It might be one or several mechanisms that could trigger the hazard. Or they could lead to the hazard, which in turn can lead to a mishap. So the causal factor or the cause can trigger the hazard and then the hazard can lead to the mishap.

(Mishap) Accident Sequence

Here we have an illustration of an accident sequence or a mishap sequence if you prefer. Let’s not get hung up on terminology. So, we may have many causal factors on the left-hand side of this bow tie diagram. Any one of these factors may lead to a particular hazard. A single hazard we’re looking at here. And then that hazard may lead to a range of different consequences.

Not all these consequences are going to be bad. Not all the consequences are going to result in a mishap. There may be lots of consequences where there is no mishap, no accident, no harm whatsoever. There’s going to be a range of possible consequences. What I would like to take away from this diagram is one thought. That thought is ‘Yes, we can have causes leading to a hazard’ – this sort of pinch point in the middle. And from that hazard and number of consequences can arise.

Now that thought is important. It’s a very powerful concept because it helps us to reason about accident sequences. Also, it helps us to do some much more sophisticated work that would otherwise be possible.

Tests for a Hazard

There are three tests, that I know of, for a hazard. The first two are saying the same thing in different ways. We can think of a hazard as being both necessary and enough for harm to occur. We need the hazard to be present before harm can occur, but the hazard is enough for harm to occur. In other words, once the hazard is present, nothing else unusual needs to happen for harm to occur. Once the hazard is there, nothing else needs to go wrong for somebody to get hurt. Normal events can lead to a mishap once the hazard is present. Another helpful way of thinking about it is ‘hazard is an accident waiting to happen’.

Then the third on this list, we can think of a hazard at the point at which we lose control of something. It might be an energy source that we lose control of. It might be something toxic. It might be a physical piece of equipment that we’ve lost control of or a vehicle. It might be a substance. Whatever it might be, we’ve lost control and now somebody could get hurt.

So, those are some tests for a hazard and some different ways of thinking about hazards.

Example of a Hazard

But I always think it’s helpful to have an example. Let’s imagine we’ve got a causal factor. We’ve got some oil that is leaking from its container.

And we can imagine the hazard. The oil has got onto a walkway. Or pavement or a sidewalk or whatever you want to call it. It gets on to an area that human beings would walk on, as the name implies. It’s normal. So once the oil is on the walkway, nothing else unusual needs to happen for there to be an accident. But it doesn’t make the accident inevitable. Because if nobody comes along, there can be no accident. If somebody comes along, but they see the oil and they step over it and avoid it. Or even better, they warn other people about it and tear it up – but that’s another story. But the accident, the mishap is not inevitable.

One of the combinations that is possible is that we get a mishap. A person comes along, doesn’t see the oil, steps on it, slips, and hurts themselves. All these things have to happen in a sequence in this accident sequence for the mishap to occur. For people to get hurt. So there we have a little summary of those risk concepts that we need to get a hold of.

Summary of Module

We’ve covered risk and mishap, probability and severity, hazards, and causal factors. We’ve looked at the mishap or accident sequence, looked at hazards, and at some tests for what makes up a hazard. Including how we tell where the hazard is in the sequence? Where is it between cause, hazard, and consequence, the hazard is? We looked to an example of this in the module.

From this module, we have a common understanding of risk. This will form the foundation for everything that we’re going to do with risk from now on.

This is Module 1 of SSRAP

This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application. You can access the full course here.

You can find more introductory lessons at Start Here.

Categories
Safety Analysis

Risk Analysis Programs

Risk Analysis Programs – Design a System Safety Program for any system in any application.

Introduction to the System Safety Risk Analysis Programs Course.

Risk Analysis Programs: Learning Objectives

At the end of this course, you will be able to:

  • Describe fundamental risk concepts;
  • Explain what a system safety approach is and does;
  • Define what a risk analysis program is;
  • List the hazard analysis tasks that make up a program;
  • Select tasks to meet your needs;
  • Design a tailored analysis program for any application; and
  • Know how to get more information and resources.
Risk Analysis Programs: Click Here for the Transcript

Hello and welcome to this course on Systems Safety Risk Analysis Programs. I’m Simon Di Nucci, The Safety Artisan, and I’ve been a safety engineer and consultant for over 20 years. And I have worked on a wide range of safety programs doing risk analysis on all kinds of things. Ships, planes, trains, air traffic management systems, software systems, you name it. I’ve worked in the U.K., in Australia, and on many systems from the US. I have also spent hundreds of hours training hundreds of people on safety. And now I’ve got the opportunity to share some of that knowledge with you online.

So, what are the benefits of this course?

First of all, you will learn about basic concepts. About system safety, what it is and what it does. You will know how to apply a risk analysis program to a very complex system and how to manage that complexity. So, that’s what you’ll know.

At the end of the course, you will also be able to do things that you might not have been able to do before. You will be able to take the elements of a risk analysis program and the different tasks. You’ll be able to select the right tasks and form a program to suit your application, whatever it might be. Whether you might have a full, high-risk bespoke development system, or you’re taking a commercial system off the shelf and doing something new with it. You might be taking a product and using it in a new application or a new location. Whatever it might be, you will learn how to tailor your risk analysis program. This program will give you the analyses you need. And to meet your legal and regulatory requirements. Once you’ve learned how to do this, you can apply it to almost any system.

Finally, you will feel confident doing this. I will be interpreting the terminology used in the tasks and applying my experience. So, instead of reading the standard and being unsure of your interpretation, you can be sure of what you need to do. Also, I will show you how you can get good results and avoid some of the pitfalls.

So, these are the three benefits of the program.

1) You will know what to do.

2) You will be able to perform risk program tasks, and …

3) You’ll feel confident doing those tasks.

At the end of the course, I will also show you where to find further resources. There are free resources to choose from. But there are also paid resources for those who want to take their studies to the next level. I hope you enjoy the course.

Get the supporting safety analysis courses here.