
Risk Management 101

Welcome to Risk Management 101, where we’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts and then we’re going to build it up again and show you how it’s done. I’ve been involved in risk management, in project risk management, safety risk management, etc., for a long, long time.  I hope that I can put my experience to good use, helping you in whatever you want to do with this information.

Maybe you’re getting an interview. Maybe you want to learn some basics and decide whether you want to know more about risk management or not.  Whatever it might be, I think you’ll find this short session really useful. I hope you enjoy it and thanks for watching.

You can get the RM101 Course as part of the FREE Triple Learning Bundle.

Risk Management 101, Topics

  • Hazard Identification;
  • Hazard Analysis;
  • Risk Estimation;
  • Risk [and ALARP] Evaluation;
  • Risk Reduction; and
  • Risk Acceptance.

Risk Management 101, Transcript

Introduction

Hi everyone and welcome to Risk Management 101. We’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts. Then we’re going to build it up again and show you how it’s done.

My name is Simon Di Nucci and I have a lot of experience working in risk management, project risk management, safety risk management, etc.  I’m hoping that I can put my experience to good use, helping you in whatever you want to do with this information. Whether you’re going for an interview or you want to learn some basics. You can watch this video and decide if you want to know more about risk management or if you don’t need to.  Whatever it might be, you’ll find this short session useful. I hope you enjoy it and thanks for watching.

Topics For This Session

Risk Management 101. So what does it all mean? We’re going to break risk management down into six constituent parts. I’m using a particular standard that breaks it down this way. Other standards will do this in different ways. We’ll talk about that later. Here we’ve got risk management broken down into: hazard identification, hazard analysis, risk estimation, risk evaluation (and ALARP), risk reduction, and risk acceptance.

Risk Management

Let’s get right on to that. Risk management – what is it? It’s defined as “the systematic application of management policies, procedures, and practices to the tasks of hazard identification, hazard analysis, risk estimation, risk and ALARP evaluation, risk reduction, and risk acceptance”.

There are a couple of things to note here. We’re talking about management policies, procedures, and practices. The ‘how’ we do it. Whether it’s a high-level policy or low-level common practice. E.g. how things are done in our organization vs how the day-to-day tasks are done? And it’s also worth saying that when we talk about ‘hazards’, that’s a safety ‘ism’. If we were doing security risk management, we could be talking about ‘threats’. We can also be talking about ‘causes’ in day-to-day language. So, we can be talking about something causing a risk or leading to a risk. More on that later, but that’s an overview of what risk management is.

Part 1

Let’s look at it in a different way. For those of you who like a visual representation, here is a graph of the hierarchical breakdown. The six parts need to happen in order, more or less, left to right. And as you can see, there’s a link between risk evaluation and risk reduction. We’ll come on to that. So, these aren’t alternatives; it’s a series of steps, and you have to do them all. Sometimes they’re linked together more intimately.

Hazard Identification

First of all, hazard identification. So, this is the process where we identify and list hazards and accidents associated with the system. You may notice that some words here are in bold. Where a word is in bold, we are going to give the definition of what it is later.

These hazards could lead to an accident but are only associated with the system. That’s the scope. If we were talking about a system that was an airplane, a ship, or a computer, we would have a very different scope. There would also be a different way that maybe accidents would happen.

On a more practical level, how do we do hazard identification? I’m not going to go into any depth here, but there are certain classic methods. We can consult with our workers and inspect the workplace where they’re operating. In some countries, that’s a legal requirement (including in Australia, where I live). Another option is looking at historical data. And indeed, in some countries and in some industries, that’s a requirement. A requirement means we have to do that. And we can use special analysis techniques. Now, I’m not going to talk about any of those analysis techniques today. You can watch some other sessions on The Safety Artisan to see that.

Hazard Analysis

Having done hazard identification, we’ve asked ourselves ‘What could go wrong?’. We can put some more detail on and ask, ‘How could it go wrong? And how often?’. That kind of stuff. So, we want to go into more detail about the hazards and accidents associated with this particular system. And that will help us to define some accident sequences. We can start with something that creates a hazard and then the hazard may lead to an accident. And that’s what we’re talking about. Later, we will show that using graphics can be helpful.

But again, more on terminology. In different industries, we call it different things. We tend to say ‘accident’ in the UK and Australia. In the U.S., they might call it a ‘mishap’, which is trying to get away from the idea that something was accidental. Nobody meant it to happen. Mishap is a more generic term that avoids that implication. We also talk about ‘losses’ or we talk about ‘breaches’ in the security world. We have some issues where somebody has been able to get in somewhere that they should not. And we can talk about accident sequences. Or, in a more common language, we call it a sequence of events. That’s all it is.

Risk Estimation

Now we’re talking about the risk estimation. We’ve thought about our hazards and accidents and how they might progress from one to another. Let’s think about, ‘How big is the risk of this actually happening?’. Again, we’ll unpack this further later at the next level. But for now, we’re going to talk about the systematic use of available information. Systematic- so, ordered. We’re following a process. This isn’t somebody on their own taking a subjective view ‘Look, I think it’s not that’. It’s a process that is repeatable. We want to do something systematic. It’s thorough, it’s repeatable, and so it’s defendable. We can justify the conclusions that we’ve come to because we’ve done it with some rigour. We’ve done it in a systematic way. That’s important. Particularly if we’re talking about harm coming to people or big losses.

Risk and ALARP / SFARP Evaluation

Now, risk evaluation is just taking the risk we have just estimated, comparing it to something, and asking, “How serious is this risk?”. Is it something that is very low? If it’s insignificant then we’re not bothered about it. We can live with it. We can accept it. Or is it bigger than that? Do we need to do something more about it? Again, we want to be systematic. We want to determine whether risk reduction is necessary. Is this acceptable as it is or is it too high and we need to reduce it? That’s the core of risk evaluation.

Tolerability

In this UK-based standard, we’re using terminology that is found in different forms around the world. But in the UK, they talk about ‘tolerability’. We’re talking about the absolute level of risk. There probably is an upper limit that’s allowed in the law or in our industry. And there’s a lower limit that we’re aiming for. In an ideal world, we’d like all our risks to be low-level risks. That would be terrific.

So, that’s ‘tolerability’. And you might hear it called different things. Within the UK system, there are three classes of ‘tolerability’ of risk. A risk could be ‘broadly acceptable’: it’s very low, down in the target region where we’d like to get all our risks. It could be ‘tolerable’: we can expose people to this risk or we can live with this risk, but only if we’ve met certain other criteria. And then there’s the risk that is so big, so far up there, that we can’t have it under any circumstances. It’s ‘unacceptable’. You can imagine a traffic light system where we have categorized our risk.
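As a rough illustration only, the three tolerability classes and the traffic light idea could be sketched in Python like this. The names `classify`, `LOWER`, and `UPPER`, the numeric limits, and the choice to treat a risk exactly on a limit as the lower class are all assumptions made for this sketch. They do not come from the standard, and real limits depend on your law and your industry.

```python
from enum import Enum


class Tolerability(Enum):
    """The three classes, tagged with traffic light colours."""
    BROADLY_ACCEPTABLE = "green"
    TOLERABLE = "amber"
    UNACCEPTABLE = "red"


def classify(risk: float, lower_limit: float, upper_limit: float) -> Tolerability:
    """Place an estimated risk into one of the three tolerability classes.

    Assumption for this sketch: a risk exactly on a limit falls into
    the lower of the two classes.
    """
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if risk > upper_limit:
        return Tolerability.UNACCEPTABLE
    if risk <= lower_limit:
        return Tolerability.BROADLY_ACCEPTABLE
    return Tolerability.TOLERABLE


# Hypothetical limits, expressed as a chance of harm per year.
# They are illustrative only and are not taken from any regulation.
LOWER = 1e-6
UPPER = 1e-3

low = classify(1e-7, LOWER, UPPER)      # below the lower limit
middle = classify(1e-4, LOWER, UPPER)   # between the two limits
high = classify(1e-2, LOWER, UPPER)     # above the upper limit
```

Note that landing in the ‘tolerable’ class is not the end of the story. As described above, that class only holds if certain other criteria have also been met.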

ALARP / SFARP

And then there’s the test of whether our risk can be accepted. In the UK, it’s called ALARP: we reduce the risk As Low As Reasonably Practicable. In other places, you’ll see SFARP: we’ve eliminated or minimized the risk So Far As Is Reasonably Practicable. In the nuclear industry, they talk about ALARA: As Low As Reasonably Achievable. And then different laws use different tests. Whichever one you use, there’s a test where we have to ask, “Can we accept the risk?” “Have we done enough risk reduction?”. And whatever you’ve put in those square brackets, that’s the test that you’re using. And that will vary from jurisdiction to jurisdiction. The basic concept of risk evaluation is estimating the level of risk and then comparing it to some standard or some regulation. Whatever it might be, that’s what we do. That’s risk evaluation.

Risk Reduction

We’ve asked, “Do we need to reduce risk further?”. And if we do, we need to do some risk reduction. Again, we’re being systematic. This is not some subjective thing where we go “I have done some stuff, it’ll be alright. That’s enough.”. We’re being a bit more rigorous than that. We’ve got a systematic process for reducing risk. And in many parts of the world, we’re directed to do things in a certain way.

Elimination

This is an illustration from an Australian regulation. In this regulation, we’re aiming to eliminate risk. We want to start with the most effective risk reduction measures. Elimination is “We’ve reduced the risk to zero”. That would be lovely if we could do that but we can’t always do that.

Substitution

What’s the next level? We could get rid of this risk by substituting something less risky. Imagine we’ve got a combustion engine powering something. The combustion engine needs flammable fuel and it produces toxic fumes. It could release carbon monoxide and CO2 and other things that we don’t want. We ask, “Can we get rid of that?”. Could we have an electric motor and have a battery instead? That might be a lot safer than the combustion engine. That is a substitution. There are still risks with electricity. But by doing this we’ve substituted something risky for something less risky.

Isolation

Or we could isolate the hazard. Let’s use the combustion engine as an example again. We can say, “I’ll put the engine, the fuel, and the exhaust somewhere a long way from people. Then it’ll be a long way from where it can do harm or cause a loss.” And that’s another way of dealing with it.

Engineering Controls

Or we could say, “I’m going to reduce the risks through engineering controls”. We could put in something engineered. For example, we can put in a smoke detector. A very simple, therefore highly reliable, device. It’s certainly more reliable than a human. You can install one that can detect some noxious gases. It’s also good if it’s a carbon monoxide detector. Humans cannot detect carbon monoxide at all. (Except if you’ve got carbon monoxide poisoning, you’ll know about it. Carbon monoxide poisoning gives you terrible headaches and other symptoms.) But of course, that’s not a good way to detect that you’re breathing in poisonous gas. We do not want to do it that way.

So, we can have an engineering control to protect people. Or we can use an interlock. We can isolate things in a building or behind a wall or whatever. And if somebody opens the door, then that forces the thing to cut out so it’s no longer dangerous. There are different things for engineering controls that we can introduce. They do not rely on people. They work regardless of what any person does.

Administrative / Procedural Controls

Next on the list, we could reduce exposure to the hazard by using administrative controls. That’s giving somebody some rules or a procedure to follow. “Do this. Don’t do that.” Now, that’s all good. We can give people warning signs and warn people not to approach something. But, of course, sometimes people break the rules for good reasons. Maybe they don’t understand. Or, maybe they don’t know the danger. Perhaps they’ve got to do something or maybe the procedure that we’ve given them doesn’t work very well. It’s too difficult to get the job done, so people cut corners. So, procedural protection can be weak. And a bit hit-and-miss sometimes.

Personal Protective Equipment

Finally, we can give people personal protective equipment. We can give them some eye protection. I’m wearing glasses because I’m short-sighted. But you can get some goggles to protect your eyes from damage. Damage like splashes, flying fragments, sparks, etc. We can have a hard hat so that if we’re on a building site and something drops from above on us that protects the old brain box.

It won’t stop the accident from happening, but it will help reduce the severity of the accident. That’s the least effective. We’re doing nothing to prevent the accident from happening. We’re reducing the severity in certain circumstances. For example, if you drop a ton of bricks on me, it doesn’t matter whether I’m wearing a hard hat or not. I’m still going to get crushed. But with one brick, I should be able to survive that if I’m wearing a hard hat.
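The hierarchy of controls described above, from elimination down to personal protective equipment, can be sketched as an ordered list. This is a minimal sketch, not a definitive implementation. The names `Control` and `rank_controls` and the example proposals are hypothetical. The strict ranking of substitution, isolation, and engineering controls simply follows the order in which they were presented here, and that is an assumption of the sketch.

```python
from enum import IntEnum


class Control(IntEnum):
    """Control types, ordered from most effective (1) to least effective (6).

    Assumption: the middle three are ranked in the order they were
    presented in this session.
    """
    ELIMINATION = 1
    SUBSTITUTION = 2
    ISOLATION = 3
    ENGINEERING = 4
    ADMINISTRATIVE = 5
    PPE = 6


def rank_controls(options: dict[str, Control]) -> list[str]:
    """Return the proposed measures, most effective control type first."""
    return sorted(options, key=lambda name: options[name])


# Hypothetical proposals for the combustion engine example.
proposals = {
    "hard hats for site staff": Control.PPE,
    "warning signs": Control.ADMINISTRATIVE,
    "carbon monoxide detector": Control.ENGINEERING,
    "replace engine with electric motor": Control.SUBSTITUTION,
}
ranked = rank_controls(proposals)
```

The point of ranking the proposals is that we consider the most effective type of control first, and only fall back on the weaker ones when the stronger ones are not available.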

Risk Acceptance

Let’s move on to risk acceptance. At some stage, we have reduced the risk to a point where we can accept it. That is, we can live with it, and we’ve decided that we need to do whatever it is that is exposing us to the risk. We need to use the system. For example, we want to get in our car to enable us to go from A to B quickly and independently. So, we’re going to accept the risk of driving in our car. We’ve decided we’re going to do that. We make risk-acceptance decisions every day, often without thinking about it. We get in a car every day on average and we don’t worry about the risk, but it’s always there. We’ve just decided to accept it.

But in this example, it’s not an individual deciding to do something on the spur of the moment. Nor is it based on personal experience. We’ve got a systematic process where a bunch of people come together. The relevant stakeholders agree that a risk has been assessed or has been estimated and has been evaluated. They agree that the risk reduction is good enough and that we will accept that risk. There’s a bit more to it than you and I saying “That’ll be alright.”

Part 2

Let’s summarise where we’ve got to. We’ve talked about these six components of risk management. That’s terrific. And as you can see, they all go together. Risk evaluation and risk reduction are more tightly coupled. That’s because when we do some risk reduction, we then re-evaluate the risk. We ask ‘Can we accept it?’. If the answer is ‘No.’ we need to do some more work. Then we do some more risk reduction. So those tend to be a bit more coupled together at the end. That’s the level we’ve got to. We’re now going to go to the next level.
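The coupling between risk reduction and risk evaluation described above is a loop: reduce, re-evaluate, and repeat if the answer is still ‘No’. Here is a minimal sketch of that loop. The function `reduce_until_acceptable`, the helper `halve`, and the numbers are hypothetical, and treating each measure as a simple function of a single risk number is a big simplification made for illustration.

```python
from typing import Callable, Iterable


def reduce_until_acceptable(
    risk: float,
    measures: Iterable[Callable[[float], float]],
    is_acceptable: Callable[[float], bool],
) -> tuple[float, int, bool]:
    """Apply measures one at a time, re-evaluating the risk after each.

    Returns the final risk, the number of measures applied, and whether
    the risk was judged acceptable.
    """
    applied = 0
    if is_acceptable(risk):
        return risk, applied, True
    for measure in measures:
        risk = measure(risk)      # risk reduction
        applied += 1
        if is_acceptable(risk):   # risk evaluation: can we accept it?
            return risk, applied, True
    # We ran out of measures and the risk is still not acceptable.
    return risk, applied, False


def halve(risk: float) -> float:
    """Hypothetical measure that halves the risk."""
    return risk * 0.5


result = reduce_until_acceptable(
    risk=0.5,
    measures=[halve, halve, halve],
    is_acceptable=lambda r: r <= 0.2,
)
```

In this example the loop stops after the second measure, because the risk is then below the hypothetical acceptance level and the third measure is not needed.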

So, we’re going to explain these things. We’ve talked about hazard identification and hazard analysis, but what is a hazard? And what is an accident? And what is an accident sequence? We’re going to unpack that a bit more. We’re going to take it to the next level. And throughout this, we’re talking about risk over and over again. Well, what is ‘risk’? We’re going to unpack that to the next level as well.

This is a safety standard. We’re talking about harm to people. How likely is that harm and how severe might it be? But it might be something else. It might be a loss or a security breach. Or a financial loss, a negative result for our project. We might find ourselves running late. Or we’re running over budget. We might be failing to meet quality requirements. Or we’re failing to deliver the full functionality that we said we would. Whatever it might be.

Hazard

So, let’s unpack this at the next level. A hazard is a term that we use, particularly in safety. As I say, we call it other things in different realms. But in the safety world, it’s a physical situation or it’s a state of a system.

As it says, it often follows from some initiating event that we may call a ‘cause’. The hazard may lead to an accident. However, the key thing to remember is once a hazard exists, an accident is possible, but it’s not certain. You can imagine the sort of cartoon banana skin on the pavement gag. Well, the banana skin is the hazard. In the cartoon, the cartoon character always steps on the banana skin. They always fall over for comic effect. But in the real world, nobody may tread on the banana skin and slip over. There could be nobody there to slip on the banana skin. Or even if somebody does, they could catch themselves. Or they fall, but it’s on a soft surface and they don’t hurt themselves so there’s no harm.

So, the accident isn’t certain. And in fact, we can have what we call ‘non-accident’ outcomes. We can have harmless consequences. A hazard is an important midway step. I heard it called an accident waiting to happen, which is a helpful definition. An accident waiting to happen, but it doesn’t mean that the accident is inevitable.

Accident

But accidents can happen. Again, the ‘accident’, ‘mishap’, or ‘unintended event’. Something we did not want or a sequence of events that caused harm. And in this case, we’re talking about harm to people. And as I say, it might be a security breach. It might be a financial loss or reputational damage. Something might happen that is very embarrassing for an organization or an individual. Or again, we could have a hiccup with our project.

Harm

But in this case, we’re talking about harm. With this kind of standard, we’re using what you might call a body count approach to the harm. We’re talking about actual death, physical injury, or damage to the health of people.

This standard also considers the damage to property and the environment. Now, very often we are legally required to protect people and the environment from harm. Property less so. However, there will be financial implications of losses of property or damage to the systems. We don’t want that. But it’s not always criminally illegal to do that. Whereas usually, hurting people and damaging the environment is. So, this is ‘harm’. We do not want this thing to happen. We do not want this impact.

Safety is a much tougher business in this instance. If we have a problem with our project, it’s embarrassing but we could recover it. It’s more difficult to do that when we hurt somebody.

Risk

And always in these terms, we’re talking about ‘risk’. What is ‘risk’? Risk is a combination of two things. It’s a combination of the likelihood of harm or loss and the severity of that harm or loss. It’s those two things together. And we’ve got a very simple illustration here, a little table. And they’re often known as a risk matrix but don’t worry about that too much. Whatever you want to call it. We’ve got a little two by two table here and we’ve got likelihood in the white text and severity in the black.

Low Risk

We can imagine where there’s a risk where we have a low likelihood of a ‘low harm’ or a ‘low impact’ accident or outcome. We say, ‘That’s unlikely to happen, and even if it does not much is going to happen.’ It’s going to be a very small impact. So, we’d say that that’s a low risk.

Then at the other end of the spectrum, we can imagine something that has a high likelihood of happening, and that also has a high impact. These are things that we definitely do not want to happen. And we say, ‘That’s a high risk and that’s something that we are very, very concerned about.’

Medium Risk

And then in the middle, we could have a combination of an outcome that is quite likely, but it’s of low severity. Or it’s of high severity, but it’s unlikely to happen. And we say, ‘That’s a medium risk’.

Now, this is a very simplified matrix for teaching purposes only. In the real world, you will see matrices that are four by four, five by five, or even six by six, or combinations thereof. And in security, where they talk about threat and vulnerability and the outcomes, you might see multiple matrices used. They use multiple matrices to progressively build up a picture of the risk. They use matrices as building blocks. So, if you’ve got something more complex to model, you may use more than one matrix. But here we’ve got a nice, simple example. This illustrates what risk is. It’s a combination of severity and likelihood of harm or loss. And that’s what risk is, fundamentally. And if we have a firm grasp of these fundamentals, it’ll help us to reason and deal with almost anything, with enough application.
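The simple two by two matrix described above can be written down as a lookup table. This is a sketch for teaching purposes only, like the matrix itself. The names `Level`, `Risk`, `RISK_MATRIX`, and `assess` are hypothetical.

```python
from enum import Enum


class Level(Enum):
    """The two levels used for both likelihood and severity."""
    LOW = "low"
    HIGH = "high"


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Keys are (likelihood, severity) pairs, matching the two by two table.
RISK_MATRIX = {
    (Level.LOW, Level.LOW): Risk.LOW,       # unlikely, and not much happens
    (Level.LOW, Level.HIGH): Risk.MEDIUM,   # unlikely, but severe
    (Level.HIGH, Level.LOW): Risk.MEDIUM,   # quite likely, but low severity
    (Level.HIGH, Level.HIGH): Risk.HIGH,    # likely and severe
}


def assess(likelihood: Level, severity: Level) -> Risk:
    """Look up the risk for a likelihood and severity pair."""
    return RISK_MATRIX[(likelihood, severity)]


example = assess(Level.LOW, Level.HIGH)
```

A bigger matrix, say five by five, would work the same way with more levels and more entries in the table.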

Accident Sequence

Now, let’s move on and talk about accident sequences. We’re talking about a progression in this case. We’re imagining a left-to-right path. A progression of events that results in an accident. This diagram, which looks like a bow tie, is meant to represent the idea that we can have one hazard. There might be many causes that lead to this hazard. There might be many different things that could create the hazard or initiate the hazard. And the hazard may have many different consequences.

Consequences

As I’ve said before, nothing at all may happen. That might be the consequence of the hazard. Most of the time that’s what’s going to happen. But there may be a variety of consequences. Somebody might get a minor injury or there might be a more serious accident where one or more people are killed. A good example of this is fire. So, the hazard is the fire. The causes might be various. We could be dealing with flammable chemicals, or a lightning strike, or an electricity arc flash. Or we could be dealing with very high temperatures where things spontaneously burst into flames. Or we could have a chemical in the presence of pure oxygen. Some things will spontaneously burst into flames in the presence of pure oxygen. So there’re a variety of causes that lead to the fire.

An Example

And the fire might be very small and burn itself out. It causes very little damage and nobody gets hurt. Or it might lead to a much bigger fire that, in theory, could kill lots of people. So, there’s a huge range of consequences potentially from one hazard. But the accident sequence is how we would describe and capture this progression. From initiating events to the hazard to the possible consequences. And by modeling the accident sequence, of course, we can think about how we could interrupt it.
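The bow tie idea, with many causes leading to one hazard and one hazard leading to many consequences, can be captured in a small data structure. This is only a sketch under simple assumptions. The class name `BowTie` and its `sequences` method are hypothetical, and the entries are taken from the fire example above. A real bow tie analysis would also record the controls that interrupt each path.

```python
from dataclasses import dataclass, field


@dataclass
class BowTie:
    """One hazard in the middle, causes on the left, consequences on the right."""
    hazard: str
    causes: list[str] = field(default_factory=list)
    consequences: list[str] = field(default_factory=list)

    def sequences(self) -> list[tuple[str, str, str]]:
        """List every cause, hazard, consequence path through the bow tie."""
        return [
            (cause, self.hazard, outcome)
            for cause in self.causes
            for outcome in self.consequences
        ]


fire = BowTie(
    hazard="fire",
    causes=["flammable chemicals", "lightning strike", "electrical arc flash"],
    consequences=["burns itself out, no harm", "minor injury", "fatalities"],
)
paths = fire.sequences()
```

Three causes and three consequences give nine possible accident sequences through this one hazard. Listing them like this is a starting point for asking where each sequence could be interrupted.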

Part 3

We’ve broken risk management down into those six constituent parts. We’ve gone to the next level, in that we’ve gone down to the concepts that underpin these things: the hazards, the accidents, and the accident sequence. We’ve talked about risk itself and what we don’t want to happen. The harm, the loss, the financial loss, the embarrassment, the failed, late, or over-budget project, a security breach, the undesired event, etc. We had an objective which was to do something safely or to complete a project and the risk is that that won’t happen. That there’ll be an impact on what we were trying to do that is negative. That is undesirable.

There are just a few more concepts that we need to look at to complete the pattern, as you can see. We’ve been talking about the system. And we’ve been talking about doing things systematically. Then a system works in an operating environment. So, let’s unpack that.

System

First of all, we have a system. The system is going to be a combination of things. I wouldn’t call a pen or a pencil a system. It’s only got a couple of components. You could pull it apart. But it’s too simple to be worth calling it a system. We wouldn’t call it a pen system, would we? So, a system is something more complex. It’s a combination of things and we need to define the boundary. I’ll come back to that.

But within this boundary, we’ve got some different elements in the system that work together. Or they’re used together within a defined operating environment. So, we’re going to expose this system to a range of conditions in which it is designed to work. The intention is the system is going to do whatever it does to perform a given task. It can do one defined task or achieve a specific purpose.

I talked before about getting in our car. A car is complex enough to be called a system. We get in our car and we drive it on the roads. Or if we’ve got a four-wheel drive, we can drive off-road. Or we can use it in a more demanding operating environment to achieve a specific purpose. We want to transport ourselves, and sometimes some stuff, from A to B. That’s what we’re trying to do with the system.

Within the System

And within that system, we may have personnel/people, we may have procedures. A bunch of rules about how you drive a car legally in different countries. We’ve got materials and physical things – what the car is made of. We could have tools to repair it, and change wheels. We’ve got some other equipment, like a satnav. We’ve got facilities. We need to take a car somewhere to fill up with fuel or to recharge it. We’ve got services like garages, repairs, servicing, etc. And there could be some software in there as well. Of course, these days in the car, there’s software everywhere in most complex devices.

So, our system is a combination of lots of different things. These things are working together to achieve some kind of goal or some kind of result. There’s somewhere we want to get to. And it’s designed to work in a particular operating environment. Cars work on roads really well. Off-road cars can work on tracks. Put them in deep water, they tend not to work so well. So, let’s talk about that operating environment.

Operating Environment

What we’ve got here is the total set of all external, natural, and induced conditions. (That’s external to the system, so outside the boundary.) These conditions might be natural or they might be generated by something else, and they are what the system is exposed to at any given moment. We need to get a good understanding of the system, the operating environment, and what we want it to do.

If we have a good understanding of those three things, then we will be well on the way to being able to understand the risks associated with that system. That’s one of the key things with risk management. If you’ve got those three things, that’s crucial. You will not be able to do effective risk management if you don’t have a grasp of those things. And if you do have a thorough grasp of those things, it’s going to help you do effective risk management.

Conclusion

So, we’ve talked about risk management. We’ve broken it down into some big sections. Those six sections: hazard identification, hazard analysis, risk estimation, risk evaluation, risk reduction, and risk acceptance. We’ve seen how those things depend on only a few concepts. We’ve got the concepts of ‘hazards’, ‘risks’, and ‘accidents’. As well as the undesirable consequences that the risk might result in. The risk is measured based on the likelihood and severity of that harm or loss occurring.

When we’re dealing with a more complex system, we need to understand that system and the environment in which it operates. Of course, we’ve put it in that environment for a purpose. And that unpacking has allowed us to break down quite a big concept, risk management. A lot of people, like myself, spend years and years learning how to do this. It takes time to gain experience because it’s a complex thing. But if we break it down, we can understand what we’re doing. We can work our way down to the fundamentals. And then if we’ve got a good grasp of the fundamentals, that supports getting the more complex stuff right. So, that’s what risk management is all about. That’s your risk management 101 and I hope that you find that helpful.

Copyright Statement

I just need to say briefly that I have used quotations from the standard. I can do that under a Creative Commons license, the CC4.0. That allows me to do that within limits that I am careful to observe. But this video presentation is copyrighted by the Safety Artisan.

For More…

And you can see more like these at the Safety Artisan website. That’s www.safetyartisan.com. And as you can see, it’s a secure site so you can visit without fear of a security breach. So, do head over there. Subscribe to the monthly newsletter to get discounts on paid videos and regular updates of what’s coming up, both paid and free.

So, it just remains for me to say thanks very much for watching and I look forward to catching up with you again very soon.

End of Risk Management 101

You can get the RM101 Course as part of the FREE Triple Learning Bundle. For more introductory sessions on this site start here.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

• Worked on aircraft, ships, submarines, ATMS, trains, and software;

• Worked on programs from tiny to some of the biggest (Eurofighter, Future Submarine);

• Worked in the UK and Australia, on US and European programs;

• Taught safety to hundreds of people in the classroom, and thousands online;

• Presented on safety topics at several international conferences.

System Safety Risk Analysis

In this module, System Safety Risk Analysis, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.

This post is part of a series.

Aim: How do we deal with real-world complexity?

  • What is System Safety?
  • The Need for Process;
  • A Realistic, Useful, Powerful process;
  • Context, Communication & Consultation;
  • Monitoring & Review, Risk Treatment; and
  • Required Risk Reduction.

Transcript: System Safety Risk Analysis

What is System Safety?

To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context.

This wider context is operational effectiveness – we want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.

And what are we developing? We’re developing a system. An organization of hardware (or software), material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work to produce specified results.

We’ve got three things here: a system; the operating environment in which it is designed to work; and its function or application. Why did we buy it, or make it, in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?

That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.

This is Module 2 of SSRAP

This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.

The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

•Worked on aircraft, ships, submarines, ATMS, trains, and software;

•Tiny programs to some of the biggest (Eurofighter, Future Submarine);

•In the UK and Australia, on US and European programs;

•Taught safety to hundreds of people in the classroom, and thousands online;

•Presented on safety topics at several international conferences.

Categories
Functional Safety software safety

Updating Legal Presumptions for Computer Reliability

TL;DR Updating Legal Presumptions for Computer Reliability must happen if we are to have justice!

Background

The ‘Horizon’ Scandal in the UK was a major miscarriage of justice:

Between 1999 and 2015, over 900 sub postmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other sub postmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides.

Wikipedia, British Post Office scandal

‘Horizon’ was a faulty computer system, produced by Fujitsu.  The Post Office had lobbied the British Government to reverse the burden of proof so that courts assumed that computer systems were reliable until proven otherwise.  This made it very difficult for sub-postmasters – small-business franchise owners – to defend themselves in court.

A 1984 act of parliament ruled that computer evidence was only admissible if it could be shown that the computer was used and operating properly. But that act was repealed in 1999, just months before the first trials of the Horizon system began. When post office operators were accused of having stolen money, the hallucinatory evidence of the Horizon system was deemed sufficient proof. Without any evidence to the contrary, the defendants could not force the system to be tested in court and their loss was all but guaranteed.

Alex Hern writing in The Guardian in January 2024.

This shocking miscarriage of justice was based on an equally shocking presumption.  One that anyone with a background in software development would find ridiculous. 

Introduction 

Legal experts warn that failure to immediately update laws regarding computer reliability could lead to a recurrence of scandals like the Horizon case. Critics argue that the current presumption of computer reliability shifts the burden of proof in criminal cases, potentially compromising fair trials.

The Presumption of Computer Reliability

English and Welsh law assumes computers to be reliable unless proven otherwise, a principle criticized for its reversal of the burden of proof. Stephen Mason, a leading barrister in electronic evidence, emphasizes the unfairness of this presumption, stating it impedes individuals from challenging computer-generated evidence.

It is also patently unrealistic.  As I explain in my article on the Principles of Safe Software Development, there are numerous examples of computer systems going wrong:

  • Drug Infusion Pumps,
  • The NASA Mars Polar Lander,
  • The Airbus A320 accident at Warsaw,
  • Boeing 777 FADEC malfunction,
  • Patriot Missile Software Problem in Gulf War II, and many more…

Making software dependable or safe requires enormous effort and care.

Historical Context and the Horizon Scandal

The presumption dates back to an old common law principle that mechanical systems are presumed reliable. The UK Post Office lobbied to have the principle applied to digital systems as well. The implications of this change became evident during the Horizon scandal, where flawed computer evidence led to wrongful accusations against post office operators. Repealing a 1984 act further weakened safeguards against unreliable computer evidence, exacerbating the issue.

International Influence and Legal Precedents

The influence of English common law extends internationally, perpetuating the presumption of computer reliability in legal systems worldwide. Mason highlights cases from various countries supporting this standard, underscoring its global impact.

“[The Law] says, for the person who’s saying ‘there’s something wrong with this computer’, that they have to prove it. Even if it’s the person accusing them who has the information.”

Stephen Mason

Modern Challenges and the Rise of AI

Advancements in AI technology intensify the need to reevaluate legal presumptions. Noah Waisberg, CEO of Zuva, warns against assuming the infallibility of AI systems, which operate probabilistically and may lack consistency.

With a traditional rules-based system, it’s generally fair to assume that a computer will do as instructed. Of course, bugs happen, meaning it would be risky to assume any computer program is error-free…Machine-learning-based systems don’t work that way. They are probabilistic … you shouldn’t count on them to behave consistently – only to work in line with their projected accuracy…It will be hard to say that they are reliable enough to support a criminal conviction.

Noah Waisberg

This poses significant challenges in relying on AI-generated evidence for criminal convictions.

Proposed Legal Reforms

James Christie is a software consultant who co-authored recommendations for an update to the UK law.  He proposes a two-stage reform to address the issue.

The first would require providers of evidence to show the court that they have developed and managed their systems responsibly, and to disclose their record of known bugs … If they can’t … the onus would then be on the provider of evidence to show the court why none of these failings or problems affect the quality of evidence, and why it should still be considered reliable.

James Christie

First, evidence providers must demonstrate responsible development and management of their systems, including disclosure of known bugs. Second, if unable to do so, providers must justify why these shortcomings do not affect the evidence’s reliability.

The Reality of Software Development

First of all, we need to understand how mistakes made in software can lead to failures and ultimately accidents.

Errors in Software Development

This is illustrated well by the standard BS 5760-8. We see that during development, people, either on their own or using tools, make mistakes. That’s inevitable. And there will be many mistakes in the software – as we will see. These mistakes can lead to faults or defects being present in the software. Again, inevitably, some of them get through.

BS 5760-8:1998. Reliability of systems, equipment and components. Guide to assessment of the reliability of systems containing software

If we jump over the fence, the software is now in use. All these faults are in the software but they lie hidden. Until, that is, some revealing mechanism comes along and triggers them. That revealing mechanism might be a change in the environment or operator scenario, or changing inputs that the software is seeing from sensors.

That doesn’t mean that a failure is inevitable, because lots of errors don’t lead to failures that matter. But some do. And that is how we get from mistakes, to faults or defects in the software, to run-time errors.

What Happens to Errors in Software Products?

A long time ago (1984!), a very well-known paper in the IBM Journal of Research and Development looked at how long it took faults in IBM operating system software to become failures for the first time. We are not talking about cowboys producing software on the web that may or may not work okay, or people in their bedrooms producing apps. We’re talking about a very sophisticated product here that was in use all around the world.

Yet, what Adams found was that lots of software faults took more than 5,000 operating years to be revealed. He found that more than 90% of faults in the software would take longer than 50 years to become failures.

‘Optimizing Preventive Service of Software Products’ Edward N. Adams, IBM Journal of Research and Development, 1984, Vol 28, Iss. 1

There are two things that Adams’s work tells us.

First, in any significant piece of software, there is a huge reservoir of faults waiting to be revealed. So if people start telling you that their software contains no defects or faults, either they’re dumb enough to believe that or they think you are. What we see in reality is that even in a very high-quality software product, there are a lot of latent defects.

Second, many of them – the vast majority of them – will take a long, long time to reveal themselves. Testing will not reveal them. Using Beta versions will not reveal them. Fifty years of use will not reveal them. They’re still there.
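
To get a feel for why testing, and even decades of use, reveal so few of these faults, here is a minimal sketch. It assumes (this is my simplification for illustration, not Adams’s model) that a single latent fault is triggered at a constant rate, so the time to its first failure is exponentially distributed. The function name `prob_revealed` and the numbers are hypothetical.

```python
import math


def prob_revealed(exposure_years: float, mttf_years: float) -> float:
    """Probability that one latent fault shows itself within the exposure time.

    Assumption: constant trigger rate, i.e. an exponential time to first
    failure with mean `mttf_years`. Real faults need not behave this way.
    """
    return 1.0 - math.exp(-exposure_years / mttf_years)


# A hypothetical fault with a mean time to failure of 5,000 operating years:
for exposure in (1, 50, 5000):
    p = prob_revealed(exposure, mttf_years=5000)
    print(f"{exposure:>5} years of use: {p:.2%} chance the fault is revealed")
```

Under these assumptions, fifty years of use has only about a 1% chance of revealing such a fault, which is why it can stay hidden for so long.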

[This Section is a short extract from my course Principles of Safe Software Development.]

Conclusion

Legal experts stress the urgency of updating laws to reflect the fallibility of computers, crucial for ensuring fair trials and preventing miscarriages of justice. The UK Ministry of Justice acknowledges the need for scrutiny, pending the outcome of the Horizon inquiry, signaling a potential shift towards addressing issues of computer reliability in the legal framework.

Hopefully, the legal people will come to realize what software engineers have known for a long time.  Software reliability is difficult to achieve and must be demonstrated.

Categories
Blog Safety Analysis

Hazard and Risk Basics

What are the Hazard and Risk basics? So, what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it? When? Why? Who…?

In this free session, I explain the basic terms and show how they link together, and how we can break them down to perform risk analysis. I understand hazards and risks because I’ve been analyzing them for a long time. Moreover, I’ve done this for aircraft, ships, submarines, sensors, command-and-control systems, and lots of software!

Everyone does it slightly differently, but my 25+ years of diverse experience lets me focus on the basics. That allows me to explain it in simple terms. I’ve unpacked the jargon and focused on what’s important.

    Recap: Risk Basics

    Topics: Hazard and Risk Basics

    • Risk & Mishap;
    • Probability & Severity;
    • Hazard & Causal Factor;
    • Mishap (accident) sequence; and
    • Hazards: Tests & Example

    Transcript: Hazard and Risk Basics

    Let’s get started with Module One. We’re going to recap some Risk basics to make sure that we have a common understanding of risk. And that’s important because risk analysis is something that we do every day. Every time you cross the road, or you buy something expensive, or you decide whether you’re going to travel to something, or look it up online, instead.

    You’re making risk analysis decisions all the time without even realizing it. But we need something a little bit more formal than the instinctive thinking about risk that we do all the time. And to help us do that, we need a couple of definitions to get us started.

    What is Risk?

    First of all, what is Risk? It’s a combination of two things. First, the severity of a mishap or accident. Second, the probability that that mishap will occur. So it’s a combination of severity and probability. We will see that illustrated in the next slide.

    We’ll begin by talking about ‘mishap’. Well, what is a mishap? A mishap is an event – or a series of events – resulting in unintentional harm. This harm could be death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.

    The particular standard we’re looking at today covers a range of different harms. That’s why we’re focused on safety. And the term ‘mishap’ will also include negative environmental impacts from planned events. So, even if the cause is a deliberate event, we will include that as a mishap.

    Probability and Severity

    I said that the definition of risk was a combination of probability and severity. Here we’ve got a little illustration of that…
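
The combination of severity and probability can be sketched in a few lines of Python. This is only an illustration: the category names, the multiplication of rank numbers, and the thresholds are all hypothetical, and a real program would take its categories and risk matrix from whichever standard it follows.

```python
# Illustrative categories only, ordered from least to most severe/likely.
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
PROBABILITY = ["improbable", "remote", "occasional", "probable", "frequent"]


def risk_level(severity: str, probability: str) -> str:
    """Combine severity and probability into a coarse risk level.

    The scoring scheme (rank x rank) and the thresholds are hypothetical.
    """
    score = (SEVERITY.index(severity) + 1) * (PROBABILITY.index(probability) + 1)
    if score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


print(risk_level("catastrophic", "remote"))    # rank 4 x rank 2 = 8
print(risk_level("marginal", "frequent"))      # rank 2 x rank 5 = 10
print(risk_level("negligible", "improbable"))  # rank 1 x rank 1 = 1
```

The point of the sketch is that a severe but unlikely mishap and a minor but frequent one can land on a similar level of risk, because risk depends on both factors together.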

    This is Module 1 of SSRAP

    This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.

    Categories
    Blog Risk Assessment

    SSRAP: Start the Course

    This post, ‘SSRAP: Start the Course’, gives an overview of System Safety Risk Assessment Programs. It describes the Learning Objectives of the Course and its five modules. We’re going to learn how to:

    • Describe fundamental risk concepts.
    • Explain what a Systems Safety Approach to Risk is.
    • Define within that System Safety Approach, what a Risk Analysis Program is.
    • List Hazard Analysis Tasks that make up a program.
    • Select tasks to meet our needs.
    Start of the Course: Highlights

    SSRAP: Start of the Course – Transcript

    Welcome to this course on System Safety Risk Analysis Programs. It’s a five-part course for beginners and practitioners. It will also benefit a wider range of people.

    Learning Objectives

    In this course, we will learn how to do several things. First of all, we’re going to learn how to describe fundamental risk concepts. We’re going to explain what a Systems Safety Approach to Risk is and what it does. We will define within that System Safety Approach, what a Risk Analysis Program is. We’re going to be able to list Hazard Analysis Tasks that make up a program. We’ll be able to select tasks to meet our needs.

    At the end of this course, we should be able to design a tailored Risk Analysis Program for any application. We’re also going to learn where to find more information and resources on how to do that.

    Topics for this Course

    So how is that going to work? Well. In five modules. In Module One, we’re going to go over some risk basics. The reason for this is to make sure we’ve got a common understanding.

    In Module Two, we’re going to look at Systems Safety Risk Analysis. What it is, what it does, and the benefits it delivers.

    In Module Three, we will look at a particular System Safety Program Standard. We will understand what it was designed to do and learn what it’s good and not so good at.

    In Module Four, we’re going to take all the previous knowledge from Modules One to Three and put it together. We will use that information to design a Risk Analysis Program. This information can also help design any number of programs depending on what we want to do.

    And then finally, in Module Five, we’ll look at where to get more resources to take us deeper to the next level…

    This is SSRAP: Start of the Course

    This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.

    Categories
    Blog Tools & Techniques

    Three Insightful Methods for Causal Analysis

    In this post, we will look at Three Insightful Methods for Causal Analysis.  Only three?!  If you search online, you will probably find eight methods coming up:

    • Pareto Charts;
    • Failure Mode and Effect Analysis (FMEA);
    • Five Whys;
    • Ishikawa Fishbone Diagram;
    • Fault Tree Analysis;
    • 8D Report Template Checklist;
    • DMAIC Template; and
    • Scatter Diagrams.

    However, not all these methods are created equal!  Only some provide real insight to the challenge of causal analysis.  So, I’ve picked the best ones – based on my 25 years’ experience in system safety – and put them in this post.

    What are Causes and Why are They Important?

    Before we go any further, I just want to explain some basic terms.  When we’re doing safety analysis we have hazards and as the sort of bow tie diagram suggests, one hazard can have many causes and one hazard can have many consequences.

    The Accident Sequence Illustrated.

    Now, some of those consequences will be harmless but some may result in harm to people. And that progression from causes to hazards to consequences is known as an accident sequence. We tend to look at the worst-case scenario, where somebody gets hurt.

    (It’s not really the focus of this post, but the test for a hazard is that it’s necessary for the accident: if there’s no hazard, there’s no accident. Once the hazard is present, nothing else weird or unusual needs to happen for the accident to occur. So, the hazard is both necessary and sufficient.)

    I’ve mentioned consequences, but today we’re talking about causes. So, we will analyze the left-hand side of the bow tie.

    Three Insightful Causal Analysis Methods

    Pareto Analysis

    So, let’s start with a Pareto Analysis. I suspect most of us have seen this before. If we look at the causes of a certain outcome. What we often find is that a few causes are dominant.

    An Example of a Pareto Chart.

    In this chart, we’ve got types of medication errors.  In this case ‘a dose missed,’ ‘wrong time,’ ‘wrong drug,’ and then ‘overdose’ account for 70% of the causation.  Everything else is only 30%.

    (Now, here they drew a line at 80% as the cutoff because Pareto is sometimes known as the eighty-twenty rule. That suggests that maybe 80% of the outcome is caused by 20% of the inputs or causes.  In other words, most of the output variable is driven by only 20% of the input variables.  That’s just a rule of thumb, and it doesn’t have to be 80/20, it might be 70/30, or 60/40, it doesn’t matter.)

    The point is there are some dominant causes. If we can identify the dominant causes, and we work hard on just those top 2, 3, 4, or 5 causes, then we can get a disproportionate reduction in risk by concentrating on those few things.  Whereas, we could spend an awful lot of effort attacking all the other causes and make very little difference.

    It’s a simple technique, but by being led by the data we can become far more effective at risk management.
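
As a rough sketch of being led by the data, the ranking behind a Pareto chart can be computed in a few lines of Python. The counts below are made up for illustration (chosen so that the top four causes add up to 70%), and the function name `pareto` and its `cutoff` parameter are hypothetical.

```python
def pareto(counts: dict[str, int], cutoff: float = 0.8) -> list[tuple[str, int, float]]:
    """Rank causes by count and return those needed to reach the cutoff share.

    Each result is (cause, count, cumulative share of the total).
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    dominant, cumulative = [], 0
    for cause, n in ranked:
        if cumulative / total >= cutoff:
            break  # the causes collected so far already reach the cutoff
        cumulative += n
        dominant.append((cause, n, cumulative / total))
    return dominant


# Hypothetical counts of medication errors (illustrative data only).
errors = {
    "dose missed": 30, "wrong time": 20, "wrong drug": 12, "overdose": 8,
    "wrong patient": 7, "wrong route": 6, "wrong calculation": 5,
    "duplicated drugs": 4, "under dose": 3, "other": 5,
}

for cause, n, share in pareto(errors, cutoff=0.7):
    print(f"{cause:<12} {n:>3}  cumulative {share:.0%}")
```

With this made-up data, a 70% cutoff picks out just four of the ten causes, and those are the ones to work on first.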

    Failure Mode and Effect Analysis (FMEA)

    FMEA is covered in another webinar. 

    Ishikawa Diagrams

    An Ishikawa diagram, or fishbone diagram as it’s often called for obvious reasons, is a widely used causal diagram (Image By FabianLange at de.wikipedia).

    Example of an Ishikawa, or Fishbone, Diagram Structured for Causal Analysis.

    In accident investigations, the Ishikawa diagram becomes a vital tool. I recall learning its application through the tragic case of the Piper Alpha oil rig disaster. Despite the grim nature of such events, they demand thorough causal analysis. Whether we opt for predefined groupings like equipment, process, people, materials, environment, and management, or let the data guide us, the essence remains unchanged: we investigate accidents to identify potential outcomes or problems and determine their contributing factors.

    What makes this method invaluable is its ability to transcend technical issues alone. By encouraging us to consider the broader socio-technical environment, it prompts a holistic view of complex systems. The diagram visually represents primary causes directly linked to the main ‘fishbone’ of analysis, while secondary causes may contribute to or stem from these primary factors. The potential for tertiary causes exists in theory, but it may complicate matters without appropriate tools.

    Utilizing this technique for brainstorming is highly effective. Displaying it on a whiteboard and collectively contemplating it as a group fosters focused discussions. Subsequently, formal documentation in various formats ensures thorough record-keeping. This method proves particularly powerful for unraveling complexities within systems, a topic worthy of a dedicated webinar.

    Fault Tree Analysis

    Fault Tree Analysis is another widely used technique. We’ll have a webinar devoted to FTA later.

    The Eight Disciplines Method

    The Eight Disciplines method is one of those I often get mixed up with something else. It was introduced by the Ford Motor Co. (I’ve never used it) but it looks like a sensible method. There are actually nine steps:

    • Prepare and Plan
    • Form your Team
    • Identify the Problem
    • Develop an Interim Containment Plan
    • Verify Root Causes & Escape Points
    • Choose Permanent Corrective Actions
    • Implement Corrective Actions
    • Take Preventative Measures
    • Celebrate with Your Team!

    Effective problem-solving requires careful planning, especially when it’s a team effort. Let’s break it down into three key steps:

    1. Immediate Action: Start by addressing the urgency. What can we do right now to contain the problem while we develop a more comprehensive solution? It’s crucial to manage the issue in the short term as we work on a more refined approach.
    2. Identify Root Causes: Investigate when and how the situation spiraled out of control. Pinpoint the opportunities for errors within the process. Understanding the root causes and timing issues is essential before moving forward.
    3. Implement Permanent Solutions: Now that we’ve dissected the problem, it’s time to implement long-term corrective actions. This involves establishing better control measures and preventive strategies to avoid similar issues in the future.

    Finally, it’s important to celebrate with your team once the solution is in place. Whether it’s going out for a meal or another form of recognition, acknowledging the effort is crucial.

    This structured approach acknowledges the multi-stage nature of problem-solving. It emphasizes the need for short-term fixes, data-driven decision-making for long-term solutions, and proactive measures to prevent recurrences. Even if you take away nothing else, remembering these key points can guide you through the process. For more detailed information, check out the provided link, and stay tuned for a downloadable PDF with additional resources.

    Bonus – Cause Analysis Reports

    And a little bonus here, something I picked up while looking through this stuff: if you go to smartsheet.com, you’ll find a whole bunch of nice templates for cause analysis reports. I haven’t been through them all, but there looks like quite a lot of good stuff in there if you’re interested.

    “We’ve created root cause analysis templates you can use to complete your own investigations. Whether you need root cause analysis Excel templates, a root cause analysis template for Word, or a PDF template, we have one that’s right for your organization.”

    https://www.smartsheet.com/free-root-cause-analysis-templates-complete-collection

    More Resources

    Interested in accessing more content from the Safety Artisan? Head over to my Thinkific platform, where you’ll find my courses and all the webinars available at the academy. Plus, you can test it out with a 7-day free membership trial. For those looking for an extended trial, use the code ‘one-month-free’ to enjoy a full month on us. I am continually updating our content, adding new material every month to keep things fresh.

    Additionally, sign up for free email updates to stay informed about upcoming webinars and other exciting events.

    Categories
    Blog Risk Assessment

    Introduction to System Safety Risk Assessment

    In this ‘Introduction to System Safety Risk Assessment’, we will pull together several key ideas.

    First, we’ll talk about System Safety. This is safety engineering done in a Systems Engineering Framework. We are doing safety within a rigorous process.

    Second, we’re talking about Risk Assessment. This is a term for putting together different activities within another process. This process may be basic, or it might be quite sophisticated, as illustrated, below.

    Shows the elements, progression and cycle of the Risk Assessment Process from ISO 31000
    The Risk Assessment Process

    Third, and finally, we will put all this together into a System Safety Program. This is hinted at in the diagram, above, but a real system safety program needs to do a lot more than this. It needs to tie into the project it supports, to systems engineering, to resources, quality, V&V, etc. Designing such a program is complex, so we typically follow a standard, like Mil-Std-882E.

    You can hear more about this in the introductory video, below.

    Introduction Video

    Transcript:

    Introduction

    Hello,

    Welcome to this course on Systems Safety Risk Analysis Programs. I’m Simon Di Nucci, The Safety Artisan, and I’ve been a safety engineer and consultant for over 20 years. I’ve worked on a wide range of safety programs doing risk analysis on all kinds of things. Ships, planes, trains, air traffic management systems, software systems, you name it.

    I’ve worked in the U.K., in Australia, and on many systems from the U.S. I’ve also spent hundreds of hours training hundreds of people on safety. And now I’ve got the opportunity to share some of that knowledge with you online.

    So, what are the benefits of this course?

    First of all, you will learn about basic concepts. About system safety, what it is and what it does. You will know how to apply a risk analysis program to a very complex system and how to manage that complexity. So, that’s what you’ll know.

    At the end of the course, you will also be able to do things that you might not have been able to do before. You will be able to take the elements of a risk analysis program and the different tasks. You can select the right tasks and form a program to suit your application, whatever it might be. Whether you might:

    • Have a full, high-risk bespoke development system,
    • Be taking a commercial system off the shelf and doing something new with it, or
    • Take a product and use it in a new application or a new location.

    Whatever it might be, you will learn how to tailor your risk analysis program. This program will give you the analyses you need, and help you meet your legal and regulatory requirements. Once you’ve learned how to do this, you can apply it to almost any system.

    Finally, you will feel confident doing this. I will be interpreting the terminology used in the tasks and applying my experience. So, instead of reading the standard and being unsure of your interpretation, you can be sure of what you need to do. Also, I will show you how you can get good results and avoid some of the pitfalls.

    These are the three benefits of the Course

    1. You will know what to do.
    2. You will be able to perform risk program tasks, and
    3. You’ll feel confident doing those tasks.

    At the end of the course, I will also show you where to find further resources. There are free resources to choose from. But there are also paid resources for those who want to take their studies to the next level. I hope you enjoy the course.

    This is Module 1 of SSRAP

    This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.

    Categories
    Blog

    The 2024 Blog Digest – Q1/Q2

    The 2024 Blog Digest – Q1/Q2 brings you all of The Safety Artisan’s blog posts from the first six months of this year. I hope that you find this a useful resource!

    The 2024 Blog Digest – Q1/Q2: 25 Posts!

    • Crafting a Safety Case and Safety Case Report – Part 2
      In Crafting a Safety Case and Safety Case Report – Part 2, we move on to review and sign off on the artifacts. Introduction In any high-stakes environment—whether it’s defense, engineering, or aviation—Safety Case Reports play an essential role in validating the safety of a system. A meticulous review and sign-off process ensures that these… Read more: Crafting a Safety Case and Safety Case Report – Part 2
    • Crafting a Safety Case and Safety Case Report
      Crafting a Safety Case and Safety Case Report: A Comprehensive Guide for Project Safety Assurance – PART 1 [Picture by Eric Bruton from Pexels.com] Introduction Building a robust Safety Case and Safety Case Report is essential to ensuring the safety and regulatory compliance of complex systems within the Ministry of Defence (MOD) and similarly regulated… Read more: Crafting a Safety Case and Safety Case Report
    • In-Service Safety Management System
      In-Service Safety Management System: Ensuring Long-Term Safety for Military Equipment Safety is paramount when it comes to military operations, especially for in-service equipment relied upon by personnel daily. This article delves into the intricacies of maintaining an In-Service Safety Management System, offering insight into how safety practices are implemented, monitored, and evolved over time. Introduction:… Read more: In-Service Safety Management System
    • Comprehensive Project Safety Management Plans: A Guide
      Comprehensive Project Safety Management Plans. Safety is a critical element in any large-scale project, especially in the context of defense and complex systems. One essential tool for managing safety is a Safety Management Plan (SMP). In this article, we’ll break down the process and structure of an effective SMP, highlighting its objectives, content, and how… Read more: Comprehensive Project Safety Management Plans: A Guide
    • Guide to Establishing and Running a Project Safety Committee (PSC)
      Our Second Safety Management Procedure is the Project Safety Committee. Okay, so committees are not the sexiest subject, but we need to get stakeholders together to make things happen! Project Safety Committee: Introduction In safety-critical industries such as defense, aerospace, and engineering, maintaining a robust safety management system (SMS) is paramount. A Project Safety Committee… Read more: Guide to Establishing and Running a Project Safety Committee (PSC)
    • Project Safety Initiation
      In ‘Project Safety Initiation’ we look at what you need to do to get your safety project or program started. Introduction Definitions A stakeholder is anyone who will be affected by the introduction of the system and who needs to be consulted or informed about the development and fielding of the system, and anyone who contributes to… Read more: Project Safety Initiation
    • Members Get a Free Intro Course, 50% Off & Updates
      Members Get a Free Intro Course, 50% Off & Updates. I will send you the links and discount codes via email. So, tick the email box and check your junk mail to receive the offers. You will get an email series showcasing the free/paid resources. Also, regular updates on new articles: never miss another post!… Read more: Members Get a Free Intro Course, 50% Off & Updates
    • More Resources for Risk Assessment
      Welcome to Module Five, More Resources for Risk Assessment. We’re on the home straight now! This is the last of the five modules. I will let you know where to get more resources and help on these topics. Course Learning Objectives More Resources for Risk Assessment: Transcript Copyright/Source Statement “First, I want to point out… Read more: More Resources for Risk Assessment
    • Designing Your Risk Assessment Program
      Designing Your Risk Assessment Program. Which Ingredients should we use? In this post, I draw upon my 25+ years in system safety to give you some BOLD advice! I’m going to dare to suggest which analysis tasks are essential to every System Safety Program. I also suggest which tasks are optional depending on the system… Read more: Designing Your Risk Assessment Program
    • Understanding Your Risk Assessment Standard
      When Understanding Your Risk Assessment Standard, we need to know a few things. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we will ask… Read more: Understanding Your Risk Assessment Standard
    • Risk Management 101
      Welcome to Risk Management 101, where we’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts and then we’re going to build it up again and show you how it’s done. I’ve been involved in risk management, in project risk management, safety risk management,… Read more: Risk Management 101
    • System Safety Risk Analysis
      In this module, System Safety Risk Analysis, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.
    • Updating Legal Presumptions for Computer Reliability
      TL;DR Updating Legal Presumptions for Computer Reliability must happen if we are to have justice! Background The ‘Horizon’ Scandal in the UK was a major miscarriage of justice: Between 1999 and 2015, over 900 sub postmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions… Read more: Updating Legal Presumptions for Computer Reliability
    • Hazard and Risk Basics
      What are the Hazard and Risk basics? So, what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it? When? Why? Who…? In this free session, I explain the basic terms and show how they link together, and how we can break them… Read more: Hazard and Risk Basics
    • SSRAP: Start the Course
      This post, ‘SSRAP: Start the Course’, gives an overview of System Safety Risk Assessment Programs. It describes the Learning Objectives of the Course and its five modules. We’re going to learn how to: This post is part of a series: SSRAP: Start of the Course – Transcript Welcome to this course on System Safety Risk… Read more: SSRAP: Start the Course
    • Three Insightful Methods for Causal Analysis
      In this post, we will look at Three Insightful Methods for Causal Analysis.  Only three?!  If you search online, you will probably find eight methods coming up: However, not all these methods are created equal!  Only some provide real insight to the challenge of causal analysis.  So, I’ve picked the best ones – based on… Read more: Three Insightful Methods for Causal Analysis
    • Introduction to System Safety Risk Assessment
      In this ‘Introduction to System Safety Risk Assessment’, we will pull together several key ideas. First, we’ll talk about System Safety. This is safety engineering done in a Systems Engineering Framework. We are doing safety within a rigorous process. Second, we’re talking about Risk Assessment. This is a term for putting together different activities within… Read more: Introduction to System Safety Risk Assessment
    • The 2024 Blog Digest – Q1/Q2
      The 2024 Blog Digest – Q1/Q2 brings you all of The Safety Artisan’s blog posts from the first six months of this year. I hope that you find this a useful resource! The 2024 Blog Digest – Q1/Q2: 25 Posts! There’s More! Head over to my Thinkific Site for courses & webinars. Subscribe for a… Read more: The 2024 Blog Digest – Q1/Q2
    • Environmental Hazard Analysis
      This is the full-length (one hour) session on Environmental Hazard Analysis (EHA), which is Task 210 in Mil-Std-882E. I explore the aim, task description, and contracting requirements of this Task, but this is only half the video. In the commentary, I then look at environmental requirements in the USA, UK, and Australia, before examining how… Read more: Environmental Hazard Analysis
    • System of Systems Hazard Analysis
      In this full-length (38-minute) session, The Safety Artisan looks at System of Systems Hazard Analysis, or SoSHA, which is Task 209 in Mil-Std-882E. SoSHA analyses collections of systems, which are often put together to create a new capability, which is enabled by human brokering between the different systems. We explore the aim, description, and contracting… Read more: System of Systems Hazard Analysis
    • Health Hazard Analysis
      In this full-length (55-minute) session, The Safety Artisan looks at Health Hazard Analysis, or HHA, which is Task 207 in Mil-Std-882E. I explore the aim, description, and contracting requirements of this complex Task. It covers: physical, chemical & biological hazards; Hazardous Materials (HAZMAT); ergonomics, aka Human Factors; the Operational Environment; and non/ionizing radiation. I will… Read more: Health Hazard Analysis
    • Preliminary Hazard Identification & Analysis Guide: Free
      Get the Preliminary Hazard Identification & Analysis Guide for free! It’s a 50-page .pdf download, collated from reliable sources. Contents: Preliminary Hazard Identification & Analysis Guide – Introduction Hazard Identification has been defined as: “The process of identifying and listing the hazards and accidents associated with a system.” Hazard Analysis has been defined as: “The… Read more: Preliminary Hazard Identification & Analysis Guide: Free
    • Safety and Risk Audit
      So, what I’m talking about today is safety and risk audit, that is about process, Q&A, and some personal experience. Also something called layered process audits, which I ran into while researching this webinar. I thought that sounded interesting – and it is! Those are today’s topics for the webinar. Audit Process I’m talking about… Read more: Safety and Risk Audit
    • Operating & Support Hazard Analysis
      In this full-length session, I look at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. I explore Task 206’s aim, description, scope, and contracting requirements. There’s value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the… Read more: Operating & Support Hazard Analysis
    • System Requirements Hazard Analysis
      In this 45-minute session, I’m looking at System Requirements Hazard Analysis, or SRHA, which is Task 203 in the Mil-Std-882E standard. I will explore Task 203’s aim, description, scope, and contracting requirements.  SRHA is an important and complex task, which must be done on several levels to succeed.  This video explains the issues and discusses… Read more: System Requirements Hazard Analysis

    There’s More!

    Head over to my Thinkific site for courses & webinars. Subscribe for a free course starter pack and regular email support. Leave a comment below!

    Environmental Hazard Analysis

    This is the full-length (one hour) session on Environmental Hazard Analysis (EHA), which is Task 210 in Mil-Std-882E. I explore the aim, task description, and contracting requirements of this Task, but this is only half the video. In the commentary, I then look at environmental requirements in the USA, UK, and Australia, before examining how to apply EHA in detail under the Australian/international regime. This uses my practical experience of applying EHA. 

    You Will Learn to:

    • Conduct EHA according to the standard;
    • Record EHA results correctly;
    • Contract for EHA successfully;
    • Be aware of the regulatory scene in the US, UK, and Australia;
    • Appreciate the complexities of conducting EHA in Australia; and
    • Recognize when your EHA program requires specialist support.

    This is the seven-minute demo of the full-length (one hour) session on Environmental Hazard Analysis.

    Topics: Environmental Hazard Analysis

    • Environmental Hazard Analysis (EHA) Purpose;
    • Task Description (7+ slides);
    • Documentation, HAZMAT & Contracting (2 slides each);
    • Commentary (8 slides); and
    • Conclusion.

    Transcript: Environmental Hazard Analysis

    Introduction

    Hi, everyone, and welcome to the Safety Artisan. Today, we’re going to be talking about Environmental Hazard Analysis – a big topic! I’m covering this as part of the series on the System Safety Engineering Standard, Mil. Standard 882E. But it doesn’t really matter which standard we are using; the topic is still relevant.

    Environmental Hazard Analysis is a big topic because we’ll cover everything, not just hazards. At the end of this session, you should be able to enjoy three benefits. First of all, you should know how to approach Environmental Hazard Analysis from:

    • The point of view of the requirements,
    • The Hazard Analysis itself (the process), and
    • Some national and international variations in the English-speaking world.

    Second, you should know how to do the basics and also recognize when you may need to bring in a specialist.

    But maybe most important of all, number three: you should have the confidence to get started. I’m hoping that this session will really help you get started, know what you can do, and then recognize when you need to bring in some specialist help or seek further information.

    As you’ll see, it’s a big, complex subject. I can get you started today, but that’s all I can do in one session. And in fact, I think that’s all anyone can do in one session. Anyway, let’s get on with it and see what we’ve got.

    Environmental Hazard Analysis, Mil-Std-882E Task 210

    This is Environmental Hazard Analysis, which is Task 210 under Mil. Standard 882E. So let’s look at what we’re going to talk about today.

    Topics for this Session

    And you’ll see why it’s going to be quite a lengthy session. I think it will last an hour because we’re going to go through the Purpose and Task Description of Environmental Hazard Analysis as set out in the Mil. Standard. And it says seven-plus slides because there are seven mainstream slides plus some illustrations in there as well. Then we’ve got a couple of slides each on Documentation, Hazardous Materials or HAZMAT, and Contracting. Then eight slides of Commentary and this is the major value add because I’ll be talking about applying Environmental Hazard Analysis in a US, UK, and Australian jurisdiction under the different laws, which I have some experience of.

    I worked closely with environmental specialists on the Eurofighter Typhoon project, and I’ve also worked closely with the same specialists on US programs which had been bought by different countries. And then finally, I’ve been closely involved in a major environmental – or safety and environmental – project here in Australia. So I’ve been exposed to, and learned the hard way about, how things work or don’t work here in Australia. So I’ve got some relevant experience to share with you, as well as some learned material. And then a little Conclusion because, as I say, this will take us an hour, so there’s quite a lot of material to cover. So, let’s get right on with it.

    EHA

    So the purpose of Environmental Hazard Analysis, or EHA, as it says, is to support design development decisions. Now all of the 882 tasks are meant to do this, but actually, the wording in Task 210 is the clearest of all of them. It really makes explicit what we’re trying to do, which is excellent.

    So we’re going to identify hazards throughout the life cycle – cradle to grave, whatever system it is. We’re going to document and record those hazards and their leading particulars within the Hazard Tracking System or Hazard Log, as we more often call it. We’re going to manage the hazards using the same system safety process in Section Four as we use for safety. This is the process that you will have heard in the other lessons that I’ve given. And very often under 882, Safety and Environmental Hazards are considered together. There are pros and cons with that approach, but nevertheless, a lot of the work is common. We’ll see why later on.
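
    As a minimal sketch of what recording hazards "cradle to grave" in a Hazard Log might look like, here is a small Python example. The phase names, record fields, and class names are all assumptions made for illustration; Mil-Std-882E does not prescribe this data model, and a real Hazard Tracking System would hold far more detail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class LifeCyclePhase(Enum):
    # Hypothetical phase names, chosen only to illustrate "cradle to grave".
    DESIGN = "design"
    MANUFACTURE = "manufacture"
    OPERATION = "operation"
    MAINTENANCE = "maintenance"
    DISPOSAL = "disposal"


@dataclass
class HazardRecord:
    # Illustrative fields only; a real log records many more particulars.
    identifier: str
    description: str
    phase: LifeCyclePhase
    kind: str  # e.g. "safety" or "environmental", kept in the same log
    mitigations: List[str] = field(default_factory=list)


class HazardLog:
    """A toy Hazard Log holding safety and environmental hazards together."""

    def __init__(self) -> None:
        self._records: Dict[str, HazardRecord] = {}

    def add(self, record: HazardRecord) -> None:
        # Each hazard needs a unique identifier so it can be tracked.
        if record.identifier in self._records:
            raise ValueError(f"duplicate hazard id: {record.identifier}")
        self._records[record.identifier] = record

    def by_phase(self, phase: LifeCyclePhase) -> List[HazardRecord]:
        return [r for r in self._records.values() if r.phase is phase]

    def phases_without_hazards(self) -> List[LifeCyclePhase]:
        # Phases with no recorded hazards may simply not have been analysed yet.
        covered = {r.phase for r in self._records.values()}
        return [p for p in LifeCyclePhase if p not in covered]
```

    A helper like phases_without_hazards() is one way to prompt the question "have we really looked at the whole life cycle?", although an empty phase proves nothing by itself.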

    In this American standard, it says we are to provide specific data to support the National Environmental Policy Act and executive order requirements. So the NEPA is an American piece of legislation and therefore I use this color blue to indicate anything that’s an American-specific requirement. So if you’re not operating in America, you’ll need to find the equivalent legislation to manage to and comply with. Moving on…

    …see the full transcript here (TBD).

    Links: Environmental Hazard Analysis

    The links mentioned in the video are here:

    You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

    System of Systems Hazard Analysis

    In this full-length (38-minute) session, The Safety Artisan looks at System of Systems Hazard Analysis, or SoSHA, which is Task 209 in Mil-Std-882E. SoSHA analyses collections of systems, which are often put together to create a new capability, which is enabled by human brokering between the different systems. We explore the aim, description, and contracting requirements of this Task, and an extended example to illustrate SoSHA. (We refer to other lessons for special techniques for Human Factors analysis.)

    This is the seven-minute demo version of the full 38-minute video.

    System of Systems Hazard Analysis: Topics

    • System of Systems (SoS) HA Purpose;
    • Task Description (2 slides);
    • Documentation (2 slides);
    • Contracting (2 slides);
    • Example (7 slides); and
    • Summary.

    Transcript: System of Systems Hazard Analysis

    Introduction

    Hello everyone and welcome to the Safety Artisan. I’m Simon and today we’re going to be talking about System of Systems Hazard Analysis – a bit of a mouthful that. What does it actually mean? Well, we shall see.

    System of Systems Hazard Analysis

    So, for System of Systems Hazard Analysis, we’re using Task 209, taken from a military standard, 882E, as the description of what to do. But to be honest, it doesn’t really matter whether you’re doing a military system or a civil system, whatever it might be – if you’ve got a system of systems, then this will help you to do it.

    Topics for this Session

    So, we’ll look at the purpose of System of Systems Hazard Analysis. By the way, if you’re wondering what a system of systems is, what I’m talking about is when we take different things that we’ve developed elsewhere, e.g. platforms, electronic systems, whatever it might be, and we put them together, usually, it must be said, with humans gluing the system together somewhere to make it all tick and fit together.

    Then we want this collection of systems to do something new, to give us some new capability, which we didn’t have before. So, that’s what I’m talking about when I say system of systems. I’ll show you an example – it’s the best way.

    We’ve got a couple of slides on task description, a couple of slides on documentation, and a couple of slides on contracting. Task 209 has a very short task description, and therefore I’ve decided to go through an example. So, we’ve got seven slides of an example of a system of systems, safety case, and safety case report that I wrote. Hopefully, that will illustrate far better than just reading out the description. And that will also give us some issues that can emerge with systems of systems and I’ll summarize those at the end.

    SOSHA Purpose

    So, let’s get on. I’m going to call it the SOSHA for short; System of Systems Hazard Analysis. The purpose of the SOSHA, Task 209, is to perform and document the analysis of the system of systems and identify unique system of systems hazards. So, things we don’t get from each system in isolation. This task is going to produce special requirements to deal with these hazards, which would not otherwise exist until we put the things together and start using them for something new. We’ve not done this before…
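
    To make the idea of "unique" system of systems hazards concrete, here is a minimal Python sketch. It assumes, purely for illustration, that hazards are recorded as plain text labels and that each constituent system already has its own hazard list; the function name, system names, and hazard wording are all hypothetical and are not taken from Task 209.

```python
from typing import Dict, Set


def unique_sos_hazards(
    sos_hazards: Set[str],
    system_hazards: Dict[str, Set[str]],
) -> Set[str]:
    """Return hazards found at system-of-systems level that no
    constituent system's own analysis has already identified."""
    # Pool every hazard already known from the systems in isolation.
    already_known: Set[str] = set().union(*system_hazards.values())
    # What remains only appears once the systems are put together.
    return sos_hazards - already_known


# Hypothetical example: two existing systems, glued together by an operator.
per_system = {
    "sensor": {"electric shock during maintenance", "radiation exposure"},
    "radio": {"electric shock during maintenance", "battery fire"},
}
combined = {
    "battery fire",
    "operator relays wrong position between sensor and radio",
}
new_hazards = unique_sos_hazards(combined, per_system)
```

    In this toy case only the operator-relay hazard is left over, which matches the point above: it exists only because a human is brokering between the two systems.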

    see the full transcript here.

    End: System of Systems Hazard Analysis

    So, that is the end of the presentation and it just remains for me to say thanks very much for watching and listening. It’s been good to spend some time with you and I look forward to talking to you next time about environmental analysis, which is Task 210 in the military standard … until then, goodbye.
