To understand your risk assessment standard, we need to know a few things. The standard is the tool that we’re going to use to achieve things. And that’s important because tools usually perform well at what they were designed to do. But they don’t always perform well on other things. So we will ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’
So, what will we do and why are we doing it? First, the use of safety standards is very common for many reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’.
We can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or different organizations. Standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.
However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects an intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and it’s coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.
They’re not all based on the same concept. Thus, some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, and why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?
For risk analysis standards, we need to understand what kind of risks the standard addresses. Because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money that might be acceptable to you. You might say, ‘I’m not worried that 10% of my loans have gone south. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say ‘It’s OK that I’ve killed a few people over here because all this lot over here are still alive!’. It doesn’t work like that!
Also, what kind of evidence does the standard produce? Because in safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable and some is not. You need to know which applies in your area. And then finally, let’s think about the pros and cons of the standard. What does it do well? And what does it do not so well?
System Safety Pedigree
We’re going to look at a standard called Military Standard 882E. This standard was first developed several decades ago. It was created by the US government and military to help them bring into service complex cutting-edge military equipment. Equipment that was always on the cutting edge. That pushes the limits of what you can achieve in performance.
That’s a lot of complexity. Lots of critical weapon systems, and so forth. So they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background in military systems. These days you find these principles used pretty much everywhere. So, all the approaches to System Safety that 882 introduced have found their way into other standards, and into other countries.
It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety. It’s very much about identifying requirements. What do we need to happen to get safety? To do that, it produces lots of requirements. It performs analyses of all those requirements and generates further requirements. And it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages. We’re going to discuss these in the next few slides…
This is Module 3 of SSRAP
‘Understanding Your Risk Assessment Standard’ is Module 3 of the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
Meet the Author
Learn safety engineering with me, an industry professional with 25 years of experience. I have:
•Worked on aircraft, ships, submarines, ATMS, trains, and software;
•Worked on tiny programs and on some of the biggest (Eurofighter, Future Submarine);
•Worked in the UK and Australia, on US and European programs;
•Taught safety to hundreds of people in the classroom, and thousands online;
•Presented on safety topics at several international conferences.
Welcome to Risk Management 101, where we’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts and then we’re going to build it up again and show you how it’s done. I’ve been involved in risk management, in project risk management, safety risk management, etc., for a long, long time. I hope that I can put my experience to good use, helping you in whatever you want to do with this information.
Maybe you’re getting an interview. Maybe you want to learn some basics and decide whether you want to know more about risk management or not. Whatever it might be, I think you’ll find this short session really useful. I hope you enjoy it and thanks for watching.
Hi everyone and welcome to Risk Management 101. We’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts. Then we’re going to build it up again and show you how it’s done.
My name is Simon Di Nucci and I have a lot of experience working in risk management, project risk management, safety risk management, etc. I’m hoping that I can put my experience to good use, helping you in whatever you want to do with this information. Whether you’re going for an interview or you want to learn some basics. You can watch this video and decide if you want to know more about risk management or if you don’t need to. Whatever it might be, you’ll find this short session useful. I hope you enjoy it and thanks for watching.
Topics For This Session
Risk Management 101. So what does it all mean? We’re going to break risk management down into six constituent parts. I’m using a particular standard that breaks it down this way. Other standards will do this in different ways. We’ll talk about that later. Here we’ve got risk management broken down into: hazard identification, hazard analysis, risk estimation, risk evaluation (and ALARP), risk reduction, and risk acceptance.
Risk Management
Let’s get right on to that. Risk management – what is it? It’s defined as “the systematic application of management policies, procedures, and practices to the tasks of hazard identification, hazard analysis, risk estimation, risk and ALARP evaluation, risk reduction, and risk acceptance”.
There are a couple of things to note here. We’re talking about management policies, procedures, and practices. The ‘how’ we do it. Whether it’s a high-level policy or low-level common practice. That is, how things are done in our organization as a whole versus how the day-to-day tasks are done. And it’s also worth saying that when we talk about ‘hazards’, that’s a safety ‘ism’. If we were doing security risk management, we could be talking about ‘threats’. We can also be talking about ‘causes’ in day-to-day language. So, we can be talking about something causing a risk or leading to a risk. More on that later, but that’s an overview of what risk management is.
Part 1
Let’s look at it in a different way. For those of you who like a visual representation, here is a diagram of the hierarchical breakdown. The six parts need to happen in order, more or less, left to right. And as you can see, there’s a link between risk evaluation and risk reduction. We’ll come on to that. So, it’s not a case of ‘either/or’. It’s a series: ‘this is what you have to do’. Sometimes the parts are linked together more intimately.
Hazard Identification
First of all, hazard identification. So, this is the process where we identify and list hazards and accidents associated with the system. You may notice that some words here are in bold. Where a word is in bold, we are going to give the definition of what it is later.
These hazards could lead to an accident but are only those associated with the system. That’s the scope. An airplane, a ship, and a computer are all systems, but each would have a very different scope. Accidents might also happen in different ways.
On a more practical level, how do we do hazard identification? I’m not going to go into any depth here, but there are certain classic methods. We can consult with our workers and inspect the workplace where they’re operating. In some countries, that’s a legal requirement (including in Australia, where I live). Another option is looking at historical data. And indeed, in some countries and in some industries, that’s a requirement. A requirement means we have to do that. And we can use special analysis techniques. Now, I’m not going to talk about any of those analysis techniques today. You can watch some other sessions on The Safety Artisan to see that.
Hazard Analysis
Having done hazard identification, we’ve asked ourselves ‘What could go wrong?’. Now we can add some more detail and ask, ‘How could it go wrong? And how often?’. That kind of stuff. So, we want to go into more detail about the hazards and accidents associated with this particular system. And that will help us to define some accident sequences. We can start with something that creates a hazard and then the hazard may lead to an accident. And that’s what we’re talking about. Later, we will show that using graphics can be helpful.
But again, more on terminology. In different industries, we call it different things. We tend to say ‘accident’ in the UK and Australia. In the U.S., they might call it a ‘mishap’, which is trying to get away from the idea that something was accidental, that nobody meant it to happen. Mishap is a more generic term that avoids that implication. We also talk about ‘losses’, or we talk about ‘breaches’ in the security world. A breach is where somebody has been able to get in somewhere that they should not. And we can talk about accident sequences. Or, in more common language, we call it a sequence of events. That’s all it is.
Risk Estimation
Now we’re talking about risk estimation. We’ve thought about our hazards and accidents and how they might progress from one to another. Let’s think about, ‘How big is the risk of this actually happening?’. Again, we’ll unpack this further later at the next level. But for now, we’re going to talk about the systematic use of available information. Systematic: so, ordered. We’re following a process. This isn’t somebody on their own taking a subjective view: ‘Look, I think it’s not that bad’. It’s a process that is repeatable. We want to do something systematic. It’s thorough, it’s repeatable, and so it’s defendable. We can justify the conclusions that we’ve come to because we’ve done it with some rigour. We’ve done it in a systematic way. That’s important. Particularly if we’re talking about harm coming to people or big losses.
Risk and ALARP / SFARP Evaluation
Now, risk evaluation is just taking the risk that we have just estimated, comparing it to something, and asking, “How serious is this risk?”. Is it something that is very low? If it’s insignificant then we’re not bothered about it. We can live with it. We can accept it. Or is it bigger than that? Do we need to do something more about it? Again, we want to be systematic. We want to determine whether risk reduction is necessary. Is this acceptable as it is or is it too high and we need to reduce it? That’s the core of risk evaluation.
Tolerability
This is a UK-based standard, but the terminology is found in different forms around the world. In the UK, they talk about ‘tolerability’. We’re talking about the absolute level of risk. There probably is an upper limit that’s allowed in the law or in our industry. And there’s a lower limit that we’re aiming for. In an ideal world, we’d like all our risks to be low-level risks. That would be terrific.
So, that’s ‘tolerability’. And you might hear it called different things. Within the UK system, there are three classes of ‘tolerability’ of risk. A risk could be ‘broadly acceptable’: it’s very low. It’s down in the target region where we’d like to get all our risks. It could be ‘tolerable’: we can expose people to this risk or we can live with this risk, but only if we’ve met certain other criteria. And then there’s the risk that is so big, so far up there, that we can’t have it under any circumstances. It’s ‘unacceptable’. You can imagine a traffic light system where we have categorized our risk.
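As a rough illustration, the three tolerability classes could be sketched in Python like this. The two numeric limits are made-up placeholders, not values from any standard or law. In practice they would come from your own jurisdiction and industry. The names `Tolerability`, `UPPER_LIMIT`, `LOWER_LIMIT` and `classify` are likewise just for this example.

```python
from enum import Enum


class Tolerability(Enum):
    # The values mirror the traffic light idea: green, amber, red.
    BROADLY_ACCEPTABLE = "green"
    TOLERABLE = "amber"
    UNACCEPTABLE = "red"


# Hypothetical limits for illustration only (e.g. chance of harm per year).
UPPER_LIMIT = 1e-3  # above this, the risk cannot be accepted at all
LOWER_LIMIT = 1e-6  # at or below this, the risk is in the target region


def classify(risk: float) -> Tolerability:
    """Place an estimated risk into one of the three tolerability classes."""
    if risk > UPPER_LIMIT:
        return Tolerability.UNACCEPTABLE
    if risk > LOWER_LIMIT:
        # Tolerable only if other criteria are met (see ALARP / SFARP).
        return Tolerability.TOLERABLE
    return Tolerability.BROADLY_ACCEPTABLE
```

A risk that lands in the amber band is not automatically accepted. It still has to pass whatever test applies to you, which is where ALARP or SFARP comes in.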
ALARP / SFARP
And then there’s the test of whether our risk can be accepted. In the UK, it’s called ALARP: we reduce the risk As Low As Reasonably Practicable. In other places, you’ll see SFARP: we’ve eliminated or minimized the risk So Far As Is Reasonably Practicable. In the nuclear industry, they talk about ALARA: As Low As Reasonably Achievable. And different laws use different tests. Whichever one you use, there’s a test where we have to ask, “Can we accept the risk?” and “Have we done enough risk reduction?”. Whatever you’ve put in those square brackets, that’s the test that you’re using. And that will vary from jurisdiction to jurisdiction. The basic concept of risk evaluation is taking the estimated level of risk and comparing it to some standard or some regulation. Whatever it might be, that’s what we do. That’s risk evaluation.
Risk Reduction
We’ve asked, “Do we need to reduce risk further?”. And if we do, we need to do some risk reduction. Again, we’re being systematic. This is not some subjective thing where we go “I have done some stuff, it’ll be alright. That’s enough.”. We’re being a bit more rigorous than that. We’ve got a systematic process for reducing risk. And in many parts of the world, we’re directed to do things in a certain way.
Elimination
This is an illustration from an Australian regulation. In this regulation, we’re aiming to eliminate risk. We want to start with the most effective risk reduction measures. Elimination is “We’ve reduced the risk to zero”. That would be lovely if we could do that but we can’t always do that.
Substitution
What’s the next level? We could reduce this risk by substituting something less risky. Imagine we’ve got a combustion engine powering something. The combustion engine needs flammable fuel and it produces toxic fumes. It could release carbon monoxide, CO2, and other things that we don’t want. We ask, “Can we get rid of that?”. Could we have an electric motor and a battery instead? That might be a lot safer than the combustion engine. That is a substitution. There are still risks with electricity. But by doing this we’ve replaced something risky with something less risky.
Isolation
Or we could isolate the hazard. Let’s use the combustion engine as an example again. We can say, “I’ll put the engine, the fuel, and the exhaust somewhere a long way from people. Then it’ll be a long way from where it can do harm or cause a loss.” And that’s another way of dealing with it.
Engineering Controls
Or we could say, “I’m going to reduce the risks through engineering controls”. We could put in something engineered. For example, we can put in a smoke detector. A very simple, therefore highly reliable, device. It’s certainly more reliable than a human. You can install one that can detect some noxious gases. It’s also good if it’s a carbon monoxide detector. Humans cannot detect carbon monoxide at all. (Except if you’ve got carbon monoxide poisoning, you’ll know about it. Carbon monoxide poisoning gives you terrible headaches and other symptoms.) But of course, that’s not a good way to detect that you’re breathing in poisonous gas. We do not want to do it that way.
So, we can have an engineering control to protect people. Or we can use an interlock. We can isolate things in a building or behind a wall or whatever. And if somebody opens the door, then that forces the thing to cut out so it’s no longer dangerous. There are different things for engineering controls that we can introduce. They do not rely on people. They work regardless of what any person does.
Administrative / Procedural Controls
Next on the list, we could reduce exposure to the hazard by using administrative controls. That’s giving somebody some rules or a procedure to follow. “Do this. Don’t do that.” Now, that’s all good. We can give people warning signs and warn people not to approach something. But, of course, sometimes people break the rules for good reasons. Maybe they don’t understand. Or, maybe they don’t know the danger. Perhaps they’ve got to do something or maybe the procedure that we’ve given them doesn’t work very well. It’s too difficult to get the job done, so people cut corners. So, procedural protection can be weak. And a bit hit-and-miss sometimes.
Personal Protective Equipment
Finally, we can give people personal protective equipment. We can give them some eye protection. I’m wearing glasses because I’m short-sighted. But you can get some goggles to protect your eyes from damage. Damage like splashes, flying fragments, sparks, etc. We can have a hard hat so that if we’re on a building site and something drops on us from above, it protects the old brain box.
It won’t stop the accident from happening, but it will help reduce the severity of the accident. That makes PPE the least effective measure. We’re doing nothing to prevent the accident from happening. We’re reducing the severity in certain circumstances. For example, if you drop a ton of bricks on me, it doesn’t matter whether I’m wearing a hard hat or not. I’m still going to get crushed. But with one brick, I should be able to survive that if I’m wearing a hard hat.
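The ordering of these risk reduction measures, from most to least effective, can be captured as a simple ranked list. This is only a sketch in Python. The names `Control` and `most_effective` are invented for the example, and the ranking just follows the order used in this session, not the wording of any regulation.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Control(IntEnum):
    """Risk reduction measures; a lower number means more effective."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ISOLATION = 3
    ENGINEERING = 4
    ADMINISTRATIVE = 5
    PPE = 6


def most_effective(practicable: Iterable[Control]) -> Optional[Control]:
    """Return the highest-ranked control among those we can actually apply."""
    return min(practicable, default=None)
```

For the combustion engine, if elimination and substitution are not practicable but isolation and PPE are, `most_effective([Control.PPE, Control.ISOLATION])` picks isolation. In reality we often apply several controls together. The point is only that we start at the top of the list and work down.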
Risk Acceptance
Let’s move on to risk acceptance. At some stage, we will have reduced the risk to a point where we can accept it. That is, we can live with it and we’ve decided that we need to do whatever it is that is exposing us to the risk. We need to use the system. For example, we want to get in our car to enable us to go from A to B quickly and independently. So, we’re going to accept the risk of driving in our car. We’ve decided we’re going to do that. We make risk-acceptance decisions every day, often without thinking about it. We get in a car every day on average and we don’t worry about the risk, but it’s always there. We’ve just decided to accept it.
But in this example, it’s not an individual deciding to do something on the spur of the moment. Nor is it based on personal experience. We’ve got a systematic process where a bunch of people come together. The relevant stakeholders agree that a risk has been assessed or has been estimated and has been evaluated. They agree that the risk reduction is good enough and that we will accept that risk. There’s a bit more to it than you and I saying “That’ll be alright.”
Part 2
Let’s summarise where we’ve got to. We’ve talked about these six components of risk management. That’s terrific. And as you can see, they all go together. Risk evaluation and risk reduction are more tightly coupled. That’s because when we do some risk reduction, we then re-evaluate the risk. We ask ‘Can we accept it?’. If the answer is ‘No.’ we need to do some more work. Then we do some more risk reduction. So those tend to be a bit more coupled together at the end. That’s the level we’ve got to. We’re now going to go to the next level.
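That coupling between risk reduction and risk evaluation is really a loop, and it can be sketched in a few lines of Python. Everything here is assumed for illustration: the function name, the idea that each measure multiplies the risk by a factor, and the numbers in the usage note. Real risk reduction is rarely this tidy.

```python
def reduce_until_acceptable(risk, target, measures):
    """Apply risk reduction measures in order, re-evaluating after each one.

    risk     -- estimated risk before any reduction (illustrative units)
    target   -- level at or below which the risk can be accepted
    measures -- list of (name, factor) pairs; each factor scales the risk
    Returns (final_risk, names_of_measures_applied, accepted).
    """
    applied = []
    for name, factor in measures:
        if risk <= target:      # risk evaluation: can we accept it yet?
            break
        risk *= factor          # risk reduction: do some more work
        applied.append(name)
    return risk, applied, risk <= target
```

For example, with `measures = [("interlock", 0.1), ("procedure", 0.5), ("PPE", 0.5)]`, a starting risk of 1.0 and a target of 0.06, the loop stops after the interlock and the procedure. If we run out of measures and the answer is still ‘No’, the function reports that the risk has not been accepted.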
So, we’re going to explain these things. We’ve talked about hazard identification and hazard analysis, but what is a hazard? And what is an accident? And what is an accident sequence? We’re going to unpack that a bit more. We’re going to take it to the next level. And throughout this, we’re talking about risk over and over again. Well, what is ‘risk’? We’re going to unpack that to the next level as well.
This is a safety standard. We’re talking about harm to people. How likely is that harm and how severe might it be? But it might be something else. It might be a loss or a security breach. Or a financial loss, a negative result for our project. We might find ourselves running late. Or we’re running over budget. We might be failing to meet quality requirements. Or we’re failing to deliver the full functionality that we said we would. Whatever it might be.
Hazard
So, let’s unpack this at the next level. A hazard is a term that we use, particularly in safety. As I say, we call it other things in different realms. But in the safety world, it’s a physical situation or it’s a state of a system.
As it says, it often follows from some initiating event that we may call a ‘cause’. The hazard may lead to an accident. However, the key thing to remember is once a hazard exists, an accident is possible, but it’s not certain. You can imagine the sort of cartoon banana skin on the pavement gag. Well, the banana skin is the hazard. In the cartoon, the cartoon character always steps on the banana skin. They always fall over for comic effect. But in the real world, nobody may tread on the banana skin and slip over. There could be nobody there to slip over on the banana skin. Or even if somebody does, they could catch themselves. Or they fall, but it’s on a soft surface and they don’t hurt themselves so there’s no harm.
So, the accident isn’t certain. And in fact, we can have what we call ‘non-accident’ outcomes. We can have harmless consequences. A hazard is an important midway step. I heard it called an accident waiting to happen, which is a helpful definition. An accident waiting to happen, but it doesn’t mean that the accident is inevitable.
Accident
But accidents can happen. Again, we might call it an ‘accident’, a ‘mishap’, or an ‘unintended event’. It’s an event that we did not want, or a sequence of events, that causes harm. And in this case, we’re talking about harm to people. And as I say, it might be a security breach. It might be a financial loss or reputational damage. Something might happen that is very embarrassing for an organization or an individual. Or again, we could have a hiccup with our project.
Harm
But in this case, we’re talking about harm. With this kind of standard, we’re using what you might call a body count approach to the harm. We’re talking about actual death, physical injury, or damage to the health of people.
This standard also considers the damage to property and the environment. Now, very often we are legally required to protect people and the environment from harm. Property less so. However, there will be financial implications of losses of property or damage to the systems. We don’t want that. But it’s not always criminally illegal to do that. Whereas usually, hurting people and damaging the environment is. So, this is ‘harm’. We do not want this thing to happen. We do not want this impact.
Safety is a much tougher business in this instance. If we have a problem with our project, it’s embarrassing but we could recover it. It’s more difficult to do that when we hurt somebody.
Risk
And always in these terms, we’re talking about ‘risk’. What is ‘risk’? Risk is a combination of two things. It’s a combination of the likelihood of harm or loss and the severity of that harm or loss. It’s those two things together. And we’ve got a very simple illustration here, a little table. These tables are often known as risk matrices, but don’t worry about that too much. Whatever you want to call it. We’ve got a little two by two table here and we’ve got likelihood in the white text and severity in the black.
Low Risk
We can imagine where there’s a risk where we have a low likelihood of a ‘low harm’ or a ‘low impact’ accident or outcome. We say, ‘That’s unlikely to happen, and even if it does not much is going to happen.’ It’s going to be a very small impact. So, we’d say that that’s a low risk.
Then at the other end of the spectrum, we can imagine something that has a high likelihood of happening. And that outcome also has a high impact. These are things that we definitely do not want to happen. And we say, ‘That’s a high risk and that’s something that we are very, very concerned about.’
Medium Risk
And then in the middle, we could have a combination of an outcome that is quite likely, but it’s of low severity. Or it’s of high severity, but it’s unlikely to happen. And we say, ‘That’s a medium risk’.
Now, this is a very simplified matrix for teaching purposes only. In the real world, you will see matrices that are four by four, five by five, or even six by six, or combinations thereof. And in security, they talk about threat, vulnerability, and outcomes, so you might see multiple matrices used. They use multiple matrices to progressively build up a picture of the risk. They use matrices as building blocks. So, there may be more than one matrix used when you’ve got something more complex to model. But here we’ve got a nice, simple example. This illustrates what risk is. It’s a combination of severity and likelihood of harm or loss. And that’s what risk is, fundamentally. And if we have a firm grasp of these fundamentals, it’ll help us to reason and deal with almost anything. With enough application.
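The two by two teaching matrix described above is small enough to write out as a lookup table. Here is a minimal Python sketch. The names `RISK_MATRIX` and `risk_level` are just for this example, and a real matrix would have more rows and columns with categories defined by your own standard.

```python
# Keys are (likelihood, severity); values are the resulting risk level.
RISK_MATRIX = {
    ("low", "low"): "low",
    ("low", "high"): "medium",
    ("high", "low"): "medium",
    ("high", "high"): "high",
}


def risk_level(likelihood: str, severity: str) -> str:
    """Look up the risk level for a likelihood and severity pair."""
    return RISK_MATRIX[(likelihood, severity)]
```

So an outcome that is quite likely but of low severity, `risk_level("high", "low")`, comes out as a medium risk, and so does one that is severe but unlikely.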
Accident Sequence
Now, let’s move on and talk about accident sequences. We’re talking about a progression in this case. We’re imagining a left-to-right path. A progression of events that results in an accident. This diagram, which looks like a bow tie, is meant to represent the idea that we can have one hazard. There might be many causes that lead to this hazard. There might be many different things that could create the hazard or initiate the hazard. And the hazard may have many different consequences.
Consequences
As I’ve said before, nothing at all may happen. That might be the consequence of the hazard. Most of the time that’s what’s going to happen. But there may be a variety of consequences. Somebody might get a minor injury or there might be a more serious accident where one or more people are killed. A good example of this is fire. So, the hazard is the fire. The causes might be various. We could be dealing with flammable chemicals, or a lightning strike, or an electricity arc flash. Or we could be dealing with very high temperatures where things spontaneously burst into flames. Or we could have a chemical in the presence of pure oxygen. Some things will spontaneously burst into flames in the presence of pure oxygen. So there’re a variety of causes that lead to the fire.
An Example
And the fire might be very small and burn itself out. It causes very little damage and nobody gets hurt. Or it might lead to a much bigger fire that, in theory, could kill lots of people. So, there’s a huge range of consequences potentially from one hazard. But the accident sequence is how we would describe and capture this progression. From initiating events to the hazard to the possible consequences. And by modeling the accident sequence, of course, we can think about how we could interrupt it.
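One way to capture a bow tie like this is as a small data structure: one hazard in the middle, causes on the left, consequences on the right. The Python sketch below uses the fire example. The class name `BowTie` and its fields are invented for illustration, and the lists are only a sample of the causes and consequences mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BowTie:
    hazard: str
    causes: List[str] = field(default_factory=list)
    consequences: List[str] = field(default_factory=list)

    def sequences(self) -> List[Tuple[str, str, str]]:
        """Every cause -> hazard -> consequence path through the bow tie."""
        return [
            (cause, self.hazard, outcome)
            for cause in self.causes
            for outcome in self.consequences
        ]


fire = BowTie(
    hazard="fire",
    causes=["flammable chemicals", "lightning strike", "arc flash"],
    consequences=["burns out, no harm", "minor injury", "multiple fatalities"],
)
```

Three causes and three consequences give nine possible accident sequences through the one hazard. Listing them out like this makes it easier to ask where each sequence could be interrupted.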
Part 3
We’ve broken risk management down into those six constituent parts. We’ve gone to the next level, in that we’ve gone down to the concepts that underpin these things. These hazards, the accidents, and the accident sequence. We’ve talked about risk itself and what we don’t want to happen. The harm, the loss, the financial loss, the embarrassment, the failed or late or over-budget project, a security breach, the undesired event, etc. We had an objective which was to do something safely or to complete a project and the risk is that that won’t happen. That there’ll be an impact on what we were trying to do that is negative. That is undesirable.
There are just a couple more concepts that we need to look at to complete the pattern, as you can see. We’ve been talking about the system. And we’ve been talking about doing things systematically. Then a system works in an operating environment. So, let’s unpack that.
System
First of all, we have a system. The system is going to be a combination of things. I wouldn’t call a pen or a pencil a system. It’s only got a couple of components. You could pull it apart. But it’s too simple to be worth calling it a system. We wouldn’t call it a pen system, would we? So, a system is something more complex. It’s a combination of things and we need to define the boundary. I’ll come back to that.
But within this boundary, we’ve got some different elements in the system that work together. Or they’re used together within a defined operating environment. So, we’re going to expose this system to a range of conditions in which it is designed to work. The intention is the system is going to do whatever it does to perform a given task. It can do one defined task or achieve a specific purpose.
I talked before about getting in our car. A car is complex enough to be called a system. We get in our car and we drive it on the roads. Or if we’ve got a four-wheel drive, we can drive off-road. Or we can use it in a more demanding operating environment to achieve a specific purpose. We want to transport ourselves, and sometimes some stuff, from A to B. That’s what we’re trying to do with the system.
Within the System
And within that system, we may have personnel/people, we may have procedures. A bunch of rules about how you drive a car legally in different countries. We’ve got materials and physical things – what the car is made of. We could have tools to repair it, and change wheels. We’ve got some other equipment, like a satnav. We’ve got facilities. We need to take a car somewhere to fill up with fuel or to recharge it. We’ve got services like garages, repairs, servicing, etc. And there could be some software in there as well. Of course, these days there’s software everywhere in a car, as in most complex devices.
So, our system is a combination of lots of different things. These things are working together to achieve some kind of goal or some kind of result. There’s somewhere we want to get to. And it’s designed to work in a particular operating environment. Cars work on roads really well. Off-road cars can work on tracks. Put them in deep water, they tend not to work so well. So, let’s talk about that operating environment.
Operating Environment
What we’ve got here is the total set of all external conditions, natural and induced, which a system is exposed to at any given moment. (That’s external to the system, so outside the boundary.) So, these conditions might be natural or they might be generated by something else. We need to get a good understanding of the system, the operating environment, and what we want it to do.
If we have a good understanding of those three things, then we will be well on the way to being able to understand the risks associated with that system. That’s one of the key things with risk management. If you’ve got those three things, that’s crucial. You will not be able to do effective risk management if you don’t have a grasp of those things. And if you do have a thorough grasp of those things, it’s going to help you do effective risk management.
Conclusion
So, we’ve talked about risk management. We’ve broken it down into six big sections: hazard identification, analysis, risk estimation, evaluation, reduction, and acceptance. We’ve seen how those things depend on only a few concepts. We’ve got the concepts of ‘hazards’, ‘risks’, and ‘accidents’, as well as the undesirable consequences that the risk might result in. The risk is measured based on the likelihood and severity of that harm or loss occurring.
When we’re dealing with a more complex system, we need to understand that system and the environment in which it operates. Of course, we’ve put it in that environment for a purpose. And that unpacking has allowed us to break down quite a big concept, risk management. A lot of people, like myself, spend years and years learning how to do this. It takes time to gain experience because it’s a complex thing. But if we break it down, we can understand what we’re doing. We can work our way down to the fundamentals. And then if we’ve got a good grasp of the fundamentals, that supports getting the more complex stuff right. So, that’s what risk management is all about. That’s your risk management 101 and I hope that you find that helpful.
Copyright Statement
I just need to say briefly that those quotations from the standard are used under a Creative Commons license, CC 4.0. That allows me to do that within limits that I am careful to observe. But this video presentation is copyrighted by the Safety Artisan.
For More…
And you can see more like these at the Safety Artisan website. That’s www.safetyartisan.com. And as you can see, it’s a secure site so you can visit without fear of a security breach. So, do head over there. Subscribe to the monthly newsletter to get discounts on paid videos and regular updates of what’s coming up, both paid and free.
So, it just remains for me to say thanks very much for watching and I look forward to catching up with you again very soon.
In this module, System Safety Risk Analysis, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.
To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context.
This wider context is operational effectiveness – we want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.
And what are we developing? We’re developing a system: an organization of hardware (or software), material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work to produce specified results.
We’ve got three things here: a system; the operating environment in which it is designed to work; and its function or application. Why did we buy it, or make it, in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?
That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out safety.
This is Module 2 of SSRAP
This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
Meet the Author
Learn safety engineering with me, an industry professional with 25 years of experience. I have:
•Worked on aircraft, ships, submarines, ATMS, trains, and software;
•Tiny programs to some of the biggest (Eurofighter, Future Submarine);
•In the UK and Australia, on US and European programs;
•Taught safety to hundreds of people in the classroom, and thousands online;
•Presented on safety topics at several international conferences.
TL;DR Updating Legal Presumptions for Computer Reliability must happen if we are to have justice!
Background
The ‘Horizon’ Scandal in the UK was a major miscarriage of justice:
‘Horizon’ was a faulty computer system, produced by Fujitsu. The Post Office had lobbied the British Government to reverse the burden of proof so that courts assumed that computer systems were reliable until proven otherwise. This made it very difficult for sub-postmasters – small-business franchise owners – to defend themselves in court.
This shocking miscarriage of justice was based on an equally shocking presumption. One that anyone with a background in software development would find ridiculous.
Introduction
Legal experts warn that failure to immediately update laws regarding computer reliability could lead to a recurrence of scandals like the Horizon case. Critics argue that the current presumption of computer reliability shifts the burden of proof in criminal cases, potentially compromising fair trials.
The Presumption of Computer Reliability
English and Welsh law assumes computers to be reliable unless proven otherwise, a principle criticized for its reversal of the burden of proof. Stephen Mason, a leading barrister in electronic evidence, emphasizes the unfairness of this presumption, stating it impedes individuals from challenging computer-generated evidence.
It is also patently unrealistic. As I explain in my article on the Principles of Safe Software Development, there are numerous examples of computer systems going wrong:
Drug Infusion Pumps,
The NASA Mars Polar Lander,
The Airbus A320 accident at Warsaw,
Boeing 777 FADEC malfunction,
Patriot Missile Software Problem in Gulf War II, and many more…
Making software dependable or safe requires enormous effort and care.
Historical Context and the Horizon Scandal
The presumption dates back to an old common law principle that presumed the reliability of mechanical systems. The UK Post Office lobbied to have the principle applied to digital systems as well. The implications of this change became evident during the Horizon scandal, where flawed computer evidence led to wrongful accusations against post office operators. Repealing a 1984 act further weakened safeguards against unreliable computer evidence, exacerbating the issue.
International Influence and Legal Precedents
The influence of English common law extends internationally, perpetuating the presumption of computer reliability in legal systems worldwide. Mason highlights cases from various countries supporting this standard, underscoring its global impact.
Modern Challenges and the Rise of AI
Advancements in AI technology intensify the need to reevaluate legal presumptions. Noah Waisberg, CEO of Zuva, warns against assuming the infallibility of AI systems, which operate probabilistically and may lack consistency.
This poses significant challenges in relying on AI-generated evidence for criminal convictions.
Proposed Legal Reforms
James Christie is a software consultant, who co-authored recommendations for an update to the UK law. He proposes two-stage reforms to address the issue.
First, evidence providers must demonstrate responsible development and management of their systems, including disclosure of known bugs. Second, if unable to do so, providers must justify why these shortcomings do not affect the evidence’s reliability.
The Reality of Software Development
First of all, we need to understand how mistakes made in software can lead to failures and ultimately accidents.
Errors in Software Development
This is illustrated well by the standard BS 5760. We see that during development, people, either on their own or using tools, make mistakes. That’s inevitable. And there will be many mistakes in the software – as we will see. These mistakes can lead to faults or defects being present in the software. Again, inevitably, some of them get through.
If we jump over the fence, the software is now in use. All these faults are in the software but they lie hidden. Until, that is, some revealing mechanism comes along and triggers them. That revealing mechanism might be a change in the environment, a different operator scenario, or changing inputs that the software sees from sensors.
That doesn’t mean that a failure is inevitable, because lots of errors don’t lead to failures that matter. But some do. And that is how we get from mistakes to faults or defects in the software to run-time errors.
What Happens to Errors in Software Products?
A long time ago (1984!), a very well-known paper in the IBM Journal of Research and Development looked at how long it took faults in IBM operating system software to become failures for the first time. We are not talking about cowboys producing software on the web that may or may not work okay, or people in their bedrooms producing apps. We’re talking about a very sophisticated product here that was in use all around the world.
Yet, what Adams found was that lots of software faults took more than 5,000 operating years to be revealed. He found that more than 90% of faults in the software would take longer than 50 years to become failures.
There are two things that Adams’s work tells us.
First, in any significant piece of software, there is a huge reservoir of faults waiting to be revealed. So if people start telling you that their software contains no defects or faults, either they’re dumb enough to believe that or they think you are. What we see in reality is that even in a very high-quality software product, there are a lot of latent defects.
Second, many of them – the vast majority of them – will take a long, long time to reveal themselves. Testing will not reveal them. Using Beta versions will not reveal them. Fifty years of use will not reveal them. They’re still there.
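To get a feel for why testing and field use reveal so few of these faults, here is a rough sketch. It assumes that a single fault reveals itself at a constant rate, so the chance of seeing it at least once in t operating years is 1 - exp(-t / MTTF). That exponential model is my simplifying assumption for illustration, not Adams's method, and the function name and figures are hypothetical.

```python
import math


def prob_fault_revealed(operating_years: float, mean_years_to_failure: float) -> float:
    """Chance that one latent fault shows itself at least once.

    Assumes a constant (exponential) failure rate. This is an
    illustrative simplification, not a model taken from the paper.
    """
    if operating_years < 0 or mean_years_to_failure <= 0:
        raise ValueError("operating_years must be >= 0 and mean_years_to_failure > 0")
    return 1.0 - math.exp(-operating_years / mean_years_to_failure)


# A fault with a 5,000-year mean time to failure, exercised for 50 operating years
p_rare = prob_fault_revealed(50, 5000)

# The same 50 operating years against a fault with a 50-year mean time to failure
p_common = prob_fault_revealed(50, 50)

print(f"rare fault: {p_rare:.1%}, common fault: {p_common:.1%}")
```

Under these assumptions, the rare fault has about a 1% chance of showing up in 50 operating years, while the common one has about a 63% chance. So even a long test campaign mostly finds the common faults and leaves the rare ones in place.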
Legal experts stress the urgency of updating laws to reflect the fallibility of computers, crucial for ensuring fair trials and preventing miscarriages of justice. The UK Ministry of Justice acknowledges the need for scrutiny, pending the outcome of the Horizon inquiry, signaling a potential shift towards addressing issues of computer reliability in the legal framework.
Hopefully, the legal people will come to realize what software engineers have known for a long time. Software reliability is difficult to achieve and must be demonstrated.
What are the Hazard and Risk basics? So, what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it? When? Why? Who…?
In this free session, I explain the basic terms and show how they link together, and how we can break them down to perform risk analysis. I understand hazards and risks because I’ve been analyzing them for a long time. Moreover, I’ve done this for aircraft, ships, submarines, sensors, command-and-control systems, and lots of software!
Everyone does it slightly differently, but my 25+ years of diverse experience lets me focus on the basics. That allows me to explain it in simple terms. I’ve unpacked the jargon and focused on what’s important.
Let’s get started with Module One. We’re going to recap some Risk basics to make sure that we have a common understanding of risk. And that’s important because risk analysis is something that we do every day. Every time you cross the road, or you buy something expensive, or you decide whether you’re going to travel to something or look it up online instead.
You’re making risk analysis decisions all the time without even realizing it. But we need something a little bit more formal than the instinctive thinking about risk that we do all the time. And to help us do that, we need a couple of definitions to get us started.
What is Risk?
First of all, what is Risk? It’s a combination of two things. First, the severity of a mishap or accident. Second, the probability that that mishap will occur. So it’s a combination of severity and probability. We will see that illustrated in the next slide.
We’ll begin by talking about ‘mishap’. Well, what is a mishap? A mishap is an event – or a series of events – resulting in unintentional harm. This harm could be death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.
The particular standard we’re looking at today covers a range of different harms. That’s why we’re focused on safety. And the term ‘mishap’ will also include negative environmental impacts from planned events. So, even if the cause is a deliberate event, we will include that as a mishap.
Probability and Severity
I said that the definition of risk was a combination of probability and severity. Here we’ve got a little illustration of that…
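As a minimal sketch of that combination, the snippet below ranks severity and probability on ordinal scales and multiplies the ranks to give a risk index. The category names, the 1 to 4 and 1 to 5 scales, and the banding thresholds are all hypothetical, chosen only for illustration. A real standard defines its own matrix, and you should use that one.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ordinal scale: a bigger number means worse harm
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Probability(IntEnum):
    # Hypothetical ordinal scale: a bigger number means more likely
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    PROBABLE = 4
    FREQUENT = 5


def risk_index(severity: Severity, probability: Probability) -> int:
    """Combine the two ranks into a single index (illustrative only)."""
    return int(severity) * int(probability)


def risk_band(index: int) -> str:
    """Map an index to a band. The thresholds are assumptions."""
    if index >= 12:
        return "High"
    if index >= 6:
        return "Medium"
    return "Low"


index = risk_index(Severity.CATASTROPHIC, Probability.REMOTE)
print(index, risk_band(index))
```

The point of the sketch is only that neither number means much on its own. A catastrophic but remote mishap and a marginal but frequent one can end up in the same band.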
This is Module 1 of SSRAP
This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
This post, ‘SSRAP: Start the Course’, gives an overview of System Safety Risk Assessment Programs. It describes the Learning Objectives of the Course and its five modules. We’re going to learn how to:
Describe fundamental risk concepts.
Explain what a Systems Safety Approach to Risk is.
Define within that System Safety Approach, what a Risk Analysis Program is.
List Hazard Analysis Tasks that make up a program.
Welcome to this course on System Safety Risk Analysis Programs. It’s a five-part course for beginners and practitioners. It will also benefit a wider range of people.
Learning Objectives
In this course, we will learn how to do several things. First of all, we’re going to learn how to describe fundamental risk concepts. We’re going to explain what a Systems Safety Approach to Risk is and what it does. We will define within that System Safety Approach, what a Risk Analysis Program is. We’re going to be able to list Hazard Analysis Tasks that make up a program. We’ll be able to select tasks to meet our needs.
At the end of this course, we should be able to design a tailored Risk Analysis Program for any application. And also, we’re going to learn how to get some more information resources on how to do that.
Topics for this Course
So how is that going to work? Well, in five modules. In Module One, we’re going to go over some risk basics. The reason for this is to make sure we’ve got a common understanding.
In Module Two, we’re going to look at Systems Safety Risk Analysis. What it is, what it does, and the benefits it delivers.
In Module Three, we will look at a particular System Safety Program Standard. We will understand what it was designed to do and learn what it’s good and not so good at.
In Module Four, we’re going to take all the previous knowledge from Modules One to Three and put it together. We will use that information to design a Risk Analysis Program. This information can also help design any number of programs depending on what we want to do.
And then finally, in Module Five, we’ll look at where to get more resources to take us deeper to the next level…
This is SSRAP: Start of the Course
This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
In this post, we will look at Three Insightful Methods for Causal Analysis. Only three?! If you search online, you will probably find eight methods coming up:
Pareto Charts;
Failure Mode and Effect Analysis (FMEA);
Five Whys;
Ishikawa Fishbone Diagram;
Fault Tree Analysis;
8D Report Template Checklist;
DMAIC Template; and
Scatter Diagrams.
However, not all these methods are created equal! Only some provide real insight into the challenge of causal analysis. So, I’ve picked the best ones – based on my 25 years’ experience in system safety – and put them in this post.
What are Causes and Why are They Important?
Before we go any further, I just want to explain some basic terms. When we’re doing safety analysis we have hazards. As the bow tie diagram suggests, one hazard can have many causes and many consequences.
Now, some of those consequences will be harmless but some may result in harm to people. And that progression from causes to hazards to consequences is known as an accident sequence. So, we tend to look at the worst-case scenario, where somebody gets hurt.
(It’s not really the focus of this post, but the test for a hazard is that it’s necessary for the accident. If there’s no hazard, there’s no accident. Once the hazard is present, nothing else weird or unusual needs to happen for the accident to occur. So, the hazard is both necessary and sufficient.)
I’ve mentioned consequences, but today we’re talking about causes. So, we will analyze the left-hand side of the bow tie.
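If it helps to see the bow tie as a data structure, here is a minimal sketch. The class names, the fields, and the car braking example are all hypothetical, made up to show the shape: many causes on the left, one hazard in the middle, many consequences on the right.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Consequence:
    description: str
    harmful: bool  # True if somebody could get hurt


@dataclass
class Hazard:
    description: str
    causes: List[str] = field(default_factory=list)
    consequences: List[Consequence] = field(default_factory=list)

    def accident_sequences(self) -> List[Tuple[str, str, str]]:
        """Every cause -> hazard -> harmful consequence path."""
        return [
            (cause, self.description, consequence.description)
            for cause in self.causes
            for consequence in self.consequences
            if consequence.harmful
        ]


# Invented example, for illustration only
hazard = Hazard(
    description="Loss of braking while driving",
    causes=["Brake fluid leak", "Worn brake pads"],
    consequences=[
        Consequence("Car coasts to a stop on an empty road", harmful=False),
        Consequence("Collision that injures the occupants", harmful=True),
    ],
)

for cause, hazard_name, outcome in hazard.accident_sequences():
    print(f"{cause} -> {hazard_name} -> {outcome}")
```

Causal analysis works on the `causes` list: the more completely we fill in the left-hand side, the more accident sequences we can see and interrupt.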
Three Insightful Causal Analysis Methods
Pareto Analysis
So, let’s start with a Pareto Analysis. I suspect most of us have seen this before. If we look at the causes of a certain outcome, what we often find is that a few causes are dominant.
In this chart, we’ve got types of medication errors. In this case, ‘dose missed,’ ‘wrong time,’ ‘wrong drug,’ and ‘overdose’ account for 70% of the causation. Everything else is only 30%.
(Now, here they drew a line at 80% as the cutoff because sometimes Pareto is known as the eighty-twenty rule. That suggests that maybe 80% of the outcome is caused by 20% of the inputs or causes. In other words, most of the output variable is driven by only 20% of the input variables. That’s just a rule of thumb, and it doesn’t have to be 80/20; it might be 70/30 or 60/40. It doesn’t matter.)
The point is there are some dominant causes. If we can identify the dominant causes, and we work hard on just those top 2, 3, 4, or 5 causes, then we can get a disproportionate reduction in risk by concentrating on those few things. Whereas, we could spend an awful lot of effort at attacking all the other causes and make very little difference.
It’s a simple technique, but by being led by the data we can become far more effective at risk management.
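Here is a minimal sketch of the arithmetic behind a Pareto chart. The counts are invented for illustration (they are not the data from the chart), and I have picked them so that the top four causes add up to 70% of the total. The function names and the cutoff parameter are my own, hypothetical choices.

```python
from typing import Dict, List, Tuple


def pareto_table(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Sort causes by count, largest first, with cumulative percentage."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table = []
    running = 0
    for cause, count in ranked:
        running += count
        table.append((cause, count, 100.0 * running / total))
    return table


def dominant_causes(counts: Dict[str, int], cutoff_percent: int = 80) -> List[str]:
    """Smallest set of top causes whose combined share reaches the cutoff."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    running = 0
    for cause, count in ranked:
        chosen.append(cause)
        running += count
        # Integer comparison avoids floating-point surprises at the boundary
        if running * 100 >= cutoff_percent * total:
            break
    return chosen


# Made-up counts of medication errors, for illustration only
errors = {
    "Dose missed": 32, "Wrong time": 18, "Wrong drug": 12, "Overdose": 8,
    "Wrong patient": 7, "Wrong route": 6, "Wrong rate": 5,
    "Expired drug": 4, "Wrong form": 4, "Duplicate dose": 4,
}

for cause, count, cumulative in pareto_table(errors):
    print(f"{cause:<15} {count:>3} {cumulative:5.1f}%")

top_four = dominant_causes(errors, cutoff_percent=70)
```

With these invented numbers, a 70% cutoff picks out four of the ten causes, and an 80% cutoff picks out six. Either way, that short list is where the effort should go first.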
Ishikawa Fishbone Diagram
An Ishikawa diagram, or a fishbone diagram as it’s often called for obvious reasons, is a causal diagram (Image by FabianLange at de.wikipedia), and it’s often used.
In accident investigations, the Ishikawa diagram becomes a vital tool. I recall learning its application through the tragic case of the Piper Alpha oil rig disaster. Despite the grim nature of such events, they demand thorough causal analysis. Whether we opt for predefined groupings like equipment, process, people, materials, environment, and management, or let the data guide us, the essence remains unchanged: we investigate accidents to identify potential outcomes or problems and determine their contributing factors.
What makes this method invaluable is its ability to transcend technical issues alone. By encouraging us to consider the broader socio-technical environment, it prompts a holistic view of complex systems. The diagram visually represents primary causes directly linked to the main ‘fishbone’ of analysis, while secondary causes may contribute to or stem from these primary factors. The potential for tertiary causes exists in theory, but it may complicate matters without appropriate tools.
Utilizing this technique for brainstorming is highly effective. Displaying it on a whiteboard and collectively contemplating it as a group fosters focused discussions. Subsequently, formal documentation in various formats ensures thorough record-keeping. This method proves particularly powerful for unraveling complexities within systems, a topic worthy of a dedicated webinar.
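When it comes to writing up the whiteboard afterwards, one simple way to record a fishbone is as nested data: categories, then primary causes, then secondary causes. This is only a sketch under my own assumptions. The problem, the causes, and the function names are invented placeholders, and the six categories are the predefined groupings mentioned above.

```python
# Hypothetical fishbone for an invented problem; every cause is a placeholder
fishbone = {
    "effect": "Pump fails to start",
    "bones": {
        "Equipment": {"Motor burnt out": ["Overdue maintenance"]},
        "Process": {"Start-up checklist skipped": []},
        "People": {"Operator unfamiliar with pump": ["No refresher training"]},
        "Materials": {},
        "Environment": {"Water ingress": ["Damaged seal"]},
        "Management": {"Maintenance budget cut": []},
    },
}


def outline(diagram):
    """Flatten the diagram into indented text lines for a report."""
    lines = [f"Effect: {diagram['effect']}"]
    for category, primaries in diagram["bones"].items():
        lines.append(f"  {category}")
        for primary, secondaries in primaries.items():
            lines.append(f"    - {primary}")
            for secondary in secondaries:
                lines.append(f"        * {secondary}")
    return lines


def empty_categories(diagram):
    """Categories with no causes yet: a prompt for more brainstorming."""
    return [name for name, primaries in diagram["bones"].items() if not primaries]


print("\n".join(outline(fishbone)))
print("Still to explore:", empty_categories(fishbone))
```

Listing the empty categories is a cheap check that the group has looked beyond the purely technical causes.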
Fault Tree Analysis
Fault Tree Analysis is another widely used technique. We’ll have a webinar devoted to FTA later.
The Eight Disciplines Method
The Eight Disciplines method is one of those I often get mixed up with something else. It was introduced by the Ford Motor Co. (I’ve never used it) but it looks like a sensible method. There are actually nine steps:
Prepare and Plan
Form your Team
Identify the Problem
Develop an Interim Containment Plan
Verify Root Causes & Escape Points
Choose Permanent Corrective Actions
Implement Corrective Actions
Take Preventative Measures
Celebrate with Your Team!
Effective problem-solving requires careful planning, especially when it’s a team effort. Let’s break it down into three key steps:
Immediate Action: Start by addressing the urgency. What can we do right now to contain the problem while we develop a more comprehensive solution? It’s crucial to manage the issue in the short term as we work on a more refined approach.
Identify Root Causes: Investigate when and how the situation spiraled out of control. Pinpoint the opportunities for errors within the process. Understanding the root causes and timing issues is essential before moving forward.
Implement Permanent Solutions: Now that we’ve dissected the problem, it’s time to implement long-term corrective actions. This involves establishing better control measures and preventive strategies to avoid similar issues in the future.
Finally, it’s important to celebrate with your team once the solution is in place. Whether it’s going out for a meal or another form of recognition, acknowledging the effort is crucial.
This structured approach acknowledges the multi-stage nature of problem-solving. It emphasizes the need for short-term fixes, data-driven decision-making for long-term solutions, and proactive measures to prevent recurrences. Even if you take away nothing else, remembering these key points can guide you through the process. For more detailed information, check out the provided link, and stay tuned for a downloadable PDF with additional resources.
Bonus – Cause Analysis Reports
And a little bonus here, something I picked up while looking through this stuff: if you go to smartsheet.com, you’ll find a whole bunch of nice templates for cause analysis reports. I haven’t been through them all, but there looks to be quite a lot of good stuff in there if you’re interested.
More Resources
Interested in accessing more content from the Safety Artisan? Head over to my Thinkific platform, where you’ll find my courses and all the webinars available at the academy. Plus, you can test it out with a 7-day free membership trial. For those looking for an extended trial, use the code ‘one-month-free’ to enjoy a full month on us. I am continually updating the content, adding new material every month to keep things fresh.
Additionally, sign up for free email updates to stay informed about upcoming webinars and other exciting events.
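In this ‘Introduction to System Safety Risk Assessment’, we will pull together several key ideas.
First, we’ll talk about System Safety. This is safety engineering done in a Systems Engineering Framework. We are doing safety within a rigorous process.
Second, we’re talking about Risk Assessment. This is a term for putting together different activities within another process. This process may be basic, or it might be quite sophisticated, as illustrated, below.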
Third, and finally, we will put all this together into a System Safety Program. This is hinted at in the diagram, above, but a real system safety program needs to do a lot more than this. It needs to tie into the project it supports, to systems engineering, to resources, quality, V&V, etc. Designing such a program is complex, so we typically follow a standard, like Mil-Std-882E.
You can hear more about this in the introductory video, below.
This Post is the Intro to the System Safety Risk Assessment Programs Course.
Welcome to this course on Systems Safety Risk Analysis Programs. I’m Simon Di Nucci, The Safety Artisan, and I’ve been a safety engineer and consultant for over 20 years. I’ve worked on a wide range of safety programs doing risk analysis on all kinds of things. Ships, planes, trains, air traffic management systems, software systems, you name it.
I’ve worked in the U.K., in Australia, and on many systems from the U.S. I’ve also spent hundreds of hours training hundreds of people on safety. And now I’ve got the opportunity to share some of that knowledge with you online.
So, what are the benefits of this course?
First of all, you will learn about basic concepts. About system safety, what it is and what it does. You will know how to apply a risk analysis program to a very complex system and how to manage that complexity. So, that’s what you’ll know.
At the end of the course, you will also be able to do things that you might not have been able to do before. You will be able to take the elements of a risk analysis program and the different tasks. You can select the right tasks and form a program to suit your application, whatever it might be. Whether you might:
Have a full, high-risk bespoke development system,
Be taking a commercial system off the shelf and doing something new with it, or
Take a product and use it in a new application or a new location.
Whatever it might be, you will learn how to tailor your risk analysis program. This program will give you the analyses you need, and help you to meet your legal and regulatory requirements. Once you’ve learned how to do this, you can apply it to almost any system.
Finally, you will feel confident doing this. I will be interpreting the terminology used in the tasks and applying my experience. So, instead of reading the standard and being unsure of your interpretation, you can be sure of what you need to do. Also, I will show you how you can get good results and avoid some of the pitfalls.
These are the three benefits of the Course
You will know what to do.
You will be able to perform risk program tasks, and
You’ll feel confident doing those tasks.
At the end of the course, I will also show you where to find further resources. There are free resources to choose from. But there are also paid resources for those who want to take their studies to the next level. I hope you enjoy the course.
The 2024 Blog Digest – Q1/Q2 brings you all of The Safety Artisan’s blog posts from the first six months of this year. I hope that you find this a useful resource!
When Understanding Your Risk Assessment Standard, we need to know a few things. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we will ask… Read more: Understanding Your Risk Assessment Standard
Welcome to Risk Management 101, where we’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts and then we’re going to build it up again and show you how it’s done. I’ve been involved in risk management, in project risk management, safety risk management,… Read more: Risk Management 101
In this module, System Safety Risk Analysis, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.
TL;DR Updating Legal Presumptions for Computer Reliability must happen if we are to have justice! Background The ‘Horizon’ Scandal in the UK was a major miscarriage of justice: Between 1999 and 2015, over 900 sub postmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions… Read more: Updating Legal Presumptions for Computer Reliability
What are the Hazard and Risk basics? So, what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it? When? Why? Who…? In this free session, I explain the basic terms and show how they link together, and how we can break them… Read more: Hazard and Risk Basics
This post, ‘SSRAP: Start the Course’, gives an overview of System Safety Risk Assessment Programs. It describes the Learning Objectives of the Course and its five modules. We’re going to learn how to: This post is part of a series: SSRAP: Start of the Course – Transcript Welcome to this course on System Safety Risk… Read more: SSRAP: Start the Course
In this post, we will look at Three Insightful Methods for Causal Analysis. Only three?! If you search online, you will probably find eight methods coming up: However, not all these methods are created equal! Only some provide real insight to the challenge of causal analysis. So, I’ve picked the best ones – based on… Read more: Three Insightful Methods for Causal Analysis
In this ‘Introduction to System Safety Risk Assessment’, we will pull together several key ideas. First, we’ll talk about System Safety. This is safety engineering done in a Systems Engineering Framework. We are doing safety within a rigorous process. Second, we’re talking about Risk Assessment. This is a term for putting together different activities within… Read more: Introduction to System Safety Risk Assessment
The 2024 Blog Digest – Q1/Q2 brings you all of The Safety Artisan’s blog posts from the first six months of this year. I hope that you find this a useful resource! The 2024 Blog Digest – Q1/Q2: 25 Posts! There’s More! Head over to my Thinkific Site for courses & webinars. Subscribe for a… Read more: The 2024 Blog Digest – Q1/Q2
This is the full-length (one hour) session on Environmental Hazard Analysis (EHA), which is Task 210 in Mil-Std-882E. I explore the aim, task description, and contracting requirements of this Task, but this is only half the video. In the commentary, I then look at environmental requirements in the USA, UK, and Australia, before examining how… Read more: Environmental Hazard Analysis
In this full-length (38-minute) session, The Safety Artisan looks at System of Systems Hazard Analysis, or SoSHA, which is Task 209 in Mil-Std-882E. SoSHA analyses collections of systems, which are often put together to create a new capability, which is enabled by human brokering between the different systems. We explore the aim, description, and contracting… Read more: System of Systems Hazard Analysis
In this full-length (55-minute) session, The Safety Artisan looks at Health Hazard Analysis, or HHA, which is Task 207 in Mil-Std-882E. I explore the aim, description, and contracting requirements of this complex Task. It covers: physical, chemical & biological hazards; Hazardous Materials (HAZMAT); ergonomics, aka Human Factors; the Operational Environment; and non/ionizing radiation. I will… Read more: Health Hazard Analysis
Get the Preliminary Hazard Identification & Analysis Guide for free! It’s a 50-page .pdf download, collated from reliable sources. Contents: Preliminary Hazard Identification & Analysis Guide – Introduction Hazard Identification has been defined as: “The process of identifying and listing the hazards and accidents associated with a system.” Hazard Analysis has been defined as: “The… Read more: Preliminary Hazard Identification & Analysis Guide: Free
So, what I’m talking about today is safety and risk audit, that is about process, Q&A, and some personal experience. Also something called layered process audits, which I ran into while researching this webinar. I thought that sounded interesting – and it is! Those are today’s topics for the webinar. Audit Process I’m talking about… Read more: Safety and Risk Audit
In this full-length session, I look at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. I explore Task 206’s aim, description, scope, and contracting requirements. There’s value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the… Read more: Operating & Support Hazard Analysis
In this 45-minute session, I’m looking at System Requirements Hazard Analysis, or SRHA, which is Task 203 in the Mil-Std-882E standard. I will explore Task 203’s aim, description, scope, and contracting requirements. SRHA is an important and complex task, which must be done on several levels to succeed. This video explains the issues and discusses… Read more: System Requirements Hazard Analysis
So, how do we identify and analyze functional hazards? I’ve seen a lot of projects and programs. We’re great at doing the physical hazards, but not so good at the functional hazards. So, when I talk about physical and functional hazards, the physical stuff, I think we’re probably all very familiar with them. They’re all… Read more: Identify and Analyze Functional Hazards
So today, we’re talking about the Foundations of System Safety assessment. And as it says, it’s a free webinar from The Safety Artisan, and it’s one of a series. So, before we go on, I’ll just introduce myself. Why should you bother to listen to me? Well, in 25 years of experience in system safety,… Read more: Foundations of System Safety
TL;DR This article on Failure Mode Effects Analysis explains this powerful and commonly used family of techniques. You can access this webinar (and all the others) here. I have used FMEA and related techniques on many programs and they can produce powerful results quickly and cheaply. Recently, I’ve seen some criticism of FMEA on social… Read more: Failure Mode Effects Analysis
In my webinar ‘Five Ways to Identify Hazards’ I look at a mix of techniques. We need these diverse techniques to assure us (give justified confidence) that we have identified the full range of hazards associated with a system. To do this I draw on my 25 years of experience (see ‘Meet the Author‘, below)… Read more: Five Ways to Identify Hazards
In this post, ‘Exploring Causal Analysis: Techniques and Insights’, I provide a quick summary of my recent webinar. You can see a short video introduction below, or access the full webinar at my Safety Engineering Academy. Introduction: Causal analysis is a vital aspect of system safety engineering, offering insights into the root causes of issues… Read more: Exploring Causal Analysis: Techniques and Insights
In this post ‘Full Function Hazard Logs: A Deep Dive into Relational Databases’, I explore some things we can do with a hazard log built upon a database. In my 25-year career in safety engineering, I’ve seen many hazard logs and hazard tracking systems. Most of them were hosted in Microsoft Excel, but there were… Read more: Full Function Hazard Logs: A Deep Dive into Relational Databases
In this 45-minute session, I look at System Hazard Analysis with Mil-Std-882E. SHA is Task 205 in the Standard. I explore Task 205’s aim, description, scope, and contracting requirements. I also provide commentary, based on working with this Standard since 1996, which explains SHA. How to use it to complement Sub-System Hazard Analysis (SSHA, Task… Read more: System Hazard Analysis with Mil-Std-882E
In this video, I look at Functional Hazard Analysis with Mil-Std-882E (FHA, which is Task 208 in Mil-Std-882E). FHA analyses software, complex electronic hardware, and human interactions. I explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (I refer to other lessons for special techniques for software safety… Read more: Functional Hazard Analysis with Mil-Std-882E
In this 45-minute session, I look at how to do a Preliminary Hazard Analysis with Mil-Std-882E. Preliminary Hazard Analysis, or PHA, is Task 202 in the Standard. I explore Task 202’s aim, description, scope, and contracting requirements. There’s value-adding commentary, and I explain the issues with PHA – how to do it well and avoid… Read more: How to do Preliminary Hazard Analysis with Mil-Std-882E
There’s More!
Head over to my Thinkific Site for courses & webinars. Subscribe for a free course starter pack and regular email support. Leave a comment below!
Meet the Author
Learn safety engineering with me, an industry professional with 25 years of experience. I have:
• Worked on aircraft, ships, submarines, ATMS, trains, and software;
• Worked on programs from tiny to some of the biggest (Eurofighter, Future Submarine);
• Worked in the UK and Australia, on US and European programs;
• Taught safety to hundreds of people in the classroom, and thousands online; and
• Presented on safety topics at several international conferences.
This is the full-length (one hour) session on Environmental Hazard Analysis (EHA), which is Task 210 in Mil-Std-882E. I explore the aim, task description, and contracting requirements of this Task, but this is only half the video. In the commentary, I then look at environmental requirements in the USA, UK, and Australia, before examining how to apply EHA in detail under the Australian/international regime. This uses my practical experience of applying EHA.
You Will Learn to:
Conduct EHA according to the standard;
Record EHA results correctly;
Contract for EHA successfully;
Be aware of the regulatory scene in the US, UK, and Australia;
Appreciate the complexities of conducting EHA in Australia; and
Recognize when your EHA program requires specialist support.
Hi, everyone, and welcome to the Safety Artisan. Today, we’re going to be talking about Environmental Hazard Analysis, which is a big topic! I’m covering this as part of the series on the System Safety Engineering Standard, Mil-Std-882E. But it doesn’t really matter what standard we are using; the topic is still relevant.
Environmental Hazard Analysis is a big topic because we’ll cover everything, not just hazards. At the end of this session, you should be able to enjoy three benefits. First of all, you should know how to approach Environmental Hazard Analysis from:
The point of view of the requirements,
The Hazard Analysis itself (the process), and
Some national and international variations in the English-speaking world.
Second, you should know how to do the basics and also recognize when you may need to bring in a specialist.
But maybe most important of all, number three: you should have the confidence to get started. I’m hoping that this session will help you get started, know what you can do, and then recognize when you need to bring in some specialist help or seek some further information.
As you’ll see, it’s a big, complex subject. I can get you started today, but that’s all I can do in one session. And in fact, I think that’s all anyone can do in one session. Anyway, let’s get on with it and see what we’ve got.
So, this is Environmental Hazard Analysis, which is Task 210 under Mil-Std-882E. Let’s look at what we’re going to talk about today.
Topics for this Session
And you’ll see why it’s going to be quite a lengthy session. I think it will last an hour because we’re going to go through the Purpose and Task Description of Environmental Hazard Analysis as set out in the Standard. It says seven-plus slides because there are seven mainstream slides plus some illustrations as well. Then we’ve got a couple of slides each on Documentation, Hazardous Materials (HAZMAT), and Contracting. Then there are eight slides of Commentary, and this is the major value-add, because I’ll be talking about applying Environmental Hazard Analysis in the US, UK, and Australian jurisdictions under their different laws, of which I have some experience.
I worked closely with environmental specialists on the Eurofighter Typhoon project, and I’ve also worked closely with the same specialists on US programs which had been bought by different countries. And then finally, I’ve been closely involved in a major environmental (or rather, safety and environmental) project here in Australia. So I’ve learned the hard way about how things work, or don’t work, here in Australia. I’ve got some relevant experience to share with you, as well as some learned material. And then there’s a short Conclusion. As I say, this will take us an hour, so there’s quite a lot of material to cover. So, let’s get right on with it.
EHA
So the purpose of Environmental Hazard Analysis, or EHA, as it says, is to support design development decisions. Now all of the 882 tasks are meant to do this, but actually, the wording in Task 210 is the clearest of all of them. It really makes explicit what we’re trying to do, which is excellent.
So we’re going to identify hazards throughout the life cycle, cradle to grave, whatever system it is. We’re going to document and record those hazards and their leading particulars within the Hazard Tracking System or Hazard Log, as we more often call it. We’re going to manage the hazards using the same system safety process in Section Four of the Standard as we use for safety. This is the process that you will have heard about in the other lessons that I’ve given. And very often under 882, Safety and Environmental Hazards are considered together. There are pros and cons with that approach, but nevertheless, a lot of the work is common. We’ll see why later on.
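To make the idea of recording hazards and their leading particulars a little more concrete, here is a minimal sketch in Python of what a Hazard Log entry might hold. Everything in it is an illustrative assumption: the field names, the lifecycle phases, and the helper methods are hypothetical and are not taken from Mil-Std-882E or from any particular Hazard Tracking System.

```python
from dataclasses import dataclass, field
from enum import Enum


class LifecyclePhase(Enum):
    # Hypothetical cradle-to-grave phases, chosen for illustration only.
    DEVELOPMENT = "development"
    PRODUCTION = "production"
    OPERATION = "operation"
    SUPPORT = "support"
    DISPOSAL = "disposal"


class HazardType(Enum):
    # Safety and environmental hazards kept in one log, as is often done under 882.
    SAFETY = "safety"
    ENVIRONMENTAL = "environmental"


@dataclass
class HazardRecord:
    # The "leading particulars" here are assumed fields, not a standard schema.
    hazard_id: str
    description: str
    hazard_type: HazardType
    phases: set[LifecyclePhase]
    mitigations: list[str] = field(default_factory=list)
    closed: bool = False


@dataclass
class HazardLog:
    records: dict[str, HazardRecord] = field(default_factory=dict)

    def add(self, record: HazardRecord) -> None:
        # Reject duplicate identifiers so each hazard is tracked exactly once.
        if record.hazard_id in self.records:
            raise ValueError(f"duplicate hazard id: {record.hazard_id}")
        self.records[record.hazard_id] = record

    def open_hazards(self, hazard_type: HazardType) -> list[HazardRecord]:
        # Hazards of the given type that have not yet been closed.
        return [
            r for r in self.records.values()
            if r.hazard_type is hazard_type and not r.closed
        ]

    def uncovered_phases(self, hazard_type: HazardType) -> set[LifecyclePhase]:
        # Lifecycle phases with no recorded hazard of this type:
        # a prompt to go back and look, not proof that none exist.
        covered: set[LifecyclePhase] = set()
        for r in self.records.values():
            if r.hazard_type is hazard_type:
                covered |= r.phases
        return set(LifecyclePhase) - covered


log = HazardLog()
log.add(HazardRecord(
    "ENV-001",
    "Fuel spill during refuelling",
    HazardType.ENVIRONMENTAL,
    {LifecyclePhase.OPERATION, LifecyclePhase.SUPPORT},
))
log.add(HazardRecord(
    "SAF-001",
    "Fire during refuelling",
    HazardType.SAFETY,
    {LifecyclePhase.OPERATION},
))
gaps = log.uncovered_phases(HazardType.ENVIRONMENTAL)
```

In this sketch, `gaps` lists the lifecycle phases for which no environmental hazard has been recorded yet, which is one simple way to check cradle-to-grave coverage. Keeping safety and environmental hazards in the same log, separated only by a type field, reflects the point that much of the work is common.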
In this American standard, it says we are to provide specific data to support the National Environmental Policy Act (NEPA) and executive order requirements. NEPA is an American piece of legislation, and therefore I use this color blue to indicate anything that’s an American-specific requirement. So if you’re not operating in America, you’ll need to find the equivalent requirements to manage to and comply with. Moving on…