Members Get a Free Intro Course, 50% Off & Updates. I will send you the links and discount codes via email. So, tick the email box and check your junk mail to receive the offers. You will get an email series showcasing the free/paid resources. Also, regular updates on new articles: never miss another post!
Video Introduction
Introduction from the Beginning – the Key Things Haven’t Changed!
Introduction to the Safety Artisan
Hi everyone, and welcome to The Safety Artisan, a series of instructional videos on safety. I’m Simon, and I’m a safety engineer and consultant with over 20 years of experience (now 25+) working in systems safety, safety engineering, safety in design, and a whole bunch of related disciplines, including software safety. I’ve got a lot of background in this area, and I hope that you’ll be able to support this enterprise. The tagline for the business is ‘professional, pragmatic, and impartial’. But what does that really mean? Well, in my time as a safety engineer and consultant, I’ve worked with lots of clients…
… doing many different things:
Aerospace, air traffic management, software, ships and submarines, other transport systems, etc. I often find that clients are making two kinds of mistake. They’re either not doing enough work to meet their obligations, or they’re doing too much work. The first one is perhaps obvious, as safety standards and safety legislation are very demanding. People aren’t always aware of what their obligations are, and therefore they’re not always meeting them. But when you’re a consultant and, it must be said, demanding a lot of money from clients to do this work, I think the suspicion is sometimes that the consultant is just asking to do more work to get more money.
Now, that’s not actually what ethical consultants do, but I’m sure not everyone believes that. So, here, I hope to get away from that paradigm, and we can actually share information just because it’s factual. Accepting what I say doesn’t mean that I’ll take any more money off you and you can check out what you see and decide whether you like it. The other issue is perhaps less obvious: people do too much work. But the reality is there are people all over the place doing safety work that just doesn’t make a difference – i.e. it doesn’t demonstrate that you’ve met requirements or that risk is managed.
And that’s also a difficult sell…
…because questioning what the tribe is doing, questioning the culture of the organization is difficult and frankly risky for individuals. So they don’t want to do it. So again, here in the privacy of a video, it’s just you and me. I can tell you stuff, you can give me feedback on the website or at Patreon.com. You can ask questions and hopefully we can get to a better understanding of the facts, without worrying about sums of money changing hands or convincing your peers that change is necessary.
So I hope you find this helpful and I hope you’re able to support me… [I’m not on Patreon anymore!] … You can always look me up on LinkedIn and check out my experience and qualifications. Thanks very much for listening and I look forward to talking to you again.
Remember: Members Get a Free Intro Course, 50% Off & Updates!
Meet the Author
Learn safety engineering with me, an industry professional with 25 years of experience. I have:
•Worked on aircraft, ships, submarines, ATMS, trains, and software;
•Worked on everything from tiny programs to some of the biggest (Eurofighter, Future Submarine);
•Worked in the UK and Australia, on US and European programs;
•Taught safety to hundreds of people in the classroom, and thousands online;
•Presented on safety topics at several international conferences.
More Resources for Risk Assessment Programs Course
Welcome to Module Five, More Resources for Risk Assessment. We’re on the home straight now! This is the last of the five modules. I will let you know where to get more resources and help on these topics.
Course Learning Objectives
Describe fundamental risk concepts;
Explain what a system safety approach is and does;
Define what a risk analysis program is;
List the hazard analysis tasks that make up a program;
Select tasks to meet your needs;
Design a tailored risk analysis program for any application; and
Know how to get more information and resources.
More Resources for Risk Assessment: Transcript
Copyright/Source Statement
“First, I want to point out that I’ve been referring to a standard: Military Standard 882E, a copyright-free publication. It’s a US standard and is available to download for free at many different locations. One of them is the US Defense Acquisition University. As far as I can tell, this is the official home of it now. You can search for ‘DAU’ or ‘Defense Acquisition University’ [to find it]. When you go there, there is a search function, which is very good, and you’ll find 882E very easily. But here’s the link for reference now.
So that is copyright-free. This presentation, of course, is copyright The Safety Artisan (2021). But it’s also worth saying that there’s a lot more out there – more help you can get than the standard by itself. The Defense Acquisition University, for some reason, doesn’t seem to publish much on 882E, either in the way of guidance or help on how to use the standard.
For More…
If you want more information, please feel free to go to The Safety Artisan channel on YouTube; subscribe to the channel and click on the bell symbol to be informed whenever a new video comes out. There are lots of free videos on The Safety Artisan channel, and also short, free demo versions of the paid videos. So, if you want to look at a video to see whether you think it’s worth buying, there will be a free version on there: either a two-minute version with subtitles or, for a lot of the lessons, the first seven minutes of the lesson. So you can get a flavor of what’s there.
And then for more videos and resources, you can visit this site, www.safetyartisan.com. That’s got all the information there. It’s a secure site. Here you can sign up for regular emails from The Safety Artisan. And that will get you a free Course Triple Bundle. Please feel free to help yourself and look at the free goodies!
Mil-Std-882E Analysis Tasks
But also, there are ‘paid lessons’ on each one of the 10 [Mil-Std-882E] Tasks. Most of the lessons are about forty-five minutes; some are a little shorter at thirty-five minutes. The Environmental one is an hour, as is the Health Hazard Analysis one, because those are very complex tasks. So, they vary from about 35 to 60 minutes in length each.
What and Why?
And for each of those video training sessions, you will get some in-depth training on each task. Your training video will include a full description of the task, plus a commentary that I provide. You will get a full written transcript of the video as well. And if you go there, the page will tell you the benefits of each task: what it’s designed to do and how to apply it; its pros and cons; and my expert tips, from long and sometimes bitter experience, on how to get the most out of these tasks, as well as pitfalls to avoid.
In Conclusion – Learning Objectives
Let’s recap, for this entire course, the five modules. From Module One, you should now be able to describe fundamental risk concepts. From Module Two, you should be able to explain what a system safety approach is and does. You should be able to define what a risk analysis program is. You should be able to list the Hazard Analysis Tasks that make up a Safety Program (or Risk Analysis Program).
Critically, you should be able to select the tasks that meet your needs. And by doing that repeatedly, you should then be able to design a tailored Risk Analysis Program for pretty much any application. And in the final module, you will have learned how to get more information, and where to find more in-depth resources on each of those 10 tasks, in case you need to go to the next level.
So, that’s what we’ve covered in this session.
End
And it just remains for me to say thanks very much for buying this [course] video and supporting the work of The Safety Artisan. I’m Simon and I would like to say a personal thanks very much to you. Goodbye and hope to see you again soon.”
This is Module 5 of SSRAP
This is Module 5 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on sale now, so check out all the free preview videos here!
The Ten Viewpoints of Mil-Std-882E Hazard Analyses
Designing Your Risk Assessment Program. Which Ingredients should we use? In this post, I draw upon my 25+ years in system safety to give you some BOLD advice! I’m going to dare to suggest which analysis tasks are essential to every System Safety Program. I also suggest which tasks are optional depending on the system that you are analyzing.
Which Ingredients should we use?
Everything (high novelty, challenging requirements, bespoke development, and massive scrutiny);
The Bare Essentials;
New Designs and Integrations;
The Human Element;
Electronics, Software and Data;
Combining existing Systems; and
Environmental Protection.
Video Highlights
Designing Your Safety Program – Highlights (SSRAP M4)
Topics
Designing Your Risk Assessment Program: Transcript
We’re onto Module Four – Designing Your Program.
This module aims to show you how to design a systematic, effective strategy for Risk Analysis. An effective program for Risk Analysis that isn’t wasteful. This module is a little bit longer than the others but bear with me! This is the real meat of what I promised you. So, let’s get started.
Multiple Points of View
As I said on a previous slide, we will use multiple points of view to look at the system from many different angles.
Ten different angles, in this case: one for each task. Each of those tasks brings a different perspective, and each has a different purpose. What they have in common is that they are all there to bring out a different aspect of the system. They are different kinds of analysis, but they all have the same aim: to identify and analyze hazards.
From that, we can then identify what we need to do to control those hazards. And that, in turn, gives us safety requirements. Sometimes they’re called ‘derived safety requirements’. They need to be met for the system to be safe. That’s the whole point of what we’re doing, as mentioned before.
Which Ingredients?
Now, you only need all 10 of those tasks if everything is ‘in the red’. Perhaps you’ve got a very novel system. You’ve got challenging performance requirements. You’ve got lots of bespoke development. And you’ve got a very critical system that’s going to get a lot of scrutiny. So, you need all 10 only if you’ve got a development from hell, where the development is very challenging and you need all the tools you can get.
Now, that’s fine. That’s what the standard’s designed for. But very rarely are we going to work on a program where we’re pulling out all the stops. More often, we’re going to be working on something where there are some challenging areas and some less so. And we don’t need the entire program. We don’t need all 10 tasks to achieve success. And it’s OK to tailor your safety analysis to deliver value for money. In fact, this approach is better.
So, we’ve got some options here. I’m going to take you through the bare essentials – what you need to do for every safety program. Then the work that we would do to address new designs and new integrations. Then work that we would do to address the human element. This includes both parts of human factors: the human contribution to safety, and the impact that the system might have on human health. So, there’s a bit of back and forth between those two tasks.
Then, if our system has programmable electronics or software, we might need to look at that. Or if it has data that is being developed or modified, we need to look at that too, and assess the safety implications of the modifications or development. We might be combining existing systems into a system of systems. And then finally, we might have to address environmental protection. So, the bare essentials plus those five optional elements are the ones that we will look at.
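If it helps to see the tailoring idea written down, here is a minimal Python sketch. The feature names and task groupings are my own illustrative shorthand for the ten Mil-Std-882E analysis tasks, not wording from the standard, so check the standard itself for the authoritative titles:

```python
# Illustrative sketch only: pick which hazard analysis tasks to include in a
# risk assessment program, based on simple yes/no features of the system.
# The groupings and names are shorthand, not quoted from Mil-Std-882E.

ESSENTIALS = [
    "Preliminary Hazard Identification",
    "Preliminary Hazard Analysis",
    "System Requirements Hazard Analysis",
]

OPTIONAL_GROUPS = {
    "new_design_or_integration": ["Sub-System Hazard Analysis", "System Hazard Analysis"],
    "human_element": ["Operating & Support Hazard Analysis", "Health Hazard Analysis"],
    "software_or_data": ["Functional Hazard Analysis"],
    "system_of_systems": ["System-of-Systems Hazard Analysis"],
    "environmental_protection": ["Environmental Hazard Analysis"],
}

def tailor_program(features: dict) -> list:
    """Return the three essentials plus any optional groups the system needs."""
    tasks = list(ESSENTIALS)                      # every program gets these
    for feature, extra_tasks in OPTIONAL_GROUPS.items():
        if features.get(feature, False):          # only add what is relevant
            tasks.extend(extra_tasks)
    return tasks

# Example: a new design with significant software, but no system-of-systems work.
print(tailor_program({"new_design_or_integration": True, "software_or_data": True}))
```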
The Essentials #1
Let’s start with the essentials. I’m going to say it’s axiomatic – that every program needs these three tasks. It needs Preliminary Hazard Identification. It needs Preliminary Hazard Analysis. And it needs System Requirements Hazard Analysis. The last one is about identifying safety requirements for the system.
Now, that’s a very bold statement, isn’t it, for me to say you must have these elements in every safety program? Let me justify that, first of all, before I explain it a little more on the next slide.
The first thing to note is that you can do these tasks early on. If you do them early enough, they are quick and cheap, because the analysis is at low granularity: it can be quick and simple, and therefore cheap. But don’t let that fool you! Getting in early and thinking about risk early gives us valuable insight – insight that we can then take action on. So we get actionable results early enough in the program to do something about them.
The second point to note with these three is that every other task depends on their outputs. Indeed, if you’re going to successfully tailor a safety program, you need the output from these tasks. They will help you focus on what’s important and what’s less important.
Thirdly, from experience, almost every program suffers from not doing these three tasks – whether that be well enough, early enough, or both. I’ve never been on a program where we said, ‘We did too much Preliminary Hazard Identification and Analysis!’, nor ‘We did too much identification of safety requirements!’. That has never, ever happened in more than 20 years of experience working on safety programs.
It’s always been the opposite. We wish we’d done more. We wish we’d gone in earlier with these tasks. Then we would have known something that would have helped us to make sensible decisions. Ultimately, it would have saved a lot of time and money too! Think of these essentials as an investment, because that’s what they are…
This is Module 4 of SSRAP
This is Module 4 of the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
The Ten Viewpoints of Mil-Std-882E Hazard Analyses
When Understanding Your Risk Assessment Standard, we need to know a few things. The standard is the thing that we’re going to use to achieve things – the tool. And that’s important because tools designed to do certain things usually perform well. But they don’t always perform well on other things. So we will ask ‘Are we doing the right thing?’ And ‘Are we doing it right?’
So, what will we do and why are we doing it? First, the use of safety standards is very common for many reasons. It helps us to have confidence that what we’re doing is good enough. We’ve met a standard of performance in the absolute sense. It helps us to say, ‘We’ve achieved standardization or commonality in what we’re doing’.
We can also use it to help us achieve a compromise. That can be a compromise across different stakeholders or different organizations. Standardization gives us some of the other benefits as well. If we’re all doing the same thing rather than we’re all doing different things, it makes it easier to train staff. This is one example of how a standard helps.
However, we need to understand this tool that we’re going to use. What it does, what it’s designed to do, and what it is not designed to do. That’s important for any standard or any tool. In safety, it’s particularly important because safety is in many respects an intangible. This is because we’re always looking to prevent a future problem from occurring. In the present, it’s a little bit abstract. It’s a bit intangible. So, we need to make sure that in concept what we’re doing makes sense and it’s coherent. That it works together. If we look at those five bullet points there, we need to understand the concept of each standard. We need to understand the basis of each one.
They’re not all based on the same concept. Thus, some of them are contradictory or incompatible. We need to understand the design of the standard. What the standard does, what the aim of the standard is, and why it came into existence. And who brought it into existence. To do what for who – who’s the ultimate customer here?
For risk analysis standards, we need to understand what kind of risks the standard addresses, because the way you treat a financial risk might be very different from a safety risk. In the world of finance, you might have a portfolio of products, like loans. These products might have some risks associated with them. One or two loans might go bad and you might lose money on those. But as long as the whole portfolio is making money, that might be acceptable to you. You might say, ‘I’m not worried that 10% of my loans have gone south and all gone wrong. I’m still making plenty of profit out of the other 90%’. It doesn’t work that way with safety. You can’t say, ‘It’s OK that I’ve killed a few people over here, because all this lot over here are still alive!’. It doesn’t work like that!
Also, what kind of evidence does the standard produce? In safety, we are very often working in a legal framework that requires us to do certain things. It requires us to achieve a certain level of safety and prove that we have done so. So, we need certain kinds of evidence. In different jurisdictions and different industries, some evidence is acceptable and some is not; you need to know which is which for your area. And then finally, let’s think about the pros and cons of the standard: what does it do well, and what does it not do so well?
System Safety Pedigree
We’re going to look at a standard called Military Standard 882E. This standard was first developed several decades ago. It was created by the US government and military to help them bring into service complex, cutting-edge military equipment – equipment that was always on the cutting edge, that pushed the limits of what you can achieve in performance.
That’s a lot of complexity – lots of critical weapon systems, and so forth. So they needed something that could cope with all that complexity. It’s a system safety engineering standard. It’s used by engineers, but also by many other specialists. As I said, it’s got a background in military systems, but these days you find these principles used pretty much everywhere. The approaches to System Safety that 882 introduced appear in other standards and in other countries.
It addresses risks to people, equipment, and the environment, as we heard earlier. And because it’s an American standard, it’s about system safety; it’s very much about identifying requirements. What needs to happen for the system to be safe? To answer that, it produces lots of requirements. It performs analyses of all those requirements and generates further requirements, and it produces requirements for test evidence. We then need to fulfill these requirements. It’s got several important advantages and disadvantages, which we’re going to discuss in the next few slides…
This is Module 3 of SSRAP
‘Understanding Your Risk Assessment Standard’ is Module 3 of the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
Welcome to Risk Management 101, where we’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts and then we’re going to build it up again and show you how it’s done. I’ve been involved in risk management, in project risk management, safety risk management, etc., for a long, long time. I hope that I can put my experience to good use, helping you in whatever you want to do with this information.
Maybe you’re preparing for an interview. Maybe you want to learn some basics and decide whether you want to know more about risk management or not. Whatever it might be, I think you’ll find this short session really useful. I hope you enjoy it, and thanks for watching.
Welcome to Risk Management 101, where we’re going to…
Hi everyone and welcome to Risk Management 101. We’re going to go through these basic concepts of risk management. We’re going to break it down into the constituent parts. Then we’re going to build it up again and show you how it’s done.
My name is Simon Di Nucci and I have a lot of experience working in risk management, project risk management, safety risk management, etc. I’m hoping that I can put my experience to good use, helping you in whatever you want to do with this information. Whether you’re going for an interview or you want to learn some basics. You can watch this video and decide if you want to know more about risk management or if you don’t need to. Whatever it might be, you’ll find this short session useful. I hope you enjoy it and thanks for watching.
Topics For This Session
Risk Management 101 – so what does it all mean? We’re going to break risk management down into six constituent parts. I’m using a particular standard that breaks it down this way; other standards will do this in different ways. We’ll talk about that later. Here we’ve got risk management broken down into: hazard identification, hazard analysis, risk estimation, risk evaluation (and ALARP), risk reduction, and risk acceptance.
Risk Management
Let’s get right on to that. Risk management – what is it? It’s defined as “the systematic application of management policies, procedures, and practices to the tasks of hazard identification, hazard analysis, risk estimation, risk and ALARP evaluation, risk reduction, and risk acceptance”.
There are a couple of things to note here. We’re talking about management policies, procedures, and practices – the ‘how’ we do it, whether it’s a high-level policy or low-level common practice (how things are done in our organization versus how the day-to-day tasks are done). It’s also worth saying that when we talk about ‘hazards’, that’s a safety ‘ism’. If we were doing security risk management, we could be talking about ‘threats’. In day-to-day language, we can also be talking about ‘causes’ – something causing a risk or leading to a risk. More on that later, but that’s an overview of what risk management is.
Part 1
Let’s look at it in a different way. For those of you who like a visual representation, here is a graph of the hierarchical breakdown. These activities need to happen in order, more or less, left to right. And as you can see, there’s a link between risk evaluation and risk reduction; we’ll come on to that. So, these aren’t alternatives – it’s a sequence of things you have to do, and sometimes they’re linked together more intimately.
Hazard Identification
First of all, hazard identification. So, this is the process where we identify and list hazards and accidents associated with the system. You may notice that some words here are in bold. Where a word is in bold, we are going to give the definition of what it is later.
These hazards could lead to an accident, but we are only interested in those associated with the system – that’s the scope. If we were talking about a system that was an airplane, a ship, or a computer, we would have a very different scope, and accidents would happen in different ways.
On a more practical level, how do we do hazard identification? I’m not going to go into any depth here, but there are certain classic techniques. We can consult with our workers and inspect the workplace where they’re operating; in some countries, that’s a legal requirement (including in Australia, where I live). Another option is looking at historical data – and indeed, in some countries and some industries, that’s a requirement too. And we can use special analysis techniques. Now, I’m not going to talk about any of those analysis techniques today. You can watch some other sessions on The Safety Artisan to see that.
Hazard Analysis
Having done hazard identification, we’ve asked ourselves ‘What could go wrong?’. We can put some more detail on and ask, ‘How could it go wrong? And how often?’. That kind of stuff. So, we want to go into more detail about the hazards and accidents associated with this particular system. And that will help us to define some accident sequences. We can start with something that creates a hazard and then the hazard may lead to an accident. And that’s what we’re talking about. Later, we will show that using graphics can be helpful.
But again, more on terminology. In different industries, we call it different things. We tend to say ‘accident’ in the UK and Australia. In the U.S., they might call it a ‘mishap’, which is trying to get away from the idea that something was accidental. Nobody meant it to happen. Mishap is a more generic term that avoids that implication. We also talk about ‘losses’ or we talk about ‘breaches’ in the security world. We have some issues where somebody has been able to get in somewhere that they should not. And we can talk about accident sequences. Or, in a more common language, we call it a sequence of events. That’s all it is.
Risk Estimation
Now we’re talking about risk estimation. We’ve thought about our hazards and accidents and how they might progress from one to another. Let’s think about, ‘How big is the risk of this actually happening?’. Again, we’ll unpack this further later at the next level. But for now, we’re going to talk about the systematic use of available information. Systematic – so, ordered. We’re following a process. This isn’t somebody on their own taking a subjective view – ‘Look, I think it’s this’. It’s a process that is repeatable. We want to do something systematic. It’s thorough, it’s repeatable, and so it’s defendable. We can justify the conclusions that we’ve come to because we’ve done it with some rigour, in a systematic way. That’s important, particularly if we’re talking about harm coming to people or big losses.
Risk and ALARP / SFARP Evaluation
Now, risk evaluation is taking the risk we’ve just estimated, comparing it to something, and asking, “How serious is this risk?”. Is it something that is very low? If it’s insignificant, then we’re not bothered about it; we can live with it; we can accept it. Or is it bigger than that? Do we need to do something more about it? Again, we want to be systematic. We want to determine whether risk reduction is necessary. Is this acceptable as it is, or is it too high and we need to reduce it? That’s the core of risk evaluation.
Tolerability
In this UK-based standard, we’re using terminology that is found in different forms around the world. In the UK, they talk about ‘tolerability’. We’re talking about the absolute level of risk. There is probably an upper limit that’s allowed in the law or in our industry, and there’s a lower limit that we’re aiming for. In an ideal world, we’d like all our risks to be low-level risks. That would be terrific.
So, that’s ‘tolerability’, and you might hear it called different things. Within the UK system, there are three classes of risk tolerability. We could say the risk is ‘broadly acceptable’ – it’s very low, down in the target region where we’d like to get all our risks. It could be ‘tolerable’ – we can expose people to this risk, or live with it, but only if we’ve met certain other criteria. And then there’s risk that is so big, so far up there, that we can’t have it under any circumstances – it’s unacceptable. You can imagine a traffic-light system where we have categorized our risk.
ALARP / SFARP
And then there’s the test of whether our risk can be accepted. In the UK, it’s called ALARP: we reduce the risk As Low As Reasonably Practicable. In other places, you’ll see SFARP: we’ve eliminated or minimized the risk So Far As Is Reasonably Practicable. In the nuclear industry, they talk about ALARA: As Low As Reasonably Achievable. Different laws use different tests. Whichever one you use, there’s a test where we have to ask, “Can we accept the risk?” “Have we done enough risk reduction?”. Whatever you’ve put in those square brackets, that’s the test that you’re using, and it will vary from jurisdiction to jurisdiction. The basic concept of risk evaluation is estimating the level of risk and then comparing it to some standard or regulation. Whatever it might be, that’s what we do. That’s risk evaluation.
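If it helps to see the evaluation step as logic, here is a minimal Python sketch, assuming invented numeric thresholds rather than figures from any standard or regulation:

```python
# Minimal sketch of the risk evaluation step using the UK-style tolerability
# classes. The numeric thresholds are invented purely for illustration.

def evaluate_risk(risk_level: float,
                  broadly_acceptable_limit: float = 1.0,
                  unacceptable_limit: float = 10.0) -> str:
    """Compare an estimated risk level against tolerability limits."""
    if risk_level < broadly_acceptable_limit:
        return "broadly acceptable"   # in the target region
    if risk_level < unacceptable_limit:
        return "tolerable"            # only if also reduced ALARP / SFARP
    return "unacceptable"             # cannot be accepted under any circumstances

print(evaluate_risk(0.5))    # -> broadly acceptable
print(evaluate_risk(5.0))    # -> tolerable (subject to the ALARP / SFARP test)
print(evaluate_risk(20.0))   # -> unacceptable
```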
Risk Reduction
We’ve asked, “Do we need to reduce risk further?”. And if we do, we need to do some risk reduction. Again, we’re being systematic. This is not some subjective thing where we go, “I have done some stuff, it’ll be alright. That’s enough.” We’re being a bit more rigorous than that. We’ve got a systematic process for reducing risk. And in many parts of the world, we’re directed to do things in a certain way.
Elimination
This is an illustration from an Australian regulation. In this regulation, we’re aiming to eliminate risk, and we want to start with the most effective risk reduction measures. Elimination means we’ve reduced the risk to zero. That would be lovely, but we can’t always do it.
Substitution
What’s the next level? We could get rid of this risk by substituting something less risky. Imagine we’ve got a combustion engine powering something. The combustion engine needs flammable fuel and it produces toxic fumes. It could release carbon monoxide and CO2 and other things that we don’t want. We ask, “Can we get rid of that?”. Could we have an electric motor and have a battery instead? That might be a lot safer than the combustion engine. That is a substitution. There are still risks with electricity. But by doing this we’ve substituted something risky for something less risky.
Isolation
Or we could isolate the hazard. Let’s use the combustion engine as an example again. We can say, “I’ll put the engine, with its fuel and exhaust, somewhere a long way from people. Then it’ll be a long way from where it can do harm or cause a loss.” And that’s another way of dealing with it.
Engineering Controls
Or we could say, “I’m going to reduce the risks through engineering controls”. We could put in something engineered – for example, a smoke detector. A very simple, and therefore highly reliable, device; certainly more reliable than a human. You can install one that detects noxious gases. A carbon monoxide detector is especially valuable, because humans cannot detect carbon monoxide at all. (If you’ve got carbon monoxide poisoning, you’ll know about it – it gives you terrible headaches and other symptoms – but that’s not a good way to detect that you’re breathing in poisonous gas. We do not want to do it that way.)
So, we can have an engineering control to protect people. Or we can use an interlock. We can isolate things in a building or behind a wall or whatever, and if somebody opens the door, that forces the thing to cut out so it’s no longer dangerous. There are different engineering controls that we can introduce. They do not rely on people; they work regardless of what any person does.
Administrative / Procedural Controls
Next on the list, we could reduce exposure to the hazard by using administrative controls. That’s giving somebody some rules or a procedure to follow: “Do this. Don’t do that.” Now, that’s all good. We can give people warning signs and warn them not to approach something. But, of course, sometimes people break the rules for good reasons. Maybe they don’t understand, or maybe they don’t know the danger. Perhaps they’ve got to get something done, or the procedure that we’ve given them doesn’t work very well – it makes the job too difficult, so people cut corners. So, procedural protection can be weak, and a bit hit-and-miss sometimes.
Personal Protective Equipment
Finally, we can give people personal protective equipment. We can give them some eye protection. I’m wearing glasses because I’m short-sighted, but you can get goggles to protect your eyes from damage – splashes, flying fragments, sparks, etc. We can have a hard hat so that, if we’re on a building site and something drops on us from above, it protects the old brain box.
It won’t stop the accident from happening, but it will help reduce its severity. That’s the least effective control: we’re doing nothing to prevent the accident, only reducing the severity in certain circumstances. For example, if you drop a ton of bricks on me, it doesn’t matter whether I’m wearing a hard hat or not – I’m still going to get crushed. But with one brick, I should be able to survive if I’m wearing a hard hat.
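One simple way to capture that ordering is as a ranked list. Here is an illustrative Python sketch of the hierarchy of controls; the structure and wording are mine, so check your own jurisdiction’s regulation for the authoritative version:

```python
# Illustrative only: the hierarchy of controls, ordered from most to least
# effective. The exact wording varies between jurisdictions.

HIERARCHY_OF_CONTROLS = [
    "Elimination",                    # remove the risk entirely
    "Substitution",                   # swap in something less risky
    "Isolation",                      # separate the hazard from people
    "Engineering controls",           # engineered measures that don't rely on people
    "Administrative controls",        # rules, procedures, warning signs
    "Personal protective equipment",  # reduces severity only; last line of defence
]

def rank(control: str) -> int:
    """Lower rank means more effective; prefer the lowest-ranked feasible control."""
    return HIERARCHY_OF_CONTROLS.index(control)

# Example: choose the most effective control that is actually feasible.
feasible = ["Administrative controls", "Engineering controls", "Personal protective equipment"]
print(min(feasible, key=rank))   # -> Engineering controls
```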
Risk Acceptance
Let’s move on to risk acceptance. At some stage, we will have reduced the risk to a point where we can accept it. That is, we can live with it, and we’ve decided that we need to do whatever it is that is exposing us to the risk – we need to use the system. For example, we want to get in our car to go from A to B quickly and independently, so we’re going to accept the risk of driving. We make risk-acceptance decisions every day, often without thinking about it. Most of us get in a car every day and we don’t worry about the risk, but it’s always there. We’ve just decided to accept it.
But in this example, it’s not an individual deciding to do something on the spur of the moment, nor is it based on personal experience. We’ve got a systematic process where a bunch of people come together. The relevant stakeholders agree that the risk has been estimated and evaluated, that the risk reduction is good enough, and that we will accept the risk. There’s a bit more to it than you and I saying, “That’ll be alright.”
Part 2
Let’s summarise where we’ve got to. We’ve talked about these six components of risk management. That’s terrific. And as you can see, they all go together. Risk evaluation and risk reduction are more tightly coupled: when we do some risk reduction, we then re-evaluate the risk and ask, ‘Can we accept it?’. If the answer is ‘No’, we need to do some more work, so we do some more risk reduction. Those two tend to be more coupled together at the end. That’s the level we’ve got to; we’re now going to go to the next level.
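That coupling between evaluation and reduction is really a loop. Here is a tiny Python sketch of the idea, using made-up risk numbers and a made-up ‘reduction factor’ for each control:

```python
# Tiny sketch of the evaluate-reduce loop. The risk numbers and the
# 'reduction factor' for each control are invented for illustration.

def manage_risk(risk_level: float, tolerable_limit: float, controls: list) -> float:
    """Apply controls in turn until the risk is tolerable or we run out of options."""
    for name, reduction_factor in controls:
        if risk_level <= tolerable_limit:
            break                           # evaluation says we can stop reducing
        risk_level *= reduction_factor      # risk reduction step
        print(f"Applied {name}: risk is now {risk_level:.1f}")
    return risk_level

residual = manage_risk(
    risk_level=100.0,
    tolerable_limit=10.0,
    controls=[("substitution", 0.2), ("engineering control", 0.3), ("PPE", 0.8)],
)
print("Acceptable?", residual <= 10.0)
```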
So, we’re going to explain these things. We’ve talked about hazard identification and hazard analysis, but what is a hazard? And what is an accident? And what is an accident sequence? We’re going to unpack that a bit more. We’re going to take it to the next level. And throughout this, we’re talking about risk over and over again. Well, what is ‘risk’? We’re going to unpack that to the next level as well.
This is a safety standard. We’re talking about harm to people. How likely is that harm and how severe might it be? But it might be something else. It might be a loss or a security breach. Or a financial loss, a negative result for our project. We might find ourselves running late. Or we’re running over budget. We might be failing to meet quality requirements. Or we’re failing to deliver the full functionality that we said we would. Whatever it might be.
Hazard
So, let’s unpack this at the next level. A hazard is a term that we use, particularly in safety. As I say, we call it other things in different realms. But in the safety world, it’s a physical situation or it’s a state of a system.
As it says, it often follows from some initiating event that we may call a ‘cause’. The hazard may lead to an accident. However, the key thing to remember is that once a hazard exists, an accident is possible, but it’s not certain. You can imagine the classic cartoon banana-skin-on-the-pavement gag. The banana skin is the hazard. In the cartoon, the character always steps on the banana skin and always falls over, for comic effect. But in the real world, nobody may tread on the banana skin and slip. There could be nobody there to slip on the banana skin at all. Or even if somebody does, they could catch themselves. Or they fall, but it’s on a soft surface and they don’t hurt themselves, so there’s no harm.
So, the accident isn’t certain. And in fact, we can have what we call ‘non-accident’ outcomes. We can have harmless consequences. A hazard is an important midway step. I heard it called an accident waiting to happen, which is a helpful definition. An accident waiting to happen, but it doesn’t mean that the accident is inevitable.
Accident
But accidents can happen. Again, the ‘accident’, ‘mishap’, or ‘unintended event’. Something we did not want or a sequence of events that caused harm. And in this case, we’re talking about harm to people. And as I say, it might be a security breach. It might be a financial loss or reputational damage. Something might happen that is very embarrassing for an organization or an individual. Or again, we could have a hiccup with our project.
Harm
But in this case, we’re talking about harm. With this kind of standard, we’re using what you might call a body count approach to the harm. We’re talking about actual death, physical injury, or damage to the health of people.
This standard also considers the damage to property and the environment. Now, very often we are legally required to protect people and the environment from harm. Property less so. However, there will be financial implications of losses of property or damage to the systems. We don’t want that. But it’s not always criminally illegal to do that. Whereas usually, hurting people and damaging the environment is. So, this is ‘harm’. We do not want this thing to happen. We do not want this impact.
Safety is a much tougher business in this instance. If we have a problem with our project, it’s embarrassing but we could recover it. It’s more difficult to do that when we hurt somebody.
Risk
And always in these terms, we’re talking about ‘risk’. What is ‘risk’? Risk is a combination of two things: the likelihood of harm or loss and the severity of that harm or loss. It’s those two things together. And we’ve got a very simple illustration here – a little table. These are often known as risk matrices, but don’t worry about that too much; call it whatever you want. We’ve got a little two-by-two table here, with likelihood in the white text and severity in the black.
Low Risk
We can imagine a risk where we have a low likelihood of a low-harm or low-impact accident or outcome. We say, ‘That’s unlikely to happen, and even if it does, not much is going to happen – the impact will be very small.’ So, we’d say that that’s a low risk.
Then at the other end of the spectrum, we can imagine something that has a high likelihood of happening and also a high impact – things that we definitely do not want to happen. And we say, ‘That’s a high risk, and that’s something that we are very, very concerned about.’
Medium Risk
And then in the middle, we could have a combination of an outcome that is quite likely, but it’s of low severity. Or it’s of high severity, but it’s unlikely to happen. And we say, ‘That’s a medium risk’.
Now, this is a very simplified matrix, for teaching purposes only. In the real world, you will see matrices that are four by four, five by five, or even six by six, or combinations thereof. And in security, where they talk about threat, vulnerability, and outcomes, you might see multiple matrices used as building blocks to progressively build up a picture of the risk. So, there may be more than one matrix when you’ve got a more complex thing to model. But here we’ve got a nice, simple example. It illustrates what risk is: a combination of the severity and likelihood of harm or loss. That’s what risk is, fundamentally. And if we have a firm grasp of these fundamentals, it’ll help us to reason about and deal with almost anything, with enough application.
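For readers who like to see it written down, here is a minimal sketch of that simplified two-by-two teaching matrix in Python; it mirrors the example above, not any real standard’s matrix:

```python
# Minimal sketch of the simplified two-by-two teaching matrix:
# risk is a combination of likelihood and severity of harm or loss.

RISK_MATRIX = {
    ("low",  "low"):  "low risk",
    ("low",  "high"): "medium risk",
    ("high", "low"):  "medium risk",
    ("high", "high"): "high risk",
}

def classify(likelihood: str, severity: str) -> str:
    """Look up the risk class for a given likelihood and severity."""
    return RISK_MATRIX[(likelihood, severity)]

print(classify("low", "high"))   # unlikely but severe  -> medium risk
print(classify("high", "high"))  # likely and severe    -> high risk
```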
Accident Sequence
Now, let’s move on and talk about accident sequences. We’re talking about a progression in this case. We’re imagining a left-to-right path. A progression of events that results in an accident. This diagram, which looks like a bow tie, is meant to represent the idea that we can have one hazard. There might be many causes that lead to this hazard. There might be many different things that could create the hazard or initiate the hazard. And the hazard may have many different consequences.
Consequences
As I’ve said before, nothing at all may happen – that might be the consequence of the hazard, and most of the time that’s what’s going to happen. But there may be a variety of consequences. Somebody might get a minor injury, or there might be a more serious accident where one or more people are killed. A good example of this is fire. The hazard is the fire. The causes might be various: flammable chemicals, a lightning strike, an electrical arc flash, very high temperatures where things spontaneously burst into flames, or a chemical in the presence of pure oxygen (some things will spontaneously ignite in pure oxygen). So there are a variety of causes that lead to the fire.
An Example
And the fire might be very small and burn itself out, causing very little damage, with nobody getting hurt. Or it might lead to a much bigger fire that, in theory, could kill lots of people. So, there’s potentially a huge range of consequences from one hazard. The accident sequence is how we describe and capture this progression, from initiating events to the hazard to the possible consequences. And by modeling the accident sequence, of course, we can think about how we could interrupt it.
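One way to capture a bow tie like the fire example is as a simple data structure. Here is an illustrative Python sketch; the specific causes and consequences are just the ones mentioned above:

```python
# Illustrative bow tie for the fire example: many causes -> one hazard -> many consequences.
from dataclasses import dataclass, field

@dataclass
class AccidentSequence:
    hazard: str
    causes: list = field(default_factory=list)        # initiating events (left side of the bow tie)
    consequences: list = field(default_factory=list)  # possible outcomes (right side of the bow tie)

fire = AccidentSequence(
    hazard="fire",
    causes=["flammable chemicals", "lightning strike", "arc flash",
            "very high temperature", "chemical in pure oxygen"],
    consequences=["burns itself out, no harm", "minor injury", "one or more people killed"],
)

# Modelling the sequence lets us ask where we could interrupt it.
for cause in fire.causes:
    print(f"{cause} -> {fire.hazard} -> one of {fire.consequences}")
```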
Part 3
We’ve broken risk management down into those six constituent parts. We’ve gone to the next level, in that we’ve gone down to the concepts that underpin these things: the hazards, the accidents, and the accident sequence. We’ve talked about risk itself and what we don’t want to happen – the harm, the loss, the financial loss, the embarrassment, the failed, late, or over-budget project, the security breach, the undesired event, and so on. We had an objective, which was to do something safely or to complete a project, and the risk is that that won’t happen – that there’ll be a negative, undesirable impact on what we were trying to do.
There are just a few more concepts that we need to look at to complete the picture, as you can see. We’ve been talking about the system, and we’ve been talking about doing things systematically. And a system works in an operating environment. So, let’s unpack that.
System
First of all, we have a system. The system is going to be a combination of things. I wouldn’t call a pen or a pencil a system. It’s only got a couple of components. You could pull it apart. But it’s too simple to be worth calling it a system. We wouldn’t call it a pen system, would we? So, a system is something more complex. It’s a combination of things and we need to define the boundary. I’ll come back to that.
But within this boundary, we’ve got some different elements in the system that work together. Or they’re used together within a defined operating environment. So, we’re going to expose this system to a range of conditions in which it is designed to work. The intention is the system is going to do whatever it does to perform a given task. It can do one defined task or achieve a specific purpose.
I talked before about getting in our car. A car is complex enough to be called a system. We get in our car and we drive it on the roads. Or if we’ve got a four-wheel drive, we can drive Off-Road. Or we can use it in a more demanding operating environment to achieve a specific purpose. We want to transport ourselves, and sometimes some stuff, from A to B. That’s what we’re trying to do with the system.
Within the System
And within that system, we may have personnel (people), and we may have procedures – a bunch of rules about how you drive a car legally in different countries. We’ve got materials and physical things – what the car is made of. We could have tools to repair it and change wheels. We’ve got other equipment, like a satnav. We’ve got facilities – we need to take the car somewhere to fill up with fuel or to recharge it. We’ve got services like garages, repairs, servicing, etc. And there could be some software in there as well. Of course, these days there’s software everywhere in the car, as in most complex devices.
So, our system is a combination of lots of different things. These things are working together to achieve some kind of goal or some kind of result. There’s somewhere we want to get to. And it’s designed to work in a particular operating environment. Cars work on roads really well. Off-road cars can work on tracks. Put them in deep water, they tend not to work so well. So, let’s talk about that operating environment.
Operating Environment
What we’ve got here is the total set of all external, natural, and induced conditions (external to the system, so outside the boundary). These conditions might be natural, or they might be generated by something else, and the system is exposed to them at any given moment. We need to get a good understanding of the system, the operating environment, and what we want the system to do.
If we have a good understanding of those three things, then we will be well on the way to being able to understand the risks associated with that system. That’s one of the key things with risk management. If you’ve got those three things, that’s crucial. You will not be able to do effective risk management if you don’t have a grasp of those things. And if you do have a thorough grasp of those things, it’s going to help you do effective risk management.
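To make those three prerequisites concrete, here is a small, illustrative Python sketch using the car example; the field names and values are mine, not from any standard:

```python
# Illustrative only: the three things we need to grasp before assessing risk.
from dataclasses import dataclass

@dataclass
class SystemDefinition:
    elements: list               # people, procedures, materials, equipment, facilities, services, software
    operating_environment: str   # external, natural, and induced conditions outside the boundary
    purpose: str                 # the task or result the system exists to achieve

car = SystemDefinition(
    elements=["driver", "road rules", "vehicle structure", "tools", "satnav",
              "fuel stations", "servicing", "embedded software"],
    operating_environment="public roads, in all weather",
    purpose="transport people and goods from A to B",
)
print(car)
```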
Conclusion
So, we’ve talked about risk management. We’ve broken it down into those six big sections: hazard identification, hazard analysis, risk estimation, risk evaluation, risk reduction, and risk acceptance. We’ve seen how those things depend on only a few concepts: ‘hazards’, ‘risks’, and ‘accidents’, as well as the undesirable consequences that the risk might result in. The risk is measured by the likelihood and severity of that harm or loss occurring.
When we’re dealing with a more complex system, we need to understand that system and the environment in which it operates. Of course, we’ve put it in that environment for a purpose. And that unpacking has allowed us to break down quite a big concept: risk management. A lot of people, like myself, spend years and years learning how to do this; it takes time to gain experience because it’s a complex thing. But if we break it down, we can understand what we’re doing. We can work our way down to the fundamentals. And if we’ve got a good grasp of the fundamentals, that supports getting the more complex stuff right. So, that’s what risk management is all about. That’s your Risk Management 101, and I hope that you find it helpful.
Copyright Statement
I just need to say briefly that I can use those quotations from the standard under a Creative Commons license (CC 4.0), which allows me to do so within limits that I am careful to observe. But this video presentation is copyright The Safety Artisan.
For More…
And you can see more like this at The Safety Artisan website: www.safetyartisan.com. As you can see, it’s a secure site, so you can visit without fear of a security breach. So, do head over there. Subscribe to the monthly newsletter to get discounts on paid videos and regular updates on what’s coming up, both paid and free.
So, it just remains for me to say thanks very much for watching and I look forward to catching up with you again very soon.
In this module, System Safety Risk Analysis, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The analysis helps us to understand what we need to do to keep people safe. Usually, we have a moral and legal obligation to do it as well, and we need to do it well to protect people and prevent harm.
To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context.
This wider context is operational effectiveness – we want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.
And what are we developing? We’re developing a system: an organization of hardware, software, material, facilities, people, data, and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment, and it will work to produce specified results.
We’ve got three things here: a system; the operating environment in which it is designed to work; and its function or application. Why did we buy it, or make it, in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?
That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.
This is Module 2 of SSRAP
This is Module 2 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
TL;DR: Updating legal presumptions about computer reliability must happen if we are to have justice!
Background
The ‘Horizon’ Scandal in the UK was a major miscarriage of justice:
Between 1999 and 2015, over 900 subpostmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other subpostmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides.
‘Horizon’ was a faulty computer system, produced by Fujitsu. The Post Office had lobbied the British Government to reverse the burden of proof so that courts assumed that computer systems were reliable until proven otherwise. This made it very difficult for sub-postmasters – small-business franchise owners – to defend themselves in court.
A 1984 act of parliament ruled that computer evidence was only admissible if it could be shown that the computer was used and operating properly. But that act was repealed in 1999, just months before the first trials of the Horizon system began. When post office operators were accused of having stolen money, the hallucinatory evidence of the Horizon system was deemed sufficient proof. Without any evidence to the contrary, the defendants could not force the system to be tested in court and their loss was all but guaranteed.
Alex Hern writing in The Guardian in January 2024.
This shocking miscarriage of justice was based on an equally shocking presumption. One that anyone with a background in software development would find ridiculous.
Introduction
Legal experts warn that failure to immediately update laws regarding computer reliability could lead to a recurrence of scandals like the Horizon case. Critics argue that the current presumption of computer reliability shifts the burden of proof in criminal cases, potentially compromising fair trials.
The Presumption of Computer Reliability
English and Welsh law assumes computers to be reliable unless proven otherwise, a principle criticized for reversing the burden of proof. Stephen Mason, a leading barrister in electronic evidence, emphasizes the unfairness of this presumption, stating that it impedes individuals from challenging computer-generated evidence.
It is also patently unrealistic. As I explain in my article on the Principles of Safe Software Development, there are numerous examples of computer systems going wrong:
Drug Infusion Pumps,
The NASA Mars Polar Lander,
The Airbus A320 accident at Warsaw,
Boeing 777 FADEC malfunction,
Patriot Missile Software Problem in Gulf War II, and many more…
Making software dependable or safe requires enormous effort and care.
Historical Context and the Horizon Scandal
The presumption dates back to an old common law principle that mechanical systems are presumed reliable; the UK Post Office lobbied to have the principle applied to digital systems as well. The implications of this change became evident during the Horizon scandal, where flawed computer evidence led to wrongful accusations against post office operators. The repeal of the 1984 act further weakened safeguards against unreliable computer evidence, exacerbating the issue.
International Influence and Legal Precedents
The influence of English common law extends internationally, perpetuating the presumption of computer reliability in legal systems worldwide. Mason highlights cases from various countries supporting this standard, underscoring its global impact.
“[The Law] says, for the person who’s saying ‘there’s something wrong with this computer’, that they have to prove it. Even if it’s the person accusing them who has the information.”
Advancements in AI technology intensify the need to reevaluate legal presumptions. Noah Waisberg, CEO of Zuva, warns against assuming the infallibility of AI systems, which operate probabilistically and may lack consistency.
With a traditional rules-based system, it’s generally fair to assume that a computer will do as instructed. Of course, bugs happen, meaning it would be risky to assume any computer program is error-free…Machine-learning-based systems don’t work that way. They are probabilistic … you shouldn’t count on them to behave consistently – only to work in line with their projected accuracy…It will be hard to say that they are reliable enough to support a criminal conviction.
Noah Waisberg
This poses significant challenges in relying on AI-generated evidence for criminal convictions.
Proposed Legal Reforms
James Christie is a software consultant who co-authored recommendations for an update to UK law. He proposes a two-stage reform to address the issue.
The first would require providers of evidence to show the court that they have developed and managed their systems responsibly, and to disclose their record of known bugs … If they can’t … the onus would then be on the provider of evidence to show the court why none of these failings or problems affect the quality of evidence, and why it should still be considered reliable.
First, evidence providers must demonstrate responsible development and management of their systems, including disclosure of known bugs. Second, if unable to do so, providers must justify why these shortcomings do not affect the evidence’s reliability.
The Reality of Software Development
First of all, we need to understand how mistakes made in software can lead to failures and ultimately accidents.
Errors in Software Development
This is illustrated well by the standard BS 5760. We see that, during development, people – either on their own or using tools – make mistakes. That’s inevitable, and there will be many mistakes in the software, as we will see. These mistakes can leave faults, or defects, in the software. Again, inevitably, some of them get through.
BS 5760-8:1998. Reliability of systems, equipment and components. Guide to assessment of the reliability of systems containing software
If we jump over the fence, the software is now in use. All these faults are in the software, but they lie hidden – until, that is, some revealing mechanism comes along and triggers them. That revealing mechanism might be a change in the environment or operating scenario, or changing inputs that the software is seeing from its sensors.
That doesn’t mean that a failure is inevitable, because lots of errors don’t lead to failures that matter. But some do. And that is how we get from mistakes, to faults or defects in the software, to run-time failures.
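To make that chain from mistake, to latent fault, to run-time failure concrete, here is a minimal Python sketch. It is purely illustrative – the function, names, and numbers are invented for this example, not taken from any real system:

```python
# Purely illustrative: a developer's mistake leaves a latent fault in the code.
def fuel_remaining_minutes(fuel_litres: float, burn_rate_lpm: float) -> float:
    """Estimate minutes of fuel left at the current burn rate."""
    # FAULT: the developer assumed the burn rate is always positive.
    # The defect is present from day one, but it lies hidden...
    return fuel_litres / burn_rate_lpm

# ...until a revealing mechanism comes along: an unusual operating scenario
# supplies the triggering input (e.g. a flow sensor reading zero at idle).
print(fuel_remaining_minutes(1200.0, 8.5))  # normal scenario: works fine for years
print(fuel_remaining_minutes(1200.0, 0.0))  # revealing input: ZeroDivisionError - a run-time failure
```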
What Happens to Errors in Software Products?
A long time ago (1984!), a very well-known paper in the IBM Journal of Research and Development looked at how long it took faults in IBM operating system software to become failures for the first time. We are not talking about cowboys producing software on the web that may or may not work okay, or people in their bedrooms producing apps. We’re talking about a very sophisticated product that was in use all around the world.
Yet, what Adams found was that lots of software faults took more than 5,000 operating years to be revealed. He found that more than 90% of faults in the software would take longer than 50 years to become failures.
‘Optimizing Preventive Service of Software Products’ Edward N. Adams, IBM Journal of Research and Development, 1984, Vol 28, Iss. 1
There are two things that Adams’s work tells us.
First, in any significant piece of software, there is a huge reservoir of faults waiting to be revealed. So if people start telling you that their software contains no defects or faults, either they’re dumb enough to believe that or they think you are. What we see in reality is that even in a very high-quality software product, there are a lot of latent defects.
Second, many of them – the vast majority of them – will take a long, long time to reveal themselves. Testing will not reveal them. Using Beta versions will not reveal them. Fifty years of use will not reveal them. They’re still there.
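As a back-of-the-envelope illustration (the numbers here are hypothetical, not Adams’s data), it is also worth seeing why such rare faults get seen at all: with a big enough installed base, even a fault with a 5,000-year mean time to failure will surface somewhere in the fleet quite quickly, even though any individual user may never see it.

```python
# Hypothetical back-of-the-envelope calculation - not Adams's actual figures.
mttf_per_copy_years = 5_000   # one installation would take ~5,000 operating years to hit this fault
installed_copies = 50_000     # a widely deployed product

# With many copies running in parallel, the expected time until the fault
# shows up *somewhere* in the fleet shrinks dramatically:
expected_years_to_first_fleet_failure = mttf_per_copy_years / installed_copies
print(f"~{expected_years_to_first_fleet_failure:.3f} years "
      f"(~{expected_years_to_first_fleet_failure * 365:.0f} days) until some user sees it")
# Yet any individual user may never see this failure in decades of use.
```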
Legal experts stress the urgency of updating laws to reflect the fallibility of computers, crucial for ensuring fair trials and preventing miscarriages of justice. The UK Ministry of Justice acknowledges the need for scrutiny, pending the outcome of the Horizon inquiry, signaling a potential shift towards addressing issues of computer reliability in the legal framework.
Hopefully, the legal people will come to realize what software engineers have known for a long time. Software reliability is difficult to achieve and must be demonstrated.
What are the Hazard and Risk basics? So, what is this risk analysis stuff all about? What is ‘risk’? How do you define or describe it? How do you measure it? When? Why? Who…?
In this free session, I explain the basic terms and show how they link together, and how we can break them down to perform risk analysis. I understand hazards and risks because I’ve been analyzing them for a long time. Moreover, I’ve done this for aircraft, ships, submarines, sensors, command-and-control systems, and lots of software!
Everyone does it slightly differently, but my 25+ years of diverse experience lets me focus on the basics. That allows me to explain it in simple terms. I’ve unpacked the jargon and focus on what’s important.
Let’s get started with Module One. We’re going to recap some Risk basics to make sure that we have a common understanding of risk. And that’s important because risk analysis is something that we do every day – every time you cross the road, buy something expensive, or decide whether to travel to an event or just look it up online instead.
You’re making risk analysis decisions all the time without even realizing it. But we need something a little more formal than that instinctive thinking about risk. And to help us do that, we need a couple of definitions to get us started.
What is Risk?
First of all, what is Risk? It’s a combination of two things. First, the severity of a mishap or accident. Second, the probability that that mishap will occur. So it’s a combination of severity and probability. We will see that illustrated in the next slide.
We’ll begin by talking about ‘mishap’. Well, what is a mishap? A mishap is an event – or a series of events – resulting in unintentional harm. This harm could be death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.
The particular standard we’re looking at today covers a range of different harms; that’s why we’re focused on safety. The term ‘mishap’ also includes negative environmental impacts from planned events. So, even if the cause is a deliberate event, we will still include it as a mishap.
Probability and Severity
I said that the definition of risk was a combination of probability and severity. Here we have a little illustration of that…
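If you prefer something executable to a slide, here is a minimal sketch of that combination as a qualitative risk matrix in Python. The category names, the scoring, and the thresholds are generic illustrations in the style of common system-safety matrices – they are not taken from the standard discussed in this course:

```python
# A minimal, generic risk matrix: risk = combination of severity and probability.
# Category names, scoring, and thresholds are illustrative only.
SEVERITY = ["Negligible", "Marginal", "Critical", "Catastrophic"]             # least -> worst
PROBABILITY = ["Improbable", "Remote", "Occasional", "Probable", "Frequent"]  # least -> most likely

def risk_level(severity: str, probability: str) -> str:
    """Combine severity and probability into a qualitative risk level."""
    score = (SEVERITY.index(severity) + 1) * (PROBABILITY.index(probability) + 1)
    if score >= 12:
        return "High"
    if score >= 6:
        return "Serious"
    if score >= 3:
        return "Medium"
    return "Low"

print(risk_level("Catastrophic", "Frequent"))  # -> "High"
print(risk_level("Negligible", "Remote"))      # -> "Low"
```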
This is Module 1 of SSRAP
This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
This post, ‘SSRAP: Start the Course’, gives an overview of System Safety Risk Assessment Programs. It describes the Learning Objectives of the Course and its five modules. We’re going to learn how to:
Describe fundamental risk concepts.
Explain what a Systems Safety Approach to Risk is.
Define, within that System Safety Approach, what a Risk Analysis Program is.
List Hazard Analysis Tasks that make up a program.
Select tasks to meet our needs.
Welcome to this course on System Safety Risk Analysis Programs. It’s a five-part course for beginners and practitioners. It will also benefit a wider range of people.
Learning Objectives
In this course, we will learn how to do several things. First of all, we’re going to learn how to describe fundamental risk concepts. We’re going to explain what a Systems Safety Approach to Risk is and what it does. We will define, within that System Safety Approach, what a Risk Analysis Program is. We’re going to be able to list Hazard Analysis Tasks that make up a program. We’ll be able to select tasks to meet our needs.
At the end of this course, we should be able to design a tailored Risk Analysis Program for any application. And we’re also going to learn where to get more information and resources on how to do that.
Topics for this Course
So how is that going to work? Well, in five modules. In Module One, we’re going to go over some risk basics; the reason for this is to make sure we’ve got a common understanding.
In Module Two, we’re going to look at Systems Safety Risk Analysis. What it is, what it does, and the benefits it delivers.
In Module Three, we will look at a particular System Safety Program Standard. We will understand what it was designed to do and learn what it’s good and not so good at.
In Module Four, we’re going to take all the previous knowledge from Modules One to Three and put it together. We will use that information to design a Risk Analysis Program. This information can also help design any number of programs depending on what we want to do.
And then finally, in Module Five, we’ll look at where to get more resources to take us deeper to the next level…
This is SSRAP: Start of the Course
This is Module 1 from the System Safety Risk Assessment Program (SSRAP) Course. Risk Analysis Programs – Design a System Safety Program for any system in any application.
The full course comprises 15 lessons and 1.5 hours of video content, plus resources. It’s on pre-sale at HALF PRICE until September 1st, 2024. Check out all the free preview videos here and order using the coupon “Pre-order-Half-Price-SSRAP”. But don’t leave it too long because there are only 100 half-price courses available!
In this post, we will look at Three Insightful Methods for Causal Analysis. Only three?! If you search online, you will probably find eight methods coming up:
Pareto Charts;
Failure Mode and Effect Analysis (FMEA);
Five Whys;
Ishikawa Fishbone Diagram;
Fault Tree Analysis;
8D Report Template Checklist;
DMAIC Template; and
Scatter Diagrams.
However, not all these methods are created equal! Only some provide real insight into the challenge of causal analysis. So, I’ve picked the best ones – based on my 25 years’ experience in system safety – and put them in this post.
What are Causes and Why are They Important?
Before we go any further, I just want to explain some basic terms. When we’re doing safety analysis, we have hazards and, as the bow-tie diagram suggests, one hazard can have many causes and many consequences.
The Accident Sequence Illustrated.
Now, some of those consequences will be harmless, but some may result in harm to people. That progression from causes to hazards to consequences is known as an accident sequence. We tend to focus on the worst-case scenario, where somebody gets hurt.
(It’s not really the focus of this post, but the test for a hazard is that it’s necessary for the accident: if there’s no hazard, there’s no accident. And once the hazard is present, nothing else weird or unusual needs to happen for the accident to occur. So, the hazard is both necessary and sufficient.)
I’ve mentioned consequences, but today we’re talking about causes. So, we will analyze the left-hand side of the bow tie.
Three Insightful Causal Analysis Methods
Pareto Analysis
So, let’s start with a Pareto Analysis. I suspect most of us have seen this before. If we look at the causes of a certain outcome, what we often find is that a few causes are dominant.
An Example of a Pareto Chart.
In this chart, we’ve got types of medication errors. In this case, ‘dose missed,’ ‘wrong time,’ ‘wrong drug,’ and ‘overdose’ account for 70% of the causation. Everything else amounts to only 30%.
(Now, here they drew a line at 80% as the cutoff, because the Pareto principle is sometimes known as the eighty-twenty rule. That suggests that maybe 80% of the outcome is caused by 20% of the inputs or causes – in other words, most of the output variable is driven by only 20% of the input variables. That’s just a rule of thumb; it doesn’t have to be 80/20, it might be 70/30 or 60/40 – it doesn’t matter.)
The point is that there are some dominant causes. If we can identify them and work hard on just those top two, three, four, or five causes, then we can get a disproportionate reduction in risk by concentrating on those few things. Whereas we could spend an awful lot of effort attacking all the other causes and make very little difference.
It’s a simple technique, but by being led by the data we can become far more effective at risk management.
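If you want to be led by the data, a few lines of Python are enough to rank causes and find your cumulative cutoff. This is just a sketch – the medication-error counts below are invented for illustration, not taken from the chart above:

```python
# Illustrative Pareto analysis - the counts are invented, not real incident data.
errors = {
    "Dose missed": 92, "Wrong time": 70, "Wrong drug": 44, "Overdose": 28,
    "Wrong patient": 12, "Wrong route": 9, "Underdose": 6, "Other": 4,
}

total = sum(errors.values())
cumulative = 0
print(f"{'Cause':<15}{'Count':>7}{'Cum %':>8}")
for cause, count in sorted(errors.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += count
    print(f"{cause:<15}{count:>7}{100 * cumulative / total:>7.0f}%")
# Causes above your chosen cutoff (e.g. ~80% cumulative) are the dominant few
# worth attacking first.
```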
Ishikawa Fishbone Diagram
An Ishikawa diagram – or fishbone diagram, as it’s often called, for obvious reasons – is a causal diagram (image by FabianLange at de.wikipedia), and it’s widely used.
Example of an Ishikawa, or Fishbone, Diagram Structured for Causal Analysis.
In accident investigations, the Ishikawa diagram becomes a vital tool. I recall learning its application through the tragic case of the Piper Alpha oil rig disaster. Despite the grim nature of such events, they demand thorough causal analysis. Whether we opt for predefined groupings like equipment, process, people, materials, environment, and management, or let the data guide us, the essence remains unchanged: we investigate accidents to identify potential outcomes or problems and determine their contributing factors.
What makes this method invaluable is its ability to transcend technical issues alone. By encouraging us to consider the broader socio-technical environment, it prompts a holistic view of complex systems. The diagram visually represents primary causes directly linked to the main ‘fishbone’ of analysis, while secondary causes may contribute to or stem from these primary factors. The potential for tertiary causes exists in theory, but it may complicate matters without appropriate tools.
Utilizing this technique for brainstorming is highly effective. Displaying it on a whiteboard and collectively contemplating it as a group fosters focused discussions. Subsequently, formal documentation in various formats ensures thorough record-keeping. This method proves particularly powerful for unraveling complexities within systems, a topic worthy of a dedicated webinar.
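If you want to keep the fishbone in something more durable than a whiteboard photo, a simple nested structure is often enough. Here is a minimal sketch – the problem statement and causes are made up purely to show the shape:

```python
# A fishbone captured as a simple nested structure (illustrative causes only).
fishbone = {
    "problem": "Late shift handover errors",       # the 'head' of the fish
    "categories": {                                # the main 'bones'
        "People":      ["Fatigue at end of shift", "No handover training"],
        "Process":     ["Handover not time-boxed", "No standard briefing format"],
        "Equipment":   ["Logging terminal often offline"],
        "Environment": ["Noisy, shared handover area"],
        "Management":  ["Overtime pressure discourages thorough handover"],
    },
}

# Print the structure as a simple indented list for the record.
for category, causes in fishbone["categories"].items():
    print(f"{category}:")
    for cause in causes:
        print(f"  - {cause}")
```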
Fault Tree Analysis
Fault Tree Analysis is another widely used technique. We’ll have a webinar devoted to FTA later.
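FTA will get its own webinar, but as a taster, here is a minimal sketch of the core quantitative idea: combining basic-event probabilities up through AND and OR gates, assuming the events are independent. The event names and probabilities are invented for illustration:

```python
from math import prod

# Minimal fault-tree evaluation, assuming independent basic events.
# Event names and probabilities are invented for illustration.
def and_gate(*probs: float) -> float:
    """All inputs must occur: multiply the probabilities."""
    return prod(probs)

def or_gate(*probs: float) -> float:
    """Any input may occur: 1 minus the probability that none occur."""
    return 1 - prod(1 - p for p in probs)

pump_fails   = 1e-3   # per demand (illustrative)
backup_fails = 5e-3
alarm_fails  = 2e-3

# Intermediate event: cooling is lost only if both pumps fail.
loss_of_cooling = and_gate(pump_fails, backup_fails)
# Top event: cooling is lost AND the alarm also fails to warn the operator.
undetected_loss = and_gate(loss_of_cooling, alarm_fails)

print(f"P(loss of cooling)            = {loss_of_cooling:.2e}")
print(f"P(undetected loss of cooling) = {undetected_loss:.2e}")
```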
The Eight Disciplines Method
The Eight Disciplines (8D) method is one of those I often get mixed up with something else. It was introduced by the Ford Motor Co.; I’ve never used it, but it looks like a sensible method. Despite the name, there are actually nine steps:
Prepare and Plan
Form your Team
Identify the Problem
Develop an Interim Containment Plan
Verify Root Causes & Escape Points
Choose Permanent Corrective Actions
Implement Corrective Actions
Take Preventative Measures
Celebrate with Your Team!
Effective problem-solving requires careful planning, especially when it’s a team effort. Let’s break it down into three key steps:
Immediate Action: Start by addressing the urgency. What can we do right now to contain the problem while we develop a more comprehensive solution? It’s crucial to manage the issue in the short term as we work on a more refined approach.
Identify Root Causes: Investigate when and how the situation spiraled out of control. Pinpoint the opportunities for errors within the process. Understanding the root causes and timing issues is essential before moving forward.
Implement Permanent Solutions: Now that we’ve dissected the problem, it’s time to implement long-term corrective actions. This involves establishing better control measures and preventive strategies to avoid similar issues in the future.
Finally, it’s important to celebrate with your team once the solution is in place. Whether it’s going out for a meal or another form of recognition, acknowledging the effort is crucial.
This structured approach acknowledges the multi-stage nature of problem-solving. It emphasizes the need for short-term fixes, data-driven decision-making for long-term solutions, and proactive measures to prevent recurrences. Even if you take away nothing else, remembering these key points can guide you through the process. For more detailed information, check out the provided link, and stay tuned for a downloadable PDF with additional resources.
Bonus – Cause Analysis Reports
And a little bonus here – something I picked up while looking through this stuff. If you go to smartsheet.com, you’ll find a whole bunch of nice templates for cause analysis reports. I haven’t been through them all, but there looks to be quite a lot of good stuff in there if you’re interested.
“We’ve created root cause analysis templates you can use to complete your own investigations. Whether you need root cause analysis Excel templates, a root cause analysis template for Word, or a PDF template, we have one that’s right for your organization.”
Interested in accessing more content from the Safety Artisan? Head over to my Thinkific platform, where you’ll find my courses and all the webinars available at the academy. Plus, you can test it out with a 7-day free membership trial. For those looking for an extended trial, use the code ‘one-month-free’ to enjoy a full month on us. I am continually updating the content, adding new material every month to keep things fresh.
Additionally, sign up for free email updates to stay informed about upcoming webinars and other exciting events.