In this module, on System Safety Risk Assessment, we’re going to look at how we deal with the complexity of the real world. We do a formal risk analysis because real-world scenarios are complex. The Analysis helps us to understand what we need to do to keep people safe. Usually, we have some moral and legal obligation to do it as well. We need to do it well to protect people and prevent harm to people.
What is System Safety?
To start with, here’s a little definition of system safety. System safety is the application of engineering and management principles, criteria, and techniques to achieve acceptable risk within a wider context. This wider context is operational effectiveness – We want our system to do something. That’s why we’re buying it or making it. The system has got to be suitable for its use. We’ve got some time and cost constraints and we’ve got a life cycle. We can imagine we are developing something from concept, from cradle to grave.
And what are we developing? We’re developing a system. An organization of hardware, (or software) material, facilities, people, data and services. All these pieces will perform a designated function within the system. The system will work within a stated or defined operating environment. It will work with the intention to produce specified results.
We’ve got three things there. We’ve got a system. We’ve got the operating environment within which it works- or designed to work. And we have the thing that it’s supposed to produce; its function or its application. Why did we buy it, or make, it in the first place? What’s it supposed to do? What benefits is it supposed to bring humankind? What does it mean in the context of the big picture?
That’s what a system is. I’m not going to elaborate on systems theory or anything like that. That’s a whole big subject on its own. But we’re talking about something complex. We’re not talking about a toaster. It’s not consumer goods. It’s something complicated that operates in the real world. And as I say, we need to understand those three things – system, environment, purpose – to work out Safety.
We Need A Process
We’ve sorted our context. How is all this going to happen? We need a process. In the standard that we’re going to look at in the next module, we have an eight-element process. As you can see there, we start with documenting our approach. Then we identify and document hazards. We document everything according to the standard so forget that.
We assess risk. We plan how we’re going to mitigate the risk. We identify risk mitigation measures or controls as there are often known. Then we apply those controls to reduce risk. We verify and confirm that the risk reduction that we have achieved, or that we believe we will achieve. And then we got to get somebody to accept that risk. In other words, to say that it is an acceptable level of risk. That we can put up with this level of risk in exchange for the benefits that the system is going to give us. Finally, we need to manage risk through the entire lifecycle of the system until we finally get rid of it.
The key point about this is whatever process we follow, we need to approach it with rigor. We stick to a systematic process. We take a structured and rigorous approach to looking at our system.
And as you can see there from the arrows, every step in the eight-element sequence flows into the next step. Each step supports and enables the following steps. We document the results as we go. However, even this example is a little bit too simple.
A More Realistic Process
So, let’s get a more realistic process. What we’ve got here are the same things we’ve had before. We’ve established the context at the beginning. Next, there’s risk assessment. Risk assessment consists of risk identification, risk analysis, and risk evaluation. It asks ‘Where are we?’ in relation to a yardstick or framework that categorizes risk. The category determines whether a risk is acceptable or not.
After determining whether the risk is acceptable or not, we may need to apply some risk treatment. Risk Treatment will reduce the risk further. By then we should have the risk down to an acceptable level.
So, that’s the straight-through process, once through. In the real world, we may have to go around this path several times. Having treated the risk over a period of time, we need to monitor and review it. We need to make sure that the risk turns out, in reality, to be what we estimated it to be. Or at least no worse. If it turns out to be better- Well, that’s great!
And on that monitoring and review cycle, maybe we even need to go back because the context has changed. These changes could include using the system to do something it was not designed to do. Or modifying the system to operate in a wider variety of environments. Whatever it might be, the context has changed. So, we need to look again at the risk assessment and go round that loop again.
And while we’re doing all that, we need to communicate with other people. These other people include end-users, stakeholders, other people who have safety responsibilities. We need to communicate with the people who we have to work with. And we have to consult people. We may have to consult workers. We may have to consult the public, people that we put at risk, other duty holders who hold a duty to manage risk. That’s our cycle. That’s more realistic. In my experience as a safety engineer, this is much more realistic. A once-through process often doesn’t cut it.
Required Risk Reduction
We’re doing all this to drive risk down to an acceptable level. Well, what do we mean by that? Well, there are several different ways that we can do this, and I’ve got to illustrate it here. On the left-hand side of the slide, we have what’s usually known as the ALARP triangle. It’s this thing that looks a bit like a carrot where the width of the triangle indicates the amount of risk. So, at the top of the triangle, we’ve got lots of risks. And if you’re in the UK or Australia where I live, this is the way it’s done. So there will be some level of risk that is intolerable. Then if the risk isn’t intolerable, we can only tolerate it or accept it if it is ALARP or SFARP. And ALARP means that we’ve reduced the risk as low as reasonably practicable. And SFARP means so far as is reasonably practicable. Essentially, they’re the same thing – reasonably practical.
We must ensure that we have applied all reasonably practicable risk reduction measures. And once we’ve done so, if we’re in this tolerable or acceptable region, then we can live with the risk. The law allows us to do that.
That’s how it’s done in the UK and Australia. But in other jurisdictions, like the USA, you might need to use a different approach. A risk matrix approach as we can see on the right-hand side of this slide. This particular risk matrix is from the standard we’re about to look at. And we could take that and say, ‘We’ve determined what the risk is. There is no absolute limit on how much risk we can accept. But the higher the risk, the more senior level of sign-off from management we need’. In effect, you are prioritizing the risk. So you only bring the worst risks to the attention of senior management. You are asking ‘Will you accept this? Or are you prepared to spend the money? Or will you restrict the operational system to reduce the risk?’. This is good because it makes people with authority consider risks. They are responsible and need to make meaningful decisions.
In short, different approaches are legal in different jurisdictions.
Summary of Module
In Module Two, we’ve asked ourselves, ‘How can we deal with real-world complexity?’. And one way that’s developed to do that is System Safety. System Safety is where we take a systematic approach to safety. This approach applies to both the system itself – the product – and the process of System Safety.
We address product and process. We need that rigorous process to give us confidence that what we’ve done is good enough. We have a realistic, useful and powerful process that enables us to put things in context. It helps us to communicate with everyone we need to, to consult with those that we have a duty to consult with. And also, we put around the basic risk process, this monitoring and review. And of course, we analyze risk to reduce it to acceptable levels. So we’ve got to treat the risk or reduce it or control it in some way to get it to those acceptable levels. In the end, it’s all about getting that required risk reduction to work. That reduction makes the risk acceptable to expose human beings to, for the benefit that it will give us.