System Safety Concepts, Part 2

There are two versions of the System Safety Concepts video. The short version is available in a post here, as well as at the Safety Artisan Patreon page and on my YouTube channel.

The full version of the video is only available at the Safety Artisan Patreon page. The transcript is below.

Transcript, ‘System Safety Concept’ (Full)

Hi everyone, and welcome to the safety artisan where you will find professional pragmatic, and impartial advice on all thing’s safety. I’m Simon and welcome to the show today, which is recorded on the 23rd of September 2019. Today we’re going to talk about System safety concepts. A couple of days ago I recorded a short presentation on this, which is on the Patreon website and is also on YouTube.  Today we are going to talk about the same concepts but in much more depth.

Hence, this video is only available on the ‘Safety Artisan’ Patreon page. In the short session, we took some time picking apart the definition of ‘safe’. I’m not going to duplicate that here, so please feel free to go have a look. We said that to demonstrate that something was safe, we had to show that risk had been reduced to a level that is acceptable in whatever jurisdiction we’re working in.

And in this definition, there are a couple of tests that are appropriate that the U.K., but perhaps not elsewhere. We also must meet safety requirements. And we must define Scope and bound the system that we’re talking about a Physical system or an intangible system like a. A computer program or something. We must define what we’re doing with it what it’s being used for. And within which operating environment within which context is being used.  And if we could do all those things, then we can objectively say or claim that this system is safe. OK.  that’s very briefly that.

Topics

What we’re going to talk about a lot more Topics. We’re going to talk about risk accidents. The cause has a consequence sequence. They talk about requirements and. Spoiler alert. What I consider to be the essence of system safety. And then we’ll get into talking about the process. Of demonstrating safety, hazard identification, and analysis.

Risk Reduction and estimation. Risk Evaluation. And acceptance. And then pulling it all together. Risk management safety management. And finally, reporting, making an argument that the system is safe supporting with evidence. And summarizing all of that in a written report. This is what we do, albeit in different ways and calling it different things.

Risk

Onto the first topic. Risk and harm.  Our concept of risk. It’s a combination of the likelihood and severity of harm. Generally, we’re talking about harm. To people. Death. Injury. Damage to help. Now we might also choose to consider any damage to property in the environment. That’s all good. But I’m going to concentrate on. Harm. To people. Because. Usually. That’s what we’re required to do. By the law. And there are other laws covering the environment and property sometimes. That. We’re not going to talk.  just to illustrate this point. This risk is a combination of Severity and likelihood.

We’ve got a very crude. Risk table here. With a likelihood along the top. And severity. Downside. And we might. See that by looking at the table if we have a high likelihood and high severity. Well, that’s a high risk. Whereas if we have Low Likelihood and low severity. We might say that’s a low risk. And then. In between, a combination of high and low we might say that’s medium. Now, this is a very crude and simple example. Deliberately.

You will see risk matrices like this. In. Loads of different standards. And you may be required to define your own for a specific system, there are lots of variations on this but they’re all basically. Doing this thing and we’re illustrating. How we determine the level of risk. By that combination of severity. And likely, I think a picture is worth a thousand words. Moving online to the accident. We’re talking about (in this standard) an unintended event that causes harm.

Accidents, Sequences and Consequences

Not all jurisdictions just consider accidental event some consider deliberate as well. We’ll leave that out. A good example of that is work health and safety in Australia but no doubt we’ll get to that in another video sometime. And the accident sequences the progression of events. That results in an accident that leads to an. Now we’re going to illustrate the accident sequence in a moment but before we get there. We need to think about cousins.  here we’ve got a hazard physical situation of state system. Often following some initiating event that may lead to an accident, a thing that may cause harm.

And then allied with that we have the idea of consequences. Of outcomes or an outcome. Resulting from. An. Event. Now that all sounds a bit woolly doesn’t it, let’s illustrate that. Hopefully, this will make it a lot clearer. Now. I’ve got a sequence here. We have. Causes. That might lead to a hazard. And the hazard might lead to different consequences. And that’s the accident. See. Now in this standard, they didn’t explicitly define causes.

Cause, Hazard and Consequence

They’re just called events. But most mostly we will deal with causes and consequences in system safety. And it’s probably just easier to implement it. Whether or not you choose to explicitly address every cause. That’s often option step. But this is the accident Sequence that we’re looking at. And they this sort of funnels are meant to illustrate the fact that they may be many causes for one hazard. And one has it may lead to many consequences on some of those consequences. Maybe. No harm at all.

We may not actually have an accident. We may get away with it. We may have a. Hazard. And. Know no harm may befall a human. And if we take all of this together that’s the accident sequence. Now it’s worth. Reiterating. That just because a hazard exists it does not necessarily need. Lead to harm. But. To get to harm. We must have a hazard; a hazard is both necessary and sufficient. To lead to harmful consequences. OK.

Hazards: an Example

And you can think of a hazard as an accident waiting to happen. You can think of it in lots of different ways, let’s think about an example, the hazard might be. Somebody slips. Okay well while walking and all. That slip might be caused by many things it might be a wet surface. Let’s say it’s been raining, and the pavement is slippery, or it might be icy. It might be a spillage of oil on a surface, or you’d imagine something slippery like ball bearings on a surface.

So, there’s something that’s caused the surface to become slippery. A person slips – that’s the hazard. Now the person may catch themselves; they may not fall over. They may suffer no injury at all. Or they might fall and suffer a slight injury; and, very occasionally, they might suffer a severe injury. It depends on many different factors. You can imagine if you slipped while going downstairs, you’re much more likely to be injured.

And younger, healthy, fit people are more likely to get over a fall without being injured, whereas if they’re very elderly and frail, a fall can quite often result in a broken bone. If an elderly person breaks a bone in a fall the chances of them dying within the next 12 months are quite high. They’re about one in three.

So, the level of risk is sensitive to a lot of different factors. To get an accurate picture, an accurate estimate of risk, we’re going to need to factor in all those things. But before we get to that, we’ve already said that hazard need not lead to harm. In this standard, we call it an incident, where a hazard has occurred; it could have progressed to an accident but didn’t, we call this an incident. A near miss.

We got away with it. We were lucky. Whatever you want to call it. We’ve had an incident but no he’s been hurt. Hopefully, that incident is being reported, which will help us to prevent an actual accident in future.  That’s another very useful concept that reminds us that not all hazards result in harm. Sometimes there will be no accident. There will be no harm simply because we were lucky, or because someone present took some action to prevent harm to themselves or others.

Mitigation Strategies (Controls)

But we would really like to deliberately design out or avoid Hazards if we can. What we need is a mitigation strategy, we need a measure or measures that, when we put them into practice, reduce that risk. Normally, we call these things controls. Again, now we’ve illustrated this; we’ve added to the funnels. We’ve added some mitigation strategies and they are the dark blue dashed lines.

And they are meant to represent Barriers that prevent the accident sequence progressing towards harm. And they have dashed lines because very few controls are perfect, you know everything’s got holes in it. And we might have several of them. But usually, no control will cover all possible causes; and very few controls will deal with all possible consequences.  That’s what those barriers are meant to illustrate.

That idea that picture will be very useful to us later. When we are thinking about how we’re going to estimate and evaluate risk overall and what risk reduction we have achieved. And how we talk about justifying what we’ve done is good. That’s a very powerful illustration. Well, let’s move on to safety requirements.

Safety Requirements

Now. I guess it’s no great surprise to say that requirements, once met, can contribute directly to the safety of the system. Maybe we’ve got a safety requirement that says all cars will be fitted with seatbelts. Let’s say we’ll be required to wear a seatbelt.  That makes the system safer.

Or the requirement might be saying we need to provide evidence of the safety of the system. And, the requirement might refer to a process that we’ve got to go through or a set kind of evidence that we’ve got to provide. Safety requirements can cover either or both of these.

The Essence of System Safety

Requirements. Covering. Safety of the system or demonstrating that the system is safe. Should give us assurance, which is adequate confidence or justified confidence. Supported with evidence by following a process. And we’ll talk more about process. We meet safety requirements. We get assurance that we’ve done the right thing. And this really brings us to the essence of what system safety is, we’ve got all these requirements – everything is a requirement really – including the requirement. To demonstrate risk reduction.

And those requirements may apply to the system itself, the product. Or they may provide, or they may apply to the process that generates the evidence or the evidence. Putting all those things together in an organized and orderly way really is the essence of system safety, this is where we are addressing safety in a systematic way, in an orderly way. In an organized way. (Those words will keep coming back). That’s the essence of system safety, as opposed to the day-to-day task of keeping a workplace safe.

Maybe by mopping up spills and providing handrails, so people don’t slip over. Things like that. We’re talking about a more sophisticated level of safety. Because we have a more complex problem a more challenging problem to deal with. That’s system safety. We will start on the process now, and we begin with hazard identification and analysis; first, we need to identify and list the hazards, the Hazards and the accidents associated with the system.

We’ve got a system, physical or not. What could go wrong? We need to think about all the possibilities. And then having identified some hazards we need to start doing some analysis, we follow a process. That helps us to delve into the detail of those hazards and accidents. And to define and understand the accident sequences that could result. In fact, in doing the analysis we will very often identify some more hazards that we hadn’t thought of before, it’s not a straight-through process it tends to be an iterative process.

Risk Reduction

And what ultimately what we’re trying to do is reduce risk, we want a systematic process, which is what we’re describing now. A systematic process of reducing risk. And at some point, we must estimate the risk that we’re left with. Before and after all these controls, these mitigations, are applied. That’s risk estimation.  Again, there’s that systematic word, we’re going to use all the available information to estimate the level of risk that we’ve got left. Recalling that risk is a combination of severity and likelihood.

Now as we get towards the end of the process, we need to evaluate risk against set criteria. And those criteria vary depending on which country you’re operating in or which industry we’re in: what regulations apply and what good practice is relevant. All those things can be a factor. Now, in this case, this is a U.K. standard, so we’ve got two tests for evaluating risk. It’s a systematic determination using all the available evidence. And it should be an objective evaluation as far as we can make it.

Risk Evaluation

We should use certain criteria on whether a risk can be accepted or not. And in the U.K. there are two tests for this. As we’ve said before, there is ALARP, the ‘As Low As is Reasonably Practicable’ test, which says: Have we put into practice all reasonably practicable controls? (To reduce risk, this is risk reduction target). And then there’s an absolute level of risk to consider as well. Because even if we’ve taken all practical measures, the risk remaining might still be so high as to be unacceptable to the law.

Now that test is specific to the U.K, so we don’t have to worry too much about it. The point is there are objective criteria, which we must test ourselves or measure ourselves against. An evaluation that will pop out the decision, as to whether a further risk reduction is necessary if the risk level is still too high. We might conclude that are still reasonably practicable measures that we could take. Then we’ve got to do it.

We have an objective decision-making process to say: have we done enough to reduce risk? And if not, we need to do some more until we get to the point where we can apply the test again and say yes, we’ve done enough. Right, that’s rather a long-winded way of explaining that. I apologize, but it is a key issue and it does trip up a lot of people.

Risk Acceptance

Now, once we’ve concluded that we’ve done enough to reduce risk and no further risk reduction is necessary, somebody should be in a position to accept that risk.  Again, it’s a systematic process, by which relevant stakeholders agree that risks may be accepted. In other words, somebody with the right authority has said yes, we’re going to go ahead with the system and put it into practice, implement it. The resulting risks to people are acceptable, providing we apply the controls.

And we accept that responsibility.  Those people who are signing off on those risks are exposing themselves and/or other people to risk. Usually, they are employees, but sometimes members of the public as well, or customers. If you’re going to put customers in an airliner you’re saying yes there is a level of risk to passengers, but that the regulator, or whoever, has deemed [the risk] to be acceptable. It’s a formal process to get those risks accepted and say yes, we can proceed. But again, that varies greatly between different countries, between different industries. Depending on what regulations and laws and practices apply. (We’ll talk about different applications in another section.)

Risk Management

Now putting all this together we call this risk management.  Again, that wonderful systematic word: a systematic application of policies, procedures and practices to these tasks. We have hazard identification, analysis, risk estimation, risk evaluation, risk reduction & risk acceptance. It’s helpful to demonstrate that we’ve got a process here, where we go through these things in order. Now, this is a simplified picture because it kind of implies that you just go through the process once.

With a complex system, you go through the process at least once. We may identify further hazards, when we get into Hazard Analysis and estimating risk. In the process of trying to do those things, even as late as applying controls and getting to risk acceptance. We may discover that we need to do additional work. We may try and apply controls and discover the controls that we thought were going to be effective are not effective.

Our evaluation of the level of risk and its acceptability is wrong because it was based on the premise that controls would be effective, and we’ve discovered that they’re not, so we must go back and redo some work. Maybe as we go through, we even discover Hazards that we hadn’t anticipated before. This can and does happen, it’s not necessarily a straight-through process. We can iterate through this process. Perhaps several times, while we are moving forward.

Safety Management

OK, Safety Management. We’ve gone to a higher level really than risk because we’re thinking about requirements as well as risk. We’re going to apply organization, we’re going to applying management principles to achieve safety with high confidence. For the first time we’ve introduced this idea of confidence in what we’re doing. Well, I say the first time, this is insurance isn’t it? Assurance, having justified confidence or appropriate confidence, because we’ve got the evidence. And that might be product evidence too we might have tested the product to show that it’s safe.

We might have analysed it. We might have said well we’ve shown that we follow the process that gives us confidence that our evidence is good. And we’ve done all the right things and identified all the risks.  That’s safety management. We need to put that in a safety management system, we’ve got a defined organization structure, we have defined processes, procedures and methods. That gives us direction and control of all the activities that we need to put together in a combination. To effectively meet safety requirements and safety policy.

And our safety tests, whatever they might be. More and more now we’re thinking about top-level organization and planning to achieve the outcomes we need. With a complex system, with a complex operating environment and a complex application.

Safety Planning

Now I’ll just mention planning. Okay, we need a safety management plan that defines the strategy: how we’re going to get there, how are we going to address safety. We need to document that safety management system for a specific project. Planning is very important for effective safety. Safety is very vulnerable to poor planning. If a project is badly planned or not planned at all, it becomes very difficult to Do safety effectively, because we are dependent on the process, on following a rigorous process to give us confidence that all results are correct.  If you’ve got a project that is a bit haphazard, that’s not going to help you achieve the objectives.

Planning is important. Now the bit of that safety plan that deals with timescales, milestones and other date-related information. We might refer to as a safety program. Now being a UK Definition, British English has two spellings of program. The double-m-e version of programme. Applies to that time-based progression, or milestone-based progression.

Whereas in the US and in Australia, for example, we don’t have those two words we just have the one word, ‘program’. Which Covers everything: computer programs, a programme of work that might have nothing to do with or might not be determined by timescales or milestones. Or one that is. But the point is that certain things may have to happen at certain points in time or before certain milestones. We may need to demonstrate safety before we are allowed to proceed to tests and trials or before we are allowed to put our system into service.

Demonstrating Safety

We’ve got to demonstrate that Safety has been achieved before we expose people to risk.  That’s very simple. Now, finally, we’re almost at the end. Now we need to provide a demonstration – maybe to a regulator, maybe to customers – that we have achieved safety.  This standard uses the concept of a safety case. The safety case is basically, imagine a portfolio full of evidence.  We’ve got a structured argument to put it all together. We’ve got a body of the evidence that supports the argument.

It provides a Compelling, Comprehensible (or understandable) and valid case that a system is safe. For a given application or use, in a given Operating environment.  Really, that definition of what a safety case is harks back to that meaning of safety.  We’ve got something that really hits the nail on the head. And we might put all of that together and summarise it in a safety case report. That summarises those arguments and evidence, and documents progress against the Safe program.

Remember I said our planning was important. We started off saying that we need to do this, that the other in order to achieve safety. Hopefully, in the end, in the safety report we’ll be able to state that we’ve done exactly that. We did do all those things. We did follow the process rigorously. We’ve got good results. We’ve got a robust safety argument. With evidence to support it. At the end, it’s all written up in a report.

Documenting Safety

Now that isn’t always going to be called a safety case report; it might be called a safety assessment report or a design justification report. There are lots of names for these things. But they all tend to do the same kind of thing, where they pull together the argument as to why the system is safe. The evidence to support the argument, document progress against a plan or some set of process requirements from a standard or a regulator or just good practice in an industry to say: Yes, we’ve done what we were expected to do.

The result is usually that’s what justifies [the system] getting past that milestone. Where the system is going into service and can be used. People can be exposed to those risks, but safely and under control.

Everyone’s a winner, as they say!

Copyright – Creative Commons Licence

Okay. I’ve used a lot of information from the UK government website. I’ve done that in accordance with the terms of its creative commons license, and you can see more about that [here]. We have we complied with that, as we are required to, and to say to you that the information we’ve supplied is under the terms of this license.

More Resources

And for more resources and for more lessons on system safety. And other safe topics. I invite you to visit the safety artisan.com website or to go and look at the videos on Patreon, at my safety artisan page. And that’s www.Patreon.com/SafetyArtisan. Thanks very much for watching. I hope you found that useful.

We’ve covered a lot of information there, but hopefully in a structured way. We’ve repeated the key concepts and you can see that in that standard. The key concepts are consistently defined, and they reinforce each other. In order to get that systematic, disciplined approach to safety, that’s we need.

Anyway, that’s enough from me. I hope you enjoyed watching and found that useful. I look forward to talking to you again soon. Please send me some feedback about what you thought about this video and also what you would like to see covered in the future.

Thank you for visiting the Safety Artisan. I look forward to talking to you again soon. Goodbye.

Links

You can see the full video at the Safety Artisan Patreon Page!

You can see the Short Video posted here.

Go back to the Home Page, or the System Safety Page.

System Safety Concepts, Part 1

System Safety Concepts – a Short Introduction

System Safety Concepts – Transcript

Hi everyone and welcome to the Safety Artisan, where you will find professional, pragmatic and impartial advice. Whether you want to know how safety is done or how to do it. I hope you’ll find today’s session helpful. It’s the 21st of September 2019 as I record this. Welcome to the show. So, let’s get started. Well, we’re going to talk today about System Safety concepts. What does it all mean?  We need to ask this question because it’s not obvious, as we will see.

If we look at a dictionary definition of the word safe, it’s an adjective. To be protected from or not exposed to danger or risk. Not likely to be harmed or lost. There are synonyms – protect, shield, shelter, guard and keep out of harm’s way. They’re all good words, and I think we all know what we’re talking about. However, as a definition, it’s too imprecise. We can’t objectively say whether we have achieved safety or not.

A Practical Definition of ‘Safe’

What we need is a better definition, a more practical definition. I’ve taken something from an old UK Defence standard. Forget about which standard, that’s not important. It’s just that we’re using a consistent set of definitions to work through basic safety concepts. And it’s important to do that because different standards, come from different legal systems and they have different philosophies. So, if you start mixing standards and different concepts together, that doesn’t always work.

OK so whatever you do, be consistent. That’s the key point. We’re going to use this set of definitions from the U.K. defence standard because they are consistent.

In this standard, ‘safe’ means: “Risk has been demonstrated to have been reduced to a level that is ALARP, and broadly acceptable or tolerable. And relevant prescriptive safety requirements have been met. For a system, in a given application, in a given Operating Environment.” OK, so let’s unpack that.

System Safety – Risk

So, we start with risk. We need to manage risk. We need to show that risk has been reduced to an acceptable level. As required perhaps by law, or regulation or a standard. Or just good practice in a particular industry. Whatever it is, we need to show that the risk of harm to people has been reduced. Not just any old reduction, we need to show that it’s been reduced to a particular level. Now in this standard, there are two tests for that.

And they’re both objective tests. The first one says as low as reasonably practicable. Basically, it’s asking have all reasonably practicable risk reduction measures been taken. So that’s one test. And the second test is a bit simpler. It’s basically saying reduce the absolute level of risk to something that is tolerable or acceptable. Now don’t worry too much about precisely what these things mean. The purpose for today is to note that we’ve got an objective test to say that we’ve done enough.

System Safety – Requirements

So that’s dealt with risk. Let’s move on to safety requirements. If a requirement is relevant, then we need to apply it. If it’s prescriptive, if it says you must do this, or you must do that. Then we need to meet it. There are two separate parts to this ‘Safe’ thing: we’ve got to meet requirements; and, we’ve got to manage risk. We can’t use one as an excuse for not doing the other.

So just because we reduce risk until it’s tolerable or acceptable doesn’t mean that we can ignore safety requirements. Or vice versa. So those are the two key things that we’ve got to do. But that’s not actually quite enough to get us there. Because we’ve got to define what we’re doing, with what and in what context. Well, we’re reducing the risk of a system. And the system might be a physical thing.

Defining the Scope: The System

It might be a vehicle, an aeroplane or a ship or a submarine, it might be a car or a truck. Or it might be something a bit more intangible. It might be a computer program that we’re using to make decisions that affect the safety of human beings, maybe a medical diagnosis system. Or we’re processing some scripts or prescriptions for medicine and we’ve got to get it right. We could poison somebody. So, whether it’s a tangible or an intangible system.

We need to define it. And that’s not as easy as it sounds, because if we’re applying system safety, we’re doing it because we have a complex system. It’s not a toaster. It’s something a bit more challenging. Defining the system carefully and precisely is really important and helpful. So, we define what our system is, our thing or our service. The system. What are we doing with it? What are we applying it to?

Defining the Scope: The Application

What are we using it for? Now, just to illustrate that no standard is perfect. Whoever wrote that defence standard didn’t bother to define the application. Which is kind of a major stuff-up to be honest, because that’s really important. So, let’s go back to an ordinary dictionary definition just to get an idea of what it means. By the way, I checked through the standard that I was referring to, and it does not explain in this standard.

What it means by the application. Otherwise, I would use that by preference. But if we go back to the dictionary, we see application: the act of putting something into operation. OK, so, we’re putting something to use. We’re implementing, employing it or deploying it maybe we’re utilizing it, applying it, executing it, enacting it. We’re carrying it out, putting it into operation or putting it into practice. All useful words that help us to understand.

I think we know what we’re talking about. So, we’ve got a thing or a service. Well, what are we using it for? Quite obviously, you know a car is probably going to be quite safe on the road. Put it in water and it probably isn’t safe at all. So, it’s important to use things for their proper application, to the use to which they were designed. And then, kind of harking back to what I just said, the correct operating environment.

Defining the Scope: The Operating Environment

For this system, and the application to which we will put it to. So, we’ve got a thing that we want to use for something. What’s the operating environment in which it will be safe? What’s it qualified or certified for? What’s the performance envelope that it’s been designed for? Typically, things work pretty well within the operating environment, within the envelope for which they were designed. Take them outside of that envelope and they perform not so well.

Maybe not at all. You take an aeroplane too high and the air is too thin, and it becomes uncontrollable. You take it too low and it smashes into the ground. Neither outcome is particularly good for the occupants of the aeroplane. Or whoever happens to be underneath it when it hits the ground. All of those three things:  what is the system? What are we doing with it? and where are we doing it? All those things have to be defined. Otherwise, we can’t really say that risk has been dealt with, or that safety requirements have been met.

System Safety: why Bother?

So, we’ve spent several slides just talking about what safe means, which might seem a bit over the top. But I promise you it is not, because having a solid understanding of what we’re trying to do is important in safety. Because safety is intangible. So, we need to understand what it is we’re aiming for. As some Greek bloke said, thousands of years ago: “If you don’t know to which port, you are bound, then no wind is favourable.”

It’s almost impossible to have a satisfactory Safety Program if you don’t know what you’re trying to achieve. Whereas, if you do have a precise understanding of what you’re trying to achieve, you’ve got a reasonably good chance of success. And that’s what it’s all about.

Copyright Statement

Well, I’ve quoted you some information. From a UK government web site. And I’ve done so. In accordance with the terms of its creative commons license and you can see. More information about the terms of that can be found at this page.

The Full Version is Here…

If you want more, if you want to unpack all the Major Definitions, all the system safety concepts that we’re talking about, then there’s a longer version of this video. Which you can get at my Patreon page.

I hope you enjoy it. Well that’s it for the short video, for now. Please go and have a look at the longer video to get the full picture. OK, everyone, it’s been a pleasure talking to you and I hope you found that useful. I’ll see you again soon. Goodbye.