This is Mil-Std-882E Appendix A. Back to the previous excerpt: Task 403 [Link TBD]
GUIDANCE FOR THE SYSTEM SAFETY EFFORT
A.1 Scope. This Appendix is not a mandatory part of the standard. The information contained herein is intended for guidance only. This Appendix provides guidance on the selection of the optional tasks and use of quantitative probability levels.
A.2. Task Application. The system safety effort described in Section 4 of this Standard can be augmented by identifying specific tasks that may be necessary to ensure that the contractor adequately addresses areas that the Program needs to emphasize. Consideration should be given to the complexity and dollar value of the program and the expected levels of risks involved. Table A-I provides a list of the optional tasks and their applicability to program phases. Once recommendations for task applications have been determined, tasks can be prioritized and a “rough order of magnitude” estimate should be created for the time and effort required to complete each task. This information will be of considerable value in selecting the tasks that can be accomplished within schedule and funding constraints.
TABLE A-I. Task application matrix
A.3. Quantitative Probability Example. For quantitative descriptions, the frequency is the actual or expected number of mishaps (numerator) during a specified exposure (denominator). The denominator can be based on such things as the life of one item; number of missile firings, flight hours, systems fielded, or miles driven; years of service, etc.
Welcome to the Safety Artisan where you will find professional, pragmatic and impartial guidance and educational products on all things safety, be they System Safety, design safety, functional safety. Call it whatever you want. Today we’re going to be talking about System Safety principles. We will be going through some System Safety principles from the American Federal Aviation Authority System Safety Handbook.
This is a transcript of the full, 45-minute video, which you can see on Patreon, here.
So, our topics for today. There’s a fundamental statement to
start with, we’ll talk about planning and Management Authority how we achieve
safety in the precedence that we prefer to use. Safety requirements and
analysis assumptions and criteria emphasis and results, Management Authority
responsibilities, software and how to get an effective System Safety program. There’s
quite a lot here, we’re going to charge on and see what we get.
System Safety is a Basic Requirement
The first thing we need to consider is that System Safety is a basic requirement of the total system. The FAA deal with airplanes, so, I thought I’d show you a picture of an airplane that’s had a bad day. Now the engines and the wings and the tail I think have been removed after the crash but as you can see it’s got to be bashed in the front when it crashed. The point we’re making here is that safety is to do with the total system. An unsafe airplane, an airplane that’s crashed no longer flies. It’s no longer really an airplane, it’s just shattered remains. Safety is a fundamental thing that we need from the whole system. We need the whole aeroplane to work. We could, for example, talk about the safety of the wings or the safety of the engines but that wouldn’t make much sense in isolation would it if the engines aren’t on the airplane or the wings aren’t on the airplane then what’s the point of them. So, we need System Safety. It’s a basic requirement of the whole thing, and the whole thing working.
OK, the next principle is planning. What do we need from planning? Well, we need the safety engineering effort to be comprehensive. In other words, we needed to cover everything it needs to cover, and it needs to be integrated, it all needs to be joined up. if the safety effort isn’t both of those things are then it’s either going to fall short or it’s going to be disconnected in some way and that doesn’t mean effected said we’re going to have this thing.
Now we need ongoing effort over a period to achieve safety for any kind of significant system. that probably means that we’re going to do a whole bunch of different tasks and those tasks that we’ve got to be done in sequence. They’ve got to relate to each other. If you can imagine a planning chart, a Gantt chart, a waterfall chart that kind of thing with tasks linked together. Typical planning stuff. Nothing unusual there. The plan must also, influence facilities equipment procedures and personnel.
When it says influence, I guess it’s better to say making choices, or decisions. Which facilities? which personnel? which procedures? and why are they appropriate? What we’re trying to achieve. That’s what that’s really all about, the fourth bullet point. Here we’ve got applicable to all program phases. We need a plan that gets us started that gets the work done and brings things to a satisfactory conclusion. Whether that be all parts of the program right through to integration getting our airplane or our other system into service then we need it to cover all the other stuff as well.
It’s very easy to think about sexy, design stuff particularly with things like airplanes. But we need to cover all the other things as well. What about transporting our system or spares. What about logistics support. What about spares and repair. What about storage in package handling? How do we ensure that stuff arrives where it’s supposed to in a fit state to be used and that kind of thing. Finally, not every program is all about the development of new things. There are probably going to be some non-developmental items or designs along the way. We’re going to reuse some stuff from elsewhere and we’ve got to make sure that it fits in and contributes to safety, so there are no disconnections or incompatibilities. We need to think about those NDIs as well. Whether we are in control of its development we need to think about that stuff. These seven bullet points talk about the comprehensiveness of the Plan.
Okay, Management Authority. In the FAA handbook, which is getting a bit old in the tooth by now it must be said it’s about 19 years old, we have the concept we’ve got the FAA is the regulator we’ve got the Management Authority whoever is putting together, in this case, an airplane project and then we’ve got the idea that the Management Authority has staff and also, contractors. The Management Authority is contracting out certain things they might be contracting out all the development or just bits of it or whatever it might be.
So, the M.A. has got to manage in this concept the overall system safety effort. They’ve got to pull it all together and the managerial and technical procedures to be used must be approved by the Management Authority. It’s the Management Authority that resolves any conflicts between safety and other design issues and resolves conflicts between different contractors. The Management Authority really has the power here and if need be, they must knock heads together in order to make sure that the whole thing works. That’s a key concept here. We’ll come back to that later as you’ll see.
Precedence of Controls
Moving on now, when we talk about controlling risk, we have several
options for what kind of controls we can use. The FAA principles say we should
start with designing for minimum hazards. So, we should try and make our system,
whatever it may be, as inherently safe, as intrinsically safe as we can by
designing out dangerous features.
Almost certainly we cannot completely design out risk in any significant system. Maybe we need to use specific safety devices. There’s a very simple illustration on the right. What you see with those little white boxes in the center with the wiring coming out the top and bottom. They are circuit breakers and they are what’s called residual current device circuit breakers. If a circuit breaker detects a spike of voltage or current on the line it will trip and isolate whatever it is feeding electricity to. So, if you’ve had a short circuit or you have an accident that would probably cause a voltage spike, the RCD circuit breaker trips and protect people from electrocution or protects equipment from being overvolted. In which case it might fail or catch fire or something.
There is a good example of some safety devices that you could fit into an electrical system. Having designed for minimum hazard and added safety devices we could warn people that there out that of impending problems and we could fit alarms of warning lights and or they might be warning signs that we might have a sign on the side of this box with these circuit breakers in saying watch out there’s electricity.
Finally we can use procedures, we can have written procedures that tell people how to do stuff safely and if the warnings and cautions that say ‘watch out for this’ or don’t do that or in and do this in a particular way and maybe the procedure might say in the case of the illustration you need to isolate the electricity before you open this box. All sorts of options but we want to start with the most effective options which are designing our hazards. In fact, you will still see a version of this precedence of controls in, for example, Australian work health and safety today it’s not called precedence of controls. It’s called a hierarchy of controls, but it says much the same thing.
Let’s talk about safety requirements and there are two points here that the FAA is making very wisely. First, those safety requirements have got to be consistent with other program requirements a safety program in isolation. It’s probably not going to be much use. It’s got to fit in and be consistent with what the overall program is doing to be effective. For example, if the safety program is making assumptions about how stuff is going to be used or maintained or the environment it’s going to work in, but those assumptions are incorrect. They’re not aligned with reality. Then you probably have a problem.
Secondly and this sounds a bit more controversial,
performance cost and other requirements may have priority over safety
I’ll let that sink in.
So, it sounds odd: Other requirements may have priority for
safety but, it’s quite logical when you think about it because there’s no such
thing as perfect safety. Nothing is safe. Breathing in and out has risks for
human beings. We just need to get on with it. It may be that if we give safety
priority over everything we will end up with a system that has low performance,
such that it’s not worth using, or it may cost so much that nobody could afford
to buy or use it or sustain it. We’ve got to balance safety requirements with
others and safety may not always win, it may not always be the pretty dominant
OK So, how do we understand what safety we need and whether
we’ve achieved it or not. The answer is system analysis and system analyses, as
it says, are basic tools for developing design specifications. Now, they do a
lot more than that as we’ll see. But the focus with the FAA approach to System
Safety is very much requirements-centric. The idea is that while you do a lot
of work to get specifications and the requirements right, and then you make
sure that what you design matches the specifications and then you verify and
validate that it’s met the requirements at the end. And that is very much the
American ethos for how you do safety.
Now, not all legal systems take this approach. For example, the UK and the Australian legal system are taking the view that its safety by intent. So, we measure safety or the achievement of safety based on saying that risks have been reduced to an acceptable level (but even that, of course, is a requirement). The two approaches are not incompatible. We must understand what we’re doing and remember these legal requirements, in whatever jurisdiction you’re in, are themselves requirements and need to be fed into the specifications. That’s the key thing. Is that something I often see missing in safety programs in all in all sorts of countries, where whoever is developing the requirements specifications, at whatever level, has forgotten about a bunch of requirements that just have to be met.
Of course, we have to remember that the measure of safety, it’s not the scope of the analysis – the analysis is just a means to an end. It’s a means to satisfy a requirement. That’s what it’s about. Having made sure we’ve considered all the requirements that we need for safety, we need to satisfy them. System analysis helps us to do that by looking at the system as a whole.
Purpose of Analyses
The purpose of these analyses is what do we do with them. I said they weren’t just for requirements. We can use analysis to identify hazards. It says corrective actions, it may be that we’ve identified hazards associated with the design or possible designs that we’re going to correct that design to reduce the hazard.
Or it may be that we’re going to add controls we might use analysis a trade-off to understand and review safety considerations and see how much safety we can get. How much safety is reasonable to have? Back to the requirements, we might use analysis to determine or evaluate safety design requirements, not just safety design requirements. We might also, need to evaluate operational, requirements for testing logistics, etc., Testing might be: how are we going to demonstrate safety? Again, the FAA is an American organization and the American approach to verification and validation tends to emphasize testing, sometimes to the exclusion of all else. Now, this isn’t necessarily the best way to do things but that’s the mentality. Just to be aware that’s one of the underlying philosophies or these principles because it’s from the American FAA.
Finally, we might use analyses to validate requirements that they’ve been met So, we might not be able to do testing. It might be too expensive or too dangerous to test something to destruction. Maybe what we need is a whole bunch of tests, different test points, and analysis is the way to do that particularly in the world of aircraft development. These days the way things tend to be done is that you have a model of your system and you use the model, in general, to validate that your system is correct and then you use certain test points to validate the model because it’s just too expensive, too time-consuming to physically test everything.
And then a final point that sounds rather odd: analysis our hazard analysis is not safety analysis. And I think what the FAA means by this is that we need to focus on real-world hazards. I’ve seen people get hung up trying to analyze a program or trying to start their analysis by analyzing safety controls and thinking about well what happened if my control goes wrong.
Well, we need to start at the other end. We need to start with the real-world hazard. That’s what’s really going to hurt people. we can work out how effective controls need to be from analyzing the hazard, not the other way around. That’s quite a common mistake I see in say programs, which is not focusing on physical hazards because then you can end up going around in circles in a rather theoretical or philosophical approach as opposed to getting the job done. That rather harks back to the previous point. The whole point of the exercise is to satisfy requirements by having a safe system not to do the analysis. There are some purposes of an analysis.
Assumptions and Criteria
As always in science and engineering. We’re going to need to make some assumptions because we can’t possibly prove absolutely everything. Now assumptions are good because they enable us to proceed. They enable us to work pragmatically but we’ve got to make sure that they are sensible. We’ve got a verify, validate them as far as we can and if we discover that an assumption turns out to be incorrect then we’re going to do something about it. Change in a program is inevitable. sometimes as we go through a large development program, we discover that the assumptions that we started with are not correct and we need to review and make changes.
That’s important. Again, people are sometimes nervous about doing that. They just want to well, dare I say, some people just want to stick their head in the sand and ignore these things but that’s not good safety management either. We’re going to have to set some risk criteria. Think we’re going to have to decide how much risk we can accept what our risk appetite is. Because as I’ve said before you can’t have zero risks, and to pretend that you can is foolish and ultimately self-defeating because then you end up with that’s an unrealistic assumption and you end up with a safety program that’s built on fantasy rather than reality.
That’s no good. Making assumptions and setting criteria are an inherent part of risk management. We need to understand that a risk is something that hasn’t yet happened. If it’s already happened, it’s an issue. So, a risk is something that could happen in the future. We’re talking about making estimates. We must set assumptions and we must set criteria. OK, I think I’ve said enough about that.
Moving on to safety management. So, we’ve got the Management Authority. But of course, safety management needs to be done at every level where we can influence the design. So, it’s not just the Management Authority’s responsibility to manage safety. Everybody who is managing safety must define safety functions, the authority that various people must make decisions and interrelationships between bodies and individuals and then safety management must be about exercising appropriate control. Whether it is control of the safety process is what we’re talking about here rather than management of hazard (controls). We need to when we’re exercising safety management. We need to do all those things
Effort and Emphasis
Not all risks are equal, not all safety controls hazard controls are equal. So, the degree of safety effort and the achievements that are required are dependent upon management emphasis. Now it says here by the FAA and tractors So, the FAA acts as a regulator. The emphasis that drives safety and where the emphasis on where we apply safety and the precedence and how much effort we put in, that’s going to be partly directed by the regulator. If you’re working in a regulated industry or it may be directed by the law and then the Management Authority or their contractors after then take and interpret those directives and apply them practically and then, of course, we’re going back to safety management. We define functions, authority, relationships and we exercise control in order to achieve the safety emphasis that is required to achieve the results that is required. That’s going to direct the effort.
We were probably going to spend a lot more effort managing higher risks than lower ones. We know our risks. Now that sounds so obvious doesn’t it? But the reality is it’s very easy for programs to lose sight of what the big risks are and major on the miners if you will. It’s too easy to get carried away with little things and you end up spending all your time on a program dealing with trivia while ignoring the fact that the horse has already bolted (escaped)!
Clarity of Objectives
I guess that comes back to the clarity of objectives, doesn’t it? There’s an old saying, one of my favorites (I apologize) “if you don’t know to which port you are sailing then no wind it’s favorable”. You’ve got to know what your safety objectives are what your safety targets are (if you’re going to set quantitative targets, but you don’t have to). Whatever your safety objectives and requirements are the Management Authority needs to clearly state and communicate them to everybody who is required to take action to manage safety. So, again, this sounds obvious, but people get it wrong so often, or they just don’t do it. Then at the back end of a program, they’re surprised that they haven’t got what they need.
This can become a big problem if you’re at the back end of a program and the Management Authority is trying to demonstrate to the regulator, or whoever it might be, customers perhaps, that they met safety requirements and met safety objectives. They may find either they got kit that can’t meet the requirements because they didn’t specify the requirement up front, or, more often, they can’t demonstrate that the kit meets the requirements, which is quite galling because you’ve got kit, which you suspect it’s perfectly okay but you can’t prove it. So, then you end up having to spend more money and waste time at the back end of the program trying to fix those things. A lot of programs end up being late and over budget for things like that. The earlier and the clearer you set your objectives the better. That supports things like making trade-offs and making decisions.
It’s all about decision making.
Management Authority Responsibilities
And that brings us neatly on to Management Authority responsibilities. The assumption is that we have an SSP, a System Safety Program. So, we have a planned program that’s going to achieve safety. The MA must plan it, organize it and make it happen. The MA has got to establish what the safety requirements are for a system, for the design, and they’ve got to state those safety requirements in a contract. (The assumption is that we’re going to contract with somebody for the whole system may be, or parts of the system.) We need a statement of work, to say OK what activities do we need to meet these requirements?
Now I guess what varies here is the amount of detail in the statement of Work. The Management Authority might take a hands-off approach and go okay, I’m going to specify some things in a statement of work like we want reviews at particular points in the program, or we want safety reporting, or whatever it might be. Or they might take a really prescriptive approach and say we’re going to specify in a lot of detail what we want in the SoW. To do that effectively the management and authority you really got to understand the minimum the thing that they need, and how that minimum might be reasonably achieved, because the danger is if you over specified that state with without work and you’ve got something wrong then you might end up stopping the contractors doing something sensible. Or the contractors might just blindly follow what you’ve told them to do rather than thinking about safety, which is what you really want!
Moving on. The MA must also review things and ensure (I
think we would say in English) ENSURE an adequate and complete System Safety Program
Plan. We’ve got a System Safety Program. We need a plan for it, and whether it
be the MA that produces an overall plan or whether they produce a plan for
themselves and then specify that the other stakeholders do their own, whichever
it might be.
So, this System Safety Program, System Safety Program Plan, the Statement of Work and the requirements: those four things really are linked together and need to be thought of together. You need to take a holistic approach because if you’ve got the requirements are out of step with the program, if the plan doesn’t adequately describe the program that you need, if the statement of work is at odds with the plan or the intended program. All these things are going to cause major problems. Those four things, the System Safety Program, safety requirements, Statement Of Work and the System Safety Program Plan really need to be worked consistently and coherently any to fit together.
Let’s move on from the first five bullet points. A rather odd one, it seems, to supply historical data. Now that looks really odd doesn’t it? out of place with the others. It’s quite logical. The Management Authority, the people who say I want a system and I’m going to set everything up to make sure I get the system that I need. They’re not doing this in isolation. This might be a new system, but it’s probably replacing an old system and a Management Authority should have some expectations, from prior use of other systems or related systems. They should have some expectation of what is reasonable to expect from this kind of system. In other words, setting the safety requirements.
What kind of accidents and incidents we’ve seen in the past? and therefore what kind of hazards and risks we’re going to need to control? So, that historical data is very important and it might literally be lots and lots of low-level data or it might be something a bit higher level where we’ve learned some lessons from the past and those lessons have helped to form our safety requirements for this future system. Historical data is very important.
And again, it’s very easy to get wrong. With historical data usually what we find in the real world is we have underreporting. We have confused reporting and we’ve got a lot of data. We’re not always sure what it means whether there are any overlaps that kind of thing. Gathering historical data and analyzing it can be quite difficult, but it can also, be tremendously useful. It’s worth doing.
So, next Bullet point they may need to review contractor System Safety effort. What we’re doing the data that they’re producing the MA needs to ensure specifications are updated with analysis and test results. Again, we talked about change being inevitable. Somebody has got to make that change happen and make sure the effects of change ripple through the system consistently and that somebody is the MA. Somebody has got to have the authority to manage these things. One body. Management by committee doesn’t always work very well. Somebody some organization or some individual who clearly has authority to lead.
Finally, we need to establish and operate System Safety groups. These groups or committees, whatever you want to call them, we need to bring different stakeholders together different expertise and different competent people with different competencies together in order to support the Management Authority. The final decision rests with the Management Authority but the MA needs to pull together enough expertise to enable them to make sensible decisions. There’s a balance between this unity of leadership unity of purpose and diversity of representation that brings everything we need into the decision-making process.
Okay: software! Now, this is a slight aside, when the FAA came up with these principles, software was maybe a little bit rarer back then. Now, these days software is everywhere. But back in 2000, particularly on high integrity systems, like airplane software, it was rarer. It was there and had been for some time, but it wasn’t always doing safety-related stuff. So, it’s still seen as a bit of a special case and to be honest, even these days lots of people are frightened of software because it’s intangible, and I suspect I’m going to end up doing quite a few sessions talking about software safety and explaining it.
We note that the FAA is still taking their very much requirements-focused approach So, analysing software for hazards is seen in this approach as all about taking requirements from the top left hand side of the V model, which we see illustrated here, and flowing those requirements down to lower and lower levels until we get to implementation: the development of the software. Then as we build those we conduct unit testing, integration testing, system testing, and user testing or operational testing whatever you want to call it.
We progressively build-up testing to show that we have verified that their requirements, at every level in the V model, have been met. This is a philosophy for looking at software and it is correct, but it’s not the only way of looking at software. This is a very American approach. It emphasizes requirements. It emphasizes testing. We will see when we get to a specialist subjects on software, software is not always very amenable to being tested and just because you’ve got a requirement, just because software meets all its requirements – that’s great – maybe we can demonstrate that, but can we demonstrate that it doesn’t do anything it’s not supposed to do? Often in safety that’s half the battle or even more than half. So, I’m not necessarily a fan of this statement here and to be honest it is a bit out of date.
System Safety Program
So, we move on and this is our final slide. We’ve talked about the System Safety Program before and we’ve got some good principles here. What do we need or an effective System Safety Program? And that word effective is key because anybody can make up a program that may or may not be effective. What do we need to make it work? Well, we need a plan a planned approach to getting tasks done, getting them accomplished. Again, I have seen lots of people start tasks and not finish them, or not finish them successfully. We need qualified people. Once again, I’ve seen lots of programs with people who don’t really know what they’re doing and they’re very busy. They’re running around like headless chickens. Maybe they’ve got a lot of people, but if they don’t know what they’re doing then sure they may if directed sensibly, they may still get a result but it’s probably not going to be very elegant. So, we need people who are competent at what they are doing.
We need somebody or something that wields the authority to get stuff done, implement tasks, and that authority has got to flow through all levels of management (because we might have multiple levels). We’ve got the Management Authority in this model, who is reporting to the FAA and trying to demonstrate to that regulator that they’ve done what they were supposed to. Maybe you’ve got internal levels of management, but in the end the Management Authority has got to manage contractors, perhaps at multiple levels. On complex systems, you may have many levels of contracts contributing these parts and components and sub-assemblies et cetera et cetera into an overall complex system.
Finally, we’ve got to have appropriate staffing and funding.
We’ve got to have enough people with the right skills to get the job done and
that all costs money. Very often safety-qualified people are hard to find and
therefore they tend to be expensive. That’s when people like myself get brought
in and safety consultants, because a Management Authority or the contractors
they were working for them discover that they don’t have enough staff with the
right experience and competence in order to get the job done. People like me
get brought in and we can be quite expensive!
Nothing wrong with doing that of course. But usually, to get effective results, I find that the Management Authority needs to have enough competent people at least to understand, to be able to realize we’re not making progress here, we need to bring in more highly qualified people. You need enough knowledge about safety in order just to realize that you’re not cutting it and you need to bring in some higher-powered help.
That’s one of the reasons for The Safety Artisan to exist, really, is to help people have enough background to realize what they’re supposed to be doing versus maybe what’s going on. Once you have that knowledge then hopefully you can build up enough knowledge to assess the situation and to decide whether what you’re doing is adequate or whether you need further help. That minimum level of knowledge is what you need to succeed. Once you’ve got that then maybe you buy more expertise and employ people in-house or maybe you bring people in temporarily, but that understanding requires a certain base-level knowledge about safety. And that’s what the Safety Artisan is all about, ladies and gentlemen. That’s a nice point on which to end.
Just to say that all the “quotations in italics” are
from the U.S. Federal Aviation Authority System Safety handbook. As you can see,
they’re published in the year 2000. It is getting a bit long in the tooth in
some ways, but the basic principles are good ones. To be honest, I can’t find
them as clearly articulated anywhere else, even today, certainly not in a form
publicly available for you and me to share. So, thanks and appreciation for the
FAA for doing that. I do hope one day soon they’re going to update that system
safety handbook because it is a very useful beast. There are still people out
there using it and maybe not understanding where it falls short these days.
Now, U.S. government standards tend to be copyright free. The
text itself is copyright free, but this video presentation and the value add
that I’m providing is copyright of the Safety Artisan, 2019, to understand how current
I’m recording this on the 26th of October 2019. Maybe you found this video on the Safety Artisan Page at www.Patreon.com, or maybe you found it elsewhere, but you will find all my System Safety videos on Patreon.com/SafetyArtisan.
That’s the end of the presentation on System Safety Principles. Thanks for your attention. it just remains for me to say thanks for tuning in as always. I will see you soon. Cheers now.
Hi everyone, and welcome to the safety artisan where you will find professional pragmatic, and impartial advice on all thing’s safety. I’m Simon and welcome to the show today, which is recorded on the 23rd of September 2019. Today we’re going to talk about System safety concepts. A couple of days ago I recorded a short presentation on this, which is on the Patreon website and is also on YouTube. Today we are going to talk about the same concepts but in much more depth.
Hence, this video is only available on the ‘Safety Artisan’ Patreon page. In the short session, we took some time picking apart the definition of ‘safe’. I’m not going to duplicate that here, so please feel free to go have a look. We said that to demonstrate that something was safe, we had to show that risk had been reduced to a level that is acceptable in whatever jurisdiction we’re working in.
And in this definition, there are a couple of tests that are appropriate that the U.K., but perhaps not elsewhere. We also must meet safety requirements. And we must define Scope and bound the system that we’re talking about a Physical system or an intangible system like a. A computer program or something. We must define what we’re doing with it what it’s being used for. And within which operating environment within which context is being used. And if we could do all those things, then we can objectively say or claim that this system is safe. OK. that’s very briefly that.
What we’re going to talk about a lot more Topics. We’re going to talk about risk accidents. The cause has a consequence sequence. They talk about requirements and. Spoiler alert. What I consider to be the essence of system safety. And then we’ll get into talking about the process. Of demonstrating safety, hazard identification, and analysis.
Risk Reduction and estimation. Risk Evaluation. And acceptance. And then pulling it all together. Risk management safety management. And finally, reporting, making an argument that the system is safe supporting with evidence. And summarizing all of that in a written report. This is what we do, albeit in different ways and calling it different things.
Onto the first topic. Risk and harm. Our concept of risk. It’s a combination of the likelihood and severity of harm. Generally, we’re talking about harm. To people. Death. Injury. Damage to help. Now we might also choose to consider any damage to property in the environment. That’s all good. But I’m going to concentrate on. Harm. To people. Because. Usually. That’s what we’re required to do. By the law. And there are other laws covering the environment and property sometimes. That. We’re not going to talk. just to illustrate this point. This risk is a combination of Severity and likelihood.
We’ve got a very crude. Risk table here. With a likelihood along the top. And severity. Downside. And we might. See that by looking at the table if we have a high likelihood and high severity. Well, that’s a high risk. Whereas if we have Low Likelihood and low severity. We might say that’s a low risk. And then. In between, a combination of high and low we might say that’s medium. Now, this is a very crude and simple example. Deliberately.
You will see risk matrices like this. In. Loads of different standards. And you may be required to define your own for a specific system, there are lots of variations on this but they’re all basically. Doing this thing and we’re illustrating. How we determine the level of risk. By that combination of severity. And likely, I think a picture is worth a thousand words. Moving online to the accident. We’re talking about (in this standard) an unintended event that causes harm.
Accidents, Sequences and Consequences
Not all jurisdictions just consider accidental event some consider deliberate as well. We’ll leave that out. A good example of that is work health and safety in Australia but no doubt we’ll get to that in another video sometime. And the accident sequences the progression of events. That results in an accident that leads to an. Now we’re going to illustrate the accident sequence in a moment but before we get there. We need to think about cousins. here we’ve got a hazard physical situation of state system. Often following some initiating event that may lead to an accident, a thing that may cause harm.
And then allied with that we have the idea of consequences. Of outcomes or an outcome. Resulting from. An. Event. Now that all sounds a bit woolly doesn’t it, let’s illustrate that. Hopefully, this will make it a lot clearer. Now. I’ve got a sequence here. We have. Causes. That might lead to a hazard. And the hazard might lead to different consequences. And that’s the accident. See. Now in this standard, they didn’t explicitly define causes.
Cause, Hazard and Consequence
They’re just called events. But most mostly we will deal with causes and consequences in system safety. And it’s probably just easier to implement it. Whether or not you choose to explicitly address every cause. That’s often option step. But this is the accident Sequence that we’re looking at. And they this sort of funnels are meant to illustrate the fact that they may be many causes for one hazard. And one has it may lead to many consequences on some of those consequences. Maybe. No harm at all.
We may not actually have an accident. We may get away with it. We may have a. Hazard. And. Know no harm may befall a human. And if we take all of this together that’s the accident sequence. Now it’s worth. Reiterating. That just because a hazard exists it does not necessarily need. Lead to harm. But. To get to harm. We must have a hazard; a hazard is both necessary and sufficient. To lead to harmful consequences. OK.
Hazards: an Example
And you can think of a hazard as an accident waiting to
happen. You can think of it in lots of different ways, let’s think about an
example, the hazard might be. Somebody slips. Okay well while walking and all.
That slip might be caused by many things it might be a wet surface. Let’s say
it’s been raining, and the pavement is slippery, or it might be icy. It might
be a spillage of oil on a surface, or you’d imagine something slippery like ball
bearings on a surface.
So, there’s something that’s caused the surface to become slippery. A person slips – that’s the hazard. Now the person may catch themselves; they may not fall over. They may suffer no injury at all. Or they might fall and suffer a slight injury; and, very occasionally, they might suffer a severe injury. It depends on many different factors. You can imagine if you slipped while going downstairs, you’re much more likely to be injured.
And younger, healthy, fit people are more likely to get over a fall without being injured, whereas if they’re very elderly and frail, a fall can quite often result in a broken bone. If an elderly person breaks a bone in a fall the chances of them dying within the next 12 months are quite high. They’re about one in three.
So, the level of risk is sensitive to a lot of different factors. To get an accurate picture, an accurate estimate of risk, we’re going to need to factor in all those things. But before we get to that, we’ve already said that hazard need not lead to harm. In this standard, we call it an incident, where a hazard has occurred; it could have progressed to an accident but didn’t, we call this an incident. A near miss.
We got away with it. We were lucky. Whatever you want to call it. We’ve had an incident but no he’s been hurt. Hopefully, that incident is being reported, which will help us to prevent an actual accident in future. That’s another very useful concept that reminds us that not all hazards result in harm. Sometimes there will be no accident. There will be no harm simply because we were lucky, or because someone present took some action to prevent harm to themselves or others.
Mitigation Strategies (Controls)
But we would really like to deliberately design out or avoid Hazards if we can. What we need is a mitigation strategy, we need a measure or measures that, when we put them into practice, reduce that risk. Normally, we call these things controls. Again, now we’ve illustrated this; we’ve added to the funnels. We’ve added some mitigation strategies and they are the dark blue dashed lines.
And they are meant to represent Barriers that prevent the accident sequence progressing towards harm. And they have dashed lines because very few controls are perfect, you know everything’s got holes in it. And we might have several of them. But usually, no control will cover all possible causes; and very few controls will deal with all possible consequences. That’s what those barriers are meant to illustrate.
That idea that picture will be very useful to us later. When we are thinking about how we’re going to estimate and evaluate risk overall and what risk reduction we have achieved. And how we talk about justifying what we’ve done is good. That’s a very powerful illustration. Well, let’s move on to safety requirements.
Now. I guess it’s no great surprise to say that requirements, once met, can contribute directly to the safety of the system. Maybe we’ve got a safety requirement that says all cars will be fitted with seatbelts. Let’s say we’ll be required to wear a seatbelt. That makes the system safer.
Or the requirement might be saying we need to provide evidence of the safety of the system. And, the requirement might refer to a process that we’ve got to go through or a set kind of evidence that we’ve got to provide. Safety requirements can cover either or both of these.
The Essence of System Safety
Requirements. Covering. Safety of the system or demonstrating that the system is safe. Should give us assurance, which is adequate confidence or justified confidence. Supported with evidence by following a process. And we’ll talk more about process. We meet safety requirements. We get assurance that we’ve done the right thing. And this really brings us to the essence of what system safety is, we’ve got all these requirements – everything is a requirement really – including the requirement. To demonstrate risk reduction.
And those requirements may apply to the system itself, the product. Or they may provide, or they may apply to the process that generates the evidence or the evidence. Putting all those things together in an organized and orderly way really is the essence of system safety, this is where we are addressing safety in a systematic way, in an orderly way. In an organized way. (Those words will keep coming back). That’s the essence of system safety, as opposed to the day-to-day task of keeping a workplace safe.
Maybe by mopping up spills and providing handrails, so people don’t slip over. Things like that. We’re talking about a more sophisticated level of safety. Because we have a more complex problem a more challenging problem to deal with. That’s system safety. We will start on the process now, and we begin with hazard identification and analysis; first, we need to identify and list the hazards, the Hazards and the accidents associated with the system.
We’ve got a system, physical or not. What could go wrong? We need to think about all the possibilities. And then having identified some hazards we need to start doing some analysis, we follow a process. That helps us to delve into the detail of those hazards and accidents. And to define and understand the accident sequences that could result. In fact, in doing the analysis we will very often identify some more hazards that we hadn’t thought of before, it’s not a straight-through process it tends to be an iterative process.
And what ultimately what we’re trying to do is reduce risk, we want a systematic process, which is what we’re describing now. A systematic process of reducing risk. And at some point, we must estimate the risk that we’re left with. Before and after all these controls, these mitigations, are applied. That’s risk estimation. Again, there’s that systematic word, we’re going to use all the available information to estimate the level of risk that we’ve got left. Recalling that risk is a combination of severity and likelihood.
Now as we get towards the end of the process, we need to evaluate risk against set criteria. And those criteria vary depending on which country you’re operating in or which industry we’re in: what regulations apply and what good practice is relevant. All those things can be a factor. Now, in this case, this is a U.K. standard, so we’ve got two tests for evaluating risk. It’s a systematic determination using all the available evidence. And it should be an objective evaluation as far as we can make it.
We should use certain criteria on whether a risk can be accepted or not. And in the U.K. there are two tests for this. As we’ve said before, there is ALARP, the ‘As Low As is Reasonably Practicable’ test, which says: Have we put into practice all reasonably practicable controls? (To reduce risk, this is risk reduction target). And then there’s an absolute level of risk to consider as well. Because even if we’ve taken all practical measures, the risk remaining might still be so high as to be unacceptable to the law.
Now that test is specific to the U.K, so we don’t have to worry too much about it. The point is there are objective criteria, which we must test ourselves or measure ourselves against. An evaluation that will pop out the decision, as to whether a further risk reduction is necessary if the risk level is still too high. We might conclude that are still reasonably practicable measures that we could take. Then we’ve got to do it.
We have an objective decision-making process to say: have we done enough to reduce risk? And if not, we need to do some more until we get to the point where we can apply the test again and say yes, we’ve done enough. Right, that’s rather a long-winded way of explaining that. I apologize, but it is a key issue and it does trip up a lot of people.
Now, once we’ve concluded that we’ve done enough to reduce risk and no further risk reduction is necessary, somebody should be in a position to accept that risk. Again, it’s a systematic process, by which relevant stakeholders agree that risks may be accepted. In other words, somebody with the right authority has said yes, we’re going to go ahead with the system and put it into practice, implement it. The resulting risks to people are acceptable, providing we apply the controls.
And we accept that responsibility. Those people who are signing off on those risks are exposing themselves and/or other people to risk. Usually, they are employees, but sometimes members of the public as well, or customers. If you’re going to put customers in an airliner you’re saying yes there is a level of risk to passengers, but that the regulator, or whoever, has deemed [the risk] to be acceptable. It’s a formal process to get those risks accepted and say yes, we can proceed. But again, that varies greatly between different countries, between different industries. Depending on what regulations and laws and practices apply. (We’ll talk about different applications in another section.)
Now putting all this together we call this risk management. Again, that wonderful systematic word: a systematic application of policies, procedures and practices to these tasks. We have hazard identification, analysis, risk estimation, risk evaluation, risk reduction & risk acceptance. It’s helpful to demonstrate that we’ve got a process here, where we go through these things in order. Now, this is a simplified picture because it kind of implies that you just go through the process once.
With a complex system, you go through the process at least once. We may identify further hazards, when we get into Hazard Analysis and estimating risk. In the process of trying to do those things, even as late as applying controls and getting to risk acceptance. We may discover that we need to do additional work. We may try and apply controls and discover the controls that we thought were going to be effective are not effective.
Our evaluation of the level of risk and its acceptability is wrong because it was based on the premise that controls would be effective, and we’ve discovered that they’re not, so we must go back and redo some work. Maybe as we go through, we even discover Hazards that we hadn’t anticipated before. This can and does happen, it’s not necessarily a straight-through process. We can iterate through this process. Perhaps several times, while we are moving forward.
OK, Safety Management. We’ve gone to a higher level really than risk because we’re thinking about requirements as well as risk. We’re going to apply organization, we’re going to applying management principles to achieve safety with high confidence. For the first time we’ve introduced this idea of confidence in what we’re doing. Well, I say the first time, this is insurance isn’t it? Assurance, having justified confidence or appropriate confidence, because we’ve got the evidence. And that might be product evidence too we might have tested the product to show that it’s safe.
We might have analysed it. We might have said well we’ve shown that we follow the process that gives us confidence that our evidence is good. And we’ve done all the right things and identified all the risks. That’s safety management. We need to put that in a safety management system, we’ve got a defined organization structure, we have defined processes, procedures and methods. That gives us direction and control of all the activities that we need to put together in a combination. To effectively meet safety requirements and safety policy.
And our safety tests, whatever they might be. More and more now we’re thinking about top-level organization and planning to achieve the outcomes we need. With a complex system, with a complex operating environment and a complex application.
Now I’ll just mention planning. Okay, we need a safety management plan that defines the strategy: how we’re going to get there, how are we going to address safety. We need to document that safety management system for a specific project. Planning is very important for effective safety. Safety is very vulnerable to poor planning. If a project is badly planned or not planned at all, it becomes very difficult to Do safety effectively, because we are dependent on the process, on following a rigorous process to give us confidence that all results are correct. If you’ve got a project that is a bit haphazard, that’s not going to help you achieve the objectives.
Planning is important. Now the bit of that safety plan that deals with timescales, milestones and other date-related information. We might refer to as a safety program. Now being a UK Definition, British English has two spellings of program. The double-m-e version of programme. Applies to that time-based progression, or milestone-based progression.
Whereas in the US and in Australia, for example, we don’t have those two words we just have the one word, ‘program’. Which Covers everything: computer programs, a programme of work that might have nothing to do with or might not be determined by timescales or milestones. Or one that is. But the point is that certain things may have to happen at certain points in time or before certain milestones. We may need to demonstrate safety before we are allowed to proceed to tests and trials or before we are allowed to put our system into service.
We’ve got to demonstrate that Safety has been achieved before we expose people to risk. That’s very simple. Now, finally, we’re almost at the end. Now we need to provide a demonstration – maybe to a regulator, maybe to customers – that we have achieved safety. This standard uses the concept of a safety case. The safety case is basically, imagine a portfolio full of evidence. We’ve got a structured argument to put it all together. We’ve got a body of the evidence that supports the argument.
It provides a Compelling, Comprehensible (or understandable) and valid case that a system is safe. For a given application or use, in a given Operating environment. Really, that definition of what a safety case is harks back to that meaning of safety. We’ve got something that really hits the nail on the head. And we might put all of that together and summarise it in a safety case report. That summarises those arguments and evidence, and documents progress against the Safe program.
Remember I said our planning was important. We started off saying that we need to do this, that the other in order to achieve safety. Hopefully, in the end, in the safety report we’ll be able to state that we’ve done exactly that. We did do all those things. We did follow the process rigorously. We’ve got good results. We’ve got a robust safety argument. With evidence to support it. At the end, it’s all written up in a report.
Now that isn’t always going to be called a safety case report; it might be called a safety assessment report or a design justification report. There are lots of names for these things. But they all tend to do the same kind of thing, where they pull together the argument as to why the system is safe. The evidence to support the argument, document progress against a plan or some set of process requirements from a standard or a regulator or just good practice in an industry to say: Yes, we’ve done what we were expected to do.
The result is usually that’s what justifies [the system] getting past that milestone. Where the system is going into service and can be used. People can be exposed to those risks, but safely and under control.
Everyone’s a winner, as they say!
Copyright – Creative Commons Licence
Okay. I’ve used a lot of information from the UK government website. I’ve done that in accordance with the terms of its creative commons license, and you can see more about that [here]. We have we complied with that, as we are required to, and to say to you that the information we’ve supplied is under the terms of this license.
And for more resources and for more lessons on system safety. And other safe topics. I invite you to visit the safety artisan.com website or to go and look at the videos on Patreon, at my safety artisan page. And that’s www.Patreon.com/SafetyArtisan. Thanks very much for watching. I hope you found that useful.
We’ve covered a lot of information there, but hopefully in a structured way. We’ve repeated the key concepts and you can see that in that standard. The key concepts are consistently defined, and they reinforce each other. In order to get that systematic, disciplined approach to safety, that’s we need.
Anyway, that’s enough from me. I hope you enjoyed watching and found that useful. I look forward to talking to you again soon. Please send me some feedback about what you thought about this video and also what you would like to see covered in the future.
Thank you for visiting the Safety Artisan. I look forward to talking to you again soon. Goodbye.
Hi everyone and welcome to the Safety Artisan, where you will find professional, pragmatic and impartial advice. Whether you want to know how safety is done or how to do it. I hope you’ll find today’s session helpful. It’s the 21st of September 2019 as I record this. Welcome to the show. So, let’s get started. Well, we’re going to talk today about System Safety concepts. What does it all mean? We need to ask this question because it’s not obvious, as we will see.
If we look at a dictionary definition of the word safe, it’s an adjective. To be protected from or not exposed to danger or risk. Not likely to be harmed or lost. There are synonyms – protect, shield, shelter, guard and keep out of harm’s way. They’re all good words, and I think we all know what we’re talking about. However, as a definition, it’s too imprecise. We can’t objectively say whether we have achieved safety or not.
A Practical Definition of ‘Safe’
What we need is a better definition, a more practical
definition. I’ve taken something from an old UK Defence standard. Forget about
which standard, that’s not important. It’s just that we’re using a consistent
set of definitions to work through basic safety concepts. And it’s important to
do that because different standards, come from different legal systems and they
have different philosophies. So, if you start mixing standards and different
concepts together, that doesn’t always work.
OK so whatever you do, be consistent. That’s the key point. We’re going to use this set of definitions from the U.K. defence standard because they are consistent.
In this standard, ‘safe’ means: “Risk has been demonstrated to have been reduced to a level that is ALARP, and broadly acceptable or tolerable. And relevant prescriptive safety requirements have been met. For a system, in a given application, in a given Operating Environment.” OK, so let’s unpack that.
System Safety – Risk
So, we start with risk. We need to manage risk. We need to show that risk has been reduced to an acceptable level. As required perhaps by law, or regulation or a standard. Or just good practice in a particular industry. Whatever it is, we need to show that the risk of harm to people has been reduced. Not just any old reduction, we need to show that it’s been reduced to a particular level. Now in this standard, there are two tests for that.
And they’re both objective tests. The first one says as low as reasonably practicable. Basically, it’s asking have all reasonably practicable risk reduction measures been taken. So that’s one test. And the second test is a bit simpler. It’s basically saying reduce the absolute level of risk to something that is tolerable or acceptable. Now don’t worry too much about precisely what these things mean. The purpose for today is to note that we’ve got an objective test to say that we’ve done enough.
System Safety – Requirements
So that’s dealt with risk. Let’s move on to safety
requirements. If a requirement is relevant, then we need to apply it. If it’s
prescriptive, if it says you must do this, or you must do that. Then we need to
meet it. There are two separate parts to this ‘Safe’ thing: we’ve got to meet
requirements; and, we’ve got to manage risk. We can’t use one as an excuse for
not doing the other.
So just because we reduce risk until it’s tolerable or acceptable doesn’t mean that we can ignore safety requirements. Or vice versa. So those are the two key things that we’ve got to do. But that’s not actually quite enough to get us there. Because we’ve got to define what we’re doing, with what and in what context. Well, we’re reducing the risk of a system. And the system might be a physical thing.
Defining the Scope: The System
It might be a vehicle, an aeroplane or a ship or a submarine,
it might be a car or a truck. Or it might be something a bit more intangible. It
might be a computer program that we’re using to make decisions that affect the
safety of human beings, maybe a medical diagnosis system. Or we’re processing
some scripts or prescriptions for medicine and we’ve got to get it right. We
could poison somebody. So, whether it’s a tangible or an intangible system.
We need to define it. And that’s not as easy as it sounds, because if we’re applying system safety, we’re doing it because we have a complex system. It’s not a toaster. It’s something a bit more challenging. Defining the system carefully and precisely is really important and helpful. So, we define what our system is, our thing or our service. The system. What are we doing with it? What are we applying it to?
Defining the Scope: The Application
What are we using it for? Now, just to illustrate that no standard is perfect. Whoever wrote that defence standard didn’t bother to define the application. Which is kind of a major stuff-up to be honest, because that’s really important. So, let’s go back to an ordinary dictionary definition just to get an idea of what it means. By the way, I checked through the standard that I was referring to, and it does not explain in this standard.
What it means by the application. Otherwise, I would use that by preference. But if we go back to the dictionary, we see application: the act of putting something into operation. OK, so, we’re putting something to use. We’re implementing, employing it or deploying it maybe we’re utilizing it, applying it, executing it, enacting it. We’re carrying it out, putting it into operation or putting it into practice. All useful words that help us to understand.
I think we know what we’re talking about. So, we’ve got a thing or a service. Well, what are we using it for? Quite obviously, you know a car is probably going to be quite safe on the road. Put it in water and it probably isn’t safe at all. So, it’s important to use things for their proper application, to the use to which they were designed. And then, kind of harking back to what I just said, the correct operating environment.
Defining the Scope: The Operating Environment
For this system, and the application to which we will put it
to. So, we’ve got a thing that we want to use for something. What’s the
operating environment in which it will be safe? What’s it qualified or
certified for? What’s the performance envelope that it’s been designed for? Typically,
things work pretty well within the operating environment, within the envelope
for which they were designed. Take them outside of that envelope and they perform
not so well.
Maybe not at all. You take an aeroplane too high and the air is too thin, and it becomes uncontrollable. You take it too low and it smashes into the ground. Neither outcome is particularly good for the occupants of the aeroplane. Or whoever happens to be underneath it when it hits the ground. All of those three things: what is the system? What are we doing with it? and where are we doing it? All those things have to be defined. Otherwise, we can’t really say that risk has been dealt with, or that safety requirements have been met.
System Safety: why Bother?
So, we’ve spent several slides just talking about what safe
means, which might seem a bit over the top. But I promise you it is not,
because having a solid understanding of what we’re trying to do is important in
safety. Because safety is intangible. So, we need to understand what it is
we’re aiming for. As some Greek bloke said, thousands of years ago: “If you
don’t know to which port, you are bound, then no wind is favourable.”
It’s almost impossible to have a satisfactory Safety Program if you don’t know what you’re trying to achieve. Whereas, if you do have a precise understanding of what you’re trying to achieve, you’ve got a reasonably good chance of success. And that’s what it’s all about.
Well, I’ve quoted you some information. From a UK government web site. And I’ve done so. In accordance with the terms of its creative commons license and you can see. More information about the terms of that can be found at this page.
The Full Version is Here…
If you want more, if you want to unpack all the Major Definitions, all the system safety concepts that we’re talking about, then there’s a longer version of this video. Which you can get at my Patreon page.
I hope you enjoy it. Well that’s it for the short video, for now. Please go and have a look at the longer video to get the full picture. OK, everyone, it’s been a pleasure talking to you and I hope you found that useful. I’ll see you again soon. Goodbye.