Welcome to the Safety Artisan, where you will find professional, pragmatic and impartial guidance and educational products on all things safety, be it System Safety, design safety or functional safety: call it whatever you want. Today we’re going to be talking about System Safety principles. We will be going through some System Safety principles from the American Federal Aviation Administration System Safety Handbook.
This is a transcript of the full, 45-minute video, which you can see on Patreon, here.
So, our topics for today. There’s a fundamental statement to start with, then we’ll talk about planning, Management Authority, how we achieve safety in the order of precedence that we prefer to use, safety requirements and analysis, assumptions and criteria, emphasis and results, Management Authority responsibilities, software, and how to get an effective System Safety program. There’s quite a lot here, so we’re going to charge on and see what we get.
System Safety is a Basic Requirement
The first thing we need to consider is that System Safety is a basic requirement of the total system. The FAA deals with airplanes, so I thought I’d show you a picture of an airplane that’s had a bad day. Now, the engines, the wings and the tail have, I think, been removed after the crash, but as you can see it got a bit bashed in at the front when it crashed. The point we’re making here is that safety is to do with the total system. An unsafe airplane, an airplane that’s crashed, no longer flies. It’s no longer really an airplane, it’s just shattered remains. Safety is a fundamental thing that we need from the whole system. We need the whole airplane to work. We could, for example, talk about the safety of the wings or the safety of the engines, but that wouldn’t make much sense in isolation, would it? If the engines aren’t on the airplane or the wings aren’t on the airplane, then what’s the point of them? So, we need System Safety. It’s a basic requirement of the whole thing, and the whole thing working.
OK, the next principle is planning. What do we need from planning? Well, we need the safety engineering effort to be comprehensive. In other words, we need it to cover everything it needs to cover. And it needs to be integrated: it all needs to be joined up. If the safety effort isn’t both of those things, then it’s either going to fall short or it’s going to be disconnected in some way, and that means it’s not going to be effective.
Now, we need ongoing effort over a period to achieve safety for any kind of significant system. That probably means that we’re going to do a whole bunch of different tasks, and those tasks have got to be done in sequence. They’ve got to relate to each other. Imagine a planning chart, a Gantt chart, a waterfall chart, that kind of thing, with tasks linked together. Typical planning stuff. Nothing unusual there. The plan must also influence facilities, equipment, procedures and personnel.
When it says influence, I guess it’s better to say making choices, or decisions. Which facilities? Which personnel? Which procedures? And why are they appropriate for what we’re trying to achieve? That’s what that’s really all about. The fourth bullet point here says applicable to all program phases. We need a plan that gets us started, gets the work done and brings things to a satisfactory conclusion. It needs to cover all parts of the program, right through to integration and getting our airplane or other system into service, and then we need it to cover all the other stuff as well.
It’s very easy to think about the sexy design stuff, particularly with things like airplanes. But we need to cover all the other things as well. What about transporting our system or spares? What about logistics support? What about spares and repair? What about storage, packaging and handling? How do we ensure that stuff arrives where it’s supposed to, in a fit state to be used? That kind of thing. Finally, not every program is all about the development of new things. There are probably going to be some non-developmental items (NDIs) or designs along the way. We’re going to reuse some stuff from elsewhere, and we’ve got to make sure that it fits in and contributes to safety, so there are no disconnections or incompatibilities. We need to think about those NDIs as well. Whether or not we are in control of their development, we need to think about that stuff. These seven bullet points talk about the comprehensiveness of the plan.
Okay, Management Authority. The FAA handbook is getting a bit long in the tooth by now, it must be said: it’s about 19 years old. In it we have the following concept. We’ve got the FAA as the regulator. We’ve got the Management Authority, whoever is putting together, in this case, an airplane project. And then we’ve got the idea that the Management Authority has staff and also contractors. The Management Authority is contracting out certain things: they might be contracting out all the development, or just bits of it, or whatever it might be.
So, in this concept the MA has got to manage the overall system safety effort. They’ve got to pull it all together, and the managerial and technical procedures to be used must be approved by the Management Authority. It’s the Management Authority that resolves any conflicts between safety and other design issues, and resolves conflicts between different contractors. The Management Authority really has the power here and, if need be, they must knock heads together in order to make sure that the whole thing works. That’s a key concept here. We’ll come back to that later, as you’ll see.
Precedence of Controls
Moving on now, when we talk about controlling risk, we have several options for what kind of controls we can use. The FAA principles say we should start with designing for minimum hazards. So, we should try and make our system, whatever it may be, as inherently safe, as intrinsically safe as we can by designing out dangerous features.
Almost certainly we cannot completely design out risk in any significant system, so maybe we need to use specific safety devices. There’s a very simple illustration on the right. What you see there, those little white boxes in the center with the wiring coming out of the top and bottom, are circuit breakers, of the kind called residual current device (RCD) circuit breakers. An RCD compares the current flowing out to a circuit with the current coming back. If the two differ, some current is leaking to earth, possibly through a person, and the device trips and isolates whatever it is feeding electricity to. Breakers with overcurrent protection also trip if a short circuit or an overload draws too much current. So, the breaker trips and protects people from electrocution, or protects wiring and equipment from overheating, in which case it might fail or catch fire or something.
That is a good example of safety devices that you could fit into an electrical system. Having designed for minimum hazard and added safety devices, we could warn people of impending problems. We could fit alarms or warning lights, or there might be warning signs: we might have a sign on the side of this box with these circuit breakers in it, saying ‘watch out, there’s electricity’.
Finally, we can use procedures. We can have written procedures that tell people how to do stuff safely, with warnings and cautions that say ‘watch out for this’, or ‘don’t do that’, or ‘do this in a particular way’. In the case of the illustration, the procedure might say that you need to isolate the electricity before you open this box. There are all sorts of options, but we want to start with the most effective option, which is designing out hazards. In fact, you will still see a version of this precedence of controls in, for example, Australian work health and safety today. It’s not called precedence of controls, it’s called a hierarchy of controls, but it says much the same thing.
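As a rough sketch, the order of precedence just described can be written down as an ordered list and used to rank candidate controls. The type names and the example controls for the electrical box are assumptions made up for this illustration, not wording from the FAA handbook.

```python
from enum import IntEnum


class ControlType(IntEnum):
    """Control types in order of preference (lower value = preferred)."""
    DESIGN_FOR_MINIMUM_HAZARD = 1
    SAFETY_DEVICE = 2
    WARNING = 3
    PROCEDURE = 4


def rank_controls(controls):
    """Return (description, ControlType) pairs, most-preferred type first."""
    return sorted(controls, key=lambda control: control[1])


# Hypothetical candidate controls for the electrical box in the illustration.
candidate_controls = [
    ("Isolate the supply before opening the box", ControlType.PROCEDURE),
    ("Fit an RCD circuit breaker", ControlType.SAFETY_DEVICE),
    ("Put a warning sign on the box", ControlType.WARNING),
    ("Use a low-voltage supply", ControlType.DESIGN_FOR_MINIMUM_HAZARD),
]

for description, control_type in rank_controls(candidate_controls):
    print(f"{control_type.value}. {control_type.name}: {description}")
```

Ranking like this only orders the options. In practice you would usually apply several of them together, starting from the top.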
Let’s talk about safety requirements. There are two points here that the FAA is making, very wisely. First, safety requirements have got to be consistent with other program requirements. A safety program in isolation is probably not going to be much use. It’s got to fit in and be consistent with what the overall program is doing to be effective. For example, if the safety program is making assumptions about how stuff is going to be used or maintained, or the environment it’s going to work in, but those assumptions are incorrect, not aligned with reality, then you probably have a problem.
Secondly, and this sounds a bit more controversial, performance, cost and other requirements may have priority over safety requirements.
I’ll let that sink in.
So, it sounds odd that other requirements may have priority over safety, but it’s quite logical when you think about it, because there’s no such thing as perfect safety. Nothing is completely safe. Breathing in and out has risks for human beings. We just need to get on with it. It may be that if we give safety priority over everything, we will end up with a system that has such low performance that it’s not worth using, or it may cost so much that nobody could afford to buy it, use it or sustain it. We’ve got to balance safety requirements with others, and safety may not always win: it may not always be the predominant requirement.
OK. So, how do we understand what safety we need and whether we’ve achieved it or not? The answer is system analysis, and system analyses, as it says, are basic tools for developing design specifications. Now, they do a lot more than that, as we’ll see. But the focus with the FAA approach to System Safety is very much requirements-centric. The idea is that you do a lot of work to get the specifications and the requirements right, then you make sure that what you design matches the specifications, and then you verify and validate that it’s met the requirements at the end. That is very much the American ethos for how you do safety.
Now, not all legal systems take this approach. For example, the UK and Australian legal systems take the view that it’s safety by intent. So, we measure safety, or the achievement of safety, based on saying that risks have been reduced to an acceptable level (but even that, of course, is a requirement). The two approaches are not incompatible. We must understand what we’re doing and remember that these legal requirements, in whatever jurisdiction you’re in, are themselves requirements and need to be fed into the specifications. That’s the key thing. It is something I often see missing in safety programs in all sorts of countries, where whoever is developing the requirements specifications, at whatever level, has forgotten about a bunch of requirements that just have to be met.
Of course, we have to remember that the measure of safety is not the scope of the analysis. The analysis is just a means to an end: it’s a means to satisfy a requirement. That’s what it’s about. Having made sure we’ve considered all the requirements that we need for safety, we need to satisfy them. System analysis helps us to do that by looking at the system as a whole.
Purpose of Analyses
What is the purpose of these analyses? What do we do with them? I said they weren’t just for requirements. We can use analysis to identify hazards. It also says corrective actions: it may be that we’ve identified hazards associated with the design, or possible designs, and we’re going to correct that design to reduce the hazard.
Or it may be that we’re going to add controls. We might use analysis in trade-offs, to understand and review safety considerations and see how much safety we can get. How much safety is reasonable to have? Back to the requirements: we might use analysis to determine or evaluate safety design requirements. And not just safety design requirements. We might also need to evaluate operational requirements, and requirements for testing, logistics, etc. Testing might be: how are we going to demonstrate safety? Again, the FAA is an American organization, and the American approach to verification and validation tends to emphasize testing, sometimes to the exclusion of all else. Now, this isn’t necessarily the best way to do things, but that’s the mentality. Just be aware that’s one of the underlying philosophies of these principles, because they come from the American FAA.
Finally, we might use analyses to validate that requirements have been met. We might not be able to do testing: it might be too expensive or too dangerous to test something to destruction. Maybe we would need a whole bunch of tests at different test points, and analysis is the way to cover that, particularly in the world of aircraft development. These days the way things tend to be done is that you have a model of your system and you use the model, in general, to validate that your system is correct. Then you use certain test points to validate the model, because it’s just too expensive and too time-consuming to physically test everything.
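To make the idea of validating a model at a few test points concrete, here is a minimal sketch. The load and deflection model, the measured values and the tolerance are all invented for illustration. A real program would take them from its own analysis and test plan.

```python
def validate_model(model, test_points, tolerance):
    """Compare model predictions with physical measurements at test points.

    Returns a list of (input, predicted, measured) tuples for every point
    where prediction and measurement differ by more than `tolerance`.
    """
    failures = []
    for test_input, measured in test_points:
        predicted = model(test_input)
        if abs(predicted - measured) > tolerance:
            failures.append((test_input, predicted, measured))
    return failures


# Hypothetical model: deflection in mm grows linearly with load in kN.
def deflection_model(load_kn):
    return 2.0 * load_kn


# Hypothetical measurements from a small number of physical tests.
measured_points = [(1.0, 2.1), (5.0, 9.8), (10.0, 23.0)]

failures = validate_model(deflection_model, measured_points, tolerance=0.5)
for load, predicted, measured in failures:
    print(f"Model disagrees at {load} kN: predicted {predicted}, measured {measured}")
```

In this made-up data the model agrees at the two lower loads and disagrees at the highest one, which is the kind of result that tells you where the model cannot yet be trusted.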
And then a final point that sounds rather odd: analysis is hazard analysis, not safety analysis. I think what the FAA means by this is that we need to focus on real-world hazards. I’ve seen people get hung up trying to analyze a program, or trying to start their analysis by analyzing safety controls and thinking, well, what happens if my control goes wrong?
Well, we need to start at the other end. We need to start with the real-world hazard. That’s what’s really going to hurt people. We can work out how effective controls need to be from analyzing the hazard, not the other way around. That’s quite a common mistake I see in safety programs: not focusing on physical hazards, because then you can end up going around in circles in a rather theoretical or philosophical approach, as opposed to getting the job done. That rather harks back to the previous point. The whole point of the exercise is to satisfy requirements by having a safe system, not to do the analysis. Those are some purposes of analysis.
Assumptions and Criteria
As always in science and engineering, we’re going to need to make some assumptions, because we can’t possibly prove absolutely everything. Now, assumptions are good because they enable us to proceed. They enable us to work pragmatically, but we’ve got to make sure that they are sensible. We’ve got to verify and validate them as far as we can, and if we discover that an assumption turns out to be incorrect, then we’re going to have to do something about it. Change in a program is inevitable. Sometimes, as we go through a large development program, we discover that the assumptions that we started with are not correct, and we need to review and make changes.
That’s important. Again, people are sometimes nervous about doing that. Dare I say, some people just want to stick their head in the sand and ignore these things, but that’s not good safety management either. We’re also going to have to set some risk criteria. We’re going to have to decide how much risk we can accept: what our risk appetite is. Because, as I’ve said before, you can’t have zero risk, and to pretend that you can is foolish and ultimately self-defeating. That’s an unrealistic assumption, and you end up with a safety program that’s built on fantasy rather than reality.
That’s no good. Making assumptions and setting criteria are an inherent part of risk management. We need to understand that a risk is something that hasn’t yet happened. If it’s already happened, it’s an issue. So, a risk is something that could happen in the future. We’re talking about making estimates. We must set assumptions and we must set criteria. OK, I think I’ve said enough about that.
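Here is one way that setting risk criteria might look in practice: a small risk matrix with explicit acceptance thresholds. The category names, the numeric scales and the thresholds are hypothetical values chosen for this sketch, and multiplying two ordinal scales is itself a simplification. Each program has to set and justify its own criteria.

```python
# Hypothetical ordinal scales (1 = lowest). A real program defines its own.
SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
LIKELIHOOD = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}

# Hypothetical risk criteria applied to the severity x likelihood score.
ACCEPTABLE_MAX = 3
TOLERABLE_MAX = 8


def classify_risk(severity, likelihood):
    """Classify an estimated risk against the assumed criteria above."""
    score = SEVERITY[severity] * LIKELIHOOD[likelihood]
    if score <= ACCEPTABLE_MAX:
        return "acceptable"
    if score <= TOLERABLE_MAX:
        return "tolerable with review"
    return "unacceptable"


print(classify_risk("negligible", "remote"))
print(classify_risk("critical", "remote"))
print(classify_risk("catastrophic", "occasional"))
```

Note that nothing here comes out as zero risk. The lowest category is "acceptable", which is the point of having criteria at all.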
Moving on to safety management. So, we’ve got the Management Authority. But of course, safety management needs to be done at every level where we can influence the design. So, it’s not just the Management Authority’s responsibility to manage safety. Everybody who is managing safety must define safety functions, the authority that various people have to make decisions, and the interrelationships between bodies and individuals. Then safety management must be about exercising appropriate control. It is control of the safety process that we’re talking about here, rather than management of hazards (controls). When we’re exercising safety management, we need to do all those things.
Effort and Emphasis
Not all risks are equal, and not all safety controls, hazard controls, are equal. So, the degree of safety effort and the achievements that are required are dependent upon management emphasis. Now, it says here by the FAA and contractors. The FAA acts as a regulator. The emphasis that drives safety, where we apply safety effort, the precedence and how much effort we put in: that’s going to be partly directed by the regulator, if you’re working in a regulated industry, or it may be directed by the law. The Management Authority or their contractors then take and interpret those directives and apply them practically. And then, of course, we’re going back to safety management. We define functions, authority and relationships, and we exercise control in order to achieve the safety emphasis that is required, to achieve the results that are required. That’s going to direct the effort.
We are probably going to spend a lot more effort managing higher risks than lower ones. So, we need to know our risks. Now, that sounds so obvious, doesn’t it? But the reality is it’s very easy for programs to lose sight of what the big risks are and major on the minors, if you will. It’s too easy to get carried away with little things, and you end up spending all your time on a program dealing with trivia while ignoring the fact that the horse has already bolted (escaped)!
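As a simple illustration of putting the effort where the big risks are, the sketch below shares out a fixed budget of hours in proportion to each hazard’s risk score. The hazard names, the scores and the proportional rule are assumptions for this example only. Real programs usually temper this with judgment and with minimum effort for mandated tasks.

```python
def allocate_effort(hazards, total_hours):
    """Split total_hours across hazards in proportion to their risk scores.

    `hazards` is a list of (name, risk_score) pairs. The result is a dict
    ordered from the highest-risk hazard to the lowest.
    """
    total_score = sum(score for _, score in hazards)
    if total_score == 0:
        raise ValueError("at least one hazard must have a non-zero risk score")
    ranked = sorted(hazards, key=lambda hazard: hazard[1], reverse=True)
    return {name: total_hours * score / total_score for name, score in ranked}


# Hypothetical hazard log entries with assumed risk scores.
hazard_log = [
    ("Cabin light flicker", 1),
    ("Engine fire", 12),
    ("Hydraulic leak", 6),
    ("Galley door rattle", 1),
]

plan = allocate_effort(hazard_log, total_hours=200)
for name, hours in plan.items():
    print(f"{name}: {hours:.0f} hours")
```

With these invented numbers the engine fire gets well over half of the budget, and the two trivial items get very little, which is the opposite of majoring on the minors.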
Clarity of Objectives
I guess that comes back to the clarity of objectives, doesn’t it? There’s an old saying, one of my favorites (I apologize): “if you don’t know to which port you are sailing, then no wind is favorable”. You’ve got to know what your safety objectives are and what your safety targets are (if you’re going to set quantitative targets, but you don’t have to). Whatever your safety objectives and requirements are, the Management Authority needs to clearly state and communicate them to everybody who is required to take action to manage safety. So, again, this sounds obvious, but people get it wrong so often, or they just don’t do it. Then, at the back end of a program, they’re surprised that they haven’t got what they need.
This can become a big problem if you’re at the back end of a program and the Management Authority is trying to demonstrate to the regulator, or whoever it might be, customers perhaps, that they met safety requirements and met safety objectives. They may find either that they’ve got kit that can’t meet the requirements, because they didn’t specify the requirement up front, or, more often, that they can’t demonstrate that the kit meets the requirements. That is quite galling, because you’ve got kit which you suspect is perfectly okay but you can’t prove it. So, then you end up having to spend more money and waste time at the back end of the program trying to fix those things. A lot of programs end up being late and over budget for things like that. The earlier and the clearer you set your objectives, the better. That supports things like making trade-offs and making decisions.
It’s all about decision making.
Management Authority Responsibilities
And that brings us neatly on to Management Authority responsibilities. The assumption is that we have an SSP, a System Safety Program. So, we have a planned program that’s going to achieve safety. The MA must plan it, organize it and make it happen. The MA has got to establish what the safety requirements are for a system, for the design, and they’ve got to state those safety requirements in a contract. (The assumption is that we’re going to contract with somebody for the whole system maybe, or parts of the system.) We need a statement of work, to say, OK, what activities do we need to meet these requirements?
Now, I guess what varies here is the amount of detail in the Statement of Work (SoW). The Management Authority might take a hands-off approach and say, okay, I’m going to specify some things in a statement of work, like we want reviews at particular points in the program, or we want safety reporting, or whatever it might be. Or they might take a really prescriptive approach and say we’re going to specify in a lot of detail what we want in the SoW. To do that effectively, the Management Authority has really got to understand the minimum that they need, and how that minimum might reasonably be achieved. The danger is that if you over-specify that statement of work and you’ve got something wrong, then you might end up stopping the contractors doing something sensible. Or the contractors might just blindly follow what you’ve told them to do rather than thinking about safety, which is what you really want!
Moving on. The MA must also review things and ensure (I think we would say in English) ENSURE an adequate and complete System Safety Program Plan. We’ve got a System Safety Program. We need a plan for it, and whether it be the MA that produces an overall plan or whether they produce a plan for themselves and then specify that the other stakeholders do their own, whichever it might be.
So, the System Safety Program, the System Safety Program Plan, the Statement of Work and the requirements: those four things really are linked together and need to be thought of together. You need to take a holistic approach, because if the requirements are out of step with the program, if the plan doesn’t adequately describe the program that you need, or if the statement of work is at odds with the plan or the intended program, all these things are going to cause major problems. Those four things, the System Safety Program, the safety requirements, the Statement of Work and the System Safety Program Plan, really need to be worked consistently and coherently, and they need to fit together.
Let’s move on from the first five bullet points. The next one seems a rather odd one: to supply historical data. Now, that looks really odd, doesn’t it, out of place with the others? But it’s quite logical. The Management Authority are the people who say, I want a system and I’m going to set everything up to make sure I get the system that I need. They’re not doing this in isolation. This might be a new system, but it’s probably replacing an old system, and a Management Authority should have some expectations from prior use of other systems or related systems. They should have some expectation of what is reasonable to expect from this kind of system. In other words, that feeds into setting the safety requirements.
What kind of accidents and incidents have we seen in the past? And therefore what kind of hazards and risks are we going to need to control? So, that historical data is very important. It might literally be lots and lots of low-level data, or it might be something a bit higher level, where we’ve learned some lessons from the past and those lessons have helped to form our safety requirements for this future system.
And again, it’s very easy to get wrong. With historical data, what we usually find in the real world is that we have underreporting and confused reporting. Even when we’ve got a lot of data, we’re not always sure what it means, whether there are any overlaps, that kind of thing. Gathering historical data and analyzing it can be quite difficult, but it can also be tremendously useful. It’s worth doing.
So, next bullet point: the MA needs to review the contractors’ System Safety effort, what they’re doing and the data that they’re producing. The MA also needs to ensure specifications are updated with analysis and test results. Again, we talked about change being inevitable. Somebody has got to make that change happen and make sure the effects of change ripple through the system consistently, and that somebody is the MA. Somebody has got to have the authority to manage these things. One body. Management by committee doesn’t always work very well. We need some organization or some individual who clearly has authority to lead.
Finally, we need to establish and operate System Safety groups. With these groups or committees, whatever you want to call them, we need to bring different stakeholders together, different expertise and people with different competencies, in order to support the Management Authority. The final decision rests with the Management Authority, but the MA needs to pull together enough expertise to enable them to make sensible decisions. There’s a balance between unity of leadership, unity of purpose, and a diversity of representation that brings everything we need into the decision-making process.
Okay: software! Now, this is a slight aside. When the FAA came up with these principles, software was maybe a little bit rarer. These days software is everywhere, but back in 2000, particularly on high-integrity systems like airplanes, it was rarer. It was there, and had been for some time, but it wasn’t always doing safety-related stuff. So, it was still seen as a bit of a special case and, to be honest, even these days lots of people are frightened of software because it’s intangible. I suspect I’m going to end up doing quite a few sessions talking about software safety and explaining it.
We note that the FAA is still taking their very much requirements-focused approach. So, analyzing software for hazards is seen in this approach as all about taking requirements from the top left-hand side of the V model, which we see illustrated here, and flowing those requirements down to lower and lower levels until we get to implementation: the development of the software. Then, as we build the software, we conduct unit testing, integration testing, system testing, and user testing or operational testing, whatever you want to call it.
We progressively build up testing to show that the requirements, at every level in the V model, have been met. This is a philosophy for looking at software and it is correct, but it’s not the only way of looking at software. This is a very American approach. It emphasizes requirements. It emphasizes testing. As we will see when we get to specialist subjects on software, software is not always very amenable to being tested. And if software meets all its requirements, that’s great, and maybe we can demonstrate that, but can we demonstrate that it doesn’t do anything it’s not supposed to do? Often in safety that’s half the battle, or even more than half. So, I’m not necessarily a fan of this statement here and, to be honest, it is a bit out of date.
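The requirements-to-testing idea in the V model can be sketched as a simple traceability check: for every requirement, is there at least one test that claims to verify it? The requirement IDs, their wording and the test names below are hypothetical, invented only to show the shape of the check.

```python
def untested_requirements(requirements, tests):
    """Return the IDs of requirements that no test claims to verify.

    `requirements` maps requirement ID -> text.
    `tests` maps test name -> list of requirement IDs that it verifies.
    """
    covered = {req_id for verified in tests.values() for req_id in verified}
    return sorted(set(requirements) - covered)


# Hypothetical requirements flowed down from system to software level.
requirements = {
    "SYS-1": "The system shall isolate the supply on an earth leakage fault.",
    "SW-1": "The software shall command a trip within 30 ms of detection.",
    "SW-2": "The software shall log every trip event.",
}

# Hypothetical tests at different levels of the V model.
tests = {
    "unit_test_trip_timing": ["SW-1"],
    "system_test_leakage_fault": ["SYS-1", "SW-1"],
}

print(untested_requirements(requirements, tests))
```

A check like this only shows whether each stated requirement has a test against it. It says nothing about whether the software does things it is not supposed to do, which is exactly the gap described above.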
System Safety Program
So, we move on, and this is our final slide. We’ve talked about the System Safety Program before and we’ve got some good principles here. What do we need for an effective System Safety Program? And that word effective is key, because anybody can make up a program that may or may not be effective. What do we need to make it work? Well, we need a plan: a planned approach to getting tasks done, getting them accomplished. Again, I have seen lots of people start tasks and not finish them, or not finish them successfully. We need qualified people. Once again, I’ve seen lots of programs with people who don’t really know what they’re doing, and they’re very busy. They’re running around like headless chickens. Maybe they’ve got a lot of people, but if those people don’t know what they’re doing then, sure, if directed sensibly they may still get a result, but it’s probably not going to be very elegant. So, we need people who are competent at what they are doing.
We need somebody or something that wields the authority to get stuff done, to implement tasks, and that authority has got to flow through all levels of management (because we might have multiple levels). We’ve got the Management Authority in this model, who is reporting to the FAA and trying to demonstrate to that regulator that they’ve done what they were supposed to. Maybe you’ve got internal levels of management, but in the end the Management Authority has got to manage contractors, perhaps at multiple levels. On complex systems, you may have many levels of contractors contributing parts, components, sub-assemblies, et cetera, into an overall complex system.
Finally, we’ve got to have appropriate staffing and funding. We’ve got to have enough people with the right skills to get the job done, and that all costs money. Very often safety-qualified people are hard to find and therefore they tend to be expensive. That’s when people like myself, safety consultants, get brought in, because a Management Authority, or the contractors working for them, discover that they don’t have enough staff with the right experience and competence to get the job done. People like me get brought in, and we can be quite expensive!
Nothing wrong with doing that of course. But usually, to get effective results, I find that the Management Authority needs to have enough competent people at least to understand, to be able to realize we’re not making progress here, we need to bring in more highly qualified people. You need enough knowledge about safety in order just to realize that you’re not cutting it and you need to bring in some higher-powered help.
One of the reasons for The Safety Artisan to exist, really, is to help people have enough background to realize what they’re supposed to be doing versus maybe what’s going on. Once you have that knowledge, then hopefully you can build up enough knowledge to assess the situation and to decide whether what you’re doing is adequate or whether you need further help. That minimum level of knowledge is what you need to succeed. Once you’ve got that, then maybe you buy more expertise and employ people in-house, or maybe you bring people in temporarily, but that understanding requires a certain base-level knowledge about safety. And that’s what the Safety Artisan is all about, ladies and gentlemen. That’s a nice point on which to end.
Just to say that all the “quotations in italics” are from the U.S. Federal Aviation Administration System Safety Handbook. As you can see, it was published in the year 2000. It is getting a bit long in the tooth in some ways, but the basic principles are good ones. To be honest, I can’t find them as clearly articulated anywhere else, even today, certainly not in a form publicly available for you and me to share. So, thanks and appreciation to the FAA for doing that. I do hope one day soon they’re going to update that System Safety Handbook, because it is a very useful beast. There are still people out there using it and maybe not understanding where it falls short these days.
Now, U.S. government standards tend to be copyright free. The text itself is copyright free, but this video presentation and the value add that I’m providing is copyright of the Safety Artisan, 2019. So that you understand how current this is, I’m recording this on the 26th of October 2019. Maybe you found this video on the Safety Artisan page at www.Patreon.com, or maybe you found it elsewhere, but you will find all my System Safety videos on Patreon.com/SafetyArtisan.
That’s the end of the presentation on System Safety Principles. Thanks for your attention. It just remains for me to say thanks for tuning in, as always. I will see you soon. Cheers now.