Categories
Mil-Std-882E

Transcript: Functional Hazard Analysis (T208)

In the full-length (40-minute) session, The Safety Artisan looks at Functional Hazard Analysis, or FHA, which is Task 208 in Mil-Std-882E. FHA analyses software, complex electronic hardware, and human interactions. We explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (We refer to other lessons for special techniques for software safety and Human Factors.)

Transcript: Functional Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyse the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is task 208 in mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from task 208. Then we’ve got update and recording, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional Purpose HA #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? And in classic 882 style, task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, it’s used to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. The truth is in the real world, that lots of software-intensive systems have been involved in accidents that have killed lots of people, even when they’re functioning as intended. That’s one of the short-sightedness of this Mil.Standard is that it focuses on failure. The idea that if something is performing as specified, that either the specification might be wrong or there might be some disconnect between what the system is doing and what the human expects- The way the standard is written just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity- severity only, we’ll come back to that- for the purpose of identifying what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sort of a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behaviour over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people is the short answer.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The presumption in this- the assumption in this task is that we’re doing early- We’ll see that later- and that system design, system architecture, is still up for grabs. We can still influence it. Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen anymore so much in the military and it certainly doesn’t happen in many other walks of life- But we’ll talk about how you deal with the realities later. And they’re allocating these functions and these items of interest to hardware, software and human interfaces.

And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those type of simple components that are only really subject to random failure, that’s not what we’re talking about here. We’re talking about complex stuff where we’re talking about systematic failure dominating over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.

Functional HA Purpose #3

Now, our third slide on purpose; so we use the FHA to identify consequences of malfunction or functional failure, lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a systems engineering process- that’s not always the case. We’ll talk about that at the end as well. And we’re going to identify and document these functions and items and allocate and it says partition them in the software design architecture. When we say partition, that’s jargon for separate them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation in this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done.

Task Description (T208) #1

Moving on to task description. It says the contractor, but whoever’s doing the analysis has to perform and document the FHA, to analyse those functions, as it says, with the proposed design. I talked about that already so we’ll move on.

It’s got to be based on the best available data, including mishap data. So, accident/incident data, if you can get it from similar systems and lessons learned. As I always say in these sessions, this is hard to do, but it’s really, really valuable so do put some effort into trying to get hold of some data or look at previous systems or similar systems. We’re looking at inputs, outputs, interfaces and the consequences of failure. So, if you can get historical data or you can analyse a previous system or a similar system, then do so. It will ultimately save you an awful lot of money and heartache if you can do that early on. It really is worth the effort.

Task Description (T208) #2

At a minimum, we’ve got to identify and evaluate functions and to do that, we need to decompose the system. So, imagine we’ve got this great big system. We’ve got to break it down into subsystems of major components. We’ve got to describe what each subsystem and major component does, its function or its intended function. Then we need a functional description of interfaces and thinking about what connects to what and the functional ins and outs. I guess pretty obvious stuff – needs to be done.

Task Description (T208) #3

And then we also need to think about hazards associated with, first of all, loss of function. So, no function when we need it. Now, we have degraded functional malfunction and sort of functioning out of time or out of sequence. So, we’ve got different kinds of malfunctions. What we don’t have here is function when not required. So, the system goes active for some reason and does something when it’s not meant to. Now, if we add that third base and we’ve got a functional failure analysis. Essentially here, we’re talking about a functional failure analysis, maybe something a bit more sophisticated, like a HAZOP. And the HAZOP is more sophisticated because instead of just those three things that can go wrong, we think about we’ve got lots of guide words to help us think about ‘out of time, out of sequence’. So, too early, too late, before intended, after intended, whatever it might be. And there are there variations on HAZOP called computer HAZOP, or CHAZOP, where people have come up with different keywords, different prompt words, to help you think about software in data-intensive systems. So, that’s a possible technique to use here.

And then when we’re thinking about these hazards that might be generated by malfunction, or functional failure in its various forms, we need to think about, “What’s the next step in the mishap sequence? In the accident sequence? And what’s the final outcome of the accident sequence?” And that’s very important for software because software is intangible. It has no physical form. On its own, in isolation, software cannot possibly hurt anyone. So, you’ve got to look at how the software failure propagates through the system into the real world and how it could harm people. So, that’s a very important prompt that that last sentence in yellow there.

Task Description (T208) #4

And we carry on. We need to assess the risk with failure of a function subsystem or component. We’re going to do so using the standard 882 tables, tables one and two, and risk assessment codes in table three, unless we come up with our own tailored versions of those tables and that matrix and that’s all approved. In reality, most people don’t tailor this stuff. They should make it appropriate for the system, but they rarely do.

Table I and II

So just to remind us what we’re talking about, here’s table one and two. Table one is severity categories ranging from catastrophic, which could kill somebody- a catastrophic outcome- down to negligible, where we’re talking cuts and bruises- very, very, very minor injuries.

And then table two, probability levels. We’ve got everything from frequent down to eliminated- There’s no hazard at all because we’ve eliminated. It will never happen in the lifetime of the universe. So, it really is a zero probability. We’ve got frequent down to improbable and then in the standard, we’ve got a definition for these things in words, for a single item and also for a fleet or inventory of those items, assuming that there’s a large number of them. And that’s very useful. That helps us to think about how often something might go wrong per item and per fleet.

Table III

So, that’s tables one and two, we put them together, the severity and the probability to give us table three. As you can see, we’ve got probability down the left-hand side and at the bottom, if we’ve eliminated the hazard, then there is no severity. The hazard is completely eliminated. So, forget about that row. Then everything else we’ve got frequent down to improbable, probability. And we’ve got catastrophic down to negligible. Together those generate the risk assessment code, which is either high, serious, medium or low. That’s the way this standard defines things. Nothing is off-limits. Nothing is perfect except for elimination. We’ve just defined a level of risk and then you have to make up rules about how you will treat these levels of risk. The standard does some of that for you, but usually, you’ve got to work out depending on which jurisdiction you’re in legally, what you’re required to do about different levels of risk.

Now this table on its own, I’ll just mention, is not helpful in a British or Australian jurisdiction where we have to reduce or eliminate risks SOFARP. The table on its own won’t help you do that, because this is just an absolute level of risk. It’s not considering what you could have done to make it better. It’s just saying where we are. It’s a status report.

So, those are your tables one, two and three, as the standard describes them. That’s the overall method and we’re going to do what it says in Section four of the standard. In the main body of the standard, Section four talks about software and complex hardware and how we allocate these things.

Task Description (T208) #5

And then finally, I think on task description, an assessment of whether the functions identified are to be implemented in the design- sorry, of whether the functions are to be implemented in the design and map those functions into the components. And then it says functions allocated to software should be matched to the lowest level of technical design or configuration item. So, if you’ve got a software or hardware configuration item that is further subdivided into sub-items, then you need to go all the way down and see which items can contribute to that function and which can’t.

That’s an important labour-saving device, because if you’ve got – you could have quite a large configuration item, but actually, only a tiny bit contributes to the hazard. So, that’s the only thing you need to worry about in theory. In reality, partitioning software is not as easy as the standard might suggest. However, if we can do a meaningful partition, then we could and should aim to have as little software safety-related as we possibly can. If nothing else, for cost in order to get the project in on time. So, the less criticality we have in our system, the better.

Task Description (T208) #6

So, we need to assess the software control category for each configuration item that’s been allocated a safety-significant software function or a triple SF(SSSF). Having assigned the SCC, we then have to work at the software criticality index for each of those functions and we’ll talk about how to do that at the end. Then from all of this work, we need to generate a list of requirements and constraints to include in the spec which, if they work, will eliminate the hazard or reduce the risk.

And the standard talks about that these could be in the form of fault tolerance, fault detection, fault isolation, fault annunciation or warning, or fault recovery. Now, this breakdown reveals- basically this is a reliability breakdown. So, in the world of reliability, we talk typically about fault tolerance, fault detection, warning, and recovery. Four things – they split them down to five here. Now, software reliability is highly controversial. So really, this is a bit of a mismatch here. These reliability-based suggestions are not necessarily much use for software, or indeed for people sometimes. You may have to use other more typical software techniques to do this and in fact, the standard does point you to do that. But that’s for another session.

FHA Update & Records #1

So, we’ve done the FHA, or we’re doing the FHA. We’ve got to record it and we’ve got to update it when new information comes through. So, we’ve got to update the FHA as the design progresses or operational changes come in. We’ve got to have a system description of the physical and functional characteristics of the system and subsystems. And of course, for design complex items like software, context is everything. So, this is very important. Again, software in isolation cannot hurt anyone. You’ve got to have the context to understand what the implications might be. If we don’t have that, we’re stuffed pretty much. Then it goes on to say that when further documentation becomes available, more detail that needs to be supplied. So, don’t forget to ask for that in your contract and expect it as well and be ready to deal with it.

FHA Update & Records #2

 Moving on. When it comes to hazard analysis, method and techniques, we need to describe the method and the technique used for the analysis, what assumptions and what data was used in support of the analysis and this statement is pretty much in every single task so I’ll say no more. You’ve heard this before. Then again, analysis results need to be captured in the hazard tracking system and, as I’ve always said, usually the leading details, the top-level details, go in there has a tracking system. The rest of it goes into the hazard analysis report otherwise, you end up with a vast amount of data in your HTS and it becomes unwieldy and potentially useless.

Contracting #1

Contracting- Again, this is a pretty standard clause, or set of clauses, in a Mil. Standard 882 task. So, in our request for proposal and statement of work, we’ve got to ask the task 208. We’ve got to point the analyst, the contractor, at what we want them to analyse particularly or maybe as a minimum. And what we don’t want to analyse, maybe because it’s been done elsewhere or it’s out of scope for this system.

We need to say what are data reporting requirements are considering Task 106, which is all about hazard tracking system or the hazard log or the risk register, whatever you want to call it. So, what data do we want? What format? What are the definitions, etc.? Because if you’re dealing with multiple contractors or you want data that is compatible with the rest of your inventory, then you’ve got to specify what you want. Otherwise, you’re going to get variability in your data and that’s going to make your life a whole lot harder downstream- Again, this is standard stuff.

And what are the applicable requirements, specifications and standards? Of course, this is an American standard so compliance with specifications, requirements and standards is all because that’s the American system.

Contracting #2

We need to supply the concept of operations, as I’ve said before, with a complex design. Especially software, context is everything. So, we need to know what we’re going to do with the system that the software is sat within. So, this system has got some functions, this is what we’re looking at in task 208: What are those functions for? How do they to relate with the real world? How could we hurt people? And then if we got any other specific hazard management requirements. Maybe we’re using a special matrix because we’ve decided the standard matrix isn’t quite right for our system. Whatever we’re doing, if we’ve got special requirements that are not the norm for the vanilla standard, that we need to say what they are. Pretty straightforward stuff.

Commentary #1

We’re onto commentary, and I think we’ve got five slides of commentary today. As it says, functional hazard analysis depends on systems engineering. So, if we don’t have good systems engineering, we’re unlikely to have good functional analysis. So, what do I mean by good systems engineering? I mean, that for the complete system – apart from things that we deliberately excluded for a good reason – but for the complete system we need or functions to be identified, we need those functions to be analysed and allocated correctly in accordance and rigorously and consistently. We need interface analysis, control, and we need the architecture of the design to be determined based on the higher-level requirements, all that work that we’ve done.

Now, if those things are not done or they’re incomplete, or they were done too late to influence the design architecture, then you’re going to have some compromised systems engineering. And these days, because we’re using lots of commercial off the shelf stuff, what you find is that your top-level design architecture is very often determined before you even start because you’ve decided you’re going to have an off the shelf this and you’re going to have a modified off the shelf that and you’re going to put them together in a particular way with a set of business rules, a concept of operations, that says this is how we’re going to use this stuff.

And our new system interfaces with some existing stuff and we can’t modify the existing stuff. So, that really limits what we can do with the design architecture. A lot of the big design decisions have already been taken before we even got started. Now, if that’s the case, then that needs to be recognized and dealt with. I’ve seen those things dealt with well. In other words, the systems engineering has been done recognizing those constraints, those things that that can’t be done. And I’ve seen it done badly in that figuratively speaking, the systems engineering team or the program manager, whoever has just given us of Gallic shrug and gone “Yeah, what the heck, who cares?” So, there’s this the two extremes that you can see.

Now, if the systems engineering is weak or incomplete, then you’re going to get a limited return on doing task 208. Maybe there are some areas where you can do it on new areas, or maybe you’ve got a new interface that’s got to be worked up and created in order to get these things to talk to each other. Clearly, there is some mileage in doing that. You’re going to get some benefits from doing that in that area. But for the stuff that’s already been done, probably – well, what’s the point of doing systems engineering here? What does it achieve? So, maybe in those circumstances, it’s better- Well, in fact, I would say it’s essential to understand where systems engineering is still valid, where you still going to get some results and where it isn’t. And maybe you just declare that scope; What’s in and out.

Or maybe you take a different approach. Maybe you go “OK, we’re dealing with a predominantly CoP system. We need a different way of dealing with this than the way the Mil. standard 882 assumes.” So, you’re going to have to do some heavy tailoring of the standard because 882 assumes that you’re determining all these requirements predesigned. If that’s not the case, then maybe 882 isn’t for you. Or maybe you just need to recognize you’re going to have to hack it about severely. Which in turn means you’ve got to know what you’re doing fundamentally. In which case the standard really is no longer fulfilling its role of guiding people.

Commentary #2

Moving on. Let’s assume that we are still going to do some task 208. We’re going to determine some software criticality. We’re also going to determine some criticality for complex hardware. So, things whether it be software in complex electronics, so pre-programmed electronics, whatever that might be.

First of all, as we said before, we’re going to determine the software control category and what that’s really saying is how much authority does the software have? And then secondly, we’re going to be looking at severity, which was table one. How severe is the worst hazard or risk that the software could contribute to? And these are illustrated in the next two slides. And we do a session or several sessions on software safety is coming soon. That will be elsewhere. I’m not going to go into massive detail here. I’m just giving you an overview of what the task requires.

Commentary #3: Software Control Categories 1-5

First of all, how do we determine software control category? So, there’s the table from the standard. We’ve got five levels of SCC.

At the top, we’ve got autonomous. Basically, the software does whatever it wants to and there’s no checks and balances.

Secondly, they’re semi-autonomous. The software is there’s one software system performing a function, but there are hardware interlocks and checks. And those hardware interlocks and checks, and whatever else that are not software, can work fast enough to prevent the accident happening. So, they can prevent harm. So, that’s semi-autonomous.

Then we’ve got redundant fault-tolerant where you’ve got an architecture typically with more than one channel, and maybe all channels are software controlled. Maybe there’s diversity in the software and there is some fault-tolerant architecture. Maybe a voting system or some monitoring system saying, “Well, Channel Three’s output is looking a bit dodgy” or “Something gone wrong with Channel two”. I’ll ignore the channel at fault, and I’ll take the good output from the channels that are still working and I’ll use that. So that’s that option. Very common.

Then we’ve got number four, which is influential. So, the software is displaying some information for a human to interpret and to accept or reject.

And then we’ve got five, which is no safety impact at all. Now, the problem child in this, of course, is influential because it’s very easy to say, “The software just displays some information, it doesn’t do anything”. So, unless a human does something – so we don’t have to worry about the safety implications of that at all. Wrong! Because the human operator may be forced to rely on the software output by circumstances, there may not be time to do anything else. Or the human may not be able to work out what’s going on without using the software output. Or more typically, the humans have just got used to the software generating the correct information or even they interpret it incorrectly.

A classic example of that was when the American warship, the USS Vincennes, shot down an airliner and killed three hundred people because the way the system was set up, the supposedly not safety-related radar system was displaying information not associated with the airliner, but associated with the with a military Iranian aircraft. And the crew got mixed up and shot down the airliner. So, that’s a risky one. Even though it’s down at number four, that doesn’t mean it’s without risk or without criticality.

Commentary #4

So, if we have the software control category, and that’s down the right-hand side- sorry down the left-hand side, one to five. And along the top, we have the severity category from catastrophic down to negligible. We can use that to determine the software criticality index, which varies from one most critical down to five least critical. It’s similar to the risk assessment code in Table III, the coloured matrix that I showed you earlier. So, the writers of the standard have made a determination for us based on some assessment that they’ve done saying, “Well, this is this is how we assess these different criticality levels”. Whether there is actually any real-world evidence supporting this assessment, I don’t know and I’m not sure anybody else does either. However, that’s the standard and that’s where we are.

Commentary #5

And so just to finish up on the commentary. 208 is focused on software engineering, also programmable electronics, complex hardware, but typically electronics with software functionality or logic functionality embedded within it. Now if all of that software, all that programmable electronic systems, if they’re all developed already, is there any point in doing task 208? That’s the first-it’s got to pass the “So what?” test. Is it feasible to do 208 and expect to get benefits? If not, maybe you just do system and subsystem hazard analysis. That’s tasks 205 and 204, respectively. And we just look at the complex components and subsystems as a black box and say, “OK, what’s it meant to do? What are the interfaces?” Maybe that would be a better thing to do.

Particularly, bearing in mind that the software or the complex electronic system could be working perfectly well and we still get an accident because there’s been a misunderstanding of the output. Maybe it’s more beneficial to look at those interfaces and think about, “Well, in what scenarios could the human misunderstand? How do we how do we guard against that?”

It’s also worth saying that some particularly American software development standards, can work well with Mil.standard 882 because they share a similar conceptual basis. For example, I’ve seen many, many times in the air world, the systems software system safety standard is 882 and the systems software standard is DO-178. Or ED12. Anyway, it’s the same standard, just different labels. Now they work relatively well together because the concept underpinning 178 is very similar to 882. It’s American centric. It’s all about, you put requirements on the software development and it’s assumed that if you – this is sort of a cookbook approach – the standard assumes that if you use the right ingredients and you mix them up in the right way, then you’re going to get a good result. And that’s a similar sort of concept for 882 and the two work relatively well together, fairly consistently. Also because they’re both American, there’s a great focus on software testing. Certainly, in the earlier versions of DO-178, it’s exclusively focused on software testing. Things like source code analysis and other things- more modern techniques that have come in- they’re not recognized at all in earlier versions of 178 because they just weren’t around. So, that focus on testing suits 882, because 882, generates lots of requirements and constraints which you need to test.

What it’s not so good at is generating cases where you say, “Well if this goes wrong” or “If we’re at the edge of the envelope where we should be, let’s test for those edge of the envelope cases, let’s test that the software is working correctly when it’s outside of the operating envelope that it should be”. Now, that kind of thinking isn’t so strong in 882, nor in 178. So, there are some limitations there. Good practice, experienced practitioners will overcome those by adding in the smarts that the standards lack. But just to be aware, a standard is not smart. You’ve still got to know what you’re doing in order to get the most out of it.

So, maybe you’re buying software that’s predevelopment or that you’re using- you’re not in the States. You’ve got a European or an Asian Indian supplier or Japanese supplier or whatever. Maybe they’re not using American style techniques and standards. Is that- how well is that going to work with 882? Are they compatible? They might be, but maybe they’re not. So, that requires some thought. If they’re not obviously compatible, then what do you need to do to make that translation and make it work. Or at least understand where the gaps are and what you might do about it to compensate?

And I’ve not talked about data, but it is worth mentioning that with data-rich systems these days- and I heard just the other day, is it two quintillion bytes of data being generated every two days or something ridiculous? That was back in 2017. So, gigantic amounts of data being generated these days and used by computing systems, particularly artificial intelligence systems. So, the rigour associated with that data – the things that we need to think about on data are potentially just as important as the software. Because if the software is processing rubbish data, you’re probably going to get rubbish results. Or at the very least unreliable results that you can’t trust. So, you need to be thinking about all of those attributes of your data; correct, complete, consistent, etc, etc. I mean, I probably need to do a session on that and maybe I will.

Copyright Statement

That’s the presentation. As you can see, everything in italics and quotes is out of the standard, which is copyright free. But this presentation is copyright of the Safety Artisan.

For More…

And you will find many more presentations and a lot more resources at the website www.safetyartisan.com. Also, you’ll find the paid videos on our Patreon page, which is www.patreon.com/SafetyArtisan or go to Patreon and search for the Safety Artisan.

End

Well, that’s the end of our presentation, and it just remains for me to say thanks very much for listening. Thanks for your time and I look forward to seeing you in the next session, Task 209. Looking forward to it. Goodbye.

Back to the Home Page | Mil-Std-882 Page | System Safety Page

#Safety #Engineering #Training
Categories
Mil-Std-882E

T208 Requirements for Functional Hazard Analysis

This is Mil-Std-882E Functional Hazard Analysis (FHA).
Back to: Task 207.

The 200-series tasks fall into several natural groups. T208 requirements for Functional Hazard Analysis are reproduced (below). The full-length video is here.

T208 Requirements: Functional Hazard Analysis

208.1 Purpose. Task 208 is to perform and document a Functional Hazard Analysis (FHA) of an individual system or subsystem(s). The FHA is primarily used to identify and classify the system functions and the safety consequences of functional failure or malfunction, i.e. hazards. These consequences will be classified in terms of severity for the purpose of identifying the safety-critical functions (SCFs), safety-critical item (SCIs), safety-related functions (SRFs), and safety-related items (SRIs) of the system. SCFs, SCIs, SRFs, and SRIs will be allocated or mapped to the system design architecture in terms of hardware, software, and human interfaces to the system. The FHA is also used to identify environmental and health related consequences of functional failure or malfunction. The initial FHA should be accomplished as early as possible in the Systems Engineering (SE) process to enable the engineer to quickly account for the physical and functional elements of the system for hazard analysis purposes; identify and document SCFs, SCIs, SRFs, and SRIs; allocate and partition SCFs and SRFs in the software design architecture; and identify requirements and constraints to the design team.

208.2 Task description. The contractor shall perform and document a FHA to analyze functions associated with the proposed design. The FHA should be based on the best available data, including mishap data (if obtainable) from similar systems and other lessons learned. This effort will include inputs, outputs, critical interfaces, and the consequence of functional failure.

208.2.1 At a minimum, the FHA shall consider the following to identify and evaluate functions within a system:
a. Decomposition of the system and its related subsystems to the major component level.
b. A functional description of each subsystem and component identified.
c. A functional description of interfaces between subsystems and components. Interfaces should be assessed in terms of connectivity and functional inputs and outputs.
d. Hazards associated with loss of function, degraded function or malfunction, or functioning out of time or out of sequence for the subsystems, components, and interfaces. The list of hazards should consider the next effect in a possible mishap sequence and the final mishap outcome.
e. An assessment of the risk associated with each identified failure of a function,
subsystem, or component. Estimate severity, probability, and Risk Assessment Code (RAC) using the process described in Section 4 of this Standard. The definitions in Tables I and II, and the RACs in Table III shall be used, unless tailored alternative definitions and/or a tailored matrix are formally approved in accordance with Department of Defense (DoD) Component policy.
f. An assessment of whether the functions identified are to be implemented in the design hardware, software, or human control interfaces. This assessment should map the functions to their implementing hardware or software components. Functions allocated to software should be mapped to the lowest level of technical design or configuration item prior to coding (e.g., implementing modules or use cases).
g. An assessment of Software Control Category (SCC) for each Safety-significant Software Function (SSSF). Assign a Software Criticality Index (SwCI) for each SSSF mapped to the software design architecture.
h. A list of requirements and constraints (to be included in the specifications) that, when successfully implemented, will eliminate the hazard or reduce the risk. These requirements could be in the form of fault tolerance, detection, isolation, annunciation, or recovery.

208.2.2 The contractor shall update the FHA following system design or operational changes as necessary.

208.2.3 The contractor shall document results of the analysis to include the following:
a. System description. This summary describes the physical and functional
characteristics of the system and its subsystems. Reference to more detailed system and subsystem descriptions, including specifications and detailed review documentation, shall be supplied when such documentation is available.
b. Hazard analysis methods and techniques. Provide a description of each method and technique used in conduct of the analysis. Include a description of assumptions made for each analysis and the qualitative or quantitative data used.
c. Hazard analysis results. Contents and formats may vary according to the individual requirements of the program and methods and techniques used. As applicable, analysis results should be captured in the Hazard Tracking System (HTS).

208.3 Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:
a. Imposition of Task 208. (R)
b. Identification of functional discipline(s) to be addressed by this task. (R)
c. Desired analysis methodologies and technique(s) and any special data elements, format, or data reporting requirements (consider Task 106, Hazard Tracking System).
d. Applicable requirements, specifications, and standards.
e. Concept of operations.
f. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.

Forward to the next excerpt: Task 209

Back to the Home Page | Mil-Std-882 Page | System Safety Page

#Safety #Engineering #Training
Categories
Mil-Std-882E

Functional Hazard Analysis (Task 208)

To view this content, you must be a member of Simon's Patreon at $45 or more
Already a qualifying Patreon member? Refresh to access this content.
Categories
Mil-Std-882E

Lesson, Sub-System Hazard Analysis (Task 204)

To view this content, you must be a member of Simon's Patreon at $45 or more
Already a qualifying Patreon member? Refresh to access this content.
Categories
Mil-Std-882E

Lesson: Preliminary Hazard Analysis, Task 202

To view this content, you must be a member of Simon's Patreon at $45 or more
Already a qualifying Patreon member? Refresh to access this content.
Categories
Mil-Std-882E

Transcript: Sub-System Hazard Analysis (T204)

Here is the full transcript: Sub-System Hazard Analysis.

In the video lesson, The Safety Artisan looks at Sub-System Hazard Analysis, or SSHA, which is Task 204 in Mil-Std-882E. We explore Task 204’s aim, description, scope and contracting requirements. We also provide value-adding commentary and explain the issues with SSHA – how to do it well and avoid the pitfalls.

Introduction

Hello, everyone, and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial instruction on all things system safety. I’m Simon – I’m your host for today, as always and it’s the fourth of April 22. With everything that’s going on in the world, I hope that this video finds you safe and well.

Sub-System Hazard Analysis

Let’s move straight on to what we’re going to be doing. We’re going to be talking today about subsystem hazard analysis and this is task 204 under the military standard 882E. Previously we’ve done 201, which was preliminary hazard identification, 202, which is preliminary hazard analysis, and 203, which is safety requirements hazard analysis. And with task 204 and task 205, which is system has analysis, we’re now moving into getting stuck into particular systems that we’re thinking about, whether they be physical systems or intangible. We’re thinking about the system under consideration and I’m really getting into that analysis.

Topics for this Session

So, the topics that we’re going to cover today, I’ve got a little preamble to set things in perspective. We then get into the three purposes of task 204. First, to verify compliance. Secondly, to identify new hazards. And thirdly, to recommend necessary actions. Or in fact, that would be recommend control measures for hazards and risks. We’ve got six slides of task description, a couple of slides on reporting, one on contracting, and then a few slides on some commentary where I put in my tuppence worth and I’ll hopefully add some value to the basic bones of the standard. It’s worth saying that you’ll notice that subsystem is highlighted in yellow and the reason for that is that the subsystem and system hazard analysis tasks are very, very similar. They’re identical except for certain passages and I’ve highlighted those in yellow. Normally I use a yellow highlighter to emphasize something I want to talk about. This time around, I’m using underlining for that and the yellow is showing you what these different for subsystem analysis as opposed to system. And when you’ve watched both sessions on 204 and 205, I think you’ll see the significance of why I’ve done.

Preamble – Sub-system & System HA

Before we get started, we need to explain the system model that the 882 is assuming. If we look on the left-hand side of the hexagons, we’ve got our system in the centre, which we’re considering. Maybe that interfaces with other systems. They work within operating environment; hence we have the icon of the world, and the system and maybe other systems are there for a purpose. They’re performing some task; they’re doing some function and that’s indicated by the tools. We’re using the system to do something, whatever it might be.

Then as we move to the right-hand side, the system is itself broken down into subsystems. We’ve got a couple here. We’ve got sub-system A and B and then A further broken down into A1 and A2, for example. There’s some sort of hierarchy of subsystems that are coming together and being integrated to form the overall system. That is the overall picture that I’d like to bear in mind while we’re talking about this. The assumption in the 882, is we’re going to be looking at this subsystem hierarchy bottom upwards, largely. We’ll come on to that.

System Requirements Hazard Analysis (T204)

The purpose of the task, as I’ve said before, it’s threefold. We must verify subsystem compliance with requirements. Requirements to deal with risk and hazards. We must identify previously unidentified hazards which may emerge as we’re working at a lower level now. And we must recommend actions necessary. That’s further requirements to eliminate all hazards or mitigate associated risks. We’ll keep those three things in mind and that will keep coming up.

Task Description (T204) #1

The first of six slides on the task description. Basically, we are being told to perform and document the SSHA, sub-system hazard analysis. And it’s got to include everything, whether it be new developments, COTS, GOTS, GFE, NDI, software and humans, as we’ll see later. Everything must be included. And we’re being guided to consider the performance of the subsystem: ‘What it is doing when it is doing it properly’. We’ve got to consider performance degradation, functional failures, timing errors, design errors or defects, and inadvertent functioning – we’ll come back to that later. And while we’re doing analysis, we must consider the human as a component within the subsystem dealing with inputs and making outputs. If, of course, there is an associated human. We’ve got to include everything, and we’ve got to think about what could go wrong with the system.

Task Description (T204) #2

The minimum that the analysis has got to cover is as follows. We’ve got to verify subsystem compliance with requirements and that is to say, requirements to eliminate hazards or reduce risks. The first thing to note about that is you can’t verify compliance with requirements if there are no requirements. if you haven’t set any requirements on the subsystem provider or whoever is doing the analysis, then there’s nothing to comply with and you’ve got no leverage if the subsystem turns out to be dangerous. I often see it as it gets missed. People don’t do their top-down systems engineering properly; They don’t think through the requirements that they need; and, especially, they don’t do the preliminary hazard identification and analysis that they need to do. They don’t do Task 203, the SRHA, to think about what requirements they need to place further down the food chain, down the supply chain. And if you haven’t done that work, then you can’t be surprised if you get something back that’s not very good, or you can’t verify that it’s safe. Unfortunately, I see that happen often, even on exceptionally large projects. If you don’t ask, you don’t get, basically.

We’ve got two sub-paragraphs here that are unique to this task. First, we’ve got to validate flow down of design requirements. “Are these design requirements valid?”, “Are they the right requirements?” From the top-level spec down to more detailed design specifications for the subsystem. Again, if you haven’t specified anything, then you’ve got no leverage. Which is not to say that you have to dive into massive detail and tell the designer how to do their job, but you’ve got to set out what you want from them in terms of the product and what kind of process evidence you want associated with that product.

And then the second sub-paragraph, you’ve got to ensure design criteria in the subsystem specs have been satisfied. We need to verify that they’re satisfied, and that V and V of subsystem mitigation measures or risk controls have been included in test plans and procedures. As always, the Mil. standard 882 is the American standard, and they tend to go big on testing. Where it says test plans and procedures that might be anything – you might have been doing V and V by analysis, by demonstration, by testing, by other means. It’s not necessarily just testing, but that’s often the assumption.

Task Description (T204) #3

We must also identify previously unidentified hazards because we are now down at a low level of detail in a subsystem and stuff probably will emerge at that level that wasn’t available before. First, number one, we’ve got to ensure the implementation of subsystem design requirements and controls. And ensure that those requirements and controls have not introduced any new hazards, because very often accidents occur. Not because the system has gone wrong – the system is working as advertised – but the hazards with normal operation maybe just weren’t appreciated and guarded against or we just didn’t warn the operators that something might happen that they needed to look out for. A common shortfall, I’m afraid.

And number two, we’ve got to determine modes of failure down to component failure and human errors, single points of failure, common-mode failures, effects when failures occur in components, and from functional relationships. “What happens if something goes wrong over on this side of the system or subsystem and something else is happening over here?” What are those combinations? What could result? And again, we’ve got to consider hardware and software, including all non-developmental type stuff, and faults, and occurrences. Again, I see very often, buyers/purchases don’t think about the off the shelf stuff in advance or don’t include it. And then sometimes also you see contractors going “This is off the shelf, so we’re not analysing it.” Well, the standard requires that they do analyse it to the extent practicable. And they’ve got to look at what might go wrong with all of this non-development to stuff and integrate the possible effects and consider. That’s another common gotcha, I’m afraid. we do need to think about everything, whether it’s developmental or not.

Task Description (T204) #4

And then part C, recommending actions necessary to eliminate hazards if we can. Very often we can’t, of course, and we have to mitigate. We must reduce or minimize the associated risk of those hazards. In terms of the harm that might come to people. We’ve got to ensure that system-level hazards, it says attributed. Maybe we believe when we did the earlier analysis that the subsystem could contribute to a higher-level hazard, or maybe we’ve allocated some failure budget to this particular subsystem, which it has got to keep to if we’re going to meet the higher-level targets. You can imagine lots of these subsystems all feeding up a certain failure rate and different failure modes. And overall, when you pull it all together, we may have to meet some target or reduce the number of failures in their propagation upwards in order to manage hazards and risks. We’ve got to make sure that we’ve got adequate mitigation controls of these potential hazards are implemented in the design.

If we think back to the hierarchy, we prefer to fix things in the design, eliminate the hazard if possible, or make changes to the design to eliminate or reduce the hazard, rather than just rely on human beings to catch the problem and deal with it further downstream. It’s far more effective and cheaper, in the long run, to fix things in design they are more effective controls. Certainly, in this standard in Australian law, and in the UK and elsewhere, you will find either regulations or law or codes of practice or recognized and accepted good practice that says, “You should do this”. It’s a very, very common requirement and we should pretty much assume that we have to do this.

Task Description (T204) #5

Interesting clause here in 2.2, it says if no specific hazard analysis techniques are directed or the contractor wants to take a different route to what is directed, then they’ve got to obtain approval from the program manager. If the PM (Project Manager) hasn’t specified analysis techniques, and they may not wish to, they may just wish to say you’ll do whatever analysis is required in order to identify hazards and mitigate them. But in many industries, there are certain ways of doing things and I’ve said before in previous lessons, if you don’t specify that you want something, then contractors will very often cut the safety program to the bone in order to be the cheapest bid. the customer will get what they prioritize. If the customer prioritizes a cheap bid and doesn’t specify what they want, then they will get the bare minimum that the contractor thinks they can get away with. If you don’t ask, you don’t get – Becoming a theme that isn’t it?

Task Description (T204) #6

Let’s move on to 2.3. Returning to software, we’ve got to include that. The software might be developed separately, but nevertheless, the contractor performing the SSHA shall monitor the software development, shall obtain data from each phase of the software development process in order to evaluate the contribution of the software to the subsystem hazard analysis. There’s no excuse for just ignoring the software and treating it as a black box. Of course, very often these days the software is already developed. It’s a GFE or NDI item, but there still should be evidence available or you do a black-box analysis of the subsystem that the software is sitting in. Again, if the software developer reports any identified hazards, they’ve got to be reported to the program manager in order to request appropriate direction.

This assumes a level of interaction between the software developers right up the chain to the program manager. Again, this won’t happen unless the program manager directs it and pays for it. If the PM doesn’t want to pay for it, then they are either going to have to take a risk on not knowing about the functionality of the software that’s hidden within the subsystem. Or they’re going to deal with it some other way, which is often not effective. The PM needs to do a lot of work upfront in order to think what kind of problems there might be associated with a typical subsystem of whatever kind it is we’re dealing with. And think about “How would I deal with the associated risks?” “What’s the best way to deal with them in the circumstances?” If I’m buying stuff off the shelf and I’m not going to get access to hazard analysis or other kinds of evidence, how am I going to deal with them? Big questions.

And then 2.4, the contractor shall update the SSHA following changes, including software design changes. Again, we can’t just ignore those things.  That’s slide six out of six. Let’s move on to reporting.

Reporting (T204) #1

The first slide, contractor’s got to prepare a report that contains results from the task, including within the system description, physical and functional characteristics of the system, a list of the subsystems, and a detailed description of the subsystem being analysed, including its boundaries. And from other videos, you’ll know how much and how often I emphasize knowing where the boundaries are because you can’t really do effective safety analysis and safety management on an unbounded system. It just doesn’t work. There’s a requirement here for quite a lot of information reference to more detailed descriptions as they become available. The standard says they shall be supplied. That’s a lot of information that probably texts and pictures of all sorts of stuff and that’s going to need to go into a report. And typically, we would expect to see a hazard analysis report or a HAR with this kind of information in it. Again, if the PM/customer doesn’t specify that HAR, then they’re not going to get it and they’re not going to get textual information that they need to manage the overall system.

Reporting (T204) #2

So, if we move on to parts B and C of the reporting requirement. We’ve got to describe hazard analysis methods and techniques, provide a description of each method and the technique used, and a description of the assumptions made. And it says for each qualitative or quantitative data. This is another area that often gets missed. If you don’t know what techniques have been used and you don’t know the assumptions that almost certainly that subsystem analyser will have to make because they probably don’t have visibility in the rest of the system. If you don’t have that information, it becomes very difficult to verify the hazard analysis work and to have confidence in it.

And the hazard analysis results. Content and format vary. Something else the PM is going to think about and specify upfront. Then results should be captured the hazard tracking system. Now, usually, this hazard tracking system is hazard log. It might be a database, a spreadsheet or even a word document, or something like that. And usually, in the hazard tracking system, we have the leading particulars. We don’t always have, in fact, we shouldn’t have, every little piece of information in the hazard tracking system because it will quickly become unwieldy. Really, we want the hazard log to have the leading particulars of all the hazards, causes, consequences and controls. And then the hazard log should refer out to that hazard analysis report or other reports and data, whatever they’re called, other records.

If we go back up, this reemphasizes the kind of detail that’s here in 2.5 A. That really shouldn’t be going in the hazard log. That should be going in a separate report which the hazard log/the hazard tracking system refers to. Otherwise, it all gets that unwieldy.

Contracting

I’ve said repeatedly the PM needs to think about this and ask for that.

Contracting; The standard assumes that the information in A to H below is specified way up front in the request for proposal. That’s not always possible to do in full detail, but nevertheless, you’ve got to think about these things really early and include them in the contractual documentation. And again, if your if you’re running a competition, by the time you get to the final RFP, you need to make sure that you’re asking for what you really need. maybe run a preliminary expression of interest or pre-competition exercise in order to tease out, detect. We’ve got to impose task 204 (A.) as a requirement. We may have to specify which people we want to involve, which functional specialists, which discipline specialists (B.). We want to get involved to address this work. Identification of subsystems to be analysed (C.). Well, if you don’t know what the design is upfront, we can’t always do that, but you could say all.

You may specify desired analysis methodologies and techniques (D.). And again, that’s largely domain dependent. We tend to do safety in certain ways in different worlds, in the air world is done in a particular way. in the maritime world, it’s a different way. With Road or Off-Road Vehicle, it’s done in a particular way, etc, etc., whatever it might be. Chemical plant, whatever. If they’re known hazards, hazardous areas or other specific items be examined or excluded (E.) because they’re covered adequately elsewhere. The PM or the client has got to provide technical data on all those non-development developmental items (F.), particularly if they’re specifying that the contractor will use them. If the client says “You will use this. You will use these tires, therefore, this data with these tires” or whatever it might be, you’re going to – we want a system that’s going to use to standardized spares of standardized fuel or whatever it might be or is maintainable by technicians and mechanics with these standard skill sets. There may be all sorts of reasons for asking or forcing contracts to do certain things, in which case the purchaser is responsible for providing that data.

And again, many purchases forget to do that entirely or do it very badly, and then that can cripple a safety program. What’s the concept of operations (G.). What are we going to do with this stuff? What’s the context? What’s the big picture? That’s important. And any other specific requirements (H.). What risk matrix? What risk definitions are we using on this program? Again, important otherwise, different contractors do their own thing, or they do nothing at all. And then the client must pick up pieces afterwards, which is always time-consuming and expensive and painful. And it tends to happen at the back of a program when you’re under time pressure anyway. It’s never a happy place to be. do make sure that clients and purchases that you’ve done your homework and specified this stuff upfront, even if it turns out to be not the best thing you could have specified, it’s better to have an 80 percent solution that’s pretty standard and locked down.

Commentary #1

That’s the wording that’s in the task with some commentary by myself. Now some additional commentary. It says right up front, areas to consider include performance, performance degradation, functional failures, timing errors, design errors or defects and inadvertent function. What we have here basically is a causal analysis, there will be some simple techniques that you can use to identify this kind of stuff. Something like a functional failure analysis or a failure modes effects analysis, which is like an FFA, but an FMEA requires design to work on. And, FMEA, a variant of FME is FMECA, where we include the criticality of the failure as it possibly propagates out the hierarchy of the system.

These sorts of techniques will think about what could go wrong, no function when required, inadvertent function – the subsystem functions when it’s not supposed to – and incorrect function, and there’s often multiple versions of incorrect function. considering all of those causes, all of those failure modes and if we’re doing a big safety program on something quite critical, very often the those identified faults and failures and failure modes will feed into the bottom of a fault tree where we have a hierarchical build-up of causation and we look at how redundancy and mitigation and control measures mitigate those low-level failures and hopefully prevent them from becoming full-blown incidents and accidents.

And these techniques, particularly the FFA in the FMEA, are also good for hazard identification and for investigating performance and non-compliance issues. you can apply an FFA and FMEA those type of techniques to a specification and say “We’ve asked for this. What could happen if we get what we ask for?” What could go wrong? And, what could go wrong with these requirements?

Commentary #2

Now, the second part that I’ve chosen to highlight a consideration of the human within a subsystem and this is important. Traditionally, it’s not always been done that well. Human factors, I’m glad to say is becoming more prominent and more used both because in many, many systems, human is a key component, is a key player in the overall system. And in the past, we have tended to build systems and then just expect the human operator and maintainer to cope with the vicissitudes of that system. maybe the system isn’t that well designed in terms of it is not very usable, its performance depends on being lovingly looked after and tweaked and maybe systems are vulnerable to human error, and even induce human error. We need to get a lot better at designing systems for human use.

So, we could use several techniques. We could use a HAZOP, a hazard analysis operability study to consider information flows to and from the human. There are lots of specialist human factors analyses out there. And I’m hoping to run a series of human factors sessions, interviewing a very knowledgeable colleague of mine but more on that later. that will come in due course. We’ll look at those specialist human analysis techniques. But there’s been a couple of conceptual models around for quite a long time, about 20 years now at least, for how to think about humans in the system.

Human-System Models

So, we’ve got a 5M model and the SHELL model. I’m just going to briefly illustrate those. Now, both models are taken from the US Federal Aviation Authority System Safety Handbook, which dates to the end of 2000. These have been around a long time and they were around before the year 2000, and they’re quite long in the tooth.

We’ve got the SHELL model, which considers our software, hardware, environments and live-ware – the human. And there’s quite a nice checklist on Wikipedia for things to consider. We’re considering all the different interfaces between those different elements. That’s at the hyperlink you can see at the bottom of the slide.

Then on the right-hand side, we’ve got the 5M model and apologies for the gendered language. Where the five Ms are the man/the human, the machine, management, the media – and the media is the environment for operating and maintenance environment – and then in the middle is the mission. the humans, the machines, the systems, and the management come together in order to perform a mission within a certain environment. that’s another very useful way of conceptualizing our contribution of humans and interaction between human and system. Human operators usually are maintainers, frontline staff, and management, all in a particular operating environment and environmental context and how they come together to accomplish the mission or the function of the system, whatever it might be.

Now a word of caution, on this. It’s possible to spend gigantic sums of money on human analysis. very often we tend to target it at the most critical points and we very often target it at the operator, particularly for those phases of operation where the operator must do things in a limited amount time. the operator will be under pressure and if they don’t take the right action within a certain time, something could go wrong. we do tend to target this analysis in those areas and tend to spend money hopefully in a sensible and targeted way.

Commentary #3

My final slide on the additional commentary. The other things we’ve talked about for this task, compliance checking. We should get a subsystem specification. If we don’t get a subsystem specification, well, what are the expectations on the subsystem? Are they documented anywhere? Is it in the consent to box? Is there an interface requirement document or are there interface control documents for other systems that or subsystems that interface with our subsystem – anywhere where we can get information. if we have a subsystem spec, a bunch of functional requirements say, early on we could do a functional failure analysis of those functional requirements. we can do this work really quite early if we need to and think about, “Well, what interfaces are expected or required from our subsystem?” versus “What is our subsystem actually do?” any mismatches that could give rise to problems.

So, this is a type of activity where we’re looking for continuity and we’re looking for coherence across the interface. And we’re looking for things to join up. And if they don’t join up or they’re mismatched, then there’s a potential problem. And, as we look down into the subsystem, are there any derived safety requirements from above that says this subsystem needs to do this or not do that in order to manage a hazard? Those are important to identify.

Again, if it’s not been done probably the subsystem contractor won’t do it because it’s extra expense. And they may well truly believe that they don’t need to. We’re all proud of the things that we do, and we feel sometimes emotionally threatened if somebody suggests a piece of kit might go wrong and it does blind people to potential problems.

If going the other way, we are a higher-level authority where a system prime contractor or something and we’ve got to look at the documentation from a subsystem supplier. Well, we might find out some information from sales brochures or feature lists, or there might be a description of the benefits or the functions of the system with its outputs. We hopefully should be able to get hold of some operating and maintenance manuals. And very often those manuals will contain warnings and cautions and say, “You must look after the piece of cake by doing this”. I’m thinking the gremlins now “Don’t feed it after midnight or get it wet” otherwise bad things will happen. Sorry about that, slightly fatuous example, but a good illustration, I think. And ideally, if there’s any training materials associated with the piece of kit, is there a training needs analysis that shows how the training was developed? It’s very often in a TNA if it’s done well, there’s lots of good information in there. Even if it’s not quite for the same application that weighs in the piece of kit for, you could learn a lot from that kind of stuff

And finally, if all else fails, if you’ve got a legacy piece of kit, then you can physically inspect it. And if you can take it apart, put it back together again – do so. You might discover there’s asbestos in it. You might discover that lithium batteries or whatever it might be, fire hazards, flammable materials, toxic materials, you name it. there’s a lot of ways that we can get information about the subsystem. Ideally, we ask for everything upfront. Say, you know, if there’s any hazardous chemicals in there, then you must provide the hazard sheets and the hazard data in accordance with international or national standards and so on and so forth. But if you can’t get that or you haven’t asked for it, there are other ways of doing it, but they’re often time-consuming and not the optimal way of doing it.

So again, do think about what you need upfront and do ask for it. And if the contractor can’t supply exactly what you want, what you need, you then have to decide whether you could live with that, whether you could use some of these alternative techniques or whether you just have to say, “No, thanks. I’ll go to another supplier of something similar”. And I may have to pay more for it, but I’ll get a better-quality product that actually comes with some safety evidence that means I can actually integrate it and use it within my system. Sometimes you do have to make some tough decisions and the earlier we do those tough decisions the better, in my experience.

Copyright Statement

So that’s all the technical content. Just to say that all the text that’s in italics and in speech marks is from the standard, which is copyright free. But this presentation, and especially all the commentary and the added value, is copyright at the Safety Artisan 2020.

For More …

And if you want more videos like this, rest in the 882 series and other resources on safety topics, you can find them at the website www.safetyartisan.com. And you can also go to the safety artisan page at Patreon. that’s www.pateron.com and search for Safety Artisan – all one word.

End

So, that’s the end of the presentation and it just remains for me to say, thanks very much for watching and supporting the Safety Artisan. And I’ll be doing Task 205 system hazard analysis next in the series, look forward to seeing you again soon. Goodbye, everyone.

Back to the Home Page | Mil-Std-882 Page | System Safety Page

Categories
Mil-Std-882E

Mil-Std-882E Sub-System Hazard Analysis (Task 204)

This is Mil-Std-882E Sub-System Hazard Analysis (SSHA).
Back to: Task 203.

The 200-series tasks fall into several natural groups. Task 203 address the identification and analysis of safety requirements at multiple levels.

In the video lesson, The Safety Artisan looks at Sub-System Hazard Analysis, or SSHA, which is Task 204 in Mil-Std-882E. We explore Task 204’s aim, description, scope and contracting requirements. We also provide value-adding commentary and explain the issues with SSHA – how to do it well and avoid the pitfalls.

The text from the standard follows:

“SUBSYSTEM HAZARD ANALYSIS

204.1 Purpose. Task 204 is to perform and document a Subsystem Hazard Analysis (SSHA) to verify subsystem compliance with requirements to eliminate hazards or reduce the associated risks; to identify previously unidentified hazards associated with the design of subsystems; and, to recommend actions necessary to eliminate identified hazards or mitigate their associated risks.

204.2 Task description. The contractor shall perform and document an SSHA to identify hazards and mitigation measures in components and equipment. This analysis shall include Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Government-Furnished Equipment (GFE), Non-Developmental Items (NDI), and software. Areas to consider include performance, performance degradation, functional failures, timing errors, design errors or defects, and inadvertent functioning. While conducting this analysis, the human shall be considered a component within a subsystem, receiving both inputs and initiating outputs.

204.2.1 At a minimum, the analysis shall:

a. Verify subsystem compliance with requirements to eliminate hazards or reduce the associated risks.

(1) Validate applicable flow-down of design requirements from top-level specifications to detailed design specifications for the subsystem.

(2) Ensure design criteria in the subsystem specifications have been satisfied and that verification and validation of subsystem mitigation measures have been included in test plans and procedures.

b. Identify previously unidentified hazards associated with the design of subsystems.

(1) Ensure implementation of subsystem design requirements and mitigation measures have not introduced any new hazards.

(2) Determine modes of failure, including component failure modes and human errors, single point and common mode failures, the effects when failures occur in subsystem components, and from functional relationships between components and equipment comprising each subsystem. Consider the potential contribution of subsystem hardware and software events (including those developed by other contractors/sources, COTS, GOTS, NDIs, and GFE hardware or software), faults, and occurrences (such as improper timing).

c. Recommend actions necessary to eliminate previously unidentified hazards or mitigate their associated risk. Ensure system-level hazards attributed to the subsystem are analyzed and adequate mitigations of the potential hazards are implemented in the design.

204.2.2 If no specific analysis techniques are directed or if the contractor recommends a different technique than that specified by the Program Manager (PM), the contractor shall obtain PM approval of techniques to be used before performing the analysis.

204.2.3 When software to be used in conjunction with the subsystem is developed under a separate software development effort, the contractor performing the SSHA shall monitor, obtain, and use the output of each phase of the formal software development process in evaluating the software contribution to the SSHA. Hazards identified that require mitigation action by the software developer shall be reported to the PM in order to request appropriate direction be provided to the software developers.

204.2.4 The contractor shall update, as necessary, the SSHA following system design changes, including software design changes.

204.2.5 The contractor shall prepare a report that contains the results from the task described in paragraph 204.2 and includes:

a. System description. This summary describes the physical and functional characteristics of the system, a list of its subsystems, and a detailed description of the subsystem(s) being analyzed, including its boundaries. Reference to more detailed system and subsystem descriptions, including specifications and detailed review documentation, shall be supplied when such documentation is available.

b. Hazard analysis methods and techniques. Provide a description of each method and technique used in conduct of the analysis. Include a description of assumptions made for each analysis and the qualitative or quantitative data used.

c. Hazard analysis results. Contents and formats may vary according to the individual requirements of the program and methods and techniques used. As applicable, analysis results should be captured in the Hazard Tracking System (HTS).

204.3. Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:

a. Imposition of Task 204. (R)

b. Identification of functional discipline(s) to be addressed by this task. (R)

c. Identification of subsystem(s) to be analyzed.

d. Desired analysis methodologies and technique(s), and any special data elements, format, or data reporting requirements (consider Task 106, Hazard Tracking System).

e. Selected hazards, hazardous areas, or other specific items to be examined or excluded.

f. COTS, GOTS, NDI, and GFE technical data to enable the contractor to accomplish the defined task.

g. Concept of operations.

h. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.

Forward to the next excerpt: Task 205

Back to the Home Page | Mil-Std-882 Page | System Safety Page

Categories
Mil-Std-882E

Equipped: Sub-System Hazard Analysis

To view this content, you must be a member of Simon's Patreon at $15 or more
Already a qualifying Patreon member? Refresh to access this content.
Categories
Mil-Std-882E

Transcript: Preliminary Hazard Analysis (T202)

Here is the full transcript: Preliminary Hazard Analysis.

The full video is here.

Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you’ll find professional, pragmatic and impartial safety training resources. So, we’ll get straight on to our session and it is the 8th February 2020.  

Now we’re going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It’s very widely used mostly on military equipment, but it does turn up elsewhere.  This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It’s one of the first things that you will do on a systems safety program and therefore one of the most informative. This session forms part of a series of lessons that I’m doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording and scope. How we do risk assessments against Tables 1, 2 and 3. Basically, it is severity, likelihood and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting and then a short commentary from myself. In fact, I’m providing commentary all the way through. So, let’s crack on.

Task 202 Purpose

The purpose of Task 202 is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks and identify potential mitigation measures. We’re going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I’ve highlighted some stuff that I particularly want to talk about.

It says “the contractor” [does this or that], but it doesn’t really matter who is doing the analysis, and actually, the customer needs to do some to inform themselves, otherwise they won’t really understand what they’re doing.  Whoever does it needs to perform and document PHA. It’s about determining initial risk assessments. There’s going to be more work, more detailed work done later. But for now, we’re doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions that we’re proposing to introduce. That’s very important. We don’t need a design to do this. We can get in early when we have user requirements, functional requirements, that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that’s another big issue. We’ll talk about that more later. It says including mishap data as well, if accessible: American term mishap, it means an accident, but we’re avoiding any kind of suggestion about whether it is accidental or deliberate.  It might be stupidity, deliberate, whatever. It’s a mishap. It’s an undesirable event. We look for accessible data from similar systems, legacy systems and other lessons learned. I’ve talked about that a little bit in Task 201 lesson about that, and there’s more on that today under commentary. We need to look at provisions, alternatives, meaning design provisions and design alternatives in order to reduce risks and adding mitigation measures to eliminate hazards. If we can all reduce associated risk, we need to include all of that. What’s the task description? That’s a good overview of the task and what we need to talk about.

Reading & Scope

First, recording and scope, as always, with these tasks, we’ve got to document the results of the PHA in a hazard tracking system. Now, a word on terminology; we might call hazard tracking system; we might call it hazard log; we might call it a risk register. It doesn’t really matter what it’s called. The key point is it’s a tracking system. It’s a live document, as people say, it’s a spreadsheet or a database, something like that. It’s something relatively easy to update and change. And, we can track changes through the safety program once we do more analysis because things will change. We should expect to get some results and to refine them and change them as time goes on. Very important point.

Scope #1

Scope. Big section this. Let me just check. Yes, we’ve got three slides on the scope. This does go on and on. The scope of the PHA is to consider the potential contribution from a lot of different areas. We might be considering a whole system or a subsystem, depending on how complex the thing is we’re dealing with. And we’re going to consider mishaps, the accidents and incidents, near misses, whatever might occur from components of the system (a. System components), energy sources (b. Energy sources), ordnance (c. Ordnance)- well that’s bullets and explosives to you and me, rockets and that kind of stuff.

Hazardous materials (d. Hazardous Materials (HAZMAT)), interfaces and controls (3. Interfaces and controls), interface considerations to other systems (f. Interface considerations to other systems when in a network or System-of-Systems (SoS) architecture), external systems. Maybe you’ve got a network of different systems talking to each other. Sometimes that’s called a system of systems architecture. Don’t worry about the definitions. Our system probably interacts and talks to other systems, or It relies on other systems in some way, or other systems rely on it. There are external interfaces. That’s the point.

Scope #2

We might think about material compatibilities (g. Material Compatibilities) – Different materials and chemicals are not compatible with others- inadvertent activation (h. Inadvertent activation).

Now, I’ve highlighted I. (Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Non-Developmental Items (NDIs), and Government-Furnished Equipment (GFE).) because it’s something that often gets neglected. We also need to think about stuff that’s already been developed. The general term is NDIs and it might be commercial off the shelf, it might be a government off the shelf system, or government-furnished equipment  GFE- doesn’t really matter what it is. These days, especially, very few complex systems are developed purely from scratch. We try and reuse stuff wherever we can in order to keep costs down and schedule down.

We’re going to need to integrate all these things and consider how they contribute to the overall risk picture. And as I say, that’s not often done well. Well, it’s hardly ever done well. It’s often not done at all. But it needs to be, even if only crudely. That’s better than nothing.

J. (j. Software, including software developed by other contractors or sources.  Design criteria to control safety-significant software commands and responses (e.g., inadvertent command, failure to command, untimely command or responses, and inappropriate magnitude) shall be identified, and appropriate action shall be taken to incorporate these into the software (and related hardware) specifications)  we need to include software, including software developed elsewhere. Again, that’s very difficult, often not done well. Software is intangible. If somebody else has developed it maybe we don’t have the rights to see the design, or code, or anything like that. Effectively it’s a black box to us. We need to look at software. I’m not going to bother going through all the blurb there.

Another big thing in part k (k.  Operating environment and constraints) is we need to look at the operating environment. Because a piece of kit that behaves in a certain way in one environment, you put it in a different environment and it behaves differently. And it might become much more dangerous. You never know. And the constraints that we put under on the system. Operating environment is very big. And in fact, if you see the lesson I did on the definition of safety, we can’t really define whether a system is safe or not until we define the operating environment. It’s that important, a big point there.

Scope #3

And then the third slide of three procedures (l. Procedures for operating, test, maintenance, built-in-test, diagnostics, emergencies, explosive ordnance render-safe and emergency disposal). Again, these are well these often don’t appear until later unless of course, we’ve gone off the shelf system. But if we have got off the shelf system; there should be a user manual, there should be maintenance manuals, there should be warnings and cautions, all this kind of stuff. So, we should be looking for procedures for all these things to see what we could learn from them. We want to think about the different modes (m. Modes) of operation of the system. We want to think about health hazards (n. Health hazards) to people, environmental impacts (o. Environmental Impacts), because they take to includes environmental.

We need to think about human factors, human engineering and human error analysis (p. Human factors engineering and human error analysis of operator functions, tasks, and requirements). And it says operator function tasks and requirements, but there’s also maintenance and disposal of storage. All the good stuff. Again, Human Factors is another big issue. Again, it’s not often done well, but actually, if you get a human factor specialist statement early, you can do a lot of good work and save yourself a lot of money, and time, and aggravation by thinking about things early on.

We need to think about life support requirements (q.  Life support requirements and safety implications in manned systems, including crash safety, egress, rescue, survival, and salvage). If the system is crewed or staffed in some way, I’m thinking about, well, ‘What happens if it crashes?’ ‘How do we get out?’ ‘How do we rescue people?’ ‘How do we survive?’ ‘How do we salvage the system?’

Event-unique hazards (r. Event-unique hazards). Well, that’s kind of a capsule for your system does something unusual. If it does something unusual you need to think about it.

And then thinking about part s. infrastructure (s.  Built infrastructure, real property installed equipment, and support equipment), property installed equipment and support equipment in property and infrastructure.

And then malfunctions (t. Malfunctions of the SoS, system, subsystems, components, or software) of all the above.

I’m just going to whizz back and forth. We’ve got to sub-item T there. We’ve got an awful lot of stuff there to consider. Now, of course, this is kind of a hazard checklist, isn’t it? It’s sort of a checklist of things. We need to look at all that stuff. And in that respect, that’s excellent, and we should aim to do something on all of them just to see if they’re relevant or not if nothing else. The mistake people often make is because they can’t do something perfect and comprehensive, they don’t do anything. We’ve got a lot of things to go through here. And it’s much better to have a go at all these things early and do a bit of rough work in order to learn some stuff about our system. It’s much better to do that than to do nothing at all. And with all of these things, it may be difficult to do some of these things, the software, the COTS, things where we don’t have access to all the information, but it’s better to do a little bit of work early than to do nothing at all waiting for the day to arrive when we’ll be able to do it perfectly with only information. Because guess what? That day never comes! Get in and have a go at everything early, even if it’s only to say, ‘I know nothing about this subject, and we need to investigate it.’ That’s the pros and cons of this approach. Ideally, we need to do all these things, but it can be difficult.

Risk Assessment

Moving on. Well, we’ve looked to a broad scope of things for all the hazards that we identify and there are various techniques you can use. The PHA has got to include a risk assessment. That means that we’ve got to think about likelihood and severity and then that gives us an overall picture of risk when we combine the two together. That’s tables 1 and 2.

And then, forget risk assessment codes I’m not sure why that’s in there, table 3 is the risk matrix and 88 2 has a standard risk matrix. And it says to use that unless you’ve got a tailored matrix for your system that’s been approved for use. And in this case, it says approved effectively in accordance with the US Department of Defence. But it’s whoever is the acquiring organization, the authority, the customer, the purchaser, whatever you want to call it, the end-user. We’ll talk about that more in a sec.

Table I, Severity

Let’s start by looking at severity, which in many ways is the easiest thing to look at. Now, here we’ve got in this standard we’ve got an approach based on harm to people, harm to the environment, and monetary loss due to smashing stuff up. At the top catastrophic accident. Category 1 is a fatal accident. This accident could result in death, permanent total disability, irreversible significant environmental impact, or monetary loss. And in this case, it says $10 million. Well, this, that’s 10 million US dollars. This standard was created in 2012, this version of the standard, probably inflation has had an effect since then. And a critical accident, we could cause partial disability injuries or occupational illness that can hospitalized three people are reversible. Significant environmental impact or some losses between 1 million and 10. And then we go down to marginal. Injury or hospital, lost workdays for one person, reversible moderate environmental impact or monetary loss between $100,000 and one million dollars. And then finally negligible is less than that. Negligible is an injury or illness that doesn’t result in any lost time at work, minimal environmental impact, or a monetary loss of less than a hundred thousand dollars. That’s easy to do in this standard. We just say, ‘What are the losses that we think could result?’ Worst case, reasonable scenario or an accident? That’s straightforward.

Table II, Probability

Now let’s look at probability. We’ve got a range here from ‘a’ to ‘e’, frequent down to improbable, and then F is eliminated. And eliminated in the standard really does mean eliminated. It cannot happen ever! It does not mean that we managed to massage the figures, the likelihood a probability figures, down Low that we pretend that it will never happen. It means that it is a physical impossibility. Please take note because I’ve seen a lot of abuse of that approach. That’s bad practices to massage the figures down to a level where you say, ’I don’t need to bother thinking about this at all!’ because the temptation is just to frig [massage] the figures and not really consider stuff that needs to be considered. Well, I’ll get off my soapbox now.

Let’s go back to the top. Frequent- you’ve said, for one item, likely to occur often. Down to probable- occur several times in the life of an item. Occasional- likely to occur sometimes, we think it’ll happen once in the life of an item. Remote- we don’t think it’ll happen at all, but it could do. And improbable – so unlikely for an individual item that we might assume that the occurrence won’t happen at all. But when we consider a fleet, particularly, I’ve got hundreds or thousands of items, the cumulative risk or cumulative probability, sorry, I should say, is unlikely to occur across the fleet, but it could.

And this is where this specific vs. fleet occurrence or probability is useful. For example, if we think ‘Let’s imagine a frequent hazard’. We think that something could happen to an item, per item, let’s say once a year. Now, if we’ve got a fleet of fifty of these items or fifty-something of these items, that means it’s going to happen across the fleet pretty much every week on average. That’s the difference. And sometimes it’s helpful to think about an individual system. And sometimes it’s helpful to think about a fleet where you’ve got the relevant experience to say, ‘Well the fleet that we’re replacing. We had a fleet of 100 of these things. And this went wrong every week or every month or once a year or only happened once every 10 years across the entire fleet.’ And therefore, we could reason about it that way.

We’ve got two different ways of looking at probability here. And use whichever one is more useful or helps you. But when we’re doing that, try and do that with historical data, not just subjective judgment. Because otherwise your subjective judgment, one individual might say ‘That will never happen!’, whereas another will say, ‘Well, actually we experienced it every month on our fleet!’. Circumstances are different.

Table III, Risk Matrix

We put severity and probability together. We have got ‘1’ to ‘4’ for severity, and ‘A’ to ‘F’ for probability, and we get this matrix. We’ve got probability down the side and severity along the top. And in this standard, we’ve got high risk, serious risk, medium risk and low risk. And now how exactly you define these things is, of course, somewhat arbitrary. We’ll just look at some general principles.

The good thing about this risk matrix is- First, the thing to remember is that risk is the product of probability and severity. Effectively we multiply the two together and we go, well, if we’ve got a catastrophic or critical risk. And it’s if we’ve got a more serious risk and it’s going to happen often that’s a big risk. That’s a high risk. Whereas, if we’ve got a low severity accident that we think will happen very, very rarely, then that’s a low risk. That’s great.

One thing to note here it’s easier to estimate the severity than it is the probability. It’s quite easy to under- or overestimate probability. Usually, because of the physical mechanism involved, it’s easier to estimate the severity correctly. If we look on the right-hand side, at negligible. We can see that if we’re confident that something is negligible, then it can be a low risk. But at the very most, it can only be a medium risk. We are effectively prioritizing negligible severity risks quite low down the pecking order.

Now, on the other side, if we think we’ve got a risk that could be catastrophic, we could kill somebody or do irreversible environmental damage, then, however improbable we think it is, it’s never going to be classified less than medium. That’s a good point to note. This matrix has been designed well, in the sense that all catastrophic and critical risks are never going to get the low medium and they can quite easily become serious or high. That means they’re going to get serious management attention. When you put risks up in front of a manager, senior person, a decision-maker, who’s responsible and they see red and orange, they’re going to get uncomfortable and they’re going to want to know all about that stuff. And they will want to be confident that we’ve understood the risk correctly and it’s as low as we can get it. This matrix is designed to get attention and to focus attention where it is needed.

And in this standard, in 88, you ultimately determine whether you can accept risk based on this risk rating. In 882, there is no unacceptable, intolerable risk. You can accept anything if you can persuade the right person with the right amount of authority to sign it off. And the higher the risk, the higher the level of authority you must get in order to accept the risk and expose people to it. This matrix is very important because it prioritizes attention. It prioritizes how much time and effort money gets spent on reducing risks. You will use it to rank things all the time and it also prioritizes, as we’ll see later, how often you review a risk because clearly, you don’t want to have high risks or serious risks. Those are going to get reviewed more often than a medium risk or low risk. A low risk might just get review routinely, not very often, maybe once a year or even less. We want to concentrate effort and attention on high risks and this matrix helps us to do that. But of course, no matrix is perfect.

Now, if we go back. Looking at the yellow highlight, we’re going to use table three unless there’s a tailored alternative definition, a tailored alternative matrix. Now, noting this matrix, catastrophic risk, the highest possible risk, we’ve got one death. Now, if we had a system where it was feasible to kill more than one person in an accident, then really, we would need another column worse than catastrophic. We could imagine that if you had a vehicle that had one person in it and the vehicle crashed, whatever it was, a motorbike let’s say. Let’s imagine you only said ‘We’re only going to have solo riders. We can only kill one person.’. We’re assuming we won’t hurt anybody else. But if you’ve got a car where you’ve got four or more people in, you could kill several people. If you’ve got a coach or a bus, you could drive it off a cliff and kill everybody, or you might have a fire and some people die, but most of them get out. You can see that for some vehicles, for some systems, you would need additional columns. Killing one person isn’t the worst conceivable accident.

Some systems. You might imagine quite easily, say with a ship, it’s actually very rare for a ship to sink and everybody dies. But it’s quite common for individuals on ships to die in health and safety type accidents, workplace accidents. In fact, being a merchant seaman is quite a risky occupation. But also in between those two, it’s also quite possible to have a fire or asphyxiating gases in a compartment. You can kill more than one person, but you won’t kill the entire ship’s company. Straight away in a ship, you can see there are three classes, if you like, of serious accidents where you can kill people. And we knew we should really differentiate between the three when we’re thinking about risk management. And this matrix doesn’t allow you to do that. If you’ve got a system where more than one death this is feasible, then this matrix isn’t necessarily going to serve well, because all of those types of accidents get shoved over into a catastrophic column, on this matrix, and you don’t differentiate between any of between them which is not helpful. You may need to tailor your matrix and add further columns.

And depending on the system, you might want to change the way that those risks are distributed. Because you might have a system, for example riding a bicycle. It’s very common riding a bicycle to get negligible type injuries. You know you fall off, cuts and bruises, that kind of thing. But, if you’re not on the road, let’s say you’re riding off-road it is quite rare to get utilities unless you do a mountain biking on some extreme environment. You’ve got to tailor the matrix for what you’re doing. I think we’ve talked about that enough. We’ll come back to that in later lessons, I’m sure.

Risk Mitigation

Risk mitigation, we’re doing this analysis, not for the sake of it, we’re doing it because we want to do something about it. We want to reduce the risk or eliminate it if we can. 88 2 standard gives us an order of precedence, and as it says it’s specified in section 4.3.4, but I’ve reproduced that here for convenience. Ideally, we would like to eliminate hazards by designing them. We would make a design decision to say, ‘We won’t have a petrol engine, let’s say, in this vehicle or vessel because petrol is a serious fire/explosion hazard. We’ll have something else. We’ll have diesel or we’ll have an all-electric vehicle maybe these days or something like that.’ We can eliminate the risk.

We could reduce the risk by altering the design introducing sort of failsafe features, or making the design crashworthy, or whatever it might be. We could add engineered features or devices to reduce risk safety features seatbelts in cars or airbags, roll balls, crash survivable cages around the people, whatever it might be. We can provide warning devices to say ‘Something’s going wrong here, and you need to pull over’ or whatever it is you need to do. ‘Watch out!’ because the system is failing and maybe ‘Your brakes are failing. You’ve got low brake fluid. Time to pull over now before it gets worse!’.

And then finally, the least effective precautions or mitigations signage, warning signs – because nobody reads warning signs, sadly. Procedures. Good, if they’re followed. Again, very often people don’t follow them. They cut corners. We train people. Again, they don’t always listen to the training or carry it out. And we provide PPE. That’s personal protective equipment. And again, PPE is great if you enforce it. For example, I live in Australia. If you cycle in Australia, if you ride a bicycle, it’s the law that you wear a bike helmet. Most people obey the law because they don’t want to get a $300 fine or whatever it is if the cops catch you, but you still see people around who don’t wear one. Presumably, because they think they’re bulletproof, and it will never happen to them.

PPE is fine if it’s useful. But of course, sometimes PPE can make a job so much harder that people discard it. We really need to think about designing a job to make it easy to do, if we’re going to ask people to wear awkward PPE. Also, by the way, we need to not ask them to wear PPE for trivial reasons just so that the managers can cover their backsides. If you ask people to wear PPE when they’re doing trivial jobs where they don’t need it then it brings the system into disrepute. And then people end up not wearing PPE for jobs where they really should be wearing it. You can over-specify safety and lose goodwill amongst your workers if you’re not careful.

Now those risk mitigation priorities, that’s the one in this standard, but you will see an order of precedence like that in many different countries in the law. It’s the law in Australia. It’s the law in the UK, for example, expressed slightly differently. It’s in lots of different standards for good reason because we want to design out the risks. We want to reduce them in the design because that’s more effective than trying to bolt on or stick home safety afterwards. And that’s another reason why we want to get in early in a project and think about our hazards and our risks early on. Because it’s cheaper at an early stage to say, ‘We will insist on certain things in the design. We will change the requirements to favour a design that is inherently safe.’

Contracting

We only get these things if we contract for them. The model in 88 2, the assumption is it’s a government somewhere contracting a contractor to do stuff. But it doesn’t have to be a government, it can be any client or purchase of world authority or end-user asking for something, buying something, contracting something, be it the physical system, or service, or whatever it might be. The assumption is that the client issues a request for proposal.

Right at the start, they say ‘I want a gizmo’. Or ‘I want- I don’t even want to specify that I want a gizmo. I want something that will do this job. I don’t care what it is. Give me something that will do this job.’ But even at that early stage, we should be asking for preliminary hazard analysis (PHA) to be done. We should be saying, ‘Well, who?’ ‘Which specialists?’ ‘Which functional disciplines need to be involved?’. We need to specify the data that we require and the format that it’s in. Considering, especially the tracking system, which is task 106. If we’re going to get data from lots of different people, best we get it in a standardized format we can put it all together. We want to insist that they identify hazards, hazardous locations, etc. We want to insist on getting technical data on non-developmental items, either getting it for the client or the client supplies it. Says to the contractor or doing it ‘This is the information that I’m going to supply you’ and you will use it. We need to supply the concept of operations and of course, the operating environment. Let me just check, no that that’s it. We’ve only got one slide on commentary. It doesn’t say the environment, but we do need to specify that as well, and hopefully, that should be in the concept of operations, and a specific hazard management requirement. For example, what matrix are we going to use? What is a suitable matrix to use for this system?

Now to do all of this, the purchaser, the client really probably needs to have done Task 202 and 201 themselves, and they’ve done some thinking about all of this in order to say, ‘With this system, we can envisage- with this kind of requirement, we can envisage these risks might be applicable.’ And ‘We think that the risks might be large or small’ depending on what the system is or ‘We think that-’. Let’s say if you purchase a jet fighter, jet fighters because of that demand, the overwhelming demand for performance, they tend to be a bit riskier than airliners. They fall out of the sky more often. But the advantage is that there are normally only one or two people on board. And jet fighters tend to fly a lot of the time in the middle of nowhere. You’re likely to hurt relatively few people, but it happens more often.

Whereas if you’re buying an airliner something, you can shove a couple of hundred people in at one go, those fall out of the sky much less frequently, thank goodness, but when they do, lots of people get hurt. Aa different approach to risk might be appropriate for different types of system. And when your, you should be thinking about early on, if you’re the client, if you’re the purchaser. You should have done some analysis to enable you to write a good request for proposal because if you write a bad request for proposal, it’s very difficult to recover the situation afterwards because you start at a disadvantage. And the only way often to fix it is to reissue the RFP and start again. And of course, nobody wants to do that because it’s expensive and it wastes a lot of time. And it’s very embarrassing. It is a career-limiting thing to do, a lot of people. You do need to do some work upfront in order to get your RFP correct. That’s what it says in the standard.

Commentary

I want to add a couple of comments, I’m not going to say the much. First, it’s a little line from a poem by Kipling that I find very, very helpful. And Kipling used to be a journalist and it was his job to go out and find out what the story was and report it. And to do that he used his six honest serving men. He asked ‘What?’ and ‘Why?’ and “When?’ and ‘Who?’, sorry, and ‘How?’ and ‘Where?’ and ‘Who?’. Those are all good questions to ask. If you can ask all those questions and get definite answers, you’re doing well. And a little tip here as a consultant, I rock up and one of the tricks of the trade I use is I turn up as the ‘dumb consultant’ – I always pretend to be a bit dumber than I really am- and I ask these stupid questions. And I ask the same questions to several different people. And if I get the same answer to the same question from everyone, I’m happy. But that doesn’t always happen. If you start getting very different answers to the same question from different people, then you think, ‘Okay, I need to do some more digging here’. And it’s the same with hazard analysis. Ask the what, why, when, where and who questions.

Another issue, of course, is ‘How much?’ ‘How much is this going to take?’ ‘How long is this going to take?’ ‘How many people am I going to have to invite to this meeting?’, etc. And that’s difficult. And really, the only way to answer these questions properly is to just do some PHI and PHA early and to learn from the results. The other alternative, which we are really good as human beings, is to ask the questions early to get answers that we don’t really like and then just to sweep them under the carpet and not ask those questions ever again because we’re frightened of the answers that we might. However frightened you are of the answer, you might get do ask the question because forewarned is forearmed. And if you know about a problem, you can do something about it. Even if that something is to rewrite your CV and start looking for another job. Do ask the questions even if it makes people uncomfortable. And I guess learning how to ask the questions without making people uncomfortable is one of the tricks that we must learn as safety engineers and consultants. And that’s an important part of the job. The soft skills really that you can only learn through practice, really, and observing people.

What’s the way to do it? Well, I’ve said this several times but do your PHI and PHA early. Do it as early as possible because it’s cheap to do it early. If you’re the only safety person or safety, you often in the beginning, maybe you’re a manager, maybe safety is part of your portfolio, you’ve got other responsibilities as well. Just sit down one day and ask these dumb questions, go through the checklist in Task 202 and say, ‘Do I have these things in my system?’

If you know for sure you’re not going to have explosive ordnance, or radiation, or whatever it might be, you can go, ‘Great. I can cross those off the list’. I can make an assumption or I can put a constraint in, by the way, if you really want to do it well and say ‘We will have no explosive devices’, ‘We will have no energetic materials.’, ‘We will have no radiation’ or whatever it might be. Make sure that you insist that you’ll have none of it then you can hopefully move on and never have to deal with those issues again.

Do the analysis early, but expect to repeat it because things change, and you learn more and more information comes in. But of course, the further you go down the project, the more expensive everything gets. Now, having said do it, do it early, the Catch 22 is very often people think ‘How can I analyse when I don’t have a design?’

The ‘Catch-22’ question is what comes first, design or analysis? Now, the truth is that you could do an analysis of very simple functions. You don’t need any design at all. You don’t even need to know what kind of vehicle or what kind of system you might be dealing with. But of course, that will only take you so far. And it may be that you want to do early analysis, but for whatever reason, [Intellectual Property Rights] IPR or whatever it might be, you can’t get access to data.

What do you do? You can’t get access to data about your system or the system that you’re replacing. What do you do? Well, one of the things you can do is you can borrow an idea from the logistics people. Logistic support analysis Task 203 is a baseline comparison system. Imagine that you’re going to have a new system, maybe is replacing an old system, but maybe it does a lot more than the old system used to do. Just looking at the old system isn’t going to give you the full picture. Maybe what you need to do is make up an imaginary comparison system. You take the old system and say, ‘Well, I’m adding all this extra functionality’. Maybe the old system, we just bought the vehicle. We didn’t buy the support system, we didn’t buy the weapons, we didn’t buy the training, whatever it might be. But, this time around, we’re buying the complete package. We’re going to have all this extra stuff that probably has hazards associated with it, but just doing lessons learned from the previous system will not be enough.

Maybe you need to construct an imaginary Baseline Comparison System and go, ‘I’ll borrow bits from all these other systems, put them all together, and then try and learn from that sort of composite system that I’ve invented, even though it’s imaginary.’ That can be a very powerful technique. You may get told, ‘Oh, we haven’t got the money’ or ‘We haven’t got the time to do that’. But to be honest, if there’s no other way of doing effective, early analysis, then spend the money and do it early. Because many times I’ve seen people go, ‘Oh, we haven’t got time to do that’. They’ve never got time to do it properly and therefore, you end up doing it. You go around the buoy two or three times. You do it badly. You do it again slightly less badly. You do it a third time. And it’s sort of barely adequate. And then you move forward. Well, you’ve wasted an awful lot of time and money and held up other people, the rest of the project doing that. Probably it’s better off to spend the money and just get on with it. And then you’re informed going forwards before you start to spend serious money elsewhere on the project.

Copyright Statement

Well, that’s it for me. Just one thing to say, that Mil. Standard 882E came out in 2012. Still going strong, unlikely to be replaced anytime soon. It’s copyright free. All the quotations are from the standard, they’re copyright free. But this video is copyright of The Safety Artisan 2020.

For More …

And you can find a lot more information, a lot more safety videos, at The Safety Artisan page at www.Patreon.com and you can find more resources at www.safetyartisan.com.

That is the end of the show. Thank you very much for listening. And it just remains for me to say. Come and watch some more videos on Mill-Std-882E. There’s going to be a complete course on them, and you should be able to get, I hope, a lot of value out of the course. So, until I see you again, cheers.


Back to the Home Page | Mil-Std-882 Page | System Safety Page

Professional | Pragmatic | Impartial