Transcript: Operating & Support Hazard Analysis

In the full-length session, The Safety Artisan looks at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. We explore Task 206’s aim, description, scope and contracting requirements. We also provide value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the pitfalls to avoid. We refer to other lessons for specific tools and techniques, such as Human Factors analysis methods.

Introduction

Hello everyone and welcome to the Safety Artisan, home of safety engineering training. I’m Simon and today we’re going to be carrying on with our series on Mil. Standard 882E system safety engineering.

Operating & Support Hazard Analysis

Today, we’re going to be moving on to the subject of operating and support hazard analysis. This is, as it says, Task 206 under the standard. Operating and support hazard analysis: I’ll just call it O&S or O&SHA for short. Unfortunately, it will confuse people if I call it OSHA, so let’s call it O&S.

Topics for this Session

The purpose of O&S hazard analysis is to identify and assess hazards introduced by operating and support activities and procedures, and also to evaluate the adequacy of O&S procedures, processes, equipment, facilities, etc., to mitigate risks that have already been identified. A twofold task but a very big task. And as we’ll see, we’ve got lots of slides today on task description, reporting, contracting, and commentary. As always, I present the full text of the task as is, which is copyright free, but I’m only going to talk about the things that are important. So, we’re not going to go through every little clause of the standard; that would be pointless.

O&S Hazard Analysis (T206) – Purpose

Let’s get started with the purpose. As we’ve already said, it’s to identify and assess those hazards which are introduced by operational and support activities and procedures, and to evaluate the adequacy of the procedures, facilities, processes and equipment used to mitigate the associated risks. So, we’re looking at operating the system, whatever it may be. And of course, this is a military standard, so we assume a military system, but not all military systems are weapon systems by any means. Not all are physical systems. So, there may be inventory management systems, management information systems, all kinds of stuff. So, does operating those systems and supporting them (maintaining them, resupplying them, disposing of them, etc.) create or introduce any hazards? And how do we mitigate them? That’s the purpose of the task.

Task Description (T206)

Let’s move on to the task description. Again, we’re assuming a contractor is performing the analysis, but that’s not necessarily the case.

Task Description (T206) #1

For this task, the standard says that it typically begins during engineering and manufacturing development, or EMD. So, we’re assuming an American-style lifecycle for a big system, and EMD comes after concept and requirements development. So, we are beginning to move into the very expensive stage of development for a system where we begin to commit serious money. It’s suggesting that O&SHA can wait until then, which is fine in general unless you’ve identified any particularly novel hazards that will need to be dealt with earlier on. As it says, it should build on design hazard analyses, but we’ll also talk later on about the case when there are no design hazard analyses. And the O&SHA shall identify requirements or alternatives for eliminating hazards, mitigating risks, etc. This is one of those tasks where the human is very important; in fact, dominant, to be honest, both as a source of hazards and as the potential victim of the associated risks. A lot of human-centric stuff going on here.

Task Description (T206) #2

As always, we’re going to think about the system configurations. We’re going to think about what we’re going to do with the system and the environment that we’re going to do it in. So, a familiar triad and I know I keep banging on about this, but this really is fundamental to bounding and therefore evaluating safety. We’ve got to know what the system is, what we’re doing with it, and the environment in which we’re doing it. Let’s move on.

Task Description (T206) #3

Again, Human Factors, regulatory requirements and contractually specified personnel requirements need to be thought of. Particularly for operating and support, we need to take into account the staffing and personnel concept that we have. It’s frighteningly easy to produce a system that needs so much maintenance, for example, or support activity that it is unaffordable. And lots and lots of military systems and, it must be said, government and commercial systems in the past have come in that required enormous amounts of support, which soon proved to be unaffordable or no one would sign up to the commitment required. So, lots of projects have simply died because the system was going to be too expensive to sustain. That’s a key point of what we’re doing with O&S here. It’s not just about health and safety. It’s about health and safety that is affordable.

We also need to look at unplanned events. So, not just designed-in things, but things introduced by, as it says, human errors. Again, I’m going to re-emphasize that it’s erroneous human action, because human error makes it sound like a human is at fault. Whereas very often it’s the design or the concept or the requirements that are at fault and place unacceptable burdens on the human being. Again, lots of messy systems seen in the past, which didn’t quite work and we just kind of expected the operator to cope. And most of the time they cope and then every so often they have a bad day at the office or a bunch of factors come together and lots of people die. And then we blame the human. Well, it’s not the human’s fault at all. We put them in that position. And as always, we need to look at past evaluations of related legacy systems and support operations. If you have good data about legacy systems or about similar systems that your organization or another organization has operated, then that’s gold dust. So, do make an effort to get hold of that information if you can. Maybe a trade association or some wider pan-organization body can help you there.

Task Description (T206) #4

At a minimum, we’ve got to identify activities involving known hazards. This assumes that we’ve done some hazard analysis in the past, which is very important. We always need to do that. I’ll come back to that in the commentary. Secondly, changes needed in requirements, be they functional requirements (what we want the system to do) or design requirements, if we put constraints on how the system may do it, for whatever it may be, hardware, software, support equipment, whatever, to make those hazards and risks more manageable. Requirements for safety features: so, requirements for engineered features and devices, equipment, because always, in almost any jurisdiction, we will have a hierarchy of control that recognizes that designed and engineered-in safety features are more effective than just relying on people to get it right. And then we’ve also got to communicate to people the hazards associated with the system. Warnings, cautions and whatever special emergency procedures might be required associated with the system. Again, that’s something that we see reinforced in law and regulations in many parts of the world. This is all good stuff. It’s accepted good practice all across the world.
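The first minimum item (clause 206.2.2 a in the text of the standard quoted further down this page) asks for activities involving known hazards, together with the time periods, approximate frequency, numbers of personnel involved and the actions required to minimize risk. As a rough sketch only, and not a format the standard prescribes, that could be held as a simple record with a completeness check. The class name `ActivityRecord`, its field names, the helper `missing_fields` and the sample entry are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ActivityRecord:
    """Hypothetical record for one activity, loosely following 206.2.2 a."""
    activity: str
    known_hazards: list[str] = field(default_factory=list)
    time_period: str = ""          # when the activity takes place
    approx_frequency: str = ""     # e.g. "twice a year"
    personnel_involved: int = 0    # number of people exposed
    risk_actions: list[str] = field(default_factory=list)


def missing_fields(record: ActivityRecord) -> list[str]:
    """Return the names of fields that are still empty or zero."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative sample entry; the actions to minimize risk are not yet filled in.
battery_swap = ActivityRecord(
    activity="Battery replacement",
    known_hazards=["Electric shock"],
    time_period="Scheduled servicing",
    approx_frequency="Twice a year",
    personnel_involved=2,
)

gaps = missing_fields(battery_swap)
print(gaps)
```

A check like this only shows that something has been written in each field; whether the content is adequate is still a matter for review.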

Task Description (T206) #5

Moving on, we also need to think about how we are going to move the system around, and the associated spares and supplies. How are we going to package them, handle them, store them, transport them? Particularly if there are hazardous materials, etc., involved. That’s the next part, G. Again, training requirements. We’re thinking about a human-centric approach. Whatever we expect people to do, they’ve got to be trained in how to do it. Point I, we’ve got to include everything, whether it’s developmental or non-developmental items. We can’t just ignore stuff because it’s GFE or it’s off the shelf. It doesn’t mean it can never go wrong. Far from it. Particularly if we are putting stuff together that’s never been put together before in a novel combination or in a novel environment. Something that might be perfectly safe and stable in an air-conditioned office might start to do odd things in a much more corrosive and uncontrolled environment, let’s say.

We need to think about the modes in which the system might be potentially hazardous when under operator control. Particularly, we might think about degraded modes of operation. So, for whatever reason, a part of the system has gone wrong or the system has got into an operating environment within which it doesn’t operate as well as it could. It’s not in an optimal operating environment or state. The human being in control of it, we’re assuming, has still got to be able to operate the system, even if it’s only to shut it down or to get it back into a safer state or safer environment. We’ve got to think about all of those nuances.

Then, because we’re talking about support as well, we need to think about related legacy systems, facilities and processes which may provide background information. Also, of course, the system presumably will very often be operating alongside other systems, or it will be supported by other systems that maybe exist already or are being procured separately. So, we’ve got to think about all those interactions as well and all those potential contributions. As you can see, this is quite a wide-ranging, broadly-scoped task.

Task Description (T206) #6

Finally, on this section, the customer, the end-user, or whoever may specify some specific analysis techniques. Very often they will not. So, whoever is doing the analysis, be they a contractor or a third-party outside agency, needs to make sure that whatever they propose to do is going to be acceptable to the program manager, in the sense that it is going to be compatible and relevant and useful. And then finally, the contractor has got to do some O&SHA at the appropriate time, but maybe more detailed data will come along later. In which case that needs to be incorporated, and also operational changes.

An absolute classic [situation] with military and non-military systems is this: the system gets designed, it goes into test and evaluation, and we discover that assumptions that were made during development don’t actually hold up. The real world isn’t like that, or whatever it might be, and we find we’re making changes in assumptions. Those need to be factored in, which, sadly, is often not done very well. So, that’s an important point to think about. What’s my change control mechanism, and how will the people doing the O&SHA find out about these changes? Because very often it’s easy to assume that everybody knows about this stuff, but when you start making assumptions, the truth is that it very often goes adrift.

Reporting (T206) #1

Let’s talk about reporting. Just a couple of slides here. In the reporting, there’s some fairly standard stuff in here. The physical and functional characteristics of the system: that’s important. Again, we might assume that everybody knows what they are, but it’s important to put them in. It may be that the people doing the analysis were given a different system description to the people developing the system, to the people doing the personnel planning, etc. All the different things that have to be brought together, we need to make sure that they join up again. It’s too easy to get that wrong. Reinforcing the point I made on the previous slide, as more detailed descriptions and specifications come in, they need to be supplied when they become available.

Hazard analysis methods and techniques. What techniques are we using? Give a description. If you’re doing it to a particular standard, so much the better. Great, that saves a lot of paper. What assumptions have we made? What data, both qualitative and quantitative, have we used to support the analysis? That all needs to be declared. By the way, one of the reasons it needs to be declared is that when things change (not if), that’s when these assumptions and the data and the techniques get exposed. So, if there are changes and we don’t have this kind of information declared, we can’t assess the impact of those changes. And it gets even more difficult to keep up with what’s going on.
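To make the point about declared assumptions concrete, here is a minimal Python sketch; it is not anything the standard prescribes. It assumes each hazard record lists the identifiers of the assumptions it relies on, so that when an assumption changes we can see which analyses need revisiting. The names `HazardRecord` and `affected_by`, the identifier scheme and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HazardRecord:
    """One analysed hazard and the declared assumptions it depends on."""
    hazard_id: str
    description: str
    assumption_ids: set[str] = field(default_factory=set)


def affected_by(changed: set[str], records: list[HazardRecord]) -> list[str]:
    """Return IDs of hazards whose analysis relies on any changed assumption."""
    return [r.hazard_id for r in records if r.assumption_ids & changed]


# Hypothetical sample data, for illustration only.
records = [
    HazardRecord("H-001", "Manual handling of spares", {"A-01", "A-03"}),
    HazardRecord("H-002", "Refuelling in a confined space", {"A-02"}),
    HazardRecord("H-003", "Degraded-mode shutdown", {"A-03"}),
]

# Suppose test and evaluation shows that assumption A-03 no longer holds.
to_review = affected_by({"A-03"}, records)
print(to_review)  # ['H-001', 'H-003']
```

Without the declared link from hazards to assumptions, the same question would mean re-reading every analysis by hand.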

Reporting (T206) #2

And then hazard analysis results. Again, the leading particulars of the results should be recorded in the hazard tracking system, the HTS, or hazard log, or risk register, whatever you want to call it. But there will be more detailed information that we wouldn’t want to clutter up the risk register with, and we also need to provide warnings, cautions and procedures to be included in maintenance manuals, training courses, operator manuals, etc. So, we’re probably going to generate an awful lot of data out of this task and that needs to be provided in a suitable format. Again, whoever is the program manager on the client side, or is the end-user representative, needs to think about this stuff quite early on.

Contracting

That leads us neatly on to contracting. Now, this task, in theory, can be specified a little bit down the track, after the program has started. In practice, what you find is that program managers try to specify everything upfront in a single contract for various reasons.

There are good reasons for doing that sometimes. Also, there are bad reasons but I’m not going to talk about that in this session. We’ll have a talk about planning your system safety program in another session. There’s a lot of nuances in there to be considered.

Contracting #1

Just sticking to this task, identification of functional disciplines: who do we need to get involved in order to do this work properly? It’s likely that the safety team, if you have one, may not have relevant operating experience or relevant sustainment experience for this kind of system. If they do, that’s fantastic, but that doesn’t negate the requirement to get the end-user represented and involved. In fact, that’s a near legal requirement in Australia, for example, and in some other jurisdictions. We need to get the end-users involved. We need the discipline specialists to get involved. Typically, your integrated logistic support team, your reliability people, your maintainability, and your testability people, if you have those disciplines. Or maybe you’re calling them something else, it doesn’t really matter.

We need to know what the reporting requirements are. What, if any, analysis methods and techniques do we desire to be used? Maybe the client or end-user has got to jump through some regulatory hoops and therefore they need specific analysis work and safety results to be done and produced. If that’s the case, then that needs to be specified in the contract. And what data is to be generated, and in what format? And how is it to be reported, and when, considering the hazard tracking system? And then the client may also select or specify known hazards, known hazardous areas, or other specific items to be examined or excluded, because maybe it’s being covered elsewhere or we don’t expect the contractor to be able to do this stuff. Maybe we need to use a specialist organization. Again, maybe a regulator has directed us to do so. So, all of these things need to be thought about when we’re putting together the contract requirements for Task 206.

Contracting #2

Again, I say this every time, we need to include all items within the scope of the system and the environment, not just developmental stuff. In fact, these days, maybe the majority of programs that I am seeing are mostly non-developmental. So, we’re taking lots of COTS stuff, GFE components and putting it all together. That’s all going to be included, particularly integration.

We need to think about legacy and related processes and the hazard analysis associated with them if we can get them. They should be supplied to whoever is doing the work and an analyst should be directed to review them and include lessons learned.

Then, reinforcing the previous point about the hazard tracking system: how will information reported in this task be correlated with tasks and analyses that are being done maybe elsewhere or by different teams? And the example here is Task 207, Health Hazard Analysis. I’ll talk a little bit about the linkages between the two later. But it’s quite likely in this sort of area there will be large groups of people thinking about operations and maintenance and support. Very often those groups are very different. Sometimes they don’t even talk to each other. That’s the culture in different organizations. You don’t see airline pilots hanging around with baggage handlers very much, do you, down the pub for whatever reason? Different sets of people, and they don’t always mix very much. And again, you may also have different specialist disciplines, especially the Human Factors people. Again, you’ve got to tie everything in there. So, there are going to be lots of interfaces in this kind of task that have got to be managed.

Point I – concept of operations. Yes, that’s in every task. You’ve got to understand what we intend to do with this system or what the end-user intends to do with the system in order to have some context for the analysis.

And then finally, what risk definitions and what risk matrix are we using? If we’re not using the standard 882 matrix, then what are we doing?
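As an illustration of what agreeing a risk matrix means in practice, here is a small lookup sketch. The severity and probability labels follow Mil-Std-882E, but the cell values are only assumed to mirror the standard’s risk assessment matrix and should be checked against the standard, or against your program’s tailored matrix, before use. The names `RISK_MATRIX` and `risk_level` are hypothetical.

```python
# Severity categories (1-4) and probability levels (A-F), labelled as in
# Mil-Std-882E. ASSUMPTION: the cell values below mirror the standard's
# risk assessment matrix; verify them before relying on this table.
SEVERITY = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Negligible"}
PROBABILITY = {"A": "Frequent", "B": "Probable", "C": "Occasional",
               "D": "Remote", "E": "Improbable", "F": "Eliminated"}

# Rows are probability levels; columns are severity categories 1 to 4.
RISK_MATRIX = {
    "A": ("High", "High", "Serious", "Medium"),
    "B": ("High", "High", "Serious", "Medium"),
    "C": ("High", "Serious", "Medium", "Low"),
    "D": ("Serious", "Medium", "Medium", "Low"),
    "E": ("Medium", "Medium", "Medium", "Low"),
}


def risk_level(probability: str, severity: int) -> str:
    """Look up the risk level for a probability level and severity category."""
    if probability not in PROBABILITY:
        raise ValueError(f"unknown probability level: {probability!r}")
    if severity not in SEVERITY:
        raise ValueError(f"unknown severity category: {severity!r}")
    if probability == "F":
        # An eliminated hazard has no residual risk level to look up.
        return "Eliminated"
    return RISK_MATRIX[probability][severity - 1]


print(risk_level("C", 2))
```

If the program uses a tailored matrix, only the table changes; the point is that everyone doing the analysis looks up the same one.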

Commentary

I’ve got four slides of commentary now – a number of things to say about Task 206.

Commentary #1

Now, I’ve picked an Australian example. So, Task 206 ties in very neatly with Australian WHS requirements. I suspect Australian WHS requirements have been strongly influenced by American OSHA and system safety practices. In Australia, we are heavily influenced by the US approach. Legal requirements in Australia, and in many other states and territories, let’s be honest, do tie in nicely with the standard. Although not always perfectly, you’ve got to remember that. So, we do need to focus on operations and support activities. That’s a big part of WHS, thinking about all relevant activities, cradle to grave, over the whole life of the system. We need to think about the working environment, the workplace. We need to think about humans as an integral part of the system, be they operators or maintainers, suppliers, or other kinds of sustainers. And we need to be providing relevant information on hazards, risks, warnings, training, procedures, requirements for PPE, and so on and so forth to workers.

So, Task 206 is going to be absolutely vital to achieving WHS compliance in Australia and compliance with health and safety legislation and regulations in many parts of the world: in the US and UK and, I would say, in virtually all developed nations. So, this is a very important task for achieving compliance with the law and regulations. It needs to get the requisite amount of attention, and it doesn’t always. So often on a program, during procurement, acquisition and development, the technical system is the sexy thing. That’s the thing that gets all the attention, especially early on. The operating and particularly the support side tends to get neglected because it’s not so sexy. We don’t buy a system to support it after all, do we? We buy a system to do a job. So, we get the operators in and we get their input on how to optimize the system to do the job most cost-effectively and with the most mission effectiveness that we can get out of it. We don’t often think about support effectiveness. But to achieve WHS compliance or the equivalent, this is a very important task, so we will almost always need to do it.

Commentary #2

The second item to think about: what is going to be key for the maintenance support side is a technique called Job Safety Analysis or Job Hazard Analysis. I’ve highlighted a couple of sources of information there. In particular, I would recommend going to the American www.OSHA.gov site and the guidance that they provide on how to do a job hazard analysis. So, use that, or if something different is specified in the jurisdiction you’re working in, then go ahead and use that. But if you don’t have any [guidance] on what to do, this will help you.

This is all about: I’ve got a task to do, whatever it might be, so how do I do it? Let’s analyse this step-by-step, or at least in reasonable-size chunks, thinking about how we do the tasks that need to be done. Now, there’s the operator side, and then, of course, we’re always dealing with human beings working on the system or working with the system. So, we’re going to be seeing potentially a lot of Human Factors type techniques being relevant. And there are lots of techniques that we can think about. Hierarchical Task Analysis and that kind of approach is going to fit in with the Job Hazard Analysis as well. Those are going to link together quite well. There will also be things like workload analysis. Particularly for the operators, if we’re asking the operator to do a lot and to maintain a particular level of concentration or respond rapidly, we need to think about workload, because both too much workload and too little workload can make things worse.
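The step-by-step idea can be sketched as a simple worksheet: break the job into steps, list the hazards at each step, and list the controls against them. This is a minimal illustration under those assumptions, not the OSHA guidance’s own format. The names `JobStep` and `uncontrolled_steps` and the wheel-change example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JobStep:
    """One step of a job, with the hazards found and controls chosen."""
    description: str
    hazards: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)


def uncontrolled_steps(steps: list[JobStep]) -> list[str]:
    """Return descriptions of steps that list hazards but no controls."""
    return [s.description for s in steps if s.hazards and not s.controls]


# Hypothetical maintenance job, invented for illustration.
wheel_change = [
    JobStep("Chock wheels and jack vehicle",
            hazards=["Vehicle slips off jack"],
            controls=["Use rated axle stands"]),
    JobStep("Remove wheel", hazards=["Manual handling strain"]),
    JobStep("Fit replacement and torque nuts"),
]

open_items = uncontrolled_steps(wheel_change)
print(open_items)  # ['Remove wheel']
```

A step with no hazards listed, like the last one here, is not flagged; whether that reflects a genuinely benign step or an incomplete analysis is a judgement for the analyst.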

There are lots of techniques out there, I’m not going to talk about Human Factors here. I’m going to be putting on a series on Human Factors techniques in cooperation with a specialist in that area. So, I’m not going to say more here.

For certain kinds of operators, let’s say pilots, people navigating a ship, drivers and so on, there will be well-established ways in which those kinds of operators are trained and in which they have to operate. There will often be a legal framework and a regulatory framework that says how they have to operate. And then that may direct a particular kind of analysis to be done or a particular approach to be taken for how operators do their jobs. But equally, there is a vast range of operator roles in industry, in chemical plants. Various specialist operating roles where there’s an industry-specific approach to doing things. Or indeed the general approach may be left up to whoever is developing the system. So, there’s a huge range of approaches here that are going to be largely dictated by the concept of operations and also an awareness of what is relevant law, regulation and good practice in a particular industry, in a particular situation. That’s where doing your Task 203, your System Requirements Hazard Analysis, really kicks in. It’s a very broad subject we’re covering here. You’ve got to get the specialists in to do it well.

Commentary #3

Now, I mentioned that these days we’re seeing more and more legacy and COTS systems being used and repurposed, partly to save time and money. We’re not developing mega systems as often as we used to, particularly in defence, but also in many other walks of life as well. So, we may find ourselves evaluating a system where very little technical hazard analysis has been done because there are no developmental items, and it’s even difficult to do analysis on a legacy or COTS system because we cannot get the data to do so. Perhaps we can’t get the data for commercial reasons or contractual reasons.

Or maybe we’ve got a legacy system that was developed in a different jurisdiction and whatever information is available with it just doesn’t fit the jurisdictional regulatory system that we’ve got to work in where we want to operate the system. This is very common. Australia, for example, [acquires] a lot of systems from abroad, which have not been developed in line with how we normally do things.

We could in theory just do Task 206 if there was no developmental hazard analysis to do, but that’s not quite true. At a minimum, we will always need to do a Preliminary Hazard List and a Preliminary Hazard Analysis – that’s Tasks 201 and 202 respectively. And we will very definitely need to do some System Requirements Hazard Analysis, Task 203, to understand what we need to do for a particular system in a particular application, operating environment, and regulatory jurisdiction. So, we’re always going to have to do those and we may well have to look at the integration of COTS things and do some system-level analysis. That’s Task 205, System Hazard Analysis. We’re definitely going to need to do the early analyses. In fact, the client and the end-user representatives should be doing 201, 202 and 203, and then we may be in a position to finish things off with 206 for certain systems.

Commentary #4

Now, having said that, I’ve mentioned already that Task 206 can be very broad in scope and very wide-ranging. There’s a danger that we will turn Task 206 into a bottomless pit into which we pour money and effort and time without end. So, for most systems, we cannot afford to just do O&SHA, blanket across the board without any discernment or any prioritization.

So, we need to look at those other hazard analyses and prioritize those areas where people could get hurt. Particularly, we should be using legacy and historical data here to say, “In reality, what does hurt people when looking after these systems or operating these systems?” Again, as I’ve said before, in many industries there is a standard industry approach or good practice for how certain systems are operated, maintained, and supported. So, if there is a standard industry approach available, particularly if we can justify it with available historical data, and if that [is as good] as doing analysis, then why not just use the standard approach? It’s going to be easier to make an SFARP or an ALARP argument that way anyway. And why spend the money on analysis when we don’t have to? We could just spend the money on actually making the system safer. So, let’s not do analysis for the sake of doing analysis.
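One simple way to let historical data drive the priorities is to count what has actually hurt people and start the analysis there. The sketch below assumes the legacy injury records can be reduced to a list of activity names, one entry per recorded injury; the function name `prioritize` and the sample data are hypothetical, and a real program would also weigh severity, not just counts.

```python
from collections import Counter


def prioritize(incident_activities: list[str], top_n: int) -> list[tuple[str, int]]:
    """Rank activities by number of recorded injuries, highest first."""
    return Counter(incident_activities).most_common(top_n)


# Hypothetical legacy data: one entry per recorded injury.
incidents = [
    "manual handling", "working at height", "manual handling",
    "refuelling", "manual handling", "working at height",
]

priorities = prioritize(incidents, 2)
print(priorities)  # [('manual handling', 3), ('working at height', 2)]
```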

Also, there’s a strong synergy between the later tasks in the 200 series. There’s a strong linkage between this Task 206 and 207, which is Health Hazard Analysis. There can also be a strong linkage with Task 210, which is the Environmental Hazard Analysis. So, this trio of tasks focuses on the impact on living things, whether they be human beings or animals and plants and ecosystems, and very often there’s a lot of overlap between them. For example, hazardous chemicals that are dangerous for humans are often dangerous for animals and plants and watercourses and so on and so forth. I’ll be talking about that more in the next session on Task 207.

One word of warning, however. Certainly, in Australia, we have got fixated on hazardous chemicals because we’ve had some very high-profile scandals involving HAZCHEM in the past. Now, there’s nothing wrong, of course, with learning from experience and applying rigorous standards when we know things have gone wrong in the past. But sometimes we go into a mindset of analysis for analysis’ sake. Dare I say, to cover people’s backsides rather than to do something useful. So, we need to focus on whether the presence of a HAZCHEM could be a problem: whether people get exposed to it, not just that it’s there.

Certain chemicals may be quite benign in certain circumstances, and they only become dangerous after an emergency, for example. There are lots of things in the system that are perfectly safe until the system catches fire. Then, when you’re trying to dispose of or repair a fire-damaged system, that can be very dangerous, for example. So, we need to be sensible about how we go about these things. Anyway, more on that in the next session.

Copyright Statement

That’s the commentary that I have on Task 206. As we said, it links very tightly with other things and we will talk about those in later sessions. I’d just like to point out that the “italic text in quotations” is from the Mil. standard. That is copyright free, as most American government standards are. However, this presentation and my commentary, etc. are copyright of the Safety Artisan 2020.

For More …

Now, for all lessons and resources, please do visit www.safetyartisan.com. As you’ll notice, it’s an https – it’s a secure website. Also, you can go and see the Safety Artisan page at www.patreon.com/SafetyArtisan.

End

So, that is the end of the lesson and it just remains for me to say thank you very much for your time and for listening. And I look forward to seeing you again soon. Cheers.

Mil-Std-882E Operating & Support Hazard Analysis

This is Mil-Std-882E Operating & Support Hazard Analysis (O&SHA).

The 200-series tasks fall into several natural groups. Task 206 addresses Operating & Support Analysis.

The text from the standard follows:

OPERATING AND SUPPORT HAZARD ANALYSIS

206.1 Purpose. Task 206 is to perform and document an Operating and Support Hazard Analysis (O&SHA) to identify and assess hazards introduced by operational and support activities and procedures; and to evaluate the adequacy of operational and support procedures, facilities, processes, and equipment used to mitigate risks associated with identified hazards.

206.2 Task description. The contractor shall perform and document an O&SHA that typically begins during Engineering and Manufacturing Development (EMD) and builds on system design hazard analyses. The O&SHA shall identify the requirements (or alternatives) needed to eliminate hazards or mitigate the associated risks for hazards that could not be eliminated. The human shall be considered an element of the total system, receiving both inputs and initiating outputs within the analysis.

206.2.1 The O&SHA considers the following:

a. Planned system configuration(s)

b. Facility/installation interfaces to the system

c. Planned operation and support environments

d. Supporting tools or other equipment

e. Operating and support procedures

f. Task sequence, concurrent task effects, and limitations

g. Human factors, regulatory, or contractually specified personnel requirements

h. Potential for unplanned events, including hazards introduced by human errors

i. Past evaluations of related legacy systems and their support operations

206.2.2 At a minimum, the analysis shall identify:

a. Activities involving known hazards; the time periods, approximate frequency, and numbers of personnel involved; and the actions required to minimize risk during these activities.

b. Changes needed in functional or design requirements for system hardware, software, facilities, tooling, or support/test equipment to eliminate hazards or mitigate the associated risks for hazards that could not be eliminated.

c. Requirements for engineered features, devices, and equipment to eliminate hazards or reduce risk.

d. Requirements for Personal Protective Equipment (PPE), to include its limitations.

e. Warnings, cautions, and special emergency procedures.

f. Requirements for packaging, handling, storage, and transportation to eliminate hazards or reduce risk.

g. Requirements for packaging, handling, storage, transportation, and disposal of Hazardous Materials (HAZMAT) and hazardous wastes.

h. Training requirements.

i. Effects of Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Government-Furnished Equipment (GFE) and Non-Developmental Item (NDI) hardware and software across interfaces with other system components or subsystems.

j. Potentially hazardous system modes under operator control.

k. Related legacy systems, facilities, and processes which may provide background information relevant to operating and supporting hazard analysis.

206.2.3 If no specific analysis techniques are directed or if the contractor recommends a different technique than the one specified by the Program Manager (PM), the contractor shall obtain PM approval of the technique(s) to be used before performing the analysis.

206.2.4 The contractor shall update the O&SHA following system design or operational changes as necessary.

206.2.5 The contractor shall document the results of the analysis to include the following information:

a. System description. This summary describes the physical and functional characteristics of the system and its subsystems. Reference to more detailed system and subsystem descriptions, including specifications and detailed review documentation, shall be supplied when such documentation is available.

b. Hazard analysis methods and techniques. Provide a description of each method and technique used in conduct of the analysis. Include a description of assumptions made for each analysis and the qualitative or quantitative data used.

c. Hazard analysis results. Contents and formats may vary according to the individual requirements of the program and methods and techniques used. As applicable, analysis results should be captured in the Hazard Tracking System (HTS). Ensure the results include a complete list of warnings, cautions, and procedures required in operating and maintenance manuals and for training courses.

206.3 Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:

a. Imposition of Task 206. (R)

b. Identification of functional discipline(s) to be addressed by this task. (R)

c. Minimum reporting requirements. (R)

d. Desired analysis methodologies and technique(s) and any special data elements, format, or data reporting requirements (consider Task 106, Hazard Tracking System).

e. Selected hazards, hazardous areas, or other specific items to be examined or excluded.

f. COTS, GOTS, NDI, and GFE technical data to enable the contractor to accomplish the defined task.

g. Legacy and related processes and equipment and associated hazard analyses to be reviewed.

h. How information reported in this task will be correlated with tasks and analyses that may provide related information, such as Task 207 (Health Hazard Analysis).

i. Concept of operations.

j. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.


Mil-Std-882E System Hazard Analysis (Task 205)

This is Mil-Std-882E System Hazard Analysis (SHA).

The 200-series tasks fall into several natural groups. Task 205 addresses System Hazard Analysis.

In the 45-minute video, The Safety Artisan looks at System Hazard Analysis, or SHA, which is Task 205 in Mil-Std-882E. We explore Task 205’s aim, description, scope and contracting requirements. We also provide value-adding commentary, which explains SHA – how to use it to complement Sub-System Hazard Analysis (SSHA, Task 204) in order to get the maximum benefits for your System Safety Program.

The text from the standard follows:

“SYSTEM HAZARD ANALYSIS

205.1 Purpose. Task 205 is to perform and document a System Hazard Analysis (SHA) to verify system compliance with requirements to eliminate hazards or reduce the associated risks; to identify previously unidentified hazards associated with the subsystem interfaces and faults; identify hazards associated with the integrated system design, including software and subsystem interfaces; and to recommend actions necessary to eliminate identified hazards or mitigate their associated risks.

[Task Description]

205.2 Task description. The contractor shall perform and document an SHA to identify hazards and mitigation measures in the integrated system design, including software and subsystem and human interfaces. This analysis shall include interfaces associated with Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Government-Furnished Equipment (GFE), Non-Developmental Items (NDI), and software. Areas to consider include performance, performance degradation, functional failures, timing errors, design errors or defects, and inadvertent functioning. While conducting this analysis, the human shall be considered a component within the system, receiving both inputs and initiating outputs.

205.2.1 This analysis shall include a review of subsystems interrelationships for:

a. Verification of system compliance with requirements to eliminate hazards or reduce the associated risks.

b. Identification of previously unidentified hazards associated with design of the system. Recommend actions necessary to eliminate these hazards or mitigate their associated risk.

c. Possible independent, dependent, and simultaneous events, including system failures, failures of safety devices, common cause failures, and system interactions that could create a hazard or result in an increase in risk.

d. Degradation of a subsystem or the total system.

e. Design changes that affect subsystems.

f. Effects of human errors.

g. Determination:

(1) Of potential contribution of hardware and software events (including those that are developed by other contractors/sources, COTS, GOTS, NDIs, and GFE hardware or software), faults, and occurrences (such as improper timing) on the potential for mishaps.

(2) Of whether design requirements in the system specifications have been satisfied.

(3) Of whether the methods of implementing the system design requirements and mitigation measures have introduced any new hazards.

205.2.2 If no specific analysis techniques are directed or if the contractor recommends a different technique than the one specified by the Program Manager (PM), the contractor shall obtain PM approval of techniques to be used before performing the analysis.

205.2.3 When software to be used within the system is being developed under a separate software development effort, the contractor performing the SHA shall monitor, obtain, and use the output of each phase of the formal software development process in evaluating the software contribution to the SHA. Hazards identified that require mitigation action by the software developer shall be reported to the PM in order to request appropriate direction be provided to the software developers.

205.2.4 The contractor shall evaluate system design changes, including software design changes, and update the SHA as necessary.

205.2.5 The contractor shall prepare a report that contains the results from the task described in paragraph 205.2 and includes:

a. System description. The system description provides the physical and functional characteristics of the system and its subsystem interfaces. Reference to more detailed system and subsystem descriptions, including specifications and detailed review documentation, shall be supplied when such documentation is available.

b. Hazard analysis methods and techniques. Provide a description of each method and technique used in conduct of the analysis. Include a description of assumptions made for each analysis and the qualitative or quantitative data used.

c. Hazard analysis results. Contents and formats may vary according to the individual requirements of the program and methods and techniques used. As applicable, analysis results should be captured in the Hazard Tracking System (HTS).

[Contracting]

205.3 Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:

a. Imposition of Task 205. (R)

b. Identification of functional discipline(s) to be addressed by this task. (R)

c. Desired analysis methodologies and technique(s) and any special data elements, format, or data reporting requirements (consider Task 106, Hazard Tracking System).

d. Selected hazards, hazardous areas, or other specific items to be examined or excluded.

e. COTS, GOTS, NDI, and GFE technical data to enable the contractor to accomplish the defined task.

f. Concept of operations.

g. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.


Transcript: System Hazard Analysis (T205)

Here is the full transcript: System Hazard Analysis.


Introduction

Hello, everyone, and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial safety training resources and videos. I’m Simon, your host, and I’m recording this on the 13th of April 2020. And given the circumstances when I record this, I hope this finds you all well.

System Hazard Analysis Task 205

Let’s get on to our topic for today, which is System Hazard Analysis. Now, system hazard analysis, as you may know, is Task 205 in the Mil. Standard 882E system safety standard.

Topics for this Session

What we’re going to cover in this session is purpose, task description, reporting, contracting and some commentary – although I’ll be making commentary all the way through. Going back to the top: with this and with task 204, I’m using the yellow highlighting to indicate differences between 205 and 204, because they are superficially quite similar. And then I’m using underlining to emphasize those things that I really want to bring to your attention. Within task 205, purpose. We’ve got four purposes for this one. Verify system compliance is the first, and recommend necessary actions is the fourth one there. And then in the middle of the sandwich, we’ve got identification of hazards, both between the subsystem interfaces and faults from the subsystem propagating upwards to the overall system, and identifying hazards in the integrated system design. So, quite a different emphasis to 204, which was really thinking about subsystems in isolation. We’ve got five slides of task description, a couple on reporting, one on contracting – nothing new there – and several commentaries.

System Hazard Analysis (T205)

Let’s get straight on with it. The purpose, as we’ve already said: there is a three-fold purpose here. Verify system compliance, hazard identification, and recommended actions. And then, as we can see in the yellow, identifying previously unidentified hazards is split into two: looking at subsystem interfaces and faults, and the integration of the overall system design. And you can see the yellow bit, that’s what is different from 204: here we are taking this much higher-level view, taking an inter-subsystem view and then an integrated view.

Task Description (T205) #1

On to the task description. The contractor has got to do it and document it, as usual, looking at hazards and mitigations, or controls, in the integrated system design, including software and human interfaces. That’s very important and we’ll come on to it later. All the usual stuff about how we’ve got to include COTS, GOTS, GFE and NDI. So, even if stuff is not being developed, if we’re putting together a jigsaw system from existing pieces, we’ve still got to look at the overall thing. And as with 204, we go down to the underlined text at the bottom of the slide, areas to consider. Think about performance and degradation of performance, functional failures, timing and design errors, defects, inadvertent functioning – that classic functional failure analysis that we’ve seen before. And again, while conducting this analysis, we’ve got to include human beings as an integral component of the system, receiving inputs and initiating outputs. Human factors were included in this standard from long ago.

Task Description (T205) #2

Slide two. We’ve got to include a review of subsystem interrelationships. The assumption is that we’ve previously done task 204 down at a low level and now we’re building up to task 205. Again, verification of system compliance with requirements (A.), identification of new hazards and emergent hazards, recommendations for actions (B.), but Part C is really the new bit. We are looking at possible independent, dependent, and simultaneous events (C.), including system failures, failures of safety devices, common cause failures, and system interactions that could create a hazard or increase risk. And this is really the new stuff in 205, and we are going to emphasize it in the commentary. You’re going to want to look very carefully at those underlined things because they are key to understanding task 205.

Task Description (T205) #3

Moving on to Slide 3, all new stuff, all in yellow. Degradation of a subsystem or the total system (D.), design changes that affect subsystems (E.). Now, I’ve underlined this because what’s the constant in projects? It’s change. You start off thinking you’re going to do something and maybe the concept changes subtly or not so subtly during the project. Maybe your assumptions change, the schedule changes, the resources available change. You thought you were going to get access to something, but it turns out that you’re not. So, all these things can change and cause problems, quite frankly, as I am sure we know. So, we need to deal with not just the program as we started out, but the program as it turns out to be – as it’s actually implemented. And that’s something I’ve seen often go awry because people hold on to what they started out with, partly because they’re frightened of change and also because of the work involved in really taking note of changes. And it takes a really disciplined program or project manager to push back on random change and to control it well, and then think through the implications. So, that’s where strength of leadership comes in, but it is difficult to do.

Moving on now. It says effects of human errors (F.); in the blue, I’ve changed that. Human error implies that the human is at fault, that the human made a mistake. But very often, we design suboptimal systems and we just expect the human operator to cope. Whether that’s fair or unfair, reasonable or unreasonable, it results in accidents. So, what we need to think about more generally is erroneous human action. So, something has gone wrong but it’s not necessarily the human’s fault. Maybe the system has induced the human to make an error. We need to think very carefully about that.

Moving on, determination (G.): the potential contribution of all those components in G.1. As we said before, all the non-developmental stuff. G.2, have design requirements in the specifications been satisfied? This standard emphasizes specifications and meeting requirements; we’ve discussed that in other lessons. And G.3, whether methods of system implementation have introduced any new hazards. Because of course, in the attempt to control hazards, we may introduce technology or plant or substances that themselves can create problems. So, we need to be wary of that.

Task Description (T205) #4

Moving on to slide four. Now, in 205.2.2, the assumption here is that the PM has specified methods to be used by the contractor. That’s not necessarily true; the PM may not be an expert in this stuff. Or they may, for contractual or whatever reasons, have decided that they want the contractor to decide what techniques to use. But the assumption here is that the PM has control, and if the contractor decides they want to do something different they’ve got to get the PM’s authority to do that. This is assuming, of course, that this has been specified in the contract.

And 205.2.3: whichever contractor is performing the system hazard analysis, the SHA, they are expected to have oversight of the software development that’s going to be part of their system. And again, that doesn’t happen unless it’s contracted. So, if you don’t ask for it, you’re not going to get it because it costs money. So, the ultimate client has to insist on this in the contract, and police it, to be fair, because it’s all very well asking for stuff. If you never check what you’re getting or what’s going on, you can’t be sure that it’s really happening. As the American Admiral Rickover once said, “You get the safety you inspect”. So, if you don’t inspect it, don’t expect to get anything in particular, or it’s an unknown. And again, if anything requires mitigation, the expectation in the standard is that it will be reported to the PM, the client PM that is, and that they will have authority. This is an assumption in the way that the standard works. If you’re not going to run your project like that, then you need to think through the implications of using this standard and manage accordingly.

Task Description (T205) #5

And the final slide on task description. We’ve got another reminder that the contractor performing the SHA shall evaluate design changes. Again, if the client doesn’t contract for this it won’t necessarily happen. Or indeed, if the client doesn’t communicate that things have changed to the contractor or the subcontractors don’t communicate with the prime contractor then this won’t happen. So, we need to put in place communication channels and insist that these things happen. Configuration control, and so forth, is a good tool for making sure that this happens.

Reporting (T205) #1

So, if we move on to reporting, we’ve got two slides on this. No surprises, the contractor shall prepare a report that contains the results from the analysis as described. First, part A: we’ve got to have a system description, including the physical and functional characteristics and subsystem interfaces. Again, always important: if we don’t have that system description, we don’t have the context to understand the hazard analysis that has been done, or not been done for whatever reason. And the expectation is that there will be reference to more detailed information as and when it becomes available. So maybe detailed design stuff isn’t going to emerge until later, but it has to be included. Again, this has got to be required.

Reporting (T205) #2

Moving on to parts B and C. Part B: as before, we need to provide a description of each analysis method used, the assumptions made, and the data used in that analysis. Again, if you don’t include this description, it’s very hard for anybody to independently verify that what has been done is correct, complete, and consistent. And without that assurance, that’s going to undermine the whole purpose of doing the analysis in the first place.

And then part C: we’ve got to provide the analysis results, and at the bottom of this subparagraph is the assumption. The analysis results could be captured in the hazard tracking system, say the hazard log, but I would only expect the leading particulars to be captured in that hazard log. And the detail is going to be in the task 205 hazard analysis report, or whatever you’re calling it. We’ve talked about that before, so I’m not going to get into that here.

Contracting

And then the final bit of quotation from the standard is contracting. And again, it’s all the same things that you’ve seen before. We need to require the task to be completed. It’s no good just saying apply Mil. Standard 882E because the contractor, if they understand 882E, will tailor it to suit themselves, not the client. Or if they don’t understand 882E they may not do it at all, or just do it badly. Or indeed they may just produce a bunch of reports that have got all the right headings in, as per the data item description, which is usually supplied in the contract, but there may be no useful data under those headings. So, you have to make it clear to the contractor that they need to conduct this analysis and then report on the results. I know it sounds obvious. I know this sounds silly having to say this, but I’ve seen it happen. You’ve got a contractor that does not understand what system safety is.

(Mind you, why have you contracted them to do this in the first place? You should know that; you should have done your research and found out.)

But if it’s new to them, you’re going to have to explain it to them in words of one syllable or get somebody else to do it for them. And in my day job, this is very often what consultancies get called in to do. You’ve got a contractor who maybe is expert at building tanks, or planes, or ships, or chemical plants, or whatever it might be, but they’re not expert in doing this kind of stuff. So, you bring in a specialist. And that’s part of my day job.

So, getting back to the subject. Yes, we’ve got to specify this stuff. We’ve got to specify it early, which implies that the client has done quite a lot of work to work this all out. And again, the client may, above the line, as we say, engage a consultant or whoever, a specialist, to help them with this. We’ve got to include all of the details that are necessary. And of course, how do you know what’s necessary unless you’ve worked it out? And you’ve got to supply the contractor with, it says, concept of operations, but really with as much relevant data and information as you can, without bogging them down. That context is important to getting good results and getting a successful program.

Illustration

I’ve got a little illustration here. The supposition in the standard in Task 205 is that we’ve got a number of subsystems, and there may be some other building blocks in there as well. And there’s some infrastructure, we’re probably going to have some users, we’re going to have an operating environment, and maybe some external systems that our system, or the system of interest, interfaces with or interacts with in some way. And that interaction might be deliberate, or the external system might just be in the same operating environment. And they will interact, intentionally or otherwise.

Commentary – Go Early

With that picture in mind, let’s think about some important points. And the first one is to get some 205 work done early. Now, the implication in the standard, by the numbering and when you read the text, is that subsystem hazard analysis comes first. You do those hexagonal building blocks first and then you build it up, and task 205 comes after the subsystem hazard analysis. You might think, “Well, you’ve already got the SSHAs for each subsystem and then you build the SHA on top”. However, if you don’t do 205 early, you’re going to lose an opportunity to influence the design and to improve your system requirements. So, it’s worth doing an initial pass of 205 first, top-down, before you do the 204 hexagons, and then come back up and redo 205. So, the first pass is done early to gain insight, to influence the design, to improve your requirements, and to improve, let’s say, the prime contractor’s appreciation and reporting of what they are doing. And really, dare I say, a quick and dirty stab at 205 could be quite cheap, and the payback, the return on investment, should be large if you do it early enough. And of course, act on the results.

And then the second pass is more about verifying compliance, verifying those interfaces as required, and looking at emergent stuff – the devil’s in the detail, as the saying goes. We can look at the stuff that’s coming out of that detail and then pull all that together, tidy it up, and look for emergent behaviour.

Commentary – Tools & Techniques

Looking at tools and techniques, most safety analysis techniques look at single events or single failures only, in isolation. And usually, we expect those events and failures to be independent. So, there’re lots of analyses out there: basic fault tree analysis, event tree analysis (well, event tree is slightly different in that we can think about subsequent [control] failures). But there’re lots of basic techniques out there that will really only deal with a single failure at a time. However, 205.2.1C requires us to go further. We’ve got to think about dependent and simultaneous events and common cause failures. And for a large and complex system, each of those can be a significant undertaking. So, if we’re doing task 205 well, we are going to push into these areas and not simply do a copy of task 204 at a higher level. We’re now really talking about the second pass of 205. The previous, quick and dirty, 205 is done. Task 204 on the subsystems is done. Now we’re pulling it all together.

Dependent Events

Let’s think about dependent and simultaneous events. First, dependent failures. Can an initial failure propagate? For example, a fire could lead to an explosion or an explosion could lead to a fire. That’s a classic combination. Or something breaks or wears: it could be as simple as components wearing, and then we get debris in the lubrication system. Could the debris from component wear clog up the lubrication system and cause it to fail, and then cause a more serious seizure of the overall system? Stuff like that. Or there may be more subtle functional effects. For example, electrical effects, if we get a failure in an electrical system, or even non-failure events that happen together.

Could we get what’s called a sneak circuit? Could we get a reverse flow of current that we’re not expecting? And could that cause unexpected effects? There’s a special technique for looking at that called sneak circuit analysis. That’s sneak, S-N-E-A-K, go look it up if you’re interested. Or could there be multiple effects from one failure? Now, I’ve already mentioned fire. It’s worth repeating. Fire is the absolute classic. First, the effects of fire. You’ve got the fire triangle. So, to get fire, we need a flammable substance (fuel), we need oxygen, and we need heat from an ignition source. And without all three, we don’t get a fire. But once we do get a fire, all bets are off, and we can get multiple effects. So, you might remember from being tortured doing thermodynamics in class the old equation that P1V1/T1 equals P2V2/T2. (And I’ve put R2 there for some reason, so sorry about that.)

What that’s saying is, your initial pressure times volume, divided by temperature, P1V1/T1, is going to be the same as your subsequent pressure times volume, divided by temperature, P2V2/T2. So, what that means is if you dramatically increase the temperature, say, because that’s what a fire does, then your volume and your pressure are going to change. So, in an enclosed space we get a great big increase in pressure, or if we’re in an unenclosed space, we’re going to get an increase in volume in a [gas or] fluid. So, if we start to heat the [gas or] fluid, it’s probably going to expand. And then that could cause a spill and further knock-on effects.
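To put a number on the enclosed-space case, here is a minimal sketch using the combined gas law for an ideal gas, P1V1/T1 = P2V2/T2, solved for the final pressure. The function name and the compartment figures are illustrative assumptions, and real fire scenarios (venting, combustion products, liquids) will not behave as an ideal gas in a perfectly sealed box.

```python
def final_pressure(p1_kpa, v1_m3, t1_k, v2_m3, t2_k):
    """Combined gas law, P1*V1/T1 == P2*V2/T2, solved for P2.

    Temperatures must be absolute (kelvin), not degrees Celsius.
    """
    if t1_k <= 0 or t2_k <= 0:
        raise ValueError("temperatures must be positive and in kelvin")
    return p1_kpa * v1_m3 * t2_k / (t1_k * v2_m3)


# Hypothetical sealed compartment: the volume is fixed, and a fire heats
# the gas from 293 K (about 20 C) to 879 K, three times the absolute temperature.
p2 = final_pressure(p1_kpa=100.0, v1_m3=1.0, t1_k=293.0, v2_m3=1.0, t2_k=879.0)
print(f"Pressure after heating: {p2:.0f} kPa")
```

With the volume held constant, tripling the absolute temperature triples the pressure, which is the "great big increase in pressure" in an enclosed space.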

Fire, as well as causing pressure and volume changes in fluids, can weaken structures, make smoke, and produce toxic gases. So, it can produce all kinds of secondary hazardous effects that are dangerous in themselves and can mess up your carefully orchestrated engineering and procedural controls. So, for example, if you’ve got a fire that causes a pressure burst, you can destroy structures and your fire containment can fail. You can’t necessarily send people in to fix the problem because the area is now full of smoke and toxic gas. So, fire is a great example of this kind of thing where you think, “Well, if this happens, then this really messes up a lot of controls and causes a lot of secondary effects”. So, there’s a good example, but not the only one.
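One simple way to ask "can an initial failure propagate?" is to write the suspected cause-and-effect links down as a graph and walk it. The sketch below is only an illustration: the event names and links are assumptions loosely based on the fire example, not the output of any real analysis, and a breadth-first walk says nothing about likelihood or timing.

```python
from collections import deque

# Hypothetical cause -> effect links (illustrative assumptions only).
LEADS_TO = {
    "fire": ["heat", "smoke", "toxic gas", "explosion"],
    "explosion": ["fire", "structural damage"],
    "heat": ["pressure rise", "weakened structure"],
    "pressure rise": ["pressure burst"],
    "pressure burst": ["structural damage"],
    "structural damage": ["fire containment fails"],
    "smoke": ["area inaccessible to responders"],
    "toxic gas": ["area inaccessible to responders"],
}


def knock_on_effects(initial_event, leads_to):
    """Return every event reachable from initial_event (breadth-first)."""
    seen = {initial_event}
    queue = deque([initial_event])
    while queue:
        current = queue.popleft()
        for effect in leads_to.get(current, ()):
            if effect not in seen:  # also stops fire <-> explosion looping
                seen.add(effect)
                queue.append(effect)
    seen.discard(initial_event)
    return seen


effects = knock_on_effects("fire", LEADS_TO)
print(sorted(effects))
```

Even on this toy graph, one initiating event reaches both an engineered control (fire containment) and a procedural one (sending people in), which is the point being made about secondary effects.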

Simultaneous Events

And then simultaneous events, a hugely different issue. What we’re talking about here is we have got undetected, or latent, failures. Something has failed, but it’s not apparent that it’s failed, we’re not aware, and that could be for all sorts of reasons. It could be a fatigue failure. We’ve got something that’s cracked, or it could be thermal fatigue. So, lots of things that can degrade physical systems, make them brittle. For example, an odd one, radiation causes most metals to expand and neutron bombardment makes them brittle. So, it can weaken things, structure and so forth. Or we might have a safety system that has failed, but because we’ve not called upon it in anger, we don’t notice. And then we have a failure, maybe the primary system fails. We expect the secondary system to kick in, but it doesn’t because there’s been some problem, or some knock-on effect has prevented the secondary system from kicking in. And I suspect we’ve all seen that happen.

My own experience of that was on a site I was working on. We had a big electricity failure; a contractor had sawed through the mains electricity cable or dug through it. And then, for some unknown reason, the emergency generators failed to kick in. So, that meant that a major site where thousands of people worked had to be evacuated because there was no electricity to run the computers. Even the old analogue phones failed after a while. Today, those phones would be digital, probably voice over IP, and without electricity, they’d fail instantly. And eventually, without power for the plumbing, the toilets back up. So, you’re going to end up having to evacuate the entire site because it’s unhygienic. So, some effects can be very widespread, just because you had a latent failure and your backup system didn’t kick in when you expected it to.

So how can we look at that? Well, this is classic reliability modelling territory. We can look at mean time between failures (MTBF) and mean time to repair (MTTR), and therefore we can work out what the exposure time might be. We can work out, “What’s the likelihood of a latent failure occurring?” If we’ve got an interval, presumably we’re going to test the system periodically. We’ve got to do a proof test. How often do we have to do the proof test to get a certain level of reliability or availability when we need the system to work? And we can look at synchronous and asynchronous events.
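As a rough sketch of the proof-test question, assume a standby system with a constant failure rate (1/MTBF), failures that stay hidden until the next proof test, and a test that restores the system to as good as new. Under those assumptions the chance of finding it failed at a random moment, averaged over a test interval T, is 1 - (1 - exp(-T/MTBF)) / (T/MTBF), which is close to T/(2 x MTBF) when T is much shorter than the MTBF. The function names and figures below are illustrative, not taken from the standard.

```python
import math


def average_unavailability(mtbf_hours, test_interval_hours):
    """Average probability that a latent failure is present, assuming a
    constant failure rate and a perfect proof test every interval."""
    x = test_interval_hours / mtbf_hours
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x.
    return 1.0 + math.expm1(-x) / x


def max_test_interval(mtbf_hours, target_unavailability):
    """Longest proof-test interval that meets the target, using the
    small-interval approximation: unavailability ~= T / (2 * MTBF)."""
    return 2.0 * target_unavailability * mtbf_hours


# Hypothetical emergency generator: MTBF of 100,000 hours, tested yearly.
yearly = average_unavailability(mtbf_hours=100_000, test_interval_hours=8_760)
interval = max_test_interval(mtbf_hours=100_000, target_unavailability=0.01)
print(f"Average unavailability with a yearly test: {yearly:.3f}")
print(f"Test interval for a 1% target: about {interval:.0f} hours")
```

The takeaway from the arithmetic is that the exposure to a latent failure scales with the test interval: test twice as often and, roughly, you halve the chance the backup is dead when you call on it.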

And to do that, we can use several techniques. The classic ones are Reliability Block Diagrams (RBD) and Fault Tree Analysis (FTA). Or if we’ve got repairable systems, we can use Markov chain modelling, which is very powerful. So, we can bring in time-dependent effects of systems failing at certain times and then being required, or systems failing and being repaired, and look at overall availability, so that we can get an estimate of how often the overall system will be available if we look at potential failures in all the redundant constituent parts. Lots of techniques there for doing that, some of them quite advanced. And again, very often this is what we safety consultants find ourselves doing.
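The simplest version of this can be sketched in a few lines. A repairable item modelled as a two-state Markov chain (up, down) with constant failure and repair rates has a steady-state availability of MTBF / (MTBF + MTTR), and a reliability block diagram combines items in series (all needed) or in parallel (any one is enough). The function names and the mains-plus-generator figures are illustrative assumptions, and the parallel formula assumes the items fail independently, which is exactly the assumption that common cause failures break.

```python
from math import prod


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time a repairable item is up
    (two-state Markov model with constant failure and repair rates)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities):
    """Block diagram in series: every item must be up."""
    return prod(availabilities)


def parallel(availabilities):
    """Block diagram in parallel: at least one item must be up.
    Assumes the items fail independently of each other."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical site power: mains supply backed up by a standby generator.
mains = steady_state_availability(mtbf_hours=990.0, mttr_hours=10.0)
generator = steady_state_availability(mtbf_hours=90.0, mttr_hours=10.0)
site_power = parallel([mains, generator])
print(f"mains {mains:.3f}, generator {generator:.3f}, combined {site_power:.4f}")
```

Real Markov models add more states (degraded, under test, awaiting repair) and time-dependent behaviour, but the structure is the same: rates in, availability out.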

Common Cause Failures

Common cause failure, this is another classic. We might think about something very obvious and physical. Maybe we’ve got three sets of input channels guarded by filters to stop debris getting into the system, but what if debris blocks all the filters so we get no flow? So, obvious – I say obvious – but often-missed sources of sometimes quite major accidents. Or let’s say something more subtle: we’ve got three redundant channels, or a number of redundant channels, in an electronic system and we need two out of three to work, or whatever it might be. But we’ve got the same software running in each channel. So, if the software fails systematically, as it does, then potentially all three channels will just fail at the same time.
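A common way to put rough numbers on this is the beta-factor model, in which a fraction (beta) of each channel's failure probability is assumed to come from a shared cause that takes out every channel at once. The sketch below uses a simplified form of that model for a two-out-of-three arrangement; the per-channel probability and the value of beta are illustrative assumptions, and systematic software faults do not really behave like random failures, so treat this as an indication of scale only.

```python
def two_out_of_three_failure(q):
    """Probability that at least 2 of 3 independent channels have failed,
    where q is the failure probability of a single channel."""
    return 3 * q**2 * (1 - q) + q**3


def with_common_cause(q, beta):
    """Simplified beta-factor model: a fraction beta of each channel's
    failure probability is assumed to fail all three channels together."""
    independent_part = two_out_of_three_failure((1 - beta) * q)
    common_part = beta * q
    return independent_part + common_part


q = 1e-3     # assumed per-channel failure probability (illustrative)
beta = 0.1   # assumed common cause fraction (illustrative)

on_paper = two_out_of_three_failure(q)
realistic = with_common_cause(q, beta)
print(f"independent channels only: {on_paper:.2e}")
print(f"with common cause:         {realistic:.2e}")
```

With these assumed figures the common cause term dominates and the system comes out tens of times worse than the independent calculation suggests, which is why identical software in every channel undermines the redundancy.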

So, there’s a good example of non-independent failures taking down a system that on paper has a very high reliability but actually doesn’t, once you start considering common cause failure or common mode analysis. So, really, what we would like is for all redundancy to be diverse if possible. So, for example, if we wanted to know how much fuel we had left in the aeroplane, which is quite important if you want the engines to keep working, then we can employ diverse methods. We can use sensors to measure how much fuel is in the tanks directly and then we can cross-check that against a calculated figure where we’ve entered, let’s say, how much fuel was in the tanks to start with, and then we’ve been measuring the flow of fuel throughout the flight. So, we can calculate or estimate the amount of fuel and then cross-check that against the actual measurements in the tanks. So, there’s a good diverse method. Now, it’s not always possible to engineer a diverse method, particularly in complex systems. Sometimes there’s only really one way of doing something. So, diversity kind of goes out of the window in such an engineered system.
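
The fuel cross-check can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, units, sample period and tolerance are all assumptions, and a real fuel quantity system is far more involved.

```python
def calculated_fuel_kg(initial_fuel_kg: float,
                       flow_samples_kg_per_s: list[float],
                       sample_period_s: float) -> float:
    """Estimate remaining fuel from the starting load and measured fuel flow."""
    burned_kg = sum(flow_samples_kg_per_s) * sample_period_s
    return initial_fuel_kg - burned_kg


def fuel_cross_check(gauged_fuel_kg: float,
                     calculated_kg: float,
                     tolerance_kg: float) -> bool:
    """True if the tank sensors and the calculated figure agree within tolerance."""
    return abs(gauged_fuel_kg - calculated_kg) <= tolerance_kg


# Illustrative flight: 5000 kg loaded, 1 kg/s burned for 600 one-second samples.
estimate = calculated_fuel_kg(5000.0, [1.0] * 600, sample_period_s=1.0)
print(estimate)
print(fuel_cross_check(gauged_fuel_kg=4380.0, calculated_kg=estimate,
                       tolerance_kg=50.0))
```

Note that a disagreement only says that something is wrong (a leak, a faulty sensor, a wrong starting figure). It does not say which source to believe, and that still has to be worked out.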

But maybe we can bring a human in.

So, another classic in the air world: we give pilots instruments in order to tell them what’s going on with the aeroplane, but we also suggest that they look out the window to look at reality and cross-check. Which is great if you’re not flying in cloud or in darkness, when there may be no visual references and so you can’t necessarily cross-check. But even with things like system failures, can the pilot look out the window and see which propeller has stopped turning? Or which engine the smoke and flames are coming out of? And that might sound basic and silly, but there have been lots of very major accidents where that hasn’t been done and the pilots have shut down the wrong engine or they’ve managed the wrong emergency. And not just pilots, but operators of nuclear power plants and all kinds of things. So, visual inspection, going and looking at stuff if you have time, or taking some diverse way of checking what’s going on, can be very helpful if you’re getting confusing results from instrument readings or sensor readings.

And those are examples of the terrific power of human diversity. Humans are good at taking different sensory inputs and fusing them together and forming a picture. Now, most of the time they fuse the data well and they get the correct picture, but sometimes they get confused by a system or they get contradictory inputs and they get the wrong mental model of what’s going on, and then you can have a really bad accident. So, thinking about how we alert humans, how we use alarms to get humans’ attention, and how we employ human factors to make sure that we give the humans the right input, the right mental picture, mental model, is very important. So, back to human factors again, which is especially important at this level, for Task 205.

And of course, there are many specialist common cause failure analysis techniques, so we can use fault trees. Normally in a fault tree when you’ve got an AND gate, we assume that those two sub-events are independent, but we can use ‘beta factors’ (as they’re called) to say, “Let’s say event A and event B are not independent, but we think that 50 percent or 10 percent of the time they will happen at the same time”. So, you can put that beta factor in to change the calculation. So, fault trees can cope with non-independent failures, providing you program the logic correctly and you understand what’s going on. And maybe if there’s uncertainty on the beta factors, you must do some sensitivity modelling on the tree with different beta factors. Or you run multiple models of the tree, but again, we’re now talking quantitative techniques with the fault tree, maybe, or semi-quantitative. We’re talking quite advanced techniques, where you would need a specialist who knows what they’re doing in this area to come up with realistic results from that sensitivity analysis. The other thing you need to do is, if the sensitivity analysis gives you an answer that you don’t want, you need to do something about that and not just file away the analysis report in a cupboard and pretend it never happened. (Not that that’s ever happened in real life, boys and girls, never, ever, ever. You see my nose getting longer? Sorry, let’s move on before I get sued.)
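
Here is a minimal sketch in Python of what a beta factor does to a two-input AND gate, with a simple sensitivity sweep. It assumes the usual beta-factor split, where a fraction beta of each input’s failure probability is treated as a shared common cause and the rest as independent. The function name and the numbers are illustrative assumptions, not values from any real analysis.

```python
def and_gate_with_beta(p: float, beta: float) -> float:
    """Probability that both inputs of an AND gate fail, with a beta factor.

    p    : failure probability of each input (assumed identical here)
    beta : fraction of p attributed to a shared common cause (0 to 1)
    """
    p_independent = (1.0 - beta) * p   # part that fails independently per input
    p_common = beta * p                # part that takes out both inputs at once
    both_independent = p_independent ** 2
    # Top event occurs if both fail independently OR the common cause occurs.
    return 1.0 - (1.0 - both_independent) * (1.0 - p_common)


# Sensitivity sweep over beta for an illustrative p of 1 in 100.
for beta in (0.0, 0.01, 0.1, 0.5):
    print(f"beta={beta:.2f}  P(top)={and_gate_with_beta(1e-2, beta):.3e}")
```

Running the sweep shows why the sensitivity analysis matters: with these numbers, even a modest beta makes the common cause term dominate the result, so the answer depends heavily on a figure that is usually uncertain.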

So, other classic techniques. Zonal hazard analysis looks at lots of different components in a compartment. If component A blows up, does it take out everything else in that compartment? Or if the compartment floods, what functionality do we lose in there? It’s particularly good for things like ships and planes, but also buildings with complex machinery, or big plant where you’ve got different stuff in different locations. There’s also something called particular risk analysis, and these tend to be very unusual things, where you think about what happens if a fan blade breaks in a jet engine. Can the jet engine contain the fan blade failure? And if not, you’ve got a very high-energy piece of metal flying off somewhere, so where does that go? Does that embed itself in the fuselage of the aeroplane? Does it puncture the pressure hull of the aeroplane? Or, as has sadly happened occasionally, does it penetrate and injure passengers? So, things like that, usually quite unusual things that are all very domain or industry specific. And then there are common mode analysis techniques, and a good example of a standard that incorporates those things is ARP 4761. This is a civil aircraft standard which looks at those things quite well. That’s one example; there are many others.

Summary

In summary, I’ve emphasized the differences between Task 205 and 204. So, we might do a first pass of 205 and 204 where we’re essentially doing the same thing, just at different levels of granularity. So, we might do the whole system initially in 205, one big hexagon, and then we might break down the jigsaw and do some 204 at a more detailed level. But where 205 is really going to score is in its differences from 204. So, it’s valuable to repeat that analysis at a higher level, but instead of just repeating, we’ve really got to diversify if we want success. So, we need to think about the different purpose and timing of these analyses. We need to think about what we’re going to get out of going top-down versus bottom-up, different sides of the ‘V’ model, let’s say.

We need to think about the differences between looking at internals and looking at external interfaces and interactions, and we need to think of appropriate techniques and tools for all those things. And, of course, whether we need to do that at all! We will have an idea about whether we need to do that from all the previous analysis. So, if we’ve done our PHL or PHA, we’ve looked at the history and some simple functional techniques, and we’ve involved end-users and we’ve learnt from experience. If we’ve done our early tasks, we’re going to get lots of clues about how much risk is present, both in terms of the magnitude of the risk and the complexity of the things that we’re dealing with.

So, clearly, if we’ve got a very complex thing with lots of risks where we could kill lots of people, we’re going to do a whole lot more analysis than for a simple low-risk system. And we’re going to be guided by the complexity and risks, and by where the hotspots are, and say, “Clearly, I’ve got a particular interface or particular subsystem which is a hotspot for risk. We’re going to concentrate our effort there”. If you haven’t done the early analysis, you don’t get those clues. So, you do the homework early, which is quite cheap, and that helps you direct effort to get the best return on investment.

The second major bullet point, which I talk about again and again, is that the client, the end-user and/or the prime contractor need to do analysis early in order to get the benefits, to help them set requirements for lower down the hierarchy, and to pass relevant information to the sub-contractors. Because the sub-contractors, if you leave them in isolation, will do a hazard analysis in isolation, which is usually not as helpful as it could be. You get more out of it if you give them more context. So really, the ultimate client, end-user, and probably the prime as well, both need to do this task, even if they’re subcontracting it to somebody else. Whereas, maybe the Sub-System Hazard Analysis, Task 204, could be delegated just down to the sub-system contractors and suppliers. If they know what they’re doing and they’ve got the data to do it, of course. And if they haven’t, then somebody further up the food chain, or the supply chain, may have to do that.

And lastly, Tasks 204 and 205 are complementary, but not the same. If you understand that and exploit those similarities and differences, you will get a much more powerful overall result. You’ll get synergy. You’ll get a win-win situation where the two different analyses complement and reinforce each other. And you’re going to get a lot more success, probably for not much more money, effort and time. If you’ve done that thinking exercise and really sought to exploit the two together, then you’re going to get a greater holistic result.

Copyright

So, that’s the end of our session for today. Just a reminder that I’ve quoted from the Mil. Standard 882, which is copyright free, but the contents of this presentation are copyright Safety Artisan, 2020.

For More …

And for more lessons and more resources, please do visit www.safetyartisan.com and you can see the videos at www.patreon.com/safetyartisan.

End

That’s the end of the lesson on System Hazard Analysis, Task 205. And it just remains for me to say thanks very much for watching, and look out for the next in the series of Mil. Standard 882 tasks. We will be moving on to Task 206, which is Operating and Support Hazard Analysis (OSHA), a quite different analysis to what we’ve just been talking about. Well, thanks very much for watching and it’s goodbye from me.
