In this full-length session, The Safety Artisan looks at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. We explore Task 206’s aim, description, scope, and contracting requirements. We also provide value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the pitfalls to avoid. We refer to other lessons for specific tools and techniques, such as Human Factors analysis methods.
Operating & Support Hazard Analysis: Topics
- Task 206 Purpose:
- To identify and assess hazards introduced by O&S activities and procedures;
- To evaluate the adequacy of O&S procedures, facilities, processes, and equipment used to mitigate risks associated with identified hazards.
- Task Description (six slides);
- Reporting (two slides);
- Contracting (two slides); and
- Commentary (four slides).
Operating & Support Hazard Analysis: Transcript
Click here for the Transcript
Hello everyone and welcome to the Safety Artisan; home of safety engineering training. I’m Simon and today we’re going to be carrying on with our series on Mil. Standard 882E system safety engineering.
Operating & Support Hazard Analysis
Today, we’re going to be moving on to the subject of operating and support hazard analysis. This is, as it says, task 206 under the standard. Operating and support hazard analysis, I’ll just call it O&S or OSHA (also O&SHA) for short. Unfortunately, that will confuse people if I call OSHA. Let’s call it O&S.
Topics for this Session
The purpose of O&S hazard analysis is to identify and assess hazards introduced by those activities and procedures and also to evaluate the adequacy of O&S procedures, processes, equipment, facilities, etc, to mitigate risks that have been already identified. A twofold task but a very big task. And as we’ll see, we’ve got lots of slides today on task description, and reporting, contracting, and commentary. As always, I present the full text as is of the task, which is copyright free, but I’m only going to talk about the things that are important. So, we’re not going to go through every little clause of the standard that would be pointless.
O&S Hazard Analysis (T206)
Let’s get started with the purpose. As we’ve already said, it’s to identify and assess those hazards which are introduced by operational and support activities and procedures and evaluate their adequacy. So, we’re looking at operating the system, whatever it may be- And of course, this is a military standard, so we assume a military system, but not all military systems are weapon systems by any means. Not all are physical systems. So, there may be inventory management systems, management information systems, all kinds of stuff. So, does operating those systems and just supporting them (maintaining them are resupplying them, disposing of them, etc.,) does that create any hazards or introduce any hazards? And how do we mitigate? That’s the purpose of the task.
Task Description (T206) #1
Let’s move on to the task description. Again, we’re assuming a contractor is performing the analysis, but that’s not necessarily the case. For this task, this actually says this typically begins during engineering and manufacturing development, or EMD. So, we’re assuming an American style lifecycle for a big system and EMD comes after concept and requirements development. So, we are beginning to move into the very expensive stage of development for a system where we begin to commit serious money. It’s suggesting that O&SHA can wait until then which is fine in general unless you’ve identified any particularly novel hazards that will need to be dealt with earlier on. As it says, it should build on design hazard analyses, but we’ll also talk about the case later on when there is no design hazard analyses. And the O&SHA shall identify requirements or alternatives or eliminating hazards, mitigating risks, etc. This is one of those tasks where the human is very important – In fact, dominant to be honest. Both as a source of hazards and the potential victim of the associated risks. A lot of human-centric stuff going on here.
Task Description (T206) #2
As always, we’re going to think about the system configurations. We’re going to think about what we’re going to do with the system and the environment that we’re going to do it in. So, a familiar triad and I know I keep banging on about this, but this really is fundamental to bounding and therefore evaluating safety. We’ve got to know what the system is, what we’re doing with it, and the environment in which we’re doing it. Let’s move on.
Task Description (T206) #3
Again, Human Factors, regulatory requirements, and particularly specified personnel requirements need to be thought of. Particularly for operating and support, we need to take into account the staffing and personnel concept that we have. It’s frighteningly easy to produce a system that needs so much maintenance, for example, or support activity that it is unaffordable. And lots and lots of military systems and, it must be said, government and commercial systems in the past have come in that required enormous amounts of support, which soon proved to be unaffordable or no one would sign up to the commitment required. So, lots of projects have simply died because the system was going to be too expensive to sustain. That’s a key point of what we’re doing with O&S here. It’s not just about health and safety. It’s about health and safety, which is affordable.
We also need to look at unplanned events. So, not just designed in things, but things introduced- It says human errors. Again, I’m going to re-emphasize it’s erroneous human action because human error makes it sound like a human is at fault. Whereas very often it’s the design or the concept or the requirements that are at fault and place unacceptable burdens on the human being. Again, lots of messy systems seen in the past, which didn’t quite work and we just kind of expected the operator to cope. And most of the time they cope and then every so often they have a bad day at the office or a bunch of factors come together and lots of people die. And then we blame the human. Well, it’s not the human’s fault at all. We put them in that position. And as always, we need to look at past- Past evaluations of related legacy systems and support operations. If you have good data about legacy systems or about similar systems that your organization or another organization has operated, then that’s gold dust. So, do make an effort to get hold of that information if you can. Maybe a trade association or some wider pan organization body can help you there.
Task Description (T206) #4
At a minimum, we’ve got to identify activities involving known hazards. This assumes that we’ve done some hazard analysis in the past, which is very important. We always need to do that. I’ll come back to that commentary. Secondly, changes needed in requirements, be they functional requirements – what we want the system to do. Or design requirements, if we put constraints on how the system may do it for whatever it may be, hardware, software, support equipment, whatever to make those hazards and risks more manageable. Requirements for safety features – so requirements for engineered features and devices, equipment, because always, in almost any jurisdiction, we will have a hierarchy of control that recognizes that designed and engineered in safety features are more effective than just relying on people to get it right. And then we’ve also got to communicate to people the hazards associated with the system. Warnings, cautions, and whatever special emergency procedures might be required associated with the system. Again, that’s something that we see reinforced in law and regulations in many parts of the world. This is all good stuff. It’s accepted good practice all across the world.
Task Description (T206) #5
Moving on, we also need to think about how are we going to move the system around and the associated spares and supplies? How are we going to package them, handle them, stole them, transport them? Particularly if there are hazardous materials, etc, etc, involved. That’s the next part, G. Again, training requirements. We’re thinking about a human-centric approach. Whatever we expect people to do, they’ve got to be trained in how to do it. Point I, we’ve got to include everything, whether it’s developmental or non-developmental terms. We can’t just ignore stuff because it’s GFE or it’s off the shelf. It doesn’t mean it can never go wrong. Far from it. Particularly if we are putting stuff together that’s never been put together before in a novel combination or in a novel environment. Something that might be perfectly safe and stable in an air-conditioned office might start to do odd things in a much more corrosive and uncontrolled environment, let’s say.
We need to think about what modes might the system be potentially hazardous when under operative control. Particularly, we might think about degraded modes of operation. So, for whatever reason, a part of the system has gone wrong or the system has got into an operating environment within which it doesn’t operate as well as it could. It’s not in an optimal operating environment or state. The human being in control of it, we’re assuming, has still got to be able to operate the system, even if it’s only to shut it down or to get it back into a safer state or safer environment. We’ve got to think about all of those nuances.
Then because we’re talking about support as well, we need to think about a related legacy systems, facilities and processes which may provide background information. Also, of course, the system presumably will very often be operating alongside other systems or it will be supported by all systems maybe that exist or being procured separately. So, we’ve got to think about all those interactions as well and all those potential contributions. As you can see, this is quite a wide-ranging broadly scoped task.
Task Description (T206) #6
Finally, on this section, the customer/the end-user/or whoever may specify some specific analysis techniques. Very often they will not. So, whoever is doing the analysis, be they a contractor or third party outside agency, needs to make sure that whatever they propose to do is going to be acceptable to the program manager. In the sense that it is going to be compatible and relevant and useful. And then finally, the contractor has got to do some O&SHA at the appropriate time but maybe more detailed data will come along later. In which case that needs to be incorporated and also operational changes.
An absolute classic [situation] with military and non-military systems is; the system gets designed, it goes into test and evaluation and we discover that things- assumptions that were made during development- don’t actually hold up. The real world isn’t like that or whatever it might be and we find we’re making changes- making changes in assumptions. Those need to be factored in which, sadly, is often not done very well. So, that’s an important point to think about. What’s my change control mechanism and how will the people doing the and O&SHA find out about these changes? Because very often it’s easy to assume that everybody knows about this stuff but when you start making assumptions, the truth is that it very often goes adrift.
Reporting (T206) #1
Let’s talk about reporting- Just a couple of slides here. In the reporting, there’s some fairly standard stuff in here, the physical and functional characteristics of the system- that’s important. Again, we might assume that everybody knows what they are, but it’s important to put them in. It may be that the people doing the analysis were given a different system description to the people developing the system, to the people doing the personnel planning, etc. All the different things that have to be brought together, we need to make sure that they join up again. It’s too easy to get that wrong. Reinforcing the point I made on the previous slide, as more detailed descriptions and specifications come in that needs to be supplied when it becomes available and provided.
Hazard analysis methods and techniques. What techniques are we using? Give a description. If you’re doing it to a particular standard, so much the better. Great- that saves a lot of paper. What assumptions that we made? What data, both qualitative and quantitative have we used to support analysis? That all needs to be declared. By the way, one of the reasons is to be declared is that when things change- not if- that’s when these assumptions and the data and the techniques get exposed. So, if there are changes, if we don’t have this kind of information declared, we can’t assess the impact changes. And it gets even more difficult to keep up with what’s going on.
Reporting (T206) #2
And then hazard analysis results. Again, the leading particulars of the results should be recorded in the hazard tracking system, the HTS, or hazard log, or risk register- whatever you want to call it. But there will be more detailed information that we wouldn’t want to clutter up the risk register with and we also need to provide warnings, cautions, and procedures to be included in maintenance manuals, training courses, operator manuals, etc. So, we’re going to or we’re probably going to generate an awful lot of data out of this task and that needs to be provided in a suitable format. Again, whoever the program manager on the client-side, or is the end-user representation, needs to think about this stuff quite early on.
That leads us neatly on to contracting. Now, this task, in theory, can be specified a little bit down the track, after the program started. In practice, what you find is program managers tried to specify everything up front in a single contract for various reasons.
There are good reasons for doing that sometimes. Also, there are bad reasons but I’m not going to talk about this session. We’ll have a talk about planning your system safety program in another session. There’s a lot of nuances in there to be considered.
Just sticking to this task, identification of functional disciplines – who do we need to get involved in order to do this work properly? It’s likely that the safety team if you have one, may not have relevant operating experience or relevant sustainment experience for this kind of system. If they do, that’s fantastic but that doesn’t negate the read the requirement to get the end-user represented and involved. In fact, that’s a near legal requirement in Australia, for example, and in some other jurisdictions. We need to get the end-users involved. We need the discipline specialist to get involved. Typically, your integrated logistic support team, your reliability people, your maintainability, and your testability people, if you have those disciplines. Or maybe you’re calling them something else, it doesn’t really matter.
We need to know what are the reporting requirements. What, if any, analysis methods and techniques do we desire to be used? Maybe the client or end-user has got to jump through some regulatory hoops and therefore they need specific analysis work and safety results to be done and produced. If that’s the case, then that needs to be specified in the contract. And what data is to be generated in what format? And how is it to be reported on when, etc? Considering the hazard tracking system, etc? And then the client may also select or specify known hazards, known hazardous areas, or other specific items to be examined or excluded because maybe it’s being covered elsewhere or we don’t expect the contractor to be able to do this stuff. Maybe we need to use a specialist organization. Again, maybe a regulator has directed us to do so. So, all of these things need to be thought about when we’re putting together the contract requirements for task 206.
Again, I say this every time, we need to include all items within the scope of the system and the environment, not just developmental stuff. In fact, these days, maybe the majority of programs that I am seeing are mostly non-developmental. So, we’re taking lots of COTS stuff, GFE components, and putting it all together. That’s all going to be included, particularly integration.
We need to think about legacy and related processes and the hazard analysis associated with them if we can get them. They should be supplied to whoever is doing the work and an analyst should be directed to review them and include lessons learned.
Then, reinforcing the previous point that has a tracking system- How will information reported in this task be correlated with tasks and analyses that are being done maybe elsewhere or by different teams? And the example here is 207 health hazard analysis. I’ll talk a little bit about the linkages between the two later. But it’s quite likely in this sort of area there will be large groups of people thinking about operations and maintenance and support. Very often those groups are very different. Sometimes they don’t even talk to each other. That’s the culture in different organizations. You don’t see airline pilots hanging around with baggage handlers very much, do you, down the pub for whatever reason? Different set of people- they don’t always mix very much. And again, you may also have different specialist disciplines, especially the Human Factors people. Again, you’ve got to tie everything in there. So, there’s going to be lots of interfaces in this kind of task that they’ve got to be managed.
Point I – concept of operations. Yes, that’s in every task. You’ve got to understand what we intend to do with this system or what the end-user intends to do with the system in order to have some context for the analysis.
And then finally, what risk definitions and what risk matrix are we using? If we’re not using the standard 882 matrix, then what are we doing?
I’ve got four slides of commentary now – a number of things to say about Task 206.
Now, I’ve picked an Australian example. So, Task 206 ties in very neatly with Australian WHS requirements. I suspect Australian WHS requirements have been strongly influenced by American OSHA and system safety practices. In Australia, we are heavily influenced by the US approach. This standard and legal requirements in Australia, and in many other states and territories let’s be honest, do tie in nicely with the standard. Although not always perfectly, you’ve got to remember that. So, we do need to focus on operations and support activities. That’s a big part of WHS, thinking about all relevant activities and cradle to grave – the whole life of the system. We need to think about the working environment, the workplace. We need to think about humans as an integral part of the system, be they operators or maintainers, suppliers, other kinds of sustainers. And we need to be providing relevant information on hazards, risks, warnings, trainings, and procedures, and requirements for PPE, and so on and so forth to workers.
So, task 206 is going to be absolutely vital to achieving WHS compliance in Australia and compliance with health and safety legislation and regulations in many parts of the world. In the US and UK and I would say in virtually all developed nations. So, this is a very important task for achieving compliance with the law and regulations. It needs to get the requisite amount of attention- It doesn’t always. People so often on a program during procurement and acquisition development, the technical system is the sexy thing. That’s the thing that gets all the attention, especially early on. The operating and particularly the support side tends to get neglected because it’s not so sexy. We don’t buy a system to support it after all do we? We buy a system to do a job. So, we get the operators in and we get their input on how to optimize the system to do the job most cost-effectively and with most mission effectiveness that we can get out of it. We don’t often think about support effectiveness. But to achieve WHS compliance or the equivalent this is a very important task so we will almost always need to do it.
The second item to think about – what is going to be key for the maintenance support side is a technique called Job Safety Analysis or Job Hazard Analysis. I’ve highlighted a couple of sources of information there, particularly I would recommend going to the American www.OSHA.gov site and the guidance that they provide on how to do a job hazard analysis. So, use that or use something else if something different is specified in the jurisdiction you’re working it, then go ahead and use that. But if you don’t have any [guidance] on what to do, this will help you.
This is all about – I’ve got a task to do, whatever it might be doing, how do I do it? Let’s analyse this step-by-step, or at least in reasonable size chunks, thinking about how we do the tasks that need to be done. Now, there’s the operator side, and then, of course, we’re always dealing with human beings working on the system or working with the system. So, we’re going to be seeing potentially a lot of Human Factors type techniques being relevant. And there are lots of tasks that we can think about, Hierarchical Task Analysis and that kind of approach is going to fit in with the Job Hazard Analysis as well. Those are going to link together quite well. There will also be things like workload analysis. Particularly for the operators, if we’re asking the operator to do a lot and to maintain a particular level of concentration or respond rapidly, we need to think about workload and too much workload and too little workload can make things worse.
There are lots of techniques out there, I’m not going to talk about Human Factors here. I’m going to be putting on a series on Human Factors techniques in cooperation with a specialist in that area. So, I’m not going to say more here.
For certain kinds of operators, let’s say, pilots, people navigating a ship and so on, drivers, there will be well-established ways that those operators are trained the way they have to operate. There will often be a legal framework and a regulatory framework that says how they have to operate. And then that may direct a particular kind of analysis to be done or a particular approach to be taken for how operators do their jobs. But equally, there is a vast range of operator roles in industry, in chemical plants. Various specialist operating roles where there’s an industry-specific approach to doing things. Or indeed the general approach may be left up to whoever is developing the system. So, there’s a huge range of approaches here that are going to be largely dictated by the concept of operations and also an awareness of what is relevant law, regulation, and good practice in a particular industry, in a particular situation. That’s where doing your Task 203, your safety requirements analysis really kicks in. It’s a very broad subject we’re covering here. You’ve got to get the specialist in to do it well.
Now, I mention that these days we’re seeing more and more legacy and COTS systems being used and repurposed. Partly to save time and money. We’re not developing mega systems as often as we used to, particularly in defence, but also in many other walks of life as well. So, we may find ourselves evaluating a system where very little technical hazard analysis has been done because there are no developmental items and it’s even difficult to do analysis on legacy or a COTS system because we cannot get the data to do so. Perhaps we can’t get the data for commercial reasons, contractual reasons.
Or maybe we’ve got a legacy system that was developed in a different jurisdiction and whatever information is available with it just doesn’t fit the jurisdictional regulatory system that we’ve got to work in where we want to operate the system. This is very common. Australia, for example, [acquires] a lot of systems from abroad, which have not been developed in line with how we normally do things.
We could in theory just do Task 206 if there was no developmental hazard analysis to do but that’s not quite true. At a minimum, we will always need to do some Preliminary Hazard Listing and hazard analysis – that’s Tasks 201 and 202 respectively. And we will very definitely need to do some System Requirements Hazard Analysis, Task 203, to understand what we need to do for a particular system in a particular application, operating environment, and regulatory jurisdiction. So, we’re always going to have to do those and we may well have to look at the integration of COTS things and do some system-level analysis. That’s 204. We’re definitely going to need to do the early analyses. In fact, the client and the end-user representatives should be doing 201, 202 and 203 and then we may be in a position to finish things off with 206 for certain systems.
Now, having said that, I’ve mentioned already that Task 206 can be very broad in scope and very wide-ranging. There’s a danger that we will turn Task 206 into a bottomless pit into which we pour money and effort and time without end. So, for most systems, we cannot afford to just do O&SHA across the board without any discernment or any prioritization.
So, we need to look at those other hazard analyses and prioritize those areas where people could get hurt. Particularly we should be using legacy and historical data here to say “What does – in reality, what does hurt people when looking after these systems or operating systems?” Again, as I’ve said before, in many industries there is a standard industry approach or good practice to how certain systems are operated, and maintained, and supported. So, if there is a standard industry approach available – particularly if we can justify that by available historical data – if that [is as good] as doing analysis, then why not just use the standard approach? It’s going to be easier to make a SFARP or a ALARP argument that way anyway. And why spend the money on analysis when we don’t have to? We could just spend the money on actually making the system safer. So, let’s not do analysis for the sake of doing analysis.
Also, there’s a strong synergy between the later tasks in the 200 series. There’s a strong linkage between this Task 206 and 207, which is Health Hazard Analysis. Also, there can be a strong linkage between Task 210, which is the Environmental Hazard Analysis. So, this trio of tasks focuses on the impact on living things, whether they be human beings or animals and plants and ecosystems and very often there’s a lot of overlap between them. For example, hazardous chemicals that are dangerous for humans are often dangerous for animals and plants and watercourses and so on and so forth. I’ll be talking about that more in the next session on Task 207.
One word of warning, however. Certainly, in Australia, we have got fixated on hazardous chemicals because we’ve had some very high-profile scandals involving HAZCHEM in the past. Now, there’s nothing wrong, of course, with learning from experience and applying rigorous standards when we know things have gone wrong in the past. But sometimes we go into a mindset of analysis for analysis sake. Dare I say, to cover people’s backsides rather than to do something useful. So, we need to focus on whether the presence of a HAZCHEM could be a problem. Whether people get exposed to it, not just that it’s there.
Certain chemicals may be quite benign in certain circumstances, and they only become dangerous after an emergency, for example. There are lots of things in the system that are perfectly safe until the system catches fire. Then when you’re trying to dispose or repair a fire damage system that can be very dangerous, for example. So, we need to be sensible about how we go about these things. Anyway, more on that in the next session.
That’s the commentary that I have on Task 206. As we said, it links very tightly with other things and we will talk about those in later sessions. I just like to point out that the “italic text in quotations” is from the Mil. standard. That is copyright free as most American government standards are. However, this presentation and my commentary, etc. are copyright of the Safety Artisan 2020.
For More …
Now, for all lessons and resources, please do visit the www.safetyartisan.com. Now, as you’ll notice, it’s an https – it’s a secure website.
End: Operating & Support Hazard Analysis
So, that is the end of the lesson and it just remains for me to say thank you very much for your time and for listening. And I look forward to seeing you again soon. Cheers.