Transcript: System Requirements Hazard Analysis (T203)

Here is the full transcript: System Requirements Hazard Analysis.

Introduction

Hello and welcome to the Safety Artisan, where you will find professional, pragmatic and impartial advice on all things system safety and related.

System Requirements Hazard Analysis

And so today, which is the 1st of March 2020, we’re going to be talking about – let me just find it for you – System Requirements Hazard Analysis. This is part of our series on Mil. Standard 882E (882 Echo), and this one is Task 203 in the Mil. Standard. It’s a very widely used system safety engineering standard, and its influence is found in many places, not just on military procurement programs.

Topics for this Session

We’re going to look at this task, which is very important, possibly the most important task of all, as we’ll see. So, we’re going to talk about the purpose of the task, which is word for word from the task description itself. We’re going to talk about the three aims of this task in the task description, which are to determine or work out requirements, incorporate them, and then assess the compliance of the system with those requirements, because, of course, it may not be a simple read-across. We’ve got six slides on that. That’s most of the task. Then we’ve just got one slide on contracting, which, if you’ve seen any of the others in this series, will seem very familiar. We’ve got a little bit of a chat about Section 4.2 from the standard and some commentary, and the reason for that will become clear. So, let’s crack on.

Purpose of SRHA

Task 203.1: the purpose of Task 203 is to perform and document a System Requirements Hazard Analysis, or SRHA. And as we’ve already said, the purpose of this is to determine the design requirements. We’re going to focus on design rather than buying stuff off the shelf – we’ll talk about the implications of that a little bit later. So: determine design requirements to eliminate or reduce hazards and risks; incorporate those requirements into, it says, the documentation, but what it should say is incorporate risk reduction measures into the system itself and then document it; and then finally, assess compliance of the system with these requirements. Then it says the SRHA addresses all life-cycle phases, so you’re not just meant to think about certain phases of the program. What are the requirements through life for the system? And in all modes: whether it’s in operation, whether it’s in maintenance or refit, whether it’s being repaired or disposed of, whatever it might be.

Task Description #1

First of six slides on the task description. I’m using more than one colour because there are quite a lot of important points packed quite tightly together in this description. We’re assuming that the contractor performs and documents this SRHA. The customer needs to do a lot of work here before this ever gets near a contractor. More on that later. We need to determine system design requirements to eliminate hazards or reduce associated risks.

Two things here. By identifying applicable policies, regulations and standards etc. More on that later. And analysing identified hazards. So, requirements to perform the analysis as well as to simply just state ‘We want a system to do this and not to do that’. So, we need to put in some requirements to say ‘Here’s what we want to be analysed, and to what degree.’ And saying why is always helpful.

Task Description #2

Breaking those two requirements down.

Part A. We’re going to identify applicable requirements by reviewing our military and industry standards and specs, and historical documentation of systems that are similar, or of a system that we’re replacing, perhaps. It’s assumed that the US Department of Defense is the ultimate customer. So, we look at the ultimate customer’s requirements, including whatever they’ve said about standard ways of mitigating certain common risks. System performance spec, that’s your functional performance spec or whatever you want to call it. Other system design requirements and documents: bit of a catchall there. And applicable federal, military, state and local regulations.

This is a US standard. It’s a federated system, much like Australia or indeed lots of modern states, even the UK. There are variations in law across England, Wales, Scotland and Ireland. They’re not great, but they do exist. And in the US and Australia, those differences are greater. And it says applicable executive orders. Executive orders, they’re not law, but they are what the executive arm of the U.S. government has issued. And international agreements. A lot of words in there: have a look at the different statements that are in there in white, blue and yellow. Basically, from international agreements right down to whatever requirements may be applicable, they all need to be looked at and taken account of. So, there’s a huge amount of work there for someone to do. I’ll come back to who that someone should be later.

Task Description #3

Part B. It says the contractor shall recommend appropriate system design requirements. The assumption here is that the contractor is the designer and knows the design better than anybody, better than the purchaser, which is fair enough. It’s your system, you should understand it. And the requirement is that the contractor is not just passive, ‘doing as they’re told’; they’re there to actively investigate possible hazards associated with their system and recommend appropriate requirements in order to manage those hazards and risks. And then there’s further guidance here: the contractor is to do that in accordance with Section 4 of Mil. Standard 882E. Now, Section 4 is the general requirements of the standard and there’s lots of good advice in that. And I’ll be doing a lesson, maybe more than one lesson in fact, on Section 4 because there is quite a lot in there. The contractor is to refer to the standard and apply the principles therein. All good stuff.

Part C. The contractor shall also define verification and validation approaches. So, the contractor shall define V and V approaches for each design requirement to eliminate hazards and reduce risks. In part C (well, B and C) we’ve got a very much narrower focus on requirements to eliminate hazards or reduce risks. Whereas in A, notice we’ve got an incredibly broad scope when looking for requirements. It’s not just about the narrow job of dealing with hazards and controlling them that we’ve got in parts B and C.
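To make Parts B and C a little more concrete, here is a minimal Python sketch of how recommended design requirements might be recorded against hazards, with a check that each one has a V and V approach defined. The record layout, field names and example requirements are all hypothetical illustrations; nothing here is prescribed by the standard.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class DesignRequirement:
    """One recommended safety design requirement (hypothetical record layout)."""
    req_id: str
    text: str
    hazard_ids: list[str]  # hazards this requirement eliminates or reduces
    vv_methods: list[str] = field(default_factory=list)  # e.g. "test", "analysis"

def missing_vv(requirements):
    """Return the IDs of requirements with no V and V approach defined yet."""
    return [r.req_id for r in requirements if not r.vv_methods]

# Illustrative data only.
reqs = [
    DesignRequirement("SR-001", "Fuel tank shall be double-walled.",
                      ["HAZ-01"], ["inspection", "test"]),
    DesignRequirement("SR-002", "Battery bay shall have a thermal cut-out.",
                      ["HAZ-02"]),
]

print(missing_vv(reqs))  # ['SR-002']
```

A check like this only shows that a V and V approach has been named, not that it is adequate; that judgement still has to be made by people.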

Task Description #4

Onwards and upwards. We get to the second major part of this task, which is to incorporate those design requirements. It’s all very well to have them, but they’ve got to be built into the engineering design, into documentation, hardware, software, test plans, etc. And the second highlighted bit that I’ve got is ‘as the design evolves ensure applicable design requirements flow down into lower-level specifications’, etc, etc, etc. There’s a lot of repetition there, so I won’t go through it. Clearly the assumption in this standard is that the design will be done top-down and that the main contractor, design contractor, will be doing work and then identifying lower-level requirements to be passed on to subcontractors and suppliers. And again, the assumption is we’re dealing with a large military system, which is at least, in part, bespoke. It is being developed and/or integrated for the first time for a specific user and specific use.

I’ll come on to the third yellow highlighted bit: it says, as appropriate, use engineering change proposals to incorporate applicable design requirements into these documents. What we’re saying here is that even if something hasn’t been specified upfront in the original contract, the contractor should use Engineering Change Proposals (ECPs), a controlled change mechanism, in order to change things as they go, with approval, and refine and evolve the design.
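As a rough illustration of the flow-down idea, the sketch below keeps a simple map from each requirement to the lower-level requirements derived from it, and lists any requirement that was meant to be allocated downwards but has not been. The IDs and the dictionary layout are assumptions made for the example; a real program would normally hold this in a requirements management tool.

```python
# Hypothetical flow-down map: each requirement ID lists the lower-level
# requirements (in subcontractor or supplier specs) derived from it.
flow_down = {
    "SYS-10": ["SUB-ENG-3", "SUB-FUEL-7"],
    "SYS-11": [],
    "SUB-ENG-3": [],
    "SUB-FUEL-7": [],
}

# Requirements that are expected to be allocated to a lower level.
needs_allocation = {"SYS-10", "SYS-11"}

def not_flowed_down(flow_down, needs_allocation):
    """Return requirements that should have children in lower-level specs but have none."""
    return sorted(r for r in needs_allocation if not flow_down.get(r))

print(not_flowed_down(flow_down, needs_allocation))  # ['SYS-11']
```

Anything that turns up in that list after the contract baseline is a candidate for a controlled change, such as an ECP, rather than an informal fix.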

Years of experience have taught me that these statements are coming from the assumption – still true in the US, I believe – that major military projects are designed and developed on a cost-plus basis. In other words, the government pays the main contractor / the prime contractor / prime designer on a sort of time and materials basis, not on a firm or fixed price basis, but says ‘Go away and do what we say’. And there are controls there, and there’s open-book accounting to try and prevent the government from being defrauded. But basically, the contractor goes off and does what is required and gets paid for what they do. So, the government has transferred relatively low amounts of risk onto the contractor, anticipating that this will result in the lowest possible overall cost of design development. Now, as we probably all know from the news, that doesn’t always work. However, that is the assumption behind this standard. With this cost-plus approach, we’ll pay you to do the job and therefore we don’t have to specify every single nut and bolt in the contract right at the beginning. Which in some ways takes a lot of risk away from the purchaser because they don’t have to get everything right at the start. So that’s good. There’s always a balance of risk in whichever approach we take.

So, if we go firm price, yes, we could inject more competition into procurement and supply activity, but you’ve got to get your contract right upfront, and all your requirements right, more or less. That is notoriously difficult to do. Whichever way you go, there are risks. But it’s important to note that this is the assumption underlying the standard. Not every standard follows this approach, follows this philosophy, but 882 does. So, if we’re going to use it in a different way, we need to understand that fact. More on that later.

Task Description #5

Fifth slide of six. Third part. We need to assess compliance of that development of hardware, software, documentation, data, etc., whatever it might be. In order to do that, the contractor is going to have to address the customer requirements at technical reviews. So again, the assumption is that development is following a systems-engineering process with certain gated reviews. So, you go into a series of reviews. You might start with System Requirements Review, SRR. Then you might have Preliminary Design Review, top-level design, PDR. And then we go down to detailed design, which is reviewed at Critical Design Review, or CDR. And then we might have a further software specification review for software components, and then we’ll go on to test readiness reviews and so on and so forth.

Mil. Standard 882 is assuming a particular systems-engineering-lifecycle approach to development. This is very widely used, not just for military systems but for civil ones, and all over the place. Whatever we call these reviews, the idea of a gated review is that you don’t start a review until you’ve reached the required maturity of requirements or design. You then conduct the review against objective criteria and then decide whether the review has passed. Now, usually, there is a hefty payment milestone associated with passing a review. The contractor is incentivized to pass the review. And hopefully, if we’ve got the requirements right, a passed review means we’re on the right track and we’re getting the right product. But that’s not always the case; we’ve got to get all these things right.
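The gating logic just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions: entry and pass criteria are reduced to named yes/no checks, and the criteria names are invented for the example rather than taken from any standard.

```python
def review_outcome(entry_criteria, pass_criteria):
    """Decide the outcome of a gated review.

    Both arguments are dicts mapping a criterion name to True (met) or
    False (not met). The review does not start unless every entry
    criterion is met; it passes only if every pass criterion is met.
    """
    unmet_entry = [name for name, met in entry_criteria.items() if not met]
    if unmet_entry:
        return ("not started", unmet_entry)
    failed = [name for name, met in pass_criteria.items() if not met]
    if failed:
        return ("failed", failed)
    return ("passed", [])

# Hypothetical criteria for a preliminary design review.
entry = {"requirements baselined": True, "hazard analysis updated": False}
exit_checks = {"safety requirements traced to design": True}

print(review_outcome(entry, exit_checks))
# ('not started', ['hazard analysis updated'])
```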

And then it says during those reviews, the contractor shall address hazards, mitigation measures or controls and methods of V and V, and recommendations arising. A lot goes on at these reviews. They are, on big programs especially, very important and very high stress. And in fact, in Australia now, there are some projects that are so big that a delay in a PDR on the future submarine actually made it into the national news, because it’s such a huge multi-billion-dollar project. It could all get very painful and political as well.

Task Description #6

However, let’s move on to the final slide of the task description. So, A. was to do the reviews. B. is to review test plans and review test results to verify and validate hardware and software compliance with those requirements. And as it says, this includes V and V of the effectiveness of risk mitigation measures. So, we need to test these risk controls where we can and see how effective they are and whether they live up to the requirements or the assumptions that we’ve made. Now, again, this is an American standard, so it’s very ‘test centric’. The American government likes to test things to death and, depending on your point of view, that’s sensible or not. It’s sensible in the sense that you’re testing a real system, hopefully in a representative test environment, although it may not be representative of the operational environment. So, it should be a very solid, robust, valid approach to proving a system.

However, there is a downside to testing in that it’s very expensive and it tends to come at the end of a program. Whereas really you need an indication much earlier on if things are going astray. So, you really need to review documentation and do analysis and so forth. Or maybe you test a prototype or some samples or something early on, rather than waiting until the end, when it may often be too late and then very expensive to fix things.

And then part C, we need to ensure that hazard control information is incorporated into manuals and plans, whether it be for the operator, the maintainer, the trainer, the logistician, the diagnostics or indeed for the final disposal. We need to take that hazard control information, risk control information, and record it so that it doesn’t get lost and it gets to the people who need it. That’s very important.

OK, so we’ve spent quite a lot of time going through the description because it’s a big, complex task this one, as you can see, with three major parts to it. It’s worth just going back over it. We’ve got our top-level description on slide one, which summarizes the whole thing. We’re talking about finding those requirements, identifying them. We’re talking about the contractor as an active recommender and developer of requirements and actively developing the V and V techniques to make sure that they are met.

In the second major part, we’re talking about incorporating those design requirements as the design evolves and using a controlled change method to make sure that we keep up with what’s going on. We’re talking about assessing compliance both at major systems engineering reviews and during testing. And then finally, we’re talking about making sure that the required information gets through to those who need it at the end of the food chain, as it were. This is all important stuff.

Contracting

Here’s a page we should be familiar with by now, contracting. We need to require SRHA, Task 203. We need to put it in the request for proposal and the contractual statement of work. So once again, as I’ve said before, we’ve got to get this stuff in early on. At least the requirement to do it, even if we haven’t fully worked everything out. We need to get that in right at the start of the request for proposal. We need to require Task 203 to be done. It’s imposed (A. Imposition of Task 203).

We need to identify (B. Identification of functional disciplines) who we want to take part in it because it’s not, as we will see, it’s not just the discipline and the job of the safety engineers or the safety team to do this. The design engineers, the specialist engineers in reliability, maintainability and testability, whoever, they all need to be involved as well, etc, etc.

Contractor level of effort (C.) for reviews and so on. We may need to specify some hard requirements there to ensure that we get early scrutiny of the product and the design.

A big point is tailoring of the task (D. Tailor 203.2 and 203.2.3 as appropriate). The task may need to be tailored, because it assumes again that the contractor is responsible for the design. Maybe the prime contractor isn’t responsible for the design; maybe we’re contracting somebody to buy something that’s mostly off the shelf and then operate it for us for 30 years. Let’s say a so-called turnkey solution. And we might do that for a piece of military kit, or we might do that for a hospital, or whatever it might be. A piece of infrastructure, a service, whatever. So, it may be that the contractor who must do most of Task 203 is not the prime at all. But the prime needs to pass those requirements down to some key subcontractors who are doing the development stuff. So, it’s not a given that the prime contractor right underneath the customer must do all this stuff. It may have to be done at several different levels.

And again, we’ve got to provide the concept of operations (E.), that gives the context for all this work. Otherwise, it gets very difficult to do it. You’ve got to say, ‘What’s the jurisdictional context?’ ‘Where will we be operating under?’ ‘Which rules and conditions?’ As well as everything else that you would find in Con. Ops (Concept of Operations).

Then if there are any specific hazard management requirements (F.) that need to be imposed and specific measures of risk, then they need to be passed on to the contractor as well. This is how we will assess, and measure, and prioritize risks. That needs to be done for the program; otherwise, you can end up with lots of different ways of doing it and it becomes a difficult-to-govern mess.

Section 4.2 #1

I promised we would have a little section on Section 4.2 in the standard and I’ve got two slides here that say two important things. We’re not going to go through all of Section 4 of the 882- That’s for another session. But here in 4.2, we’ve got two important things.

It says Section 4 defines system safety requirements through life for any system. And when properly applied, these requirements should enable the identification and management of hazards and their associated risks. Not only during system development but also during sustainment. And any engineering activities that go on in sustainment, whether it be repair, overhaul, modification, update, whatever it might be. These requirements are put in place to enable that good work to take place and make predictions for the through-life operation, support, sustainment of system, whatever it might be.

Section 4.2 #2

And then secondly, there’s another important point here, which I alluded to earlier. System safety staff are not responsible for hazard management in other functional disciplines. If you’re a structural designer, you’re responsible for making your structure or designing your structure such that risks of failure and collapse and catastrophe are managed. And the same for everything else. Whatever it is you’re dealing with, propulsion, fuels, you name it, whatever the discipline is, they’re all responsible for managing the risks.

The safety team is there really to pull it together and try and ensure some consistency and honesty and to report status. They are not there to do it all for the designers. Indeed, they can’t because they will not have the design specialist knowledge to do so. Only the designers can do that. But it does go on to say all functional disciplines, using this generic methodology that’s in Section 4, should coordinate their efforts as part of the overall systems engineering process. The standard provides standardization and it should force all these different disciplines to work together in a standardized way following a standardized systems-engineering process. And remember we said earlier, Mil. Standard 882 assumes that there is a higher-level systems-engineering process going on into which the safety program fits. And that’s very, very important.

On so many programs I’ve seen, there’s either no systems engineering process or a weak one. Or the safety program is divorced or isolated from the systems engineering, the higher-level program, and as a result, it can become irrelevant if you’re not careful. So, having these things and making sure that they lock together is very important. And the reasoning given here is because you might mitigate a hazard in one discipline only to make it worse for somebody else. We can all think of examples of one (which is code for me saying I can’t right now). But anyway, trade-offs – that’s what we end up with. There’s Section 4.2, which gives us a little insight into the thrust of the whole of section 4.

Commentary #1

Just two slides of commentary from me. First, it’s worth remembering that there are lots, and lots, and lots of requirements. We’ve got requirements of the standard itself, which is about following a rigorous process. We’ve got law at the international and national levels, and whether those laws apply in a particular jurisdiction or not can be complex. You’ve got product specifications; you’ve got applicable standards, or maybe only parts of the standards that are applicable to your system. And then you’ve got program and project requirements, etc., etc. You’ve got lots and lots of layers of requirements that are out there and may or may not be relevant to the system you want to develop, or service, whatever it is going to be. But of course, if we’re using this kind of approach, it’s going to be a complex system or service. It’s going to be challenging to find and identify all these things. It’s going to take some dedicated effort.

That’s one issue, doing all that work. And this is not a trivial exercise and I’ve seen it done badly far more often than I’ve seen it done well. That’s the thing to bear in mind, this is not easy to do. And people don’t really want to do it – it’s hard work.

And then secondly, we get down to what we might call derived safety requirements. We have a high-level requirement that says, ‘We want a very high level of performance out of this vehicle’ or whatever it might be. And that very demanding performance requirement might force us to use some very high energy fuel, or it might force us to pack a lot of power and a lot of equipment into a very small space, and these requirements can lead to sort of secondary hazards. So, we’ve got high energy fuel inside the vehicle- Well, clearly, that’s dangerous if it leaks. We’ve got a lot of stuff, complex stuff, packed into a small system that can give us thermal control problems. Or if a bit of it goes wrong, if it’s tightly packed together, it can take out something else next to it.

So, these performance requirements can cause hazards that probably weren’t there before or needn’t have been there in, let’s say, a common or garden system that doesn’t have to perform as well. So, we might well look at doing some analysis on our requirements and our top-level design or conceptual design, whatever it might be, very early on. And we might say, ‘Well, clearly this is going to drive us down a particular path’ and therefore we will derive some additional safety requirements to deal with these challenges. They don’t come straight out of higher-level requirements, they’re a secondary effect. But in complex systems, these are very common. And if we’re doing our systems engineering well, we will identify derived safety requirements for ourselves and for the next level of contractors down the chain.

So, instead of just passing on ‘back-to-back’ requirements from the ultimate customer, which may not mean anything at all to the component supplier (in fact, they probably won’t), we need to change these top-level requirements and say, ‘What’s relevant for you as the supplier of the engine?’, let’s say, or the wheels, or the wings, or the hull, or whatever it might be. We need to pass on required controls, whether it be the prevention of hazards, detection or mitigation. We also need to remember the order of precedence. It’s preferable to eliminate hazards; if we can’t, we put in engineered features to reduce the risk or lessen the probability, or severity, etc. And those rules are in section 4.3.4 of the Mil. Standard. There’s a lot of work to do on requirements on many different levels and it may be that this task must be repeated at many different levels.
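The order of precedence lends itself to a small sketch: rank candidate mitigations so that the more preferable kinds come first. The five levels below are a paraphrase of the list in section 4.3.4 as I understand it, so check the exact wording against the standard; the candidate mitigations are invented purely for illustration.

```python
from enum import IntEnum

class Precedence(IntEnum):
    """Paraphrased order of precedence; lower value means more preferable."""
    ELIMINATE_BY_DESIGN = 1
    REDUCE_BY_DESIGN_ALTERATION = 2
    ENGINEERED_FEATURES = 3
    WARNING_DEVICES = 4
    SIGNAGE_PROCEDURES_TRAINING_PPE = 5

# Hypothetical candidate mitigations for a fuel-leak hazard.
candidates = [
    ("Warning light on overheating", Precedence.WARNING_DEVICES),
    ("Relief valve on fuel line", Precedence.ENGINEERED_FEATURES),
    ("Remove the high-energy fuel from the design", Precedence.ELIMINATE_BY_DESIGN),
]

# Sort so that the most preferable kind of mitigation comes first.
ranked = sorted(candidates, key=lambda c: c[1])

print(ranked[0][0])  # Remove the high-energy fuel from the design
```

The ranking only reflects preference between kinds of control; whether a higher-precedence option is feasible, given the performance requirements, is still an engineering trade-off.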

Commentary #2

But the first level task must be done by the client, and actually by the ultimate end-user because, to mangle a famous quote, ‘What you don’t specify – what you don’t see can hurt you’. So, we need to do this work as end-users, and as purchasers, as customers. It is tempting to assume that the contractors will just do it, that they’ll just get it. ‘They’ve been making planes for years’ or ‘They’ve been making tanks’, or boots, or guns, or ships, or whatever it might be. ‘They’ve been making fuel for years’, ‘these chemicals for years’. We just assume that they know what they’re doing. Well, they probably do know what they’re doing within a particular context. However, if we impose competition, as we always do because we’re always looking for value for money, then whether we have a competition where we’re asking for a firm price to do something or whether we employ other methods of competition and cost-cutting, there will always be pressure on the contract costs. And that means they will be tempted to tailor the safety approach they’re taking in order to reduce costs. Which is a perfectly legitimate thing to do, nothing immoral about doing that, if it’s done appropriately and sensibly.

But if you as the customer or client are going to incentivize your suppliers to do that, you need to be aware of that and the fact that they may just not bother because you haven’t told them to. You’ve not contractually specified it so you aren’t going to get it. It’s not their problem. And indeed, the suppliers may not understand how their customer will integrate what they provide or use it. The prime contractor may not have a great idea as to how you’re going to use their product. And you can be certain that the subcontractors and the low level secondary and tertiary suppliers are probably going to have no clue whatsoever about what’s going to happen to their components. They are just not going to know. So, you need to specify that as purchaser and you need to make sure that your immediate suppliers pass on those requirements, and that context, and that they police the contract appropriately. Otherwise, there’s going to be trouble for the ultimate client and end-user.

And then finally, in these days of globalization and business-to-business and international procurement, you may be – probably are – buying stuff that’s been made abroad and designed in another country where they may have completely different laws or no laws at all on how safety is built-in – designed in – to a system. And of course, you don’t always know where design work is going to get done; just because you engage a prime contractor in your own country and think that you’re safe. You don’t know whether the prime contractor is going to subcontract software development – let’s say, out to India. It’s so common it’s a cliché! But there are certain things that tend to be done offshore because it’s cheaper, or quicker, or whatever. Or because somebody has already got a system that you can just plug in and use – allegedly.

There are all kinds of reasons why your supply chain will not necessarily ‘Just get it’, or ‘Just do it’. In fact, there are lots of good reasons why they won’t. So, the purchaser has got to do a lot of work. It’s critical for the purchaser to know what their obligations are because a lot of purchasers don’t. They sit there in blithe ignorance of what their safety responsibilities are, and the lucky ones get away with it. And the unlucky ones are either killed or maimed, or they kill or maim somebody else and they end up going to jail or facing massive fines. But you’ve not only got to understand the safety requirements and obligations on the end item being used, but also how you translate that to the contractors, because it’s not always obvious. You can’t just say, ‘Well, these are the laws that I have to obey, I’ll just pass those on to you, Mr Contractor’ because they may not apply to the contractor if they’re in a different country.

Or it just may not make any sense at their level. Laws that were designed to protect people will often not make much sense to a component supplier. It just doesn’t work. Two important points there on the commentary. Lots of layers of requirements that need to be worked on. This is all classic systems engineering stuff, isn’t it? And then the purchaser and the end-user cannot evade their responsibilities at the top of the food chain. Indeed, they’ll be stuck with the problem, whatever it is, for 30 years or however long they use the system.

It’s important for the end-user and the ultimate client to do this work, maybe several times, at many different layers.

Copyright Statement

Well, that’s the end of the technical content. I just wanted to say that I’ve quoted a lot of text from the Mil. Standard, which is itself copyright-free, and it’s available for free online, including on the Safety Artisan website. But this presentation is copyright of the Safety Artisan, 2020.

For More …

And for more resources and for more videos like this one, please go to either www.safetyartisan.com or go to the Safety Artisan page at www.patreon.com.

Well, that is the end of the presentation. And it just remains for me to say thanks again for watching and do look out for the next sessions in the series on 882 echo (882E). There are quite a few to go. We’re going to go through all the tasks and the general and specific requirements of the standard and the appendices. We will also talk about more advanced topics, about how we manage and apply all this stuff.

So, from The Safety Artisan.com, thanks very much and goodbye.

Transcript: Preliminary Hazard Analysis (T202)

Here is the full transcript: Preliminary Hazard Analysis.

Preliminary Hazard Analysis

Hello and welcome to the Safety Artisan, where you’ll find professional, pragmatic and impartial safety training resources. So, we’ll get straight on to our session and it is the 8th February 2020.  

Now we’re going to talk today about Preliminary Hazard Analysis (PHA). This is Task 202 in Military Standard 882E, which is a system safety engineering standard. It’s very widely used mostly on military equipment, but it does turn up elsewhere.  This standard is of wide interest to people and Task 202 is the second of the analysis tasks. It’s one of the first things that you will do on a systems safety program and therefore one of the most informative. This session forms part of a series of lessons that I’m doing on Mil-Std-882E.

Topics for This Session

What are we going to cover in this session? Quite a lot! The purpose of the task, a task description, recording and scope. How we do risk assessments against Tables 1, 2 and 3. Basically, it is severity, likelihood and the overall risk matrix.  We will talk about all three, about risk mitigation and using the order of preference for risk mitigation, a little bit of contracting and then a short commentary from myself. In fact, I’m providing commentary all the way through. So, let’s crack on.

Task 202 Purpose

The purpose of Task 202 is to perform and document a preliminary hazard analysis, or PHA for short, to identify hazards, assess the initial risks and identify potential mitigation measures. We’re going to talk about all of that.

Task Description

First, the task description is quite long here. And as you can see, I’ve highlighted some stuff that I particularly want to talk about.

It says “the contractor” [does this or that], but it doesn’t really matter who is doing the analysis, and actually, the customer needs to do some to inform themselves, otherwise they won’t really understand what they’re doing.  Whoever does it needs to perform and document PHA. It’s about determining initial risk assessments. There’s going to be more work, more detailed work done later. But for now, we’re doing an initial risk assessment of identified hazards. And those hazards will be associated with the design or the functions that we’re proposing to introduce. That’s very important. We don’t need a design to do this. We can get in early when we have user requirements, functional requirements, that kind of thing.

Doing this work will help us make better requirements for the system. So, we need to evaluate those hazards for severity and probability. It says based on the best available data. And of course, early in a program, that’s another big issue. We’ll talk about that more later. It says including mishap data as well, if accessible. ‘Mishap’ is an American term; it means an accident, but we’re avoiding any kind of suggestion about whether it is accidental or deliberate. It might be stupidity, deliberate, whatever. It’s a mishap. It’s an undesirable event. We look for accessible data from similar systems, legacy systems and other lessons learned. I’ve talked about that a little bit in the Task 201 lesson, and there’s more on that today under commentary. We need to look at provisions and alternatives, meaning design provisions and design alternatives, in order to reduce risks, adding mitigation measures to eliminate hazards if we can, or to reduce the associated risk. We need to include all of that. That’s the task description. That’s a good overview of the task and what we need to talk about.

Recording & Scope

First, recording and scope. As always with these tasks, we’ve got to document the results of the PHA in a hazard tracking system. Now, a word on terminology: we might call it a hazard tracking system; we might call it a hazard log; we might call it a risk register. It doesn’t really matter what it’s called. The key point is it’s a tracking system. It’s a live document, as people say: it’s a spreadsheet or a database, something like that. It’s something relatively easy to update and change. And we can track changes through the safety program once we do more analysis, because things will change. We should expect to get some results and to refine them and change them as time goes on. Very important point.
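As a rough illustration of what "a live document that tracks changes" could look like, here is a minimal sketch of a single hazard log entry. The class name, field names and the example hazard are all hypothetical; the standard does not prescribe this layout, and a spreadsheet would do the same job.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class HazardRecord:
    """One row in a hypothetical hazard log; field names are illustrative only."""
    hazard_id: str
    description: str
    severity: str       # e.g. "Catastrophic", "Critical", "Marginal", "Negligible"
    probability: str    # e.g. "Frequent" down to "Improbable", or "Eliminated"
    mitigations: list[str] = field(default_factory=list)
    history: list[tuple[date, str]] = field(default_factory=list)

    def update(self, when: date, note: str, **changes: str) -> None:
        """Apply changes and keep an audit trail so refinements stay traceable."""
        for name, value in changes.items():
            if name not in ("description", "severity", "probability"):
                raise AttributeError(f"cannot update field {name!r}")
            old = getattr(self, name)
            setattr(self, name, value)
            note += f" [{name}: {old} -> {value}]"
        self.history.append((when, note))


# Made-up example: an initial assessment that is refined after more analysis.
hazard = HazardRecord("HAZ-001", "Fuel leak near hot exhaust", "Catastrophic", "Remote")
hazard.update(date(2020, 3, 1), "PHA review", probability="Improbable")
```

The point of the `history` list is simply that nothing gets overwritten silently: every refinement leaves a dated note behind.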

Scope #1

Scope. Big section this. Let me just check. Yes, we’ve got three slides on the scope. This does go on and on. The scope of the PHA is to consider the potential contribution from a lot of different areas. We might be considering a whole system or a subsystem, depending on how complex the thing is we’re dealing with. And we’re going to consider mishaps, the accidents and incidents, near misses, whatever might occur from components of the system (a. System components), energy sources (b. Energy sources), ordnance (c. Ordnance)- well that’s bullets and explosives to you and me, rockets and that kind of stuff.

Hazardous materials (d. Hazardous Materials (HAZMAT)), interfaces and controls (e. Interfaces and controls), interface considerations to other systems (f. Interface considerations to other systems when in a network or System-of-Systems (SoS) architecture), external systems. Maybe you’ve got a network of different systems talking to each other. Sometimes that’s called a system of systems architecture. Don’t worry about the definitions. Our system probably interacts and talks to other systems, or it relies on other systems in some way, or other systems rely on it. There are external interfaces. That’s the point.

Scope #2

We might think about material compatibilities (g. Material Compatibilities), because different materials and chemicals are not compatible with others, and inadvertent activation (h. Inadvertent activation).

Now, I’ve highlighted item i. (Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Non-Developmental Items (NDIs), and Government-Furnished Equipment (GFE)) because it’s something that often gets neglected. We also need to think about stuff that’s already been developed. The general term is NDIs, and it might be commercial off the shelf, it might be a government off the shelf system, or government-furnished equipment (GFE). It doesn’t really matter what it is. These days, especially, very few complex systems are developed purely from scratch. We try and reuse stuff wherever we can in order to keep costs down and schedule down.

We’re going to need to integrate all these things and consider how they contribute to the overall risk picture. And as I say, that’s not often done well. Well, it’s hardly ever done well. It’s often not done at all. But it needs to be, even if only crudely. That’s better than nothing.

Item j. (j. Software, including software developed by other contractors or sources. Design criteria to control safety-significant software commands and responses (e.g., inadvertent command, failure to command, untimely command or responses, and inappropriate magnitude) shall be identified, and appropriate action shall be taken to incorporate these into the software (and related hardware) specifications): we need to include software, including software developed elsewhere. Again, that’s very difficult, often not done well. Software is intangible. If somebody else has developed it, maybe we don’t have the rights to see the design, or code, or anything like that. Effectively it’s a black box to us. We need to look at software. I’m not going to bother going through all the blurb there.

Another big thing in part k (k. Operating environment and constraints) is we need to look at the operating environment. Because a piece of kit that behaves in a certain way in one environment, you put it in a different environment and it behaves differently. And it might become much more dangerous. You never know. And the constraints that we put on the system. Operating environment is very big. And in fact, if you see the lesson I did on the definition of safety, we can’t really define whether a system is safe or not until we define the operating environment. It’s that important, a big point there.

Scope #3

And then the third slide of three: procedures (l. Procedures for operating, test, maintenance, built-in-test, diagnostics, emergencies, explosive ordnance render-safe and emergency disposal). Again, these often don’t appear until later, unless of course we’ve got an off the shelf system. But if we have got an off the shelf system, there should be a user manual, there should be maintenance manuals, there should be warnings and cautions, all this kind of stuff. So, we should be looking for procedures for all these things to see what we could learn from them. We want to think about the different modes (m. Modes) of operation of the system. We want to think about health hazards (n. Health hazards) to people and environmental impacts (o. Environmental Impacts), because the standard includes environmental as well.

We need to think about human factors, human engineering and human error analysis (p. Human factors engineering and human error analysis of operator functions, tasks, and requirements). And it says operator functions, tasks and requirements, but there’s also maintenance, disposal and storage. All the good stuff. Again, Human Factors is another big issue. Again, it’s not often done well, but actually, if you get a human factors specialist in early, you can do a lot of good work and save yourself a lot of money, and time, and aggravation by thinking about things early on.

We need to think about life support requirements (q.  Life support requirements and safety implications in manned systems, including crash safety, egress, rescue, survival, and salvage). If the system is crewed or staffed in some way, I’m thinking about, well, ‘What happens if it crashes?’ ‘How do we get out?’ ‘How do we rescue people?’ ‘How do we survive?’ ‘How do we salvage the system?’

Event-unique hazards (r. Event-unique hazards). Well, that’s kind of a catch-all for when your system does something unusual. If it does something unusual you need to think about it.

And then part s, infrastructure (s. Built infrastructure, real property installed equipment, and support equipment): built infrastructure, real property installed equipment and support equipment.

And then malfunctions (t. Malfunctions of the SoS, system, subsystems, components, or software) of all the above.

I’m just going to whizz back and forth. We’ve got up to sub-item t there. We’ve got an awful lot of stuff there to consider. Now, of course, this is kind of a hazard checklist, isn’t it? It’s sort of a checklist of things. We need to look at all that stuff. And in that respect, that’s excellent, and we should aim to do something on all of them just to see if they’re relevant or not, if nothing else. The mistake people often make is, because they can’t do something perfect and comprehensive, they don’t do anything. We’ve got a lot of things to go through here. And it’s much better to have a go at all these things early and do a bit of rough work in order to learn some stuff about our system. It’s much better to do that than to do nothing at all. And it may be difficult to do some of these things, the software, the COTS, things where we don’t have access to all the information, but it’s better to do a little bit of work early than to do nothing at all, waiting for the day to arrive when we’ll be able to do it perfectly with all the information. Because guess what? That day never comes! Get in and have a go at everything early, even if it’s only to say, ‘I know nothing about this subject, and we need to investigate it.’ Those are the pros and cons of this approach. Ideally, we need to do all these things, but it can be difficult.
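To make the "have a go at everything" idea concrete, here is a small sketch that treats sub-items a to t as a checklist and lists the ones nobody has looked at yet. The item titles are shortened from the quotations above; the function name and the example entries are hypothetical.

```python
import string

# Scope sub-items a. to t., with titles abbreviated from the task description.
SCOPE_ITEMS = [
    "System components", "Energy sources", "Ordnance", "Hazardous materials",
    "Interfaces and controls", "Interface considerations to other systems",
    "Material compatibilities", "Inadvertent activation",
    "COTS, GOTS, NDIs and GFE", "Software",
    "Operating environment and constraints", "Procedures", "Modes",
    "Health hazards", "Environmental impacts",
    "Human factors engineering and human error analysis",
    "Life support requirements", "Event-unique hazards",
    "Built infrastructure and support equipment", "Malfunctions",
]
CHECKLIST = dict(zip(string.ascii_lowercase, SCOPE_ITEMS))


def outstanding(assessments: dict[str, str]) -> list[str]:
    """Return the letters of scope items that have no entry at all yet."""
    return [letter for letter in CHECKLIST if letter not in assessments]


# Even "I know nothing about this" counts as an entry, because it triggers work.
assessments = {
    "c": "Not applicable: no ordnance (recorded as a design constraint)",
    "j": "Unknown: supplier software is a black box, needs investigation",
}
todo = outstanding(assessments)
```

A rough entry against every letter is the goal here, not a perfect entry against a few.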

Risk Assessment

Moving on. Well, we’ve looked at a broad scope of things for all the hazards that we identify, and there are various techniques you can use. The PHA has got to include a risk assessment. That means that we’ve got to think about likelihood and severity, and then that gives us an overall picture of risk when we combine the two together. That’s Tables 1 and 2.

And then, forget risk assessment codes, I’m not sure why that’s in there. Table 3 is the risk matrix, and 882 has a standard risk matrix. And it says to use that unless you’ve got a tailored matrix for your system that’s been approved for use. And in this case, it says approved effectively in accordance with the US Department of Defence. But it’s whoever is the acquiring organization, the authority, the customer, the purchaser, whatever you want to call it, the end-user. We’ll talk about that more in a sec.

Table I, Severity

Let’s start by looking at severity, which in many ways is the easiest thing to look at. In this standard we’ve got an approach based on harm to people, harm to the environment, and monetary loss due to smashing stuff up. At the top is a catastrophic accident. Category 1 is a fatal accident. This accident could result in death, permanent total disability, irreversible significant environmental impact, or monetary loss. And in this case, it says $10 million or more. That’s 10 million US dollars. This version of the standard was created in 2012, so inflation has probably had an effect since then. With a critical accident, we could cause permanent partial disability, injuries or occupational illness that can hospitalize three people, reversible significant environmental impact, or losses between 1 million and 10 million dollars. And then we go down to marginal: injury or illness with lost workdays for one person, reversible moderate environmental impact, or monetary loss between $100,000 and one million dollars. And then finally negligible is less than that. Negligible is an injury or illness that doesn’t result in any lost time at work, minimal environmental impact, or a monetary loss of less than a hundred thousand dollars. That’s easy to do in this standard. We just say, ‘What are the losses that we think could result?’ in the worst-case reasonable scenario for an accident. That’s straightforward.
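The monetary thresholds are the one part of the severity table that reduces to simple arithmetic, so here is a minimal sketch of that part only. The function names are hypothetical, and treating each lower bound as inclusive is an assumption to check against Table I; harm to people and the environment still needs judgment.

```python
def severity_from_monetary_loss(loss_usd: float) -> tuple[int, str]:
    """Map a worst-credible monetary loss in US dollars to a severity category.

    Assumption: lower bounds are inclusive, e.g. exactly $1M counts as Critical.
    """
    if loss_usd < 0:
        raise ValueError("loss cannot be negative")
    if loss_usd >= 10_000_000:
        return 1, "Catastrophic"
    if loss_usd >= 1_000_000:
        return 2, "Critical"
    if loss_usd >= 100_000:
        return 3, "Marginal"
    return 4, "Negligible"


def worst_category(*categories: int) -> int:
    """Category 1 is the most severe, so the overall severity is the minimum."""
    return min(categories)


# Made-up example: $2.5M of damage, plus an injury judged to be Marginal (3).
money_category, money_label = severity_from_monetary_loss(2_500_000)
overall = worst_category(money_category, 3)
```

The overall severity is driven by whichever consequence is worst, so a cheap accident that injures someone seriously is still a serious accident.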

Table II, Probability

Now let’s look at probability. We’ve got a range here from ‘A’ to ‘E’, frequent down to improbable, and then ‘F’ is eliminated. And eliminated in the standard really does mean eliminated. It cannot happen ever! It does not mean that we managed to massage the figures, the likelihood or probability figures, down so low that we pretend that it will never happen. It means that it is a physical impossibility. Please take note, because I’ve seen a lot of abuse of that approach. It’s bad practice to massage the figures down to a level where you say, ‘I don’t need to bother thinking about this at all!’ because the temptation is just to frig [massage] the figures and not really consider stuff that needs to be considered. Well, I’ll get off my soapbox now.

Let’s go back to the top. Frequent: for one item, likely to occur often. Down to probable: will occur several times in the life of an item. Occasional: likely to occur sometime; we think it’ll happen once in the life of an item. Remote: we don’t think it’ll happen at all, but it could do. And improbable: so unlikely for an individual item that we might assume that the occurrence won’t happen at all. But when we consider a fleet, particularly if I’ve got hundreds or thousands of items, the cumulative risk, or cumulative probability, sorry, I should say, is unlikely to occur across the fleet, but it could.

And this is where this specific vs. fleet occurrence or probability is useful. For example, if we think ‘Let’s imagine a frequent hazard’. We think that something could happen to an item, per item, let’s say once a year. Now, if we’ve got a fleet of fifty of these items or fifty-something of these items, that means it’s going to happen across the fleet pretty much every week on average. That’s the difference. And sometimes it’s helpful to think about an individual system. And sometimes it’s helpful to think about a fleet where you’ve got the relevant experience to say, ‘Well the fleet that we’re replacing. We had a fleet of 100 of these things. And this went wrong every week or every month or once a year or only happened once every 10 years across the entire fleet.’ And therefore, we could reason about it that way.
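The fleet arithmetic in that example can be sketched in a few lines. The function names are hypothetical, and the calculation assumes every item in the fleet has the same constant event rate, which is a simplification.

```python
def fleet_events_per_year(per_item_rate_per_year: float, fleet_size: int) -> float:
    """Expected events per year across a fleet of identical items."""
    return per_item_rate_per_year * fleet_size


def mean_weeks_between_events(events_per_year: float) -> float:
    """Average gap between events, taking a year as 52 weeks."""
    return 52.0 / events_per_year


def per_item_rate_from_fleet(events: int, years: float, fleet_size: int) -> float:
    """Go the other way: turn fleet history into a per-item yearly rate."""
    return events / (years * fleet_size)


# Once a year per item, fleet of fifty: about one event a week across the fleet.
fleet_rate = fleet_events_per_year(1.0, 50)
gap_weeks = mean_weeks_between_events(fleet_rate)

# One event in ten years across a fleet of 100: a rate of 0.001 per item-year.
item_rate = per_item_rate_from_fleet(events=1, years=10, fleet_size=100)
```

The second direction is the useful one when you have historical data from the fleet you are replacing and want a per-item figure for the new system.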

We’ve got two different ways of looking at probability here. And use whichever one is more useful or helps you. But when we’re doing that, try and do that with historical data, not just subjective judgment. Because otherwise your subjective judgment, one individual might say ‘That will never happen!’, whereas another will say, ‘Well, actually we experienced it every month on our fleet!’. Circumstances are different.

Table III, Risk Matrix

We put severity and probability together. We have got ‘1’ to ‘4’ for severity, and ‘A’ to ‘F’ for probability, and we get this matrix. We’ve got probability down the side and severity along the top. And in this standard, we’ve got high risk, serious risk, medium risk and low risk. And now how exactly you define these things is, of course, somewhat arbitrary. We’ll just look at some general principles.

First, the thing to remember about this risk matrix is that risk is the product of probability and severity. Effectively we multiply the two together. If we’ve got a catastrophic or critical accident, a more serious one, and it’s going to happen often, that’s a big risk. That’s a high risk. Whereas, if we’ve got a low severity accident that we think will happen very, very rarely, then that’s a low risk. That’s great.

One thing to note here it’s easier to estimate the severity than it is the probability. It’s quite easy to under- or overestimate probability. Usually, because of the physical mechanism involved, it’s easier to estimate the severity correctly. If we look on the right-hand side, at negligible. We can see that if we’re confident that something is negligible, then it can be a low risk. But at the very most, it can only be a medium risk. We are effectively prioritizing negligible severity risks quite low down the pecking order.

Now, on the other side, if we think we’ve got a risk that could be catastrophic, we could kill somebody or do irreversible environmental damage, then, however improbable we think it is, it’s never going to be classified less than medium. That’s a good point to note. This matrix has been designed well, in the sense that catastrophic and critical risks are never going to get below medium and they can quite easily become serious or high. That means they’re going to get serious management attention. When you put risks up in front of a manager, senior person, a decision-maker, who’s responsible and they see red and orange, they’re going to get uncomfortable and they’re going to want to know all about that stuff. And they will want to be confident that we’ve understood the risk correctly and it’s as low as we can get it. This matrix is designed to get attention and to focus attention where it is needed.
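For anyone who wants to use the matrix in a spreadsheet or script, here is a minimal lookup sketch. The cell values are transcribed from memory of Table III and should be checked against the standard before use; the function name is hypothetical. In practice the "multiplication" is a table lookup rather than literal arithmetic.

```python
# Columns are severity categories 1 to 4:
# Catastrophic, Critical, Marginal, Negligible.
# Assumption: cell values as recalled from Table III; verify before relying on them.
RISK_MATRIX = {
    "A": ("High", "High", "Serious", "Medium"),    # Frequent
    "B": ("High", "High", "Serious", "Medium"),    # Probable
    "C": ("High", "Serious", "Medium", "Low"),     # Occasional
    "D": ("Serious", "Medium", "Medium", "Low"),   # Remote
    "E": ("Medium", "Medium", "Medium", "Low"),    # Improbable
}


def risk_level(severity: int, probability: str) -> str:
    """Look up the risk level for a severity (1 to 4) and probability (A to F)."""
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity must be 1, 2, 3 or 4")
    probability = probability.upper()
    if probability == "F":
        return "Eliminated"  # only for hazards that are physically impossible
    if probability not in RISK_MATRIX:
        raise ValueError("probability must be a letter from A to F")
    return RISK_MATRIX[probability][severity - 1]


# However improbable, a catastrophic outcome never drops below Medium.
catastrophic_floor = risk_level(1, "E")
# However frequent, a negligible outcome never rises above Medium.
negligible_ceiling = risk_level(4, "A")
```

A tailored matrix would simply swap in a different table, which is why the table is kept separate from the lookup function.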

And in this standard, in 882, you ultimately determine whether you can accept risk based on this risk rating. In 882, there is no unacceptable, intolerable risk. You can accept anything if you can persuade the right person with the right amount of authority to sign it off. And the higher the risk, the higher the level of authority you must get in order to accept the risk and expose people to it. This matrix is very important because it prioritizes attention. It prioritizes how much time, effort and money gets spent on reducing risks. You will use it to rank things all the time and it also prioritizes, as we’ll see later, how often you review a risk, because clearly, you don’t want to have high risks or serious risks. Those are going to get reviewed more often than a medium risk or low risk. A low risk might just get reviewed routinely, not very often, maybe once a year or even less. We want to concentrate effort and attention on high risks and this matrix helps us to do that. But of course, no matrix is perfect.

Now, if we go back. Looking at the yellow highlight, we’re going to use table three unless there’s a tailored alternative definition, a tailored alternative matrix. Now, noting this matrix, catastrophic risk, the highest possible risk, we’ve got one death. Now, if we had a system where it was feasible to kill more than one person in an accident, then really, we would need another column worse than catastrophic. We could imagine that if you had a vehicle that had one person in it and the vehicle crashed, whatever it was, a motorbike let’s say. Let’s imagine you only said ‘We’re only going to have solo riders. We can only kill one person.’. We’re assuming we won’t hurt anybody else. But if you’ve got a car where you’ve got four or more people in, you could kill several people. If you’ve got a coach or a bus, you could drive it off a cliff and kill everybody, or you might have a fire and some people die, but most of them get out. You can see that for some vehicles, for some systems, you would need additional columns. Killing one person isn’t the worst conceivable accident.

Take some systems, say a ship. You might imagine quite easily that it’s actually very rare for a ship to sink and everybody dies. But it’s quite common for individuals on ships to die in health and safety type accidents, workplace accidents. In fact, being a merchant seaman is quite a risky occupation. But also in between those two, it’s also quite possible to have a fire or asphyxiating gases in a compartment. You can kill more than one person, but you won’t kill the entire ship’s company. Straight away in a ship, you can see there are three classes, if you like, of serious accidents where you can kill people. And we should really differentiate between the three when we’re thinking about risk management. And this matrix doesn’t allow you to do that. If you’ve got a system where more than one death is feasible, then this matrix isn’t necessarily going to serve well, because all of those types of accidents get shoved over into the catastrophic column on this matrix, and you don’t differentiate between any of them, which is not helpful. You may need to tailor your matrix and add further columns.

And depending on the system, you might want to change the way that those risks are distributed. Because you might have a system like, for example, riding a bicycle. It’s very common riding a bicycle to get negligible type injuries. You know, you fall off, cuts and bruises, that kind of thing. But, if you’re not on the road, let’s say you’re riding off-road, it is quite rare to get fatalities unless you’re mountain biking in some extreme environment. You’ve got to tailor the matrix for what you’re doing. I think we’ve talked about that enough. We’ll come back to that in later lessons, I’m sure.

Risk Mitigation

Risk mitigation. We’re doing this analysis, not for the sake of it, we’re doing it because we want to do something about it. We want to reduce the risk or eliminate it if we can. The 882 standard gives us an order of precedence, and as it says, it’s specified in section 4.3.4, but I’ve reproduced that here for convenience. Ideally, we would like to eliminate hazards by designing them out. We would make a design decision to say, ‘We won’t have a petrol engine, let’s say, in this vehicle or vessel because petrol is a serious fire/explosion hazard. We’ll have something else. We’ll have diesel or we’ll have an all-electric vehicle maybe these days or something like that.’ We can eliminate the risk.

We could reduce the risk by altering the design, introducing sort of failsafe features, or making the design crashworthy, or whatever it might be. We could add engineered features or devices to reduce risk: safety features like seatbelts in cars or airbags, roll bars, crash survivable cages around the people, whatever it might be. We can provide warning devices to say ‘Something’s going wrong here, and you need to pull over’ or whatever it is you need to do. ‘Watch out!’ because the system is failing and maybe ‘Your brakes are failing. You’ve got low brake fluid. Time to pull over now before it gets worse!’.

And then finally, the least effective precautions or mitigations: signage, warning signs, because nobody reads warning signs, sadly. Procedures. Good, if they’re followed. Again, very often people don’t follow them. They cut corners. We train people. Again, they don’t always listen to the training or carry it out. And we provide PPE. That’s personal protective equipment. And again, PPE is great if you enforce it. For example, I live in Australia. If you cycle in Australia, if you ride a bicycle, it’s the law that you wear a bike helmet. Most people obey the law because they don’t want to get a $300 fine or whatever it is if the cops catch you, but you still see people around who don’t wear one. Presumably, because they think they’re bulletproof, and it will never happen to them.

PPE is fine if it’s useful. But of course, sometimes PPE can make a job so much harder that people discard it. We really need to think about designing a job to make it easy to do, if we’re going to ask people to wear awkward PPE. Also, by the way, we need to not ask them to wear PPE for trivial reasons just so that the managers can cover their backsides. If you ask people to wear PPE when they’re doing trivial jobs where they don’t need it then it brings the system into disrepute. And then people end up not wearing PPE for jobs where they really should be wearing it. You can over-specify safety and lose goodwill amongst your workers if you’re not careful.

Now those risk mitigation priorities, that’s the one in this standard, but you will see an order of precedence like that in the law in many different countries. It’s the law in Australia. It’s the law in the UK, for example, expressed slightly differently. It’s in lots of different standards for good reason, because we want to design out the risks. We want to reduce them in the design because that’s more effective than trying to bolt on or stick on safety afterwards. And that’s another reason why we want to get in early in a project and think about our hazards and our risks early on. Because it’s cheaper at an early stage to say, ‘We will insist on certain things in the design. We will change the requirements to favour a design that is inherently safe.’
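One way to keep the order of precedence visible when comparing candidate mitigations is to give each one a rank and sort on it. This is only a sketch: the enum member names paraphrase the five levels described above, and the candidate mitigations are made-up examples.

```python
from enum import IntEnum


class Precedence(IntEnum):
    """Lower value means more effective, so it should be considered first."""
    ELIMINATE_BY_DESIGN = 1
    REDUCE_BY_DESIGN_CHANGE = 2
    ENGINEERED_FEATURES = 3
    WARNING_DEVICES = 4
    SIGNAGE_PROCEDURES_TRAINING_PPE = 5


def rank(options: list[tuple[str, Precedence]]) -> list[str]:
    """Return mitigation names ordered from most to least preferred."""
    return [name for name, level in sorted(options, key=lambda option: option[1])]


# Hypothetical candidates for a vehicle fire hazard.
candidates = [
    ("Warning sign at filler cap", Precedence.SIGNAGE_PROCEDURES_TRAINING_PPE),
    ("Low brake fluid warning light", Precedence.WARNING_DEVICES),
    ("Replace petrol engine with electric drive", Precedence.ELIMINATE_BY_DESIGN),
    ("Fit roll bars and seatbelts", Precedence.ENGINEERED_FEATURES),
]
ordered = rank(candidates)
```

Sorting does not make the decision for you, but it does make it obvious when a project is reaching for signs and PPE before it has looked at the design.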

Contracting

We only get these things if we contract for them. The model in 882, the assumption, is it’s a government somewhere contracting a contractor to do stuff. But it doesn’t have to be a government, it can be any client or purchaser or authority or end-user asking for something, buying something, contracting something, be it the physical system, or service, or whatever it might be. The assumption is that the client issues a request for proposal.

Right at the start, they say ‘I want a gizmo’. Or ‘I don’t even want to specify that I want a gizmo. I want something that will do this job. I don’t care what it is. Give me something that will do this job.’ But even at that early stage, we should be asking for preliminary hazard analysis (PHA) to be done. We should be saying, ‘Well, who?’ ‘Which specialists?’ ‘Which functional disciplines need to be involved?’. We need to specify the data that we require and the format that it’s in, considering especially the tracking system, which is Task 106. If we’re going to get data from lots of different people, best we get it in a standardized format so we can put it all together. We want to insist that they identify hazards, hazardous locations, etc. We want to insist on getting technical data on non-developmental items, either getting it for the client, or the client supplies it and says to the contractor, or whoever is doing it, ‘This is the information that I’m going to supply you and you will use it.’ We need to supply the concept of operations and of course, the operating environment. Let me just check. No, that’s it. We’ve only got one slide on contracting. It doesn’t say the environment, but we do need to specify that as well, and hopefully, that should be in the concept of operations, along with any specific hazard management requirements. For example, what matrix are we going to use? What is a suitable matrix to use for this system?

Now to do all of this, the purchaser, the client really probably needs to have done Task 202 and 201 themselves, and they’ve done some thinking about all of this in order to say, ‘With this system, we can envisage- with this kind of requirement, we can envisage these risks might be applicable.’ And ‘We think that the risks might be large or small’ depending on what the system is or ‘We think that-’. Let’s say if you purchase a jet fighter, jet fighters because of that demand, the overwhelming demand for performance, they tend to be a bit riskier than airliners. They fall out of the sky more often. But the advantage is that there are normally only one or two people on board. And jet fighters tend to fly a lot of the time in the middle of nowhere. You’re likely to hurt relatively few people, but it happens more often.

Whereas if you’re buying an airliner, something you can shove a couple of hundred people in at one go, those fall out of the sky much less frequently, thank goodness, but when they do, lots of people get hurt. A different approach to risk might be appropriate for different types of system. And that’s what you should be thinking about early on, if you’re the client, if you’re the purchaser. You should have done some analysis to enable you to write a good request for proposal, because if you write a bad request for proposal, it’s very difficult to recover the situation afterwards because you start at a disadvantage. And the only way often to fix it is to reissue the RFP and start again. And of course, nobody wants to do that because it’s expensive and it wastes a lot of time. And it’s very embarrassing. It is a career-limiting thing to do, for a lot of people. You do need to do some work upfront in order to get your RFP correct. That’s what it says in the standard.

Commentary

I want to add a couple of comments, I’m not going to say that much. First, it’s a little line from a poem by Kipling that I find very, very helpful. And Kipling used to be a journalist and it was his job to go out and find out what the story was and report it. And to do that he used his six honest serving men. He asked ‘What?’ and ‘Why?’ and ‘When?’ and ‘How?’ and ‘Where?’ and ‘Who?’. Those are all good questions to ask. If you can ask all those questions and get definite answers, you’re doing well. And a little tip here: as a consultant, I rock up and one of the tricks of the trade I use is I turn up as the ‘dumb consultant’ (I always pretend to be a bit dumber than I really am) and I ask these stupid questions. And I ask the same questions to several different people. And if I get the same answer to the same question from everyone, I’m happy. But that doesn’t always happen. If you start getting very different answers to the same question from different people, then you think, ‘Okay, I need to do some more digging here’. And it’s the same with hazard analysis. Ask the what, why, when, how, where and who questions.

Another issue, of course, is ‘How much?’ ‘How much is this going to take?’ ‘How long is this going to take?’ ‘How many people am I going to have to invite to this meeting?’, etc. And that’s difficult. And really, the only way to answer these questions properly is to just do some PHI and PHA early and to learn from the results. The other alternative, which we are really good at as human beings, is to ask the questions early, to get answers that we don’t really like, and then just to sweep them under the carpet and not ask those questions ever again because we’re frightened of the answers that we might get. However frightened you are of the answer you might get, do ask the question, because forewarned is forearmed. And if you know about a problem, you can do something about it. Even if that something is to rewrite your CV and start looking for another job. Do ask the questions even if it makes people uncomfortable. And I guess learning how to ask the questions without making people uncomfortable is one of the tricks that we must learn as safety engineers and consultants. And that’s an important part of the job. These are the soft skills that you can only learn through practice, really, and observing people.

What’s the way to do it? Well, I’ve said this several times, but do your PHI and PHA early. Do it as early as possible because it’s cheap to do it early. If you’re the only safety person, as you often are in the beginning, or maybe you’re a manager and safety is part of your portfolio and you’ve got other responsibilities as well, just sit down one day and ask these dumb questions. Go through the checklist in Task 202 and say, ‘Do I have these things in my system?’

If you know for sure you’re not going to have explosive ordnance, or radiation, or whatever it might be, you can go, ‘Great. I can cross those off the list’. I can make an assumption or I can put a constraint in, by the way, if you really want to do it well and say ‘We will have no explosive devices’, ‘We will have no energetic materials.’, ‘We will have no radiation’ or whatever it might be. Make sure that you insist that you’ll have none of it then you can hopefully move on and never have to deal with those issues again.

Do the analysis early, but expect to repeat it because things change, and you learn more and more information comes in. But of course, the further you go down the project, the more expensive everything gets. Now, having said do it, do it early, the Catch 22 is very often people think ‘How can I analyse when I don’t have a design?’

The ‘Catch-22’ question is what comes first, design or analysis? Now, the truth is that you could do an analysis of very simple functions. You don’t need any design at all. You don’t even need to know what kind of vehicle or what kind of system you might be dealing with. But of course, that will only take you so far. And it may be that you want to do early analysis, but for whatever reason, [Intellectual Property Rights] IPR or whatever it might be, you can’t get access to data.

What do you do? You can’t get access to data about your system or the system that you’re replacing. What do you do? Well, one of the things you can do is you can borrow an idea from the logistics people. In Logistic Support Analysis, Task 203 uses a Baseline Comparison System. Imagine that you’re going to have a new system, maybe it’s replacing an old system, but maybe it does a lot more than the old system used to do. Just looking at the old system isn’t going to give you the full picture. Maybe what you need to do is make up an imaginary comparison system. You take the old system and say, ‘Well, I’m adding all this extra functionality’. Maybe with the old system, we just bought the vehicle. We didn’t buy the support system, we didn’t buy the weapons, we didn’t buy the training, whatever it might be. But, this time around, we’re buying the complete package. We’re going to have all this extra stuff that probably has hazards associated with it, and just doing lessons learned from the previous system will not be enough.

Maybe you need to construct an imaginary Baseline Comparison System and go, ‘I’ll borrow bits from all these other systems, put them all together, and then try and learn from that sort of composite system that I’ve invented, even though it’s imaginary.’ That can be a very powerful technique. You may get told, ‘Oh, we haven’t got the money’ or ‘We haven’t got the time to do that’. But to be honest, if there’s no other way of doing effective, early analysis, then spend the money and do it early. Because many times I’ve seen people go, ‘Oh, we haven’t got time to do that’. They’ve never got time to do it properly and therefore, you end up doing it. You go around the buoy two or three times. You do it badly. You do it again slightly less badly. You do it a third time. And it’s sort of barely adequate. And then you move forward. Well, you’ve wasted an awful lot of time and money and held up other people, the rest of the project doing that. Probably it’s better off to spend the money and just get on with it. And then you’re informed going forwards before you start to spend serious money elsewhere on the project.
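To make the Baseline Comparison System idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the system names, functions and hazards are all made up for the example, and `build_baseline` is a hypothetical helper, not something from the standard. It borrows known hazards from several donor systems for each function of the new system and flags the functions where there is nothing to borrow.

```python
# Hypothetical sketch: assemble an imaginary Baseline Comparison System
# by borrowing recorded hazards from several existing systems.
# All data below is invented for illustration.

legacy_vehicle = {"mobility": ["rollover", "collision"]}
other_support_system = {"refuelling": ["fuel fire"], "maintenance": ["crush injury"]}
other_training_system = {"training": ["simulator sickness"]}


def build_baseline(new_functions, *donor_systems):
    """Return (hazards per function, functions with no comparison data)."""
    baseline, gaps = {}, []
    for function in new_functions:
        hazards = []
        for donor in donor_systems:
            hazards.extend(donor.get(function, []))
        if hazards:
            # De-duplicate and sort so the composite list is stable.
            baseline[function] = sorted(set(hazards))
        else:
            gaps.append(function)
    return baseline, gaps


new_system_functions = ["mobility", "refuelling", "maintenance", "training", "weapons"]
baseline, gaps = build_baseline(
    new_system_functions, legacy_vehicle, other_support_system, other_training_system
)
print(baseline)
print("No comparison data for:", gaps)
```

The gaps list is the useful part: it shows where the composite system has nothing to learn from, so that is where a checklist or some simple analysis has to take over.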

Copyright Statement

Well, that’s it for me. Just one thing to say, that Mil. Standard 882E came out in 2012. Still going strong, unlikely to be replaced anytime soon. It’s copyright free. All the quotations are from the standard, they’re copyright free. But this video is copyright of The Safety Artisan 2020.

For More …

And you can find a lot more information, a lot more safety videos, at The Safety Artisan page at www.Patreon.com and you can find more resources at www.safetyartisan.com.

That is the end of the show. Thank you very much for listening. And it just remains for me to say: come and watch some more videos on Mil-Std-882E. There’s going to be a complete course on them, and you should be able to get, I hope, a lot of value out of the course. So, until I see you again, cheers.



Transcript: Preliminary Hazard List (T201)

Here is the full transcript: Preliminary Hazard List (Task 201 in Mil-Std-882E).

The full video is here.

Preliminary Hazard Identification

Hello, everyone, and welcome to the Safety Artisan, where you will find instructional materials that are professional, pragmatic and impartial because we don’t have anything to sell and we don’t have an axe to grind. Let’s look at what we’re doing today, which is Preliminary Hazard Identification. We are looking at one of the first actual analysis tasks in Mil-Std-882E, which is a systems safety engineering standard from the US government, and it’s typically used on military systems, but it does turn up elsewhere.

Preliminary Hazard ID is Task 201.

I’m recording this on the 2nd of February 2020. However, the Mil-Std has been in existence since May 2012 and it is still current. It looks like it is sticking around for quite a while, so this lesson isn’t likely to go out of date anytime soon.

Topics for this session

What we’re going to cover is, quoting from the task, first of all, we’re going to look at the purpose and the task description, where the task talks quite a lot about historical review (I think we’ve got three slides of that), recording results, putting stuff in contracts and then I’m adding some commentary of my own. I will be commenting all the way through, that’s the value add, that’s why I’m doing this, but then there’s some specific extra information that I think you will find helpful, should you need to implement Task 201. In this session, we’ve moved up one level from awareness and we are now looking at practice, at being equipped to actually perform safety jobs, to do safety tasks.

Preliminary Hazard Identification (T201)

The purpose of Task 201 is to compile a list of potential hazards early in development. Two things to note here: it is only a list, and it’s very preliminary. I’ll keep coming back to that, because it is important. Remember, this is the very first thing we do that’s an analytical task. There are planning tasks in the 100 series, but actually some of them depend on you doing Task 201 because you can’t work out how you are going to manage something until you’ve got some idea of what you’re dealing with. We’ll come back to that in later lessons.

It is a list of potential hazards that we’re after, and we’re trying to do it early in development. And I really can’t overemphasise how important it is to do these things early in development, because we need to do some work early on in order to set expectations, in order to set budgets, in order to set requirements and to basically get a grip, get some scope on what we think we might be doing for the rest of the program. This is a really important task and it should be done as early as possible, and it’s okay to do it several times. Because it’s an early task it should be quick, and it should be fairly cheap. We should be doing it just as soon as we can, when we’re at the conceptual stage and we don’t even have a proper set of requirements, and then maybe we redo it thereafter. And maybe different organisations will do it for themselves and pass the information on to others. And we’ll talk about that later as well.

The task description. It says the contractor shall – actually forget about who’s supposed to do it, lots of people could and should be doing this as part of their project management or program management risk reduction because, as I said, this is fundamental to what we’re doing for the rest of the safety program and indeed maybe the whole project itself. So, what we need to do is “examine the system shortly after the materiel solution analysis begins and compile a Preliminary Hazard List (PHL) identifying potential hazards inherent in the concept”. That’s what the standard actually says.

A couple of things to note here. Saying that you start doing it after materiel solution analysis has begun might be read as implying you don’t do it until after you finish doing the requirements, and I think that’s wrong, I think that’s far too late. To my mind, that is not the correct interpretation. Indeed, if we look at the last few words in the definition, it says we’re “identifying potential hazards inherent in the concept”. That, I think, gives us the correct steer. We’ve got a concept, maybe not even a full set of requirements, so what are the hazards associated with that concept, with that scope? And I think that’s a good way to look at it.

Historical Review

This task places a great deal of emphasis on review of historical documentation, and specifically on reviewing documentation for similar and legacy systems. That means an old system, a legacy system that we are maybe replacing with this system, but there might be other legacy systems around. We need to look at those systems. The assumption is that we actually have some data from similar and legacy systems. And that’s a key weakness really with this: we’re assuming that we can get hold of that data. But I’ll talk about the issues with that when I get to my commentary at the end.

We need to look at the following (and it says including but not limited to).

a) Mishap and incident reports. This is a US standard, so they talk about mishaps because they’re trying to avoid saying accidents, because that implies that something has gone wrong accidentally. Whereas the term mishap, I believe, is meant to imply that it might be accidental, it might be deliberate, whatever it might be, it doesn’t matter, something has gone wrong. An undesirable event has happened, it’s a mishap. We need to look at mishap and incident reports. Well, that’s great if you’ve got them and if they’re of good quality.

b) You need to look at hazard tracking systems. When the Mil-Std talks about hazard tracking systems it is referring to what you and I might describe as a hazard log or a risk register. It doesn’t really matter what they’re called: where are you storing information about your hazards? And indeed, the tracking implies that they are live hazards, in other words, associated with a live system where things are dynamic and changing. But don’t worry about that. We should be looking in our hazard logs, in our risk registers, that kind of thing.

c) Can we look at lessons learned? Fantastic, again, if we’ve got them. But unfortunately, learning lessons can be a somewhat political exercise, so it doesn’t always happen.

d) We need to look at previous safety analysis and assessments. That’s fantastic. If we’ve got stuff that’s even halfway relevant, maybe we could use it and save ourselves a lot of time and trouble. Or maybe we could look at what’s around and go, actually, I think that’s not suitable because…, and then even that gives you a steer to say, we need to avoid what’s gone wrong with the previous set of analysis. But hopefully without just throwing them out and dismissing them out of hand, because that’s far too easy to do (not invented here, I didn’t do it, therefore it’s no good). Human pride is a dangerous thing.

e) It says health hazard information. Maybe there are some medical results, some toxicology, maybe we’ll be tracking the exposure of people to certain toxins in similar systems. What can we learn from that?

f) And test documentation. Let’s look at these legacy systems. What went right, what didn’t go right and what had to be done about it. These are all useful sources of information.

g) And then that list continues. Mil-Std 882 includes environmental impact; safety and environmental impact is implicit all the way through the standard. We also need to look at environmental issues, thinking about system testing, training, where it’s going to be deployed and maintenance at different levels. And we talk about potential locations for these things because often environmental issues are location-sensitive. Doing a particular task in the middle of nowhere in a desert, for example, might be completely harmless. Doing it next to a significant watercourse, which is near a Ramsar Wetland (an environment of international importance) or an area of outstanding natural beauty or a national park, something like that, might have very different implications. It’s always location-sensitive with environmental stuff.

h) And being an American publication, it goes on to give a specific example: the National Environmental Policy Act (NEPA), which is in the U.S., and then similarly there is an executive order looking at actions by the federal government when abroad and how the federal government should manage that. Now, those are U.S. examples. If you’re not in the U.S. there’s probably a local equivalent of these things. I live and work in Australia, where we have the Environment Protection and Biodiversity Conservation (EPBC) Act. It doesn’t just apply in Australia, it also applies to what the Commonwealth Government does abroad as well. Outside the normal Australian jurisdiction, it does apply.

i) And then finally, we’ve got to think about disposing of the kit. Demilitarisation: maybe we’re going to take out the old military stuff and flog it to somebody, so we need to think about the safety and environmental impacts of doing that. Or maybe we’re just going to dispose of the kit, whatever it might be: we’re going to scrap it or destroy it or put it away somewhere, store it in the desert somewhere for a rainy day, if that’s not a contradiction in terms. We’re going to think about the disposal of it as well, and what are the safety and environmental implications of doing so? There’s a good, broad checklist here to help us think about different issues.

Recording Results

It says that whoever is doing this stuff, the contractor, shall document identified hazards in the hazard tracking system: the hazard log, the risk register, whatever you want to call it. And the content of this recording and the formats to be used have got to be agreed between, it says, the contractor and the program office, but generally the purchaser and whoever is doing the work. The purchaser might also be the ultimate end-user, as is often the case with the government, or it might be someone else. Again, it might be that the purchaser will sell on to an end-user, but they’ve got to agree what they’re going to do with the contractor.

And of course, in doing so, you’ve got to understand what your legal obligations are. Again, for example, in Australia, the WHS Act puts particular obligations on designers, manufacturers, suppliers, importers, etc. There are three duties and two of them are associated with passing on information to the end-user. Be aware of what your obligations are and the kind of information that at minimum you must provide, and make sure that you’re going to get that minimum information in a usable format, and maybe some other stuff as well that you might need. And it says unless specified elsewhere, in other words, by agreement with the government or whoever is the purchaser, you’ve got to have a brief description of the hazard and the causal factors associated with each identified hazard.

Now this is beginning to get away from just a pure list, isn’t it? It’s not just a list: we have to have a description so that we can scope out the hazard that we’re talking about. Bear in mind, early on we might identify a lot of hazards that subsequently turn out to be just one hazard, or are not applicable, or are covered by something else. We need a description that allows us to understand the boundaries of what we’re talking about. And then we’re also being asked to identify causes or causal factors: maybe circumstances, what could cause these things, etc. It’s a little bit more than just a list, and we’re beginning to fill in the fields in the hazard log as we do this at the start.
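As a rough illustration of what that minimum record might look like, here is a small Python sketch. The standard only asks for a brief description and the causal factors; the `hazard_id` and `status` fields, the class name `PhlEntry` and the helper `add_entry` are assumptions added for the example, not anything the Mil-Std prescribes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhlEntry:
    """One Preliminary Hazard List entry (hypothetical layout)."""
    hazard_id: str                      # assumed field, for cross-referencing
    description: str                    # brief description that scopes the hazard
    causal_factors: List[str] = field(default_factory=list)
    status: str = "identified"          # assumed field; nothing is closed at this stage


def add_entry(log, entry):
    """Add an entry only if it has both a description and causal factors."""
    if not entry.description.strip():
        raise ValueError("a brief description of the hazard is required")
    if not entry.causal_factors:
        raise ValueError("at least one causal factor is required")
    log.append(entry)


hazard_log = []
add_entry(
    hazard_log,
    PhlEntry(
        hazard_id="H-001",
        description="Fuel spill during refuelling at a forward location",
        causal_factors=["hose coupling failure", "overfilling"],
    ),
)
print(hazard_log[0])
```

Keeping the description and causal factors as mandatory fields is the point of the sketch: an entry with a title and nothing else is just a list item, and it cannot be scoped or merged with its duplicates later.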

Contracting

Now, this is very useful: in the standard, for every task it says here are the details to be specified in the contractual documentation, and notice it says details to be specified in the Request for Proposal. You’ve got to ask for this stuff if you need it. You’ve got to know that you need it and why you need it and what you’re going to do with the information as purchaser. And you’ve got to put that in right at the start in the Request for Proposal and the Statement of Work. And here’s some guidance on what to include.

The big point here is this needs to be done very early on. In fact, to be honest, the purchaser is going to have to do Task 201 themselves, and maybe some other tasks, in order to get enough data and enough understanding to write the Request for Proposal and the Statement of Work in the first place. You do it yourself, maybe as a quick job to inform your contracting strategy and what you’re going to do, and then you get the contractor to do it as well.

What have we got to include? Well, we’ve got to impose Task 201. I’ve seen lots of contracts where they just say, ah, do safety, do safety in accordance with this standard, do Mil-Std 882 or whatever it might be. And a very broad open-ended statement like that is vulnerable to interpretation, because what your contractors, your tenderers, will do in order to come in at the minimum price and try and be competitive is tailor the Mil-Std. They will chop out things that they think are unnecessary, or that they can get away without doing, and they might chop out some stuff that you find you actually need. That can cause problems. But even worse, if you’ve got a contractor who doesn’t understand how to do system safety engineering, who doesn’t understand Mil-Std 882, they might just blindly say, oh, yeah, we’ll do that. The classic mistake is that the contract says do Mil-Std 882E and here are all the DIDs, the Data Item Descriptions, which describe what’s got to be in the various documents that the contractor has to provide. And of course, government projects love having lots of documentation, whether it’s actually helpful or not.

But the danger with this is that it can mislead the contractor, because if they don’t understand what a system safety program is, they might just go, I’ve got to produce all these documents, yeah, I can do that, and not actually realise that they’ve got to do quite a lot of analysis work in order to generate the content for those reports. And I know that sounds daft, but it does happen, I’ve seen it again and again. You get a contractor who produces these reports that on paper have met the requirements of the DID, because they’ve got all the right headings, all the right columns or whatever else. But they’re full of garbage information or TBD or stuff that is obviously rubbish. And you think, no, no, you actually have to specify that you need to do the task, and the documentation is the result of the task. We don’t want the tail wagging the dog. Anyway, I’ll get off my soapbox. You’ve got to impose the task: it’s a job to be done, not just a piece of paper to be produced.

Identification of the functional disciplines to be addressed. Who’s going to be involved? What are you including? Are you including engineering, maintenance, human factors? Ideally, you want quite a wide involvement, you want lots of stakeholders, and you need to think about that.

Guidance on obtaining access to government information. Now, whether it’s the government or whoever the purchaser is, it doesn’t have to be a government, getting a hold of information and guidance out of the purchaser can be very difficult. And very often that’s because the purchaser hasn’t done their homework. They haven’t worked out what information they will need to provide because maybe they don’t understand the demands of the task or they’ve just not thought it through, quite frankly. And the contractor or whoever is trying to do the analysis finds that they are hamstrung, they can’t actually do the work without information being provided by the purchaser.

And that means the contractor can’t do the work, and then they just pass the risk straight back to the government, back to the purchaser, and say: I need this stuff. And then the purchaser ends up having to generate information very quickly at short notice, which is never good, you never get a quality result doing that. As a consultant, I get called in by the purchaser as often as I do by the supplier to say: help, we don’t know what’s going on here, the contractor has said I can’t do the safety program without this information and I don’t understand what they want or what to tell them. I find myself spending a lot of time providing this kind of expertise because either the purchaser or the contractor doesn’t understand their obligations and hasn’t fulfilled them. Which is great for me, my firm gets paid a lot of money. It’s not good for the safety program.

Content and format requirements. Yes, we need to specify the content that we need. I say need, not want. What are we going to do with this stuff? If we’re not going to do anything with it, do we actually need it at all? And what are the format requirements? Because maybe we need to take information from lots of different subcontractors and put it all together in a consistent risk register. If it comes in all different formats, that’s going to make a lot more work and it may even make merging the information impossible. We need to think about that.
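Here is a minimal sketch of why an agreed format matters when you merge inputs. Everything in it is assumed for illustration: the two subcontractors, their field names, the common field list and the `normalise` helper. The idea is simply that each supplier’s fields are mapped onto one agreed set, and anything that cannot supply the agreed minimum is rejected rather than silently merged.

```python
# Hypothetical sketch: merge hazard records from different subcontractors
# into one consistent register. All names and data are invented.

COMMON_FIELDS = ("hazard_id", "description", "causes")

# Assumed mapping from each supplier's own field names to the agreed ones.
FIELD_MAPS = {
    "subcontractor_a": {"id": "hazard_id", "title": "description", "causes": "causes"},
    "subcontractor_b": {"ref": "hazard_id", "hazard": "description", "causal_factors": "causes"},
}


def normalise(source, record):
    """Translate one supplier record into the agreed common format."""
    mapping = FIELD_MAPS[source]
    merged = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = [name for name in COMMON_FIELDS if name not in merged]
    if missing:
        raise ValueError(f"{source} record is missing agreed fields: {missing}")
    merged["source"] = source
    return merged


register = [
    normalise("subcontractor_a", {"id": "A-01", "title": "Battery fire", "causes": ["overcharge"]}),
    normalise("subcontractor_b", {"ref": "B-07", "hazard": "Battery fire", "causal_factors": ["cell damage"]}),
]
for row in register:
    print(row)
```

If the format is agreed in the Request for Proposal and the Statement of Work, the mapping table is trivial. If it is not, somebody ends up writing and maintaining one of these for every supplier, and some records may never map cleanly at all.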

Now, what’s the concept of operations? We’ll come back to that in later tasks. But the concept of operations is, what are we going to do with this system? That should provide the operating environment. It should provide an overview of some basic requirements, maybe how the system will interface with other systems, how it will interact, and the concepts of operation, deployment, basing and maintenance. And maybe they’re only assumptions at this stage, but the people doing the analysis will need this stuff. You recall the environmental stuff is very location-sensitive, so we need a stab at where these things will happen, and we need to understand what the system is going to be used for, because in safety, context is everything. A system that might be perfectly safe in one context can become very dangerous, without anybody realising, if it’s being used for something other than what it was originally designed or conceived for.

Other specific hazard management requirements. What definitions are we using? Very important, because again, it’s very easy to get different information that’s being generated against different definitions by different contractors. And then it’s utter confusion. Can we compare like with like, or can’t we? What risk matrix are we going to use on this program? What normally happens on 882 programs is people just take the risk matrix out of the standard and use it without changing it. Now, that might be appropriate in certain circumstances, but it isn’t always. That’s a very complex, high-level management issue, and I’m going to talk about it separately: how do we actually derive a suitable risk matrix for our purposes, and why should we do so? Because the use of an unsuitable matrix can cause all sorts of problems downstream, both conceptual problems in the way that we think about stuff and lower-level, sort of mechanistic problems. But I don’t have time to go into that here.

Then references and sources of hazard identification. This is another reason why the purchaser needs to have done their homework. Maybe we want the contractor or whoever is doing the analysis to look at particular sources of information that we consider to be relevant and necessary to consider. We need to specify that and understand what they are. And usually, we need to understand why we want them as well.

Commentary

That’s what was in the standard. As you see, it’s very short, it’s only a page and a half in the standard, and it is quite a light, high-level definition of the task because it’s an early task. Now let’s add some value here. Task 201 talks all about historical data. However, that is not the only way to do preliminary hazard identification. There are in fact two other classic methods to do PHI. One is the use of hazard checklists, and you can also use some simple analysis techniques. And we need to remember that this is preliminary hazard identification: we’re doing this early and often to identify as many hazards as possible, to find those hazards and the associated causes, consequences, maybe some controls as well. We’re trying to find stuff, not dismiss it or close out the hazards. And again, I’ve seen projects where I’ve read a preliminary hazard identification report and it says, we closed 50 hazards, and I think, no, you didn’t, you weren’t supposed to close anything because this is preliminary hazard identification. You identify stuff and then it gets further analysed. And if upon analysis you discover that actually this hazard is not relevant, it cannot possibly happen, then, and only then, can you close it. Let’s remember, this is preliminary hazard ID.

Commentary – Historical Data

First of all, let’s look at historical data, and at some issues with using it. The first is availability. Can we actually get hold of it? Now, it may be that you work for a big corporate or government organisation that for whatever reason has good record keeping, and you’ve got lots and lots of internal data that is of good quality, that you’re allowed to access, and that you know about and can find or discover. If you are one of those people, you are very, very lucky and you are in a minority, in my experience. If you’ve got all that stuff, fantastic, use it. But if you haven’t, or if the information is of poor quality, or people won’t give you access for whatever reason, you may have to go out to external sources. And there are all sorts of reasons why people want to conceal information: they’re frightened of what people may discover, especially safety engineers.

Now, the good news is that in the age of the Internet, getting hold of external data is extremely easy. There are lots of potential sources of data out there, and it may range from stuff on Wikipedia to public reporting of accidents and incidents by regulators, by trade associations, by learned societies that study these things, by academics, or by consultancies such as the one that I work for. There are all kinds of potential sources of information out there that might be relevant to what you’re doing. And even if you’ve got good internal information, it’s probably worth searching out there for what’s external as a due diligence exercise, if nothing else, just to show that you haven’t just looked inwardly, that you’ve actually looked outwardly at the rest of the world. There are lots of good sources of information out there. And depending on what industry you’re in, what domain you work in, you will probably know some of the things that are relevant in your area.

Now, just because data is available doesn’t mean that it’s reliable. It might be vague or inconsistent. We’ll come onto that later. It might be patchy. It’s usual for incidents to be underreported, especially minor incidents. You will often find that the stuff that gets reported is only the more serious stuff, and you should really assume that there has been underreporting unless you’ve got a good reason not to. To be honest, underreporting is the norm almost everywhere. That’s the issue of reliability: the data that you’ve got will be incomplete.

Secondly, another big issue is consistency. People might be reporting mishaps or incidents or accidents or events or occurrences. They might be using all sorts of different terminology to describe stuff that may or may not be relevant to what you’re talking about. And there’s lots of information out there, but actually, how has it been classified? Is it consistent? Can you compare all these different sources of information? And that can be quite tricky. And very often because of inconsistencies in the definition of a serious injury, for example, you may find that all you can actually compare with confidence are fatalities, because it’s difficult to interpret death in different ways. As a safety engineer, I frequently find myself starting with fatal accidents, if there are any, because those can’t be misinterpreted. And then you start looking at serious injuries, minor injuries, and incidents where no one gets hurt, but somebody could have been. There are all sorts of pitfalls with the consistency of the data that you might get hold of.
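As a small illustration of that consistency problem, here is a hedged Python sketch. The two sources, their terminology and the mapping are all invented for the example. Each source’s own terms are mapped to a common category, and terms whose definitions differ too much are deliberately left unmapped, so only the categories you can compare with confidence get counted.

```python
from collections import Counter

# Hypothetical mapping from each source's own terms to a common category.
# None marks terms whose definitions differ too much to compare with confidence.
COMMON_SEVERITY = {
    "regulator": {"fatal accident": "fatal", "serious injury": None, "occurrence": None},
    "trade_association": {"fatality": "fatal", "major injury": None, "near miss": None},
}


def comparable_counts(reports):
    """Count only the reports whose severity maps to a common category."""
    counts = Counter()
    for source, term in reports:
        category = COMMON_SEVERITY.get(source, {}).get(term)
        if category is not None:
            counts[(source, category)] += 1
    return counts


# Invented example reports: (source, severity term as reported).
reports = [
    ("regulator", "fatal accident"),
    ("regulator", "serious injury"),
    ("trade_association", "fatality"),
    ("trade_association", "fatality"),
    ("trade_association", "near miss"),
]
counts = comparable_counts(reports)
print(counts)
```

Notice that the injuries and near misses are not thrown away; they just are not compared across sources. They are still perfectly good prompts for hazard identification within their own source.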

There’s relevance. It may be that you’re looking at data from a system that superficially looks similar to yours, but with a bit of digging, you may discover that although the system was similar, it’s being used in a completely different context and therefore there are significant differences in the reporting and what you’re seeing. There may be data that is out there, but just not relevant for whatever reason.

And finally, objectivity. Now, this is a two-way street. Historical data is fantastic for objectivity because it stops people saying subjectively, this couldn’t possibly happen. And I’ve heard this many times: you come up with something and somebody says, oh, that couldn’t possibly happen, and then you show them the historical evidence that says, well, it’s happened many times already, and then they have to eat their words. Historical data is fantastic for keeping things objective, provided of course that it’s available, reliable, consistent and relevant. You’ve got to do a bit of work to make sure that you’re getting good data, but if you can, it’s absolutely worth its weight in gold, not just for hazard ID, but for torpedoing some of the stupid things that people come out with when they’re trying to stop you doing your job for whatever reason. Historical data is great for shooting down prejudice, is basically what I’m saying. Reality always wins. That’s true in safety in the real world and in safety analysis.

Having said all that, what’s the applicability of historical data? It may be that really we can only use it for preliminary hazard identification and analysis. (I’ve just noticed I’ve got preliminary hazard identification and analysis.) Sometimes I see contractors try to use historical data to say, that’s the totality of my safety argument, my kit is wonderful, it never goes wrong and therefore it will never go wrong, that’s the totality of my safety argument. And that never works, because when you start trying to use historical data as the complete safety argument, you very quickly come up against these problems of availability, reliability, consistency and relevance.

It’s almost impossible to argue that a future system will be safe purely because it’s never gone wrong in the past. And in fact, claims such as, it’s never gone wrong, we’ve never had a problem, we’ve never had an incident, would straight away suggest to me that they don’t have a very good incident reporting system or that they’ve just conveniently ignored the information they do have. Not that people selling things ever do anything like that. Of course, no, never. There are a lot of used car salesmen out there. We probably have to keep this use of historical data fairly limited. It might be usable for preliminary work only. And then we have to do the real work with analysis. But almost certainly it’s not going to be the whole answer on its own. Do bear that in mind, historical data has its limits.

It’s also worth remembering that we get data from people as well. In Australia, the law requires managers to consult with workers in order to get this kind of information. No doubt in other countries there are similar obligations. There are lots of people out there, potentially workers, management, suppliers and users, maintainers, regulators, trade associations, lots of people who might have relevant information. We really ought to consult them if we can. Sometimes that information is published, but other times we have to go and talk to people or get them to come to a preliminary hazard ID meeting in order to take part. There are lots of good ways of doing this stuff.

Commentary – Hazard Checklists

Let’s move on to hazard checklists. Checklists are great because someone else has done the work for you, to a degree. It’s quick and cheap to get a checklist from somewhere and go through it to see if you can find anything that prompts you to go, yeah, that could be an issue with my system. And the great thing about checklists is they broaden the scope of your hazard ID, because if your historical data is a bit patchy or a bit inconsistent, as it often is, it will identify some stuff, but not everything. A checklist is very often broad and shallow, so it really broadens the scope of the hazard ID and complements your historical data. I would always recommend having a go with a checklist.

Now, bear in mind that checklists tend to identify causes. You then have to use some imagination to go, okay, here’s a cause: in the context of my system, in the context of this concept of operations (very important), how could that cause lead to a hazard and maybe to a mishap? You need to apply some imagination with your checklist, and it can be a good way of prompting a meeting of stakeholders to think about different issues, because people will turn up with an axe to grind, they’ll have their favourite thing they want to talk about. Having a checklist, or having historical data to review, keeps it objective, and it keeps people on track so that they don’t just go down a rabbit hole and never look at anything else.
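One simple way to picture that step is to cross each checklist cause with each context from the concept of operations and turn the pairs into questions for the meeting. The sketch below is only an illustration: the causes, the contexts and the `build_prompts` helper are made up for the example, not taken from any particular checklist.

```python
from itertools import product

# Invented example inputs: causes from a generic checklist and
# contexts taken from an assumed concept of operations.
checklist_causes = ["stored energy", "toxic substances"]
conops_contexts = ["operation", "maintenance", "disposal"]


def build_prompts(causes, contexts):
    """Pair every checklist cause with every context as a meeting prompt."""
    return [
        f"How could {cause} lead to a hazard during {context}?"
        for cause, context in product(causes, contexts)
    ]


prompts = build_prompts(checklist_causes, conops_contexts)
for question in prompts:
    print(question)
```

The code does none of the thinking, of course. It just makes sure every cause gets asked about in every context, so the meeting does not spend all its time on one person’s favourite topic.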

But again, this is preliminary hazard identification only. If something comes up, I would advise you to take the position that it could happen unless we have evidence that it could not. And notice, I say evidence, not opinion. I’ve met plenty of people who will swear blind that such and such could not possibly happen. A classic one that suckered me: somebody said no British pilot would ever be stupid enough to take off with that problem, and like a fool, I believed them. So don’t listen to opinion, however convincing it is, unless there’s evidence to say it cannot happen, because it will. And in that case, it did, two weeks later. Don’t believe people when they say, oh, that couldn’t possibly happen; it just shows a lack of imagination. Or they’ve got some vested interest and they’re trying to keep the peace and keep you away from something.

It’s worth mentioning that in Australia, at a minimum, we need to use the approach for Hazard I.D. that is in the WHS Risk Management Code of Practice. There’s some good basic advice in that code of practice on what to do to identify and analyse hazards and assess risks and manage them. It’s a good way to start, and in fact, there’s a bit of a hazard checklist in there as well. It’s not great; it’s workplace stuff mainly rather than design stuff or systems engineering stuff. But nevertheless, there’s some good stuff in there and that is the absolute bare minimum that we have to do in Australia. And there will probably be local equivalents wherever you are.

If you’re looking for a good example of a general checklist, look in the UK’s ASEMS, which is the MOD Acquisition Safety and Environmental Management System. In POSMS, which is the Project-Oriented Safety Management System, there is a safety management procedure, SMP04, which is PHI (preliminary hazard identification). And that’s got a checklist in there. It’s aimed at sort of big equipment, military equipment, but there’s a lot of interesting stuff in there that you could apply to almost anything. If you look online, you’ll probably find lots of checklists, both general checklists and specialist checklists for your area; maybe your trade association or whatever has a specialist checklist for the particular thing that you do. It’s always good to look up those things online and see if you can access them and use them. And as I say, using multiple techniques helps us to have confidence that we’ve got fairly complete coverage, which is something that we’re going to need later on. And dependent on your regulator, you might have to demonstrate that you’ve done a thorough job; using multiple techniques is a good way of doing that. I’ve already said checklists nicely complement historical data because they’ve got different weaknesses.

Commentary – Analysis Technique

A third technique, which again takes a different approach and complements the other two, is to use some kind of analysis technique to identify hazards. And there are lots of them out there. Again, I’m not going to go into them now in this session. I’m just going to give you one example, which is probably the simplest one I know, and therefore the most cost-effective. It’s probably a good idea to do it as a desktop exercise first and then get some stakeholders in and do it live with them, either using what you’ve prepared or keeping what you’ve prepared in your back pocket in case you need to get things going, if people are stumped and not sure what to do.

Now, the technique I’m going to talk about is called functional failure analysis (FFA). Really, all you do is take the basic top-level functions of whatever it is that you’re considering. You’ve got your concept of operations that says, I need a system to do X, Y, Z. You go, let’s look at X, Y and Z, and with each one of these functions, what happens if it doesn’t work when it’s supposed to work? What happens if it works when I don’t want it to? That’s the un-commanded function, or unwanted function, maybe. And then what if it happens, but it doesn’t happen completely correctly? What if it happens incorrectly? And there might be several different answers to that.

I’ll give you an example. Let’s assume that you’re Mr Mercedes and you’re inventing the horseless carriage, the automobile, the car. You say, this thing, it’s got a motor, I want it to start off, I want it to go and then I want it to stop. Take those really, really simple conceptual ideas. I want it to go, or I want it to start moving. What happens if it doesn’t? Well, nothing actually, from a safety point of view. The driver might be a bit frustrated, but it’s not going to hurt anybody. An un-commanded function: what if it goes when it’s not supposed to? Now that’s bad. Or maybe the vehicle will roll away downhill when it’s not supposed to. In that case we need a parking brake, a handbrake, so that it doesn’t do that, or we use chocks or something, or we restrain it.

Straight away, with something as simple and simplistic as this, you can begin to identify issues and say, we need to do something about that. This is a really powerful technique; you get a lot of bang for your buck. And then, of course, we could go on with the example. It’s a trivial example, but you can see potentially how powerful it is, providing you’re prepared to ask these open-ended questions and answer them imaginatively without closing your mind to different possibilities. So there’s an example of an analysis technique. And again, remember that this is preliminary hazard ID. If we’ve identified something that could happen, then it could happen unless we have evidence that it could not.
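To make the mechanics concrete, here is a minimal Python sketch of a blank FFA worksheet, assuming the three guide questions above are applied to every top-level function. The names `FfaRow`, `build_worksheet` and `could_happen` are hypothetical, and the functions listed are just the horseless-carriage example.

```python
from dataclasses import dataclass
from itertools import product

# The three guide questions from the talk, applied to every function.
FAILURE_MODES = (
    "does not happen when required",
    "happens when not required (un-commanded)",
    "happens incorrectly",
)


@dataclass
class FfaRow:
    function: str
    failure_mode: str
    effect: str = "TBD"         # filled in by the analyst or the workshop
    could_happen: bool = True   # default position until evidence says otherwise


def build_worksheet(functions):
    """Return one blank worksheet row per (function, failure mode) pair."""
    return [FfaRow(f, mode) for f, mode in product(functions, FAILURE_MODES)]


# Top-level functions for the horseless-carriage example.
worksheet = build_worksheet(["start moving", "keep moving", "stop"])
for row in worksheet:
    print(f"{row.function:<13} | {row.failure_mode:<41} | {row.effect}")
```

Three functions and three questions give nine rows to discuss. Each row starts with `could_happen` set to true, which mirrors the advice that something could happen unless there is evidence that it could not.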

Signing Off

I’ve talked for long enough. It just remains for me to point out that the quotations from the Mil-Std are copyright free, but this video is copyright of The Safety Artisan 2020. You can find more safety information, more lessons and more safety resources at my Safety Artisan page on Patreon and also at www.safetyartisan.com. That’s the end of the lesson. Thank you very much for listening, and I hope you’ve found today’s session useful. Goodbye.

Professional | Pragmatic | Impartial

Mil-Std-882E Preliminary Hazard List (T201) & Analysis (T202)

This is Mil-Std-882E Preliminary Hazard List & Analysis.

The 200-series tasks fall into several natural groups. Tasks 201 and 202 address the generation of a Preliminary Hazard List and the conduct of Preliminary Hazard Analysis, respectively.

TASK 201 PRELIMINARY HAZARD LIST

201.1 Purpose. Task 201 is to compile a list of potential hazards early in development.

201.2 Task description. The contractor shall:

201.2.1 Examine the system shortly after the materiel solution analysis begins and compile a Preliminary Hazard List (PHL) identifying potential hazards inherent in the concept.

201.2.2 Review historical documentation on similar and legacy systems, including but not limited to:

  • a. Mishap and incident reports.
  • b. Hazard tracking systems.
  • c. Lessons learned.
  • d. Safety analyses and assessments.
  • e. Health hazard information.
  • f. Test documentation.
  • g. Environmental issues at potential locations for system testing, training, fielding/basing, and maintenance (organizational and depot).
  • h. Documentation associated with National Environmental Policy Act (NEPA) and Executive Order (EO) 12114, Environmental Effects Abroad of Major Federal Actions.
  • i. Demilitarization and disposal plans.

201.2.3 The contractor shall document identified hazards in the Hazard Tracking System (HTS). Contents and formats will be as agreed upon between the contractor and the Program Office. Unless otherwise specified in 201.3.d, minimum content shall include:

  • a. A brief description of the hazard.
  • b. The causal factor(s) for each identified hazard.
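As an illustration only, the minimum content in 201.2.3 could be captured in a record like the Python sketch below. The class name `PhlEntry`, its field names and the `missing_content` check are hypothetical; the actual contents and format are whatever the contractor and Program Office agree.

```python
from dataclasses import dataclass, field


@dataclass
class PhlEntry:
    """One Preliminary Hazard List entry holding the 201.2.3 minimum content."""
    description: str                                          # 201.2.3.a
    causal_factors: list[str] = field(default_factory=list)   # 201.2.3.b

    def missing_content(self):
        """Return the names of minimum-content fields that are still empty."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if not any(factor.strip() for factor in self.causal_factors):
            missing.append("causal_factors")
        return missing


# Hypothetical entry, reusing the roll-away example from the transcript.
entry = PhlEntry(
    description="Vehicle rolls away downhill when parked",
    causal_factors=["No parking brake applied"],
)
print(entry.missing_content())  # prints []
```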

201.3 Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:

  • a. Imposition of Task 201. (R)
  • b. Identification of functional discipline(s) to be addressed by this task. (R)
  • c. Guidance on obtaining access to Government documentation.
  • d. Content and format requirements for the PHL.
  • e. Concept of operations.
  • f. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.
  • g. References and sources of hazard identification.

TASK 202 PRELIMINARY HAZARD ANALYSIS

202.1 Purpose. Task 202 is to perform and document a Preliminary Hazard Analysis (PHA) to identify hazards, assess the initial risks, and identify potential mitigation measures.

202.2 Task description. The contractor shall perform and document a PHA to determine initial risk assessments of identified hazards. Hazards associated with the proposed design or function shall be evaluated for severity and probability based on the best available data, including mishap data (as accessible) from similar systems, legacy systems, and other lessons learned. Provisions, alternatives, and mitigation measures to eliminate hazards or reduce associated risk shall be included.

202.2.1 The contractor shall document the results of the PHA in the Hazard Tracking System (HTS).

202.2.2 The PHA shall identify hazards by considering the potential contribution to subsystem or system mishaps from:

  • a. System components.
  • b. Energy sources.
  • c. Ordnance.
  • d. Hazardous Materials (HAZMAT).
  • e. Interfaces and controls.
  • f. Interface considerations to other systems when in a network or System-of-Systems (SoS) architecture.
  • g. Material compatibilities.
  • h. Inadvertent activation.
  • i. Commercial-Off-the-Shelf (COTS), Government-Off-the-Shelf (GOTS), Non-Developmental Items (NDIs), and Government-Furnished Equipment (GFE).
  • j. Software, including software developed by other contractors or sources. Design criteria to control safety-significant software commands and responses (e.g., inadvertent command, failure to command, untimely command or responses, and inappropriate magnitude) shall be identified, and appropriate action shall be taken to incorporate these into the software (and related hardware) specifications.
  • k. Operating environment and constraints.
  • l. Procedures for operating, test, maintenance, built-in-test, diagnostics, emergencies, explosive ordnance render-safe and emergency disposal.
  • m. Modes.
  • n. Health hazards.
  • o. Environmental impacts.
  • p. Human factors engineering and human error analysis of operator functions, tasks, and requirements.
  • q. Life support requirements and safety implications in manned systems, including crash safety, egress, rescue, survival, and salvage.
  • r. Event-unique hazards.
  • s. Built infrastructure, real property installed equipment, and support equipment.
  • t. Malfunctions of the SoS, system, subsystems, components, or software.

202.2.3 For each identified hazard, the PHA shall include an initial risk assessment. The definitions in Tables I and II, and the Risk Assessment Codes (RACs) in Table III shall be used, unless tailored alternative definitions and/or a tailored matrix are formally approved in accordance with Department of Defense (DoD) Component policy.
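The lookup described in 202.2.3 can be sketched in Python as below. The severity numbers, probability letters and matrix cells are my transcription of the untailored Tables I to III. Treat them as an assumption to verify against the standard, or replace them with your program's approved tailored matrix. The function name `initial_risk` is hypothetical.

```python
# ASSUMPTION: values transcribed from the untailored Mil-Std-882E Tables I-III;
# verify against the standard or substitute an approved tailored matrix.
SEVERITIES = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Negligible"}
PROBABILITIES = {
    "A": "Frequent", "B": "Probable", "C": "Occasional",
    "D": "Remote", "E": "Improbable", "F": "Eliminated",
}
# Rows are probability levels; columns are severity categories 1 to 4.
RISK_MATRIX = {
    "A": ("High", "High", "Serious", "Medium"),
    "B": ("High", "High", "Serious", "Medium"),
    "C": ("High", "Serious", "Medium", "Low"),
    "D": ("Serious", "Medium", "Medium", "Low"),
    "E": ("Medium", "Medium", "Medium", "Low"),
}


def initial_risk(severity, probability):
    """Return (RAC, risk level) for a severity category and probability level."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity category: {severity!r}")
    if probability not in PROBABILITIES:
        raise ValueError(f"unknown probability level: {probability!r}")
    if probability == "F":
        # An eliminated hazard carries no residual risk at any severity.
        return "F", "Eliminated"
    return f"{severity}{probability}", RISK_MATRIX[probability][severity - 1]


print(initial_risk(1, "D"))  # prints ('1D', 'Serious')
```

Keeping the matrix as plain data makes it easy to swap in a tailored version without touching the lookup logic.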

202.2.4 For each identified hazard, the PHA shall identify potential risk mitigation measures using the system safety design order of precedence specified in 4.3.4.

202.3 Details to be specified. The Request for Proposal (RFP) and Statement of Work (SOW) shall include the following, as applicable:

  • a. Imposition of Task 202. (R)
  • b. Identification of functional discipline(s) to be addressed by this task. (R)
  • c. Special data elements, format, or data reporting requirements (consider Task 106, Hazard Tracking System).
  • d. Identification of hazards, hazardous areas, or other specific items to be examined or excluded.
  • e. Technical data on COTS, GOTS, NDIs, and GFE to enable the contractor to accomplish the defined task.
  • f. Concept of operations.
  • g. Other specific hazard management requirements, e.g., specific risk definitions and matrix to be used on this program.
