Risk Management Code of Practice

In this 40-minute session, we look at the Risk Management Code of Practice (CoP). We cover: who has WHS duties; the four-step process; keeping records, appendices & a summary of detailed requirements; and further commentary. This CoP is one of the two that are generally applicable.

The Risk Management Code of Practice (Demo of the full, 40-minute, video).

Risk Management Code of Practice: Topics

Risk Management Code of Practice (CoP):

  • Who has WHS duties;
  • The four-step process;
  • Keeping records, appendices & summary of detailed requirements;
  • Further commentary; and
  • Where to get more information.

Risk Management Code of Practice: Transcript

Hello, everyone, and welcome to the Safety Artisan. I’m Simon, your host, and today we’re going to be talking about the Risk Management Code of Practice.

It’s a code of practice that I’ve used myself. I’ve used it to guide my work and to guide other people to help them in their work. I’ve used it to simplify the whole practice of what we do, because once you know what you’re supposed to do, you can just do it and you don’t have to worry about working out what you need to do. It gives you everything you need to do, so you can do more if you want to, but you don’t have to. So, it makes life a lot easier and simpler. And then finally, you can use it to justify what you’ve done: that what you’ve done is correct, complete and enough. So, it’s very useful and that’s why I’m teaching it, because it makes life easier.

And I’m going to explain how to use it. You’ll still need to go away and read the Code of Practice to get all the details, as you’ll see, but I’m going to go through the leading particulars and explain how to use it. And then finally, at the end of the session, I’m going to show you where you can get more help on this topic and indeed other related topics, because this Code of Practice is one of several. This Risk Management Code of Practice is one that you really can’t do without. There is one other that you must refer to, and then the rest are optional, depending on whether you’re working in their respective areas. Anyway, let’s get on with it.

Code of Practice: Risk Management

So we’re talking about the Risk Management Code of Practice, which is under Australian Work Health and Safety Law. Now, if you’re not operating in Australia, this is not a requirement for you but nevertheless, it does contain some very useful guidance. And I’ve seen similar requirements in the US and in the UK, and I suspect all across the English-speaking world.

Topics for this Session

So, what we’re going to cover today. First of all, who has WHS duties because it’s a wider group of people than you might think it is. There’s the four-step process for actually doing risk management. And then I think we’ve got a slide each on keeping records, the appendices in the Code of Practice, and a summary of the detailed requirements in the Code of Practice. Then I’ve provided some further commentary and, as I’ve said before, where to get more information.

Who has WHS Duties?

So, first of all, who has WHS duties? Well, it’s kind of everybody. First of all, if you are a person conducting a business or undertaking, or a PCBU for short, then you have duties. And it says business or undertaking, so it includes voluntary groups, non-profit, government, military, you name it. It doesn’t have to be a commercial business. Then you have duties if you are a designer, manufacturer, importer or supplier, or if you install, test or commission plant, substances or structures. So again, a wide range of people.

And it’s not just about managing safety in a workplace. There are lots of upstream safety duties on duty holders like designers and manufacturers. Then finally, officers have additional duties. An officer is basically someone like a director of a company, at that sort of level: senior management with control over resources. They have to exercise due diligence, so there’s a bunch of requirements on them as well. And then, of course, there are the workers and any visitors. They’ve got to cooperate, take reasonable care of themselves and look out for each other, which is all very important.

And as it says, and this is a quote from the CoP, “A person can have more than one duty at the same time, and more than one person can share the same duty”. So, you can’t go playing tag, as it were, a sort of responsibility tag: ‘It wasn’t me. It was him, Governor!’ The court ultimately decides who is responsible.

A Four-Step Process

So, in our four-step process, first of all, we have to identify hazards. Step two, we have to assess the risks, so we need to look at causes and consequences. And the CoP doesn’t say this, but exposure comes into it as well. So, a risk might be present, but if nobody is exposed to that risk, then it can’t hurt them. That’s an important point to remember. Controlling exposure is important to one degree or another in almost all areas, but very important in certain industries: those that have got the real estate to be able to separate the risky thing from the human, which is very useful. Step three, we have to control risks. And then step four, we have to review control measures, because it’s recognized that these control measures will be in place for some time, for the lifetime of whatever it is we’re doing or undertaking. So, they need to be periodically reviewed and there’s guidance on that.

Now, I keep saying guidance. Take a look at the introduction to Codes of Practice and you will see why Codes of Practice are a bit more than guidance. They are guidance that you cannot afford to ignore because if things go wrong, you will get hung out to dry based on what the CoP said you should have done. So, if you are ignorant of what the CoP said and haven’t done it, then you’re stuffed basically before you even start. That’s point one to note.

And secondly, you’ll notice in the diagram on the left, we’ve got management commitment at the centre and we’ve got consultation all the way around. And there’s another Code of Practice, the Code of Practice on Consultation, Cooperation and Coordination. That’s the CC&C CoP, and it is the other CoP that is essential. So, this one and the CC&C CoP you must have a look at because they apply to everything in effect. Let’s move on.

Step 1, Identify Hazards

So, first of all, we need to identify hazards. Now, the CoP is written for any Australian business or undertaking, so it’s pretty basic. It’s pragmatic, but it’s basic and it’s got a workplace focus. So, it says inspect the workplace, look around, talk to your workers. Now, in my day job I work for a consultancy where we, generally speaking, are not looking at an existing workplace; we’re helping a customer buy or assure a complex product that’s going to come into service at some time in the future. So, there are no current workers to consult, but we always do try and include end-user representatives in our safety workshops. So, you may not be able to consult workers directly, but you should try and include people who have relevant work experience.

Secondly, the CoP tells us to use good work design and safe design. Now that’s a whole topic in itself and I’ve got some guidance on safe design. If you go to the Safety Artisan page on safe design (www.safetyartisan.com/welcome/safe-design), you will see it, and I’ll take you through the subject and refer you on to the source material itself.

Thirdly, we need to consult supply chains and networks. I think that works two ways. First of all, when you get people to supply you stuff, make sure that they supply the data that you need. The safety data, all the information that you need to take and use the product safely. And that’s part of the duty on all of these duty holders, on the designer, the manufacturer, the importer, the supplier. They all have duties to pass on the relevant safety information but make sure you ask for it in your contract. And secondly, suppliers, particularly if you’re buying an expensive piece of kit off them, suppliers can be an excellent source of information. If they’re the designers, then they know this kit better than anybody else. Make use of their expertise, contract them to do some work for you and take part of the load off you. They are best placed to do some of the work, so get them to do it.

And then fourthly, it says review available information. Now, this is very important. There’s historical information, or there should be, although it’s not always easy to come by. Do make the effort to get actual historical information for your piece of kit, maybe from the supplier. Or if you can’t do that, if it’s a new piece of kit, then try and get information on similar equipment, or services, or functionality, or go to a trade organization, or go to the regulator, depending on what domain you’re in. Do look around for historical information. It is out there. It can be hard to find, but it is worth the effort because, again, the guidance requires it. So, if you don’t bother or you’ve not made reasonable efforts to do so, you’ll get clobbered if things go wrong.

And then it’s also advisable to complement that historical information with diverse approaches. One of them is a hazard checklist approach, and we talk about that in the session on Preliminary Hazard Identification (PHI). There are lots of checklists freely available out there on the Internet. Some are general and some are more specific to different pieces of kit or different domains. Try and find the most relevant one for you and use it. And then maybe there are specific safety analysis techniques that you can use as well, so have a go at those. A lot of them are quite simple, so don’t be put off. You don’t necessarily have to get an expensive consultant in to do this for you; these techniques just require a bit of imagination and a little bit of self-discipline in the way you go about it. And I talk about analysis methods for hazard identification in that same PHI session.

So, that’s identifying hazards.

Step 2, Assess Risks

Step two, we need to assess the risks. So, if we recall, risk is a combination of likelihood and severity. So, how likely is it that harm could arise? And how severe is that harm? The way to do that, the CoP says, is to work out how hazards may cause harm. And as always, don’t be afraid to ask the dumb questions. That’s part of my job as a consultant: you’re allowed to turn up and ask dumb questions. Or maybe sensitive questions that nobody in the firm dares to ask because they think they’ll get fired. So, be brave and do try and work out how to ask the questions in a non-threatening way, but do ask the questions.

Work out how severe the harm could be. What is the worst credible consequence? And also, to keep it simple, what’s the worst direct consequence? Yes, you can come up with a fanciful chain of events that will lead to ‘it’s the end of the world as we know it’, but keep it direct would be my advice. At least to start with. It’s better to get a range of stuff than to work one scenario to the nth degree, I would suggest.

Then work out the likelihood of that harm occurring. Very often the most severe harm can only occur when there is a particular combination of circumstances. And if you read any kind of accident report, even in the press, you’ll very often see that this was happening, and it just so happened on this particular day that somebody wasn’t available to supervise, and then this went wrong and something else went wrong. And then the final result of this chain of events was that somebody got hurt. So, do factor in all of those things.

There are probably lots of existing controls already, unless you’re doing something very novel indeed, which is unusual. So, do look at what’s there and record it all. Conversely, do be aware of the ‘it will never happen’ brigade. I’ve met several people who say, ‘Oh, that will never happen’, or was it, ‘No British pilot would be stupid enough to do that. Ho, ho, ho.’ I was foolish enough to believe that. Anyway, that’s another story. So, don’t believe the people who say, ‘It can never happen’. I say, ‘OK, what’s the justification? Why can it never happen? Where’s the evidence for that claim?’ So, do dig into those responses.

There’s more detail in the Code of Practice. There are some good questions to ask in the workplace. And with a bit of imagination, you can take your imaginary piece of kit, think about it in the workplace and go, ‘Well, let’s think up a suitable question.’ So, there’s good guidance in there. Historical data can’t be beaten as a reality check, and it shuts up the naysayers as well, because you can pull out the information and say, ‘Well, this accident has happened, and it’s happened lots of times to lots of good people who thought they were clever’. So, do work hard to get the historical data. It’s fantastic if you can get it.

And then, as I said before, there are multiple specialist cause and consequence analysis techniques available. I talk about some of them in other posts that I’ve already done, and I will talk about more in the future. But you may not need that level of sophistication. It’s always better to do some good basic work as early as you can. Then maybe if you come up against something and say, ‘We’re not cracking this. We suspect there’s a problem, but we can’t be sure’, then think about bringing out the big guns. But if you’ve done the basic work first, that will really help you zero in on the areas where you think you need to do more work.
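To make the idea of risk as a combination of likelihood and severity concrete, here is a minimal sketch in Python. The two five-point scales, the rule of multiplying them together and the rating thresholds are all assumptions made for illustration. The CoP does not prescribe them, so substitute whatever scheme your organisation actually uses.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """Illustrative five-point likelihood scale (an assumption, not from the CoP)."""
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5


class Severity(IntEnum):
    """Illustrative five-point severity scale (an assumption, not from the CoP)."""
    FIRST_AID = 1
    MEDICAL_TREATMENT = 2
    LOST_TIME_INJURY = 3
    PERMANENT_DISABILITY = 4
    FATALITY = 5


def risk_score(likelihood: Likelihood, severity: Severity) -> int:
    """Combine likelihood and severity into one score (a simple product)."""
    return int(likelihood) * int(severity)


def risk_rating(score: int) -> str:
    """Map a score to a rating band. The thresholds are hypothetical."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Worst credible, direct consequence of a hypothetical hazard: a fall from height.
score = risk_score(Likelihood.UNLIKELY, Severity.FATALITY)
print(score, risk_rating(score))
```

A simple score like this is only a way of ranking hazards for attention. It does not replace the questions about causes, existing controls and historical data discussed in this step.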

Step 3, Control Risks

The third one, controlling risks. Really, this is what it’s all about because you can do all the analysis you like, but you don’t do analysis for the sake of it. You do analysis in order to inform your selection of risk controls. And we are required to use a hierarchy of control measures, and that’s a legal requirement in Australia. It’s also a requirement in other jurisdictions and in many other safety standards. It just may not be called this, but they will talk about more and less effective controls.

At the top of the control hierarchy, we’ve got the most effective control, which is to eliminate the risk entirely. And by that, I mean you get rid of it. Let’s say you’re working in an explosive atmosphere and you’ve decided you don’t want any electrical devices in that explosive atmosphere. So, if you need to have power for machinery, you’re going to do it with pneumatics, let’s say, or hydraulics. So, you’ve eliminated the electrical risk. Elimination does not mean massaging the probability figures to get them very low and then claiming you have eliminated the risk. You have not. You’ve just played games with probability figures. So first off, that’s what elimination really means.

At the second level, you’ve got three choices. We can substitute something hazardous with a safer alternative. I’ve mentioned getting rid of electricity entirely. You could say, ‘Well, I’ve got hydraulics, but they can burst and cause damage so I’ll have something else’. Or let’s say there was a particular lubricant, which is ideal, but actually it’s quite dangerous, so we’ll pick something safer. Maybe it doesn’t perform quite as well. Or a refrigerant, let’s say: an ideal refrigerant might be a potent greenhouse gas, so we go, ‘We’re going to have something else instead’.

You can isolate the hazard from people – I’ve spoken about that before. Some industries you’ve got a lot of real estate to play with. You can keep the hazard away from people. Or you can reduce the risk through engineering controls. And by engineering controls, I mean, you can build a safety feature or an interlock or something physically into the product. You’re not relying on a person to avoid the risk. It’s been done for them. It’s automatic or built-in.

At the third level, we can use admin controls. So we can give people procedures and rules and we can say, ‘Do this, don’t do that’. And most of the time they’ll probably do it and obey the rules, but sometimes they won’t. And sometimes for good reason, by the way, because people come up with ridiculous rules that can’t be obeyed or that make the task or the job so difficult that people break the rules all the time, because that’s the only way to get the job done effectively. So, beware of putting silly controls onto people because they won’t get obeyed. It’s your responsibility to consult the workers and come up with something practical.

And then finally, we can use personal protective equipment. Now that doesn’t do anything to the probability of the accident, but it reduces the severity. So, for example, if I’m wearing a hard hat and something falls on my head, it reduces the severity of the accident. If I’m wearing protective goggles and there’s a spark, or a piece of debris flies out of the machine, it probably just bounces off and saves my eyes. So, there’s a couple of really good examples of where PPE will help us. And of course, in this season of COVID, we’ve all gone PPE bonkers. It’s become headline news all over the world. So, we all now know what PPE is, which is great. Well, it’s not great. It’s terrible, but it’s good for knowledge.

So, we have to work through that hierarchy in that order. We have to start at the top with the most effective controls, see whether it’s feasible to eliminate the risk, and work our way down. We have to do that. And, as the subject of another lesson, we have to apply all reasonably practicable controls in order to say that we have eliminated or minimized risks SFARP, so far as is reasonably practicable. I’ll explain exactly what that means in a separate session.
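Working down the hierarchy can be sketched as a simple ordering problem. The Python below is a minimal illustration, assuming a hypothetical `Control` record and a rank table that follows the order described in the talk. The numeric ranks are an assumption for sorting only, and the three second-level options share a rank.

```python
from dataclasses import dataclass

# Order of the hierarchy as described in the talk, most effective first.
# The numbers are illustrative; they exist only so candidates can be sorted.
HIERARCHY_RANK = {
    "eliminate": 1,
    "substitute": 2,
    "isolate": 2,
    "engineering": 2,
    "administrative": 3,
    "ppe": 4,
}


@dataclass(frozen=True)
class Control:
    """A candidate control measure (hypothetical structure)."""
    description: str
    kind: str  # one of the keys in HIERARCHY_RANK


def order_by_effectiveness(controls: list[Control]) -> list[Control]:
    """Return candidate controls with the most effective kinds first."""
    return sorted(controls, key=lambda c: HIERARCHY_RANK[c.kind])


candidates = [
    Control("Safety goggles for the operator", "ppe"),
    Control("Written procedure for clearing jams", "administrative"),
    Control("Interlocked guard on the machine", "engineering"),
    Control("Replace electrical drive with pneumatics", "eliminate"),
]

for control in order_by_effectiveness(candidates):
    print(HIERARCHY_RANK[control.kind], control.kind, "-", control.description)
```

Sorting only tells you which options to consider first. Deciding which of them are reasonably practicable is a separate judgement that code like this does not make.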

Aside: Control Effectiveness

A quick aside: are controls effective? I’ve sort of hinted at this before about the admin stuff. How do we get effective controls? Well, the CoP says we need people to be accountable for health and safety. We need maintenance of plant and equipment. We need up-to-date training and competency for our people. We need up-to-date hazard information, and that’s a duty in its own right. And we need regular review and consultation. And you’ll find out about that in the CC&C CoP in my next lesson.

Now, these things are required everywhere, but they can be achieved informally. If you work in a high-risk industry, you’ll probably have a thing called a safety management system. And your safety management system will be documented in a safety management plan. And typically, the safety management system is the thing that delivers all five of these things and much more. So, that’s what you’ll probably end up doing.

First thing to say on that, of course, is that this information has got to be generated. You’ve got to get it from source and it’s usually the designer, the manufacturer, and the installer, and the testers who can provide this information. So, do make sure that you are imposing requirements on your suppliers, on your subcontractors to do this stuff and to provide you with the information. It is their duty to do so. It’s a legal duty, but you’re probably still going to have to pay for it and say when you want it and in what format that’s most useful to you and all the other good stuff.

Step 4, Reviewing Controls

Step four, which is maybe not so obvious. We’ve got some controls, we’re up and running, and we need to review those controls. Well, why would we review them? First of all, if you’ve discovered that a control measure is not effective. So, you might have had some incident data, or you might’ve had some near misses. Or you might have some reliability data that says, ‘My control isn’t as reliable as I thought it was going to be’. But of course, to be aware of that, you’ve got to be collecting this information and you’ve got to be on the lookout for it.

So, you do need a workable incident reporting system and you do need to encourage people to use it, and to use it honestly, anonymously if need be. So, that’s where a good safety culture comes in, where you do not punish people for telling the truth. Where you encourage and reward them for reporting stuff and making things better. You champion it. And that’s where management commitment comes in.

The other point where the guidance says you have to do it is if you’re making any kind of change that’s likely to alter or give rise to new risks and you suspect that the existing control measures may not be effective. So, you’re going to make some kind of change – you’ve got to review what you’re doing. But of course, how would the PCBU know that unless they’d actually sort of basically documented the baseline situation? So, you’ve got to have some kind of control over your workplace or over your product or functionality to know what your current situation is and to know that a change is coming. You’ve got to have some kind of baseline control and change control to be able to do that. As I say, it doesn’t have to be that complicated, you just control what goes on at the workplace.

You’ve got to do it if you’ve identified a new hazard or risk. Once you’ve identified something, you’ve got to kind of start from scratch. But that’s okay because hopefully, you’ve already got all of the background analysis that you’ve done. So, you know what you’ve done in the past and therefore you can spot what the delta is. I’m anticipating the record-keeping, but this is where good record keeping really helps you when it comes to managing change. Because if you’ve documented the baseline and understand it, change is relatively straightforward.

Another reason, maybe you’ve consulted with workers or health and safety representatives and you’ve discovered those consultations suggest that a review is necessary. Or maybe a health and safety representative requests a review. In that case, you need to do one.

So those are the five cases where you must conduct a review of controls in order to keep things safe. And very often that’s how accidents occur. We start pretty well and then over a period of time, maybe years or decades, slowly our performance degrades over time or we get a bit blasé about stuff because we’ve never had a problem or so we think. If you’ve got poor incident and near-miss reporting, you won’t be aware of the problems that are happening. So, things slide over time so maybe it’s a good idea to have a periodic review even if you haven’t had any of these triggers. So, that’s a good idea as well. I don’t think it’s in the Code of Practice, but it’s sensible.
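The five review triggers described above can be written down as a simple check. This is a minimal sketch in Python. The `ReviewInputs` structure and its field names are hypothetical, chosen here to mirror the five cases in the talk, and are not taken from the CoP.

```python
from dataclasses import dataclass


@dataclass
class ReviewInputs:
    """Hypothetical flags a PCBU might track; the field names are assumptions."""
    control_found_ineffective: bool = False
    change_may_create_new_risk: bool = False
    new_hazard_or_risk_identified: bool = False
    consultation_indicates_review: bool = False
    hsr_requested_review: bool = False


def review_triggers(inputs: ReviewInputs) -> list[str]:
    """Return the names of the triggers that currently apply."""
    return [name for name, raised in vars(inputs).items() if raised]


def review_required(inputs: ReviewInputs) -> bool:
    """A review of controls is needed if any one of the five triggers applies."""
    return bool(review_triggers(inputs))


# Example: a health and safety representative has asked for a review.
status = ReviewInputs(hsr_requested_review=True)
print(review_required(status), review_triggers(status))
```

A periodic review, as suggested above, could be added as a sixth, date-based flag. That would be a local policy choice, not one of the five triggers.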

Keeping Records

Those are the four steps. Now let’s talk about these three other things, the first of which is keeping records. As it says, keeping records demonstrates what you have done. So, if you have a problem and the regulator comes round to inspect you, or maybe even considers shutting you down or issuing an improvement or prohibition notice, then the fact that you’ve got some documentation is going to help you. It also helps you with downstream risk management activities, as I’ve just said.

Then also, there are some specific recordkeeping requirements for particular hazards. So, if you’re exposing people to noise or certain chemicals that may accumulate in the body, then you’re almost certainly going to have to have a monitoring program and a tracking program to keep an eye on this stuff and monitor people’s exposure. So, if you’ve got those particular hazards, then there’s going to be some very specific requirements on you that you have to meet and you must keep the records for the time periods required. In general, I would advise keeping the records for at least the life of the system, equipment, service, whatever it is, and then a few years afterwards, just in case there’s an issue that emerges later on. Exactly what you do is up to you.

And from a pragmatic point of view, I would say from experience that precision and clarity in record-keeping are so important. Work hard on precision. It might sound like you’re being a bit anal about the way you record stuff, but if you feel you’re overdoing it, believe me, you are not. Make it simple. Make it crystal clear what you mean. Be as specific and precise as you can and then your records will be a lot more use. I put my hand up and say I’ve written stuff down and then a couple of years or even a few months later, I’ve gone back to it and thought, ‘What did I mean by that?’ Ambiguity is very easy to achieve, so write some stuff down, then get somebody else to independently look at it for you and say, ‘What do you understand that to mean?’ Because English, unfortunately, is a very ambiguous language, very flexible.

Appendices

So, going back to the CoP, there are four appendices. First of all, in Appendix A there’s a glossary of terms, which is very useful. In Appendix B, we’ve got some examples of a risk management process. In Appendix C, there’s some help and guidance on assessing how things can go wrong. And then in Appendix D, there is a sample blank risk register for you to use if you haven’t got anything else. All of these examples and appendices are simple and workplace-focused. As I say, if you work in a high-risk domain, such as maritime or aviation, or you work with flammable chemicals or a big industrial plant, the CoP is not going to be sophisticated enough for your use. You’re going to have to meet and exceed it, but you’re probably going to be using a standard that requires far more than what the CoP asks for. And that’s okay.
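If you want to keep a simple risk register electronically, a sketch like the following may be a starting point. It is written in Python and the field names are illustrative assumptions only. They are not copied from the Appendix D template, so check the CoP for the columns it actually uses.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class RiskRegisterEntry:
    """One row of a simple risk register. Field names are illustrative only."""
    hazard: str
    possible_harm: str
    likelihood: str
    existing_controls: str
    further_controls: str
    action_owner: str
    due: date
    review_on: date


def to_csv(entries: list[RiskRegisterEntry]) -> str:
    """Serialise entries to CSV text so the register can be kept as a record."""
    buffer = io.StringIO()
    columns = [f.name for f in fields(RiskRegisterEntry)]
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# A made-up example entry.
register = [
    RiskRegisterEntry(
        hazard="Unguarded drive belt on conveyor",
        possible_harm="Entanglement causing serious hand injury",
        likelihood="Possible",
        existing_controls="Warning sign; operator training",
        further_controls="Fit interlocked guard",
        action_owner="Maintenance supervisor",
        due=date(2020, 9, 30),
        review_on=date(2021, 3, 31),
    )
]

print(to_csv(register))
```

Keeping the register in a plain, dated format like this also supports the earlier point about record keeping: a documented baseline makes later changes much easier to assess.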

Detailed Requirements

But looking at it the other way around, the CoP is where everybody needs to start and there are some detailed requirements in each Code of Practice. And in this one, the words ‘must’, ‘requires’ or ‘mandatory’ tell you that there is a legal requirement that must be complied with. There are 35 ‘musts’, 39 ‘required’ of various kinds, and three instances of ‘mandatory’ in this Code of Practice. So, you’ve got to obey them.

Then there’s the word ‘should’, which indicates a recommended course of action and ‘may’ is an option. There are 43 ‘shoulds’ in this document and 82 ‘mays’. Again, my advice would be if it’s a ‘should’, I would do it unless you’ve got a reason not to. In which case you should probably write down why you’re not doing it. And that’s perfectly okay. If it isn’t going to work in your circumstances, or you don’t think it’s reasonable to do something, or you’ve got another way of doing it, which is better. Great. Do that, write it down.

And then the ‘mays’ are options, so if you think they’re going to be useful and helpful, do them. If not, you don’t have to. Those are the different levels of compliance that you’ve got in the Code of Practice. And those three levels are in all the Codes of Practice.
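If you want to find these compliance words in your own copy of a Code of Practice, a short script can do a first pass. The Python below is a minimal sketch, and the sample text is made up, not quoted from the CoP. It counts exact whole words only, so variants such as ‘required’ are not counted and the month ‘May’ would be. Expect any totals to differ from the figures quoted above depending on how you treat such cases.

```python
import re
from collections import Counter

# Words that signal the level of compliance, as described in the talk.
KEYWORDS = ("must", "requires", "mandatory", "should", "may")

# One pattern that matches any keyword as a whole word, ignoring case.
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def count_keywords(text: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of each keyword."""
    counts = Counter({word: 0 for word in KEYWORDS})
    for match in PATTERN.finditer(text):
        counts[match.group(1).lower()] += 1
    return counts


# Made-up sample sentences, not quoted from the CoP.
sample = (
    "The duty holder must identify hazards. "
    "Records should be kept, and a checklist may be used. "
    "A review must follow any change."
)

for word, total in count_keywords(sample).items():
    print(word, total)
```

A count is only a prompt for careful reading. Each ‘must’ still has to be read in context to see what it actually requires of you.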

Commentary

So, I’ve gone through what’s in the Code of Practice. Now I’m just going to give you a brief résumé of what I think is good advice, based on personal and practical experience. I’ve said it already, but a quick reminder: Codes of Practice provide minimum requirements. So, you do need to start with the CoP and, as the risk gets higher in whatever industry you’re in, you probably need to do more to manage that higher risk.

It does have a workplace focus, so it isn’t a lot of use if you’re a designer and you’re trying to work out, ‘What safety margins do I need? I need to do a design trade-off’. I know I’ve sort of leaked into the final point. The CoP won’t help you do that. You’ll need a more sophisticated approach, probably based on standards and tolerability. So, the CoP won’t help you with these sophisticated design decisions and trade-offs, and how much margin is enough. You’re probably going to have to go to standards and industry good practice for that.

And, really, what we’re now talking about is: are the risks SFARP? Have we done everything that’s reasonably practicable? So first of all, have we done enough? Look at the definition of reasonably practicable, which is in Section 18 of the WHS Act. And if you look at that definition, you’ll find that it is a risk assessment process. So, by following the risk management CoP, the risk assessment process, you will have inherently begun to address SFARP. And you need to do that to demonstrate that you have reduced risks SFARP. Then deciding how much is enough, well, that depends on the particular risk. A simple approach may suffice in most instances, but for some risks you’re going to have to do some more sophisticated work, which will take you beyond the bounds of the CoP.

And then the last point I’m going to make is the Codes of Practice, not just this one but all of them will repay careful reading. There are some detailed requirements in there and they contain lots of good, sensible, pragmatic advice. And if you have to write a safety management plan or a hazard management plan, then do go to CoP and steal the wording. Don’t make stuff up when you don’t have to. If the CoP tells you what to do and that’s part of your solution just copy and paste it. Use it – you’re allowed to!

Do pay attention to the copyright, and make sure you get the right version of the CoP for your jurisdiction. So, if it’s a federal workplace, you need the Commonwealth version of the CoP. If it’s commercial, then you probably need the state or territory version. So, go to the correct regulator’s website and find the right CoP. You will probably find that the copyright allows you to copy and paste absolutely everything out of the CoP. So, do that and save yourself some work. And also, if you’ve done that, it’s very easy to demonstrate that you’ve met the requirements of the CoP because you’ve copied them. What could be easier? Save yourself some hassle.

As a consultant, I never make up anything unless I can’t possibly avoid it. So, do use the stuff out there because CoP has been developed for you by a bunch of people in consultation. Lots of people have put a lot of hard work into coming up with a good CoP, which is authorised by the relevant government minister. So, use it, don’t ignore it. It’s there to help you.

Copyright & Attribution

Now, I’ve mentioned that you can dig this stuff out of the right website, and that’s exactly what I’ve done. So, any words that you see in italics, in speech marks, I have lifted from the Federal Register of Legislation and I’m allowed to do so under the terms of the Creative Commons license. And as part of the terms of that license, I’m required to tell you that I got this stuff on the 15th of August 2020. But you should always go to the www.legislation.gov.au website to check that you’re using the latest version. Don’t rely on what I’ve said; go and check. And for more information on what you can and can’t do with this Creative Commons license, I’ve got a page at the Safety Artisan that sets out what my obligations are and you’ll be able to see that I’ve met them.

For More…

And then for more information, if you’d like to get free video lessons on safety and free previews of paid content, do please go look at the Safety Artisan channel on YouTube and hit that subscribe button. Yes, please! And you will then be informed whenever a new video comes out, which I believe you will find very helpful. And then for all lessons and resources, you can go to www.safetyartisan.com. And as you can see, it’s a secure website, so you’re safe to browse there. Go and have a look at the stuff that’s on there. This lesson is there, as are many others.

End

So that’s the end of our lesson for today, and we’ve gone on for almost 40 minutes. That’s because there’s a lot of good stuff out there to talk about. So, it just remains for me to say thanks very much for tuning in and bothering to listen to this. Thank you for supporting the Safety Artisan. Your subscription, your money, enables me to carry on doing this stuff, and I hope you and many others will find it helpful. So, thanks very much. Bye-bye.

End: Risk Management Code of Practice

You can find the Model Code of Practice here.  Back to the Topics Page.

Categories
Mil-Std-882E Safety Analysis

System of Systems Hazard Analysis

In this full-length (38-minute) session, The Safety Artisan looks at System of Systems Hazard Analysis, or SoSHA, which is Task 209 in Mil-Std-882E. SoSHA analyses collections of systems, which are often put together to create a new capability, which is enabled by human brokering between the different systems. We explore the aim, description, and contracting requirements of this Task, and an extended example to illustrate SoSHA. (We refer to other lessons for special techniques for Human Factors analysis.)

This is the seven-minute demo version of the full 38-minute video.

System of Systems Hazard Analysis: Topics

  • System of Systems (SoS) HA Purpose;
  • Task Description (2 slides);
  • Documentation (2 slides);
  • Contracting (2 slides);
  • Example (7 slides); and
  • Summary.

Transcript: System of Systems Hazard Analysis

Introduction

Hello everyone and welcome to the Safety Artisan. I’m Simon and today we’re going to be talking about System of Systems Hazard Analysis – a bit of a mouthful that. What does it actually mean? Well, we shall see.

System of Systems Hazard Analysis

So, for Systems of Systems Hazard Analysis, we’re using task 209 as the description of what to do taken from a military standard, 882E. But to be honest, it doesn’t really matter whether you’re doing a military system or a civil system, whatever it might be – if you’ve got a system of systems, then this will help you to do it.

Topics for this Session

Looking at what we’ve got coming up.

So, we look at the purpose of system of systems – and by the way, if you’re wondering what that is, what I’m talking about is when we take different things that we’ve developed elsewhere, e.g. platforms, electronic systems, whatever it might be, and we put them together. Usually, with humans gluing the system together somewhere, it must be said, to make it all tick and fit together. Then we want this collection of systems to do something new, to give us some new capability, that we didn’t have before. So, that’s what I’m talking about when I say a system of systems. I’ll show you an example – it’s the best way. So, we’ve got a couple of slides on task description, a couple of slides on documentation, and a couple of slides on contracting. Task 209 is a very short task, and therefore I’ve decided to go through an example.

So, we’ve got seven slides of an example of a system of systems, safety case, and safety case report that I wrote. And hopefully, that will illustrate far better than just reading out the description. And that will also give us some issues that can emerge with systems of systems and I’ll summarize those at the end.

SOSHA Purpose

So, let’s get on. I’m going to call it the SOSHA for short; Systems of Systems Hazard Analysis. The purpose of the SOSHA, task 209, is to document or perform and document the analysis of the system of systems and identify unique system of systems hazards. So, things we don’t get from each system in isolation. This task is going to produce special requirements to deal with these hazards, which otherwise would not exist. Because until we put the things together and start using them for something new – We’ve not done this before.

Task Description (T209) #1

Task description: As in all of these tasks, the contractor shall perform and document an analysis of the system of systems to identify hazards and mitigation requirements. A big part of this, as I said earlier, we tend to use people to glue these collections, these portfolios, of systems together and humans are fantastic at doing that. Not always the ideal way of doing it, but sometimes it’s the only way of doing it within the constraints that we’ve got. The human is very important. The human will receive inputs from one or more systems and initiate outputs within the analysis and in fact within the real world, to be honest, which is what we’re trying to analyse. That’s probably a better way of looking at it.

And we’ve got to provide traceability of all those hazards to – it says – architecture locations, interfaces, data and stakeholders associated with the hazard. This is particularly important because with a system of systems each system tends to come with its own set of stakeholders, its own physical location, its own interfaces, etc, etc. The issue of managing all of those extraneous things and getting the traceability goes up. It is multiplied with every system you’ve got. In fact, I would say it goes up with the square of the number of systems. In the example we’ll see, we’ve got three systems being put together in a system of systems and, in effect, we had nine times the amount of work in that area, I would say. I think that’s a reasonable approximation.
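
The "square of" rule of thumb can be made concrete with a few lines of Python. This is only a sketch: it assumes the effort grows with the number of system pairings (each system traced against itself and every other system), and the function name is invented for the illustration.

```python
def relative_traceability_effort(num_systems: int) -> int:
    """Rule-of-thumb effort multiplier: the square of the number of systems.

    Assumption for illustration only: every system has to be traced against
    itself and against every other system, giving n * n pairings.
    """
    if num_systems < 1:
        raise ValueError("need at least one system")
    return num_systems ** 2


# One system is the baseline; three systems give nine times the work.
for n in (1, 2, 3):
    print(n, "system(s) ->", relative_traceability_effort(n), "x effort")
```

The point of the sketch is the shape of the curve, not the exact number: each extra system adds more work than the one before it.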

Task Description (T209) #2

Part two of the task description: The contractor will assess the risk of each hazard and recommend mitigation measures to eliminate the hazards or, where we can’t eliminate them, which is very often the case, to reduce the associated risks. Then, as always with this standard, it says we’re going to use tables one, two and three, which are the severity, probability and the risk matrix that comes with the standard. Unless, of course, we have created or tailored our own matrix. Which we very often should do but it isn’t often done – I’ll have to do a session on how to tailor a matrix.
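
As a minimal sketch of how the three tables combine, the lookup below maps a severity category and a probability level to a risk level. The cell values are a transcription of the default matrix in Mil-Std-882E and should be checked against the standard before use; a tailored matrix would simply replace the dictionary. The function and variable names are made up for this example.

```python
# Severity categories (Table I) in column order, most severe first.
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")

# Probability levels (Table II) mapped to one risk level per severity column
# (Table III). ASSUMPTION: transcribed for illustration; verify against the
# standard, or substitute your own tailored matrix here.
RISK_MATRIX = {
    "Frequent":   ("High", "High", "Serious", "Medium"),
    "Probable":   ("High", "High", "Serious", "Medium"),
    "Occasional": ("High", "Serious", "Medium", "Low"),
    "Remote":     ("Serious", "Medium", "Medium", "Low"),
    "Improbable": ("Medium", "Medium", "Medium", "Low"),
}


def assess_risk(severity: str, probability: str) -> str:
    """Return the risk level for one hazard from the matrix above."""
    if probability == "Eliminated":
        # An eliminated hazard carries no residual risk at any severity.
        return "Eliminated"
    column = SEVERITIES.index(severity)  # raises ValueError if unknown
    return RISK_MATRIX[probability][column]


print(assess_risk("Catastrophic", "Remote"))
print(assess_risk("Marginal", "Occasional"))
```

Keeping the matrix as data rather than logic makes tailoring a one-place change, and makes it obvious which matrix a given assessment was made against.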

Then the contractor has got to verify and validate the effectiveness of those recommended mitigation measures. Now, that’s a really good point and I often see that missed. People come up with control measures or mitigation measures but don’t always assess how effective they’re going to be. Sometimes you can’t so we just have to be conservative but it’s not always done as well as it could be.

Documentation (T209) #1

So, let’s move on. Documentation: So, whoever does the analysis- the standard assumes it’s a contractor – shall document the results to include: you’ve got to describe the system of systems, the physical and functional characteristics of the system of systems, which is very important. Capturing these things is not a given. It’s not easy when you’ve got one system, but when you’ve got multiple systems, some of which are being misused to do something they’ve never done before, perhaps, then you’ve got to take extra care.

Then basically it says when you get more detail of the individual systems you need to supply that when it becomes available. Again, that’s important. And not only that: if the contractor supplies it, who’s going to check it? Who’s going to verify it? Etc., etc.

Documentation (T209) #2

Slide two on documentation: We’ve got to describe the hazard analysis methods and techniques used, providing a description of each method and technique used, and the assumptions and the data used in support. This is important because I’ve seen lots of times where you get the results of a hazard analysis and you only get the results. It’s impossible to verify those results or validate them to say whether they’ve been done in the correct context. And it’s impossible to say whether the results are complete or whether they’re up to date or even whether they were analysing the correct system because often systems come in different versions. So, how do you know that the version being analysed was the version you’re actually going to use? Without that description, you don’t know. So, it’s important to contract for these things.

And then hazard analysis results. What contents and formats do you want? It’s important to say. Also, we’re going to be looking to put the key items, the leading particulars, from the results into the hazard tracking system. The top-level results are going to go into the hazard tracking system, which is more commonly known as a hazard log or a risk register, whatever it might be. Might be an Excel spreadsheet, might be a very fancy database, but whatever it’s going to be you’re going to have to standardize your fields and what things mean. Otherwise, the data is going to be a mess, of poor quality and not very usable. So, again, you’ve got to contract for these things upfront and make sure you make clear definitions and say what you want.
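
One way to picture "standardized fields" is a hazard log record whose fields have fixed names and fixed allowed values. The sketch below is an assumption-laden illustration, not a format taken from the standard: every field name, the `Status` values and the example record are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    CATASTROPHIC = 1
    CRITICAL = 2
    MARGINAL = 3
    NEGLIGIBLE = 4


class Status(Enum):
    OPEN = "open"
    MITIGATED = "mitigated"
    CLOSED = "closed"


@dataclass
class HazardRecord:
    """One row of a hazard log; all field names are hypothetical."""
    hazard_id: str
    description: str
    systems: List[str]          # which systems in the SoS are involved
    analysed_version: str       # which system version the analysis covered
    severity: Severity
    status: Status = Status.OPEN
    mitigations: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject free text where a defined value is required.
        if not isinstance(self.severity, Severity):
            raise TypeError("severity must be a Severity value")
        if not isinstance(self.status, Status):
            raise TypeError("status must be a Status value")
        if not self.systems:
            raise ValueError("at least one system must be named")


# Illustrative entry only; the severity assigned here is not from the source.
record = HazardRecord(
    hazard_id="HAZ-001",
    description="Operator cannot see other air traffic",
    systems=["radar", "aircraft"],
    analysed_version="v1.0",
    severity=Severity.CATASTROPHIC,
)
print(record.status.value)
```

Whether the log lives in a spreadsheet or a database, the same idea applies: agree the fields and their allowed values before the data starts arriving.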

Contracting #1

Contracting: implicitly, we’ve been talking about contracting already, but this is what the standard says. So, the request for proposal or statement of work has got to include the following. Typically we have an RFP before we’ve got a contract, so we need to have worked out what we need really early in the program or project, which isn’t always done very well. To work out what you need, the customer, the purchaser, has probably got to do some analysis of their own in order to work all this stuff out. And I know I say this every time with these tasks, but it is so important. You can’t just dump everything on the contractor and expect them to produce good results because often the contractor is hamstrung. If you haven’t done your homework to help them do their work, then you’re going to get poor results and it’s not their fault.

So, we’ve got to impose the requirement for the task if we want it or need it. We’ve got to identify the functional disciplines. So, which specialists are going to do this work? Because very often the safety team are generalists. They do not have specialist technical knowledge in some of these areas. Or maybe they are not human factors specialists. We need somebody in, some human factors specialists, some user representatives, people who understand how the system will be used in real life and what the real-world constraints are. We need those stakeholders involved – that’s very important. We’ve got to identify those architectures and systems which make up the SOS – very important. The concept of operations. SOS is very much about giving capability. So, it’s all about what are you going to do with the whole thing when you put it together? How’s all that going to work?

Contracting #2

Interesting one, E, which is unique, I think, to task 209, what are the locations of the different systems and how far apart are they? We might be dealing with systems where the distance between them is so great that transmission time becomes an issue for energy or communications. Let’s say you’re bouncing a signal from an aircraft or a drone around the world via a couple of satellites back to home base. There could be a significant lag in communications. So, we need to understand all of these things because they might give rise to hazards or reduce the effectiveness of controls.
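
To get a feel for the size of that lag, here is a rough lower-bound calculation. It is a sketch under stated assumptions: the satellites are taken to be geostationary, each hop is taken to go straight up and straight back down through a ground station, and all processing and routing delays are ignored, so real figures would be larger. The function names are invented for the example.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
GEO_ALTITUDE_KM = 35_786            # approximate geostationary altitude


def propagation_delay_s(path_km: float) -> float:
    """Time for a signal to travel path_km at the speed of light."""
    return path_km / SPEED_OF_LIGHT_KM_S


def min_relay_delay_s(satellite_hops: int) -> float:
    """Lower bound on one-way delay via a chain of geostationary relays.

    ASSUMPTION: each hop is one uplink plus one downlink, each at least
    as long as the orbital altitude.
    """
    return propagation_delay_s(2 * GEO_ALTITUDE_KM * satellite_hops)


one_way = min_relay_delay_s(2)
print(f"one way via two satellites: at least {one_way:.2f} s")
print(f"command and response:       at least {2 * one_way:.2f} s")
```

Even this optimistic bound is close to half a second each way, which is long enough to matter if a control relies on a prompt human or automatic response.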

Part F; what analysis, methods, techniques do you want to use? And any special data to be used? Again, with these collections of systems that becomes more difficult to specify and more important. And then do we have any specific hazard management requirements? For example, are we using standard definitions and risk matrix from a standard or have we got our own? That all needs to be communicated.

Example #1

So, that is the totality of the task. As you can see, there’s not much to Task 209, so I thought it would be much more helpful to use an example, an illustration, and as they used to say in children’s TV, “Here’s one I made earlier” because a few years ago I had to produce a safety case report. I was the safety case report writer, and there was a small team of us generating the evidence, doing the analysis for the safety case itself.

What we were asked to do is to assure the safety of a system and – in fact, it was two systems but I just treat it as one – of a system for guiding aircraft onto ships in bad weather. So, all of these things existed beforehand. The aircraft were already in service. The ships were already in service. Some of the systems were already in service, but we were putting them together in a new combination. So, we had to take into account human factors. That was very important. We’ll see why in just a moment.

The operating environment, which was quite demanding. So, the whole point is to get the aircraft safely back to the ships in bad weather. In good weather you could do it visually, but in bad weather, visual wasn’t going to cut it. So, the operating environment – we were being asked to operate in a much more difficult environment. So, that changed everything and drove everything.

We’ve got to consider operating procedures because, as we’re about to see, people are gluing the systems together. So, how do they make it work? And we’ve also got to think about maintenance and management. Although in actual fact, we didn’t really consider maintenance and management that much. As an ex-maintainer, this annoys me, but the truth is people are much more focused on getting their capability into service. Often, they think about support as an afterthought. We’ll talk about that one day.

Example #2

Here’s a little demonstration of our system of systems. Bottom right-hand corner, we’ve got the ship with lots of people on the ship. So, if the aircraft crashes into it that could be bad news, not just for the people in the aircraft, but for the people on the ship – big risks there!

We’ve got our radar mounted on the ship so the ship is supplying the radar with power and control and data, telling it where to point for example. Also, the ship might be inadvertently interfering with the radar. There are lots of other electronic systems on the ship. There are bits of the ship getting in the way of the radar, depending on where you’ve put it, and so on and so forth. So, the ship interacts with the radar, the radar interacts with the ship, and the radar is producing radiation. Could that be doing anything to the ship systems?

And then the radar is being operated. Now, I think that symbol is meant to indicate a DJ, but we’ve got the DJ wearing headphones and we got a disk there but it looks like a radar scope to me. So, I’ve just hijacked that. That’s the radar operator who is going to talk to the pilot and give the pilot verbal commands to guide them safely back to the ship. So, that’s how the system works.

In an ideal world, the ship would use the radar and then talk electronically direct to the aircraft and guide it – maybe automatically? That would be a much more sensible setup. In fact, that’s often the way it’s done. But in this particular case, we had to produce a bit of a – I hesitate to call it a lash-up because it was a multi-million-dollar project, but it was a bit of a lash-up.

So, there are the human factors. We’ve got a radar operator doing quite a difficult job and a pilot doing a very difficult job trying to guide their aircraft back onto the ship in bad weather. How are they going to interact and perform? And then lastly, as I alluded to earlier, the aircraft and the ship do actually interact in a limited way. But of course, it’s a physical interaction, so you can actually hurt people and of course, if we get it wrong, the aircraft interacts with the surface of the ocean, which is very bad indeed for the aircraft. So, we’ve got to be careful there. So, there’s a little illustration of our system of systems.

Example #3

And – this is the top-level argument that we came up with – it’s in goal structuring notation. But don’t worry too much about that – We’ll have a session on how to do GSN another time.

So, our top goal, or claim if you like, is that our system of systems is adequately safe for the aircraft to locate and approach the ship. So, that’s a very basic, very simple statement, but of course, the devil is in the detail and all of that detail we call the context. So, surrounding that top goal or claim, we’ve got descriptions of the system, of the aircraft and the ship. We’ve got a definition of what we mean by adequately safe and we’ve got safety targets and reporting requirements.

So, what supports the top goal? We’ve got a strategy and after a lot of consultation and designing the safety argument, we came up with a strategy where we said, “We are going to show that all elements of the system of systems are safe and all the interactions are safe”. To do that, we had to come up with a scope and some assumptions to underpin that as well to simplify things. Again, they sit in the context, we just keep the essence of the argument down the middle.

And then underneath, we’ve got four subgoals. We aim to show that each system equipment is safe to operate, so it’s ready to be operated safely. Then each one is safe in operation so it can be operated safely with real people, etc. And then we’ve got all system safety requirements are satisfied for the whole collection of stuff and then finally that all interactions are safe. So, if we can argue all four of those, we should have covered everything. Now, I suspect if I did this again today, I might do it slightly differently. Maybe a little bit more elegantly, but that’s not the point. The point is, we came up with this and it worked.
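
The shape of that argument, one top goal resting on four sub-goals, can be sketched as a small tree. This is a deliberate simplification: it leaves out the strategy, context and assumption elements that goal structuring notation also has, and the class name and evidence labels are placeholders, not items from the actual safety case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    """A claim in the argument, supported by sub-goals or by evidence."""
    claim: str
    evidence: List[str] = field(default_factory=list)
    subgoals: List["Goal"] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A goal with sub-goals stands only if every sub-goal stands;
        # a leaf goal stands only if it cites at least one piece of evidence.
        if self.subgoals:
            return all(goal.is_supported() for goal in self.subgoals)
        return bool(self.evidence)


top_goal = Goal(
    claim="System of systems is adequately safe to locate and approach the ship",
    subgoals=[
        Goal("Each system is safe to operate", ["placeholder evidence A"]),
        Goal("Each system is safe in operation", ["placeholder evidence B"]),
        Goal("All system safety requirements are satisfied", ["placeholder evidence C"]),
        Goal("All interactions are safe"),  # no evidence attached yet
    ],
)

print(top_goal.is_supported())  # False: the fourth sub-goal has no evidence
top_goal.subgoals[3].evidence.append("placeholder evidence D")
print(top_goal.is_supported())  # True: all four sub-goals are now supported
```

The useful property is that a gap anywhere in the tree shows up at the top, which is the same reason the argument needs all four sub-goals rather than three.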

Example #4

So, I’m going to unpack each one very briefly, just to illustrate some points. First of all, each component system is safe to operate. Each of these systems, bar one, had all been purchased already, sometimes a long time ago. They all came with their own safety targets, their own risk matrices, etc, etc. So, we had to make sure that when an individual system said, “This is what we’ve got to achieve” that that was good enough for the overall system of systems. So, we had to make sure that each system met its own safety requirements and targets and that they were valid in context.

Now, you would think that double-checking existing systems would be a foregone conclusion. In reality, we discovered that the ship’s communication system and its combat data system were not as robust as assumed. We discovered some practical issues that were reported by stakeholders and we also discovered some flaws in previous analysis that had been accepted a long time ago. Now, in the end, those problems didn’t change the price of fish, as we say. It didn’t make a difference to the overall system of systems.

The frailty of the ship’s comms got sorted out and we discovered it didn’t actually matter about the combat system. So, we just assumed that the data coming out of the combat system was garbage and it didn’t make any difference. However, we did upset a few stakeholders along the way. So beware, people don’t like discovering that a system that they thought was “tickety-boo” was not as good as they thought.

Example #5

The second goal was to show that the system of systems is safe in operation. So, we looked at the actual performance. We looked at test results of the radar and then also we were very fortunate that trials of the radar on the ship with aircraft were carried out and we were able to look at those trials reports. And once again, it emerged that the system in the real world wasn’t operating quite as intended, or quite as people had assumed that it would. It wasn’t performing as well. So, that was an issue. I can’t say any more about that but these things happen.

Also, a big part of the project was we included the human element. So, as I’ve said before, we had pilots and we had radar air traffic talk-down operators. So, we brought in some human factors specialists. They captured the procedures and tasks that the pilots and the radar operators had to perform. They captured them with what’s called a Hierarchical Task Analysis, they did some analysis of the tasks and what could go wrong. Then they created a model of what the humans were doing and ran it through a simulation several thousand times. So in that way, they did some performance modelling.

Now, they couldn’t give us an absolute figure on workload or anything like that but what they could do – fortunately, our new system was replacing an older system which was even more informally cobbled together than the one that we were bringing in. And so, the human factors specialists were able to compare human performance in the old system vs. human performance with the new system. Very fortunately, we were pleased to find out that the predicted performance was far better with the new system. The new system was much easier to operate for both the pilots and the talk-down radar operators. So, that was terrific.
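
The idea of running a task model thousands of times and comparing two systems, rather than quoting an absolute figure, can be sketched as below. Everything in it is an assumption made for illustration: the step durations are invented, the exponential distribution is an arbitrary choice, and a real human factors model would be far richer than a sum of step times.

```python
import random
import statistics
from typing import List

# HYPOTHETICAL mean durations (seconds) of the steps in one talk-down task.
OLD_SYSTEM_STEPS = [4.0, 6.0, 5.0]
NEW_SYSTEM_STEPS = [3.0, 4.0, 3.0]


def simulate_task_times(step_means_s: List[float], runs: int, seed: int) -> List[float]:
    """Simulate total task time for a number of runs.

    Each step's duration is drawn from an exponential distribution with the
    given mean (an arbitrary modelling choice for this sketch).
    """
    rng = random.Random(seed)  # seeded so the comparison is repeatable
    return [
        sum(rng.expovariate(1.0 / mean) for mean in step_means_s)
        for _ in range(runs)
    ]


old_times = simulate_task_times(OLD_SYSTEM_STEPS, runs=5000, seed=1)
new_times = simulate_task_times(NEW_SYSTEM_STEPS, runs=5000, seed=1)

ratio = statistics.mean(new_times) / statistics.mean(old_times)
print(f"new system takes about {ratio:.0%} of the old system's task time")
```

As in the session, the output is only meaningful as a comparison between the two configurations; the absolute numbers say nothing about real workload.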

Example #6

So, the third one: all system of systems safety requirements are satisfied. Now, this is a bit more nebulous, this goal, but what it really came down to was when you put things together, very often you get what’s called emergent behaviour. As in things start to happen that you didn’t expect or you didn’t predict based on the individual pieces. It’s the saying, two plus two equals five. You get more out of a system – you get synergy, for good or ill, when you start putting different things together.

So, does the whole thing actually work? And broadly speaking, the answer was yes, it works very well. There were some issues. A good example: the old radar that they used to use to talk the planes down was a search radar, so the operator could see other traffic apart from the plane they were guiding in. Now, the operator being able to see other things is both good and bad because on the one hand it gives them improved situational awareness so they can warn off traffic if a collision situation develops. But also, it’s bad because it’s a distraction for the operator. So, it could have gone either way.

So, the new radar was specialized. It focused only on the aircraft being talked down. So, the operator was blind to other traffic. So that was great in terms of decreasing operator workload and ultimately pilot workload as well. But would this increase the collision risk with other traffic? And I’ll talk about that in the summary briefly.

Example #7

And then our final goal is to show that all interactions are safe between the guidance system, the aircraft and the ship. This was a non-trivial exercise because ships have large numbers of electronic systems and there’s a very involved process to go through to check that a new piece of kit doesn’t interfere with anything else or vice versa.

And also, of course, does the radiation from the new electronic system, the new radar, affect the ship? Because you’ve got weapons on the ship and some of the explosive devices that the weapons use are electrically initiated. So, could the radiation set off an explosion? So, all of those things had to be checked. And that’s a very specialized area.

And then we’ve got, does the system interfere with the aircraft and the aircraft with the system? What about the integration of the ship and the aircraft and the aircraft to the ship? Yet another specialized area where there’s a particular way of doing things. And of course, the aircraft people want to protect the aircraft and the ship people want to protect the ship. So, getting those two to marry up is also another one of those non-trivial exercises I keep referring to but it all worked out in the end.

Summary

Points to note: When we’re doing system of systems – I’ve got five points here, you can probably work some more points out from what I’ve said for yourself – but we’re putting together disparate systems. They’re different systems. They’ve been procured by different organizations, possibly, to do different things. The stakeholders who bought them and care about them have got different aims and objectives. They’ve got different agendas to each other. So, getting everyone to play nicely in the zoo can be challenging. And even with somebody pulling it all together at the top to say “This has got to work. Get with the program, folks!” there’s still some friction.

Particularly, you end up with large numbers of stakeholders. For example, we would have regular safety meetings, but I don’t think we ever had two meetings in a row with exactly the same attendees because with a large group of people, people are always changing over and things move up. And that can be a challenge in itself. We need to include the human in the loop in systems of systems because typically that’s how we get them all to play together. We rely on human beings to do a lot of translation work, in effect. So, how do the systems cope?

A classic mistake really with systems design is to design a difficult-to-operate system and then just expect the operator to cope. That can be from things as seemingly trivial as amusement park rides – I did a lesson on learning lessons from an amusement park ride accident only a month or two ago and even there it was a very complex system for two operators, neither of whom had total authority over the system or to be honest, really had the full picture of what was going on. As a result, there were several dead bodies. So, how did the operators cope, and have we done enough to support them? That’s a big issue with a system of systems.

Thirdly, this is always true with safety analysis, but especially so with system of systems. The real-world performance is important. You can do all the analysis in the world making certain assumptions and the analysis can look fine, but in the real world, it’s not so simple. We have to do analysis that assumes the kit works as advertised because you’ve got nothing else to go on until you get the test results and you don’t get them until towards the end of the program. So, you’re going down a path, assuming that things work, that they do what they say on the tin, and perhaps you then discover they don’t do what they say on the tin. Or they don’t do everything they say on a tin. Or they do what they say and they do some other things that you weren’t expecting as well and then you’ve got to deal with those issues.

And then fourthly, somewhat related to what I’ve just talked about, but you put systems together in an informal way, perhaps, and then you discover how they actually get on – what really happens. In reality, once you get above a certain level of complexity, you’re not really going to discover all the emergent behaviours and consequences until you get things into service and it’s clocked up a bit of time in service under different conditions in the real world. In fact, that was the case with this and I think with a system of systems, you’ve just got to assume that it’s sufficiently complex that that is the case.

Now, that’s not an unsolvable problem but, of course, how do you contract for that? Where you’ve got your contractors wanting you to accept their kit and pay them at a certain date or a certain point in the program, but you’re not going to find out whether it all truly works until it’s got into service and been in service for a while. So, how do you incentivize the contractor to do a good job or indeed to correct defects in a timely manner? That’s quite a challenge for system of systems and it’s something that needs thinking about upfront.

And then finally, I’ve said, remember the bigger picture. It’s very easy when you’re doing analysis and you’ve made certain assumptions and you set the scope, it’s very easy to get fixated on that scope and on those assumptions and forget the real world is out there and is unpredictable. We had lots of examples of that on this program. We had the ship’s comms that didn’t always work properly, we couldn’t rely on the combat system, the radar in the real world didn’t operate as well as it said in the spec, etc, etc. There were lots of these things.

And, one example I mentioned was that with the new radar, the radar operator does not see any traffic other than the aircraft that is being guided in. So, there’s a loss of situational awareness there and there’s a risk, maybe an increased risk, of collision with other traffic. And that actually led to a disagreement in our team because some people had got quite fixated on the analysis and didn’t like the suggestion that maybe they’d missed something. Although it was never put in those terms, that’s the way they took it. So, we need to be careful of egos. We might think we’ve done a fantastic analysis and we’ve produced hundreds of pages of data and fault trees or whatever it might be but that doesn’t mean that our analysis has captured everything or that it’s completely captured what goes on in the real world because that’s very difficult to do with such a complex system of systems.

So, we need to be aware of the bigger picture, even if it’s only just qualitatively. Somebody, a little voice, piping up somewhere saying, “What about this? Have we thought about that? I know we’re ignoring this because we’ve been told to but is that the right thing to do?” And sometimes it’s good to be reminded of those things and we need to remember the big picture.

Copyright Statement

Anyway, I’ve talked for long enough. It just remains for me to point out that all the text in quotations, in italics, is from the military standard, which is copyright free but this presentation is copyright of the Safety Artisan. As I’m recording this, it’s the 5th of September 2020.

For More …

And so if you want more, please do subscribe to the Safety Artisan channel on YouTube and you can see the link there, but just search for Safety Artisan in YouTube and you’ll find us. So, subscribe there to get free video lessons and also free previews of paid content. And then for all lessons, both paid and free, and other resources on safety topics please visit the Safety Artisan at www.safetyartisan.com/  where I hope you’ll find much more good stuff that you find helpful and enjoyable.

End: System of Systems Hazard Analysis

So, that is the end of the presentation and it just remains for me to say thanks very much for watching and listening. It’s been good to spend some time with you and I look forward to talking to you next time about environmental analysis, which is Task 210 in the military standard. That’ll be next month, but until then, goodbye.

Categories
Start Here Work Health and Safety

Introduction to WHS Codes of Practice

In this 30-minute session, we introduce Australian WHS Codes of Practice (CoP). We cover: What they are and how to use them; their Limitations; we List (Federal) codes; provide Further commentary; and Where to get more information. This session is a useful prerequisite to all the other sessions on CoP.

Codes of Practice: Topics

  • What they are and how to use them;
  • Limitations;
  • List of CoP (Federal);
  • Further commentary; and
  • Where to get more information.

Codes of Practice: Transcript

Hello and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial teaching and resources on all things safety. I’m Simon and today is the 16th of August 2020. Welcome to the show.

Introduction

So, today we’re going to be talking about Codes of Practice. In fact, we’re going to be introducing Codes of Practice and the whole concept of what they are and what they do.

Topics for this Session

What we’re going to cover is what Codes of Practice are and how to use them – several slides on that; a brief word on their limitations; a list of federal codes of practice – and I’ll explain why I’m emphasizing it’s the list of federal ones; some further commentary and where to get more information. So, all useful stuff I hope.

CoP are Guidance

So, Codes of Practice come in the work health and safety hierarchy below the act and regulations. So, at the top you’ve got the WHS Act, then you’ve got the WHS regulations, which the act calls up. And then you’ve got the Codes of Practice, which also the act calls up. We’ll see that in a moment. And what Codes of Practice do is provide practical guidance on how to achieve the standards of work health and safety required under the WHS act and regulations, and some effective ways to identify and manage risks. So, they’re guidance but as we’ll see in a moment, they’re much more than guidance. So, as I said, the Codes of Practice are called up by the act and they’re approved and signed off by the relevant minister. So, they are a legislative instrument.

Now, a quick footnote. These words, by the way, are in the introduction to every Code of Practice. There’s a little note here that says we’re required to consider all risks associated with work, not just for those risks that have associated codes of practice. So, we can’t hide behind that. We’ve got to think about everything. There are codes of practice for several things, but not everything. Not by a long way.

…Guidance We Should Follow

Now, there are three reasons why Codes of Practice are a bit more than just guidance. So, first of all, they are admissible in court proceedings. Secondly, they are evidence of what is known about a hazard, risk, risk assessment, risk control. And thirdly, courts may rely, or regulators may rely, on Codes of Practice to determine what is reasonably practicable in the circumstances to which the code applies. So, what’s the significance of that?

So first of all, the issue about being admissible. If you’re unfortunate enough to go to court and be accused of failing under WHS law, then you will be able to appeal to a Code of Practice in your defence and say, “I complied with the Code of Practice”. They are admissible in court proceedings. However, beyond that, all bets are off. It’s the court that decides what is an admissible defence, and that means lawyers decide, not engineers. Now, given that you’re in court and the incident has already happened, a lot of the engineering stuff that we do about predicting the probability of things is no longer relevant. The accident has happened. Somebody has got hurt. All these probability arguments are dust in the wake of the accident. So, Codes of Practice are a reliable defence.

Secondly, the bit about evidence of what is known is significant, because when we’re talking about what is reasonably practicable, the definition of reasonably practicable in Section 18 of the WHS Act talks about what was known, or what should reasonably have been known, when people were anticipating the risk and managing it. Now, given that Codes of Practice were published back in 2012, there’s no excuse for not having read them. So, they’re pre-existing, they’re clearly relevant, and the law has said that they’re admissible in court. We should have read them, and we should have acted upon them. And there’ll be no wriggling out of that. So, if we haven’t done something that a CoP guided us to do, we’re going to look very vulnerable in court. Or in whatever court of judgment we’re up against, whether it be public opinion or trial by media or whatever it is.

And thirdly, some CoP can be used to help determine what is SFARP. So in some circumstances, if you’re dealing with a risk that’s described in a CoP, and that CoP is applicable, then if you followed everything in the CoP, you might be able to claim that just doing that means that you’ve managed the risk SFARP. Why is that important? Because the only way we are legally allowed to expose people to risk is if we have eliminated or minimized that risk so far as is reasonably practicable, SFARP. That is the key test, the acid test, of “Have we met our risk management obligations?” And CoP are useful, maybe crucial, in two different ways for determining what is SFARP. So yes, they’re guidance but it’s guidance that we ignore at our peril.

Standards & Good Practice

So, moving on. Codes of Practice recognize, and I re-emphasize this is in the introduction to every Code of Practice, that they’re not the only way of doing things. There isn’t a CoP for everything under the sun. So, codes recognize that you can achieve compliance with WHS obligations by using another method as long as it provides an equivalent or higher standard of work health and safety than the code. It’s important to recognize that Codes of Practice are basic. They apply to every business and undertaking in Australia potentially. So, if you’re doing something more sophisticated, then probably CoP on their own are not enough. They’re not good enough.

And in my day job as a consultant, that’s the kind of stuff we do. We do planes, trains and automobiles. We do ships and submarines. We do nuclear. We do infrastructure. We do all kinds of complex stuff for which there are standards and recognized good practice which go way beyond the requirements of basic Codes of Practice. And many I would say, probably most, technical and industry safety standards and practices are more demanding than Codes of Practice. So, if you’re following an industry or technical standard that says “Here’s a risk management process”, then it’s likely that that will be far more detailed than the requirements that are in Codes of Practice.

And just a little note for those of us who love numbers and quantitative safety analysis. What this statement about equivalent or higher standards of health and safety is talking about is that we want requirements that are more demanding, more rigorous, or more detailed than CoP. Not that the end result, in terms of the predicted probability of something happening, is better than what you would get with CoP, because nobody knows what you would get with CoP. That calculation hasn’t been done. So, don’t go down the rabbit hole of thinking “I’ve got to quantitatively demonstrate that what we’re doing is better than CoP.” You haven’t. It’s all about demonstrating that the input requirements are more demanding, rather than the output, because that’s never been done for CoP. So, you’ve got no benchmark to measure against in output terms.

The primacy of WHS & Regulations

A quick point to note: Codes of Practice are only guidance. They do refer to the relevant WHS Act and regulations, the hard obligations, and we should not be relying solely on codes in place of what it says in the WHS Act or the regulations. So, we need to remember that codes are not a substitute for the Act or the regs. Rather they are a useful introduction. The WHS Act and regulations are actually surprisingly clear and easy to read. But even so, there are 600 regulations. There are hundreds of sections of the WHS Act. It’s a big read and not all of it is going to be relevant to every business, by a long way. So, if you see a CoP that clearly applies to something that you’re doing, start with the CoP. It will lead you into the relevant parts of the WHS Act and regulations. If you don’t know them, have a read around in there. You’ve been given the pointer in the CoP, so follow it up.

But also, CoP do represent a minimum level of knowledge that you should have. Again, start with CoP, don’t stop with them. So, go on a bit. Look at the authoritative information in the act and the regs and then see if there’s anything else that you need to do or need to consider. The CoP will get you started.

And then finally, it’s a reference for determining SFARP. You won’t see anything other than the definition of reasonably practicable in the Act. You won’t see any practical guidance in the Act or the regulations on how to achieve SFARP. Whereas a CoP does give you a narrative that you can follow and understand and maybe even paraphrase if you need to in some safety documentation. So, they are useful for that. There’s also guidance on reasonably practicable, but we’ll come to that at the end.

Detailed Requirements

It’s worth mentioning that there are some detailed requirements in codes. Now, when I did this, I think I was looking at the Risk Management Code of Practice, which we’ll go through later in another session. But in this example, there are this many requirements. So, every CoP has the statement “The words ‘must’, ‘requires’, or ‘mandatory’ indicate a legal requirement exists that must be complied with.” So, if you see ‘must’, ‘requires’, or ‘mandatory’, you’ve got to do it. And in this example CoP that I was looking at, there are 35 ‘must’s, 39 ‘required’ or ‘requirement’ – that kind of wording – and three instances of ‘mandatory’. Now, bear in mind that the sentence that introduces those things contains two instances of ‘must’, one of ‘requires’, and one of ‘mandatory’. So, straight away you can ignore those four instances. But clearly, there are lots of instances here of ‘must’ and ‘require’ and a couple of ‘mandatory’.

Then we’ve got: “The word ‘should’ is used in this code to indicate a recommended course of action, while ‘may’ is used to indicate an optional course of action.” So, here is the way I would suggest interpreting that, and this is just my personal opinion, as I have never seen any good guidance on this. If it says ‘recommended’, then personally I would do it unless I can justify there’s a good reason for not doing it. And if it said ‘optional’, then I would consider it. But I might discard it if I felt it wasn’t helpful or I felt there was a better way to do it. So, that would be my personal interpretation of how to approach those words. So, ‘recommended’ – do it unless you can justify not doing it. ‘Optional’ – consider it, but you don’t have to do it.

And in this particular one, we’ve got 43 instances of ‘should’ and 82 of ‘may’. So, there’s a lot of detailed information in each CoP to consider. So, read them carefully and comply with them where you have to, and that work will repay you. So, a positive way to look at it: CoP are there to help you. They’re there to make life easy for you. Read them, follow them. The negative way to look at them is, “I don’t need to do all that it says in the CoP because it’s only guidance”. You can have that attitude if you want. If you’re in the dock or in the witness box in court, that’s not going to be a good look. Let’s move on.
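The keyword tallies described above can be reproduced for any CoP with a short script. This is a minimal sketch only: the function name `count_keywords` and the keyword groupings are assumptions made for illustration, and the sample text is just the introductory sentence quoted in the transcript, not a full code.

```python
import re
from collections import Counter

# Keyword groups based on the wording quoted in the transcript.
# The grouping of 'requires'/'required'/'requirement' under one label
# is an assumption about how the tally was made.
KEYWORDS = {
    "must": r"\bmust\b",
    "require*": r"\brequire(?:s|d|ment|ments)?\b",
    "mandatory": r"\bmandatory\b",
    "should": r"\bshould\b",
    "may": r"\bmay\b",
}


def count_keywords(text: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of each keyword group.

    Note: the month 'May' would also be counted under 'may', so the
    results need a human sanity check.
    """
    counts = Counter()
    for label, pattern in KEYWORDS.items():
        counts[label] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# The introductory sentence that every CoP contains (quoted in the transcript).
sample = (
    "The words 'must', 'requires', or 'mandatory' indicate a legal "
    "requirement exists that must be complied with."
)

for label, n in count_keywords(sample).items():
    print(f"{label}: {n}")
```

Run on the text of a whole CoP, this gives a quick feel for how many hard obligations ('must', 'requires', 'mandatory') sit alongside recommendations ('should') and options ('may'). It does not replace reading the code.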

Limitations of CoP

So, I’ve talked CoP up quite a lot; as you can tell, I’m a fan because I like anything that helps us do the job, but they do have limitations. I’ve said before that there’s a limited number of them and they’re pretty basic. First of all, it’s worth noting that there are two really generic Codes of Practice. There’s the one on risk management. And then secondly, there’s the one on consultation, cooperation, and coordination. And I’ll be doing sessions on both of those. Now, those apply to pretty much everything we do in the safety world. So, it’s essential that you read them no matter what you’re doing and comply with them where you have to.

Then there are other Codes of Practice that apply to specific activities or hazards, and some of them are very, very specific, like getting rid of asbestos, or welding, or spray painting, or shot blasting, or whatever it might be. Those have clearly got a very narrow focus. So, you will know if you’re doing that stuff. So, if you are doing welding, then clearly you need to read the welding CoP. If welding isn’t part of your business or undertaking, you can forget it.

However, overall, there are fewer than 25 Codes of Practice. I can’t be more precise for reasons that we will come to in a moment. So, there’s a relatively small number of CoP and they don’t cover complex things. They’re not going to help you design a super-duper widget or some software or anything like that. They’re not going to help you do anything complicated. Also, Codes of Practice tend to focus on the workplace, which is understandable. They’re not much help when it comes to design trade-offs. They’re great for the sort of foundational stuff. Yes, we have to do all of this stuff regardless. But then you get to questions of “How much is enough?” Sometimes in safety, we say, “How much margin do I need?” “How many layers of protection do I need?” “Have I done enough?” CoP aren’t going to be a lot of use in helping you with that kind of determination, but you do need to make sure you’ve done everything in the CoP first and then start thinking about those trade-offs, would be my advice. You’re less likely to go wrong that way. So, start with your firm basis of what you have to do to comply and then think “What else could I do?”

List of CoP (Federal) #1

Now for information, you’ve got three slides here where we’ve got a list of the Codes of Practice that apply at the federal or Commonwealth level of government in Australia. So, at the top, highlighted, are the two I’ve already mentioned: the ‘How to manage WHS risks’ code and the consultation, cooperation, and coordination code. Then we get into stuff like abrasive blasting, confined spaces, construction, demolition, excavation, and first aid. So, quite a range of stuff is covered.

List of CoP (Federal) #2

Hazardous manual tasks – so basically human beings carrying and moving stuff. Managing and controlling asbestos, and removing it. Then we’ve got a couple on hazardous chemicals on this page, electrical risks, managing noise and preventing hearing loss, and stevedoring. There you go. So, if you’re into stevedoring, then this CoP is for you. The highlighted ones we’re going to cover in later sessions.

List of CoP (Federal) #3

Then we’ve got managing risks of Plant in the workplace. There was going to be a Code of Practice for the design of Plant, but that never saw the light of day so we’ve only got guidance on that. We’ve got falls, and the work environment and facilities. We’ve got another one on safety data sheets, so that’s another one on hazardous chemicals, preventing falls in housing – I guess because that’s a very common accident – safe design of structures, spray painting and powder coating, and welding processes. So, that is the list of – I think it’s 24 – Codes of Practice that are applied by Comcare, the federal regulator.

Commentary #1

Now, I’m being explicit about which regulator and which set of CoP, because they vary around Australia. Basically, the background was the model Codes of Practice were developed by Safe Work Australia, which is a national body. But those model Codes of Practice do not apply. Safe Work Australia is not a regulator. Codes of Practice are implemented or enforced by the federal government and by most states and territories. And it says with variations for a reason. Not all states and territories impose all codes of practice. For example, I live in South Australia and if you go and look at the WorkSafe South Australia website or Safe Work – whatever it’s called – you will see that there’s a couple of CoP that for some reason we don’t enforce in South Australia. Why? I do not know. But you do need to think about these things depending on where you’re operating.

It’s also worth saying that WHS is not implemented in every state in Australia. Western Australia currently have plans to implement WHS, but as of 2020 I don’t believe they’ve done so yet. Hopefully, it’s coming soon. And Victoria, for some unknown reason, have decided they’re just not going to play ball with everybody else. They’ve got no plans to implement WHS that I can find online. They’re still using their old OHS legislation. It’s not a universal picture in Australia, thanks to our rather silly version of government that we have here in Australia – forget I said that. So, if it’s a Commonwealth workplace, we apply the federal version of WHS and Codes of Practice. Otherwise, we use state or territory versions and you need to see the local regulator’s web page to find out what is applied where. And the definition of a Commonwealth workplace is in the WHS Act, but also go and have a look at the Comcare website to see who Comcare police. Because there are some nationalised industries that count as a Commonwealth workplace and it can get a bit messy.

So, sometimes you may have to ask for advice from the regulator but go and see what they say. Don’t rely on what consultants say or what you’ve heard on the grapevine. Go and see what the regulator actually says and make sure it’s the right regulator for where you’re operating.

Commentary #2

What’s to come? I’m going to do a session on the Risk Management Code of Practice, and I’m also, associated with that, going to do a session on the guidance on what is reasonably practicable. Now that’s guidance, it’s not a Code of Practice. But again, it’s been published so we need to be aware of it and it’s also very simple and very helpful. I would strongly recommend looking at that guidance if you’re struggling with SFARP or what it means; it’s very good. I’ll be talking about that soon. Also, I’m going to do a session on tolerability of risk, because you remember when I said “CoP aren’t much good for helping you do trade-offs in design” and that kind of thing. They’re really only good for simple stuff and compliance. Well, what you need to understand to deal with the more sophisticated problems is the concept of tolerability of risk. That’ll help us do those things. So, I’m going to do a session on that.

I’m also going to do a session on consultation, cooperation, and coordination, because, as I said before, that’s universally applicable. If we’re doing anything at a workplace, or with stuff that’s going to a workplace, then we need to be aware of what’s in that code. And then I’m also going to do sessions on plant, structures and substances (or hazardous chemicals) because those are the absolute bread and butter of the WHS Act. If you look at the duties of designers, manufacturers, importers, suppliers, and installers, et cetera, you will find requirements on plant, substances and structures all the way through those clauses in the WHS Act. Those three things are key so we’re going to be talking about that.

Now, I mentioned before that there was going to be a Code of Practice on plant design, but it never made it. It’s just guidance. So, we’ll have a look at that if we can as well, copyright permitting. And then I want to look at electrical risks because I think the electrical risks code is very useful. It’s useful for electrical risks, but it’s also a useful teaching vehicle for designers and manufacturers to understand their obligations, especially if you operate abroad, or if you’re importing stuff, and you want to know “Well, how do I know that my kit can be safely used in Australia?” So, if you can’t do the things that the electrical risk CoP requires in the workplace because your piece of kit won’t support that, then it’s going to be difficult for your customers to comply. So, probably there’s a hint there that if you want to sell your stuff successfully, here’s what you need to be aware of. And that applies not just to electrical. I think it’s a good vehicle for understanding how CoP can help us with our upstream obligations, even though CoP apply to a workplace. That session will really be about the imaginative use of Codes of Practice in order to help designers and manufacturers, etc.

And then I want to also talk about the noise Code of Practice, because noise brings in the concept of exposure standards. Now, generally, Codes of Practice don’t quote many standards. They’re certainly not mandatory, but noise is one of those areas where you have to have standards to say, “This is how we’re going to measure the noise. This is the exposure standard. So, you’re not allowed to expose people to more than this.” That brings in some very important concepts about health monitoring and exposure to certain things. Again, it’ll be useful if you’re managing noise but I think that session will be useful to anybody who wants to understand how exposure standards work and the requirements for monitoring exposure of workers to certain things. Not just noise, but chemicals as well. We will be covering a lot of that in the session(s) on HAZCHEM.

Copyright & Attribution

I just want to mention that everything in quotes/in italics is downloaded from the Federal Register of Legislation, and I’ve gone to the federal legislation because I’m allowed to reproduce it under the license under which it’s published. So, the middle paragraph there: I’m required to point out that I sourced it from the Federal Register of Legislation website on that date. And for the latest information, you should always go to the website to double-check that the version that you’re looking at is still in force and is still relevant. And then for more information on the terms of the license, you can go and see my page at www.SafetyArtisan.com because I go through everything that’s required and you can check for yourself in detail.

For More…

Also, on the website, there are a lot more lessons and resources, some of them free, some of them you have to pay to access, but they’re all there at www.safetyartisan.com. Also, there’s the Safety Artisan page at www.patreon.com/SafetyArtisan where you will see the paid videos. And I’ve also got a channel on YouTube where the free videos all are. So, please go to the Safety Artisan channel on YouTube and subscribe and you will automatically get a notification when a new free video pops up.

End

And that brings me to the end of the presentation, so thanks very much for listening. I’m just going to stop sharing that now. It just remains for me to say thank you very much for tuning in and I look forward to sharing some more useful information on Codes of Practice with you in the next session in about a month’s time. Cheers now, everybody. Goodbye.

There’s more!

You can find the Model WHS Codes of Practice here.

Categories
Mil-Std-882E Safety Analysis

Health Hazard Analysis

In this full-length (55-minute) session, The Safety Artisan looks at Health Hazard Analysis, or HHA, which is Task 207 in Mil-Std-882E. We explore the aim, description, and contracting requirements of this complex Task, which covers: physical, chemical & biological hazards; Hazardous Materials (HAZMAT); ergonomics, aka Human Factors; the Operational Environment; and ionizing and non-ionizing radiation. We outline how to implement Task 207 in compliance with Australian WHS. (We refer to other lessons for specific tools and techniques, such as Human Factors analysis methods.)

This is the seven-minute-long demo. The full version is a 55-minute-long whopper!

Health Hazard Analysis: Topics

  • Task 207 Purpose;
  • Task Description;
  • ‘A Health Hazard is…’;
  • ‘HHA Shall provide Information…’;
  • HAZMAT;
  • Ergonomics;
  • Operating Environment;
  • Radiation; and
  • Commentary.

Health Hazard Analysis: Transcript

Introduction

Hello, everyone, and welcome to the Safety Artisan. I’m Simon, your host, and today we are going to be talking about health hazard analysis.

Task 207: Health Hazard Analysis

This is Task 207 in the Mil-Std-882E approach, which is targeted at defense systems, but you will see it used elsewhere. The principles that we’re going to talk about today are widely applicable. So, you could use this standard for other things if you wish.

Topics for this Session

We’ve got a big session today so I’m going to plough straight on. We’re going to cover the purpose of the task; the description; the task helpfully defines what a health hazard is; says what health hazard analysis, or HHA, shall provide in terms of information. We talk about three specialist subjects: Hazardous materials or hazmat, ergonomics, and operating environment. Also, radiation is covered, another specialist area. Then we’ll have some commentary from myself.

Now the requirements of the standard of this task are so extensive that for the first time I won’t be quoting all of them, word for word. I’ve actually had to chop out some material, but I’ll explain that when we come to it. We can work with that but it is quite a demanding task, as we’ll see.

Task Purpose

Let’s look at the task purpose. We are to perform and document a health hazard analysis, to identify human health hazards and evaluate, as it says, materials and processes using materials, etc., that might cause harm to people, and to propose measures to eliminate the hazards or reduce the associated risks. In many respects, it’s a standard 882-type approach. We’re going to do all the usual things. However, as we shall see, we’re going to do quite a lot more on this one.

Task Description #1

So, task description. We need to evaluate the potential effects resulting from exposure to hazards, and this is something I will come back to again and again. It’s very easy when dealing in this area, particularly with hazardous materials, to get hung up on every little tiny amount of potentially hazardous material that is in the system or in a particular environment, and I’ve seen this done to death so many times. I’ve seen it overdone in the UK when COSHH, the Control of Substances Hazardous to Health regulations, came in in the military. We went bonkers about this. We did risk assessments up the ying-yang for stuff that we just did not need to worry about. Stuff that was in every office up and down the land. So, we need to be sensible about doing this, and I’ll keep coming back to that.

So, we need to do, as it says, identification, assessment, characterisation, control, and communication of hazards in the workplace or environment. And we need to follow a systems approach, considering “What’s the total impact of all these potential stressors on the human operator or maintainer?” Again, I come from a maintenance background. The operator often gets lots of attention, firstly because if the operator stuffs up, you very often end up with a very nasty accident where lots of people get hurt. So, that’s a legitimate focus for a human operator of a system.

But also, in a lot of organizations, the executive management tend to be operators because that’s how the organization evolves. So, sometimes you can have an emphasis on operations, while maintenance and support and other things get ignored because they’re not sexy enough to the senior management. That’s a bad reason for not looking at stuff. We need to think about the big picture, not just the people who are in control.

Task Description #2

Moving on with task description. We need to do all of this good stuff and we’re thinking about materials and components and so forth, and whether they cause or contribute to adverse effects in organisms or offspring. We’re talking about genetic effects as well. Or pose a substantial present or future danger to the environment. So in 882, we are talking about environmental impact as well as human health impact. There is an environmental task as well that is explicitly so.

Personally, I would tend to keep the human impact and the environmental impact separate because there are very often different laws that apply to the two. If you try and mix them together or do a sort of one size fits all analysis, you’ll frequently make life more difficult for yourself than you need to. So, I would tend to keep them separate. However, that’s not quite how the standard is written.

A Health Hazard is …

So what is a health hazard? As it says, a health hazard is a condition and it’s got to be inherent to the operation, etc., through to disposal of the system. So, it’s cradle to grave. That’s important. That’s consistent with a lot of Western law. It’s got to be capable of causing death, injury, illness, disability, or even, in this standard, just reduced job performance of personnel by exposure to physiological stresses.

Now I’m getting ahead of myself because, in Australia, health hazards can include psychological impacts as well, not just impacts on physical health. Now reduced job performance? – Are we really interested in minor stuff? Maybe not. Maybe we need to define what we mean by that. Particularly when it comes to operators or maintainers making mistakes, perhaps through fatigue that can have very serious consequences.

So, this analysis task is going to address lots of causes or factors that we typically find in big accidents and relate them to effects on human performance. Then it goes on to specify that certain specific hazards must be included: chemical, physical, biological, and ergonomic. For ergonomic, I would say human factors, because when you look at the standard, what it calls ergonomics is much wider than the narrow definition of ergonomics that I’m used to.

Now, this is the first area where I’ve chopped some material, because where in a-d it says ‘e.g.’, in those examples there is in effect a checklist of chemical, physical, biological, and ergonomic hazards that you need to look at. This task has its own checklist. You might recall when we talked about preliminary hazard identification, a hazard checklist is a very good method for getting broad coverage in general. Now, in this task, we have further checklists that are specific to human health. That’s something to note.

We’ve also got to think about hazardous materials that may be formed by test, operation, maintenance, disposal, or recycling. That’s very important, and we’ll come back to that later. We’ve got to think about crashworthiness and survivability issues. We’ve also got to think about, it says, non-ionizing radiation hazards, but in reality, we’ve got to consider ionizing as well if we have any radioactive elements in our system, and it does say that in g. So, we’ve got to do both non-ionizing and ionizing.

HHA Shall Provide Info #1

What categories of information should this health hazard analysis generate? Well, first of all, it’s got to identify hazards and as I’ve said or hinted at before, we’ve got to think about how could human beings be exposed? What is the pathway, or the conditions, or mode of operations by which a hazardous agent could come into contact with a person? I will focus on people. So, just because there is a potentially hazardous chemical present doesn’t mean that someone’s going to get hurt. I suspect if I looked around in the computer in front of me that I’m recording this on or at the objects on my desk, there are lots of materials that if I was to eat them or swallow them or ingest them in some other way would probably not do me a lot of good. But it’s highly unlikely that I’m going to start eating them so maybe we don’t need to worry about that.

HHA Shall Provide Info #2

We also need to think about the characterization of the exposure. Describing the assessment process: names of the tools or any models used; how did we estimate intensities of energy or substances at the concentrations and so on and so forth? This is one of those analyses that are particularly sensitive to the way we go about doing stuff. Indeed, in lots of jurisdictions, you will be directed as to how you should do some of these analyses and we’ll talk about that in the commentary later. So, we’ve got to include that. We’ve got to “show our working” as our teachers used to tell us when preparing us for exams.

HHA Shall Provide Info #3

We’ve got to think about severity and probability. Here the task directs us to use the standard definition tables that are found in 882. I talked about those under Task 202 so I’m not going to talk about them further here. Now, of course, we can, and maybe should, tailor these matrices. Again, I’ve talked about that elsewhere, but if we’re not using the standard matrices and tables, then we should set out what we’ve done and why that’s appropriate as well.
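As a rough illustration of how severity and probability combine into a risk level, here is a minimal lookup sketch. The category names and matrix values are intended to mirror Tables I to III of Mil-Std-882E, but they should be checked against the standard (or your project’s tailored matrix) before use; the function name `risk_level` is an assumption made for this example.

```python
# Severity categories (Table I) and probability levels (Table II),
# as understood from Mil-Std-882E. Verify against the standard.
SEVERITY = {1: "Catastrophic", 2: "Critical", 3: "Marginal", 4: "Negligible"}
PROBABILITY = {
    "A": "Frequent",
    "B": "Probable",
    "C": "Occasional",
    "D": "Remote",
    "E": "Improbable",
    "F": "Eliminated",
}

# Risk assessment matrix (Table III): one row per probability level,
# columns ordered by severity category 1..4. A tailored matrix would
# simply replace these values.
_MATRIX = {
    "A": ["High", "High", "Serious", "Medium"],
    "B": ["High", "High", "Serious", "Medium"],
    "C": ["High", "Serious", "Medium", "Low"],
    "D": ["Serious", "Medium", "Medium", "Low"],
    "E": ["Medium", "Medium", "Medium", "Low"],
}


def risk_level(severity: int, probability: str) -> str:
    """Return the risk level for a severity category and probability level."""
    probability = probability.upper()
    if severity not in SEVERITY:
        raise ValueError(f"Unknown severity category: {severity}")
    if probability not in PROBABILITY:
        raise ValueError(f"Unknown probability level: {probability}")
    if probability == "F":
        # An eliminated hazard has no residual risk, whatever the severity.
        return "Eliminated"
    return _MATRIX[probability][severity - 1]


print(risk_level(2, "C"))  # Critical severity, Occasional probability
```

Keeping the matrix as plain data makes any tailoring visible: if the values differ from the standard, that difference is exactly what needs to be set out and justified.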

HHA Shall Provide Info #4

Then finally, the mitigation strategy. We shouldn’t be doing analysis for the sake of analysis. We should be doing it to say, “How can we make things better?” And in particular for health, “How can we make things acceptable?” Because health hazards very often attract absolute limits on exposure. So, questions of SFARP or ALARP or cost-benefit analysis simply may not enter into the equation. We may simply be directed: “This is the upper limit of what you can expose a human being to. This is not negotiable.” So, that’s another important difference with this task.
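The point about absolute limits can be sketched as a simple check that comes before any cost-benefit argument. Everything here is hypothetical: the class `ExposureCheck`, the function `assess`, and the figures in the usage example are illustrative only, and real limits must come from the applicable regulations or exposure standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureCheck:
    """One exposure compared with a mandated upper limit (illustrative)."""

    agent: str
    measured: float
    limit: float
    unit: str

    def within_limit(self) -> bool:
        return self.measured <= self.limit


def assess(check: ExposureCheck) -> str:
    """Apply the absolute limit first; reasonably-practicable reasoning comes after."""
    if not check.within_limit():
        # No cost-benefit argument applies here: the limit is not negotiable.
        return (
            f"{check.agent}: NOT ACCEPTABLE, {check.measured} {check.unit} "
            f"exceeds the limit of {check.limit} {check.unit}"
        )
    return (
        f"{check.agent}: within limit, now consider whether further "
        f"reduction is reasonably practicable"
    )


# Illustrative figures only; check the applicable exposure standard.
noise = ExposureCheck(agent="Noise", measured=88.0, limit=85.0, unit="dB(A)")
print(assess(noise))
```

The ordering is the design choice that matters: the hard limit is tested first, and trade-off reasoning only starts once the exposure is already inside it.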

Three More Topics

Now, at this point, I am just foreshadowing. We’re about to move on to talk about some different topics. First of all, in this section, we’re going to talk about three particular topics. Hazardous material or HAZMAT for short; ergonomics; and the operational environment. When we say the operational environment, it’s mainly about the people aspects of the system, and the environment that they experience. Then after these three, we will go on to talk about radiation. There are special requirements in these three areas: HAZMAT, ergonomics, and operational environment.

HAZMAT (T207) #1

First of all, we have to deal with HAZMAT. If it’s going to appear in our system, or in the support system, we’ve got to identify the HAZMAT and characterize it. There are lots of international and national standards about how this is to be done. There’s a UN convention on hazardous materials, which most countries follow. And then there will usually be national standards as well that direct what we shall do. More on that later. So, we’ve got to think about the HAZMAT.

A word of caution on that. Certainly in Australian defence, we do HAZMAT to death because of a recent historical example of a big national scandal about people being exposed to hazardous materials while doing defence work. So, the Australian Defence Department is ultrasensitive about HAZMAT and will almost certainly mandate very onerous requirements on performing this. And whilst we might look at that and go “This is nuts! This is totally over the top!”, unfortunately, we just have to get on with it because no one is going to make, I’m afraid, a sensible decision about the level of risk that we don’t have to worry about because it’s just too sensitive a topic.

So, this is one of those areas where learning from experience has actually gone a bit wrong and we now find ourselves doing far too much work looking at tiny risks, possibly at the expense of looking at the big picture. That’s just something to bear in mind.

HAZMAT (T207) #2

So, lots of requirements for HAZMAT. In particular, we need to think about what are we going to do with it when it comes to disposal? Either disposal of consumables, worn components or final disposal of the system. And very often, the hazardous material may have become more hazardous. In that, let’s say engine or lubricating oil will probably have metal fragments in it once it’s been used and other chemical contamination, which may render it carcinogenic. So, very often we start with a material that is relatively harmless, but use – particularly over a long period of time – can alter those chemicals or introduce contaminants and make them more dangerous. So, we need to think about the full life of the system.

Ergonomics (T207) #1

Moving on to ergonomics, and this is another big topic. Now, Mil. Standard 882 doesn’t address human factors, in my view, particularly well. The human factors stuff gets buried in various tasks and we don’t identify a separate human factors program with all of the interconnections that you need in order to make it fully effective. But this is one task where human factors do come in, very much so, but they are called ergonomics rather than human factors. Under this task description, we need to think about mission scenarios. We need to think about the staff who will be exposed as operators or maintainers, whatever they might be doing. We’ve got to start to characterize the population at risk.

Ergonomics (T207) #2

We’ve got to think about the physical properties of things that personnel will handle or wear and the implications that has on body weight. So, for example, there is a saying that the “Air Force and the Navy man their equipment and the army equip their men”. Apologies for the gendered language but that’s the saying. So, we’re putting human beings – very often – inside ships and planes and tanks and trucks. And we’re also asking soldiers to carry – very often – lots of heavy equipment. Their rations, their weapons, their ammunition, water, various tools and stuff that they need to survive and fight on the battlefield. And all that stuff weighs and all of that stuff, if you’re running about carrying it, bangs into the body and can hurt people. So, we need to address that stuff.

Secondly, we need to look at physical and cognitive actions that operators will take. So, this is really very broad once we get into the cognitive arena thinking about what are the operators going to be doing. And exposures to mechanical stress while performing work. So, maybe more of a focus on the maintainer in part three. Now, for all of this stuff, we need to identify characteristics of the design of the system or the design of the work that could degrade performance or increase the likelihood of erroneous action that could result in mishaps or accidents.

This is classic human factors stuff. How might the designed work or the designed equipment induce human error? So, that’s a huge area of study for a lot of systems and very important. And this will be typically a very large contributor to serious accidents and, in fact, accidents of all kinds. So, it should be an area of great focus. Often it is not. We just tend to focus on the so-called technical risks and overdo that while ignoring the human in the system. Or just assuming that the human will cope, which is worse.

Ergonomics (T207) #3

Continuing with ergonomics. How many staff do we need to operate and maintain the system and what demands are we placing on them? Also, if we overdo these demands, what are we going to do about that? Now, this can be a big problem in certain systems. I come from an aviation background and fatigue and crew duty time tend to be very heavily policed in aviation. But I was actually quite shocked when I sort of began looking at naval surface ships, submarines, where it seemed that fatigue and crew duty time was not well policed. In fact, there even seemed to be, in some places, quite a macho attitude to forcing the crew into working long hours. I say macho attitude because the feeling seemed to be “Well if you can’t take it, you shouldn’t have joined.”

So, it seems to me to be quite a negative culture in those areas potentially, and it’s something that we need to think about. In particular, I’ve noticed on certain projects that you have a large crew who seem to be doing an extraordinary amount of work and becoming very fatigued. That’s concerning because, of course, you could end up with a level of fatigue where the crew are making mistakes to the same level as a drunk driver. So, this is something that needs to be considered carefully and given the attention it deserves.

Operating Environment #1

Moving on to the operating environment. How will these systems be used and maintained? And what does that imply for human exposure? This is another opportunity where we need to learn from legacy systems and go back and look at historical material and say “What were people exposed to in the past? And what could happen again?”

Now, that’s important. It’s often not very systematically done. We might go and talk to a few old, bold operators and maintainers and ask their advice on the things that can go wrong but we don’t always do it very systematically. We don’t always survey past hazard and accident data in order to learn from it. Or if we do there is sometimes a tendency to say, “That happened in the past, but we will never make those mistakes. We’re far too clever to stuff up like that – like our predecessors did.” Forgetting that our predecessors were just as clever as we are and just as well-meaning as we are, but they were human and so are we.

I think pride can get in the way of a lot of these analyses as well. And there may be occasions where we’re getting close to exposure limits, where regulations say we simply cannot expose people to a certain level of noise, or whatever, and then “How are we going to deal with that? How are we going to prevent people from being overexposed?” Again, this can be a problem area.
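
To make the idea of an absolute exposure limit concrete, here is a hedged Python sketch for noise. It assumes the limit is an eight-hour equivalent level of 85 dB(A), which is the figure used in Australian WHS regulation; your jurisdiction may direct a different limit or a different calculation method, and peak-noise limits are not covered here. The function names and the shift pattern are hypothetical.

```python
import math

# Assumed limit: 85 dB(A) averaged over eight hours (LAeq,8h).
# Replace with the limit your jurisdiction mandates.
LIMIT_LAEQ_8H_DBA = 85.0


def laeq_8h(exposures):
    """Combine (level_dBA, hours) pairs into an 8-hour equivalent level.

    Uses the equal-energy rule: sound energy is summed over the periods
    and normalised to a nominal eight-hour day.
    """
    energy = sum(hours * 10 ** (level / 10) for level, hours in exposures)
    if energy <= 0:
        raise ValueError("at least one exposure period with hours > 0 is needed")
    return 10 * math.log10(energy / 8.0)


def within_limit(exposures, limit=LIMIT_LAEQ_8H_DBA):
    """True if the combined exposure does not exceed the limit."""
    return laeq_8h(exposures) <= limit


if __name__ == "__main__":
    # Hypothetical maintainer shift: 2 h near running machinery, 6 h elsewhere.
    shift = [(94.0, 2.0), (78.0, 6.0)]
    print(f"LAeq,8h = {laeq_8h(shift):.1f} dB(A), within limit: {within_limit(shift)}")
```

Note that the check is pass or fail. There is no cost-benefit term in it, which is the point about absolute limits: if the answer is “over”, we have to change the design, the protection, or the time people spend there.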

Operating Environment #2

This next bit of operating environment is really – I said about putting people in the equipment. Well, this is this bit. This is part A and B. So, we’re thinking about “If we stick people in a vehicle – whether it be a land vehicle, marine vehicle, an air vehicle, whatever it might be – what is that vehicle going to do to their bodies?” In terms of noise, of vibration and stresses like G forces, for example, and shock, shock loading? Could we expose them to blast overpressure or some other sudden changes of pressure or noise that’s going to damage their ears, temporarily or permanently? Again, remarkably easy to do. So, that’s that aspect.

Operating Environment #3

Moving on, we continue to talk about noise and vibration in general. In this particular standard, we’ve got some quite stringent guidance on what needs to be looked at. Now, these requirements, of course, are assuming a particular way of doing things, which we will come to later. There are a lot of standards referenced by task 207. This task is assuming that we’re going to do things the American government or the American military way, which may not be appropriate for what we’re doing or the jurisdiction we’re in. So, we’ll just move on.

Operating Environment #4

Then again, talking about noise, blast, vibration, how are we going to do it? Some quite specific requirements in here. And again, you’ll notice, two-thirds of the way down in the paragraph, I’ve had to chop out some examples. There are some more, in effect, hazard checklists in here saying we must consider X, Y, Z. Now, again, this seems to be requiring a particular way of doing things that may not be appropriate in a non-American defence environment.

However, the principle, I think, to take away from this is that this is a very demanding task. If we consider human health effects properly, it’s going to require a lot of work by some very specialist and skilled people. In fact, we may even get in some specialist medical people. If you work in aviation or medicine, you may be aware that there is a specialist branch of medicine called aviation medicine where these things are specifically considered. And similarly, there are medical specialists in diving operations and other things where we expose human beings to strange effects. So, this can be a very, very demanding task to follow.

Operating Environment #5

So, when we’re going to equip people with protective equipment or we’re going to make engineering changes to the system to protect them, how effective are these things going to be? And given that most of these things have a finite effectiveness – they’re rarely perfect unless you can take the human out of the system entirely – then we’re going to be exposing people to some level of hazard and there will be some risk that that might cause an injury.

So, how many individuals are we going to expose per platform or over the total population exposed over the life of the system? Now, bearing in mind we’re talking sometimes about very large military systems that are in service for decades. This can be thousands and thousands of people. So, we may need to think about that and certainly in Australia, if we expose people to certain potential contaminants and noise, we may have to run a monitoring program to monitor the health and exposure of some of this exposed population or all of them. So, that can be a major task and we would need to identify the requirements to do that quite early on, hopefully.
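
To get a feel for how quickly the exposed population grows, a back-of-the-envelope estimate can be sketched as below. All of the figures and the function name are hypothetical, chosen only for illustration. The sketch assumes every position is continuously filled and that nobody returns for a second posting, so it overestimates the number of unique individuals while understating cumulative exposure for those who do return.

```python
def total_population_exposed(platforms, crew_per_platform,
                             service_life_years, average_posting_years):
    """Rough count of people exposed over the life of a fleet.

    Each position on each platform is assumed to be refilled every
    `average_posting_years` for the whole service life.
    """
    inputs = (platforms, crew_per_platform, service_life_years, average_posting_years)
    if min(inputs) <= 0:
        raise ValueError("all inputs must be positive")
    rotations = service_life_years / average_posting_years
    return round(platforms * crew_per_platform * rotations)


if __name__ == "__main__":
    # Hypothetical fleet: 12 platforms, 180 crew each, 30 years in service,
    # with people typically posted for 3 years.
    print(total_population_exposed(12, 180, 30, 3))
```

Even with modest assumptions the number runs into the tens of thousands, which is why the requirements for any health monitoring program need to be identified early.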

And then, of course, again, we’re not doing this for the sake of it. How can we optimize the design and effectively reduce noise exposure and vibration exposure to humans? And how did we calculate it? How did we come to those conclusions? Because we’re going to have to keep those records for a long, long time. So, again, very demanding recording requirements for this task.

Operating Environment #6

And then I think this is the final one on operating environment. What are the limitations of this protective equipment and what burden does it impose? Because, of course, if we load people up with protective equipment, that may introduce further hazards. Maybe we’re making the individual more likely to suffer a musculoskeletal disorder.

Or maybe we are making them less agile or reducing their sensitivity to noise? Maybe if we give people hearing protection, if somebody else has assumed that they will hear a hazard coming, well, they’re not going to anymore, are they? If they’re wearing lots of protective equipment, they may not be as aware of the environment around them as they once were. So, we can introduce secondary hazards with some of this stuff. And then we need to look at the trade-offs. When and where? Is it better to equip people or not to equip people and limit their exposure or just keep them away altogether?

Radiation (T207)

So moving on briefly, we’re just going to talk about radiation. Now in this task – again, I’ve had to chop a lot of stuff out – you’ll see that in square brackets this task refers to certain US standards for radiation. Both ionizing and non-ionizing, lasers and so forth. That’s appropriate for the original domain, which this standard was targeted at. It may be wholly inappropriate for what you and I are doing.

So, we need to look at the principles of this task, but we may need to tailor the task substantially in order to make it appropriate for the jurisdiction we’re working in. Again, we’re going to have to keep these records for a long time. Radiation is always going to be dreaded by humans so it’s a controversial topic. We’re going to have to monitor people’s exposure and protect them and show that we have done so, potentially decades into the future. So, we should be looking for the very highest standards of documentation and recording in these areas because they will come under scrutiny.

Contracting #1

Moving on to contracting, this is more of a standard part of this task or part of the standard, I should say. These words or very similar words exist in every task. So, I’m not going to go through all of these things in any great detail. It’s worth noting, and I’ll come back to this in part B, we may need to direct whoever is doing the analyses to consider or exclude certain areas because it’s quite possible to fritter away a lot of resources doing a wide but shallow analysis that fails to get to the things that can really hurt people.

So, we might be doing a superficial analysis or we might go overboard on a particular area and I’ve mentioned HAZMAT but there are many things that people can get overexcited about. So, we might see people spending a lot of time and effort and money in a particular area and ignoring others that can still hurt people. Even though they might be mundane, not as sexy. Maybe the analysts don’t understand them or don’t want to know. So, the customer who is paying for this may need to direct the analysis. I will come on to how you do that later.

Also, the customer or client may need to specify certain sources of information, certain standards, certain exposure standards, certain assumptions, certain historical sets of data and statistics to be used. Or some statistics about the population, because, of course, the people who operate military systems, for example, tend to be quite a narrow subset of the population. So, there are very often age limits. Frontline infantry soldiers tend to be young and fit. In certain professions, you may not be allowed to work if you are colour-blind or have certain disabilities. So, it may be that a broad analysis of the general population is not appropriate for certain tasks.

It may be perfectly reasonable to assume certain things about the target population. So, we need to think about all of these things and ensure that we don’t have an unfocused analysis that as a result is ineffective or wastes a lot of money looking at things that don’t really matter, that are irrelevant.

Contracting #2

Standards and criteria. In part F, there are 29 references which the standard lists, which are all US military standards or US legal standards. Now, probably a lot of those will be inappropriate for a lot of jurisdictions and a lot of applications. So, there’s going to be quite a lot of work there to identify what are the appropriate and mandatory references and standards to use. And as I said, in the health hazard area, there are often a lot. So, we will often be quite tightly constrained on what to do.

And Part H: if the customer knows or has some idea of the staff numbers and profile of the people who are going to be exposed to this system while operating and maintaining it, that’s very useful information and needs to be shared. We don’t want to make the analyst, the contractor, guess. We want them to use appropriate information. So, tell them and make sure you’ve done your homework, that you tell them the right thing to do.

Commentary #1

So, that’s all of the standard. I’ve got four slides now of commentary. And the first one, I just want to really summarize what we’ve talked about and think about the complexity of what we’re being asked to do. First bullet point, we are considering cradle to grave operation and maintenance and disposal. Everything associated with, potentially, quite a complex system. Now, this lines up very nicely with the requirements of Australian law, which require us to do all of this stuff. So, it’s got to be comprehensive.

Second bullet point, we’ve got to think about a lot of things. Death and injury, illness, disability, and could we infect somebody or contaminate somebody with something that will cause birth defects in their offspring? There’s a wide range of potential vectors of harm that we’re talking about here, and for some systems we will probably need to bring in some very specialist knowledge in order to do this effectively. And also thinking about reduced job performance – this is one aspect of human factors. This task is going to link very strongly to whatever human factors program we might have.

Thirdly, we’ve got to think about chemical, physical, and biological hazards. So, again, there’s a wide range of stuff to think about there. An example of that is HAZMAT, and the requirements on HAZMAT, in most jurisdictions, tend to be very stringent. So, that is going to be done and we need to be prepared to do a thorough job and demonstrate that we’ve done a thorough job and provide all the evidence.

Then we’ve also got ergonomics. Actually, strictly speaking, we’re talking human factors here because it’s a much wider definition than the definition of ergonomics that I’m used to, which tends to be purely physical effects on a human. Because we’re talking about cognitive and perception and job performance as well and also we’ve got vibration and acoustics. So, again, particular medical effects and stringent requirements. So, a whole heap of other specialist work there.

And operating environment, thinking about the humans that will be exposed. How are we going to manage that? What do we need to specify in order to set up whatever medical monitoring program of the workforce we might have to bring in in the future through life? So, again, potentially a very big, expensive program. We need to plan that properly.

Then finally, radiation. Another controversial topic which gets lots of attention. Very stringent requirements, both in terms of exposure levels and indeed we will often be directed as to how we are to calculate and estimate stuff. It’s another specialist area and it has to be done properly and thoroughly.

Overall, every one of those seven bullet points shows how complex and how comprehensive a good health hazard analysis needs to be. So, to specify this well, to understand what is required and what is needed through life, for the program to meet our legal and regulatory obligations, this is a big task and it needs a lot of attention and potentially a lot of different specialist knowledge to make it work. I flogged that one to death, so I’ll move on.

Commentary #2

Now, as I’ve said before, too, this is an American military standard, so it’s been written to conform to that world. Now in Australia, the requirements of Australian work health and safety are quite different to the American way of doing things. Whilst we tend to buy a lot of American equipment and there’s a lot of American-style thinking in our military and in our defence industry, actually, Australian law is much more closely linked to English law. It’s a different legal basis to what the Americans do. So Australian practitioners take note.

It’s very easy to go down the path of following this standard and doing something that will not really meet Australian requirements. It’ll be, “We’ll do some work” and it may be very good work, but when we come to the end and we have to demonstrate compliance with Australian requirements, if we haven’t thought about that explicitly upfront, we’re probably in for a nasty shock and a lot of expensive rework that will delay the program. And that means we’re going to become very, very unpopular very quickly. So, that’s one to avoid in my experience.

So, we will need to tailor task 207 requirements upfront in order to achieve WHS compliance. And the client or customer needs to do that and understand that. Well, the contractor needs to as well; the analysts need to understand that. But the customer needs to understand that first, otherwise it won’t happen.

Commentary #3

Let’s talk a bit more about tailoring for WHS. For example, there are several WHS codes of practice which are relevant. And just to let you know, these codes of practice cover not only requirements of what you have to achieve, but also, to a degree, how you are to achieve them. So, they mandate certain approaches. They mandate certain exposure standards. Some of them also list a lot of other standards that are not mandated but are useful and informative.

So, we’ve got codes of practice on hazardous manual tasks, so avoiding musculoskeletal injuries. We’ve got several codes of practice on hazardous chemicals. So, we’ve got a COP specifically on risk management and risk assessment of hazardous chemicals, on safety data sheets, on labelling of HAZCHEM in a workplace. We’ve got a COP on noise and hearing loss and also, we have other COPs on specific risks, such as asbestos, electricity and others, depending on what you’re doing. So, potentially there is a lot of regulation and codes of practice that we need to follow.

And remember that COPs, while they contain regulations, are also a standard that a court will look to enforce if you get prosecuted. If you wind up in court, the prosecution will be asking questions to determine whether you’ve met the requirements of the COP or not. If you can’t demonstrate that you’ve met them, you might have done a whole heap of work and you might be the greatest expert in the world on a certain kind of risk, but if you can’t demonstrate that you’ve met at minimum the requirements of the COP – because they are minimum requirements – then you’re going to be in trouble. So, you need to be aware of what those things are.

Then on radiation, we have separate laws outside of WHS. So, we have the Australian Radiation Protection and Nuclear Safety Agency, ARPANSA, and there is an associated act and associated regulations and some COPs as well. So, for the radiation side, there’s a whole other world that you’ve got to be aware of and associated with all of this stuff are exposure standards.

Commentary #4

Finally, how do we do all of this without spending every dollar in the defence budget and taking 100 years to do it? Well, first of all, we need to set our scope and priorities. So, before we get to Task 207, the client/the customer should be involving end-users and doing a preliminary hazard identification exercise, Task 201. That should be broad and as thorough as possible. They should also be doing a preliminary hazard analysis exercise, Task 202, to think about those hazards and risks further.

Also, you should be doing Task 203, which is system requirements hazard analysis. We need to be thinking about what are the applicable requirements for my system from the law all the way down to what specific standards? What codes of practice? What historical norms do we expect for this type of equipment? Maybe there is industry good practice on the way things are done. Maybe as we work through the specifications for the equipment, we will derive further requirements for hazard controls or a safety management system or whatever it might be. That’s a big job in itself.

So, we need to do all three of those tasks, 201, 202, 203, in order to be prepared and ready to focus on those things that we think might hurt us. Might hurt people physically, but also might hurt us in terms of the amount of effort we’re going to have to make in order to demonstrate compliance and assurance. So, that will focus our efforts.

Secondly, there is the question of when we need to do the specialist analyses, and we may not always need to do so. This is where 201, 202, and 203 come in. But where we need to do specialist analyses, we may need to find specialist staff who are competent to do this kind of unusual or specialist work and do it well. Now, typically, these people are not cheap, and they tend to be in short supply. So, if you can think about this early and engage people early, then you’re going to get better support.

You’re probably going to get a better deal because in my experience if you call in the experts and ask their opinion early on, they’re more likely to come back and help you later. As opposed to, if you ignore them or disregard their advice and then ask them for help because you’re in trouble, they may just ignore you because they’ve got so much work on. They don’t need your work. They don’t need you as a client. You may find yourself high and dry without the specialists you need or you may find yourself paying through the nose to get them because you’re not a priority in their eyes. So do think about this stuff early, I would suggest, and do cultivate the specialists. If you get them in early and listen to them and they feel involved, you’re much more likely to get a good service out of them.

So thirdly, try not to do huge amounts of work on stuff that doesn’t really have a credible impact on health. Now, I know that sounds like a statement of the blinking obvious, but because people get so het up about health issues, particularly things like radiation and other hazards that humans can’t see: we dread them. We get very emotional about this stuff and therefore, management tends to get very, very worried about this stuff. And I’ve seen lots of programs spend literally millions of dollars analyzing stuff to death, which really doesn’t make any difference to the safety of people in the real world. Now, obviously, that’s wasted money, but also it diverts attention from those areas that really are going to cause or could cause harm to people through the life of the system.

So, we need to use that risk matrix to understand what is the real level of risk exposure to human beings and therefore, how much money should we be spending? How much effort and priority should we be spending on analyzing this stuff? If the risk is genuinely very low, then probably we just take some standard precautions, follow industry best practice, and leave it at that and we keep our pennies for where they can really make a difference.

Now, having said that, there are some exceptions. We do need to think about accident survivability. So, what stresses are people going to be exposed to if their vehicle is in an accident? How do we protect them? How do they escape afterward? Hopefully. How do we get them to safety and treat the injured? And so on and so forth. That may be a very significant thing for your system.

Also post-accident scenarios in terms of – very often a lot of hazardous materials are safely locked away inside components and systems but if the system catches fire or is smashed to pieces and then catches fire, then potentially a lot of that HAZMAT is going to become exposed. Very often materials that pose a very low level of risk, if you set them on fire and then you look at the toxic residue left behind after the fire, it becomes far more serious. So, that is something to consider. What do we do after we’ve had an accident and we need to sort of clean up the site afterward? And so on and so forth.

Again, this tends to be a very specialist job so maybe we need to get in some specialists to give us advice on that. Or we need to look to some standards if it’s a commonplace thing in our industry, as it often is. We learn from bitter experience. Well, hopefully, we learn from bitter experience.

Copyright Statement

So, that’s it from me. I appreciate it’s been a long session, but this is a very complex task and I’ve really only skimmed the surface on this and pointed you at sort of further reading and maybe some principles to look at in more depth. So, all the quotations are from the Mil. Standard, which is copyright free. But this presentation is copyright of the Safety Artisan.

For More…

And for more information on this topic and others, and for more resources, do please visit www.safetyartisan.com. There are lots of free resources on the website as well, and there’s plenty of free videos to look at.

End: Health Hazard Analysis

So, that is the end of the session. Thank you very much for listening. And all that remains for me to say is thanks very much for supporting the work of the Safety Artisan and tuning into this video. And I wish you every success in your work now and in the future. Goodbye.

Categories
Mil-Std-882E Safety Analysis

Operating & Support Hazard Analysis

In this full-length session, The Safety Artisan looks at Operating & Support Hazard Analysis, or O&SHA, which is Task 206 in Mil-Std-882E. We explore Task 206’s aim, description, scope, and contracting requirements. We also provide value-adding commentary, which explains O&SHA: how to use it with other tasks; how to apply it effectively on different products; and some of the pitfalls to avoid. We refer to other lessons for specific tools and techniques, such as Human Factors analysis methods.

This is the seven-minute-long demo. The full version is about 35 minutes long.

Operating & Support Hazard Analysis: Topics

  • Task 206 Purpose:
    • To identify and assess hazards introduced by O&S activities and procedures;
    • To evaluate the adequacy of O&S procedures, facilities, processes, and equipment used to mitigate risks associated with identified hazards.
  • Task Description (six slides);
  • Reporting (two slides);
  • Contracting (two slides); and
  • Commentary (four slides).

Operating & Support Hazard Analysis: Transcript

Introduction

Hello everyone and welcome to the Safety Artisan; home of safety engineering training. I’m Simon and today we’re going to be carrying on with our series on Mil. Standard 882E system safety engineering.

Operating & Support Hazard Analysis

Today, we’re going to be moving on to the subject of operating and support hazard analysis. This is, as it says, task 206 under the standard. Operating and support hazard analysis, I’ll just call it O&S or OSHA (also O&SHA) for short. Unfortunately, that will confuse people if I call it OSHA. Let’s call it O&S.

Topics for this Session

The purpose of O&S hazard analysis is to identify and assess hazards introduced by those activities and procedures and also to evaluate the adequacy of O&S procedures, processes, equipment, facilities, etc, to mitigate risks that have been already identified. A twofold task but a very big task. And as we’ll see, we’ve got lots of slides today on task description, and reporting, contracting, and commentary. As always, I present the full text as is of the task, which is copyright free, but I’m only going to talk about the things that are important. So, we’re not going to go through every little clause of the standard; that would be pointless.

O&S Hazard Analysis (T206)

Let’s get started with the purpose. As we’ve already said, it’s to identify and assess those hazards which are introduced by operational and support activities and procedures and evaluate their adequacy. So, we’re looking at operating the system, whatever it may be. And of course, this is a military standard, so we assume a military system, but not all military systems are weapon systems by any means. Not all are physical systems. So, there may be inventory management systems, management information systems, all kinds of stuff. So, does operating those systems and just supporting them (maintaining them, resupplying them, disposing of them, etc.) create any hazards or introduce any hazards? And how do we mitigate? That’s the purpose of the task.

Task Description (T206) #1

Let’s move on to the task description. Again, we’re assuming a contractor is performing the analysis, but that’s not necessarily the case. The standard says this task typically begins during engineering and manufacturing development, or EMD. So, we’re assuming an American style lifecycle for a big system and EMD comes after concept and requirements development. So, we are beginning to move into the very expensive stage of development for a system where we begin to commit serious money. It’s suggesting that O&SHA can wait until then which is fine in general unless you’ve identified any particularly novel hazards that will need to be dealt with earlier on. As it says, it should build on design hazard analyses, but we’ll also talk later on about the case when there are no design hazard analyses. And the O&SHA shall identify requirements or alternatives for eliminating hazards, mitigating risks, etc. This is one of those tasks where the human is very important – in fact, dominant to be honest – both as a source of hazards and as the potential victim of the associated risks. A lot of human-centric stuff going on here.

Task Description (T206) #2

As always, we’re going to think about the system configurations. We’re going to think about what we’re going to do with the system and the environment that we’re going to do it in. So, a familiar triad and I know I keep banging on about this, but this really is fundamental to bounding and therefore evaluating safety. We’ve got to know what the system is, what we’re doing with it, and the environment in which we’re doing it. Let’s move on.

Task Description (T206) #3

Again, Human Factors, regulatory requirements, and particularly specified personnel requirements need to be thought of. Particularly for operating and support, we need to take into account the staffing and personnel concept that we have. It’s frighteningly easy to produce a system that needs so much maintenance, for example, or support activity that it is unaffordable. And lots and lots of military systems and, it must be said, government and commercial systems in the past have come in that required enormous amounts of support, which soon proved to be unaffordable or no one would sign up to the commitment required. So, lots of projects have simply died because the system was going to be too expensive to sustain. That’s a key point of what we’re doing with O&S here. It’s not just about health and safety. It’s about health and safety, which is affordable.

We also need to look at unplanned events. So, not just designed in things, but things introduced- It says human errors. Again, I’m going to re-emphasize it’s erroneous human action because human error makes it sound like a human is at fault. Whereas very often it’s the design or the concept or the requirements that are at fault and place unacceptable burdens on the human being. Again, lots of messy systems seen in the past, which didn’t quite work and we just kind of expected the operator to cope. And most of the time they cope and then every so often they have a bad day at the office or a bunch of factors come together and lots of people die. And then we blame the human. Well, it’s not the human’s fault at all. We put them in that position. And as always, we need to look at past- Past evaluations of related legacy systems and support operations. If you have good data about legacy systems or about similar systems that your organization or another organization has operated, then that’s gold dust. So, do make an effort to get hold of that information if you can. Maybe a trade association or some wider pan organization body can help you there.

Task Description (T206) #4

At a minimum, we’ve got to identify activities involving known hazards. This assumes that we’ve done some hazard analysis in the past, which is very important. We always need to do that. I’ll come back to that commentary. Secondly, changes needed in requirements, be they functional requirements – what we want the system to do. Or design requirements, if we put constraints on how the system may do it for whatever it may be, hardware, software, support equipment, whatever to make those hazards and risks more manageable. Requirements for safety features – so requirements for engineered features and devices, equipment, because always, in almost any jurisdiction, we will have a hierarchy of control that recognizes that designed and engineered in safety features are more effective than just relying on people to get it right. And then we’ve also got to communicate to people the hazards associated with the system. Warnings, cautions, and whatever special emergency procedures might be required associated with the system. Again, that’s something that we see reinforced in law and regulations in many parts of the world. This is all good stuff. It’s accepted good practice all across the world.
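To make the hierarchy of control idea concrete, here is a minimal Python sketch. The five control categories, their ordering, and the helper names are illustrative assumptions rather than text from the standard; the example task is hypothetical. It simply ranks controls so that design and engineered features come before anything that relies on people getting it right.

```python
from enum import IntEnum


class ControlType(IntEnum):
    """Kinds of risk control, most preferred first (assumed ordering)."""
    ELIMINATE_BY_DESIGN = 1       # remove the hazard altogether
    REDUCE_BY_DESIGN = 2          # change the design to lower the risk
    ENGINEERED_FEATURE = 3        # guards, interlocks, safety devices
    WARNING = 4                   # alarms, cautions, warnings
    PROCEDURE_TRAINING_PPE = 5    # relies on people getting it right


def strongest_control(controls):
    """Return the most preferred control in the list, or None if it is empty."""
    return min(controls, default=None)


def relies_on_people_only(controls):
    """True if every listed control depends on a person noticing or acting."""
    controls = list(controls)
    return bool(controls) and all(c >= ControlType.WARNING for c in controls)


# Hypothetical maintenance task with two controls, neither of them engineered.
jacking_controls = [ControlType.WARNING, ControlType.PROCEDURE_TRAINING_PPE]
print(strongest_control(jacking_controls).name)
print(relies_on_people_only(jacking_controls))
```

A check like `relies_on_people_only` could be used to flag hazards where it is worth asking whether an engineered feature is reasonably practicable.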

Task Description (T206) #5

Moving on, we also need to think about how are we going to move the system around and the associated spares and supplies? How are we going to package them, handle them, store them, transport them? Particularly if there are hazardous materials, etc, etc, involved. That’s the next part, G. Again, training requirements. We’re thinking about a human-centric approach. Whatever we expect people to do, they’ve got to be trained in how to do it. Point I, we’ve got to include everything, whether it’s developmental or non-developmental items. We can’t just ignore stuff because it’s GFE or it’s off the shelf. It doesn’t mean it can never go wrong. Far from it. Particularly if we are putting stuff together that’s never been put together before in a novel combination or in a novel environment. Something that might be perfectly safe and stable in an air-conditioned office might start to do odd things in a much more corrosive and uncontrolled environment, let’s say.

We need to think about the modes in which the system might be potentially hazardous when under operator control. Particularly, we might think about degraded modes of operation. So, for whatever reason, a part of the system has gone wrong or the system has got into an operating environment within which it doesn’t operate as well as it could. It’s not in an optimal operating environment or state. The human being in control of it, we’re assuming, has still got to be able to operate the system, even if it’s only to shut it down or to get it back into a safer state or safer environment. We’ve got to think about all of those nuances.

Then because we’re talking about support as well, we need to think about related legacy systems, facilities and processes which may provide background information. Also, of course, the system presumably will very often be operating alongside other systems or it will be supported by other systems that may already exist or are being procured separately. So, we’ve got to think about all those interactions as well and all those potential contributions. As you can see, this is quite a wide-ranging, broadly scoped task.

Task Description (T206) #6

Finally, on this section, the customer/the end-user/or whoever may specify some specific analysis techniques. Very often they will not. So, whoever is doing the analysis, be they a contractor or third party outside agency, needs to make sure that whatever they propose to do is going to be acceptable to the program manager. In the sense that it is going to be compatible and relevant and useful. And then finally, the contractor has got to do some O&SHA at the appropriate time but maybe more detailed data will come along later. In which case that needs to be incorporated and also operational changes.

An absolute classic [situation] with military and non-military systems is: the system gets designed, it goes into test and evaluation and we discover that assumptions that were made during development don’t actually hold up. The real world isn’t like that or whatever it might be and we find we’re making changes in assumptions. Those need to be factored in which, sadly, is often not done very well. So, that’s an important point to think about. What’s my change control mechanism and how will the people doing the O&SHA find out about these changes? Because very often it’s easy to assume that everybody knows about this stuff but when you start making assumptions, the truth is that it very often goes adrift.

Reporting (T206) #1

Let’s talk about reporting. Just a couple of slides here. In the reporting, there’s some fairly standard stuff in here: the physical and functional characteristics of the system, and that’s important. Again, we might assume that everybody knows what they are, but it’s important to put them in. It may be that the people doing the analysis were given a different system description to the people developing the system, to the people doing the personnel planning, etc. All the different things that have to be brought together, we need to make sure that they join up again. It’s too easy to get that wrong. Reinforcing the point I made on the previous slide, as more detailed descriptions and specifications become available, they need to be provided.

Hazard analysis methods and techniques. What techniques are we using? Give a description. If you’re doing it to a particular standard, so much the better. Great, that saves a lot of paper. What assumptions have we made? What data, both qualitative and quantitative, have we used to support the analysis? That all needs to be declared. By the way, one of the reasons it needs to be declared is that when things change (not if) that’s when these assumptions and the data and the techniques get exposed. So, if there are changes and we don’t have this kind of information declared, we can’t assess the impact of those changes. And it gets even more difficult to keep up with what’s going on.
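One way to picture why declared assumptions matter is a small traceability sketch in Python. Everything here is hypothetical: the record fields, the identifiers such as "H-001" and "A2", and the function name are made up for illustration. The point is only that if each hazard record lists the assumptions it rests on, a changed assumption can be traced to the analyses that need revisiting.

```python
from dataclasses import dataclass, field


@dataclass
class HazardRecord:
    """A hazard entry plus the IDs of the assumptions its analysis relies on."""
    hazard_id: str
    description: str
    assumption_ids: set = field(default_factory=set)


def impacted_hazards(records, changed_assumption_ids):
    """Return the IDs of hazards whose analysis relied on a changed assumption."""
    changed = set(changed_assumption_ids)
    return sorted(r.hazard_id for r in records if r.assumption_ids & changed)


# Made-up example data, for illustration only.
records = [
    HazardRecord("H-001", "Crush injury during jacking", {"A1", "A2"}),
    HazardRecord("H-002", "Fuel spill during replenishment", {"A3"}),
    HazardRecord("H-003", "Fall from height during inspection", {"A2"}),
]

# Suppose test and evaluation shows that assumption A2 does not hold.
print(impacted_hazards(records, {"A2"}))
```

Without the declared links, the same question ("what does this change affect?") has to be answered by re-reading every analysis.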

Reporting (T206) #2

And then hazard analysis results. Again, the leading particulars of the results should be recorded in the hazard tracking system, the HTS, or hazard log, or risk register, whatever you want to call it. But there will be more detailed information that we wouldn’t want to clutter up the risk register with and we also need to provide warnings, cautions, and procedures to be included in maintenance manuals, training courses, operator manuals, etc. So, we’re probably going to generate an awful lot of data out of this task and that needs to be provided in a suitable format. Again, whoever is the program manager on the client side, or the end-user representative, needs to think about this stuff quite early on.

Contracting #1

That leads us neatly on to contracting. Now, this task, in theory, can be specified a little bit down the track, after the program has started. In practice, what you find is program managers try to specify everything up front in a single contract for various reasons.

There are good reasons for doing that sometimes. Also, there are bad reasons but I’m not going to talk about them in this session. We’ll have a talk about planning your system safety program in another session. There are a lot of nuances in there to be considered.

Just sticking to this task, identification of functional disciplines – who do we need to get involved in order to do this work properly? It’s likely that the safety team, if you have one, may not have relevant operating experience or relevant sustainment experience for this kind of system. If they do, that’s fantastic but that doesn’t negate the requirement to get the end-user represented and involved. In fact, that’s a near legal requirement in Australia, for example, and in some other jurisdictions. We need to get the end-users involved. We need the discipline specialists to get involved. Typically, your integrated logistic support team, your reliability people, your maintainability, and your testability people, if you have those disciplines. Or maybe you’re calling them something else, it doesn’t really matter.

We need to know what the reporting requirements are. What, if any, analysis methods and techniques do we desire to be used? Maybe the client or end-user has got to jump through some regulatory hoops and therefore they need specific analysis work to be done and specific safety results to be produced. If that’s the case, then that needs to be specified in the contract. And what data is to be generated, in what format? And how is it to be reported, and when, taking the hazard tracking system into account? And then the client may also select or specify known hazards, known hazardous areas, or other specific items to be examined or excluded because maybe it’s being covered elsewhere or we don’t expect the contractor to be able to do this stuff. Maybe we need to use a specialist organization. Again, maybe a regulator has directed us to do so. So, all of these things need to be thought about when we’re putting together the contract requirements for task 206.

Contracting #2

Again, I say this every time, we need to include all items within the scope of the system and the environment, not just developmental stuff. In fact, these days, maybe the majority of programs that I am seeing are mostly non-developmental. So, we’re taking lots of COTS stuff, GFE components, and putting it all together. That’s all going to be included, particularly integration.

We need to think about legacy and related processes and the hazard analysis associated with them if we can get them. They should be supplied to whoever is doing the work and an analyst should be directed to review them and include lessons learned.

Then, reinforcing the previous point about the hazard tracking system: how will information reported in this task be correlated with tasks and analyses that are being done maybe elsewhere or by different teams? And the example here is 207 health hazard analysis. I’ll talk a little bit about the linkages between the two later. But it’s quite likely in this sort of area there will be large groups of people thinking about operations and maintenance and support. Very often those groups are very different. Sometimes they don’t even talk to each other. That’s the culture in different organizations. You don’t see airline pilots hanging around with baggage handlers very much, do you, down the pub for whatever reason? Different sets of people; they don’t always mix very much. And again, you may also have different specialist disciplines, especially the Human Factors people. Again, you’ve got to tie everything in there. So, there are going to be lots of interfaces in this kind of task that have got to be managed.

Point I – concept of operations. Yes, that’s in every task. You’ve got to understand what we intend to do with this system or what the end-user intends to do with the system in order to have some context for the analysis.

And then finally, what risk definitions and what risk matrix are we using? If we’re not using the standard 882 matrix, then what are we doing?
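For reference, a risk matrix is just a lookup from probability and severity to a risk level, as in this Python sketch. The category names and cell values below are my transcription of the Mil-Std-882E risk assessment matrix (Table III) and should be checked against the standard before any real use; the function and variable names are hypothetical. If your program tailors the matrix, only the table changes.

```python
# Severity categories 1 to 4, in column order.
SEVERITIES = ["Catastrophic", "Critical", "Marginal", "Negligible"]

# Probability level -> risk level for each severity column above.
# Cell values transcribed from memory of Table III: verify before use.
RISK_MATRIX = {
    "Frequent":   ["High",    "High",    "Serious", "Medium"],
    "Probable":   ["High",    "High",    "Serious", "Medium"],
    "Occasional": ["High",    "Serious", "Medium",  "Low"],
    "Remote":     ["Serious", "Medium",  "Medium",  "Low"],
    "Improbable": ["Medium",  "Medium",  "Medium",  "Low"],
}


def risk_level(probability, severity):
    """Look up the risk level for a probability level and severity category."""
    if probability == "Eliminated":
        return "Eliminated"  # the hazard no longer exists, whatever the severity
    column = SEVERITIES.index(severity)  # raises ValueError if unknown
    return RISK_MATRIX[probability][column]


print(risk_level("Occasional", "Critical"))
print(risk_level("Improbable", "Catastrophic"))
```

Keeping the matrix as data rather than logic makes it easy to state in the contract exactly which matrix is in force.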

Commentary #1

I’ve got four slides of commentary now – a number of things to say about Task 206.

Now, I’ve picked an Australian example. So, Task 206 ties in very neatly with Australian WHS requirements. I suspect Australian WHS requirements have been strongly influenced by American OSHA and system safety practices. In Australia, we are heavily influenced by the US approach. Legal requirements in Australia, and in many other states and territories let’s be honest, do tie in nicely with this standard, although not always perfectly; you’ve got to remember that. So, we do need to focus on operations and support activities. That’s a big part of WHS, thinking about all relevant activities from cradle to grave – the whole life of the system. We need to think about the working environment, the workplace. We need to think about humans as an integral part of the system, be they operators or maintainers, suppliers, other kinds of sustainers. And we need to be providing relevant information on hazards, risks, warnings, training, and procedures, and requirements for PPE, and so on and so forth to workers.

So, task 206 is going to be absolutely vital to achieving WHS compliance in Australia and compliance with health and safety legislation and regulations in many parts of the world. In the US and UK and I would say in virtually all developed nations. So, this is a very important task for achieving compliance with the law and regulations. It needs to get the requisite amount of attention, and it doesn’t always. So often on a program, during procurement, acquisition, and development, the technical system is the sexy thing. That’s the thing that gets all the attention, especially early on. The operating and particularly the support side tends to get neglected because it’s not so sexy. We don’t buy a system to support it after all, do we? We buy a system to do a job. So, we get the operators in and we get their input on how to optimize the system to do the job most cost-effectively and with the most mission effectiveness that we can get out of it. We don’t often think about support effectiveness. But to achieve WHS compliance or the equivalent this is a very important task so we will almost always need to do it.

Commentary #2

The second item to think about – what is going to be key for the maintenance support side is a technique called Job Safety Analysis or Job Hazard Analysis. I’ve highlighted a couple of sources of information there. In particular, I would recommend going to the American www.OSHA.gov site and the guidance that they provide on how to do a job hazard analysis. So, use that, or if something different is specified in the jurisdiction you’re working in, then go ahead and use that. But if you don’t have any [guidance] on what to do, this will help you.

This is all about – I’ve got a task to do, whatever it might be, so how do I do it? Let’s analyse this step-by-step, or at least in reasonable size chunks, thinking about how we do the tasks that need to be done. Now, there’s the operator side, and then, of course, we’re always dealing with human beings working on the system or working with the system. So, we’re going to be seeing potentially a lot of Human Factors type techniques being relevant. And there are lots of techniques that we can think about; Hierarchical Task Analysis and that kind of approach is going to fit in with the Job Hazard Analysis as well. Those are going to link together quite well. There will also be things like workload analysis. Particularly for the operators, if we’re asking the operator to do a lot and to maintain a particular level of concentration or respond rapidly, we need to think about workload, because both too much workload and too little workload can make things worse.
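The step-by-step idea can be sketched as a simple worksheet in Python. This is an illustrative structure only, not the OSHA form: the field names, the helper function, and the wheel-change job are all made up. Each step of the job lists the hazards found and the controls in place, so that steps with hazards but no controls stand out.

```python
from dataclasses import dataclass, field


@dataclass
class JobStep:
    """One step of a job, with the hazards identified and controls applied."""
    description: str
    hazards: list = field(default_factory=list)
    controls: list = field(default_factory=list)


def uncontrolled_steps(steps):
    """Return descriptions of steps that have hazards but no controls yet."""
    return [s.description for s in steps if s.hazards and not s.controls]


# Hypothetical job: changing a wheel on a ground vehicle.
wheel_change = [
    JobStep("Jack up vehicle", ["vehicle falls off jack"], ["axle stands"]),
    JobStep("Remove wheel", ["manual handling strain"]),
    JobStep("Record work done"),
]

print(uncontrolled_steps(wheel_change))
```

A Hierarchical Task Analysis would break each step down further; the same record shape could be nested to reflect that.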

There are lots of techniques out there, I’m not going to talk about Human Factors here. I’m going to be putting on a series on Human Factors techniques in cooperation with a specialist in that area. So, I’m not going to say more here.

For certain kinds of operators, let’s say, pilots, people navigating a ship and so on, drivers, there will be well-established ways in which those operators are trained and in which they have to operate. There will often be a legal framework and a regulatory framework that says how they have to operate. And then that may direct a particular kind of analysis to be done or a particular approach to be taken for how operators do their jobs. But equally, there is a vast range of operator roles in industry, in chemical plants. Various specialist operating roles where there’s an industry-specific approach to doing things. Or indeed the general approach may be left up to whoever is developing the system. So, there’s a huge range of approaches here that are going to be largely dictated by the concept of operations and also an awareness of what is relevant law, regulation, and good practice in a particular industry, in a particular situation. That’s where doing your Task 203, your safety requirements analysis, really kicks in. It’s a very broad subject we’re covering here. You’ve got to get the specialists in to do it well.

Commentary #3

Now, I mentioned that these days we’re seeing more and more legacy and COTS systems being used and repurposed, partly to save time and money. We’re not developing mega systems as often as we used to, particularly in defence, but also in many other walks of life as well. So, we may find ourselves evaluating a system where very little technical hazard analysis has been done because there are no developmental items, and it’s even difficult to do analysis on a legacy or COTS system because we cannot get the data to do so. Perhaps we can’t get the data for commercial reasons or contractual reasons.

Or maybe we’ve got a legacy system that was developed in a different jurisdiction and whatever information is available with it just doesn’t fit the jurisdictional regulatory system that we’ve got to work in where we want to operate the system. This is very common. Australia, for example, [acquires] a lot of systems from abroad, which have not been developed in line with how we normally do things.

In theory, we could just do Task 206 if there was no developmental hazard analysis to do, but that’s not quite true. At a minimum, we will always need to do a Preliminary Hazard List and a Preliminary Hazard Analysis – those are Tasks 201 and 202 respectively. And we will very definitely need to do some System Requirements Hazard Analysis, Task 203, to understand what we need to do for a particular system in a particular application, operating environment, and regulatory jurisdiction. So, we’re always going to have to do those and we may well have to look at the integration of COTS things and do some system-level analysis. That’s 204. We’re definitely going to need to do the early analyses. In fact, the client and the end-user representatives should be doing 201, 202 and 203 and then we may be in a position to finish things off with 206 for certain systems.

Commentary #4

Now, having said that, I’ve mentioned already that Task 206 can be very broad in scope and very wide-ranging. There’s a danger that we will turn Task 206 into a bottomless pit into which we pour money and effort and time without end. So, for most systems, we cannot afford to just do O&SHA across the board without any discernment or any prioritization.

So, we need to look at those other hazard analyses and prioritize those areas where people could get hurt. Particularly we should be using legacy and historical data here to say “What, in reality, does hurt people when looking after or operating these systems?” Again, as I’ve said before, in many industries there is a standard industry approach or good practice to how certain systems are operated, and maintained, and supported. So, if there is a standard industry approach available – particularly if we can justify that by available historical data – if that [is as good] as doing analysis, then why not just use the standard approach? It’s going to be easier to make an SFARP or an ALARP argument that way anyway. And why spend the money on analysis when we don’t have to? We could just spend the money on actually making the system safer. So, let’s not do analysis for the sake of doing analysis.

Also, there’s a strong synergy between the later tasks in the 200 series. There’s a strong linkage between this Task 206 and 207, which is Health Hazard Analysis. There can also be a strong linkage with Task 210, which is the Environmental Hazard Analysis. So, this trio of tasks focuses on the impact on living things, whether they be human beings or animals and plants and ecosystems and very often there’s a lot of overlap between them. For example, hazardous chemicals that are dangerous for humans are often dangerous for animals and plants and watercourses and so on and so forth. I’ll be talking about that more in the next session on Task 207.

One word of warning, however. Certainly, in Australia, we have got fixated on hazardous chemicals because we’ve had some very high-profile scandals involving HAZCHEM in the past. Now, there’s nothing wrong, of course, with learning from experience and applying rigorous standards when we know things have gone wrong in the past. But sometimes we go into a mindset of analysis for analysis sake. Dare I say, to cover people’s backsides rather than to do something useful. So, we need to focus on whether the presence of a HAZCHEM could be a problem. Whether people get exposed to it, not just that it’s there.

Certain chemicals may be quite benign in certain circumstances, and they only become dangerous after an emergency, for example. There are lots of things in the system that are perfectly safe until the system catches fire. Then trying to dispose of or repair a fire-damaged system can be very dangerous, for example. So, we need to be sensible about how we go about these things. Anyway, more on that in the next session.

Copyright Statement

That’s the commentary that I have on Task 206. As we said, it links very tightly with other things and we will talk about those in later sessions. I’d just like to point out that the “italic text in quotations” is from the Mil. standard. That is copyright free as most American government standards are. However, this presentation and my commentary, etc. are copyright of the Safety Artisan 2020.

For More …

Now, for all lessons and resources, please do visit www.safetyartisan.com. As you’ll notice, it’s an https address – it’s a secure website.

End: Operating & Support Hazard Analysis

So, that is the end of the lesson and it just remains for me to say thank you very much for your time and for listening. And I look forward to seeing you again soon. Cheers.


Functional Hazard Analysis

In this full-length (40-minute) session, The Safety Artisan looks at Functional Hazard Analysis, or FHA, which is Task 208 in Mil-Std-882E. FHA analyses software, complex electronic hardware, and human interactions. We explore the aim, description, and contracting requirements of this Task, and provide extensive commentary on it. (We refer to other lessons for special techniques for software safety and Human Factors.)

This is the seven-minute demo; the full version is 40 minutes long.

Topics: Functional Hazard Analysis

  • Task 208 Purpose;
  • Task Description;
  • Update & Reporting
  • Contracting; and
  • Commentary.

Transcript: Functional Hazard Analysis


Introduction

Hello, everyone, and welcome to the Safety Artisan; Home of Safety Engineering Training. I’m Simon and today we’re going to be looking at how you analyse the safety of functions of complex hardware and software. We’ll see what that’s all about in just a second.

Functional Hazard Analysis

I’m just going to get to the right page. This, as you can see, functional hazard analysis is Task 208 in Mil. Standard 882E.

Topics for this Session

What we’ve got for today: we have three slides on the purpose of functional hazard analysis, and these are all taken from the standard. We’ve got six slides of task description. That’s the text from the standard plus we’ve got two tables that show you how it’s done from another part of the standard, not from Task 208. Then we’ve got update and reporting, another two slides. Contracting, two slides. And five slides of commentary, which again include a couple of tables to illustrate what we’re talking about.

Functional HA Purpose #1

What we’re going to talk about is, as I say, functional hazard analysis. So, first of all, what’s the purpose of it? And in classic 882 style, Task 208 is to perform this functional hazard analysis on a system or subsystem or more than one. Again, as with all the other tasks, it’s used to identify and classify system functions and the safety consequences of functional failure or malfunction. In other words, hazards.

Now, I should point out at this stage that the standard is focused on malfunctions of the system. The truth is, in the real world, that lots of software-intensive systems have been involved in accidents that have killed lots of people, even when they were functioning as intended. One of the shortcomings of this Mil. Standard is that it focuses on failure. The idea that something can be performing as specified, and yet either the specification might be wrong or there might be some disconnect between what the system is doing and what the human expects – the way the standard is written just doesn’t recognize that. So, it’s not very good in that respect. However, bearing that in mind, let’s carry on with looking at the task.

Functional HA Purpose #2

We’re going to look at these consequences in terms of severity – severity only, we’ll come back to that – for the purpose of identifying what they call safety-critical functions, safety-critical items, safety-related functions, and safety-related items. And a quick word on that, I hate the term ‘safety-critical’ because it suggests a sort of binary “Either it’s safety-critical. Yes. Or it’s not safety-critical. No.” And lots of people take that to mean if it’s “safety-critical, no,” then it’s got nothing to do with safety. They don’t recognize that there’s a sort of a sliding scale between maximum safety criticality and none whatsoever. And that’s led to a lot of bad thinking and bad behaviour over the years where people do everything they can to pretend that something isn’t safety-related by saying, “Oh, it’s not safety-critical, therefore we don’t have to do anything.” And that kind of laziness kills people is the short answer.

Anyway, moving on. So, we’ve got these SCFs, SCIs, SRFs, SRIs and they’re supposed to be allocated or mapped to a system design architecture. The assumption in this task is that we’re doing this early – we’ll see that later – and that system design, system architecture, is still up for grabs. We can still influence it. Often that is not the case these days. This standard was written many years ago when the military used to buy loads of bespoke equipment and have it all developed from new. That doesn’t happen so much anymore in the military and it certainly doesn’t happen in many other walks of life – but we’ll talk about how you deal with the realities later. And they’re allocating these functions and these items of interest to hardware, software and human interfaces. And I should point out, when we’re talking about all that, all these things are complex. Software is complex, human is complex, and we’re talking about complex hardware. So, we’re talking about components where you can’t just say, “Oh, it’s got a reliability of X, and that’s how often it goes wrong” because those types of simple components that are only really subject to random failure, that’s not what we’re talking about here. We’re talking about complex stuff where systematic failure dominates over random, simple hardware failure. So, that’s the focus of this task and what we’re talking about. That’s not explained in the standard, but that’s what’s going on.
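As a rough illustration of the severity-only classification described above, here is a small Python sketch. The mapping (Catastrophic or Critical gives safety-critical, Marginal or Negligible gives safety-related) is my reading of the definitions in the standard and should be checked against it; the "no safety impact" label and the function name are assumptions added for the example. Note that it returns three outcomes, not a yes/no answer, which is rather the point.

```python
def classify_function(worst_severity):
    """Classify a function from the worst credible severity of its failure.

    Pass None if failure of the function has no safety consequence at all.
    The mapping is an assumed reading of the standard: verify before use.
    """
    if worst_severity is None:
        return "no safety impact"
    if worst_severity in ("Catastrophic", "Critical"):
        return "safety-critical"
    if worst_severity in ("Marginal", "Negligible"):
        return "safety-related"
    raise ValueError(f"unknown severity category: {worst_severity!r}")


print(classify_function("Critical"))
print(classify_function("Marginal"))
print(classify_function(None))
```

A function classed as safety-related still has safety work attached to it; only the "no safety impact" case drops out.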

Functional HA Purpose #3

Now, our third slide on purpose; so we use the FHA to identify consequences of malfunction or functional failure, lack of function. As I said just now, we need to do this as early as possible in the systems engineering process to enable us to influence the design. Of course, this is assuming that there is a systems engineering process – that’s not always the case. We’ll talk about that at the end as well. And we’re going to identify and document these functions and items and allocate and, as it says, partition them in the software design architecture. When we say partition, that’s jargon for separating them into independent functions. We’ll see the value of that later on. Then we’re going to identify requirements and constraints to put on the design team to say, “To achieve this allocation and this partitioning, this is what you must do and this is what you must not do”. So again, the assumption is we’re doing this early. There’s a significant amount of bespoke design yet to be done.

Task Description (T208) #1

Moving on to task description. It says the contractor, but really whoever’s doing the analysis, has to perform and document the FHA to analyse those functions associated, as it says, with the proposed design. I talked about that already so we’ll move on.

It’s got to be based on the best available data, including mishap data. So, accident/incident data, if you can get it from similar systems and lessons learned. As I always say in these sessions, this is hard to do, but it’s really, really valuable so do put some effort into trying to get hold of some data or look at previous systems or similar systems. We’re looking at inputs, outputs, interfaces and the consequences of failure. So, if you can get historical data or you can analyse a previous system or a similar system, then do so. It will ultimately save you an awful lot of money and heartache if you can do that early on. It really is worth the effort.

Task Description (T208) #2

At a minimum, we’ve got to identify and evaluate functions and to do that, we need to decompose the system. So, imagine we’ve got this great big system. We’ve got to break it down into subsystems and major components. We’ve got to describe what each subsystem and major component does, its function or its intended function. Then we need a functional description of interfaces, thinking about what connects to what and the functional ins and outs. Pretty obvious stuff, I guess, but it needs to be done.

Task Description (T208) #3

And then we also need to think about hazards associated with, first of all, loss of function. So, no function when we need it. Next, we have degraded function or malfunction, and functioning out of time or out of sequence. So, we’ve got different kinds of malfunctions. What we don’t have here is function when not required. So, the system goes active for some reason and does something when it’s not meant to. Now, if we add that third case, we’ve got a functional failure analysis.

Essentially here, we’re talking about a functional failure analysis, or maybe something a bit more sophisticated, like a HAZOP. The HAZOP is more sophisticated because instead of just those three things that can go wrong, we’ve got lots of guide words to help us think about ‘out of time, out of sequence’. So, too early, too late, before intended, after intended, whatever it might be. And there are variations on HAZOP called computer HAZOP, or CHAZOP, where people have come up with different keywords, different prompt words, to help you think about software in data-intensive systems. So, that’s a possible technique to use here.
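To make the guide-word idea concrete, here is a minimal sketch of how you might generate the prompts for a functional failure analysis. The failure modes are the ones discussed above; the function names ("apply brakes", "display speed") and the `Prompt` class are hypothetical, made up purely for illustration, and a real HAZOP or CHAZOP would use a richer, agreed set of guide words.

```python
from dataclasses import dataclass
from itertools import product

# Failure modes discussed in the talk: the Task 208 list plus
# "function when not required". A real HAZOP would use more guide words.
FAILURE_MODES = [
    "loss of function",
    "degraded function or malfunction",
    "out of time or out of sequence",
    "function when not required",
]


@dataclass(frozen=True)
class Prompt:
    """One question for the analysis team to answer (hypothetical structure)."""
    function: str
    failure_mode: str

    def question(self) -> str:
        return (f"What is the next step in the mishap sequence if "
                f"'{self.function}' suffers: {self.failure_mode}?")


def generate_prompts(functions):
    """Pair every function with every failure mode so that none is skipped."""
    return [Prompt(f, m) for f, m in product(functions, FAILURE_MODES)]


# Hypothetical functions, for illustration only.
prompts = generate_prompts(["apply brakes", "display speed"])
for p in prompts:
    print(p.question())
```

The point of generating the full cross-product is simply completeness: every function gets asked every question, and the team records an answer (even if the answer is "no hazard") rather than relying on memory.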

And then when we’re thinking about these hazards that might be generated by malfunction, or functional failure in its various forms, we need to think about, “What’s the next step in the mishap sequence? In the accident sequence? And what’s the final outcome of the accident sequence?” And that’s very important for software because software is intangible. It has no physical form. On its own, in isolation, software cannot possibly hurt anyone. So, you’ve got to look at how the software failure propagates through the system into the real world and how it could harm people. So, that last sentence in yellow there is a very important prompt.

Task Description (T208) #4

And we carry on. We need to assess the risk associated with failure of a function, subsystem or component. We’re going to do so using the standard 882 tables, tables one and two, and the risk assessment codes in table three, unless we come up with our own tailored versions of those tables and that matrix and that’s all approved. In reality, most people don’t tailor this stuff. They should make it appropriate for the system, but they rarely do.

Table I and II

So just to remind us what we’re talking about, here’s table one and two. Table one is severity categories ranging from catastrophic, which could kill somebody – a catastrophic outcome – down to negligible, where we’re talking cuts and bruises – very, very, very minor injuries.

And then table two, probability levels. We’ve got everything from frequent down to eliminated, where there’s no hazard at all because we’ve eliminated it. It will never happen in the lifetime of the universe. So, it really is a zero probability. We’ve got frequent down to improbable and then in the standard, we’ve got a definition for these things in words, for a single item and also for a fleet or inventory of those items, assuming that there’s a large number of them. And that’s very useful. That helps us to think about how often something might go wrong per item and per fleet.

Table III

So, that’s tables one and two, we put them together, the severity and the probability to give us table three. As you can see, we’ve got probability down the left-hand side and at the bottom, if we’ve eliminated the hazard, then there is no severity. The hazard is completely eliminated. So, forget about that row. Then everything else we’ve got frequent down to improbable, probability. And we’ve got catastrophic down to negligible. Together those generate the risk assessment code, which is either high, serious, medium or low. That’s the way this standard defines things. Nothing is off-limits. Nothing is perfect except for elimination. We’ve just defined a level of risk and then you have to make up rules about how you will treat these levels of risk. The standard does some of that for you, but usually, you’ve got to work out depending on which jurisdiction you’re in legally, what you’re required to do about different levels of risk.

Now this table on its own, I’ll just mention, is not helpful in a British or Australian jurisdiction where we have to eliminate or reduce risks SFARP. The table on its own won’t help you do that, because this is just an absolute level of risk. It’s not considering what you could have done to make it better. It’s just saying where we are. It’s a status report.
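If it helps to see table three as a lookup, here is a minimal sketch. The severity and probability names and the cell values are my recollection of the published Mil. Standard 882E matrix, so treat them as an assumption and check them against the standard (or your approved tailored matrix) before relying on them. The function name `risk_assessment_code` is made up for this example.

```python
# Severity categories (table one), most severe first.
SEVERITIES = ["catastrophic", "critical", "marginal", "negligible"]

# Risk assessment codes (table three), one row per probability level.
# ASSUMPTION: cell values are recalled from MIL-STD-882E and must be
# verified against the standard or your own approved, tailored matrix.
RISK_MATRIX = {
    "frequent":   ["high",    "high",    "serious", "medium"],
    "probable":   ["high",    "high",    "serious", "medium"],
    "occasional": ["high",    "serious", "medium",  "low"],
    "remote":     ["serious", "medium",  "medium",  "low"],
    "improbable": ["medium",  "medium",  "medium",  "low"],
}


def risk_assessment_code(severity: str, probability: str) -> str:
    """Look up the risk level for one hazard.

    An eliminated hazard has no severity column: the whole row is
    simply 'eliminated'.
    """
    if probability == "eliminated":
        return "eliminated"
    column = SEVERITIES.index(severity)  # raises ValueError if unknown
    return RISK_MATRIX[probability][column]


print(risk_assessment_code("catastrophic", "remote"))
```

Notice what the lookup does not take as an input: anything about which controls were available and whether they were applied. That is why the matrix alone cannot demonstrate SFARP.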

So, those are your tables one, two and three, as the standard describes them. That’s the overall method and we’re going to do what it says in Section four of the standard. In the main body of the standard, Section four talks about software and complex hardware and how we allocate these things.

Task Description (T208) #5

And then finally, I think, on task description: we need an assessment of whether the functions identified are to be implemented in the design, and we need to map those functions into the components. And then it says functions allocated to software should be mapped to the lowest level of technical design or configuration item. So, if you’ve got a software or hardware configuration item that is further subdivided into sub-items, then you need to go all the way down and see which items can contribute to that function and which can’t.

That’s an important labour-saving device, because you could have quite a large configuration item, but actually, only a tiny bit contributes to the hazard. So, that’s the only thing you need to worry about in theory. In reality, partitioning software is not as easy as the standard might suggest. However, if we can do a meaningful partition, then we could and should aim to have as little safety-related software as we possibly can. If nothing else, for cost and in order to get the project in on time. So, the less criticality we have in our system, the better.

Task Description (T208) #6

So, we need to assess the software control category (SCC) for each configuration item that’s been allocated a safety-significant software function (SSSF). Having assigned the SCC, we then have to work out the software criticality index for each of those functions and we’ll talk about how to do that at the end. Then from all of this work, we need to generate a list of requirements and constraints to include in the spec which, if they work, will eliminate the hazard or reduce the risk.

And the standard says that these could be in the form of fault tolerance, fault detection, fault isolation, fault annunciation or warning, or fault recovery. Now, basically this is a reliability breakdown. So, in the world of reliability, we talk typically about fault tolerance, fault detection, warning, and recovery. Four things, though they split them into five here. Now, software reliability is highly controversial. So really, this is a bit of a mismatch here. These reliability-based suggestions are not necessarily much use for software, or indeed for people sometimes. You may have to use other more typical software techniques to do this and in fact, the standard does point you to do that. But that’s for another session.

FHA Update & Records #1

So, we’ve done the FHA, or we’re doing the FHA. We’ve got to record it and we’ve got to update it when new information comes through. So, we’ve got to update the FHA as the design progresses or operational changes come in. We’ve got to have a system description of the physical and functional characteristics of the system and subsystems. And of course, for complex design items like software, context is everything. So, this is very important. Again, software in isolation cannot hurt anyone. You’ve got to have the context to understand what the implications might be. If we don’t have that, we’re stuffed pretty much. Then it goes on to say that when further documentation becomes available, more detail needs to be supplied. So, don’t forget to ask for that in your contract and expect it as well and be ready to deal with it.

FHA Update & Records #2

Moving on. When it comes to hazard analysis methods and techniques, we need to describe the method and the technique used for the analysis, and what assumptions and what data were used in support of the analysis. This statement is in pretty much every single task so I’ll say no more. You’ve heard this before. Then again, analysis results need to be captured in the hazard tracking system (HTS) and, as I’ve always said, usually the leading details, the top-level details, go in the hazard tracking system. The rest of it goes into the hazard analysis report; otherwise, you end up with a vast amount of data in your HTS and it becomes unwieldy and potentially useless.

Contracting #1

Contracting – Again, this is a pretty standard clause, or set of clauses, in a Mil. Standard 882 task. So, in our request for proposal and statement of work, we’ve got to ask for Task 208. We’ve got to point the analyst, the contractor, at what we want them to analyse, particularly or maybe as a minimum. And what we don’t want them to analyse, maybe because it’s been done elsewhere or it’s out of scope for this system.

We need to say what our data reporting requirements are, considering Task 106, which is all about the hazard tracking system or the hazard log or the risk register, whatever you want to call it. So, what data do we want? What format? What are the definitions, etc.? Because if you’re dealing with multiple contractors or you want data that is compatible with the rest of your inventory, then you’ve got to specify what you want. Otherwise, you’re going to get variability in your data and that’s going to make your life a whole lot harder downstream – Again, this is standard stuff.

And what are the applicable requirements, specifications and standards? Of course, this is an American standard so compliance with specifications, requirements and standards is all because that’s the American system.

Contracting #2

We need to supply the concept of operations because, as I’ve said before, with a complex design, especially software, context is everything. So, we need to know what we’re going to do with the system that the software is sat within. So, this system has got some functions, this is what we’re looking at in Task 208: What are those functions for? How do they relate to the real world? How could we hurt people? And then we need to say if we’ve got any other specific hazard management requirements. Maybe we’re using a special matrix because we’ve decided the standard matrix isn’t quite right for our system. Whatever we’re doing, if we’ve got special requirements that are not the norm for the vanilla standard, then we need to say what they are. Pretty straightforward stuff.

Commentary #1

We’re onto commentary, and I think we’ve got five slides of commentary today.

As it says, functional hazard analysis depends on systems engineering. So, if we don’t have good systems engineering, we’re unlikely to have good functional analysis. So, what do I mean by good systems engineering? I mean that for the complete system, apart from things that we deliberately excluded for a good reason, we need all functions to be identified, and we need those functions to be analysed and allocated correctly, rigorously and consistently. We need interface analysis and control, and we need the architecture of the design to be determined based on the higher-level requirements, all that work that we’ve done.

Now, if those things are not done or they’re incomplete, or they were done too late to influence the design architecture, then you’re going to have some compromised systems engineering. And these days, because we’re using lots of commercial off the shelf stuff, what you find is that your top-level design architecture is very often determined before you even start because you’ve decided you’re going to have an off the shelf this and you’re going to have a modified off the shelf that and you’re going to put them together in a particular way with a set of business rules, a concept of operations, that says this is how we’re going to use this stuff. And our new system interfaces with some existing stuff and we can’t modify the existing stuff.

So, that really limits what we can do with the design architecture. A lot of the big design decisions have already been taken before we even got started. Now, if that’s the case, then that needs to be recognized and dealt with. I’ve seen those things dealt with well. In other words, the systems engineering has been done recognizing those constraints, those things that can’t be done. And I’ve seen it done badly in that, figuratively speaking, the systems engineering team or the program manager, whoever, has just given us a Gallic shrug and gone “Yeah, what the heck, who cares?” So, those are the two extremes that you can see.

Now, if the systems engineering is weak or incomplete, then you’re going to get a limited return on doing Task 208. Maybe there are some new areas where you can do it, or maybe you’ve got a new interface that’s got to be worked up and created in order to get these things to talk to each other. Clearly, there is some mileage in doing that. You’re going to get some benefits from doing that in that area. But for the stuff that’s already been done, well, what’s the point of doing systems engineering here? What does it achieve? So, maybe in those circumstances, it’s better (in fact, I would say it’s essential) to understand where systems engineering is still valid, where you are still going to get some results, and where it isn’t. And maybe you just declare that scope: what’s in and what’s out.

Or maybe you take a different approach. Maybe you go “OK, we’re dealing with a predominantly COTS system. We need a different way of dealing with this than the way the Mil. standard 882 assumes.” So, you’re going to have to do some heavy tailoring of the standard because 882 assumes that you’re determining all these requirements pre-design. If that’s not the case, then maybe 882 isn’t for you. Or maybe you just need to recognize you’re going to have to hack it about severely. Which in turn means you’ve got to know what you’re doing fundamentally. In which case the standard really is no longer fulfilling its role of guiding people.

Commentary #2

Moving on. Let’s assume that we are still going to do some Task 208. We’re going to determine some software criticality. We’re also going to determine some criticality for complex hardware. So, whether it be software or complex electronics, pre-programmed electronics, whatever that might be. First of all, as we said before, we’re going to determine the software control category and what that’s really saying is how much authority does the software have? And then secondly, we’re going to be looking at severity, which was table one. How severe is the worst hazard or risk that the software could contribute to? And these are illustrated in the next two slides. A session, or several sessions, on software safety is coming soon. That will be elsewhere. I’m not going to go into massive detail here. I’m just giving you an overview of what the task requires.

Commentary #3: Software Control Categories 1-5

First of all, how do we determine software control category? So, there’s the table from the standard. We’ve got five levels of SCC.

At the top, we’ve got autonomous. Basically, the software does whatever it wants to and there’s no checks and balances.

Secondly, there’s semi-autonomous. There’s one software system performing a function, but there are hardware interlocks and checks. And those hardware interlocks and checks, and whatever else that is not software, can work fast enough to prevent the accident happening. So, they can prevent harm. So, that’s semi-autonomous.

Then we’ve got redundant fault-tolerant where you’ve got an architecture typically with more than one channel, and maybe all channels are software controlled. Maybe there’s diversity in the software and there is some fault-tolerant architecture. Maybe a voting system or some monitoring system saying, “Well, Channel Three’s output is looking a bit dodgy” or “Something’s gone wrong with Channel Two. I’ll ignore the channel at fault, and I’ll take the good output from the channels that are still working and I’ll use that”. So that’s that option. Very common.
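As an illustration of the voting idea, here is a minimal sketch of a mid-value voter for three or more channels. This is not taken from the standard: the agreement rule (within a tolerance of the median), the function name `vote` and the numbers are all assumptions chosen to keep the example small. Real voters also have to deal with timing, stale data and common-cause failures.

```python
from statistics import median


def vote(channels, tolerance):
    """Return (voted_value, faulty_channel_indices).

    A channel is treated as good if it lies within `tolerance` of the
    median of all channels. The voted value is the mean of the good
    channels. If the good channels are not a strict majority, the voter
    refuses to give an answer rather than guess.
    """
    mid = median(channels)
    good = [v for v in channels if abs(v - mid) <= tolerance]
    faulty = [i for i, v in enumerate(channels) if abs(v - mid) > tolerance]
    if len(good) * 2 <= len(channels):
        raise RuntimeError("no majority of channels agree")
    return sum(good) / len(good), faulty


# Channel three (index 2) is 'looking a bit dodgy', so it is ignored.
value, faulty = vote([10.0, 10.2, 55.0], tolerance=0.5)
print(value, faulty)
```

Raising an error when there is no majority is a design choice for the sketch: what the system should actually do at that point (fail safe, hand over to a human, and so on) is exactly the kind of requirement the FHA should be generating.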

Then we’ve got number four, which is influential. So, the software is displaying some information for a human to interpret and to accept or reject.

And then we’ve got five, which is no safety impact at all. Now, the problem with number four is this: it’s very easy to say, “The software just displays some information, it doesn’t do anything unless a human does something, so we don’t have to worry about the safety implications of that at all”. Wrong! The human operator may be forced by circumstances to rely on the software output because there may not be time to do anything else. Or the human may not be able to work out what’s going on without using the software output. Or, more typically, the humans have just got used to the software generating the correct information, or they even interpret it incorrectly.

And a classic example of that was when the American warship, the USS Vincennes, shot down an airliner and killed nearly three hundred people because, the way the system was set up, the supposedly not safety-related radar system was displaying information associated not with the airliner but with an Iranian military aircraft. And the crew got mixed up and shot down the airliner. So, that’s a risky one. Even though it’s down at number four, that doesn’t mean it’s without risk or without criticality.

Commentary #4

So, we have the software control category down the left-hand side, one to five. And along the top, we have the severity category from catastrophic down to negligible. We can use those to determine the software criticality index, which varies from one, most critical, down to five, least critical. It’s similar to the risk assessment code in the table three coloured matrix that I showed you earlier. So, the writers of the standard have made a determination for us based on some assessment that they’ve done, saying, “Well, this is how we assess these different criticality levels”. Whether there is actually any real-world evidence supporting this assessment, I don’t know and I’m not sure anybody else does either. However, that’s the standard and that’s where we are.
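The software criticality matrix works the same way as table three, so it can be sketched as another lookup. Again, the cell values are my recollection of the published Mil. Standard 882E matrix and are an assumption to be checked against the standard. The function name `software_criticality_index` is made up for this example.

```python
# Severity categories (table one), most severe first.
SEVERITIES = ("catastrophic", "critical", "marginal", "negligible")

# Software criticality index by software control category (rows 1-5)
# and severity (columns). 1 = most critical, 5 = least critical.
# ASSUMPTION: cell values are recalled from MIL-STD-882E and must be
# verified against the standard before use.
SWCI_MATRIX = {
    1: (1, 1, 3, 4),  # autonomous
    2: (1, 2, 3, 4),  # semi-autonomous
    3: (2, 3, 4, 4),  # redundant fault-tolerant
    4: (3, 4, 4, 4),  # influential
    5: (5, 5, 5, 5),  # no safety impact
}


def software_criticality_index(scc: int, severity: str) -> int:
    """Look up the criticality index for one software function."""
    if scc not in SWCI_MATRIX:
        raise ValueError(f"unknown software control category: {scc}")
    return SWCI_MATRIX[scc][SEVERITIES.index(severity)]


# 'Influential' software that can contribute to a catastrophic outcome
# still gets a criticality index; it is not free of criticality.
print(software_criticality_index(4, "catastrophic"))
```

Reading across the "influential" row makes the earlier point: display-only software contributing to a catastrophic hazard is still assigned a criticality, so it cannot simply be dismissed.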

Commentary #5

And so just to finish up on the commentary. Task 208 is focused on software engineering, and also programmable electronics and complex hardware, but typically electronics with software functionality or logic functionality embedded within it. Now if all of that software, all those programmable electronic systems, are developed already, is there any point in doing Task 208? That’s the first step – it’s got to pass the “So what?” test.

Is it feasible to do 208 and expect to get benefits? If not, maybe you just do system and subsystem hazard analysis. That’s tasks 205 and 204, respectively. And we just look at the complex components and subsystems as a black box and say, “OK, what’s it meant to do? What are the interfaces?” Maybe that would be a better thing to do. Particularly bearing in mind that the software or the complex electronic system could be working perfectly well and we still get an accident because there’s been a misunderstanding of the output. Maybe it’s more beneficial to look at those interfaces and think about, “Well, in what scenarios could the human misunderstand? How do we guard against that?”

It’s also worth saying that some software development standards, particularly American ones, can work well with Mil. Standard 882 because they share a similar conceptual basis. For example, I’ve seen many, many times in the air world that the system safety standard is 882 and the software standard is DO-178 (AKA ED-12; it’s the same standard, just different labels). Now they work relatively well together because the concept underpinning 178 is very similar to 882. It’s American-centric.

It’s all about, you put requirements on the software development and – this is sort of a cookbook approach – the standard assumes that if you use the right ingredients and you mix them up in the right way, then you’re going to get a good result. And that’s a similar sort of concept for 882 and the two work relatively well together, fairly consistently. Also because they’re both American, there’s a great focus on software testing. Certainly, in the earlier versions of DO-178, it’s exclusively focused on software testing. Things like source code analysis and other things – more modern techniques that have come in – they’re not recognized at all in earlier versions of 178 because they just weren’t around.

That focus on testing suits 882, because 882 generates lots of requirements and constraints which you need to test. What it’s not so good at is generating cases where you say, “Well, if this goes wrong” or “If we’re at the edge of the envelope where we should be, let’s test for those edge of the envelope cases, let’s test that the software is working correctly when it’s outside of the operating envelope that it should be in”. Now, that kind of thinking isn’t so strong in 882, nor in 178. So, there are some limitations there. Good, experienced practitioners will overcome those by adding in the smarts that the standards lack. But just be aware, a standard is not smart. You’ve still got to know what you’re doing in order to get the most out of it.

So, maybe you’re buying software that’s pre-developed, or you’re not in the States. You’ve got a European, Asian, Indian or Japanese supplier, or whatever. Maybe they’re not using American-style techniques and standards. How well is that going to work with 882? Are they compatible? They might be, but maybe they’re not. So, that requires some thought. If they’re not obviously compatible, then what do you need to do to make that translation and make it work? Or at least understand where the gaps are and what you might do about them to compensate?

And I’ve not talked about data, but it is worth mentioning that with data-rich systems these days – and I heard just the other day, is it two quintillion bytes of data being generated every two days or something ridiculous? That was back in 2017. So, gigantic amounts of data are being generated these days and used by computing systems, particularly artificial intelligence systems. So, the rigour associated with that data, the things that we need to think about on data, are potentially just as important as the software. Because if the software is processing rubbish data, you’re probably going to get rubbish results. Or at the very least unreliable results that you can’t trust. So, you need to be thinking about all of those attributes of your data: correct, complete, consistent, etc. I mean, I probably need to do a session on that and maybe I will.
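To give a flavour of what checking those data attributes might look like, here is a minimal sketch that tests a batch of records for completeness, correctness (values within a plausible range) and consistency (the same identifier never carrying two different records). The field names, the range limits and the function name `check_records` are all hypothetical, chosen only for illustration; real data assurance needs far more than this.

```python
def check_records(records, required_fields, valid_ranges, key="id"):
    """Return a list of (record_index, problem, field) tuples.

    - 'incomplete':   a required field is missing or None
    - 'incorrect':    a value lies outside its allowed (low, high) range
    - 'inconsistent': the same key appears again with different content
    """
    problems = []
    seen = {}
    for i, rec in enumerate(records):
        for field in required_fields:
            if rec.get(field) is None:
                problems.append((i, "incomplete", field))
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((i, "incorrect", field))
        k = rec.get(key)
        if k in seen and seen[k] != rec:
            problems.append((i, "inconsistent", key))
        seen.setdefault(k, rec)
    return problems


# Hypothetical sensor records, for illustration only.
records = [
    {"id": "s1", "altitude_ft": 1200},
    {"id": "s2", "altitude_ft": None},    # incomplete
    {"id": "s3", "altitude_ft": -99999},  # implausible value
    {"id": "s1", "altitude_ft": 1500},    # contradicts the first record
]
problems = check_records(
    records,
    required_fields=["id", "altitude_ft"],
    valid_ranges={"altitude_ft": (-1000, 60000)},
)
print(problems)
```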

Copyright Statement

That’s the presentation. As you can see, everything in italics and quotes is out of the standard, which is copyright free. But this presentation is copyright of the Safety Artisan.

For More…

And you will find many more presentations and a lot more resources at the website www.safetyartisan.com. Also, you’ll find the paid videos on our Patreon page, which is www.patreon.com/SafetyArtisan or go to Patreon and search for the Safety Artisan.

End

Well, that’s the end of our presentation, and it just remains for me to say thanks very much for listening. Thanks for your time and I look forward to seeing you in the next session, Task 209. Looking forward to it. Goodbye.

Categories
Start Here Work Health and Safety

Lessons Learned from a Fatal Accident

Lessons Learned: in this 30-minute video, we learn lessons from an accident in 2016 that killed four people on the Thunder River Rapids Ride in Queensland. The coroner’s report was issued this year, and we go through the summary of that report. In it we find failings in WHS Duties, Due Diligence, risk management, and failures to eliminate or minimize risks So Far As is Reasonably Practicable (SFARP). We do not ‘name and shame’, rather we focus on where we can find guidance to do better.

In 2016, four people died on the Thunder River Rapids Ride.

Lessons Learned: Key Points

We examine multiple failings in:

  • WHS Duties;
  • WHS Due Diligence;
  • Risk management; and
  • Eliminating or minimizing risks So Far As is Reasonably Practicable (SFARP).

Transcript: Lessons Learned from a Theme Park Tragedy

Introduction

Hello, everyone, and welcome to the Safety Artisan: purveyors of fine safety engineering training videos and other resources. I’m Simon and I’m your host and today we’re going to be doing something slightly different. So, there are no PowerPoint slides. Instead, I’m going to be reading from a coroner’s report from a well-known accident here in Australia and we’re going to be learning some lessons in the context of WHS (work health and safety) law.

Disclaimer

Now, I’d just like to reassure you before we start that I won’t be mentioning the names of the deceased. I won’t be sharing any images of them. And I’m not even going to mention the firm that owned the theme park because this is not about bashing people when they’re down. It’s about us as a community learning lessons when things go wrong in order to fix the problem, not the blame. So that’s what I’d like to emphasize here.

The Coroner’s Report

So, I’m just turning to the summary of the coroner’s report. Basically, the coroner was examining the deaths of four people back in 2016 on what was called the Thunder River Rapids Ride. Or TRRR or TR3 for short because it’s a bit of a mouthful. This was a water ride, as the name implies, and what went wrong was the water level dropped. Rafts, these circular rafts that went down the rapids, went down the chute, got stuck. Another raft came up behind the stuck raft and went into it. One of the rafts tipped over.

These rafts seat six people in a circular configuration. You may have seen them. They’re in – different versions of this ride are in lots of theme parks.

But out of the six, unfortunately, only two escaped and four people were killed, tragically. So that’s the background. That happened in October 2016, I think it was. The coroner’s report came out a few months ago, and I’ve been wanting to talk about it for some time because it really does illustrate very well a number of issues where WHS can help us do the right thing.

WHS duties

So, first of all, I’m looking at the first paragraph in the summary. The coroner starts off: the design and construction of the TRRR at the conveyor and unload area posed a significant risk to the health and safety of patrons. Notice that the coroner says the design and construction. Most people think that WHS only applies to workplaces and people managing workplaces, but it does a lot more than that. Sections 22 through 26 of the Act talk about the duties of designers, manufacturers, importers, suppliers and then people who commission, install, et cetera.

So, WHS applies duties to a wide range of businesses and undertakings, and designers and constructors are key. Now, it’s worth noting that there was no importer here. The theme park, although the TRRR ride was similar to a ride available commercially elsewhere, for some reason chose to design and build their own version in Queensland. Don’t know why. Anyway, that doesn’t really matter now. So, there was no importer, but otherwise, even if you didn’t design and construct the thing, if you imported it, the same duties still apply to you.

No effective risk assessment

So, the coroner then goes on to talk about risks and hazards and says each of these obvious hazards posed a risk to the safety of patrons on the ride and would have been easily identifiable to a competent person had one ever been commissioned to conduct a risk and hazard assessment of the ride. So, what the coroner is saying there is, “No effective risk assessment has been done”. Now, that is clearly contrary to the risk management code of practice under WHS and also, of course, the definition of SFARP, so far as is reasonably practicable, basically is a risk assessment or risk management process. So, if you’ve not done effective risk management, you can’t say that you’ve eliminated or minimized risks SFARP, which is another legal requirement. So, a double whammy there.

Then moving on. “Had notice been taken of lessons learned from the preceding incidents, which were all of a very similar nature …” and then he goes on. Basically, that’s the back end of a sentence where he says, you didn’t do this, you had incidents on the ride in the past which were very similar, and you didn’t learn from them. And again, with respect to reducing risks SFARP, Section 18 in the WHS Act, which gives the definition of reasonably practicable, which is the core of SFARP, talks about what ought to have been known at the time. So, when you’re doing a risk assessment, or maybe you’re reassessing risk after a modification (and this ride was heavily modified several times) or after an incident, you need to take account of the available information. And the owners and operators of the TRRR clearly didn’t do that. So, another big failing.

The coroner goes on to note that records available with respect to the modifications to the ride are scant and ad hoc. And again, there’s a section in the WHS risk management code of practice about keeping records. It’s not that onerous. I mean, the CoP is pretty simple but they didn’t meet the requirement of the code of practice. So, bad news again.

Due diligence

And then finally, I’ve got to the bottom of page one. So, the coroner then notes the maintenance tasks undertaken on the ride whilst done so regularly and diligently by the staff, seemed to have been based upon historical checklists which were rarely reviewed despite the age of the device or changes to the applicable Australian standards.

Now, this is interesting. So, this is contravening a different section of the WHS Act. Section 27 talks about the duties of officers, and effectively that’s company directors and senior managers. Officers are supposed to exercise due diligence. In the Act, due diligence is fairly simple. It’s six bullet points, but one of them is that the officers have to keep up to date on what’s going on in their operation. They have to provide up-to-date and effective safety information for their staff. They’re also supposed to keep up with what’s going on in safety regulation that’s applicable to their operation. So, I reckon in that one statement from the coroner there’s probably three breaches of due diligence to start with.

risk controls lacking

We’ve reached the bottom of page one- Let’s carry on. The coroner then goes on to talk about risk controls that were or were not present and says, “in accordance with the hierarchy of controls, plant and engineering measures should have been considered as solutions to identified hazards”. So in WHS regulations and it’s repeated in the risk code of practice, there’s a thing called the hierarchy of controls. Basically, it says that some types of risk controls are more effective than others and therefore they come at the top of the list, whereas others are less effective and should be considered last.

So, top of the list is, “Can you eliminate the hazard?” If not, can you substitute the hazardous thing with something else that is less hazardous? Can you put in engineering solutions or controls to control the hazard? And then finally, at the bottom of my list is admin procedures for people to follow and then personal protective equipment for workers, for example. We’ll talk about this more later, but the top end of the hierarchy had just not been considered, or not effectively anyway.
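As a rough illustration of that ordering, here is a minimal Python sketch. The level names follow the list as given in this session (the Code of Practice itself has the full wording), and the candidate controls and function names are hypothetical, made up purely for illustration.

```python
from enum import IntEnum


class ControlLevel(IntEnum):
    """Lower value = higher in the hierarchy = considered first."""
    ELIMINATION = 1
    SUBSTITUTION = 2
    ENGINEERING = 3
    ADMINISTRATIVE = 4
    PPE = 5


def rank_controls(candidates):
    """Sort (description, level) pairs so the most effective types come first."""
    return sorted(candidates, key=lambda c: c[1])


def relies_only_on_lower_order(candidates):
    """True if every proposed control is administrative or PPE."""
    return all(level >= ControlLevel.ADMINISTRATIVE for _, level in candidates)


# Hypothetical candidate controls for a raft-collision hazard (illustrative only).
candidates = [
    ("Operator checks and signals", ControlLevel.ADMINISTRATIVE),
    ("Automatic water-level detection with shutdown", ControlLevel.ENGINEERING),
]

for description, level in rank_controls(candidates):
    print(level.name, "-", description)
```

A check like `relies_only_on_lower_order` is only a prompt for a human reviewer: if it returns True, the top end of the hierarchy has probably not been considered.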

a predictable risk

So, the coroner then goes on to say, “rafts coming together on the ride was a well-known risk, highlighted by the incident in 2001 and again in 2004”. Now actually it says 2004, I think that might be a typo. Elsewhere, it says 2014, but certainly, there were two significant incidents that were similar to the accident that actually killed four people. And it was acknowledged that various corrective measures could be undertaken to, quote, “adequately control the risk of raft collision”. However, a number of these suggestions were not implemented on the ride.

Now, given that they’ve demonstrated the ability to kill multiple people on the ride with a raft collision, it’s going to be a very, very difficult thing to justify not implementing controls. So, given the seriousness of the potential risk, to say that a control is feasible, is practicable, but then to say “We’re not going to do it. It’s not reasonable”: that’s going to be very, very difficult to argue. I would suggest it’s almost a certainty that not all reasonably practicable controls were implemented, which means the risk is not SFARP, which is a legal requirement.

Further on, we come back to document management, which was poor with no formal risk register in place. So, no evidence of a proper risk assessment. Members of the department did not conduct any holistic risk assessments of rides with the general view that another department was responsible. So, the fact that risk assessment wasn’t done- That’s a failing. The fact that senior management didn’t knock heads together and say “This has to be done. Make it happen”- That’s also another failing. That’s a failing of due diligence, I suspect. So, we’ve got a couple more problems there.

high-risk plant

Then, later on, the coroner talks about necessary engineering oversight of high-risk plant not being done. Now, under WHS Act definitions, amusement rides are counted as high-risk plant, presumably because of the number of serious accidents that have happened with them over the years. The managers of the TRRR didn’t meet their obligations with respect to high-risk plant. So, some things that are optional for common-or-garden stuff are mandatory for high-risk plant, and those obligations were not met, it seems.

And then in just the next paragraph, we reinforce this due diligence issue. Only a scant amount of knowledge was held by those in management positions, including the general manager of engineering, as to the design modifications and past notable incidents on the ride. One of the requirements of due diligence is that senior management must have a knowledge of their operations, a knowledge of the hazards and risks associated with the operations. So for the engineering manager to be ignorant about modifications and risks associated with the ride, I think is a clear failure of due diligence.

Still talking about engineering, the coroner notes “it is significant that the general manager had no knowledge of past incidents involving rafts coming together on the ride”. Again, due diligence. If things have happened those need to be investigated and learned from and then you need to apply fresh controls if that’s required. And again, this is a requirement. So, this shows a lack of due diligence. It’s also a requirement in the risk management code of practice to look at things when new knowledge is gained. So, a couple more failures there.

no water-level detection, alarm or emergency stop

Now, it said that the operators of the ride were well aware that when one pump failed, and there were two, the ride was no longer able to operate with the water level dropping dramatically, stranding the rafts on the steel support railings. And of course, that’s how the accident happened.

Regardless, there was no formal means by which to monitor the water level of the ride or audible alarm to advise one of the pumps had ceased to operate. So, a water level monitor? Well, we’re talking potentially about a float, which is a pretty simple thing. There’s one in every cistern, in every toilet in Australia. Maybe the one for the ride would have to be a bit more sophisticated than that- A bit industrial grade but basically the same principle.

And no alarm to advise the operators that this pump had failed, even though it was known that this would have a serious effect on the operation of the ride. So, there’re multiple problems here. I suspect you’ll be able to find regulations that require these things. Certainly, if you looked at the code of practice on plant design, because this counts as industrial plant, and it’s high-risk plant, you would expect very high standards of engineering controls on high-risk plant, and these were missing. More on that later.

In a similar vein, the coroner says “a basic automated detection system for the water level would have been inexpensive and may have prevented the incident from occurring”. So basically, the coroner is saying this control mechanism would have been cheap so it’s certainly reasonably practicable. If you’ve got a cheap control that will prevent a serious injury or a death, then how on earth are you going to argue that it’s not reasonable to implement it? The onus is on us to implement all reasonably practicable controls.

And then similarly, the lack of a single emergency stop on the ride, which was capable of initiating a complete shutdown of all the mechanisms, was also inadequate. And that’s another requirement from the code of practice on plant design, which refers back to WHS regulations. So, another breach there.
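To show how simple the missing engineering controls are in principle, here is a minimal Python sketch of a water-level and pump monitor that raises an alarm and triggers a single emergency stop. Everything in it is an assumption for illustration: the threshold value, the sensor reading, and the names are hypothetical, and a real system on high-risk plant would need properly engineered, safety-rated hardware and logic.

```python
from dataclasses import dataclass
from typing import List

MIN_SAFE_LEVEL_MM = 500.0  # assumed threshold, not a real figure for any ride
REQUIRED_PUMPS = 2         # the ride described here had two pumps


@dataclass
class RideStatus:
    water_level_mm: float  # reading from a hypothetical level sensor
    pumps_running: int     # number of pumps currently running


def evaluate(status: RideStatus) -> List[str]:
    """Return the alarms and actions that this status should trigger."""
    actions = []
    if status.pumps_running < REQUIRED_PUMPS:
        actions.append("ALARM: pump failure")
    if status.water_level_mm < MIN_SAFE_LEVEL_MM:
        actions.append("ALARM: low water level")
    if actions:
        # One stop command that shuts down all of the mechanisms together.
        actions.append("EMERGENCY STOP: shut down all mechanisms")
    return actions


print(evaluate(RideStatus(water_level_mm=800.0, pumps_running=1)))
```

The design point is that detection and shutdown do not depend on an operator noticing the problem and carrying out a sequence of checks.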

human factors

We then move on to a section where it talks about operators, operators’ account of the incident, and other human factors. I’m probably going to ask my friend Peter Bender, who is a Human Factors specialist, to come and do a session on this and look at this in some more detail, because there are rich pickings in this section and I’m just going to skim the surface here because we haven’t got time to do more. And the coroner says “it’s clear that these 38 signals and checks to be undertaken by the ride operators were excessive, particularly given that the failure to carry out any one could potentially be a factor which would contribute to a serious incident”. So clearly, 38 signals and checks distributed between two ride operators, because there was no one operator in control of the whole ride (that’s a human factors nightmare for a start), but clearly, the work design for the ride was poor. There is good guidance available from Safe Work Australia on good work design so there’s really no excuse for this kind of lapse.

And then the coroner goes on to say, reinforcing this point that the ride couldn’t be safely controlled by a human operator, that the lack of engineering controls on a ride of this nature is unjustifiable. Again, reinforcing the point that risk was not SFARP because not all reasonably practicable controls had been implemented, particularly controls at the higher end of the hierarchy of controls. So, a serious failing there.

(Now, I’ve got something that I’m going to skip, actually, but – It’s a heck of a comment, but it’s not actually relevant to WHS.)

training and competence

We’re moving on to training and competence. Those responsible for managing the ride whilst following the process and procedure in place – and I’m glad to see, from a human factors point of view, that the coroner is not just trying to blame the last person that touched it. He’s making a point of saying the operators did all the right stuff. Nevertheless, they were largely not qualified to perform the work for which they were charged.

The process and procedures that they were following seemed to have been created by unknown persons (unknown, presumably, because of the poor record-keeping) who it is safe to assume lacked the necessary expertise. And I think the coroner is making a reasonable assumption there, given the multiple failings that we’ve seen in risk management, in due diligence, in record-keeping, in the knowledge of key people, et cetera, et cetera.

It seems that the practice at the park was simply to accept what had always been done in terms of policy and procedure. And despite changes to safety standards and practices happening over time, because this is an old ride, only limited and largely reactionary consideration was ever given to making changes, including to the training provided to staff. So, reactionary: bad word. We’re supposed to predict risk and prevent harm happening. So, multiple failures on due diligence here and on staff training, providing adequate staff training, providing adequate procedures, et cetera.

The coroner goes on to say, “regardless of the training provided at the park, it would never have been sufficient to overcome the poor design of the ride. The lack of automation and engineering controls”. So, again, the hierarchy of controls was not applied, relatively cheap engineering controls were not used, placing an undue burden on the operator. Sadly, this is all too common in many applications. This is one of the reasons I am not naming the ride operators or trying to shame them, because I’ve seen this happen in so many different places. It wouldn’t be fair to single these people out.

‘incident free’ operations?

Now we have a curious, a curious little statement in paragraph 1040. The coroner says “submissions are made that there was a 30-year history of incident-free operation of the ride”. So, what it looks like is that the ride operators, management, trying to tell the coroner that they never had an incident on the ride in 30 years, which sounds pretty impressive, doesn’t it, at face value. But of course, the coroner already knew or discovered later on that there had been incidents on the ride. In fact, there have been two incidents that were very similar to the fatal accident.

Now, on the surface, this looks bad, doesn’t it? It looks like the ride management were trying to mislead the coroner. I don’t actually think that’s the case because I’ve seen so many organizations do poor incident reporting, poor incident recording, and poor learning from experience from incidents that it doesn’t surprise me that the senior management were not aware of incidents on their ride. Unfortunately, it’s partly human nature. Nobody likes to dwell on their failures or think about nasty things happening, and nobody likes to go to the boss saying we need to shut down a moneymaking ride. Don’t forget, this was a very popular ride. We need to shut down a moneymaking ride in order to spend more money making modifications to make it safer. And then management turns around and says, “Well, nobody’s been hurt. So, what’s the problem?”

And again, I’ve seen this attitude again and again, even on people operating much more sophisticated and much more dangerous equipment than this. So, whilst this really does look bad- the optics are not good, as they like to say. I don’t think there’s actually a conspiracy going on here. I think it’s just stupid mistakes because it’s so common. Moving on.

standards

Now the coroner goes on to talk about standards not being followed, particularly when standards get updated over time. Bearing in mind this ride was 30 years old. The coroner states “it is essential that any difference in these standards are recognized and steps taken to ensure any shortfalls with a device manufactured internationally is managed”. Now, this is a little bit of an aside, because as I’ve mentioned before, the TRRR was actually designed and manufactured in Australia. Albeit not to any standards that we would recognize these days. But most rides were not and this highlights duties of importers. So, if you import something from abroad, you need to make sure that it complies with Australian requirements. That’s a requirement, that’s a duty under WHS law. We’ll come back to this in just a moment.

(We’ll skip that [comment] because we’ve done training and competency to death.)

the role of the regulator

So, following on about the international standards, the coroner also has a crack at the Queensland regulator, who I won’t name, and says “the regulator draws my attention to the difficulties arising when we’re requiring all amusement devices to comply with Australian standards. This difficulty is brought about by the fact that most amusement devices are designed and manufactured overseas, predominantly based on European standards”. Now, in the rest of the report, the coroner has a good old crack at the regulator. (If you’re Irish, a crack means a bit of fun. I’m not talking about a bit of fun.)

The coroner sticks the boot into the regulator for being pretty useless. And sadly, that’s no surprise in Australia. So basically, the regulator said, “Oh, it’s all too difficult!” And you think, “Well, it’s your job, actually, so why haven’t you done it properly?”

But being a little bit more practical, if you work in an industry where a lot of stuff is imported, and let’s face it, that’s pretty common in Australia, you’ve got two choices. You can either try and change Australian standards so that they align better to the standards of the place where you’re getting the kit from in your industry, or maybe the regulator could say, “Okay, this is a common problem across the industry. We will provide some guidance that tells you how to make that transition from the international standards to Australian standards and what we as the regulator consider acceptable and not acceptable”. And then that really helps the industry to do the right thing and to be consistent in terms of operation and enforcement.

So, the regulator is letting the people who they regulate know this is the standard that is required of you, this is what you have to do. And that’s really the job of a good regulator. So, the fact that the regulator in this particular case just hadn’t bothered to do so over a period of some decades, it would seem, doesn’t really say a lot for the professionalism of the regulator. And I’m not surprised that the coroner decided to have a go at them.

Summary

So, we’ve been through just over 20 comments, I think. I mean, I actually had 24 or 25 in total, but I skipped a few because they were a bit repetitive. It’s interesting to note that there were two major comments on failure to conduct designer duties and that kind of thing, seven on risk management, four on SFARP (although of course, all the risk management ones also affect SFARP), and five on due diligence. So, there’re almost 20 significant breaches there and I wasn’t even really trying to pick up everything the coroner said. And bear in mind, I was only reading from the summary. I didn’t bother reading the whole report because it’s pages and pages and pages.

And the lesson that we can draw from all of this, friends, is not to bash the people who make mistakes, but to learn lessons for ourselves. How could we do better? And I think the lesson is everything that we need to do has been clearly set out in the WHS Act and in the WHS regulations. Then there’re codes of practice that give us guidance in particular areas and on our general responsibilities, and these codes of practice also guide us on what should be considered SFARP for certain hazards and risks. Then there’s also some fantastic guidance, documentation and information available from Safe Work Australia on, for example, human factors and good work design and so on and so forth.

So, there’s lots of really good, really readable information out there and it’s all free. It’s all available on that wonderful thing we call the Internet. So, there really is no excuse for making basic mistakes like this and killing people. It’s not that difficult. And a lot of the safety requirements are not that onerous. You don’t have to be a rocket scientist to read them and understand them. A lot of the requirements are basic, structured, common sense. So, the lesson from this awful accident is it doesn’t have to be this way. We can do much better than that quite easily and if we don’t and something goes wrong, then the law will be after us.

looking ahead

It will be interesting to see – I believe that WorkSafe Queensland are now investigating to see whether they’re going to bring any prosecutions. It should be said that the police investigated and didn’t bring any prosecutions against individuals. I don’t know if Queensland has a corporate manslaughter act. I wouldn’t think so based on the fact that they’ve not prosecuted anybody, but you don’t need to find an individual guilty of gross negligence manslaughter for WHS to take effect. So, I suspect that in due course, we will see the operators of the theme park probably cop a significant fine and maybe some of their directors and senior managers will be going to jail. That’s how serious and how numerous these breaches are. You really don’t need to dig very deep to see what’s gone wrong and to see the legal obligations have not been met.

Since this video was recorded the TRRR owners have been charged with three offences under WHS law. They pleaded guilty and were fined $4.5M.

End of Lessons Learned


Categories
Mil-Std-882E Safety Analysis

System Hazard Analysis

In this 45-minute session, The Safety Artisan looks at System Hazard Analysis, or SHA, which is Task 205 in Mil-Std-882E. We explore Task 205’s aim, description, scope, and contracting requirements. We also provide value-adding commentary, which explains SHA – how to use it to complement Sub-System Hazard Analysis (SSHA, Task 204) in order to get the maximum benefits for your System Safety Program.

This is the seven-minute-long demo. The full video is 47 minutes long.

System Hazard Analysis: Topics

  • Task 205 Purpose [differences vs. 204];
    • Verify subsystem compliance;
    • ID hazards (subsystem interfaces and faults);
    • ID hazards (integrated system design); and
    • Recommend necessary actions.
  • Task Description (five slides);
  • Reporting;
  • Contracting; and
  • Commentary.
Transcript: System Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial safety training resources and videos. I’m Simon, your host, and I’m recording this on the 13th of April 2020. And given the circumstances when I record this, I hope this finds you all well.

System Hazard Analysis Task 205

Let’s get on to our topic for today, which is System Hazard Analysis. Now, system hazard analysis, as you may know, is Task 205 in the Mil-Std-882E system safety standard.

Topics for this Session

What we’re going to cover in this session is purpose, task description, reporting, contracting and some commentary – although I’ll be making commentary all the way through. Going back to the top, with this and with task 204, I’m using the yellow highlighting to indicate differences between 205 and 204 because they are superficially quite similar. And then I’m using underlining to emphasize those things that I really want to bring to your attention. Within task 205, purpose. We’ve got four purposes for this one. Verify subsystem compliance and recommend necessary actions – fourth one there. And then in the middle of the sandwich, we’ve got identification of hazards, both between the subsystem interfaces and faults from the subsystem propagating upwards to the overall system, and identifying hazards in the integrated system design. So, quite a different emphasis to 204, which was really thinking about subsystems in isolation. We’ve got five slides of task description, a couple on reporting, one on contracting – nothing new there – and several commentaries.

System Hazard Analysis (T205)

Let’s get straight on with it. The purpose: as we’ve already said, there is a three-fold purpose here. Verify system compliance, hazard identification and recommended actions, and then, as we can see in the yellow, the identifying of previously unidentified hazards is split into two: looking at subsystem interfaces and faults, and the integration of the overall system design. And you can see the yellow bit, that’s different from 204, where we are taking this much higher-level view, taking an inter-subsystem view and then an integrated view.

Task Description (T205) #1

On to the task description. The contractor has got to do it and document it, as usual, looking at hazards and mitigations, or controls, in the integrated system design, including software and human interface. That’s very important and we’ll come onto that later. All the usual stuff about we’ve got to include COTS, GOTS, GFE and NDI. So, even if stuff is not being developed, if we’re putting together a jigsaw system from existing pieces, we’ve still got to look at the overall thing. And as with 204, we go down to the underlined text at the bottom of the slide, areas to consider. Think about performance, and degradation of performance, functional failures, timing and design errors, defects, inadvertent functioning – that classic functional failure analysis that we’ve seen before. And again, while conducting this analysis, we’ve got to include human beings as an integral component of the system, receiving inputs, and initiating outputs. Human factors were included in this standard from long ago.

Task Description (T205) #2

Slide two. We’ve got to include a review of subsystem interrelationships. The assumption is that we’ve previously done task 204 down at a low level and now we’re building up to task 205. Again, verification of system compliance with requirements (A.), identification of new hazards and emergent hazards, recommendations for actions (B.), but Part C is really the new bit. We are looking at possible independent, dependent, and simultaneous events (C.) including system failures, failures of safety devices, common cause failures, and system interactions that could create a hazard or increase risk. And this is really the new stuff in 205, which we are going to emphasize in the commentary. You’re going to look very carefully at those underlined things because they are key to understanding task 205.
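One way to start looking for common cause failures is to ask which subsystems, including safety devices, depend on the same shared resource. The following is a minimal Python sketch of that idea only; the subsystem and resource names are hypothetical, and it is not a technique prescribed by the standard.

```python
from collections import defaultdict

# Hypothetical subsystems and the shared resources each one depends on.
dependencies = {
    "conveyor": {"main_power", "plc"},
    "pumps": {"main_power"},
    "emergency_stop": {"plc"},
    "level_alarm": {"backup_battery"},
}


def common_cause_candidates(deps):
    """Map each shared resource to the subsystems that could fail together."""
    users = defaultdict(set)
    for subsystem, resources in deps.items():
        for resource in resources:
            users[resource].add(subsystem)
    # Only resources used by more than one subsystem are of interest here.
    return {r: sorted(s) for r, s in users.items() if len(s) > 1}


for resource, subsystems in sorted(common_cause_candidates(dependencies).items()):
    print(resource, "->", subsystems)
```

In this made-up example the emergency stop shares the PLC with the conveyor it is meant to stop, which is exactly the kind of dependent failure between a system and its safety device that Part C asks us to consider.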

Task Description (T205) #3

Moving on to Slide 3, all new stuff, all in yellow. Degradation of a subsystem or the total system (D.), design changes that affect subsystems (E.). Now, I’ve underlined this because what’s the constant in projects? It’s change. You start off thinking you’re going to do something and maybe the concept changes subtly or not so subtly during the project. Maybe your assumptions change, the schedule changes, the resources available change. You thought you were going to get access to something, but it turns out that you’re not. So, all these things can change and cause problems, quite frankly, as I am sure we know. So, we need to deal with not just the program as we started out, but the program as it turns out to be – as it’s actually implemented. And that’s something I’ve seen often go awry because people hold on to what they started out with, partly because they’re frightened of change and also because of the work of really taking note of changes. And it takes a really disciplined program or project manager to push back on random change and to control it well, and then think through the implications. So, that’s where strength of leadership comes in, but it is difficult to do.

Moving on now. It says effects of human errors (F.). In the blue, I’ve changed that. Human error implies that the human is at fault, that the human made a mistake. But very often, we design suboptimal systems and we just expect the human operator to cope. Whether it’s fair or unfair or unreasonable, it results in accidents. So, what we need to think about more generally is erroneous human action. So, something has gone wrong but it’s not necessarily the human’s fault. Maybe the system has induced the human to make an error. We need to think very carefully about that.

Moving on, determination (G.): the potential contribution of all those components in G.1, as we said before, all the non-developmental stuff. G.2, have design requirements in the specifications been satisfied? This standard emphasizes specifications and meeting requirements; we’ve discussed that in other lessons. And G.3, whether methods of system implementation have introduced any new hazards. Because of course, in the attempt to control hazards, we may introduce technology or plant or substances that themselves can create problems. So, we need to be wary of that.

Task Description (T205) #4

Moving on to slide four. Now, in 205.2.2, the assumption here is that the PM has specified methods to be used by the contractor. That’s not necessarily true; the PM may not be an expert in this stuff. Or they may, for contractual or whatever reasons, have decided they want the contractor to decide what techniques to use. But the assumption here is that the PM has control, and if the contractor decides they want to do something different they’ve got to get the PM’s authority to do that. This is assuming, of course, that this has been specified in the contract.

And 205.2.3: whichever contractor is performing the system hazard analysis, the SHA, they are expected to have oversight of software development that’s going to be part of their system. And again, that doesn’t happen unless it’s contracted. So, if you don’t ask for it, you’re not going to get it because it costs money. So, the ultimate client needs to insist on this in the contract and police it, to be fair, because it’s all very well asking for stuff, but if you never check what you’re getting or what’s going on, you can’t be sure that it’s really happening. As the American Admiral Rickover once said, “You get the safety you inspect”. So, if you don’t inspect it, don’t expect to get anything in particular, or it’s an unknown. And again, if anything requires mitigation, the expectation in the standard is that it will be reported to the PM, the client PM that is, and that they will have authority. This is an assumption in the way that the standard works. If you’re not going to run your project like that, then you need to think through the implications of using this standard and manage accordingly.

Task Description (T205) #5

And the final slide on task description. We’ve got another reminder that the contractor performing the SHA shall evaluate design changes. Again, if the client doesn’t contract for this it won’t necessarily happen. Or indeed, if the client doesn’t communicate that things have changed to the contractor or the subcontractors don’t communicate with the prime contractor then this won’t happen. So, we need to put in place communication channels and insist that these things happen. Configuration control, and so forth, is a good tool for making sure that this happens.

Reporting (T205) #1

So, if we move on to reporting, we’ve got two slides on this. No surprises, the contractor shall prepare a report that contains the results from the analysis as described. First, part A, we’ve got to have a system description, including the physical and functional characteristics and subsystem interfaces. Again, always important: if we don’t have that system description, we don’t have the context to understand the hazard analysis that has been done, or not been done for whatever reason. And the expectation is that there will be reference to more detailed information as and when it becomes available. So maybe detailed design stuff isn’t going to emerge until later, but it has to be included. Again, this has got to be required.

Reporting (T205) #2

Moving onto parts B and C. Part B as before we need to provide a description of each analysis method used, the assumptions made, and the data used in that analysis. Again, if you don’t do this, if you don’t include this description, it’s very hard for anybody to independently verify that what has been done is correct, complete, and consistent. And without that assurance, then that’s going to undermine the whole purpose of doing the analysis in the first place.

And then part C, we’ve got to provide the analysis results, and at the bottom of this subparagraph is the assumption. The analysis results could be captured in the hazard tracking system, say the hazard log, but I would only expect the sort of leading particulars to be captured in that hazard log. And the detail is going to be in the task 205 hazard analysis report, or whatever you’re calling it. We’ve talked about that before, so I’m not going to get into that here.
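That split, with summary information in the hazard log and the detail in the analysis report, can be sketched as a simple record. This is a minimal Python illustration under stated assumptions: the field names, identifiers and document reference are hypothetical, and the standard does not prescribe this structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HazardLogEntry:
    """Summary record only; the detailed analysis lives in the referenced reports."""
    hazard_id: str
    title: str
    risk_level: str  # e.g. High, Serious, Medium or Low
    status: str      # e.g. Open or Closed
    report_refs: List[str] = field(default_factory=list)

    def is_traceable(self) -> bool:
        """True if the entry points to at least one supporting report."""
        return bool(self.report_refs)


# Hypothetical entry; the identifiers and report reference are made up.
entry = HazardLogEntry(
    hazard_id="HAZ-001",
    title="Loss of power to subsystem A and its safety device",
    risk_level="Serious",
    status="Open",
    report_refs=["SHA-REPORT-001, section 4.2"],
)
print(entry.hazard_id, "traceable:", entry.is_traceable())
```

An entry with no report reference is a prompt to ask where the supporting analysis is, which supports the point about independent verification made under part B.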

Contracting

And then the final bit of quotation from the standard is the contracting. And again, all the same things that you’ve seen before. We need to require the task to be completed. It’s no good just saying apply Mil-Std-882E because the contractor, if they understand 882E, will tailor it to suit themselves, not the client. Or if they don’t understand 882E they may not do it at all, or just do it badly. Or indeed they may just produce a bunch of reports that have got all the right headings in, as per the data item description, which is usually supplied in the contract, but there may be no useful data under those headings. So, if you haven’t made it clear to the contractor, they need to conduct this analysis and then report on the results – I know it sounds obvious. I know this sounds silly having to say this, but I’ve seen it happen. You’ve got a contractor that does not understand what system safety is.

(Mind you, why have you contracted them in the first place to do this? You should know that; you should have done your research and found out.)

But if it’s new to them, you’re going to have to explain it to them in words of one syllable or get somebody else to do it for them. And in my day job, this is very often what consultancies get called in to do. You’ve got a contractor who maybe is expert in building tanks, or planes, or ships, or chemical plants, or whatever it might be, but they’re not expert in doing this kind of stuff. So, you bring in a specialist. And that’s part of my day job.

So, getting back to the subject. Yes, we’ve got to specify this stuff. We’ve got to specify it early, which implies that the client has done quite a lot of work to work this all out. And again, the client may, above the line, as we say, engage a consultant or whoever to help them with this, a specialist. We’ve got to include all of the details that are necessary. And of course, how do you know what’s necessary, unless you’ve worked it out? And you’ve got to supply the contractor with, it says, the concept of operations, but really it’s about supplying the contractor with as much relevant data and information as you can, without bogging them down. That context is important to getting good results and getting a successful program.

Illustration

I’ve got a little illustration here. The supposition in the standard in Task 205 is we’ve got a number of subsystems and there may be some other building blocks in there as well. And some infrastructure. We’re probably going to have some users, we’re going to have an operating environment, and maybe some external systems that our system, or the system of interest, interfaces with or interacts with in some way. And that interaction might be deliberate, or it might be just in the same operating environment at night. And they will interact intentionally or otherwise.

Commentary – Go Early

With that picture in mind, let’s think about some important points. And the first one is to get some 205 work done early. Now, the implication in the standard, by the numbering and when you read the text, is that subsystem hazard analysis comes first. You do those hexagonal building blocks first and then you build it up, and task 205 comes after the subsystem hazard analysis. You might think, “Well, you’ve already got the SSHAs for each subsystem and then you build the SHA on top”. However, if you don’t do 205 early, you’re going to lose an opportunity to influence the design and to improve your system requirements. So, it’s worth doing an initial pass of 205 first, top-down, before you do the 204 hexagons and then come back up and redo 205. So, the first pass is done early to gain insight, to influence the design, to improve your requirements, and to improve, let’s say, the prime contractor’s appreciation and reporting of what they are doing. And, dare I say, a quick and dirty stab at 205 could be quite cheap, and the payback, the return on investment, should be large if you do it early enough. And of course, act on the results.

And then the second part is more about verifying compliance, verifying that those interfaces are as required, and looking at emergent stuff, stuff that’s emerged – the devil’s in the detail as the saying goes. We can look at the emerging stuff that’s coming out of that detail and then pull all that together and tidy it up and look for emergent behaviour.

Commentary – Tools & Techniques

Looking at tools and techniques, most safety analysis techniques that we use look at single events or single failures only in isolation. And usually, we expect those events and failures to be independent. So, there’re lots of analyses out there. Basic fault tree analysis, event tree analysis. Well, event tree is slightly different in that we can think about subsequent failures, but there’re lots of basic techniques out there that will really only deal with a single failure at a time. However, 205.2.1C requires us to go further. We’ve got to think about dependent simultaneous events and common cause failures. And for a large and complex system, each of those can be a significant undertaking. So, if we’re doing task 205 well, we are going to push into these areas and not simply do a copy of task 204, but at a higher level. We’re now really talking about the second pass of 205. The previous, quick and dirty, 205 is done. The task 204 on the subsystems is done. Now we’re pulling it all together.

Dependent & Simultaneous Events

Let’s think about dependent and simultaneous events. First, dependent failures. Can an initial failure propagate? For example, a fire could lead to an explosion or an explosion could lead to a fire. That’s a classic combination. Or something breaks or wears. It could be as simple as components wearing and then we get debris in the lubrication system. Could the debris from component wear clog up the lubrication system and cause it to fail and then cause a more serious seizure of the overall system? Stuff like that. Or there may be more subtle functional effects. For example, electrical effects, if we get a failure in an electrical system or even non-failure events that happen together. Could we get what’s called a sneak circuit? Could we get a reverse flow of current that we’re not expecting? And could that cause unexpected effects? There’s a special technique for looking at that called sneak circuit analysis. That’s sneak, SNEAK, go look it up if you’re interested. Or could there be multiple effects from one failure? Now, I’ve already mentioned fire. It’s worth repeating again. Fire is the absolute classic. First, the effects of fire. You’ve got the fire triangle. So, to get fire, we need a flammable substance (fuel), we need oxygen, and we need heat from an ignition source. And without all three, we don’t get a fire. But once we do get a fire, all bets are off, and we can get multiple effects. So, you might remember from being tortured doing thermodynamics in class, you might remember the old equation that P1V1/T1 equals P2V2/T2. (And I’ve put R2 there for some reason, so sorry about that.)

What that’s saying is, your initial pressure multiplied by volume and divided by temperature, P1V1/T1, is going to be the same as your subsequent pressure multiplied by volume and divided by temperature, P2V2/T2. So, what that means is if you dramatically increase the temperature say, because that’s what a fire does, then your volume and your pressure are going to change. So, in an enclosed space we get a great big increase in pressure, or if we’re in an unenclosed space, we’re going to get an increase in volume in a [gas or] fluid. So, if we start to heat the [gas or] fluid, it’s probably going to expand. And then that could cause a spill and further knock-on effects. And fire, as well as making pressure and volume changes to the fluids, can weaken structures, it makes smoke, and it produces toxic gases. So, it can produce all kinds of secondary hazardous effects that are dangerous in themselves and can mess up your carefully orchestrated engineering and procedural controls. So, for example, if you’ve got a fire that causes a pressure burst, you can destroy structures and your fire containment can fail. You can’t necessarily send people in to fix the problem because the area is now full of smoke and toxic gas. So, fire is a great example of this kind of thing where you think, “Well, if this happens, then this really messes up a lot of controls and causes a lot of secondary effects”. So, there’s a good example, but not the only one.
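To put rough numbers on the enclosed-space case, here is a minimal sketch using the combined gas law, P1V1/T1 = P2V2/T2, with temperatures in kelvin. It assumes ideal-gas behaviour, and the pressures and temperatures are made-up illustrative values, not data from any real fire.

```python
def pressure_after_heating(p1_kpa, t1_kelvin, t2_kelvin, v1=1.0, v2=1.0):
    """Solve the combined gas law P1*V1/T1 = P2*V2/T2 for P2.

    Temperatures must be absolute (kelvin). Volumes only matter as a ratio,
    so the defaults (v1 == v2) model a sealed, rigid compartment.
    """
    if t1_kelvin <= 0 or t2_kelvin <= 0:
        raise ValueError("temperatures must be positive absolute values (kelvin)")
    return p1_kpa * v1 * t2_kelvin / (t1_kelvin * v2)


# Illustrative values only: gas at 100 kPa and 300 K heated to 900 K by a fire.
sealed = pressure_after_heating(p1_kpa=100.0, t1_kelvin=300.0, t2_kelvin=900.0)
# Same heating, but the gas is free to expand to three times its volume.
vented = pressure_after_heating(100.0, 300.0, 900.0, v1=1.0, v2=3.0)

print(f"Sealed compartment pressure: {sealed:.0f} kPa")
print(f"Pressure if the gas can expand threefold: {vented:.0f} kPa")
```

The point of the comparison is the one made above: in a sealed space the heating shows up as pressure, and in an open space it shows up as volume.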

And then simultaneous events, a hugely different issue. What we’re talking about here is we have got undetected, or latent, failures. Something has failed, but it’s not apparent that it’s failed, we’re not aware, and that could be for all sorts of reasons. It could be a fatigue failure. We’ve got something that’s cracked, or it could be thermal fatigue. So, lots of things that can degrade physical systems, make them brittle. For example, an odd one, radiation causes most metals to expand and neutron bombardment makes them brittle. So, it can weaken things, structure and so forth. Or we might have a safety system that has failed, but because we’ve not called upon it in anger, we don’t notice. And then we have a failure, maybe the primary system fails. We expect the secondary system to kick in, but it doesn’t because there’s been some problem, or some knock-on effect has prevented the secondary system from kicking in. And I suspect we’ve all seen that happen.

My own experience of that was on a site I was working on. We had a big electricity failure; a contractor had sawed through the mains electricity cable or dug through it. And then, for some unknown reason, the emergency generators failed to kick in. So, that meant that a major site where thousands of people worked had to be evacuated because there was no electricity to run the computers. Even the old analogue phones failed after a while. Today, those phones would be digital, probably voice over IP, and without electricity, they’d fail instantly. And eventually, without power for the plumbing, the toilets back up. So, you’re going to end up having to evacuate the entire site because it’s unhygienic. So, some effects can be very widespread, just because you had a latent failure and your backup system didn’t kick in when you expected it to.

So how can we look at that? Well, this is classic reliability modelling territory. We can look at mean time between failures (MTBF) and mean time to repair (MTTR) and therefore we could work out what the exposure time might be. We can work out, “What’s the likelihood of a latent failure occurring?” If we’ve got an interval, presumably we’re going to test the system periodically. We’ve got to do a proof test. How often do we have to do the proof test to get a certain level of reliability or availability when we need the system to work? And we can look at synchronous and asynchronous events. And to do that, we can use several techniques. The classic ones are reliability block diagrams and fault tree analysis. Or if we’ve got repairable systems, we can use Markov chain modelling, which is very powerful. So, we can bring in time-dependent effects of systems failing at certain times and then being required, or systems failing and being repaired, and look at overall availability so that we can get an estimate of how often the overall system will be available if we look at potential failures in all the redundant constituent parts. Lots of techniques there for doing that, some of them quite advanced. And again, very often this is what we safety consultants find ourselves doing.
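As a rough illustration of the proof-test question, here is a minimal sketch for a single standby item, such as an emergency generator. It assumes a constant failure rate, failures that stay hidden until the next proof test, and a perfect proof test that restores the item to as-new. The function names and the numbers are hypothetical, chosen only for the example.

```python
import math


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time a repairable item is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def average_unavailability(failure_rate_per_hour, proof_test_interval_hours):
    """Average chance that a standby item is sitting in an undetected failed state.

    The chance of having failed by time t is 1 - exp(-rate * t); this averages
    that over one proof-test interval.
    """
    x = failure_rate_per_hour * proof_test_interval_hours
    return 1.0 - (1.0 - math.exp(-x)) / x


def proof_test_interval_for(target_unavailability, failure_rate_per_hour):
    """Approximate interval meeting a target, using unavailability ~ rate * T / 2.

    The approximation only holds when rate * T is small.
    """
    return 2.0 * target_unavailability / failure_rate_per_hour


# Hypothetical standby generator: one latent failure per 10,000 hours on average.
rate = 1e-4
for label, interval in (("yearly", 8760.0), ("monthly", 730.0)):
    print(f"Tested {label}: average unavailability "
          f"{average_unavailability(rate, interval):.3f}")

print(f"Interval for 1% unavailability: {proof_test_interval_for(0.01, rate):.0f} hours")
print(f"Availability with MTBF 990 h, MTTR 10 h: "
      f"{steady_state_availability(990.0, 10.0):.2f}")
```

Redundant or repairable arrangements need the fuller techniques mentioned above (reliability block diagrams, fault trees, Markov models); this only shows why the test interval matters so much for a latent failure.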

Common Cause Failures

Common cause failure, this is another classic. We might think about something very obvious and physical, maybe we get debris, maybe we’ve got three sets of input channels guarded by filters to stop debris getting into the system, but what if debris blocks all the filters so we get no flow? So, obvious – I say obvious – often missed sources of sometimes quite major accidents. Or let’s say something more subtle, we’ve got three redundant channels, or a number of redundant channels, in an electronic system and we need two out of three to work, or whatever it might be. But we’ve got the same software working each channel. So, if the software fails systematically, as it does, then potentially all three channels will just fail at the same time.

So, there’s a good example of non-independent failures taking down a system that on paper has a very high reliability but actually doesn’t. Once you start considering common cause failure or common mode analysis. So, really what we would like is we would like all redundancy to be diverse if possible. So, for example, if we wanted to know how much fuel we had left in the aeroplane, which is quite important if you want the engines to keep working, then we can employ diverse methods. We can use sensors to measure how much fuel is in the tanks directly and then we can cross-check that against a calculated figure where we’ve entered, let’s say, how much fuel was in the tanks to start with. And then we’ve been measuring the flow of fuel throughout the flight. So, we can calculate or estimate the amount of fuel and then cross-check that against the actual measurements in the tanks. So, there’s a good diverse method. Now, it’s not always possible to engineer a diverse method, particularly in complex systems. Sometimes there’s only really one way of doing something. So, diversity kind of goes out of the window in such an engineered system.
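The fuel cross-check can be sketched in a few lines. This is an illustration only: the function name, the tolerance, and all the quantities are assumptions made up for the example, and a real system would also have to handle sensor faults, units, and timing.

```python
def fuel_cross_check(gauged_kg, initial_kg, flow_samples_kg_per_s,
                     sample_period_s, tolerance_kg):
    """Compare gauged fuel against fuel calculated from the starting load and flow.

    Returns (calculated_kg, agrees), where agrees is False when the two
    diverse estimates differ by more than the tolerance.
    """
    burned_kg = sum(flow_samples_kg_per_s) * sample_period_s
    calculated_kg = initial_kg - burned_kg
    agrees = abs(gauged_kg - calculated_kg) <= tolerance_kg
    return calculated_kg, agrees


# Hypothetical flight: 5000 kg loaded, 1 kg/s measured once a second for an hour.
flow = [1.0] * 3600
calculated, ok = fuel_cross_check(1380.0, 5000.0, flow, 1.0, tolerance_kg=50.0)
print(f"Calculated {calculated:.0f} kg, gauges agree: {ok}")

# A gauge reading well away from the calculated figure should raise a flag.
_, suspect_ok = fuel_cross_check(2000.0, 5000.0, flow, 1.0, tolerance_kg=50.0)
print(f"Gauges agree with a 2000 kg reading: {suspect_ok}")
```

Neither figure is assumed to be the right one. The disagreement is the useful information, because it says one of the two methods has gone wrong.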

But maybe we can bring a human in.

So, another classic in the air world, we give pilots instruments in order to tell them what’s going on with the aeroplane, but we also suggest that they look out the window to look at reality and cross-check. Which is great if you’re not flying in cloud or in darkness, where there are maybe no visual references, so you can’t necessarily cross-check. But even things like system failures, can the pilot look out the window and see which propeller has stopped turning? Or which engine the smoke and flames are coming out of? And that might sound basic and silly, but there have been lots of very major accidents where that hasn’t been done and the pilots have shut down the wrong engine or they’ve managed the wrong emergency. And not just pilots, but operators of nuclear power plants and all kinds of things. So, visual inspection, going and looking at stuff if you have time, or taking some diverse way of checking what’s going on, can be very helpful if you’re getting confusing results from instrument readings or sensor readings.

And those are examples of the terrific power of human diversity. Humans are good at taking different sensory inputs and fusing them together and forming a picture. Now, most of the time they fuse the data well and they get the correct picture, but sometimes they get confused by a system or they get contradictory inputs and they get the wrong mental model of what’s going on and then you can have a really bad accident. So, thinking about how we alert humans, how we use alarms to get humans’ attention, and how we employ human factors to make sure that we give the humans the right input, the right mental picture, mental model, is very important. So, back to human factors again, especially important at this level for task 205.

And of course, there are many specialist common cause failure analysis techniques, so we can use fault trees. Normally in a fault tree when you’ve got an AND gate, we assume that those two sub-events are independent, but we can use ‘beta factors’ (they’re called) to say, “Let’s say event A and event B are not independent, but we think that 50 percent or 10 percent of the time they will happen at the same time”. So, you can put that beta factor in to change the calculation. So, fault trees can cope with non-independent failures, providing you program the logic correctly and you understand what’s going on. And maybe if there’s uncertainty on the beta factors, you must do some sensitivity modelling on the tree with different beta factors. Or you run multiple models of the tree, but again, we’re now talking quantitative techniques with the fault tree, maybe, or semi-quantitative. We’re talking quite advanced techniques, where you would need a specialist who knows what they’re doing in this area to come up with realistic results from that sensitivity analysis. The other thing you need to do is if the sensitivity analysis gives you an answer that you don’t want, you need to do something about that and not just file away the analysis report in a cupboard and pretend it never happened. (Not that that’s ever happened in real life, boys and girls, never, ever, ever. You see my nose getting longer? Sorry, let’s move on before I get sued.)
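Here is a minimal sketch of that kind of sensitivity run, for three redundant channels where two out of three are needed. It uses a simple beta-factor split, which is an assumption for this example: a fraction beta of each channel’s failure probability is treated as a common cause that takes out all channels at once, and the rest is treated as independent. The channel failure probability and the beta values are hypothetical.

```python
def two_out_of_three_failure(q, beta):
    """Probability that a 2-out-of-3 system fails on demand.

    q    -- probability that any one channel fails on demand
    beta -- fraction of q attributed to a common cause hitting all channels
    """
    q_independent = (1.0 - beta) * q
    q_common = beta * q
    # The system is lost if two or three channels fail independently...
    p_independent = (3.0 * q_independent ** 2 * (1.0 - q_independent)
                     + q_independent ** 3)
    # ...or if the common cause occurs.
    return 1.0 - (1.0 - q_common) * (1.0 - p_independent)


# Sensitivity run over a range of assumed beta factors (illustrative values).
q = 1e-3
results = {beta: two_out_of_three_failure(q, beta)
           for beta in (0.0, 0.01, 0.05, 0.10)}
for beta, p in results.items():
    print(f"beta = {beta:.2f}: system failure probability = {p:.2e}")
```

Even a small beta tends to swamp the independent term, which is the "high reliability on paper" trap described earlier. The run shows how strongly the answer depends on a number that is usually uncertain.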

So other classic techniques. Zonal hazard analysis looks at lots of different components in a compartment. If component A blows up, does it take out everything else in that compartment? Or if the compartment floods, what functionality do we lose in there? It’s particularly good for things like ships and planes, but also buildings with complex machinery, big plant where you’ve got different stuff in different locations. There’s also what’s called particular risk analysis, and these tend to be very unusual things, where you think about what happens if a fan blade breaks in a jet engine. Can the jet engine contain the fan blade failure? And if not, where you’ve got a very high-energy piece of metal flying off somewhere, where does that go? Does that embed itself in the fuselage of the aeroplane? Does it puncture the pressure hull of the aeroplane? Or, as has sadly happened occasionally, does it penetrate and injure passengers? So, things like that, usually quite unusual things that are all very domain or industry specific. And then there are common mode analysis techniques, and a good example of a standard that incorporates those things is ARP 4761. This is a civil aircraft standard which looks at those things quite well. That’s one example; there are many others.

Summary

In summary, I’ve emphasized the differences between Task 205 and 204. So, we might do a first pass 205 and 204 where we’re essentially doing the same thing just at different levels of granularity. So, we might do the whole system initially in 205, one big hexagon, and then we might break down the jigsaw and do some 204 at a more detailed level. But where 205 is really going to score is in its differences from 204. It’s valuable to repeat that analysis at a higher level, but instead of just repeating, we’ve really got to diversify if we want success. So, we need to think about the different purpose and timing of these analyses. We need to think about what we’re going to get out of going top-down versus bottom-up, different sides of the ‘V’ model let’s say.

We need to think about the differences of looking at internals versus external interfaces and interactions, and we need to think of appropriate techniques and tools for all those things – and, of course, whether we need to do that at all! We will have an idea about whether we need to do that from all the previous analysis. So, if we’ve done our PHI or PHA, we’ve looked at the history and some simple functional techniques, and we’ve involved end-users and we’ve learnt from experience. If we’ve done our early tasks, we’re going to get lots of clues about how much risk is present, both in terms of the magnitude of the risk and the complexity of the things that we’re dealing with. So, clearly, if we’ve got a very complex thing with lots of risks where we could kill lots of people, we’re going to do a whole lot more analysis than for a simple low-risk system. And we’re going to be guided by the complexity and risks and where the hot spots are, and go “Clearly, I’ve got a particular interface or particular subsystem, which is a hotspot for risk. We’re going to concentrate our effort there”. If you haven’t done the early analysis, you don’t get those clues. So, you do the homework early, which is quite cheap, and that helps you direct effort for the best return on investment.

The second major bullet point, which I talk about again and again, is that the client and end-user and/or the prime contractor need to do analysis early in order to get the benefits and to help them set requirements for lower down the hierarchy and pass relevant information to the sub-contractors. Because the sub-contractors, if you leave them in isolation, they’ll do a hazard analysis in isolation, which is usually not as helpful as it could be. You get more out of it if you give them more context. So really, the ultimate client, end-user, and probably the prime as well, both need to do this task, even if they’re subcontracting it to somebody else. Whereas, maybe the sub-system hazard analysis 204 could be delegated just down to the sub-system contractors and suppliers. If they know what they’re doing and they’ve got the data to do it, of course. And if they haven’t, then somebody further up the food chain, or the supply chain, may have to do that.

And lastly, 204 and 205 are complementary, but not the same. If you understand that and exploit those similarities and differences, you will get a much more powerful overall result. You’ll get synergy. You’ll get a win-win situation where the two different analyses complement and reinforce each other. And you’re going to get a lot more success, probably for not much more money, effort, and time. If you’ve done that thinking exercise and really sought to exploit the two together, then you’re going to get a greater holistic result.

Copyright

So, that’s the end of our session for today. Just a reminder that I’ve quoted from the Mil. Standard 882, which is copyright free, but the contents of this presentation are copyright Safety Artisan, 2020.

For More …

And for more lessons and more resources, please do visit www.safetyartisan.com and you can see the videos at www.patreon.com/safetyartisan.

End

That’s the end of the lesson on system hazard analysis task 205. And it just remains for me to say thanks very much for watching and look out for the next in the series of Mil. Standard 882 tasks. We will be moving on to Task 206, which is Operating and Support Hazard Analysis (OSHA), a quite different analysis to what we’ve just been talking about. Well, thanks very much for watching and it’s goodbye from me.

The End

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Categories
Mil-Std-882E Safety Analysis

Sub-System Hazard Analysis

In this video lesson, The Safety Artisan looks at Sub-System Hazard Analysis, or SSHA, which is Task 204 in Mil-Std-882E. We explore Task 204’s aim, description, scope, and contracting requirements. We also provide value-adding commentary and explain the issues with SSHA – how to do it well and avoid the pitfalls.

This is the seven-minute demo; the full video is 40 minutes long.

Topics: Sub-System Hazard Analysis

  • Preamble: Sub-system & System HA.
  • Task 204 Purpose:
    • Verify subsystem compliance;
    • Identify (new) hazards; and
    • Recommend necessary actions.
  • Task Description (six slides);
  • Reporting;
  • Contracting; and
  • Commentary.
Transcript: Sub-System Hazard Analysis

Introduction

Hello, everyone, and welcome to the Safety Artisan, where you will find professional, pragmatic, and impartial instruction on all things system safety. I’m Simon – I’m your host for today, as always and it’s the fourth of April 22. With everything that’s going on in the world, I hope that this video finds you safe and well.

Sub-System Hazard Analysis

Let’s move straight on to what we’re going to be doing. We’re going to be talking today about subsystem hazard analysis and this is task 204 under the military standard 882E. Previously we’ve done 201, which was preliminary hazard identification, 202, which is preliminary hazard analysis, and 203, which is system requirements hazard analysis. And with task 204 and task 205, which is system hazard analysis, we’re now getting stuck into the particular systems that we’re thinking about, whether they be physical systems or intangible. We’re thinking about the system under consideration and really getting into that analysis.

Topics for this Session

So, the topics that we’re going to cover today, I’ve got a little preamble to set things in perspective. We then get into the three purposes of task 204. First, to verify compliance. Secondly, to identify new hazards. And thirdly, to recommend necessary actions. Or in fact, that would be recommend control measures for hazards and risks. We’ve got six slides of task description, a couple of slides on reporting, one on contracting, and then a few slides on some commentary where I put in my tuppence worth and I’ll hopefully add some value to the basic bones of the standard. It’s worth saying that you’ll notice that subsystem is highlighted in yellow and the reason for that is that the subsystem and system hazard analysis tasks are very, very similar. They’re identical except for certain passages and I’ve highlighted those in yellow. Normally I use a yellow highlighter to emphasize something I want to talk about. This time around, I’m using underlining for that and the yellow is showing you what is different for subsystem analysis as opposed to system. And when you’ve watched both sessions on 204 and 205, I think you’ll see the significance of why I’ve done that.

Preamble – Sub-system & System HA

Before we get started, we need to explain the system model that the 882 is assuming. If we look on the left-hand side of the hexagons, we’ve got our system in the centre, which we’re considering. Maybe that interfaces with other systems. They work within an operating environment; hence we have the icon of the world, and the system and maybe other systems are there for a purpose. They’re performing some task; they’re doing some function and that’s indicated by the tools. We’re using the system to do something, whatever it might be.

Then as we move to the right-hand side, the system is itself broken down into subsystems. We’ve got a couple here. We’ve got sub-system A and B and then A further broken down into A1 and A2, for example. There’s some sort of hierarchy of subsystems that are coming together and being integrated to form the overall system. That is the overall picture that I’d like to bear in mind while we’re talking about this. The assumption in the 882, is we’re going to be looking at this subsystem hierarchy bottom upwards, largely. We’ll come on to that.

Sub-System Hazard Analysis (T204)

Purpose of the task, as I’ve said before, it’s threefold. We must verify subsystem compliance with requirements. Requirements to deal with risk and hazards. We must identify previously unidentified hazards which may emerge as we’re working at a lower level now. And we must recommend actions necessary. That’s further requirements to eliminate all hazards or mitigate associated risks. We’ll keep those three things in mind and that will keep coming up.

Task Description (T204) #1

The first of six slides on the task description. Basically, we are being told to perform and document the SSHA, sub-system hazard analysis. And it’s got to include everything, whether it be new developments, COTS, GOTS, GFE, NDI, software and humans, as we’ll see later. Everything must be included. And we’re being guided to consider the performance of the subsystem: ‘What it is doing when it is doing it properly’. We’ve got to consider performance degradation, functional failures, timing errors, design errors or defects, and inadvertent functioning – we’ll come back to that later. And while we’re doing analysis, we must consider the human as a component within the subsystem dealing with inputs and making outputs. If, of course, there is an associated human. We’ve got to include everything, and we’ve got to think about what could go wrong with the system.

Task Description (T204) #2

The minimum that the analysis has got to cover is as follows. We’ve got to verify subsystem compliance with requirements and that is to say, requirements to eliminate hazards or reduce risks. The first thing to note about that is you can’t verify compliance with requirements if there are no requirements. If you haven’t set any requirements on the subsystem provider or whoever is doing the analysis, then there’s nothing to comply with and you’ve got no leverage if the subsystem turns out to be dangerous. I often see this get missed. People don’t do their top-down systems engineering properly; they don’t think through the requirements that they need; and, especially, they don’t do the preliminary hazard identification and analysis that they need to do. They don’t do Task 203, the SRHA, to think about what requirements they need to place further down the food chain, down the supply chain. And if you haven’t done that work, then you can’t be surprised if you get something back that’s not very good, or you can’t verify that it’s safe. Unfortunately, I see that happen often, even on exceptionally large projects. If you don’t ask, you don’t get, basically.

We’ve got two sub-paragraphs here that are unique to this task. First, we’ve got to validate flow down of design requirements. “Are these design requirements valid?”, “Are they the right requirements?” From the top-level spec down to more detailed design specifications for the subsystem. Again, if you haven’t specified anything, then you’ve got no leverage. Which is not to say that you have to dive into massive detail and tell the designer how to do their job, but you’ve got to set out what you want from them in terms of the product and what kind of process evidence you want associated with that product.

And then the second sub-paragraph, you’ve got to ensure design criteria in the subsystem specs have been satisfied. We need to verify that they’re satisfied, and that V and V of subsystem mitigation measures or risk controls have been included in test plans and procedures. As always, the Mil. standard 882 is the American standard, and they tend to go big on testing. Where it says test plans and procedures that might be anything – you might have been doing V and V by analysis, by demonstration, by testing, by other means. It’s not necessarily just testing, but that’s often the assumption.
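One way to picture those two checks, flow-down and V and V coverage, is as a simple traceability audit. The sketch below is illustrative only: the `Requirement` record, the requirement IDs, and the list of verification methods are assumptions made up for the example, not anything defined in Mil-Std-882E.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of acceptable verification methods (hypothetical, for illustration).
VERIFICATION_METHODS = {"test", "analysis", "demonstration", "inspection"}


@dataclass
class Requirement:
    req_id: str
    parent_id: Optional[str]     # top-level requirement this one flows down from
    verification: Optional[str]  # how compliance will be shown, if planned


def audit(subsystem_reqs, top_level_ids):
    """Return (orphans, unverified) lists of subsystem requirement IDs.

    orphans    -- requirements that do not trace to a known top-level requirement
    unverified -- requirements with no recognised verification method planned
    """
    top = set(top_level_ids)
    orphans = [r.req_id for r in subsystem_reqs if r.parent_id not in top]
    unverified = [r.req_id for r in subsystem_reqs
                  if r.verification not in VERIFICATION_METHODS]
    return orphans, unverified


# Hypothetical subsystem specification with two deliberate gaps.
reqs = [
    Requirement("SUB-A-001", "SYS-010", "test"),
    Requirement("SUB-A-002", "SYS-011", None),
    Requirement("SUB-A-003", None, "analysis"),
]
orphans, unverified = audit(reqs, ["SYS-010", "SYS-011"])
print("No parent requirement:", orphans)
print("No verification planned:", unverified)
```

An empty requirements list would pass this audit trivially, which is the earlier point in another form: with nothing specified, there is nothing to verify.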

Task Description (T204) #3

We must also identify previously unidentified hazards because we are now down at a low level of detail in a subsystem and stuff probably will emerge at that level that wasn’t available before. First, number one, we’ve got to ensure the implementation of subsystem design requirements and controls. And ensure that those requirements and controls have not introduced any new hazards, because very often accidents occur not because the system has gone wrong – the system is working as advertised – but because the hazards of normal operation maybe just weren’t appreciated and guarded against, or we just didn’t warn the operators that something might happen that they needed to look out for. A common shortfall, I’m afraid.

And number two, we’ve got to determine modes of failure down to component failure and human errors, single points of failure, common-mode failures, effects when failures occur in components, and from functional relationships. “What happens if something goes wrong over on this side of the system or subsystem and something else is happening over here?” What are those combinations? What could result? And again, we’ve got to consider hardware and software, including all non-developmental type stuff, and faults, and occurrences. Again, I see very often, buyers/purchasers don’t think about the off-the-shelf stuff in advance or don’t include it. And then sometimes also you see contractors going “This is off the shelf, so we’re not analysing it.” Well, the standard requires that they do analyse it to the extent practicable. And they’ve got to look at what might go wrong with all of this non-developmental stuff and integrate the possible effects into the analysis. That’s another common gotcha, I’m afraid. We do need to think about everything, whether it’s developmental or not.

Task Description (T204) #4

And then part C, recommending actions necessary to eliminate hazards if we can. Very often we can’t, of course, and we have to mitigate. We must reduce or minimize the associated risk of those hazards, in terms of the harm that might come to people. We’ve got to deal with system-level hazards that are, as it says, attributed to the subsystem. Maybe we believed when we did the earlier analysis that the subsystem could contribute to a higher-level hazard, or maybe we’ve allocated some failure budget to this particular subsystem, which it has got to keep to if we’re going to meet the higher-level targets. You can imagine lots of these subsystems all feeding up a certain failure rate and different failure modes. And overall, when you pull it all together, we may have to meet some target or reduce the number of failures in their propagation upwards in order to manage hazards and risks. We’ve got to make sure that adequate mitigations, or controls, of these potential hazards are implemented in the design.
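The failure budget idea can be sketched very simply. This example assumes that any one subsystem failure leads to the system-level hazard and that failures are rare and independent, so the rates just add up. The subsystem names, allocations, and predicted rates are all hypothetical.

```python
def check_budget(system_target, allocated, predicted):
    """Compare predicted subsystem failure rates (per hour) with their budgets.

    Returns (over_budget, total, meets_target):
      over_budget  -- subsystems whose predicted rate exceeds their allocation
      total        -- sum of predicted rates (valid for rare, independent events)
      meets_target -- whether the total stays within the system-level target
    """
    over_budget = sorted(name for name, rate in predicted.items()
                         if rate > allocated.get(name, 0.0))
    total = sum(predicted.values())
    return over_budget, total, total <= system_target


# Hypothetical allocation of a 1e-5 per hour system target across subsystems.
allocated = {"hydraulics": 4e-6, "electrical": 4e-6, "software": 2e-6}
predicted = {"hydraulics": 3e-6, "electrical": 5e-6, "software": 1e-6}

over, total, meets_target = check_budget(1e-5, allocated, predicted)
print("Over their allocation:", over)
print(f"Predicted total: {total:.1e} per hour, meets system target: {meets_target}")
```

In this made-up case one subsystem has broken its allocation even though the system total is still inside the target. That is the kind of result that needs a conscious decision higher up, either re-allocate the budget or ask for a design change.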

If we think back to the hierarchy, we prefer to fix things in the design, eliminate the hazard if possible, or make changes to the design to eliminate or reduce the hazard, rather than just rely on human beings to catch the problem and deal with it further downstream. It’s far more effective and cheaper, in the long run, to fix things in design; they are more effective controls. Certainly, in this standard, in Australian law, and in the UK and elsewhere, you will find either regulations or law or codes of practice or recognized and accepted good practice that says, “You should do this”. It’s a very, very common requirement and we should pretty much assume that we have to do this.

Task Description (T204) #5

Interesting clause here in 2.2, it says if no specific hazard analysis techniques are directed or the contractor wants to take a different route to what is directed, then they’ve got to obtain approval from the program manager. The PM (Program Manager) may not have specified analysis techniques, and they may not wish to; they may just wish to say, “You’ll do whatever analysis is required in order to identify hazards and mitigate them”. But in many industries, there are certain ways of doing things and I’ve said before in previous lessons, if you don’t specify that you want something, then contractors will very often cut the safety program to the bone in order to be the cheapest bid. The customer will get what they prioritize. If the customer prioritizes a cheap bid and doesn’t specify what they want, then they will get the bare minimum that the contractor thinks they can get away with. If you don’t ask, you don’t get. Becoming a theme, that, isn’t it?

Task Description (T204) #6

Let’s move on to 2.3. Returning to software, we’ve got to include that. The software might be developed separately, but nevertheless, the contractor performing the SSHA shall monitor the software development, shall obtain data from each phase of the software development process in order to evaluate the contribution of the software to the subsystem hazard analysis. There’s no excuse for just ignoring the software and treating it as a black box. Of course, very often these days the software is already developed. It’s a GFE or NDI item, but there still should be evidence available or you do a black-box analysis of the subsystem that the software is sitting in. Again, if the software developer reports any identified hazards, they’ve got to be reported to the program manager in order to request appropriate direction.

This assumes a level of interaction between the software developers right up the chain to the program manager. Again, this won’t happen unless the program manager directs it and pays for it. If the PM doesn’t want to pay for it, then they are either going to have to take a risk on not knowing about the functionality of the software that’s hidden within the subsystem. Or they’re going to deal with it some other way, which is often not effective. The PM needs to do a lot of work upfront in order to think what kind of problems there might be associated with a typical subsystem of whatever kind it is we’re dealing with. And think about “How would I deal with the associated risks?” “What’s the best way to deal with them in the circumstances?” If I’m buying stuff off the shelf and I’m not going to get access to hazard analysis or other kinds of evidence, how am I going to deal with them? Big questions.

And then 2.4, the contractor shall update the SSHA following changes, including software design changes. Again, we can’t just ignore those things.  That’s slide six out of six. Let’s move on to reporting.

Reporting (T204) #1

The first slide: the contractor’s got to prepare a report that contains results from the task, including within the system description, physical and functional characteristics of the system, a list of the subsystems, and a detailed description of the subsystem being analysed, including its boundaries. And from other videos, you’ll know how much and how often I emphasize knowing where the boundaries are because you can’t really do effective safety analysis and safety management on an unbounded system. It just doesn’t work. There’s a requirement here for quite a lot of information, with reference to more detailed descriptions as they become available. The standard says they shall be supplied. That’s a lot of information, probably text and pictures of all sorts of stuff, and that’s going to need to go into a report. And typically, we would expect to see a hazard analysis report, or a HAR, with this kind of information in it. Again, if the PM/customer doesn’t specify that HAR, then they’re not going to get it and they’re not going to get the textual information that they need to manage the overall system.

Reporting (T204) #2

So, if we move on to parts B and C of the reporting requirement. We’ve got to describe hazard analysis methods and techniques, provide a description of each method and the technique used, and a description of the assumptions made. And it says that’s for both qualitative and quantitative data. This is another area that often gets missed. If you don’t know what techniques have been used, and you don’t know the assumptions that the subsystem analyser will almost certainly have had to make, because they probably don’t have visibility of the rest of the system, then it becomes very difficult to verify the hazard analysis work and to have confidence in it.

And the hazard analysis results. Content and format vary. That’s something else the PM has got to think about and specify upfront. Then results should be captured in the hazard tracking system. Now, usually, this hazard tracking system is a hazard log. It might be a database, a spreadsheet or even a Word document, or something like that. And usually, in the hazard tracking system, we have the leading particulars. We don’t always have, in fact, we shouldn’t have, every little piece of information in the hazard tracking system because it will quickly become unwieldy. Really, we want the hazard log to have the leading particulars of all the hazards, causes, consequences and controls. And then the hazard log should refer out to that hazard analysis report or other reports and data, whatever they’re called, other records.

If we go back up, this reemphasizes the kind of detail that’s here in 2.5 A. That really shouldn’t be going in the hazard log. That should be going in a separate report which the hazard log/the hazard tracking system refers to. Otherwise, it all gets unwieldy.
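
As a rough sketch of that split, a hazard log entry might hold only the leading particulars plus pointers out to the detailed reports. The field names, the hazard and the report identifier below are hypothetical, chosen just for illustration; a real hazard tracking system will have its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class HazardLogEntry:
    """Leading particulars only; the detail lives in the referenced reports."""
    hazard_id: str
    description: str
    causes: list
    consequences: list
    controls: list
    references: list = field(default_factory=list)  # e.g. hazard analysis report IDs

    def summary(self):
        # One-line view that points the reader out to the supporting records
        return f"{self.hazard_id}: {self.description} -> see {', '.join(self.references)}"


# Hypothetical entry for a braking subsystem
entry = HazardLogEntry(
    hazard_id="H-001",
    description="Loss of braking",
    causes=["hydraulic leak", "software command not issued"],
    consequences=["collision"],
    controls=["dual-circuit design", "low-pressure warning"],
    references=["HAR-204-01"],
)
print(entry.summary())
```

The design choice here is simply that the log stays short enough to manage, and anyone who needs the system description, methods or assumptions follows the reference.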

Contracting

I’ve said repeatedly the PM needs to think about this and ask for that.

Contracting: the standard assumes that the information in A to H below is specified way up front in the request for proposal. That’s not always possible to do in full detail, but nevertheless, you’ve got to think about these things really early and include them in the contractual documentation. And again, if you’re running a competition, by the time you get to the final RFP, you need to make sure that you’re asking for what you really need. Maybe run a preliminary expression of interest or pre-competition exercise in order to tease out the detail. We’ve got to impose task 204 (A.) as a requirement. We may have to specify which people we want to involve, which functional specialists, which discipline specialists (B.) we want to get involved to address this work. Identification of subsystems to be analysed (C.). Well, if you don’t know what the design is upfront, you can’t always do that, but you could say all.

You may specify desired analysis methodologies and techniques (D.). And again, that’s largely domain dependent. We tend to do safety in certain ways in different worlds: in the air world it’s done in a particular way; in the maritime world, it’s a different way. With road or off-road vehicles, it’s done in a particular way, etc., whatever it might be. Chemical plant, whatever. There may be known hazards, hazardous areas or other specific items to be examined or excluded (E.) because they’re covered adequately elsewhere. The PM or the client has got to provide technical data on all those non-developmental items (F.), particularly if they’re specifying that the contractor will use them. The client might say “You will use this. You will use these tires, therefore, here is the data for these tires” or whatever it might be. We may want a system that’s going to use standardized spares or standardized fuel, or whatever it might be, or that is maintainable by technicians and mechanics with standard skill sets. There may be all sorts of reasons for asking or forcing contractors to do certain things, in which case the purchaser is responsible for providing that data.

And again, many purchasers forget to do that entirely or do it very badly, and then that can cripple a safety program. What’s the concept of operations (G.)? What are we going to do with this stuff? What’s the context? What’s the big picture? That’s important. And any other specific requirements (H.). What risk matrix? What risk definitions are we using on this program? Again, that’s important; otherwise, different contractors do their own thing, or they do nothing at all. And then the client must pick up the pieces afterwards, which is always time-consuming and expensive and painful. And it tends to happen at the back of a program when you’re under time pressure anyway. It’s never a happy place to be. So, clients and purchasers, do make sure that you’ve done your homework and specified this stuff upfront. Even if it turns out to be not the best thing you could have specified, it’s better to have an 80 percent solution that’s pretty standard and locked down.

Commentary #1

That’s the wording that’s in the task with some commentary by myself. Now some additional commentary. It says right up front, areas to consider include performance, performance degradation, functional failures, timing errors, design errors or defects and inadvertent function. What we have here basically is a causal analysis, and there are some simple techniques that you can use to identify this kind of stuff. Something like a functional failure analysis (FFA) or a failure modes and effects analysis (FMEA), which is like an FFA, except that an FMEA requires a design to work on. And a variant of FMEA is FMECA, where we include the criticality of the failure as it possibly propagates up the hierarchy of the system.

These sorts of techniques will think about what could go wrong: no function when required; inadvertent function – the subsystem functions when it’s not supposed to; and incorrect function, and there are often multiple versions of incorrect function. We’re considering all of those causes, all of those failure modes. And if we’re doing a big safety program on something quite critical, very often those identified faults and failures and failure modes will feed into the bottom of a fault tree, where we have a hierarchical build-up of causation and we look at how redundancy and mitigation and control measures mitigate those low-level failures and hopefully prevent them from becoming full-blown incidents and accidents.
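
To illustrate how low-level failure modes feed up a fault tree, here is a minimal sketch with one OR gate and one AND gate. The structure and all the probabilities are hypothetical, and it assumes the basic events are independent, which is often not true in practice (common-cause failures would need separate treatment).

```python
from math import prod


def or_gate(probabilities):
    """Output occurs if at least one input occurs (independent events assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Output occurs only if every input occurs (independent events assumed)."""
    return prod(probabilities)


# Hypothetical failure modes of a primary channel, e.g. taken from an FMEA
no_function_when_required = 1e-3
incorrect_function = 2e-3
primary_channel_fails = or_gate([no_function_when_required, incorrect_function])

# Hypothetical redundancy: a backup channel must also fail for the top event
backup_channel_fails = 1e-2
top_event = and_gate([primary_channel_fails, backup_channel_fails])

print(f"primary channel fails: {primary_channel_fails:.6f}")
print(f"top event: {top_event:.3e}")
```

The AND gate is where the redundancy shows its value: the top event is far less likely than either channel failing alone, provided the independence assumption holds.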

And these techniques, particularly the FFA and the FMEA, are also good for hazard identification and for investigating performance and non-compliance issues. You can apply an FFA, an FMEA, those types of techniques to a specification and say, “We’ve asked for this. What could happen if we get what we ask for? What could go wrong? And what could go wrong with these requirements?”
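
Applying an FFA to a specification can be as simple as crossing each required function with the three generic failure modes mentioned above. The sketch below does only that; the function names are hypothetical and the output is a list of prompts for analysts to work through, not an analysis in itself.

```python
# The three generic failure modes described in the commentary
FAILURE_MODES = (
    "no function when required",
    "inadvertent function",
    "incorrect function",
)


def ffa_prompts(functions):
    """Pair every specified function with every generic failure mode."""
    return [(function, mode) for function in functions for mode in FAILURE_MODES]


# Hypothetical functional requirements from a subsystem specification
rows = ffa_prompts(["apply brakes", "report speed"])
for function, mode in rows:
    print(f"What happens if '{function}' suffers: {mode}?")
```

Each prompt then needs a human answer: the effect, how severe it is, and whether a derived safety requirement is needed. "Incorrect function" will usually split into several more specific cases.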

Commentary #2

Now, the second part that I’ve chosen to highlight is consideration of the human within a subsystem, and this is important. Traditionally, it’s not always been done that well. Human factors, I’m glad to say, is becoming more prominent and more used, because in many, many systems the human is a key component, a key player in the overall system. And in the past, we have tended to build systems and then just expect the human operator and maintainer to cope with the vicissitudes of that system. Maybe the system isn’t that well designed, in that it is not very usable, or its performance depends on being lovingly looked after and tweaked, and maybe systems are vulnerable to human error, and even induce human error. We need to get a lot better at designing systems for human use.

So, we could use several techniques. We could use a HAZOP, a hazard and operability study, to consider information flows to and from the human. There are lots of specialist human factors analyses out there. And I’m hoping to run a series of human factors sessions, interviewing a very knowledgeable colleague of mine, but more on that later. That will come in due course. We’ll look at those specialist human analysis techniques. But there have been a couple of conceptual models around for quite a long time, about 20 years now at least, for how to think about humans in the system.

Human-System Models

So, we’ve got the 5M model and the SHELL model. I’m just going to briefly illustrate those. Now, both models are taken from the US Federal Aviation Administration System Safety Handbook, which dates to the end of 2000. These have been around a long time, they were around before the year 2000, and they’re quite long in the tooth.

We’ve got the SHELL model, which considers our software, hardware, environments and live-ware – the human. And there’s quite a nice checklist on Wikipedia for things to consider. We’re considering all the different interfaces between those different elements. That’s at the hyperlink you can see at the bottom of the slide.

Then on the right-hand side, we’ve got the 5M model, and apologies for the gendered language. The five Ms are the man/the human, the machine, management, the media – and the media is the operating and maintenance environment – and then in the middle is the mission. The humans, the machines, the systems, and the management come together in order to perform a mission within a certain environment. That’s another very useful way of conceptualizing the contribution of humans and the interaction between human and system. Human operators, maintainers, frontline staff, and management, all in a particular operating environment and environmental context, and how they come together to accomplish the mission or the function of the system, whatever it might be.

Now a word of caution on this. It’s very possible to spend gigantic sums of money on human analysis. Very often we tend to target it at the most critical points, and we very often target it at the operator, particularly for those phases of operation where the operator must do things in a limited amount of time. The operator will be under pressure and if they don’t take the right action within a certain time, something could go wrong. We do tend to target this analysis in those areas and tend to spend money, hopefully, in a sensible and targeted way.

Commentary #3

My final slide on the additional commentary. The other thing we’ve talked about for this task is compliance checking. We should get a subsystem specification. If we don’t get a subsystem specification, well, what are the expectations on the subsystem? Are they documented anywhere? Is it in the concept of operations? Is there an interface requirement document, or are there interface control documents for other systems or subsystems that interface with our subsystem – anywhere where we can get information. If we have a subsystem spec, a bunch of functional requirements say, early on we could do a functional failure analysis of those functional requirements. We can do this work really quite early if we need to and think about, “Well, what interfaces are expected or required from our subsystem?” versus “What does our subsystem actually do?” Any mismatches could give rise to problems.

So, this is a type of activity where we’re looking for continuity and we’re looking for coherence across the interface. And we’re looking for things to join up. And if they don’t join up or they’re mismatched, then there’s a potential problem. And, as we look down into the subsystem, are there any derived safety requirements from above that says this subsystem needs to do this or not do that in order to manage a hazard? Those are important to identify.

Again, if it’s not been done probably the subsystem contractor won’t do it because it’s extra expense. And they may well truly believe that they don’t need to. We’re all proud of the things that we do, and we feel sometimes emotionally threatened if somebody suggests a piece of kit might go wrong and it does blind people to potential problems.

If, going the other way, we are a higher-level authority, where we’re a system prime contractor or something, then we’ve got to look at the documentation from a subsystem supplier. Well, we might find out some information from sales brochures or feature lists, or there might be a description of the benefits or the functions of the system with its outputs. We hopefully should be able to get hold of some operating and maintenance manuals. And very often those manuals will contain warnings and cautions and say, “You must look after the piece of kit by doing this”. I’m thinking of the Gremlins now: “Don’t feed it after midnight or get it wet”, otherwise bad things will happen. Sorry about that, slightly fatuous example, but a good illustration, I think. And ideally, if there are any training materials associated with the piece of kit, is there a training needs analysis (TNA) that shows how the training was developed? Very often in a TNA, if it’s done well, there’s lots of good information in there. Even if it’s not quite for the same application that we’re using the piece of kit for, you could learn a lot from that kind of stuff.

And finally, if all else fails, if you’ve got a legacy piece of kit, then you can physically inspect it. And if you can take it apart and put it back together again – do so. You might discover there’s asbestos in it. You might discover that there are lithium batteries or whatever it might be: fire hazards, flammable materials, toxic materials, you name it. There are a lot of ways that we can get information about the subsystem. Ideally, we ask for everything upfront. Say, you know, if there are any hazardous chemicals in there, then you must provide the hazard sheets and the hazard data in accordance with international or national standards and so on and so forth. But if you can’t get that or you haven’t asked for it, there are other ways of doing it, but they’re often time-consuming and not the optimal way of doing it.

So again, do think about what you need upfront and do ask for it. And if the contractor can’t supply exactly what you want, what you need, you then have to decide whether you could live with that, whether you could use some of these alternative techniques or whether you just have to say, “No, thanks. I’ll go to another supplier of something similar”. And I may have to pay more for it, but I’ll get a better-quality product that actually comes with some safety evidence that means I can actually integrate it and use it within my system. Sometimes you do have to make some tough decisions and the earlier we do those tough decisions the better, in my experience.

Copyright Statement

So that’s all the technical content. Just to say that all the text that’s in italics and in speech marks is from the standard, which is copyright free. But this presentation, and especially all the commentary and the added value, is copyright of the Safety Artisan 2020.

For More …

And if you want more videos like this, the rest of the 882 series and other resources on safety topics, you can find them at the website www.safetyartisan.com. And you can also go to the Safety Artisan page at Patreon. That’s www.patreon.com and search for Safety Artisan – all one word.

End

So, that’s the end of the presentation and it just remains for me to say, thanks very much for watching and supporting the Safety Artisan. And I’ll be doing Task 205 system hazard analysis next in the series, look forward to seeing you again soon. Bye-bye, everyone.

End: Sub-System Hazard Analysis

You can find a free pdf of the System Safety Engineering Standard, Mil-Std-882E, here.

Guide to the WHS Act

This Guide to the WHS Act covers many topics of interest to system safety and design safety specialists. This full-length video explains the Federal Australian Work Health and Safety (WHS) Act (latest version, as of 14 Nov 2020). Brought to you by The Safety Artisan: professional, pragmatic, and impartial.

This is the four-minute demo of the full, 44-minute-long video.

Recap: In the Short Video…

which is here, we looked at:

  • The Primary Duty of Care; and
  • Duties of Designers.

Topics: Guide to the WHS Act

In this full video we will look at much more…

  • § 3, Object [of the Act];
  • § 4-8, Definitions;
  • § 12A, Exclusions;
  • § 18, Reasonably Practicable;
  • § 19, Primary Duty of Care;
  • § 22-26, Duties of Designers, Manufacturers, Importers, Suppliers & those who Install/Construct/Commission;
  • § 27, Officers & Due Diligence;
  • § 46-49, Consult, Cooperate & Coordinate;
  • § 152, Function of the Regulator; and
  • § 274-276, WHS Regulations and CoP.

Transcript: Guide to the WHS Act

Hi everyone and welcome to the Safety Artisan, where you will find instructional videos like this one with professional, pragmatic and impartial advice, which we hope you enjoy. I’m Simon and I’m recording this on the 13th of October 2019. So today we’re going to be talking about the Australian Federal Work Health and Safety Act. Call it an unofficial guide for system or design safety practitioners, whatever you want to call yourselves, because I’m looking at the WHS Act from the point of view of system safety and design safety.

As opposed to managing the workplace, although it does that as well. A few days ago, I recorded a short video version of this, and in the short video we looked at the primary duty of care and, particularly, at the duties of designers. And so, we spent some time looking at that, and that video is available on the Safety Artisan page at Patreon.com. It’s available at safetyartisan.com and you can watch it on YouTube. So just search for Safety Artisan on YouTube.

Topics

So, in this video, we’re going to look at much more than that. I say selected topics: we’re not going to look at everything in the WHS Act because, as you can see, there are several hundred sections of it. We’d be here all day. So, what we’re going to look at are things that are relevant to systems safety, to design safety. So, we’ll look very briefly at the object of the act, at what it’s trying to achieve. Just one slide of definitions, because there are a lot of them, and exclusions, because the Act doesn’t apply to everything in Australia.

We’re going to look at the Big Three. So really the three principles that will help us understand what the act is trying to achieve are:

  • What is reasonably practicable? That phrase that I’ve used several times before.
  • What is the primary duty of care? So that’s sections 18 and 19. And if we jump to section 27:
  • What are, or who are, officers, and what does due diligence mean in a WHS setting?

So, if I step back one, sections 22 to 26, you know, the duties of various people in the supply chain. We covered that in the short session, so go ahead and look at that. Then moving on, there are requirements for duty holders to consult, cooperate and coordinate, and then a brief mention of the function of the regulator. And finally, the WHS Act enables WHS regulations and codes of practice, so we’ll just mention that. So those are the topics we’re going to cover; quite a lot to get through. So that’s critical.

Disclaimer

So, first this is a disclaimer from the website from the federal legislation site and it does remind people looking at the site that the information put up there is for the benefit of the public and it’s free of charge.

So, when you’re looking at this stuff you need to look at the relevance of the material for your purposes. OK, looking at the website is not a substitute for getting legal or appropriate professional advice relevant to your particular circumstances. So quick disclaimer there. This is just a website with general advice, and hence this video is only as good as the content that’s being presented, okay.

The Object of the Act

So, the object of the act then, as you can see, I’m quoting from it because I’m using quotation marks. The main object of the act is to provide a balanced and nationally consistent framework for the health and safety of workers and workplaces.

And that’s important in Australia because Australia is a federated state. So, we’ve got states and territories and we’ve got the federal government, or the Commonwealth as it’s usually known, and the laws of all those different bodies do not always line up. In fact, sometimes it seems like the states and territories delight in doing things that are different from each other and different from the Commonwealth. And that’s not particularly helpful if you’re trying to, you know, operate in Australia as a corporation, or you’re trying to do something big and trying to invest in the country.

So, the model WHS Act was introduced to try and harmonize all this stuff. And you’ll see some more about that on the website. By the way, I’ve missed out some objectives. As you can see, I’m not covering subsections B to H; go and have a look at them online. But then in subsection 2, the reminder is the principle of giving the highest level of protection against harm to workers and other persons as is reasonably practicable. Wonderful phrase again, which we’ll come back to, okay.

Definitions

Now there are lots of definitions in the act. And it’s worth having a look at them, particularly if you look at the session that I did on system safety concepts, where I was using definitions from the UK standard. Now I did that for a reason, because that set of definitions was very well put together. So it was ideal for explaining those fundamental concepts, whereas the concepts in Australian WHS are very different. So if you are operating in an Australian jurisdiction, or you want to sell into an Australian jurisdiction, do look at those definitions; being aware of what the definitions are will actually save you a lot of hassle in the long run.

Now because we’re interested, as systems safety practitioners, in introducing complex systems into service, I’ve got the definitions here of plant, structure and substance. So basically, plant is any machinery, equipment, appliance, container, implement or tool, any component of those things, and anything fitted or connected to any of those things. So, they’re going for a pretty broad definition. But bear in mind we’re talking about plant; we’re not talking about consumer goods. We’re not talking about selling toasters or electric toothbrushes to people. OK. There’s other legislation that covers consumer goods.

Then when it comes to structure, again, we’ve got anything that is constructed, be it fixed or movable, temporary or permanent. And it might include things on the ground, towers and masts, underground pipelines, infrastructure, tunnels and mining, and any components or parts thereof. Again, a very broad definition, and similarly substance: any natural or artificial substance in whatever form it might be. So again, very broad, and as you might recall from the previous session, a lot of the rules for designers, manufacturers, importers and suppliers cover plant, structure and substances. So hence that’s why I picked just those three definitions out of the dozens there.

Exclusions

It’s worth mentioning briefly exclusions: what the Act does not apply to. So, first, the Act does not apply to commercial ships, basically. In Australia, the Federal legislation covering the safety of people in the commercial maritime industry is the Occupational Health and Safety (Maritime Industry) Act 1993, which is usually known as “OSHMI”. That applies to commercial vessels, so WHS does not. And the second exclusion is if you are operating an offshore petroleum or greenhouse gas storage platform, and I think it’s more than three nautical miles offshore. But don’t take my word for that; if you’re in that business, go and check with the regulator, NOPSEMA. Then the Offshore Petroleum and Greenhouse Gas Storage Act 2006 applies, or OPGGS for short.

So, if you’re in the offshore oil industry then you’ve got a separate Commonwealth act, but those are the only two exceptions. Where Commonwealth law applies, the only things that WHS does not apply to are commercial ships and offshore platforms. I mentioned state and territory versus Commonwealth. All the states and territories have adopted the model WHS system except Victoria, which so far seems to be showing no interest in adopting WHS.

Thanks, Victoria, for that. That’s very helpful! Western Australia is currently in the process of consultation to adopt WHS, but they’ve still got their current OH&S legislation. So just note that there are some exclusions there. OK, so if you’re in those jurisdictions then WHS does not apply. And of course, there are many other pieces of legislation and regulation that cover particular kinds of risk in Australia. For example, there’s a separate act called ARPANS that covers ionizing and non-ionizing radiation.

There are many other acts that cover safety and environmental things. Let’s go back one: when I’m talking about those specific acts, they only apply to specific things, whereas the WHS Act is a general Act that applies to everything except those things that it doesn’t. Right, moving on.

So Far As is Reasonably Practicable

Okay, now here we come to one of these three big-ticket items, and I’ve got two slides here. So, in this definition of reasonably practicable, when it comes to ensuring health and safety, reasonably practicable means doing what you are reasonably able to do to achieve high standards of health and safety.

Considering and weighing up all the relevant matters, including, say, the first two: we need to think about the likelihood of a hazard or risk. How likely is this thing to occur, this potential threat to human health? And what’s the degree of harm that might result from the hazard or risk? So, we’ve got a likelihood and a degree of harm, or severity. If we recall, the fundamental definition of risk is that it’s the combination of those two things taken together. So, in this first part we’re thinking about what the risk is.

And it’s worth mentioning that hazard is not defined in the Act and risk is very loosely defined. So, the act is being deliberately very broad here. It’s not taking a position on any style of, or approach to, describing risks. So, to the second part.

Having thought about the risk, now we should consider what the person, PCBU or officer, whoever it might be, ought reasonably to know about the hazard or risk and the ways of eliminating or minimizing the risks. So, what we should know about the risk and the ways of dealing with it, of mitigating it, of controlling it. And then we’ve got some more detail on these ways of controlling the risk.

We need to think about the availability and suitability of ways to eliminate or minimize the risk. Now I’m probably going to do a separate session on reasonably practicable because there is a whole guidebook on how to do it. So, at some stage in the future we’ll go through that step by step, about how you determine availability and suitability et cetera. And once you get into it, it’s not too difficult. You just need to follow the guidelines, which are very clear and very well laid out.

So having done all of those things, after assessing the extent of the risk and the available ways of controlling it, we can then think about the cost associated with those risk controls and whether the cost of those controls is grossly disproportionate to the risk. As we will see later, in the special session, if the cost is grossly disproportionate to the risk reduction then it’s probably not reasonable to do it. So, you don’t necessarily have to do it. But we will step back and just look at the whole thing.

So, in a and b we’re looking at the likelihood and severity of the risk, so we’re (quantitatively or qualitatively) assessing the risk. We’re thinking about what we could do about it, and how available and suitable those risk controls are, and then putting it all together: how much will it cost to implement those risk controls, and is it reasonably practicable to do so? So what we have here is basically a risk assessment process that leads us to a decision about which controls we need to implement in order to achieve that ‘reasonably practicable’ statement that you see in so many parts of the act, and indeed it’s also in the definition itself.

So, this is how we determine what is reasonably practicable. We follow a risk assessment process. There is a risk assessment Code of Practice, which I will do a separate session on, which gives you a basic minimum risk assessment process to follow that will enable us to decide what is reasonably practicable. Okay, quite a big topic there. And as I say, we’ll come back and do a couple more sessions on how to determine reasonably practicable. So, moving on to the primary duty of care, which we covered in the short session.
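
The order of those considerations can be sketched as a simple decision function. This is an illustration of the sequence of questions only, not a legal test. Everything in it is an assumption made for the example: the control names, the monetised benefit figures, and in particular the numeric disproportion factor, since the Act does not give any such number and the judgement in practice is not reducible to arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Control:
    name: str
    available: bool   # can we actually obtain or implement it?
    suitable: bool    # does it work for this hazard in this context?
    cost: float       # cost of implementing the control (hypothetical units)
    benefit: float    # value placed on the risk reduction (hypothetical units)


# Purely illustrative threshold; the Act defines no numeric factor
GROSS_DISPROPORTION_FACTOR = 10.0


def reasonably_practicable(control):
    """Availability and suitability first; cost is only weighed after that."""
    if not (control.available and control.suitable):
        return False
    # Reject only when cost is grossly out of proportion to the risk reduction
    return control.cost <= GROSS_DISPROPORTION_FACTOR * control.benefit


# Hypothetical candidate controls for one hazard
guard = Control("machine guard", True, True, cost=5_000, benefit=2_000)
redesign = Control("full redesign", True, True, cost=500_000, benefit=2_000)
exotic = Control("unobtainable sensor", False, True, cost=100, benefit=2_000)

for control in (guard, redesign, exotic):
    print(control.name, reasonably_practicable(control))
```

Note that cost comes last: a control that costs somewhat more than the benefit is still implemented in this sketch, and only a grossly disproportionate one is rejected.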

The Primary Duty of Care

So I’m not really going to go through this again [in detail] but basically our primary duty is to ensure, so far as is reasonably practicable, the health and safety of workers, whether we’ve engaged them, whether we’ve got somebody else to engage them, or whether we are influencing or directing people carrying out the work. We have a primary duty of care if we’re doing any of those things. And secondly, it’s worth mentioning that the person conducting a business or undertaking, the PCBU, must ensure the health and safety of other people: say, visitors to the workplace or members of the public who happen to be near the workplace.

And of course, bearing in mind that this law applies to things like trains and aircraft, if you have an accident with your moving vehicle or your plant you could put people in danger – in the case of aeroplanes, anywhere in Australia and beyond. So, it’s not just about the workers in the workplace. With some systems, you’ve got a very onerous responsibility to protect the public, depending on what you’re doing. Now for a little bit more detail that we didn’t have in the short session. When we say we must ensure health and safety, we’re talking about the provision and maintenance of a safe work environment, safe plant and structures, and safe systems of work. We’re talking about the safe use, handling and storage of structures and substances.

We’re talking about adequate facilities for workers. We’re talking about the provision of information, training, instruction or supervision to those workers. And finally, the health of workers and the conditions of the workplace are monitored, if need be, for the purpose of preventing illness or injury. So, there should be some general monitoring of health and safety-related incidents. And if you’re dealing with certain chemicals, or you are intentionally exposing people to certain things, you may have to conduct special monitoring, looking for contamination or poisoning of those people, whatever it may be. So, you’ve got quite a bit of detail there about what it means to carry out the primary duty of care.

And this is all consistent with the duties that we’ve talked about on designers, manufacturers, importers, and suppliers, and for all these things there are codes of practice giving guidance on how to do them. So, this whole work health and safety system is well thought through and well put together, in that the law says you’ve got to do this, and there are regulations and codes of practice giving you more information on how you can fulfil your primary duty, and indeed how you must fulfil it.

And then finally there’s a slightly unusual part at the end, and this covers the special case where workers need to occupy accommodation under the control of the PCBU in order to get the job done. So you could imagine, if you need workers to live somewhere remote and you have provided accommodation, then there are requirements for the employer to take care of those workers and maintain those premises so that they are not exposed to risks.

That’s a big deal because you might have a remote plant, especially in Australia, which is a big place and not very well populated. You might be a long way away from external help. So if you have an emergency on-site you’re going to have to provide everything (not just in an emergency; you need to do that anyway). But if you’ve got workers living remotely, as often happens in Australia, you’ve got to look after those workers in a potentially very harsh environment.

And then finally it’s worth mentioning that self-employed persons have got to take care of their own health and safety. Note that a self-employed person is a PCBU, so even self-employed people have a duty of care as a PCBU.

The Three Duties

OK, sections 22 to 26 take that primary duty of care and elaborate it for designers, manufacturers, importers and suppliers, and for those installing, constructing or commissioning plant, substances and structures. And as we said in the free session, all of those roles, all of the PCBUs doing that, have three duties. They have to ensure safety in a workplace, and that includes, you know, designing and manufacturing the thing and ensuring that it’s safe and meets Australian regulations and obligations.

We have a duty to test, which actually includes doing all the calculations, analysis and examination needed to demonstrate safety, and then a duty to provide the needed information to everybody who might use or come into contact with the system. So those three duties apply consistently across the whole supply chain. Now, we spent some time talking about that, so we’re going to move on. OK, so we are halfway through. So, a lot to take in. I hope you’re finding this useful and enjoying this. Let’s move on. Now this is an interesting one.

Officers of the PCBU

Officers of the PCBU have additional duties, and an officer of the PCBU might be a company director (that’s explicitly included in the definition) or a senior manager, somebody who has influence. Officers of the PCBU must exercise due diligence. So basically, the implied relationship is you’ve got a PCBU, you’ve got somebody directing work, whether it be design work, manufacturing, operating a piece of kit, whatever it might be. And then there are more senior people who are in turn directing those PCBUs (the officers), so the officers must exercise due diligence to ensure that the PCBUs comply with their duties and obligations.

Subsections 2 to 4 cover penalties for officers if they fail. I’m not going to discuss that because, as I’ve said elsewhere on the Safety Artisan website, I don’t like threatening people with penalties. I actually think that results in poor behavior: it results in people shirking and avoiding their duties rather than embracing them and getting on with it. If you frighten people or tell them what’s going to happen to them, they get it wrong. So, I’m not going to go there. If you’re interested you can look up the penalties for various people, which are clearly laid out. We move on to Subsection 5.

Due Diligence

 We’re now talking about what is due diligence in the context of health and safety. OK, I need to be precise because the term due diligence appears in other Australian law in various places meaning various things, but here this is the definition of due diligence within the WHS context. So, we’ve got six things to do in order to demonstrate due diligence.

So, officers must acquire and keep up to date with knowledge of work health and safety matters, obligations and so forth. Secondly, officers must gain an understanding of the nature of the operations of the PCBU and the risks they control. So, if you’re a company director you need to know something about what the operation does. You cannot hide behind “I didn’t know” because it’s a legal requirement for you to know. So that closes off a whole bunch of defenses in court. You can’t plead ignorance because ignorance is, in fact, illegal, and you’ve got to have a general understanding of the hazards and risks associated with those operations. So, you don’t necessarily have to be up on all the specifics of everything going on in your organization, but whatever it is that your organization does, you should be aware of the general hazards and risks associated with that kind of business.

Now, thirdly, we are moving on. Basically, C, D, E and F refer to appropriate resources and processes, so the officers have got to ensure that PCBUs have available, and use, appropriate resources and processes in order to control risks. OK, so that says you’ve got to provide those resources and processes, and there has to be supervision, or some kind of process or requirement, to say: yep, we put in, let’s say, a safety management system that ensures people do actually use the stuff that they are supposed to use in order to keep themselves safe.

And that’s very relevant of course because often people don’t like wearing, for example, personal protective equipment because it’s uncomfortable or slows you down, so the temptation is to take it off. Moving on to part D, we’re still on the appropriate processes: we must have appropriate processes for receiving and considering information on incidents, hazards and risks. So again, we’ve got to have something in place that keeps us up to date with the incidents, hazards and risks in our own plants, and maybe similar plants in the industry, and we need a process to respond in a timely way to that information.

So, if we discover that there is a new incident or hazard that we didn’t previously know about, we need to respond and react to that quickly enough to make a difference to the health and safety of workers. So again, that sort of works in concert with part B, doesn’t it? In parts A and B we need to keep up to date on the risks and what’s going on in the business. And in part E, we need to ensure that the PCBU has processes for compliance with any duty or obligation, and follows them; again, it’s about providing that stuff.

In the system safety world, often the designers will need to provide the raw material that becomes those processes. Or maybe if we’re selling the product, we sell a product with the instruction manual with all the processes that could be required.

And then finally the officers must verify the provision and use of these resources and processes that we’ve been talking about in C, D and E. So, we’ve got a simple six-point program that comprises due diligence, but as you can see it’s very to the point and it’s quite demanding. There’s no shirking this stuff or pretending you didn’t know, and I suspect it’s designed to hang company directors who neglect and abuse their workers so that, as a result, harm happens to them.
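As a rough aid, the six points above can be sketched as a checklist that an officer might keep evidence against. This is only an illustration: the element wording is paraphrased from this session rather than quoted from the Act, and the names `DUE_DILIGENCE_ELEMENTS` and `outstanding_elements` are hypothetical.

```python
from typing import Dict, List

# The six due diligence elements, paraphrased from the discussion above.
# This is not the statutory wording; always check the Act itself.
DUE_DILIGENCE_ELEMENTS: Dict[str, str] = {
    "a": "Acquire and keep up-to-date knowledge of WHS matters",
    "b": "Understand the operations and their general hazards and risks",
    "c": "Ensure appropriate resources and processes are available and used",
    "d": "Ensure processes for receiving and responding to information on "
         "incidents, hazards and risks",
    "e": "Ensure processes for complying with duties and obligations",
    "f": "Verify the provision and use of the resources and processes in c to e",
}


def outstanding_elements(evidence: Dict[str, List[str]]) -> List[str]:
    """Return the element letters that have no recorded evidence yet.

    `evidence` maps an element letter to a list of evidence items,
    for example {"a": ["annual WHS briefing"]}. Missing or empty
    entries count as outstanding.
    """
    return [key for key in DUE_DILIGENCE_ELEMENTS if not evidence.get(key)]
```

Keeping evidence per element reflects the point that an officer cannot plead ignorance: the useful question is not "did we think about safety?" but "what can we show for each of the six elements?".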

But I mean ultimately let’s face it this is all good common-sense stuff. We should be doing this anyway. And in any kind of high-risk industry we should have a safety management system that does all of this and more. These are only the minimum required for all industries and all undertakings in Australia. OK let’s move away from the big stick. Let’s talk about some sort of cozy, softer stuff.

Consult, Cooperate and Coordinate

If you are a duty holder, if you’ve got a duty of care to people as a PCBU or an officer, you must consult, cooperate and coordinate your activities with all other officers and PCBUs who have a duty in relation to the same matter.

So perhaps you are a supplier of kit and you get information from the designer or the manufacturer with updates on safety, or maybe they inform you of problems with the kit. You must pass that on. Let’s imagine you’re introducing a complex system into service. There are going to be lots of different stakeholders, and you all must work together in order to meet WHS obligations. So, there’s no excuse for trying to pass the buck to other people.
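The "same matter" test can be pictured as a simple overlap check. The sketch below is only an illustration under assumptions: the function name `who_to_consult`, the example duty holders and the way matters are labelled are all hypothetical, and real duties are not usually this neatly catalogued.

```python
from typing import Dict, Set


def who_to_consult(me: str, duties: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Find other duty holders who share at least one matter with `me`.

    `duties` maps each duty holder to the set of matters they hold a
    duty for. Returns a mapping of other duty holder -> shared matters.
    """
    my_matters = duties.get(me, set())
    shared_by_holder: Dict[str, Set[str]] = {}
    for holder, matters in duties.items():
        if holder == me:
            continue  # no need to consult with yourself
        shared = my_matters & matters
        if shared:
            shared_by_holder[holder] = shared
    return shared_by_holder
```

For instance, a supplier and a designer who both hold duties for the same piece of kit would show up as needing to consult each other, while a duty holder with no overlapping matters would not.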

That’s not going to work if you haven’t actively managed the risk, as then you are potentially already doing something illegal. And again, we won’t talk about the penalties for this. We’re just talking about the good things we’re expected to do. So, we’re trying to keep it positive. And you’ve got a duty to consult with your workers who either carry out work or who are likely to be directly affected by what’s going on and the risks. Now, this is a requirement, with procedures in Sections 2 and 3, but of course we should be consulting with our workers because they’ve often got practical knowledge about controlling risks, and what is available and suitable to do so, which we will find helpful.

So, consulting workers is not only a duty, it’s actually a good way of doing business and doing business efficiently. So, moving on to section 152.

The Regulator

There are several sections about the regulator, but to my mind, they don’t add much. So, we’re just going to talk about Section 152, which is the functions of a regulator and the regulator has got several functions. So, they give advice and make recommendations to the relevant minister or Commonwealth Minister of the government. They monitor and enforce compliance with the act.

They provide advice and information to duty holders and the community, and they collect, analyse and publish statistics. They’re supposed to foster a co-operative, consultative relationship in the community, to promote and support education and training, and to engage in, promote and coordinate the sharing of information. And then finally they’ve got some legal duties with courts and industrial tribunals, and here’s the catch-all: any other function conferred on the regulator by the Act.

If we look at the first six, the ones that I’ve highlighted: there are a number of regulators in Australia and, because of the complexity of our federal government system, it’s not always clear which regulator you need to deal with, and not all regulators are very good at this stuff. I have to say, having worked in Europe and America and Australia, that on Part D, for example, Australian regulators are not very good at analyzing and publishing statistics in general. If you want high-quality statistics from a regulator, you’re usually better off looking at a European regulator in your industry or an American regulator. The Aussie ones don’t seem to be very good at that, in general.

There are exceptions. NOPSEMA, for example, in the offshore world, are particularly good. But then you would expect that because of the inherent dangers of offshore operations. Otherwise, I’ve not been that impressed with some of the regulators. The exception to that is Safe Work Australia. So, if you’re looking for advice and information, statistics, education and training and sharing of information then Safe Work Australia is your best bet. Now, ironically, Safe Work Australia is not a regulator.

Safe Work Australia

They are a statutory authority and they created, in consultation with many others I might say, the model WHS Act, the model regulations and the Model Codes of Practice. So, if you go on their website you will find lots of good information on there, and indeed I tend to look at that in order to find information to post on the Safety Artisan. So, they’ve got some good WHS information on there. But of course, whenever you go and look at their site you must bear in mind that they are not the regulator of anything or anyone. So, you’ve also got to go and find the relevant regulator for your business or undertaking, and you’ve got to look at what your regulator requires you to do.

Very often, when it comes to looking at guidance, your best bet is Safe Work Australia. Okay.

Regulations and Codes of Practice

I’ve mentioned regulations and codes of practice. Basically, these sections of the Act enable those codes of practice and regulations, so the Minister has power to approve Commonwealth codes of practice and, similarly, state and territory ministers can do the same for their versions of WHS. This is very interesting and we’ll come back to relook at codes of practice in another session. An approved code of practice is admissible in court as evidence; it’s admissible as the test of whether or not a duty or obligation under the WHS Act has been complied with.

And basically, the implication of this is that you are ignorant of codes of practice at your peril, because if something goes wrong then codes of practice are what you will be judged against, at minimum. So that’s a very important point to note and we’ll come back to that in another session.

Next, Codes of Practice and then regulation-making powers. For some reason unknown to me, the Governor-General may authorize regulations. I mean, that doesn’t really matter. The codes of practice and the regulations are out there, and the regulations are quite extensive: I think six hundred pages. So, there’s a lot of stuff in there. And again, we’ll do a separate session on WHS regulations soon. OK.

That’s All Folks!

I appreciate we’ve covered quite a lot of ground there, but of course you can watch the video as many times as you like and go and look at the Act online. It’s worth mentioning that all the information I’ve shown you is pretty much word for word taken from the Federal Register of Legislation, and I’m allowed to do that under the terms of the license.

Creative Commons Licence

And as one of those terms, I have to tell you that I took this information yesterday, on the 12th of October 2019. You should always go to that website to find the latest on Commonwealth legislation (and indeed, if you’re working in a state or territory jurisdiction, you should go and see the relevant regulator’s legislation on their site). Finally, you will find more information on copyright and attribution at the SafetyArtisan.com website, where I’ve reproduced all of the requirements, which you can check. At the Safety Artisan we’re very pleased to comply with all our obligations.

Now for more on this video: you may have seen it on Patreon on the Safety Artisan page or you may have seen it elsewhere, but it is for sure available at Patreon.com/SafetyArtisan. Okay. So, thank you very much for listening, and all that remains for me to do is to sign off and say I look forward to presenting another session to you in a month’s time. Take care.
