
Software Safety Principles Conclusions and References

Software Safety Principles Conclusions and References is the sixth and final blog post on Principles of Software Safety Assurance. In this series, we look at the 4+1 principles that underlie all software safety standards. (The previous post in the series is here.)

Read on to Benefit From…

The conclusions of this paper are brief and readable, but very valuable. It’s important for us – as professionals and team players – to be able to express these things to managers and other stakeholders clearly. Talking to non-specialists is something that most technical people could do better.

The references include links to the standards covered by the paper. Unsurprisingly, these are some of the most popular and widely used processes in software engineering. The other links take us to the key case studies that support the conclusions.

Content

We outline common software safety assurance principles that are evident in software safety standards and best practices. You can think of these guidelines as the unchanging foundation of any software safety argument because they hold true across projects and domains.

The principles serve as a guide for cross-sector certification and aid in maintaining comprehension of the “big picture” of software safety issues while evaluating and negotiating the specifics of individual standards.

Conclusion

These six blog posts have presented the 4+1 model of foundational principles of software safety assurance. The principles strongly connect to elements of current software safety assurance standards and they act as a common benchmark against which standards can be measured.

Through the examples provided, it’s also clear that, although these concepts can be stated clearly, they haven’t always been put into practice. There may still be difficulties with their application by current standards. Particularly, there is still a great deal of research and discussion going on about the management of confidence with respect to software safety assurance (Principle 4+1).

[My own informal observations agree with this last point. Some standards apply Principle 4+1 more rigorously, but that makes them more expensive to follow. Consequently, they are less popular and less used.]

Standards and References

[1] RTCA/EUROCAE, Software Considerations in Airborne Systems and Equipment Certification, DO-178C/ED-12C, 2011.

[2] CENELEC, EN-50128:2011 – Railway applications – Communication, signaling and processing systems – Software for railway control and protection systems, 2011.

[3] ISO-26262 Road vehicles – Functional safety, FDIS, International Organization for Standardization (ISO), 2011

[4] IEC-61508 – Functional Safety of Electrical / Electronic / Programmable Electronic Safety-Related Systems. International Electrotechnical Commission (IEC), 1998

[5] FDA, Examples of Reported Infusion Pump Problems, Accessed on 27 September 2012, http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedures/GeneralHospitalDevicesandSupplies/InfusionPumps/ucm202496.htm

[6] FDA, FDA Issues Statement on Baxter’s Recall of Colleague Infusion Pumps, Accessed on 27 September 2012, http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm210664.htm

[7] FDA, Total Product Life Cycle: Infusion Pump – Premarket Notification 510(k) Submissions, Draft Guidance, April 23, 2010.

[8] “Report on the Accident to Airbus A320-211 Aircraft in Warsaw on 14 September 1993”, Main Commission Aircraft Accident Investigation Warsaw, March 1994, http://www.rvs.unibielefeld.de/publications/Incidents/DOCS/ComAndRep/Warsaw/warsaw-report.html  Accessed on 1st October 2012.

[9] JPL Special Review Board, “Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions”, Jet Propulsion Laboratory, March 2000.

[10] Australian Transport Safety Bureau. In-Flight Upset Event 240 km North-West of Perth, WA, Boeing Company 777-200, 9M-MRG. Aviation Occurrence Report 200503722, 2007.

[11] H. Wolpe, General Accounting Office Report on Patriot Missile Software Problem, February 4, 1992, Accessed on 1st October 2012, Available at: http://www.fas.org/spp/starwars/gao/im92026.htm

[12] Y.C. Yeh, Triple-Triple Redundant 777 Primary Flight Computer, IEEE Aerospace Applications Conference pg 293-307, 1996.

[13] D.M. Hunns and N. Wainwright, Software-based protection for Sizewell B: the regulator’s perspective. Nuclear Engineering International, September 1991.

[14] R.D. Hawkins, T.P. Kelly, A Framework for Determining the Sufficiency of Software Safety Assurance. IET System Safety Conference, 2012.

[15] SAE. ARP 4754 – Guidelines for Development of Civil Aircraft and Systems. 1996.

Software Safety Principles: End of the Series

This blog post series was derived from ‘The Principles of Software Safety Assurance’, by RD Hawkins, I Habli & TP Kelly, University of York. The original paper is available for free here. I was privileged to be taught safety engineering by Tim Kelly, and others, at the University of York. I am pleased to share their valuable work in a more accessible format.

Meet the Author

My name’s Simon Di Nucci. I’m a practicing system safety engineer, and I have been for the last 25 years. I’ve worked in all kinds of domains: aircraft, ships, submarines, sensors, and command and control systems, with some work on rail and air traffic management systems, and lots of software safety. So, I’ve done a lot of different things!

Principles of Software Safety Training

Learn more about this subject in my course ‘Principles of Safe Software’ here.

My course on Udemy, ‘Principles of Software Safety Standards’ is a cut-down version of the full Principles Course. Nevertheless, it still scores 4.42 out of 5.00 and attracts comments like:

  • “It gives me an idea of standards as to how they are developed and the downward pyramid model of it.” 4* Niveditha V.
  • “This was really good course for starting the software safety standareds, comparing and reviewing strengths and weakness of them. Loved the how he try to fit each standared with4+1 principles. Highly recommend to anyone that want get into software safety.” 4.5* Amila R.
  • “The information provides a good overview. Perfect for someone like me who has worked with the standards but did not necessarily understand how the framework works.” 5* Mahesh Koonath V.
  • “Really good overview of key software standards and their strengths and weaknesses against the 4+1 Safety Principles.” 4.5* Ann H.

Safety Concepts Part 2

In this 33-minute session, Safety Concepts Part 2, The Safety Artisan equips you with more Safety Concepts. I look at the basic concepts of safety, risk, and hazard in order to understand how to assess and manage them.

Exploring these fundamental topics provides the foundations for all other safety topics, but it doesn’t have to be complex. The basics are simple, but they need to be thoroughly understood and practiced consistently to achieve success. This video explains the issues and discusses how to achieve that success.

Highlights of Safety Concepts, Part 2 video.

Get the full-length Lesson as part of the FREE Triple Learning Bundle.

Safety Concepts Part 2: Topics

  • Risk & Harm;
  • Accident & Accident Sequence;
  • (Cause), Hazard, Consequence & Mitigation;
  • Requirements / Essence of System Safety;
  • Hazard Identification & Analysis;
  • Risk Reduction / Estimation;
  • Risk Evaluation & Acceptance;
  • Risk Management & Safety Management; and
  • Safety Case & Report.

Safety Concepts Part 2: Transcript


Hi everyone, and welcome to The Safety Artisan, where you will find professional, pragmatic, and impartial advice on safety. I’m Simon, and welcome to the show today, which is recorded on the 23rd of September 2019. Today we’re going to talk about system safety concepts. A couple of days ago I recorded a short presentation (Part 1) on this, which is also on YouTube. Today we are going to talk about the same concepts but in much more depth.

In the short session, we took some time picking apart the definition of ‘safe’. I’m not going to duplicate that here, so please feel free to go have a look. We said that to demonstrate that something was safe, we had to show that risk had been reduced to a level that is acceptable in whatever jurisdiction we’re working in.

And in this definition, there are a couple of tests that are appropriate to the U.K., but perhaps not elsewhere. We also must meet safety requirements. And we must define the scope and bound the system that we’re talking about, whether it’s a physical system or an intangible system like a computer program. We must define what we’re doing with it and what it’s being used for, and the operating environment, the context, within which it is being used. And if we can do all those things, then we can objectively say – or claim – that the system is safe.

Topics

We’re going to cover a lot more topics. We’re going to talk about risk, accidents, and the cause, hazard, and consequence sequence. We’ll talk about requirements and (spoiler alert) what I consider to be the essence of system safety. And then we’ll get into the process of demonstrating safety: hazard identification and analysis.

Risk reduction and estimation. Risk evaluation and acceptance. And then pulling it all together: risk management and safety management. And finally, reporting: making an argument that the system is safe, supporting it with evidence, and summarizing all of that in a written report. This is what we do, albeit in different ways and calling it different things.

Risk

Onto the first topic: risk and harm. Our concept of risk is a combination of the likelihood and severity of harm. Generally, we’re talking about harm to people: death, injury, damage to health. Now we might also choose to consider any damage to property and the environment. That’s all good. But I’m going to concentrate on harm to people, because usually that’s what we’re required to do by the law. There are other laws, sometimes, covering the environment and property that we’re not going to talk about. Just to illustrate this point: risk is a combination of severity and likelihood.

We’ve got a very crude risk table here, with likelihood along the top and severity down the side. Looking at the table, if we have a high likelihood and high severity, that’s a high risk. Whereas if we have low likelihood and low severity, we might say that’s a low risk. And in between, with a combination of high and low, we might say that’s medium. Now, this is a very crude and simple example. Deliberately.
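The crude table described above can be written down as a tiny lookup. This is only a sketch of that two-by-two example: the names `Level` and `risk_level` are made up for illustration, and real standards define their own categories and matrices.

```python
from enum import Enum


class Level(Enum):
    """The two bands used in the crude example table."""
    LOW = "low"
    HIGH = "high"


def risk_level(likelihood: Level, severity: Level) -> str:
    """Combine likelihood and severity into a crude three-band risk level."""
    if likelihood is Level.HIGH and severity is Level.HIGH:
        return "high"
    if likelihood is Level.LOW and severity is Level.LOW:
        return "low"
    # Any mix of high and low falls in between.
    return "medium"


# Print the whole table: likelihood along the top, severity down the side.
for severity in Level:
    row = [risk_level(likelihood, severity) for likelihood in Level]
    print(severity.name, row)
```

A real matrix would normally have more than two bands on each axis, but the lookup works the same way.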

You will see risk matrices like this in loads of different standards. And you may be required to define your own for a specific system. There are lots of variations on this, but they’re all basically doing the same thing: illustrating how we determine the level of risk by that combination of severity and likelihood. I think a picture is worth a thousand words. Moving on to the accident: we’re talking about (in this standard) an unintended event that causes harm.

Accidents, Sequences and Consequences

Not all jurisdictions consider just accidental events; some consider deliberate harm as well. We’ll leave that out. A good example of that is work health and safety in Australia, but no doubt we’ll get to that in another video sometime. And the accident sequence is the progression of events that results in an accident. Now we’re going to illustrate the accident sequence in a moment, but before we get there, we need to think about hazards. Here we’ve got a hazardous physical situation or state of a system, often following some initiating event, that may lead to an accident: a thing that may cause harm.

And then allied with that we have the idea of consequences: an outcome, or outcomes, resulting from an event. Now that all sounds a bit woolly, doesn’t it? Let’s illustrate that. Hopefully, this will make it a lot clearer. Now, I’ve got a sequence here. We have causes that might lead to a hazard. And the hazard might lead to different consequences. And that’s the accident sequence. Now in this standard, they didn’t explicitly define causes.

Cause, Hazard, and Consequence

They’re just called events. But mostly we will deal with causes and consequences in system safety. And it’s probably just easier to implement it that way. Whether or not you choose to explicitly address every cause is often an optional step. But this is the accident sequence that we’re looking at. These funnels are meant to illustrate the fact that there may be many causes for one hazard. And one hazard may lead to many consequences, and some of those consequences may be no harm at all.

We may not actually have an accident. We may get away with it. We may have a hazard, and no harm may befall a human. And if we take all of this together, that’s the accident sequence. Now it’s worth reiterating that just because a hazard exists, it does not necessarily lead to harm. But to get to harm, we must have a hazard; a hazard is necessary, but not sufficient, to lead to harmful consequences. OK.

Hazards: an Example

And you can think of a hazard as an accident waiting to happen. You can think of it in lots of different ways. Let’s think about an example: the hazard might be that somebody slips while walking. That slip might be caused by many things. It might be a wet surface: let’s say it’s been raining, and the pavement is slippery, or it might be icy. It might be a spillage of oil on a surface, or you could imagine something slippery like ball bearings on a surface.

So, there’s something that’s caused the surface to become slippery. A person slips – that’s the hazard. Now the person may catch themselves; they may not fall over. They may suffer no injury at all. Or they might fall and suffer a slight injury; and, very occasionally, they might suffer a severe injury. It depends on many different factors. You can imagine if you slipped while going downstairs, you’re much more likely to be injured.

And younger, healthy, fit people are more likely to get over a fall without being injured, whereas for someone very elderly and frail, a fall can quite often result in a broken bone. If an elderly person breaks a bone in a fall, the chances of them dying within the next 12 months are quite high. They’re about one in three.

So, the level of risk is sensitive to a lot of different factors. To get an accurate picture, an accurate estimate of risk, we’re going to need to factor in all those things. But before we get to that, we’ve already said that hazards need not lead to harm. In this standard, where a hazard has occurred and could have progressed to an accident but didn’t, we call this an incident. A near miss.

We got away with it. We were lucky. Whatever you want to call it. We’ve had an incident but no one has been hurt. Hopefully, that incident is being reported, which will help us to prevent an actual accident in the future. That’s another very useful concept that reminds us that not all hazards result in harm. Sometimes there will be no accident. There will be no harm simply because we were lucky, or because someone present took some action to prevent harm to themselves or others.
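To make the funnel picture concrete, here is a minimal sketch that records the slip example as data: many causes, one hazard, several consequences. The class and function names (`Hazard`, `Consequence`, `classify`) are illustrative assumptions, not terms taken from the standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Consequence:
    description: str
    harmful: bool  # False means the hazard occurred but nobody was hurt


@dataclass
class Hazard:
    description: str
    causes: list = field(default_factory=list)
    consequences: list = field(default_factory=list)

    def accident_sequences(self):
        """Every cause -> hazard -> consequence path that ends in harm."""
        return [
            (cause, self.description, outcome.description)
            for cause in self.causes
            for outcome in self.consequences
            if outcome.harmful
        ]


def classify(outcome: Consequence) -> str:
    """A hazard that leads to harm is an accident; otherwise it is an incident."""
    return "accident" if outcome.harmful else "incident"


slip = Hazard(
    description="person slips while walking",
    causes=["wet pavement", "ice", "oil spill"],
    consequences=[
        Consequence("catches themselves", harmful=False),
        Consequence("slight injury", harmful=True),
        Consequence("severe injury", harmful=True),
    ],
)

for sequence in slip.accident_sequences():
    print(" -> ".join(sequence))
```

Note that the hazard appears in every harmful path, while one of its consequences involves no harm at all. That harmless outcome is the near miss.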

Mitigation Strategies (Controls)

But we would really like to deliberately design out or avoid hazards if we can. What we need is a mitigation strategy: a measure or measures that, when we put them into practice, reduce that risk. Normally, we call these things controls. Again, we’ve illustrated this; we’ve added to the funnels. We’ve added some mitigation strategies and they are the dark blue dashed lines.

And they are meant to represent barriers that prevent the accident sequence from progressing towards harm. And they have dashed lines because very few controls are perfect; you know, everything’s got holes in it. And we might have several of them. But usually, no control will cover all possible causes, and very few controls will deal with all possible consequences. That’s what those barriers are meant to illustrate.
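One rough way to see the "holes" is to record which causes each control addresses and look at what is left over. This is a sketch only: the function `uncovered` and the two controls listed are hypothetical, chosen to fit the slip example.

```python
def uncovered(items: set, controls: dict) -> set:
    """Return the items that no control addresses."""
    covered = set().union(*controls.values())
    return items - covered


causes = {"wet pavement", "ice", "oil spill"}

# Hypothetical controls, each mapped to the causes it is meant to address.
cause_controls = {
    "mop up spills": {"oil spill"},
    "grit paths in winter": {"ice"},
}

print(uncovered(causes, cause_controls))
```

Here the rain-soaked pavement is not addressed by either control. The same check can be run against consequences instead of causes.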

That idea, that picture, will be very useful to us later, when we are thinking about how we’re going to estimate and evaluate risk overall and what risk reduction we have achieved, and when we talk about justifying that what we’ve done is good. That’s a very powerful illustration. Well, let’s move on to safety requirements.

Safety Requirements

Now. I guess it’s no great surprise to say that requirements, once met, can contribute directly to the safety of the system. Maybe we’ve got a safety requirement that says all cars will be fitted with seatbelts. Let’s say we’ll be required to wear a seatbelt.  That makes the system safer.

Or the requirement might be saying we need to provide evidence of the safety of the system. And, the requirement might refer to a process that we’ve got to go through or a set kind of evidence that we’ve got to provide. Safety requirements can cover either or both of these.

The Essence of System Safety

Requirements covering the safety of the system, or demonstrating that the system is safe, should give us assurance, which is adequate confidence or justified confidence, supported with evidence gained by following a process. And we’ll talk more about the process. We meet safety requirements. We get assurance that we’ve done the right thing. And this really brings us to the essence of what system safety is. We’ve got all these requirements – everything is a requirement really – including the requirement to demonstrate risk reduction.

And those requirements may apply to the system itself, the product. Or they may apply to the process that generates the evidence, or to the evidence itself. Putting all those things together in an organized and orderly way really is the essence of system safety. This is where we are addressing safety in a systematic way, in an orderly way, in an organized way. (Those words will keep coming back). That’s the essence of system safety, as opposed to the day-to-day task of keeping a workplace safe.

Maybe by mopping up spills and providing handrails, so people don’t slip over. Things like that. We’re talking about a more sophisticated level of safety, because we have a more complex, more challenging problem to deal with. That’s system safety. We will start on the process now, and we begin with hazard identification and analysis. First, we need to identify and list the hazards and accidents associated with the system.

We’ve got a system, physical or not. What could go wrong? We need to think about all the possibilities. And then, having identified some hazards, we need to start doing some analysis. We follow a process that helps us to delve into the detail of those hazards and accidents, and to define and understand the accident sequences that could result. In fact, in doing the analysis we will very often identify some more hazards that we hadn’t thought of before. It’s not a straight-through process; it tends to be an iterative process.
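The iterative nature of this step can be sketched as a simple worklist: analysing one hazard may reveal others, which then get analysed in turn until nothing new turns up. Everything here is an illustrative assumption, including the function `identify_hazards`, the `analyse` callback, and the made-up findings.

```python
from collections import deque


def identify_hazards(initial_hazards, analyse):
    """Analyse each hazard; queue any newly revealed hazards for analysis too."""
    known = list(dict.fromkeys(initial_hazards))  # de-duplicate, keep order
    queue = deque(known)
    while queue:
        hazard = queue.popleft()
        for found in analyse(hazard):
            if found not in known:  # only genuinely new hazards are queued
                known.append(found)
                queue.append(found)
    return known


# Hypothetical analysis results: each hazard maps to the hazards it reveals.
FINDINGS = {
    "oil spilled on workshop floor": ["oil ignites"],
    "oil ignites": ["smoke fills workshop"],
}


def analyse(hazard):
    return FINDINGS.get(hazard, [])


hazards = identify_hazards(["oil spilled on workshop floor"], analyse)
print(hazards)
```

In practice the "analysis" is human judgement supported by structured techniques, not a dictionary lookup. The point of the sketch is only that the hazard list grows as the analysis proceeds.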

Risk Reduction

And ultimately what we’re trying to do is reduce risk. We want a systematic process, which is what we’re describing now: a systematic process of reducing risk. And at some point, we must estimate the risk that we’re left with, before and after all these controls, these mitigations, are applied. That’s risk estimation. Again, there’s that systematic word: we’re going to use all the available information to estimate the level of risk that we’ve got left. Recall that risk is a combination of severity and likelihood.

Now as we get towards the end of the process, we need to evaluate risk against set criteria. And those criteria vary depending on which country you’re operating in or which industry we’re in: what regulations apply and what good practice is relevant. All those things can be a factor. Now, in this case, this is a U.K. standard, so we’ve got two tests for evaluating risk. It’s a systematic determination using all the available evidence. And it should be an objective evaluation as far as we can make it.

Risk Evaluation

We should use certain criteria to decide whether a risk can be accepted or not. And in the U.K. there are two tests for this. As we’ve said before, there is ALARP, the ‘As Low As Reasonably Practicable’ test, which says: Have we put into practice all reasonably practicable controls? (This is a risk reduction target.) And then there’s an absolute level of risk to consider as well, because even if we’ve taken all practicable measures, the risk remaining might still be so high as to be unacceptable to the law.

Now that test is specific to the U.K., so we don’t have to worry too much about it. The point is there are objective criteria, which we must test ourselves or measure ourselves against. The evaluation will pop out the decision as to whether further risk reduction is necessary if the risk level is still too high. We might conclude that there are still reasonably practicable measures that we could take. Then we’ve got to take them.

We have an objective decision-making process to say: have we done enough to reduce risk? And if not, we need to do some more until we get to the point where we can apply the test again and say yes, we’ve done enough. Right, that’s rather a long-winded way of explaining that. I apologize, but it is a key issue and it does trip up a lot of people.
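As a sketch of that decision, the two U.K. tests can be combined into one small function. The numeric risk scale, the `upper_limit` value, and the function name `evaluate_risk` are assumptions made for illustration; real acceptance criteria come from the applicable law, regulator, or standard.

```python
def evaluate_risk(residual_risk, upper_limit, remaining_practicable_controls):
    """Apply the two tests: practicable controls first, then the absolute limit."""
    if remaining_practicable_controls:
        # There is still something reasonably practicable we could do.
        return "reduce further"
    if residual_risk > upper_limit:
        # Everything practicable is done, but the risk is still too high.
        return "unacceptable"
    return "acceptable"


# Hypothetical numbers on an arbitrary scale.
print(evaluate_risk(0.4, upper_limit=1.0, remaining_practicable_controls=["add handrail"]))
print(evaluate_risk(0.2, upper_limit=1.0, remaining_practicable_controls=[]))
print(evaluate_risk(1.5, upper_limit=1.0, remaining_practicable_controls=[]))
```

After each round of risk reduction, the same test is applied again until it returns "acceptable", or until it becomes clear that the risk cannot be accepted at all.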

Risk Acceptance

Now, once we’ve concluded that we’ve done enough to reduce risk and no further risk reduction is necessary, somebody should be in a position to accept that risk.  Again, it’s a systematic process, by which relevant stakeholders agree that risks may be accepted. In other words, somebody with the right authority has said yes, we’re going to go ahead with the system and put it into practice, implement it. The resulting risks to people are acceptable, providing we apply the controls.

And we accept that responsibility.  Those people who are signing off on those risks are exposing themselves and/or other people to risk. Usually, they are employees, but sometimes members of the public as well, or customers. If you’re going to put customers in an airliner you’re saying yes there is a level of risk to passengers, but that the regulator, or whoever, has deemed [the risk] to be acceptable. It’s a formal process to get those risks accepted and say yes, we can proceed. But again, that varies greatly between different countries, between different industries. Depending on what regulations and laws and practices apply. (We’ll talk about different applications in another section.)

Risk Management

Now putting all this together we call this risk management.  Again, that wonderful systematic word: a systematic application of policies, procedures, and practices to these tasks. We have hazard identification, analysis, risk estimation, risk evaluation, risk reduction & risk acceptance. It’s helpful to demonstrate that we’ve got a process here, where we go through these things in order. Now, this is a simplified picture because it kind of implies that you just go through the process once.

With a complex system, you go through the process at least once. We may identify further hazards when we get into Hazard Analysis and estimating risk. In the process of trying to do those things, even as late as applying controls and getting to risk acceptance. We may discover that we need to do additional work. We may try and apply controls and discover the controls that we thought were going to be effective are not effective.

Our evaluation of the level of risk and its acceptability is wrong because it was based on the premise that controls would be effective, and we’ve discovered that they’re not, so we must go back and redo some work. Maybe as we go through, we even discover Hazards that we hadn’t anticipated before. This can and does happen, it’s not necessarily a straight-through process. We can iterate through this process. Perhaps several times, while we are moving forward.

Safety Management

OK, Safety Management. We’ve gone to a higher level really than risk because we’re thinking about requirements as well as risk. We’re going to apply organization, we’re going to apply management principles to achieve safety with high confidence. For the first time, we’ve introduced this idea of confidence in what we’re doing. Well, I say the first time; this is assurance, isn’t it? Assurance is having justified confidence, or appropriate confidence, because we’ve got the evidence. And that might be product evidence too: we might have tested the product to show that it’s safe.

We might have analyzed it. We might have shown that we followed a process that gives us confidence that our evidence is good, that we’ve done all the right things and identified all the risks. That’s safety management. We need to put that in a safety management system: we’ve got a defined organizational structure, and we have defined processes, procedures, and methods. That gives us direction and control of all the activities that we need to put together in combination to effectively meet safety requirements and safety policy.

And our safety tests, whatever they might be. More and more now we’re thinking about top-level organization and planning to achieve the outcomes we need. With a complex system, a complex operating environment, and a complex application.

Safety Planning

Now I’ll just mention planning. Okay, we need a safety management plan that defines the strategy: how we’re going to get there, how we’re going to address safety. We need to document that safety management system for a specific project. Planning is very important for effective safety. Safety is very vulnerable to poor planning. If a project is badly planned or not planned at all, it becomes very difficult to do safety effectively, because we are dependent on following a rigorous process to give us confidence that all results are correct. If you’ve got a project that is a bit haphazard, that’s not going to help you achieve the objectives.

Planning is important. Now, the bit of that safety plan that deals with timescales, milestones, and other date-related information, we might refer to as a safety programme. Now, this being a UK definition, British English has two spellings of program. The double-m-e version, ‘programme’, applies to that time-based or milestone-based progression.

Whereas in the US and in Australia, for example, we don’t have those two words; we just have the one word, ‘program’, which covers everything: computer programs, and a program of work that might or might not be determined by timescales or milestones. But the point is that certain things may have to happen at certain points in time or before certain milestones. We may need to demonstrate safety before we are allowed to proceed to tests and trials or before we are allowed to put our system into service.

Demonstrating Safety

We’ve got to demonstrate that Safety has been achieved before we expose people to risk.  That’s very simple. Now, finally, we’re almost at the end. Now we need to provide a demonstration – maybe to a regulator, maybe to customers – that we have achieved safety.  This standard uses the concept of a safety case. The safety case is basically, imagine a portfolio full of evidence.  We’ve got a structured argument to put it all together. We’ve got a body of evidence that supports the argument.

It provides a compelling, comprehensible (or understandable), and valid case that a system is safe for a given application or use, in a given operating environment. Really, that definition of what a safety case is harks back to that meaning of safety. We’ve got something that really hits the nail on the head. And we might put all of that together and summarise it in a safety case report that summarises those arguments and evidence, and documents progress against the safety programme.

Remember I said our planning was important. We started off by saying that we need to do this, that, and the other in order to achieve safety. Hopefully, in the end, in the safety report, we’ll be able to state that we’ve done exactly that. We did do all those things. We did follow the process rigorously. We’ve got good results. We’ve got a robust safety argument, with evidence to support it. In the end, it’s all written up in a report.

Documenting Safety

Now that isn’t always going to be called a safety case report; it might be called a safety assessment report or a design justification report. There are lots of names for these things. But they all tend to do the same kind of thing: they pull together the argument as to why the system is safe and the evidence to support the argument, and they document progress against a plan, or some set of process requirements from a standard or a regulator, or just good practice in industry, to say: Yes, we’ve done what we were expected to do.

The result is usually what justifies [the system] getting past that milestone, where the system goes into service and can be used. People can be exposed to those risks, but safely and under control.

Everyone’s a winner, as they say!

Copyright – Creative Commons Licence

Okay. I’ve used a lot of information from a UK government website. I’ve done that in accordance with the terms of its Creative Commons licence, and you can see more about that here. We have complied with that, as we are required to, and we state that the information we’ve supplied is under the terms of this licence.

Safety Concepts Part 2: More Resources

And for more resources and more lessons on system safety and other safety topics, I invite you to visit the safetyartisan.com website. Thanks very much for watching. I hope you found that useful.

We’ve covered a lot of information there, but hopefully in a structured way. We’ve repeated the key concepts, and you can see that, in that standard, the key concepts are consistently defined, and they reinforce each other. That’s what we need in order to get that systematic, disciplined approach to safety.

Anyway, that’s enough for me. I hope you enjoyed watching it and found that useful. I look forward to talking to you again soon. Please send me some feedback about what you thought about this video and also what you would like to see covered in the future.

Thank you for visiting The Safety Artisan. I look forward to talking to you again soon. Goodbye.

Safety Concepts Part 1 defines the meaning of ‘Safe’, and it is free. Get the full-length Lesson as part of the FREE Triple Learning Bundle.

Meet the Author

Learn safety engineering with me, an industry professional with 25 years of experience. I have:

  • Worked on aircraft, ships, submarines, ATMS, trains, and software;
  • Worked on programs from tiny ones to some of the biggest (Eurofighter, Future Submarine);
  • Worked in the UK and Australia, on US and European programs;
  • Taught safety to hundreds of people in the classroom, and thousands online;
  • Presented on safety topics at several international conferences.