Session 9: Verification and Validation

Description: The focus of this lecture is design verification and validation. Other concepts, including design testing, technical risk management, and the flight readiness review, are also introduced.

Instructor: Olivier de Weck

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

OLIVIER DE WECK: OK so let's start on the material. We're following the V-model and the empty boxes are getting fewer and fewer every week here. So we're on the right side moving up toward lifecycle management and operations. And today's topic is V&V, verification and validation. And I also want to discuss the FRR as one of the key milestones, and that's the flight readiness review. If you're not building something that flies, this is your launch-to-market review, right? Are you ready to launch your project or product to market?

The outline is-- we have quite a few things to cover. So first of all, I want to drill into verification and validation. What's the difference? What is their role? What's their position in the lifecycle? Then we'll spend quite a bit of time on the issue of testing. What kind of testing is done? Why is it done? We'll talk about aircraft testing, flight testing, we'll talk about spacecraft testing, but also some caveats. Testing is not always the-- it's not free of challenges and difficulties.

Then I want to talk about technical risk management, which is often covered in classes on project management, but I think it's essential for you as a system engineer to have a good grasp of technical risk management as well. So I'll cover the risk matrix, the Iron Triangle-- cost, schedule, scope-- and risk, and then I added a small section on system safety, which is a very important topic as well. And we'll finish up with discussing the FRR, the flight readiness review.

So the readings related to this lecture are sections 5.3 and 5.4 in the handbook, the System Engineering Handbook, and then there are a couple of appendices there, Appendices E and I, that are very, very helpful. And at least one of these I'll mention in the lecture. Plus one of the papers, and this is a paper about a decade old-- and I know a lot of work has been done on this since then-- but this is a paper by Professor Leveson. Nancy Leveson is a colleague of mine here at MIT who's really an expert in system safety. So the system safety model that she's developed is the subject of that paper.

OK, so let's talk about V&V, verification and validation, and how they fit together. You've seen this diagram before, but I just want to talk through it in some detail again. So the idea is you start your project, your undertaking, in the upper left. You do your stakeholder analysis. Who are the stakeholders, the customers, beneficiaries, the regulators, the suppliers, the partners on the project? You really have to do a good job on your stakeholder analysis in order to then write your requirements.

And I do have to say I've been very pleased, especially with your assignment A-2. You really dug into those 47 requirements for CanSat. You grouped them, you scrubbed them, you did a great job. And the idea is that for each of these requirements, you also have target values. There are certain thresholds or target values that have to be achieved. And then you actually do the development. You do the conceptual design, the detailed design, and that's written here as functional deployment. In other words, especially for the functional and the performance requirements, how will you actually implement those, embody those, in technology, hardware, software, and so forth? So that's your intended function, your concept, and then your implemented design solution.

Now the question is, do you actually satisfy, A, the requirements, and, B, your stakeholders? And it is possible that you satisfy the requirements, but not the stakeholders. So the way to think about this is we're going to close the loop. In fact, we're going to close two loops. An inner loop-- and I put testing here. Testing is really one of the ways to verify whether you meet requirements. There are other ways too, but testing is often the most important. So we close this inner loop. We test our implemented solution, our implemented design, and ask the question, did we deliver? Did we actually satisfy the requirements as they were written? And were those requirements attainable?

So this loop here, this inner loop, is the verification loop. You verify whether your design as implemented satisfies the requirements as written. That's what verification is. And then there's an outer loop where you take your implemented design solution, and you essentially bring it all the way back to the stakeholders. And usually that also means you're employing it in a realistic environment, like the environment in which the stakeholders will actually use the system, not in a pristine lab environment. And you have the stakeholders try out your system and see whether they're satisfied, whether this meets their original intent.

You remember the CONOPS, the concept of operations? Can you actually do the CONOPS the way you had envisioned it in a realistic environment? And that's what we call validation. And that's the outer loop. You see the difference? So a lot of people who don't know system engineering, who have never been exposed to this-- when they hear verification and validation, they think it's basically two different words for the same thing. It is different; it's not the same thing. And then if you successfully verify and validate, you end the SE process and you deliver, which is good.

So this is something I pulled out from the handbook, which is the differences between verification and validation. And I'm just summarizing this here. So one way to ask is, was the end product realized right? Meaning, did you build it right, did you implement it correctly? So verification is often done during development, so you verify components, subsystems, you check if the requirements are met. Typically, verification is done in a laboratory environment, or on a test stand, or some environment that allows you to very carefully control the test conditions. And verification tends to be component and subsystem centric.

OK, and then validation is the question was the right end product realized? Did you actually build the right thing? Did you deploy the right solution? This is often done-- so validation focuses more on during or after system integration. It's typically done in a real or simulated mission environment. You check if your stakeholder intent is met. And it's often done using the full-up system. It's difficult to do validation on a subsystem basis alone. Typically, validation implies you've got to use the whole system to do it. Or, you basically use dummy subsystems. You basically replace the actual subsystems you're going to have with something temporary so that you can go back to the stakeholder and give them as close to the real experience as they'll have with the actual system.

OK, so that's essentially the distinction here. So I want to do a quick concept question on this to see whether this point, this distinction, came across. So here's a link, SE9VV, these are all caps. And what I'm listing here is different test activities or different type of activities. And I'd like you to check the box here whether you think this is verification, whether this is validation, or you're not sure.

All right, testing and handling of a new car in snow conditions in Alaska. 90% of you said this is validation, and I would agree with this. So many car companies, I think all car companies-- once the vehicle, the design, has essentially been finished, it doesn't go to market right away. There's a very extensive-- usually it's at least six months of field testing of a new vehicle. And you go to the desert where it's very sandy and hot and test your air conditioning systems right at the limit. And then you go to really cold climates. In Europe they go up to Sweden and Norway. And here we tend to go up to Michigan, Minnesota. And so the idea is that you really utilize the vehicle in an extreme environment, but one that's realistic of actual operation. So I agree with this.

Frontal crash test in the lab. Most of you said it's verification, and I would agree with that. So those are very standardized crash tests that you've seen in some of the commercials. The vehicle is prepared, instrumented, you have the crash test dummies. And then, typically, there are at least three different kinds of crash tests. There's frontal, there's side impact-- like the T-bone crash-- and then there are rollover tests, which are more and more standardized. And the test conditions in these are highly stylized. Everything is prescribed very exactly-- the speeds, the angles. And in real accidents, the variability of conditions is much, much bigger than in these tests. So I agree-- because the test conditions are so tightly defined and constrained, this is verification, not validation.

Testing of a new toy in a kindergarten. OK, so here-- and this is the toy companies-- before, again, they make a big million- or billion-dollar decision to mass manufacture toys, they will actually have kids play with them in realistic environments. And so I agree that this is validation. Vehicle emission testing. Obviously, this was the big Volkswagen scandal that we had. So basically, the cheating that happened was essentially software embedded in the vehicle, such that when the vehicle experienced exactly the test conditions of these drive cycles, which are very well known, very well defined, the vehicle would internally switch or reconfigure to a verification mode and really emphasize low emissions at the expense of fuel economy. And as soon as the vehicle would detect that it was in more general driving conditions, it would essentially switch that mode off.

So it's verification, again-- it's done on a dynamometer in the lab. Satellite vibration testing on a shake table. We'll talk about this. This is often-- we refer to this as shake and bake in the spacecraft business. The spectra, the load spectra, that are put into these shake tables are, again, very stylized and different for each launch vehicle. So here we can debate a little bit whether it's closer to validation, because the actual test conditions are so much adapted to each launch vehicle. But I agree, this is primarily a verification activity.

And then the field testing of the Google glasses. So you basically produce an initial batch of your product and then you give it to like lead users. You have them try it and give you feedback. This is much closer to validation. So I think by and large your answers here are very good. And the real distinguishing factor is whether this activity happens in a lab, in a very controlled environment, under stylized conditions, or whether you're actually going out in the field, in a realistic mission environment with real users or real potential users that are not especially knowledgeable or not specially trained about the system. So, very good job. I think most of you really understand that distinction.

OK. So let me talk briefly about the-- yes, please, Veronica?

AUDIENCE: OK, are there any cases that really bridge the gap, that really could be seen as both? So I'm thinking about products in particular that are to be used in a lab setting where you have a very specific kind of user, where meeting the requirements is more about how the tool is employed, and I see the user in that sense as part of the system. So I'm wondering if, in a more clinical sense, there's an activity that is both validation and verification.

OLIVIER DE WECK: Yeah, and so you said clinical. So I think there are situations where the distinction is not as sharp, not as clear. You said clinical, so I think in hospital, you know for medical equipment, like if you think about surgical equipment and things like this, where it's very hard to really-- it's very hard to do verification in a stylized way. The only way to really check it is to have the equipment embedded and used in a pilot study, for example, in a hospital. And in that case, because the human is so involved, and it's not a general consumer product, but it's really a tool for specialists, the only way to really check your requirements is to actually embed it in a realistic environment to begin with.

So whenever it's very difficult to design, very specific, isolated tests where you can check for each of these requirements one by one, you almost have to move straight into validation. And I think in medical equipment that's often the case. Yeah, go ahead.

AUDIENCE: Would you say that for a spacecraft really true validation isn't possible? You have to recreate the conditions in some way, in some kind of a laboratory.

OLIVIER DE WECK: I think it depends on the novelty of the spacecraft. If you're launching something like a standard communications satellite where you've launched dozens before, and you know the actual pitfalls and the operating conditions, you've experienced failure modes in the past and eradicated most of them, I do think that you can do a lot in verification. But then I'll show you one example of a spacecraft we've actually talked about before, where there's going to be a lot of residual risk. And the first time it's deployed, people are going to sweat, because there's still a lot of unknowns to be resolved.

OK, so let's look at the product verification process in particular. This is from the-- so this is the product verification process from the NASA System Engineering Handbook. So what are the inputs? The end product to be verified. So you have to have the artifact that you're going to verify. The specified requirements baseline, you need this as a reference. What are you going to check against? The product verification plan which is essentially your test plan. What test cases are you going to run? How long are you going to run them? How many repetitions will you do?

And then product verification enabling products, which would be test software, test equipment-- things that are not part of the product itself, but are enabling the verification process. You then, essentially, go through this process, and what are the outputs? The verified end product; the product verification results, so these would be test protocols, things like this; the product verification report; and then any other work products that come out of it. So you could have, for example, discrepancy reports. You failed some tests. Well, that would be an important output. And then the question is, is this significant enough that you have to go redesign or retest? Or is it a minor issue that you can waive, essentially, to move to the next stage?

So let me just give you a quick example here from my own experience about this verified end product. One of the things we did on the Swiss F-18 program was not just buy airplanes and equipment, but also models of the plane itself-- in particular, finite element models, very detailed finite element models of the structure. And these models were very expensive. Like, some of these models were millions of dollars. And so I got a phone call-- I was a liaison engineer at the time in St. Louis. I got a phone call from Switzerland saying, this is crazy. How can we be charged millions of dollars for this particular set of models?

And I said, yeah, that seems pretty expensive, so I'm going to go negotiate this. And so I started negotiating, and I guess either I'm a bad negotiator or it was really, really clear why these were so expensive. The reason these models were so expensive was because they were on the right side, not on the left side. So every one of these models that we were purchasing had been verified using actual physical tests. So every location, under the load conditions, was guaranteed to produce a stress and strain prediction that was correct to within plus or minus 5%.

So the model had been very carefully calibrated and tuned against physical reality, as opposed to a finite element model that anybody can make by putting some load cases and boundary conditions on it, where you don't know how closely it matches reality. So there's a huge difference in value between a product, a model, that has gone through verification-- where at the end of it there's actually a report, there's a protocol, there are data that say all these features, all these requirements that you had against it have actually been checked; this is a certified product-- and one that hasn't. And that's the main reason for the price, because the actual process of verification is very, very resource intensive.

So even though when you look at it physically, you might say, I can't tell the difference between pre-verification and post-verification because physically it's the same. But in actuality, there's a huge difference because once it's been verified and certified against a set of requirements, it's a much more valuable asset. Does that make sense? So keep that in mind when you think about these products on the right side.
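To make that model verification example concrete, here is a minimal sketch in Python of what the acceptance check behind a verified finite element model could look like. The gauge locations, the numbers, and the plus or minus 5% band are illustrative assumptions, not the actual F-18 verification procedure.

```python
# Hypothetical acceptance check: compare finite element predictions against
# strain gauge measurements at a set of locations, and flag any location
# that falls outside the +/- 5% band mentioned in the lecture.

TOLERANCE = 0.05  # assumed relative acceptance band

# (location, predicted microstrain, measured microstrain) -- invented numbers
results = [
    ("wing_root_upper",    512.0,  498.0),
    ("wing_root_lower",   -480.0, -465.0),
    ("fuselage_frame_12",  231.0,  244.0),
]

def within_tolerance(predicted, measured, tol=TOLERANCE):
    """True if the prediction is within tol (relative) of the measurement."""
    return abs(predicted - measured) <= tol * abs(measured)

for location, predicted, measured in results:
    error = (predicted - measured) / measured
    status = "PASS" if within_tolerance(predicted, measured) else "FAIL -> discrepancy report"
    print(f"{location:20s} error = {error:+.1%}  {status}")
```

Any location that fails the band would feed a discrepancy report of the kind described above.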

Now what are the types of verification? So tests we'll talk about-- you're physically testing. But there are other ways to do it: through analysis, through demonstration, and through inspection. So analysis essentially means you're doing a mathematical calculation or a simulation that satisfies you that this requirement is met. And you're doing this with input parameters into the simulation that are as accurate as possible, based on the physical reality of the system you have. But for whatever reason-- either because you don't have the funds for a test, or you can't recreate the operating conditions well enough-- you have to do it through analysis.

Demonstration essentially means you're operating the system, you're demonstrating the functions that you need, but you don't necessarily have a lot of instrumentation on the system. And you certainly don't do destructive testing. In other words, a demonstration simply means you're operating the system as intended and demonstrating physically that it performs its purpose. Inspection essentially means you are physically inspecting the artifact, either visually, or-- there's also a lot of techniques called NDI, nondestructive inspection, with X-rays or eddy current sensors. You're checking for the lack of manufacturing flaws or [INAUDIBLE], whatever it is. But inspection essentially means you're not physically operating the system; you're inspecting the artifact to make sure that it satisfies a certain set of requirements.

And then testing typically means that you're putting a stimulus into the system. You're operating the system under some test conditions. You're recording data, which you then analyze in terms of comparing that to your prediction or expected behaviors. So these are analysis, demonstration, inspection, and tests. They are all different ways of verification. Yeah?

AUDIENCE: How do I know when I'm supposed to use more than one type at the same time? I mean, an "and" or an "or"?

OLIVIER DE WECK: Yeah, that's a good point. There's no real general rule for this, but in general, I would say the more crucial, the more critical a particular requirement is to the operation of the system, the more intense the verification will be-- whether that's just using one of these types, you know, you just run more tests or more different tests, or you're doing a combination of inspection and testing. There's no general rule in terms of two out of three, or two out of four, but the purpose of the V&V plan, the verification and validation plan, is-- and you did a little bit of this in A2. You did a little bit of thinking about how we would actually verify each requirement.

The purpose of a V&V plan is to say for each requirement which of these four methods are we going to use for verification, and then actually write down each test that you're going to perform, what kind of equipment you'll use, what kind of test conditions. It's a lot of work. In fact, I think it's fair to say that the people that do this kind of work, verification and validation, are typically different people than the people that do the writing of the requirements or that do the actual design work. This is a pretty specialized activity and the people are a little different.
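As a rough illustration of what such a plan captures, here is a hedged sketch of a per-requirement verification plan as a simple Python data structure. The requirement IDs, chosen methods, and activity names are invented for illustration; a real plan would also capture equipment, test conditions, and facilities in much more detail.

```python
from dataclasses import dataclass, field

# The four verification methods discussed above.
METHODS = {"analysis", "demonstration", "inspection", "test"}

@dataclass
class VerificationItem:
    requirement_id: str   # e.g. "REQ-010" -- hypothetical ID
    methods: set          # which of the four methods will be used
    activities: list = field(default_factory=list)  # named tests, analyses, conditions

# A toy two-entry plan; real plans cover every requirement in the baseline.
plan = [
    VerificationItem("REQ-010", {"test"},
                     ["Vibration test on shake table, launch vehicle spectrum"]),
    VerificationItem("REQ-023", {"analysis", "inspection"},
                     ["Thermal analysis of worst-case hot case",
                      "NDI inspection of bonded joints"]),
]

for item in plan:
    assert item.methods <= METHODS, f"Unknown method for {item.requirement_id}"
    print(item.requirement_id, sorted(item.methods), "->", len(item.activities), "activities")
```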

If you've met people who do testing or quality inspection, they're quite different. It's a different mindset. Go ahead.

AUDIENCE: From what's happening at ESA, actually, this ADIT will be imposed in the specification prior to your proposal. And it will give you minimum requirements to test against. And they will give you a rough matrix for every requirement line, whether it's [? ABIT, ?] and then you have to answer with a validation and test plan, usually, unless you are the agency and you are defining the specification and have to do it, and it's mostly based on experience. And it's some people that have really lots of knowledge that then make these specifications. But I think for all of you engineers here, in the next 10 years, you'll just be hoping there are not so many of these ADITs in the specs you get, because you will have to answer them as part of the specification.

OLIVIER DE WECK: Yeah, and what--

AUDIENCE: I'll just demonstrate it.

OLIVIER DE WECK: Well, and the point you're making, Voelker, is that this is a contractual requirement. This is not optional.

AUDIENCE: [INAUDIBLE]

OLIVIER DE WECK: Yeah.

AUDIENCE: It's not optional and it has to be followed, the pricing, right in front of [INAUDIBLE]

OLIVIER DE WECK: Yes, very good points. So the outputs of all of this are discrepancy reports, if there are any discrepancies, waivers, the verified product itself, and then the compliance documentation, which is essentially your test protocols, et cetera, et cetera. And as you can imagine, if you had 47 requirements with CanSat, and then by the end-- [? Uonna, ?] what would you say? The number of requirements that we ended up with at the end of A2 was closer to 100, right? Most people were around 80, 90. OK, now imagine-- the good thing is, if you can do tests that actually check multiple requirements at once, that's a good thing.

If you can do tests that help you verify multiple requirements through the same tests, you can save some money. But the whole testing strategy, the contractual requirements that Voelker was mentioning-- it's a big, big, big deal. It's a really critical part of system engineering. OK. So in terms of the lifecycle phases where this fits in, most of this activity happens during phase D.

So you remember this was the NASA lifecycle model, and so phase D is system assembly, integration, test, and launch. So much of the testing that we talked about happens during phase D. So the system has been fully designed, it's been assembled, it's been integrated the way we talked about last week, and now you're really putting this system through its paces. And so this phase D is intense, it's expensive. And if something goes wrong, it sends you back to the drawing board often. And you have to do-- you have to figure out whether a failed test, a failed verification, is a showstopper. If it is, then you have to redesign the system, you have to retest.

But if it's a minor thing, then you might be able to request a waiver and you can say, OK, we didn't achieve this requirement, or we failed this test, but we think it's a minor issue. And instead of holding up the program, we're going to get a waiver for it, which means you get an exemption essentially, and you can move on. And whether or not a waiver can or cannot be granted is a big deal and that goes under risk management, which we'll talk about in a few minutes. Yes?

AUDIENCE: I had a question on-- so if you're a system integrator, and you created statements of work for other people to procure a large optic, or something, they go and build that to [INAUDIBLE] requirements, and then they do all the verification and testing, and then they provide all that documentation. And then, from my experience with procurement, that comes back in house, and you go to assemble it, and you essentially do a lot of that verification and testing again to double-check that supplier. There is a little bit of a conflict of interest, obviously, with that supplier doing the work and also verifying their own work. Is there any good way to get around that? It seems like a very expensive process.

OLIVIER DE WECK: So my experience-- basically, you're talking about separation of powers. The good suppliers will have internal separation of powers. In other words, the people that are doing the testing and the QA, they're usually people who really enjoy finding mistakes and faults. And that's why-- I'm trying to be diplomatic when I say it, but even in software, people who do software verification, they love to find bugs. They love to find problems, because that's their job. And so for the good suppliers, it's in their own self-interest not to take shortcuts.

Now if you don't trust that, and you do all your same testing and QA again, that's a duplication of effort. The way I've seen that done effectively is that you, as a customer-- say you're going to buy the subsystem or the engine, for example, from a supplier. What you do is you send liaison people, you send representatives, knowledgeable people, to the supplier while the testing is being done. And so they're present when the testing is being done. They're very involved with it. And therefore, you don't have to do it twice. So there are ways around this.

OK, so what I did here is just search for the word test in the list of milestones. And where does it come up? So the first time it really comes up in a major fashion is at the CDR. OK, so let me just read this to you. So the CDR demonstrates that the maturity of the design is appropriate to support proceeding with full-scale fabrication, assembly, integration, and test. So in other words, even at the CDR, where you're blessing the final design, you should say something about how the testing will be done. In fact, test planning often happens way before the CDR. But at the CDR, you should really talk about the testing.

Then we have the so-called TRR, which is the test readiness review. And so for each major test, you would have a separate TRR. The TRR ensures that the test article, the hardware and software, the facilities, the support personnel, and the test procedures are ready for testing, data acquisition, reduction-- meaning data post-processing-- and control. And then at the system acceptance review, the SAR, that's when you essentially transfer the ownership of the asset. And it's at the SAR, the system acceptance review, that you're going to review not just the product and its documentation, but all the test data and the analyses that support verification.

So at CDR, you say, this is the testing we'll do. At the test readiness review itself, you say everything is ready for the tests to happen, and then you do them. And at the system acceptance review, you look backwards and you say, what tests actually happened? What's the documentation? What were the results? Are we ready to own the asset now? Does that make sense?

OK, so with that in mind, let's talk about testing itself-- what kinds of testing there are. So testing is one of the four methods of verification, and it's the one that we often spend the most money on. So this is from the handbook, section 5.3. This is basically an alphabetical list of the testing that we typically do. And I'll just highlight a few here, and then I have a group exercise for you. So aerodynamic testing; burn-in testing, which is often done with electronics-- you burn in your electronics, you get them running at the right conditions; drop testing; pressure testing, pressure limits; thermal testing; G-loading; human factors testing; manufacturing random defects-- that's when you do nondestructive inspection; thermal cycling; vibration testing; and so forth. So this is 20 or 30 types of testing. And then within each there are even subtypes. And there's a whole industry actually that is primarily focused on providing test equipment, sensors, data-logging equipment. It's a big industry, not just in aerospace, but throughout.

OK, so I'd like to do a little turn-to-your-partner exercise. And the question I want to ask you is, what kind of testing have you been involved with in the past? And if this was in product design, product development, that's fine. If it was for an internship, or even at the university itself, if you did some experimental work and experimental testing as part of research, that's fine too. You can talk about that too. So what kind of testing have you been involved with in the past? What was the purpose of the testing? What were the challenges? What went well? What were the results? Maybe if it didn't go well, talk about that too.

All right, good. So let's see, we're going to go back and forth. So who wants to start here at MIT? Who has a good story to tell? Go ahead.

AUDIENCE: So I worked on the ground station side of the Lunar Laser Communication Demonstration program that recently flew. So I was involved in assembly and integration, but also doing verification testing in the lab and at our field site, and then we did validation when we moved out to the field site in New Mexico. So I was involved in the whole process and it was neat to see. And we had to go into the clean room a few times to adjust the optics, because we saw that they weren't meeting requirements.

OLIVIER DE WECK: The reflector that was left by the astronauts, are you using the reflector that was left on the surface?

AUDIENCE: No, this was-- we were using just like [INAUDIBLE] like optical alignment stuff in the lab. And then when we were out at the field site, we were utilizing guidestars to align optics.

OLIVIER DE WECK: So what was-- what went well? Was there a big difference between indoor and outdoor? What surprised you in these tests?

AUDIENCE: Yeah, so once you can do the alignment of the individual telescopes which were 20 inches in diameter, you can do that well in the laboratory, but then every time you assemble and disassemble the system, you change the alignment of them relative to each other. So there was a lot of attention paid to making sure that we could replicate the alignment to a certain extent. So that was very difficult in a laboratory setting to get done, but once we did that, we were able to have fair confidence that when we were out in the field that we could match that.

OLIVIER DE WECK: Very good, very good. What about EPFL?

AUDIENCE: Well, I did an internship in an aluminum rolled-products factory. So basically, I was doing materials science there. And there was a whole bunch of tests to do. And well, all the tests were about heat treatments and different tempers.

And actually, the alloy that was already produced in the factory was not the best of what we can have of it. And it [? applied ?] to change the [INAUDIBLE] with the heat treatment for a few seconds, naturally, on the line of production. Adding this amount of time was totally critical because it was continuous.

And the rolled aluminum, if it spends a bit more time in the oven, it would melt. And that's a bit like for the Swiss plane, actually. Like I discussed with my boss, saying that we should maybe change the original power meter. But at the end, it was really critical to change something on the line because it could have cost a lot, like in the modification of the oven or the general machine.

PROFESSOR: So were those tests successful? Were these heat-treatment changes eventually implemented on the line? Or did the tests reveal that it would be too difficult?

AUDIENCE: Unfortunately, I don't know because I finished my internship before.

PROFESSOR: OK. Well, you should find out whether it worked out in the end. Very good. Back to MIT, any other examples people want to mention? Test experiences? Yes please, go ahead.

AUDIENCE: We bought the CASA-295. It's a small cargo aircraft. And we got involved in the development of its simulator.

It was pretty different with the simulator. Because the aircraft has no fly-by-wire, it is pretty light. So there are lots of hydraulics to do the motion.

And they brought the flight model from the factory. And we applied the flight model to the simulator. But it was not real enough. So we had to go flying, like 60 flight test points. And we had to go back to the simulator and apply these points to tailor the simulator to match reality.

PROFESSOR: I see. So the purpose of this testing-- because the plane itself had already been certified, it sounds like.

AUDIENCE: Yes.

PROFESSOR: It's Spanish, right? Spanish airplane?

AUDIENCE: Yes.

PROFESSOR: You tested it specifically to get flight dynamics and other data to then tune the simulator to be more reflective of reality.

AUDIENCE: Exactly. Because the flight model from the factory was not close to reality at all.

PROFESSOR: Very, very cool, very interesting. So different purpose, of not testing for certification of the first airplane, because it had already been certified, but to get the simulator to be matching more closely.

AUDIENCE: It was development for the simulator. Because it was sold afterwards as a Level D simulator. So it was the development of the simulator.

PROFESSOR: OK. Great, thank you for that example. This all sounds pretty good. Anybody involved in test failures? You know, things that didn't go well? Yes, [? Narik? ?]

AUDIENCE: Well, it was an interesting experience. We were designing a wind turbine that we were 3-D printing in undergrad. And we had certain requirements on the wind turbine. And we were supposed to test in the wind tunnel afterwards.

What happened was that the wind turbine matched our performance prediction fairly closely. But the generator and the electrical power system that the test apparatus consisted of weren't designed to handle the current that we were outputting. So we caused a small fire.

PROFESSOR: OK.

[LAUGHS]

So this was the test equipment itself?

AUDIENCE: The test equipment.

PROFESSOR: Not the artifact you were testing failed--

AUDIENCE: Yeah.

PROFESSOR: --but the test equipment around it, because overload.

AUDIENCE: The interesting point was that we had no control over the test equipment. It was managed by the university. So within the requirements that they gave us, the power output possible didn't match what they had.

PROFESSOR: OK, great. Great example. So the test equipment and the test artifact need to be matched to the test conditions. Excellent, good.

I do hope that those of you who have not had a lot of test experience get to experience it. It's a lot of work-- slow, tedious. But in many cases, despite modeling and simulation, there's still a big role to play for actual testing.

OK, so let's talk about aircraft testing. Typically we distinguish between ground testing and flight testing. Weights and balance, I had some experience with this on the F-18 program. You think this is the most trivial testing there could possibly be, you just put an airplane on a scale and that's it.

Well, it turns out it's actually more involved than you think. First of all, airplanes are very big. They're heavy, multi-tons. And typically it's not just one scale. You have several scales you put on the landing gear.

So the scales need to be properly calibrated. If you have differences in calibration of the different scales, you have an issue. You need to determine the mass. Not just the mass, but the CG.

And then the most difficult thing to experimentally determine, at least in a 1G field, is the inertia matrix, if you need to experimentally get the inertia matrix. Do you remember your Ixx, Ixy, Iyy? The inertia matrix is tricky because you typically then have to suspend the airplane.

And just the presence of the cables and the suspension will pollute the real inertia matrix. And you have to subtract out the effect of the suspension system. So something that seems super trivial, weights and balances-- you just stand on the scale in the morning, there it is-- is actually very tricky. And there are people, that's all they do. They do weights and balance testing for spacecraft, aircraft. And it's basically a science.
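As a toy illustration of that last point, here is a minimal numpy sketch of subtracting a suspension rig's known contribution from a measured inertia tensor. The numbers are invented, and a real correction would also account for the rig geometry and attachment points, which this ignores.

```python
import numpy as np

# Hypothetical measured inertia tensor of airplane plus suspension rig (kg*m^2),
# expressed about the same reference axes (the Ixx, Ixy, Iyy terms from the lecture).
I_measured = np.array([
    [ 23000.0,   -150.0,   -900.0],
    [  -150.0, 160000.0,    -60.0],
    [  -900.0,    -60.0, 178000.0],
])

# Inertia contribution of the cables and suspension fixture alone,
# characterized separately (also invented numbers).
I_rig = np.array([
    [   300.0,     -5.0,    -20.0],
    [    -5.0,    800.0,     -2.0],
    [   -20.0,     -2.0,    900.0],
])

# First-order correction: subtract out the effect of the suspension system.
I_aircraft = I_measured - I_rig
print("Estimated aircraft inertia tensor (kg*m^2):")
print(I_aircraft)
```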

Engine testing, I'll show you some pictures. This is done in what's called the Hush House. So Hush House is heavily insulated. You run an engine through all of its test conditions, its operational conditions. And then you integrate it into the airplane and you run it outdoors.

Fatigue testing, this has been a big issue on the Swiss F-18. But in general, making sure that the airplane can satisfy all the static and dynamic structural load conditions. Avionics checkout, this is very, very involved.

As we get more and more displays, mission-control computers, all of the avionic suite needs to be checked out. Essentially every function, every button, every menu item needs to be tested. And the tricky thing is interactions among different pieces of avionics. So you can't just test each box in isolation.

You also have to look at the interactions of different pieces of avionics, the flight control software, or the flight software that is loaded in each of these avionics boxes needs to be in the right configuration. It's a very big combinatorial challenge to do avionics checkout these days.

And then finally, pre-flight testing. So this is everything you can do on the ground, run the engines, taxi with the airplanes, basically turn all the equipment on, turn it off, do the cycling. You could do a lot of testing before you actually fly.

Flight testing itself falls into different categories. So flight performance testing, rate of climb, range, can you meet each point in your prescribed performance envelope? Stability and control, this is where test pilots typically earn their living putting airplanes into stall conditions, recovering from stalls, trimming.

Flutter testing is a big deal. So flutter is a phenomenon whereby at high speeds you have a coupling between the structural deformations of the airplane and the aerodynamic excitation of, for example, the wings. Flutter can be very dangerous. If you hit a resonance at high speed you can actually destroy the airplane because of an instability.

So flutter testing is also very, very interesting and very tricky. And then finally, this is primarily for military airplanes, weapons testing, both guns, missiles, bombs, live fire testing-- sometimes also using airplanes that are towed-- simulated targets, and then LO stands for a Low Observability.

So essentially all of the new generation of military airplanes have measures to reduce their radar signature, or even make them invisible or quasi-invisible to radar. And you know, a lot of this stuff is classified. But actually checking that an airplane is invisible on radar or has truly low observability, there's a lot of testing involved in that. And that's also quite expensive and very involved.

So let me show you just some pictures that I've collected over the years. This is a wind tunnel test model. This is a model that was developed as part of the F-18 program. This is about 1995, vintage.

This model, it's a subsonic wind tunnel model. And you can see in yellow, you have all these probes and radomes and things like this. So it's basically to check whether any modifications you make to the airplane will affect its performance in airflow.

This is for wind tunnel testing. This model, by the way, just building this model is about half a million dollars. It's very accurate. It's very precise. This is a picture-- yes? Go ahead?

AUDIENCE: Is that model full scale or half scale?

PROFESSOR: No it's, I want to say, like 1/8 scale, something like this. Yeah. OK, here's the Hush House that I was talking about.

This is in St. Louis. So you can see that the airplane is not painted yet. And only one engine at a time. So the engine is being run, in this case, with full afterburner. You can see the airplane itself is secured with these chocks here. And there are load cells in these chocks.

So as you fire up the engine, you can measure the thrust with the load cells that are attached to these chocks here. You also see that there are these cables running in and out of the airplane. So all the sensors, everything-- and the engine is put through its full range of operating profiles. And a lot of sensor data is recorded to make sure that the engine responds appropriately, that it has the right thrust for the right throttle setting, the fuel consumption, all the temperatures in the engine-- that everything is nominal, essentially.
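For the thrust measurement itself, a hedged sketch of the bookkeeping might look like the following. The load cell readings, angles, expected thrust, and tolerance are all invented for illustration; this is not the actual Hush House procedure.

```python
import math

# Hypothetical readings from load cells in the holdback fittings (kN), each
# with the angle (degrees) between the cell axis and the thrust line.
load_cells = [
    (42.0, 5.0),
    (41.5, 5.0),
    (43.1, 4.0),
]

# Net thrust estimate: sum of the axial components of each load cell force.
thrust = sum(force * math.cos(math.radians(angle)) for force, angle in load_cells)

expected = 125.0   # kN, assumed nominal thrust for this throttle setting
tolerance = 0.03   # assumed 3% acceptance band

print(f"Measured thrust: {thrust:.1f} kN, expected {expected:.1f} kN")
if abs(thrust - expected) <= tolerance * expected:
    print("Within acceptance band")
else:
    print("Out of tolerance -- investigate before the next test point")
```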

Live fire testing-- this is a Maverick missile being fired, an air-to-ground missile. As you can imagine, there are special ranges and test sites for doing this kind of work. So in the US, one of the most well known is China Lake, out in California.

You have to reserve months and sometimes years ahead. So if you want to do like a live fire test campaign, you have to reserve the range at least 18 months to 2 years ahead of time. Because a lot of other services, a lot of other programs are using the same facilities.

In Europe, it's a little harder. Definitely in Switzerland, because the country is so small and dense and highly populated, you can't test live missiles in Switzerland. You can do guns, air to ground. But in order to do missile testing, typically that's done here in the US, or in a more limited fashion, in Scandinavia, like in Sweden, in northern Sweden, there are some test ranges up there.

This is the most expensive kind of testing you can do. So a single test like the one shown here, a single test like this will probably cost several million dollars. Not just the airplane and the weapon itself, but all the test procedures, the protocols, airplanes that observe it from all kinds of angles. It's very, very involved. And because it's so expensive, you will typically only do it for something new that you haven't done before. Either a new weapon, or a new weapon integrated on a new platform and so forth.

And obviously, it's very interesting. But it's very involved. Yes?

AUDIENCE: Are they just testing accuracy, or that the two things work together? What are they looking for, really?

PROFESSOR: The first thing you look for is does the weapon fire? So do you have all the electronics? All the signals? The wire bundles? Did you get it right?

Is there an end-to-end functionality? That's number one. Number two, safety. Does the weapon separate properly from the aircraft? The worst thing that can happen to you is if you release the weapon and it collides with the airplane.

And so you can see, the various angles and release conditions are very tightly prescribed. And so there's a separation, has to be proper. And then the third, of course, is accuracy.

So within each of these tests, there are multiple sub-objectives that you would test for. But safety always comes first. OK, any questions? This was a little bit military-aviation heavy.

If you're testing, whether it's a CASA airplane or a new Airbus or Boeing commercial airplane, many, many months of testing. They actually fly the routes. You'll fly New York to Singapore, to London.

You would actually fly the real routes. You would record fuel consumption, a lot of parameters. Some of this testing is not very exciting. It's many, many, many, many hours in the air.

But the key is that you have a lot of instruments and sensors during these tests that you may not have during regular flight operations, to really make sure there are no surprises-- that the airplane flies at least as well as the requirements that you promised your customers. And even then, when you think about what happened to the Dreamliner, the 787 had a lot of battery problems, because they used a lot of lithium-ion batteries. There were overheating issues.

Some of these problems didn't show up in testing. They only showed up in early operations, once you had a fleet going. So it's not a guarantee because you're doing a lot of testing that you're going to catch all the problems. But you want to catch as many as you can. Yes?

AUDIENCE: So my question was about the risk posture for larger airliners, for Boeing, the Airbuses. So for military aircraft, there is an escape method for the pilot. But for these larger aircraft, how much analysis do they do before they decide to go ahead and put a person inside? Do they fly by wire beforehand? Is that possible for such large aircraft?

PROFESSOR: So that's where the ground testing, pre-flight testing becomes very important. So you basically taxi for many, many hours. All the flight control surfaces, all the engine, you have the Hush House testing. So you essentially try to do as much as you can on the ground before you do the maiden flight.

All right, let's move to spacecraft. And it's kind of a similar thing. You can distinguish the ground testing versus on-orbit testing.

So the ground testing is really not that different, weights and balance. The biggest thing is if your satellite is heavier than the launch capability of the launcher, you have a real problem. So the mass constraint is even tighter in spacecraft.

Then, a lot of testing on antenna and communications. This is typically done in anechoic chambers in the near field, and then later in the far field. Vibration testing, that's the shake part. Thermal and vacuum-chamber testing, that's the bake part.

And then you also have pre-launch testing, so off-pad and on-pad. Off-pad testing is the satellite or the spacecraft has already been shipped to the launch site, and it's hooked up. Like a patient in the hospital, it's hooked up to a lot of cables and power and cooling and so forth.

And then when it's on the pad, it's pretty limited. So on the pad means the satellite or the spacecraft is already integrated into the launch vehicle. It's on the launchpad. And then the amount of testing you can do is very limited.

So that's what we mean by off-pad versus on-pad-- has the spacecraft already been integrated on the launcher or not? Once you launch to orbit, you've got your eight minutes of terror. And hopefully the launch goes well and the spacecraft is released into its initial target orbit.

And then you do a lot of other tests, like thruster testing. Can you do station keeping? Can you turn the thrusters on and off? You deploy all of your mechanisms, your antennas, your scientific instruments, and then you check out your communications and instruments.

And we'll talk more next week, but this typically is called commissioning. You're commissioning a spacecraft before you actually turn it over to the users. And that commissioning phase could be anywhere from a few days to several weeks, or even a couple months.

And again, you don't want to randomly put commands into your spacecraft. These test sequences, deployment sequences, are very, very carefully worked out. Every command, the order in which you send the commands, has been worked out ahead of time. They've been simulated. And all you want to see here is confirmation that the spacecraft behaves as planned.
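As a small illustration of that discipline, here is a hypothetical sketch of a ground-side guard that refuses to uplink any deployment sequence that deviates from the one that was rehearsed and simulated. The command names are made up.

```python
# Hypothetical ground-side guard: only uplink a deployment sequence if it
# exactly matches the sequence that was rehearsed and simulated beforehand.
APPROVED_SEQUENCE = [
    "DEPLOY_SOLAR_ARRAYS",   # power first, since you start out on battery
    "ENABLE_COMM",
    "DEPLOY_ANTENNA",
    "INSTRUMENT_CHECKOUT",
]

def safe_to_uplink(proposed_sequence):
    """Refuse any sequence that deviates from the simulated, approved order."""
    return proposed_sequence == APPROVED_SEQUENCE

print(safe_to_uplink(list(APPROVED_SEQUENCE)))                 # True
print(safe_to_uplink(["ENABLE_COMM", "DEPLOY_SOLAR_ARRAYS"]))  # False: out of order
```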

Some pictures, so this is what typical spacecraft integration testing looks like. So this is in a cleanroom environment. You have people in bunny suits. And the idea is to not damage the spacecraft while you're doing the testing.

This is a picture of the Clementine spacecraft. This is radio frequency anechoic chamber testing. So you see these funny cones here. These are essentially foam cones. And the idea is to prevent multipath, to prevent echoes in the test chamber, so that you can test all of the antennas, EMI-- electromagnetic interference and compatibility-- and charging and discharging of the spacecraft.

This is one of the failure modes-- you have high electrostatic charges that build up on a spacecraft and create a large voltage potential across the spacecraft. Some spacecraft have failed because of that. So all of these things you want to test in a very controlled environment.

James Webb Space Telescope, I'll send you a link. I'll send you a link through email. There's a simulation of this on-orbit deployment. It truly is amazing. This spacecraft will be launched in a box, essentially. And the deployment sequence is very carefully choreographed.

First, typically, you deploy your solar panels because you need power. Because you're only running on battery initially. So if your battery runs out before you've had a chance to deploy your solar panels and get fresh power into it, you're in big trouble.

So typically, solar panels first. Then communications. And then you start deploying the other subsystems. So for James Webb, also very tricky, is this-- this is called the Sunshield. It's essentially thin layers of insulation. And the geometry is very important. The primary mirror, the secondary mirror, all these things need to be deployed with very, very high precision.

And it's even to the point where this particular spacecraft is so lightweight that it cannot support its own weight in a 1G gravity field. So there is no way to test, end-to-end, the full deployment sequence on Earth. The first time it will happen is in orbit.

Now, they've tested subsequences or scaled models. For example, the Sunshield has actually been deployed at a smaller scale in a 1G field, but never the full thing. So this will be kind of scary, after an $8 billion investment. So let's hope for the best, 2018.

So testing is good, testing, testing, testing. But testing also has its caveats. So caveat means limitations, essentially. So testing is critical, but it's very expensive.

Think about test rigs, test chambers, sensors, DAQ-- data acquisition equipment. All this stuff is very expensive. And if you can reuse things between different programs, that helps. But still, how much testing should you do of components?

So one of the comments, who mentioned the vendor, the supplier? One of you. You talked about it. And this is a key question. Do you trust the parts that come from your vendors? Or do you retest everything yourself?

Calibration of sensors and equipment-- if you've done some testing and you forgot to calibrate your displacement sensors, your thrust sensors, or they're out of calibration, that's a big problem. So before you start your tests, make sure that all your sensors are properly calibrated, or you can get the wrong conclusions.
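To show what staying on top of calibration can look like in practice, here is a small illustrative sketch that applies a linear calibration to raw sensor counts and refuses to use a sensor whose calibration has expired. The coefficients and dates are made up.

```python
from datetime import date

# Hypothetical calibration records: raw counts are converted to engineering
# units via value = gain * counts + offset, valid until the listed date.
calibrations = {
    "disp_sensor_1": {"gain": 0.0125, "offset": -0.40, "valid_until": date(2026, 3, 1)},
    "thrust_cell_A": {"gain": 0.0310, "offset":  1.20, "valid_until": date(2025, 1, 15)},
}

def to_engineering_units(sensor, counts, today=date(2025, 6, 1)):
    """Apply the sensor's calibration, refusing sensors that are out of date."""
    cal = calibrations[sensor]
    if today > cal["valid_until"]:
        raise ValueError(f"{sensor}: calibration expired, recalibrate before testing")
    return cal["gain"] * counts + cal["offset"]

print(to_engineering_units("disp_sensor_1", 800))   # OK: 9.6 engineering units
# to_engineering_units("thrust_cell_A", 800)        # would raise: expired calibration
```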

This is a mantra that's well-known. Test as you fly, fly as you test. Fundamentally, this means that the configuration of your item-- a spacecraft, aircraft, medical device-- the one that you test should be the same configuration as what you're actually going to fly.

And often, failures occur when the test went well, but then somebody tinkered with it and modified it before it actually flew. And that change actually caused a big problem. So make sure that your test conditions reflect the actual operations as closely as possible.

Simulated tests, what do we mean by this? So simulated tests use dummy components. Maybe your full spacecraft or aircraft isn't ready yet, you don't have all the pieces. So you can still start testing, but you have to replace the missing pieces with dummy components. At least, they should reflect the right mass distribution. But maybe you can do more.

Simulated operations, so the 0G versus 1G, is it representative? And then, what's often true is that you pass all your tests and then you still have failures in practice. And the failures often happen outside of the test scenarios that you had tested. So you have to be ready for that. But try to avoid that.

So here's something from Appendix E. This is called a validation requirements matrix. Essentially, this is an organized way to lay out your V&V activities, in terms of: what's the activity?

What's the objective? Which facility or lab will you do it in? What phase? Who's in charge? And what are the expected results?

It's pretty straightforward. It's just a table to organize these activities. And then Appendix I is your more formal V&V plan. This is a suggested outline for it.
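As a rough sketch of that Appendix E matrix in code form, the columns below follow the ones just listed in the lecture; the single row of content is a hypothetical example, not handbook content.

```python
import csv, io

# Column names paraphrased from the validation requirements matrix described
# above; the row is invented for illustration.
columns = ["activity", "objective", "facility", "phase", "owner", "expected_results"]
rows = [
    ["Field trial with lead users",
     "Confirm the CONOPS is achievable in a realistic environment",
     "Customer site", "Phase D", "Validation lead",
     "Stakeholder sign-off, issue list"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
print(buffer.getvalue())
```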

And I'll just say this. The degree to which you take verification and validation seriously and the resources you make available for it are critical for success. So how many dedicated QA personnel?

What is the interaction in working with suppliers? Are you planning ahead for these tests? How close are you getting to actual end-to-end functional testing?

Can you piggyback on existing facilities and equipment? How well do you document all the outcomes and follow up with discrepancies? And my last comment here is this work is often not glamorous, except for some of the very cool flight testing that I showed you.

Most of this work is really hard work. It's very detail-oriented. It's not glamorous. But it's essential. If you cut corners, you often pay the price for it.

So any comments or questions? We'll take a short break, like a five-minute break. But any questions about testing, verification, validation? Yes, go ahead.

AUDIENCE: Not really a question, but I wanted to say that also, flight testing is not so fancy. I mean, many people think it is. Well, for those who actually fly, maybe it's fun. But there are not that many. And you need to prepare the flights weeks and weeks ahead. And it's actually very, very boring, because you need to make sure that you don't waste any time at all. Well, at least you can waste a little bit more time in the lab. So it's very stressful and not so fun.

PROFESSOR: Do you have experience with this?

AUDIENCE: I wasn't flying, because you need to be certified. But I was preparing. And I think it's even worse than testing in the lab.

PROFESSOR: Yeah, yeah. Which airplane, or which system were you involved with?

AUDIENCE: I was with the power plant system for [? Airbus, ?] particularly. And the aircraft was an A330.

PROFESSOR: A330, OK. Great. But I think it's healthy to have this experience. It really makes you humble.

And you also see, for the things in design, did you design well? Did you design for testability? Really, I highly recommend for every one of you to try to get on some kind of test campaign, at least once in your career, because it's eye opening. So thank you for that comment.

Any comments at EPFL? Any of the students? [? Voelker? ?] Did you want to add something? Katya? Go ahead.

AUDIENCE: Yeah, I guess the comment is, it seems like sometimes you can meet all of the requirements in the verification process. But when you get to the validation part, for example, maybe the customer has some expectation that the range for the time of flight would have been on the maximum edge and you were on the minimum edge. But actually, it seems like sometimes you can meet all the requirements but they're still not going to be happy.

How do you find that middle ground? Or is it going to be a constant thing, continuing to iterate this over and over again? Do you try to involve them earlier on, during the testing process too? How do you handle that difference?

PROFESSOR: Yeah, it's tricky, you know? So that's when you do need contractual agreements in place. You need to have the requirements baseline. You need to have the contractual agreements. And hopefully, any problems that occur will not lead to some kind of legal dispute. But sometimes that's unavoidable.

But as a designer, as a manufacturer, unless you have agreements in place and clear baseline, how do you decide in the end, is it successful or is it not successful? And if there's problems, try to isolate these problems and say, OK here, by and large, the testing went well. But we have like, three, four, five issues that need to be addressed. And you can tackle these issues one by one.

But if you don't have a contract in place that's really a good contract, if you don't have a clear requirements baseline, and then if you don't have a good relationship with your customer, you're setting yourself up for big, big problems. [? Voelker? ?] Go ahead.

AUDIENCE: Yeah. There is also one big difference between commercial operators or customers, which are becoming more and more frequent, and the institutional ones. And I remember, for some of the [? Global ?] [? Store ?] [? Iridium ?] series, the customer was only accepting the hardware six months after it had been commissioned in orbit.

So there you have a validation that is still your responsibility in orbit. You can't go and fix it, but it still has to work. And the customer was retaining up to 10% of the full contract value, even down to the N-minus-2-tier-level suppliers, until he was satisfied it was working in orbit. So with these considerations, it's really the proof of the pudding when you have to test this up there and can't fix it. There's no recall of a satellite constellation-- like, sorry, we messed up with the software. It's not possible there.

PROFESSOR: Yeah. And of course, those terms and conditions you've probably negotiated years before. So you've got to be careful. That's where risk management, which is actually-- thank you, [? Voelker, ?] for that comment. Let's take a short break. And then we'll talk about risk management, which is really what this ends up being.

So let me talk about risk management. And this is actually quite prominent in the System Engineering Handbook. This is right in the middle here of your System Engineering Engine, Technical Risk Management, Section 13.

Why is it important? So first of all, what is risk? Risk is the probability that a program or project will experience some undesired event, combined with the consequence, impact, or severity if that undesired event should occur.

And so think of risk as the product of probability times impact. And the undesired events could come from a number of things, technical and programmatic: cost overruns, schedule slippage, safety mishaps, health problems, malicious activities-- cybersecurity is a big thing these days-- environmental impact, failure to achieve the scientific or technological objectives or success criteria.

And so technical risk management is, therefore, an organized, systematic, risk-informed activity centered around decision making to proactively identify, analyze, plan, track, control, and communicate risks to increase the likelihood of success of a program. And so what risk really does is measure the future uncertainties in achieving your program goals-- technical, cost, schedule-- and you think of risks in a holistic way: all aspects of the technical effort, technology maturity, supplier capabilities, performing against plan, and so forth.

And so the idea is that risks have some root cause. There's something that gives rise to them. And then the actual quantification of risks happens in terms of likelihood and consequences, which are kept as separate dimensions.
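Just to make that bookkeeping concrete, here is a minimal sketch in Python-- with hypothetical field names, not any official format-- of a risk-register entry that keeps likelihood and consequence as separate dimensions and only multiplies them for a first-cut ranking:

    from dataclasses import dataclass

    @dataclass
    class Risk:
        """One entry in a hypothetical risk register."""
        title: str
        likelihood: int   # 1 (improbable) to 5 (near certainty)
        consequence: int  # 1 (negligible) to 5 (catastrophic)
        root_cause: str = ""

        def score(self) -> int:
            # Product used only for a rough ranking; the two
            # dimensions are still reported separately.
            return self.likelihood * self.consequence

    register = [
        Risk("Airbag deployment fails", likelihood=2, consequence=5),
        Risk("Flight software defect", likelihood=3, consequence=3),
    ]
    for r in sorted(register, key=lambda r: r.score(), reverse=True):
        print(f"{r.title}: L={r.likelihood}, C={r.consequence}, score={r.score()}")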

So the first thing to think about is where do risks come from? Where is the source of risks? And I want to show you a couple of models for thinking about this.

The first one is this idea of layers of risk, that there are layers of risk. And I want to credit one of my colleagues here at MIT, Don Lessard from the Sloan School, who really developed this Layer of Risk Model and applied it to different industries. So there's a version of this for the oil and gas industry. You could make a version for medical, medical technologies. So this is the version for Mars missions, so if you're designing a new Mars mission, a new Mars Rover.

So you have, in the bullseye here, the narrow interpretation is technical or project risk. So the airbag technology. If you're using airbags for deployment, will it work? The rover/motor performance, are you going to have software bugs? Those are the risks we typically think about.

And the idea is you have high influence over these risks as a system engineer, as a project manager. Then you have a layer around it, which we call industry or competitive risks. Will your contractors perform? Will you have budget stability?

And then there's sort of more country and fiscal risk. So in the US, we have a budget cycle. We have four-year administrations. Will you get your budget? What are the priorities between human and robotic space exploration? And then working with international partners.

And then there's another layer of risk, which are called market risks. So if you think in Mars missions, who's your market? Well, the science community and maybe the public. So will these missions hold their attention?

Are there new science requirements? We discovered there's water, probably flowing water on Mars, maybe with a lot of perchlorates in it. It's not pristine water. But that could change the priorities for your mission.

And then finally, the outermost layer is what we call natural risk. So this would be things like cosmic radiation, micrometeorites, uncertainties in the atmospheric density of Mars as you're doing entry, descent, and landing. And you have very low influence over these.

That doesn't mean you can't protect yourself or take measures to deal with these risks. But fundamentally, the occurrence or the probability is something you can't really do much about. So that's one way to think about risks-- the seeds of risk lie in these layers. I know this is very high level, but I find this to be a pretty useful model. Yeah, go ahead.

AUDIENCE: So this references the influence you have, not necessarily the amount that each of these are a risk to the program?

PROFESSOR: That's correct. And that will be program-specific. Just the stuff that's in the bullseye here, you can do a lot about, perhaps both in terms of probability and impact. And then as you move further out, there's less and less influence you have as a system engineer, as a project manager.

Here's another way to organize your thinking around risks. And this is around the "Iron" Triangle in project management. We talk about the "Iron" Triangle of cost, schedule, and risk. And we call it "Iron" because the idea is that if you constrain all three too tightly, it can be very difficult.

And it's also referred to as the triple constraint in project management. So the three dimensions here are technical risks, cost risks, and schedule risk. And in the center we have programmatic risk, which means it's kind of the combination of all three.

And the idea is that even if you do a great job keeping your budget under control, holding your schedule, and meeting your technical objectives, you can still fail because the program as a whole isn't doing the right thing. Or the market that you had been targeting is no longer really attractive by the time you launch. The key idea here is that these risk categories are not independent of each other.

So let me mention a couple of examples. So cost risk might limit your funds. And that could, in itself, induce technical problems which cause you further cost risk.

So one of the big initiatives at NASA in the '90s was the faster, better, cheaper program. We're going to launch more missions, cheaper. And out of 10 missions, maybe 2 or 3 will fail.

And then seven will succeed. But we'll get more value out of this as a portfolio. Unfortunately, it didn't work very well. Because when the one, two, or three missions fail out of your portfolio, the media and the public focus on the failures rather than the aggregate value of the whole portfolio. And eventually, that's probably the main reason why faster, better, cheaper was abandoned.

So for example, we just talked about testing. If you have a very limited budget, what is the first thing people typically cut out? What's the first thing to go? Testing. Testing is very important.

Sam asked me during the break, what's the typical budget for testing and V&V activities? And in many programs, it's very substantial-- easily 30% to 40% of the budget. And so you start cutting out tests.

Well, what you do is introduce technical risk. And if you have failures because you didn't test, that could cause you additional rework and more cost. Similarly, schedule slips can induce cost risk.

So as you slow down, you have what's known as the standing army cost, right? People are going to charge to your program, even if it's at a reduced level. And that will also increase your cost. So lots of coupling here between risk categories.

This is a very useful Risk Management Framework. It's essentially a controls framework. And the idea is you start in the upper right.

You anticipate what can go wrong in your program. So that's risk identification. You then analyze these risks, in terms of prioritizing them, which of these are important?

You plan to take action. This is often called risk mitigation. You track these actions. And then you correct any deviations from your plan and you communicate throughout, and you cycle through this. So typically, risk management will happen on a weekly basis, a monthly basis, at least quarterly basis for big programs.

Now, how do you actually do this? First of all, the risk ID and the assessment. So the risks are typically brainstormed.

So you think about risks. You have to imagine all the bad stuff that could happen to you and your program, the probability that these things will happen, and then the impact or consequence if they do happen based on the requirements, the cost, the schedule, the product and its environment.

And this is where actually having a mix of younger engineers and more experienced engineers really comes in handy. The experienced engineers, they will have been through several programs. They will have seen failures in the past.

They will really be able to point to potential risks that less-experienced people may ignore or just not understand how important they could be. So the next step, then, is to aggregate these into categories, typically not more than 20-or-so categories or risk items. Projects often keep so-called risk registers, which is just a database or a list of risks.

If you have hundreds and hundreds of risks in the risk register, it's too much. It's just a long list, and really it's just a check-the-box exercise. To really take risk management seriously, you have to focus on the few risks that you think are important.

You score them based on a combination of opinions and data. And you try to involve all the stakeholders in the risk management. And eventually, risks are placed on this matrix of uncertainty and consequence. So let me zoom in on the matrix.

There are many, many different versions of the risk matrix. This is one that NASA typically uses. And I like this particular version for several reasons that I'll explain. But basically, the way it works is you have the two dimensions, impact and probability.

So probability is how likely is this to occur? And then impact, if it does occur, what will happen? What's the consequence of that?

There are two things that I really like about it. The first one is that for each of these levels, there's actually some definition behind it. You're not just guessing at the level; there are criteria. So for probability, a level 3 means it's about equally likely to happen or not-- about 50/50 for your program. And then 4 is very likely.

So maybe that's, I don't know, 75%. And then near-certainty is like 90% or more. Improbable is like 10%. Unlikely is 20% to 30%, something like this.

And then, more importantly, the impact. So a level 1 impact is negligible. It has almost no impact. A level 2 means your mission performance margins are reduced on the technical side.

Do you remember margins? I asked you to assign margins in assignment A-2. So it means you're eating into your reserves. But you should still be able to meet all your performance. There should be no visible impact. Your safety cushion is just smaller. That's what level 2 means.

Number 3 means your mission is degraded. So you can still do the mission. But you're not going to hit all your targets.

4 is you lose the mission, but the asset is still recoverable. So maybe you could try again in the future. And then level 5 is a catastrophic failure that involves a loss of mission and/or loss of crew.

On the cost side, you have some thresholds for cost. Obviously these numbers have to be adjusted for different programs. A $10-million loss is a huge thing in some programs and almost pocket change in others.

And then schedule. So a level 1 milestone would, for example, be launch. A good example of this was the Mars Science Laboratory, the Curiosity mission. Originally it was supposed to launch in 2009.

They missed that deadline, mainly due to problems with the cryogenic actuators. The actuators took a lot of the blame. But there were problems across the board.

They missed that launch window. And they had to launch in 2011. So that was considered, from a programmatic standpoint, a level 5 failure. Because you missed your main launch window and you had to wait 26 months for the next one.

So that's good. Because now when you assign a probability and impact, you can really look at these criteria. And it's easier to do that in a repeatable fashion. The other thing is if you look at the colors on the matrix, you can see it goes from 1, blue, which means low risk, to 12, which is the highest risk.

So there's 12 risk levels. But when you look at the matrix, there's something peculiar about it. So look closely at the colors. And you'll see something special about this matrix. Anybody notice what I'm talking about?

Let's see at EPFL, do you guys, when you look at those colors, at the matrix, do you notice something? Go ahead.

AUDIENCE: It goes from blue to red, which is like a light spectrum, with blue the lowest and red the highest ones.

PROFESSOR: Right.

AUDIENCE: [INAUDIBLE].

PROFESSOR: Go ahead.

AUDIENCE: It's because impact is more serious than probability.

PROFESSOR: Right. So it's asymmetric. You see that? It's asymmetric. So the high-impact, low-probability corner is weighted more heavily than the low-impact, high-probability.

And that's intentional, because it's been shown that things that are not likely to happen but are really bad if they do happen tend to get pushed away and ignored. So the purpose of the asymmetry in this matrix is to elevate the low-probability, high-impact events to a higher risk level so that people pay more attention to them. OK? Most risk matrices don't have that asymmetry in them, but this one does.
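Just to illustrate that asymmetry-- the numbers below are made up for the example, not the values from the actual NASA chart-- you can think of the matrix as a lookup table in which the consequence axis is weighted more heavily than the likelihood axis:

    # Rows are consequence 1-5, columns are likelihood 1-5.
    # Illustrative 1-12 risk levels: note that a low-likelihood,
    # high-consequence risk (L=1, C=5) scores higher than a
    # high-likelihood, low-consequence one (L=5, C=1).
    RISK_LEVEL = [
        [1, 1, 2, 3, 4],     # consequence 1: negligible
        [2, 3, 4, 5, 6],     # consequence 2: margins reduced
        [3, 5, 6, 7, 9],     # consequence 3: mission degraded
        [5, 7, 8, 10, 11],   # consequence 4: mission lost, asset recoverable
        [6, 9, 10, 11, 12],  # consequence 5: loss of mission and/or crew
    ]

    def risk_level(likelihood: int, consequence: int) -> int:
        """Look up the 1-12 risk level for a scored risk item."""
        return RISK_LEVEL[consequence - 1][likelihood - 1]

    assert risk_level(likelihood=1, consequence=5) > risk_level(likelihood=5, consequence=1)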

[SNEEZES]

And I like it because of that. Bless you. So the question then is, what do you do with this?

The idea is you do your risk management. You identify your risks. You place them on this matrix. And then you track each of these risk items over time.

So here's your 12 risk levels, between 1 and 12. That's the y-axis. And then on your x-axis there's Time.

And for each of your risk items, people might disagree. Some people on your team might say, hey, look, this is not a big deal. We've seen this before.

The impact is not big. We have a quick fix for this. We know how to deal with this. And other people disagree and say no, this is very, very serious. You have to take it seriously.

So the idea is that for each risk item, you have this optimistic, expected, and pessimistic estimate of what is the true level of risk. That's what these bars are. And then you track it over time.

And depending on whether you're before PDR or CDR, you could have very substantial risks. And that's, I guess, OK still, as long as you find ways to reduce the level of risk-- for example, by doing extra testing, or by changing your design to put in extra power margins, bigger solar panels, redundancy, extra radiation shielding. There are a lot of things you can do to affect both the probability and the impact.

And so the idea is that over time you're going to reduce these risks gradually below some threshold. So this red line here is the acceptable threshold of risk at that point in the program. And as you get closer to launch, things should be below the threshold. And if a risk is below this watch domain, then you don't even track it. You don't pay much attention to it.

If it's above this red line, you have a big problem. You might have to stop the program or do a major redesign, or repeat a major milestone. And some programs have been canceled because they just couldn't get these risks under control. So the idea is that gradually you transition. And you do this by actually doing risk mitigation around that risk management cycle.
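A rough sketch of what that tracking might look like-- hypothetical milestones and numbers, using the 1-12 levels from the sketch above and a threshold that tightens at each review:

    # For each review: (optimistic, expected, pessimistic) estimate of the
    # risk level for one risk item, plus the acceptable threshold then.
    history = {
        "SRR": ((6, 9, 11), 10),
        "PDR": ((5, 8, 10), 8),
        "CDR": ((3, 5, 6), 6),
        "FRR": ((1, 2, 3), 3),
    }

    for milestone, ((opt, exp, pes), threshold) in history.items():
        # Conservative choice: judge against the pessimistic estimate.
        status = "below threshold" if pes <= threshold else "mitigation needed"
        print(f"{milestone}: expected {exp} (range {opt}-{pes}), "
              f"threshold {threshold} -> {status}")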

Now, the last thing I will say here is that every mission that is worth doing is still going to have some residual risk at the end. The requirement is not that all the risks are at 0 in the lower-left corner. You will launch, and you're going to have residual risks. And you just have to accept those. But you have to be cognizant of this. And this is really no different in the automotive industry, for example.

When you're developing a new car or a medical device and you're going to launch it to the market, if your requirement is 0 risk, you will never sell anything. You will never launch anything. Because you will always think of something bad that could happen. And there will always be people saying it's too risky. We can't do it.

So knowing how much residual risk you should be willing to carry is a big part of being a leader, being a system engineer, really understanding things. And in the automotive industry, there are people whose primary job is to do this work.

And they're called Quality Engineers or Warranty Engineers. So the Warranty Engineers' job is twofold. Before you launch a vehicle to market, it's ranking, for this particular vehicle or program, the top 10 things that could cause warranty claims and problems in the future. We don't know that they will, but they might. Right?

And then once the vehicle goes to market and reports are coming back from users and from the fleet, it's actually tracking what these issues are and knowing when they have hit a threshold where you do need to do a recall or a retrofit. This is a big part of it. And it's really a big deal. I mean, the amount of money that automotive companies spend on recalls every year is about the same as their profits.

So if you could eliminate recalls and warranty claims altogether, you'd basically double your profit. And so depending on what industry you're in-- whether it's automotive, medical, spacecraft-- and how much risk and safety is involved, this is more or less emphasized. But I think it's a big part of the system engineering job to understand this.

This is, again, a flow diagram for how to do this risk management properly. You have a Risk Management Plan, your technical risk issues that are placed on the matrix, any measurements or data you have. And then how do you report this?

And then out of it comes a mitigation plan, a set of actions, technical risk reports, and then any work product from the technical risk management. And the idea is that you repeat this process on a regular basis. And it's a big part of your milestone reviews as well.

OK, so I'd like to spend a few minutes on system safety. I am not going to do this justice because there's a whole class here at MIT on this, taught by Professor Nancy Leveson. By the way, who's taken that class? Or who's been thinking about taking it? About three, four of you.

So I'm just going to give you a very quick exposure to this. So this is a book that Professor Leveson wrote several years ago. And she basically distinguishes two kinds of failures. Component failures, which most people think about-- an axle broke, or a battery caught fire. And clearly, component failures are real and they happen, single or multiple component failures. And usually there's some randomness to them.

So most of the classic accident investigation techniques and safety techniques focus on component failures. But there's also component interaction accidents or failures which are trickier, in a sense, because you could have a system that has no single component that's failed. And yet, you had a system failure.

And so it's the interactions among components. And this could be related to interactive complexity and coupling, more and more computers and software, and then the role of humans in systems. And this is really what a lot of this is about.

So the traditional safety thinking is that you need to worry about component failures only. So here's a classic example of a sequence of events. This is for a tank failure.

So in this case, we have a tank. And there's moisture that builds up in the tank. And then corrosion, essentially, as a result.

The metal gets weakened by the corrosion. And under the operating pressure, the weakened tank ruptures.

And the tank rupture then causes, essentially, an explosion. And fragments or shrapnel from the tank will be projected and then cause equipment damage or personnel injury. And so in this linear chain-of-events model, the way you think about safety is putting in barriers.

This is often also referred to as the Swiss cheese model. If you take layers of Swiss cheese-- and I guess it has to be Emmentaler, right? You guys know Emmentaler at EPFL? Emmentaler is the one with the big holes.

AUDIENCE: Yes.

PROFESSOR: So you take these slices of cheese. And if you look at the cheese and there's actually a hole right through, then the accident can happen. But if you put another barrier in between, you can't see through the cheese, and the accident is prevented.

That's the classical thinking around system safety: a chain of events, and then you put barriers between the events. And this is, I think, valid for a very particular kind of accident-- these component failure accidents.
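The chain-of-events picture also lends itself to a very simple calculation: if you treat each barrier as failing independently with some probability, the accident only gets through when all the holes line up, so every added barrier multiplies the overall probability down. A rough sketch with made-up numbers-- and note that it assumes independent, component-type failures, which is exactly the assumption the next model questions:

    # Probability that each independent barrier fails to stop the event
    # (illustrative values only).
    barrier_failure_probs = [0.1, 0.05, 0.2]

    p_accident = 1.0
    for p in barrier_failure_probs:
        p_accident *= p  # all the holes in the cheese must line up

    print(f"P(accident) = {p_accident:.4f}")          # 0.0010 for these numbers
    print(f"With one more 10% barrier: {p_accident * 0.1:.5f}")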

What Professor Leveson says in her STAMP and STPA framework is a little different. This is based on essentially thinking about safety as a lack of control of a system, a lack of controllability. And so if you think about it this way, I'm showing you here a control loop.

In this case, you have the actual system, the actual process that you're executing. The controlled process is here. You're sensing things about that process, so temperature, pressure, proper alignment.

And so one problem could be your sensors are inadequate. You're sensing the wrong information. And then here's your controller model, how should you control the system?

So you could have the wrong controller or the wrong process model. And then here are your actuators. Are you issuing commands at the right time? Are you issuing the right commands? Or are you not issuing commands to the system when you should be? And that feeds back into the controlled process itself.

And so process inputs could be wrong or missing. You could have disturbances into the process that are unidentified or out of range. And then eventually, if this process goes unstable, then you have a failure or an accident.

So it's quite different. It's essentially thinking of this as a control problem instead of a chain-of-events problem. And the argument here is that for failures that involve a combination of hardware, software, and humans, this model is often more complete in terms of identifying hazards and potential mitigation actions.
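To make the control-loop view a bit more tangible, here is a rough sketch-- not Professor Leveson's STPA tooling, just an illustration built from the loop elements named above (sensing, process model, commands, disturbances)-- of how you might enumerate the ways one control loop could go wrong:

    # Hypothetical description of one control loop and its known weaknesses.
    loop = {
        "controlled_process": "tank pressurization",
        "sensing_adequate": False,        # measuring the wrong variable
        "process_model_correct": True,
        "commands_correct_and_timely": True,
        "disturbances_in_range": True,
    }

    def control_flaws(loop: dict) -> list[str]:
        """List the ways this loop could contribute to a hazard."""
        checks = {
            "sensing_adequate": "inadequate or missing feedback",
            "process_model_correct": "controller has the wrong process model",
            "commands_correct_and_timely": "wrong, missing, or badly timed commands",
            "disturbances_in_range": "disturbance outside the assumed range",
        }
        return [msg for key, msg in checks.items() if not loop[key]]

    print(control_flaws(loop))  # ['inadequate or missing feedback']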

So I think we're out of time. But I want to give this to you as homework to think about. This is an accident that happened earlier this year, I guess, in July. And this is the Virgin Galactic crash.

Virgin Galactic is one of the space tourism companies. And during a test flight, the vehicle crashed because the copilot unlocked the feathering mechanism-- a kind of aerodynamic braking system-- too early, during a high-speed flight phase.

So what I'd like you to do, the link is here. Just read the story. There's a more lengthy accident report that's come out. I'd like you to just read this quickly and then think about how does this relate to risks? How does it relate to this particular model of system safety?

So the systems-theoretic view of safety is that safety is an emergent system property. Accidents arise from interactions among system components-- physical, human, social-- and from constraint violations. Losses are the result of complex processes, not a simple chain of events.

And most accidents arise because-- well, you could have a system that's quite safe when you start operating it, but over time it migrates to an unsafe state. Sensors fail. People start bypassing safety procedures. And gradually, it migrates to high risk.

OK. So the last thing I want to talk about-- just for a minute or two-- is the FRR, the Flight Readiness Review, which is one of the later milestones. And what happens at the FRR? Essentially, this is your last chance to raise a red flag.

This is the last milestone before launch. Have all the V&V activities been passed successfully? Are there any waivers that need to be granted?

What are the residual risks we just talked about? And then after the FRR has passed, you actually start the countdown-- T minus X days, Y hours, Z seconds-- to an actual launch, or a product launch, whatever it is. And then here, from the handbook, are the entrance and success criteria for the FRR.

Everything should have been done at this point: your design, your integration, your testing, your operating procedures; your people should be trained. This is your last chance to raise a red flag. After the FRR, you're essentially go for launch. So the stakes are high.

OK, so a quick summary. Verification and validation are critical. There's a distinction between the two. Verification is against the requirements as written. Validation is you go back to your customer and you test in a real environment.

Testing-- many different kinds of testing. It's fundamentally a QA activity, and it's really expensive, but it needs to be done right. Risk management-- we have different tools like the risk matrix, risk identification, mitigation.

And that's really where the rubber meets the road, in terms of the tension between cost, scope, schedule, and risk in projects. System safety-- think about not just the chain-of-events model, but this controls view as well. STAMP/STPA is a particular framework for this. And if you're interested, there's a whole class-- a whole set of things you can learn about system safety.

And then finally, FRR is your last chance to raise the red flag. It's sort of the big milestone before you go live with your system. OK? So any last questions or comments? EPFL? Yes, please go ahead.

AUDIENCE: So after the FRR you should have finished your tests, right? It's the limit?

PROFESSOR: Yes, you should have finished your tests except for the ones that you're going to do, say, on orbit, right?

AUDIENCE: Because I was wondering, in the list there is the go, no-go test. And I'm wondering if it's not related to the launch, actually?

PROFESSOR: Yeah, so the actual launch itself, of course, has the actual launch countdown. And you can stop the launch. So this is a review. The FRR is more like a CDR.

So the countdown hasn't actually started yet. But if you successfully pass the FRR, that's when you begin the countdown. The official countdown starts. And then you still have a possibility, of course, of stopping the launch. But in terms of a formal programmatic review, this is your last chance.

AUDIENCE: I think the point here is more about what the system boundary is. If you consider the system boundary being the whole mission, including the launch, then obviously the system will only be finished when the mission has finished phases E and F.

PROFESSOR: Right.

AUDIENCE: And this FRR, the Flight Readiness Review, is linked to the panel or to the whole satellite specifically-- that you are allowed to go forward and now start the countdown. And then all the other things-- if you extend the mission boundaries to the whole space program, or to the whole colonization of Mars, those will obviously come later.