Lecture 29: Matrix Exponentials

Topics covered: Matrix Exponentials; Application to Solving Systems

Instructor/speaker: Prof. Arthur Mattuck

We are going to need a few facts about fundamental matrices, and I am worried that over the weekend, this spring activities weekend, you might have forgotten them. So I will just spend two or three minutes reviewing the most essential things that we are going to need later in the period. What we are talking about is, I will try to color code things so you will know what they are. First of all, the basic problem is to solve a system of equations, x prime equals A x. And I am going to make that a two-by-two system, although practically everything I say today will also work for n-by-n systems. Your book tries to do it n-by-n, as usual, but I think it is easier to learn two-by-two first and generalize rather than to wade through the complications of n-by-n systems. So the problem is to solve it. And the method I used last time was to describe something called a fundamental matrix. A fundamental matrix for the system or for A, whichever you want, remember what that was. That was a two-by-two matrix of functions of t whose columns were two independent solutions, x1, x2. These were two independent solutions. In other words, neither was a constant multiple of the other. Now, I spent a fair amount of time showing you the two essential properties that a fundamental matrix had. We are going to need those today, so let me remind you of the basic properties of X, the properties by which you could recognize one if you were given one. First of all, the easy one: its determinant is not zero for any t, for any value of the variable. That simply expresses the fact that its two columns are independent, linearly independent, not a multiple of each other. The other one was more bizarre, so I tried to call a little more attention to it. It was that the matrix satisfies a differential equation of its own, which looks almost the same, except it is a matrix differential equation. It is not column vectors which are solutions but matrices as a whole which are solutions.
In other words, if you take that matrix and differentiate every entry, what you get is the same as A multiplied by that matrix you started with. This, remember, was really just formal when you analyzed what it was, but it expressed the fact that the columns solve the system. The first property says the columns are independent and the second says each column separately is a solution to the system. That is as far as we got, more or less. Then I went in another direction and we talked about variation of parameters. I am not going to come back to variation of parameters today. We are going on a different tack. And the tack we are going on is I want to first talk a little more about the fundamental matrix and then, as I said, we will talk about an entirely different method of solving the system, one which makes no mention of eigenvalues or eigenvectors, if you can believe that. But, first, the one confusing thing about the fundamental matrix is that it is not unique. I have carefully tried to avoid talking about the fundamental matrix because there is no "the" fundamental matrix, there is only "a" fundamental matrix. Why is that? Well, because these two columns can be any two independent solutions. And there are an infinity of ways of picking independent solutions. That means there is an infinity of possible fundamental matrices. Well, that is disgusting, but can we repair it a little bit? I mean maybe they are all derivable from each other in some simple way. And that is, of course, what is true. Now, as a prelude to doing that, I would like to show you what I wanted to show you on Friday but, again, I ran out of time: how to write the general solution to the system. The system I am talking about is that pink system.
Well, of course, the standard naive way of doing it is x equals c1, an arbitrary constant, times that first solution you found, plus c2, another arbitrary constant, times the second solution you found. Okay. Now, how would you abbreviate that using the fundamental matrix? Well, I did something very similar to this on Friday, except these were called Vs. It was part of the variation of parameters method, but I promised not to use those words today so I just said nothing. Okay. What is the answer? It is x equals, how do I write this using the fundamental matrix, x1, x2? Simple. It is capital X times the column vector whose entries are c1 and c2. In other words, it is x1, x2 times the column vector c1, c2, isn't it? Yeah. Because if you multiply this out, the top row times the column vector is the top entry of x1 times c1 plus the top entry of x2 times c2, and that exactly gives you the top row here. And the same way the bottom row here, times this vector, gives you the bottom row of that. It is just another way of writing that, but it looks very efficient. Sometimes efficiency isn't a good thing, you have to watch out for it, but here it is good. So, this is the general solution written out using a fundamental matrix. And you cannot use fewer symbols than that. There is just no way. But that gives us our answer to, what do all fundamental matrices look like? Well, their two columns are solutions. Now, the first column is an arbitrary solution. How do I write an arbitrary solution? There is the general solution. I make it a particular one by giving a particular value to that column vector of arbitrary constants like two, three or minus one, pi or something like that. The first guy is a solution, and I have just shown you I can write such a solution as X times c1, with c1 a particular column vector of numbers. This is a solution because the green thing says it is. And side by side, we will write another one, X times c2.
And now, of course, they are supposed to be independent. We will worry about that in just a moment. All I have to do is make this look better. Now, I told you last time, by the laws of matrix multiplication, if the first column is X c1 and the second column is X c2, using matrix multiplication that is the same as writing it this way: this square matrix X times the matrix whose columns are the first column vector and the second column vector. Now, I am going to call this C. It is a square matrix of constants. It is a two-by-two matrix of constants. And so, the final way of writing it is just what corresponds to that, X times C. And so X is a given fundamental matrix, this one, that one, so the most general fundamental matrix is then the one you started with, multiplied by an arbitrary square matrix of constants, except you want to be sure that the determinant is not zero. Well, the determinant of this guy won't be zero, so all you have to do is make sure that the determinant of C isn't zero either. In other words, the fundamental matrix is not unique, but once you have found one all the other ones are found by multiplying it on the right by an arbitrary square matrix of constants which is nonsingular, it has determinant nonzero in other words. Well, that was all Friday. That's Friday leaking over into Monday. And now we begin the true Monday. Here is the problem. Once again we have our two-by-two system, or n-by-n if you want to be super general. There is a system. What do we have so far by way of solving it? Well, if your kid brother or sister when you go home said, a precocious kid, okay, tell me how to solve this thing, I think the only thing you will be able to say is well, you do this, you take the matrix and then you calculate something called eigenvalues and eigenvectors. Do you know what those are? I didn't think you did, blah, blah, blah, show how smart I am. And you then explain what the eigenvalues and eigenvectors are.
And then you show how out of those make up special solutions. And then you take a combination of that. In other words, it is algorithm. It is something you do, a process, a method. And when it is all done, you have the general solution. Now, that is fine for calculating particular problems with a definite model with definite numbers in it where you want a definite answer. And, of course, a lot of your work in engineering and science classes is that kind of work. But the further you get on, well, when you start reading books, for example, or god forbid start reading papers in which people are telling you, you know, they are doing engineering or they are doing science, they don't want a method, what they want is a formula. In other words, the problem is to fill in the blank in the following. You are writing a paper, and you just set up some elaborate model and A is a matrix derived from that model in some way, represents bacteria doing something or bank accounts doing something, I don't know. And you say, as is well-known, the solution is, of course, you only have letters here, no numbers. This is a general paper. The solution is given by the formula. The only trouble is, we don't have a formula. All we have is a method. Now, people don't like that. What I am going to produce for you this period is a formula, and that formula does not require the calculation of any eigenvalues, eigenvectors, doesn't require any of that. It is, therefore, a very popular way to fill in to finish that sentence. Now the question is where is that formula going to come from? Well, we are, for the moment, clueless. If you are clueless the place to look always is do I know anything about this sort of thing? I mean is there some special case of this problem I can solve or that I have solved in the past? And the answer to that is yes. You haven't solved it for a two-by-two matrix but you have solved it for a one-by-one matrix. A one-by-one matrix also goes by the name of a constant. 
It is just a thing. It's a number. Just putting brackets around it doesn't conceal the fact that it is just a number. Let's look at what the solution is for a one-by-one matrix, a one-by-one case. If we are looking for a general solution for the n-by-n case, it must work for the one-by-one case also. That is a good reason for us starting there. That looks like x, doesn't it? A one-by-one case. Well, in that case, I am trying to solve the system. The system consists of a single equation, x prime equals a x. That is the way the system looks. How do you solve that? Well, you were born knowing how to solve that. Anyway, you certainly didn't learn it in this course. You separate variables, blah, blah, blah, and the solution is x equals, the basic solution is e to the at, and you multiply that by an arbitrary constant. Now, that is a formula for the solution. And it uses the parameter in the equation. I didn't have to know a special number. I didn't have to put a particular number here to use that. Well, whatever answer I give for the general case has got to work in this case, too. But let's take a quick look as to why this works. Of course, you separate variables and use calculus. I am going to give you a slightly different argument that has the advantage of generalizing to the n-by-n case. And the argument goes as follows. It uses the definition of the exponential function not as the inverse to the logarithm, which is where the fancy calculus books get it from, nor as the naive high school method, e squared means you multiply e by itself and e cubed means you do it three times and so on. And e to the one-half means you do it a half a time or something. So, not the naive definition of the exponential function. Instead, I will use the definition of the exponential function that comes from an infinite series. Leaving out the arbitrary constant that we don't have to bother with, e to the at is the series one plus at plus a squared t squared over two factorial.
I will put out one more term, a cubed t cubed over three factorial, and let's call it quits there. If I take this, then the argument goes: let's just differentiate it. In other words, what is the derivative of e to the at with respect to t? Well, just differentiating term by term it is zero plus, the first term is a, the next term is a squared times t. And t cubed over three factorial differentiates to t squared over two factorial, so the term after that is a cubed times t squared over two factorial. And the answer is that this is equal to a times, if you factor out the a, what is left is one plus at plus a squared t squared over two factorial. In other words, it is simply a times e to the at. In other words, by using the series definition of the exponential and by differentiating it term by term, I can immediately see that it satisfies this differential equation. What about the arbitrary constant? Well, if you would like, you can include it here, but it is easier to observe that by linearity if e to the at solves the equation so does the constant times it because the equation is linear. Now, that is the idea that I am going to use to solve the system in general. What are we going to say? Well, what could we say? The solution to, well, let's get two solutions at once by writing a fundamental matrix. "A" fundamental matrix, I don't claim it is "the" one, for the system x prime equals A x. That is what we are trying to solve. And we are going to get two solutions by getting a fundamental matrix for it. The answer is e to the At. Isn't that what it should be? I had a little a. Now we have a matrix. Okay, just put the matrix up there. Now, what on earth? The first person who must have thought of this, it happened about 100 years ago, had to ask: what meaning should be given to e to a matrix power? Well, clearly the two naive definitions won't work. The only possible meaning you could try for is using the infinite series, but that does work. So this is a definition I am giving you, the exponential matrix. Now, notice the A is a two-by-two matrix and I am multiplying it by t.
What I have up here is basically a two-by-two matrix. Its entries involve t, but it's a two-by-two matrix. Okay. We are trying to get the analog of that formula over there. Well, leave the first term out just for a moment. The next term is going to surely be A times t. This is a two-by-two matrix, right? What should the next term be? Well, A squared times t squared over two factorial. What kind of a guy is that? Well, if A is a two-by-two matrix so is A squared. How about this? This is just a scalar which multiplies every entry of A squared. And, therefore, this is still a two-by-two matrix. That is a two-by-two matrix. This is a two-by-two matrix. No matter how many times you multiply A by itself it stays a two-by-two matrix. It gets more and more complicated looking but it is always a two-by-two matrix. And for the next term I am multiplying every entry of A cubed by the scalar t cubed over three factorial. I am continuing on in that way. What I get, therefore, is a sum of two-by-two matrices. Well, you can add two-by-two matrices to each other. We've never made an infinite series of them, we haven't done it, but others have. And this is what they wrote. The only question is, what should we put in the beginning? Over there I have the number one. But I, of course, cannot add the number one to a two-by-two matrix. What goes here must be a two-by-two matrix, which is the closest thing to one I can think of. What should it be? The I. Two-by-two I. The two-by-two identity matrix looks like the natural candidate for what to put there. And, in fact, it is the right thing to put there. Okay. Now I have a conjecture, you know, purely formally, changing only with a keystroke of the computer, all the little a's have been changed to capital A's. And now all I have to do is wonder if this is going to work. Well, what is the basic thing I have to check to see that it is a fundamental matrix? The question is, I wrote it down all right, but is this a fundamental matrix for the system?
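As a rough illustration of the series definition just described, here is a minimal sketch in plain Python that sums I + At + (At)^2/2! + ... for a small square matrix. The function names `mat_mul` and `mat_exp` and the cutoff `terms=30` are assumptions made for this sketch, not anything from the lecture, and a truncated sum like this is only reasonable for small matrices and moderate values of t.

```python
def mat_mul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, t=1.0, terms=30):
    """Approximate e^{At} by the partial sum I + At + (At)^2/2! + ...

    `terms` is an arbitrary cutoff chosen for this sketch.
    """
    n = len(A)
    At = [[a * t for a in row] for row in A]
    # The series starts with the identity matrix, the analog of the number 1.
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # holds (At)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, At)                       # raise the power by one
        term = [[x / k for x in row] for row in term]  # divide by k to build k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


E = mat_exp([[1.0, 0.0], [0.0, 2.0]])
print(E)
```

For a diagonal matrix the powers stay diagonal, so the result should be diagonal with the ordinary exponentials of the diagonal entries, which makes it a convenient sanity check.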
Well, I have a way of recognizing a fundamental matrix when I see one. The critical thing is that it should satisfy this matrix differential equation. That is what I should verify. Does this guy that I have written down satisfy this equation? And the answer is, number two is, it satisfies x prime equals Ax. In other words, plugging in x equals this e to the at, whose definition I just gave you. If I substitute that in, does it satisfy that matrix differential equation? The answer is yes. I am not going to calculate it out because the calculation is actually identical to what I did there. The only difference is when I differentiated it term by term, how do you differentiate something like this? Well, you differentiate every term in it. But, if you work it out, this is a constant matrix, every term of which is multiplied by t squared over two factorial. Well, if you differentiate every entry of that constant, of that matrix, what you are going to get is A squared times just the derivative of that part, which is simply t. In other words, the formal calculation looks absolutely identical to that. So the answer to this is yes, by the same calculation as before, as for the one-by-one case. And now the only other thing to check is that the determinant is not zero. In fact, the determinant is not zero at one point. That is all you have to check. What is x of zero? What is the value of the determinant of x is e to the At? What is the value of this thing at zero? Here is my function. If I plug in t equals zero, what is it equal to? I. What is the determinant of I? One. It is certainly not zero. By writing down this infinite series, I have my two solutions. Its columns give me two solutions to the original system. There were no eigenvalues, no eigenvectors. I have a formula for the answer. What is the formula? It is e to the At. And, of course, anybody reading the paper is supposed to know what e to the At is. It means that. This is just marvelous. 
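Written out, the two checks described above look as follows. This is only the term-by-term calculation the lecture refers to, set in symbols, and it takes for granted that the series may be differentiated term by term.

```latex
\begin{aligned}
\frac{d}{dt}\,e^{At}
  &= \frac{d}{dt}\Bigl(I + At + \frac{A^{2}t^{2}}{2!} + \frac{A^{3}t^{3}}{3!} + \cdots\Bigr) \\
  &= A + A^{2}t + \frac{A^{3}t^{2}}{2!} + \cdots \\
  &= A\Bigl(I + At + \frac{A^{2}t^{2}}{2!} + \cdots\Bigr) = A\,e^{At}, \\[4pt]
e^{A\cdot 0} &= I, \qquad \det I = 1 \neq 0 .
\end{aligned}
```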
There must be a fly in the ointment somewhere. Only a teeny little fly. There is a teeny little fly because it is almost impossible to calculate that series by hand for most matrices. However, once in a while it is possible. Let me give you an example where it is possible to calculate the series and where you get a nice answer. Let's work out an example. By the way, you know, nowadays, we are not back 50 years, the exponential matrix has the same status on, say, Matlab or Maple or Mathematica, as the ordinary exponential function does. It is just a command you type in. You type in your matrix. And you now say EXP of that matrix and out comes the answer to as many decimal places as you want. It will be a square matrix with entries carefully written out. So, in that sense, the fact that we cannot calculate it shouldn't bother us. There are machines to do the calculations. What we are interested in is it as a theoretical tool. But, in order to get any feeling for this at all, we certainly have to do a few calculations. Let's do an easy one. Let's consider the system x prime equals y, y prime equals x. This is very easily done by elimination, but that is forbidden. First of all, we write it as a matrix. It's the system x prime equals zero, one, one, zero, x. Here is my A. A is zero, one, one, zero. What we want to do is get both solutions at once by calculating, in one fell swoop, e to the At. Okay. e to the At equals. I am going to actually write out these guys. Well, obviously the hard part, the part which is normally going to prevent us from calculating this series explicitly, by hand anyway (because, as I said, the computer can always do it), is this: how do we raise a matrix to a high power? You just keep multiplying and multiplying and multiplying. That looks like a rather forbidding and unpromising activity. Well, here it is easy. Let's see what happens. If that is A, what is A squared?
I am going to have to calculate that as part of the series. That is going to be zero, one, one, zero times zero, one, one, zero, which is one, zero, zero, one. We got saved. It is the identity. Now, from this point on we don't have to do anymore calculations, but I will do them anyway. What is A cubed? Don't start from scratch again. No, no, no. A cubed is A squared times A. And A squared is, in real life, the identity. Of course, you would do all this in your head, but I am being a good boy and writing it all out. This is I, the identity, times A, which is A. I will do one more. What is A to the fourth? Now, you would be tempted to say A to the fourth is A squared, which is I times I, which is I, but that would be wrong. A to the fourth is A cubed times A, which is, I have just calculated is A times A, right? And now that is A squared, which is the identity. It is clear, by this argument, it is going to continue in the same way each time you add an A on the right-hand side, you are going to keep alternating between the identity, A, the next one will be identity, the next will be A. The end result is that the first term of the series is simply the identity; the next term of the series is A, but it is multiplied by t. I will keep the t on the outside. Remember, when you multiply a matrix by a scalar, that means multiply every entry by that scalar. This is the matrix zero, t, t, zero. I will do a couple more terms. The next term would be, well, A squared we just calculated as the identity. That looks like this. Except now I multiply every term by t squared over two factorial. All right. I'll go for broke. The next one will be this times t cubed over three factorial. And, fortunately, I have run out of room. Okay, let's calculate then. What is the final answer for e to At? I have an infinite series of two-by-two matrices. Let's look at the term in the upper left-hand corner. It is one plus zero times t plus one times t squared over two factorial plus zero times t. 
It is going to be, in other words, one plus t squared over two factorial plus the next term, which is not on the board but I think you can see, is t to the fourth over four factorial. And it continues on in the same way. How about the lower left term? Well, that is zero plus t plus zero plus t cubed over three factorial and so on. It is t plus t cubed over three factorial plus t to the fifth over five factorial. And the terms in the other two corners are just the same as these. This one, for example, is zero plus t plus zero plus t cubed over three factorial. And the lower one is one plus zero plus t squared over two factorial and so on. This is the same as one plus t squared over two factorial and so on, and up here we have t plus t cubed over three factorial and so on. Well, that matrix doesn't look very square, but it is. It is infinitely long physically, but it has one term here, one term here, one term here and one term there. Now, all we have to do is make those terms look a little better. Here I have to rely on the culture, which you may or may not possess. You would know what these series were if only they alternated their signs. If this were a negative, negative, negative then the top would be cosine t and this would be sine t, but they don't alternate. So they are the next best thing. They are what? Hyperbolic. The top is not cosine t, but cosh t. The bottom is sinh t. And how do we know this? Because you remember. And what if I don't remember? Well, you know now. That is why you come to class. Well, for those of you who don't remember, this is e to the t plus e to the negative t. It should be over two, but I don't have room to put in the two. This doesn't mean I will omit it. It just means I will put it in at the end by multiplying every entry of this matrix by one-half. If you have forgotten what cosh t is, it's e to the t plus e to the negative t, divided by two. And the similar thing for sinh t, with a minus sign: e to the t minus e to the negative t, divided by two. There is your first explicit exponential matrix calculated according to the definition.
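The result of this hand calculation can be checked numerically. The sketch below, with helper names and the sample time `t = 0.7` chosen only for illustration, sums the series for A = [[0, 1], [1, 0]] and compares it entry by entry with the matrix of cosh t and sinh t.

```python
import math


def mat_mul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, t, terms=40):
    """Partial sum of I + At + (At)^2/2! + ... (cutoff is an assumption)."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, At)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


A = [[0.0, 1.0], [1.0, 0.0]]   # the matrix of the system x' = y, y' = x
t = 0.7                        # arbitrary sample time
series = mat_exp(A, t)
closed = [[math.cosh(t), math.sinh(t)],
          [math.sinh(t), math.cosh(t)]]
max_diff = max(abs(series[i][j] - closed[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # should be at the level of floating-point rounding
```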
And what we have found is the solution to the system x prime equals y, y prime equals x. A fundamental matrix. In other words, its two columns, cosh t, sinh t and sinh t, cosh t, are both solutions to that system. Now, there is one thing people love the exponential matrix in particular for, and that is the ease with which it solves the initial value problem. It is exactly what happens when studying the single equation x prime equals ax, but let's do it in general. What is the initial value problem? Well, the initial value problem is we start with our old system, but now I want to plug in initial conditions. I want the particular solution which satisfies the initial condition. Let's make the starting time zero to avoid complications, to avoid a lot of notation. This is to be some starting value. This is a certain constant vector. It is to be the value of the solution at zero. And the problem is find what x of t is. Well, if you are using the exponential matrix it is a joke. It is a joke. Shall I derive it or just do it? All right. The general solution, let's derive it, and then I will put up the final formula in a box so that you will know it is important. What is the general solution? Well, I did that for you at the beginning of the period. Once you have a fundamental matrix, you get the general solution by multiplying it on the right by an arbitrary constant vector. The general solution is going to be x equals e to the At, that is my super fundamental matrix, found without eigenvalues and eigenvectors, and this should be multiplied by some unknown constant vector c. The only question is, what should the constant vector c be? To find c, I will plug in zero. When t is zero, here I get x of zero, here I get e to the A times zero times c. Now what is this? This is the vector of initial conditions. What is e to the A times zero? Plug in t equals zero. What do you get? I. Therefore, c is what? c is x zero. It is a total joke.
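To make the conclusion c = x(0) concrete, here is a small sketch for the lecture's example system x' = y, y' = x, using the closed form of e^{At} found above. The function names and the initial vector `[3.0, 1.0]` are illustrative assumptions. Note that the matrix multiplies the column vector from the left.

```python
import math


def exp_At(t):
    """e^{At} for A = [[0, 1], [1, 0]], using the cosh/sinh closed form."""
    c, s = math.cosh(t), math.sinh(t)
    return [[c, s], [s, c]]


def mat_vec(M, v):
    """Matrix times column vector (the vector goes on the right)."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def solve_ivp(x0, t):
    """Solution of x' = Ax with x(0) = x0, computed as e^{At} x0."""
    return mat_vec(exp_At(t), x0)


x0 = [3.0, 1.0]           # hypothetical initial condition
print(solve_ivp(x0, 0.0))  # at t = 0 the solution reproduces x0
print(solve_ivp(x0, 1.0))
```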
And the solution to the initial value problem is x equals e to the At times x zero. It is just what it would have been in one variable. The only difference is that in one variable we are allowed to put the constant out front. In other words, if I asked you to put in the initial condition there, you would probably write x equals little x zero times e to the at. And you would be tempted to do the same thing here, vector x equals vector x zero times e to the At. Now, you cannot do that. And, if you try to, Matlab will hiccup and say illegal operation. What is the illegal operation? Well, x is a column vector. From the system it is a column vector. That means the initial conditions are also a column vector. You cannot multiply a column vector out front and a square matrix afterwards. You cannot. If you want to multiply a matrix by a column vector, it has to come afterwards so you can do zing, zing. There is no zing, you see. You cannot put it in front. It doesn't work. So it must go behind. That is the only place you might get tripped up. And, as I say, if you try to type that in using Matlab, you will immediately get error messages that it is illegal, you cannot do that. Anyway, we have our solution. There is our system. Our initial value problem anyway is in pink, and its solution using the exponential matrix is in green. Now, the only problem is we still have to talk a little bit more about calculating this. Now, the principal warning with an exponential matrix is that once you have gotten by the simplest things involving the fact that it solves systems, it gives you the fundamental matrix for a system, then you start flexing your muscles and say, oh, well, let's see what else we can do with this. For example, the reason exponentials came into being in the first place was because of the exponential law, right? I will kill anybody who sends me emails saying, what is the exponential law? The exponential law would say that e to the A plus B is equal to e to the A times e to the B.
The law of exponents, in other words. It is the thing that makes the exponential function different from all other functions that it satisfies something like that. Now, first of all, does this make sense? That is are the symbols compatible? Let's see. This is a two-by-two matrix, this is a two-by-two matrix, so it does make sense to multiply them, and the answer will be a two-by-two matrix. How about here? This is a two-by-two matrix, add this to it. It is still a two-by-two matrix. e to a two-by-two matrix still comes out to be a two-by-two matrix. Both sides are legitimate two-by-two matrices. The only question is, are they equal? And the answer is not in a pig's eye. How could this be? Well, I didn't make up these laws. I just obey them. I wish I had time to do a little calculation to show that it is not true. It is true in certain special cases. It is true in the special case, and this is pretty much if and only if, the only case in which it is true is if A and B are not arbitrary square matrices but commute with each other. You see, if you start writing out the series to try to check whether that law is true, you will get a bunch of terms here, a bunch of terms here. And you will find that those terms are pair-wise equal only if you are allowed to let the matrices commute with each other. In other words, if you can turn AB plus BA into twice AB then everything will work fine. But if you cannot do that it will not. Now, when do two square matrices commute with each other? The answer is almost never. It is just a lucky accident if they do, but there are three cases of the lucky accident which you should know. The three cases, I feel justified calling it "the" three cases. Oh, well, maybe I shouldn't do that. The three most significant examples are, example number one, when A is a constant times the identity matrix. In other words, when A is a matrix that looks like this. That matrix commutes with every other square matrix. 
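The lecture does not have time for a counterexample, so here is one possible small calculation. It assumes a particular pair of non-commuting matrices chosen for this sketch, not taken from the lecture, and again sums the series with a fixed cutoff.

```python
def mat_mul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=30):
    """Partial sum of I + A + A^2/2! + ... (cutoff is an assumption)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical pair of matrices that do not commute.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

commute = mat_mul(A, B) == mat_mul(B, A)
lhs = mat_exp(A_plus_B)                  # e^{A+B}
rhs = mat_mul(mat_exp(A), mat_exp(B))    # e^A e^B
gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

print(commute)  # the two products AB and BA differ
print(lhs)
print(rhs)
print(gap)
```

For this pair, A squared and B squared are both the zero matrix, so e^A = I + A and e^B = I + B exactly, and their product is the matrix with rows 2, 1 and 1, 1. On the other side, A + B is the matrix zero, one, one, zero from the earlier example, whose exponential has cosh 1 and sinh 1 as entries, so the two sides visibly disagree.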
If that is A, then this law is always true and you are allowed to use it. Okay, so that is one case. Another case, when A is more general, is when B is equal to negative A. I think you can see that that is going to work because A times minus A is equal to minus A times A. Yeah, they are both equal to A squared, except with a negative sign in front. And the third case is when B is equal to the inverse of A because A A inverse is the same as A inverse A. They are both the identity. Of course, A must have an inverse. Okay, let's suppose it does. Now, of them the second is, I think, the most important one because it leads to this law. The general law is forbidden, but there is one case of it which is not forbidden and that is here. What will it say? Well, that will say that e to the A minus A is equal to e to the A times e to the negative A. This is true, even though the general law is false. That is because A and negative A commute with each other. But now what does this say? What is e to the zero matrix? In other words, suppose I take the matrix that is zero and plug it into the formula for e? What do you get? Every term after the first vanishes, so e to the zero matrix is I. It has to be a two-by-two matrix if it is going to be anything. It is the matrix I. This side is I. This side is the exponential matrix times e to the negative A. And what does that show? It shows that the inverse matrix of e to the A is e to the negative A. That is a very useful fact. This is the main survivor of the exponential law. In general the law is false, but this standard corollary to the exponential law is true: the inverse of e to the A is equal to e to the minus A, just what you would dream and hope would be true. Okay. I have exactly two and a half minutes left in which to do the impossible. All right. The question is, how do you calculate e to the At? You could use series, but it rarely works. It is too hard. There are a few examples, and you will have some more for homework, but in general it is too hard because it is too hard to calculate the powers of a general matrix A.
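The surviving corollary can be spot-checked numerically. This sketch multiplies the series for e^A by the series for e^{-A} and measures how far the product is from the identity. The sample matrix and the series cutoff are arbitrary choices made for illustration.

```python
def mat_mul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=40):
    """Partial sum of I + A + A^2/2! + ... (cutoff is an assumption)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


A = [[0.0, 1.0], [-2.0, -3.0]]               # hypothetical sample matrix
neg_A = [[-a for a in row] for row in A]

product = mat_mul(mat_exp(A), mat_exp(neg_A))  # should be close to I
identity = [[1.0, 0.0], [0.0, 1.0]]
error = max(abs(product[i][j] - identity[i][j])
            for i in range(2) for j in range(2))
print(error)
```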
There is another method, which is useful only for matrices which are symmetric, like that one. Well, it is more than symmetric. The two diagonal entries have to be the same, too. But you can handle those, as you will see from the homework problems, by breaking the matrix up into a sum of two pieces and using the exponential law. The second piece would be zero, b, b, zero. See, these two matrices commute with each other and, therefore, I could use the exponential law. This leaves all other cases. And here is the way to handle all other cases. In other words, if you cannot calculate the series and this trick doesn't work, you do as follows. You start with an arbitrary fundamental matrix, not the exponential matrix. You take its value at zero, that is a constant matrix, you take the inverse of that constant matrix, and you multiply the fundamental matrix on the right by that inverse. It will have an inverse because, remember, the fundamental matrix never has determinant zero. So you can always take its inverse regardless of the value of t. Now, what property does this product have? It is a fundamental matrix. How do I know that? Well, because I found all fundamental matrices for you. Take any one, multiply it by a nonsingular square matrix of constants on the right-hand side, and you still get a fundamental matrix. And what is its value at zero? Well, it is X of zero times X of zero inverse. Its value at zero is the identity. Now, e to the At has these same two properties. Namely, it is a fundamental matrix and its value at zero is the identity. Conclusion, this is e to the At. And that is the garden variety method of calculating the exponential matrix, if you want to give it explicitly. Start with any fundamental matrix calculated, you should forgive the expression, using eigenvalues and eigenvectors and putting the solutions into the columns. Evaluate it at zero, take its inverse and multiply the two. And what you end up with has to be the same as the thing calculated with that infinite series. Okay. You will get lots of practice for homework and tomorrow.
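As a sketch of this garden variety method, take the earlier example A = [[0, 1], [1, 0]]. Its eigenvalues are 1 and -1 with eigenvectors (1, 1) and (1, -1), which gives the fundamental matrix used below. That eigen-calculation and the function names are supplied for this illustration rather than taken from the lecture. The product X(t) X(0)^{-1} should then reproduce the cosh/sinh matrix found from the series.

```python
import math


def fundamental(t):
    """A fundamental matrix for x' = y, y' = x.

    Columns are the solutions e^t (1, 1) and e^{-t} (1, -1).
    """
    et, emt = math.exp(t), math.exp(-t)
    return [[et, emt], [et, -emt]]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_At(t):
    """e^{At} computed as X(t) X(0)^{-1}."""
    return mat_mul(fundamental(t), inverse_2x2(fundamental(0.0)))


t = 0.5  # arbitrary sample time
result = exp_At(t)
expected = [[math.cosh(t), math.sinh(t)],
            [math.sinh(t), math.cosh(t)]]
max_diff = max(abs(result[i][j] - expected[i][j])
               for i in range(2) for j in range(2))
print(result)
print(max_diff)
```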