Lecture 10: Dynamic Programming: Advanced DP

Flash and JavaScript are required for this feature.

Download the video from iTunes U or the Internet Archive.

Description: In this lecture, Professor Devadas introduces the concept of dynamic programming.

Instructors: Srinivas Devadas

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right. Good morning, everyone. Thanks for coming on quiz day. I have to have someone to throw Frisbees to. Empty chairs would be difficult.

So we're going to be doing dynamic programming, a notion you've learned in 6006. We'll look at three different examples today. The first one is really at the level of 006, a cute little problem on finding the longest palindromic sequence inside of a longer sequence. Sometimes it's called longest palindromic subsequence.

And as we'll talk about, subsequence means that it can be non-contiguous. So you could skip letters in a sequence of letters and you would still have a subsequence corresponding to that. Don't have to be contiguous.

And then we'll raise the stakes a little bit. So each of these problems gets progressively more complicated, more sophisticated. And you'll probably see problems here, at least alternating coin game, that are beyond 006 in the sense that it wasn't covered. Those kinds of notions weren't covered in 006.

So just in terms of review, I wrote this up here because I don't want to spend a whole lot of time on it. This is something that you should have some familiarity with from the recitation, for example, that you had on Friday and from 006. Dynamic programming is this wonderful hammer. It's an algorithmic technique that you can use to solve problems that look exponential in complexity. But if you can find this optimum substructure associated with the problem and its connection to it's subproblems, and if you can characterize that, then you can do a recursive decomposition of the problem where you show that you can construct the optimum solution from the subproblems.

And that's really the key step in dynamic programming, is step two. Once you've made this characterization, you write this recurrence out that relates the optimal value of a bigger problem to the optimal values of subproblems. And you compute the value of the optimal solution through a recursive memoization. And that memoization is really what gives you the efficient algorithm, because you don't repeat the solution of subproblems.

You can also do this in an iterative fashion. Essentially, you're going to be computing things bottom-up. You might want to think about it as top-down when you write your recurrence. But ultimately, when you actually execute the program, you'll be computing things bottom-up, and you'll be checking the memo table to see if you actually solved this problem before. And that would be the recursive memoized case, which, in some sense, is a little bit easier to think about and translate directly from the recurrence.

But the other way to do it is to do it iteratively. And we'll take a look for this first problem as to how you'd do the two different ways at least from a conceptual standpoint, even though I might not write out the code for each of those cases. So a couple of choices here, recurse and memoize, or essentially do it iteratively. And the smaller subproblems would have to get computed first in both approaches.

The one thing that sometimes we don't spend a whole lot of time on is this last step, which is getting the exact solution. So a lot of the time, you stop with saying, I can compute the value of the optimum solution, the value in terms of the length of the longest palindromic sequence is 7 or 9. But what is that sequence or subsequence? That requires some additional coding, some additional accounting. The construction of this optimal solution typically requires some back-tracing and some information to be kept track of during the recurse and memoize step or during the iterative step.

So that's something to keep in mind. You're not quite done once you've found the value of the optimum solution. More often than not, you want the solution. So we'll talk about that as well a little bit.

But let's just dive in and look at this cute, little problem of longest palindromic sequence. And palindromes, of course, read the same front to back or back to front. Radar is a palindrome. Just as a trivial example, a single letter is a palindrome. Maybe I should have used a since that's actually a word. But here I got bb. That's definitely not a word, at least not a word that-- acronym, maybe. Radar is a palindrome.

Able was I 'ere I saw Elba, right? That's a palindrome. That's, of course, not a single word. But it's a famous palindrome, days of Napoleon.

But what we're trying to do here is, given that we have this notion of a palindrome, we'd like to discover palindromes inside longer words or longer sequences. So what we have is a string, and we'll call it x1 through n, n greater than or equal to 1. And we want to find the longest palindrome that is a subsequence.

And so here's an example to get you guys warmed up. We'll have a couple of puzzles here in a second. So character. And you want to find the longest palindrome. And so you go, I'll pick c, I'll skip h, I'll pick a, I'll pick r, a, and c. And carac, which I guess is not a word either-- but it's the longest palindrome that corresponds to a subsequence of character. Right?

So the game here, as you can see, is to pick the letters that form the palindrome and drop the ones that don't. OK? And we're going to have to use dynamic programming to do this.

The answer will be greater than or-- 1 in length because we've defined a single letter as a palindrome. So it has to be-- if you have one letter in the input, well, you just pick that letter. But regardless of how many letters you have on the input, greater than or equal to 1, you know that you're going to get at least a one letter palindrome at the output.

So here we go. Let's say I have under-- and this is thanks to Eric here. I've got a couple of nice words-- underqualified. The person who gets me the longest palindrome wins a Frisbee. And if you want to code dynamic programming in the next two minutes and run it on your laptops, that's perfectly fine with me. That's not cheating.

So underqualified. So u is a palindrome. So we got that. Right? So one letter for sure.

What else? What's the longest palindrome? Shout it out. Go ahead. Shout it out.

[STUDENTS RESPOND]

PROFESSOR: D-E-I-- wow, that was quick. Deified. That's right. So right? Well, deified is to make somebody a deity. So that was you, a couple of you guys? Do you have a Frisbee yet?

AUDIENCE: No.

PROFESSOR: All right. All right. This is a little bit more difficult. I tried this on my daughter yesterday, so I know it's a little more difficult. Turboventilator. We'll call that a word. Turboventilator. Yell it out.

AUDIENCE: Rotor.

PROFESSOR: Sorry?

AUDIENCE: Rotor.

PROFESSOR: Rotor. OK, well, that's five. Can anybody beat that?

AUDIENCE: Rotator.

PROFESSOR: Rotator. So rotator. Rotator. So R-O-T-A-T-O-R, rotator. All right. Who is the rotator? You already have one? I want to throw this one. Right.

Good practice with the quiz, guys. Good practice with the quiz. No, no, no. These quiz jokes never go over well. I've been teaching for 27 years, and I still haven't learned that you don't joke about exams. But so nothing like this on the quiz. I don't want you studying the thesaurus as opposed to the textbook for the next few hours. OK? Nothing like this on the quiz.

All right. Those of you who are missing Python, who loved 6006 in Python, let's talk about how you would actually solve this using dynamic programming. And so what we want is lij, which is the length of the longest palindromic subsequence for xij. And we're going to have i less than or equal to j.

So that's essentially what we'd like to compute. And essentially when I've said this here, when I have lij, I have decomposed the overall problem into subproblems, and it was kind of fairly obvious in this case, because I'm going to have to go in order. Right? I mean that's the constraint. A subsequence does maintain the ordering constraint. It's not like I can invert these letters. That would be a different problem if I allowed you to do that.

So I'm going to start somewhere, and I'm going to end somewhere. And I want to have a non-null subsequence, so I'm going to have i less than or equal to j. I'm good with i being equal to j, because I still have one letter, and, well, that happens to be a palindrome, and it'll have a length of 1. So that's my lij.

And what I want to do is define a recursive algorithm that computes lij. So we'll just try and figure out what the recurrence looks like, and then we can talk about memoization or iteration.

So if i equals equals j, then I'm going to return 1, because I know that that's a palindrome by default. So that's easy. And what do you think the next check should be? If I look at this x sequence, and I have i as the starting point and j as the next point, what do you think the next check is going to be once I have-- if i is not equal to j?

AUDIENCE: If x of i equals x of j, then j equals as well.

PROFESSOR: Sorry. What was that again?

AUDIENCE: If x of i equals x plus--

PROFESSOR: Beautiful. You're just checking to see-- you're just checking to see whether the two endpoints are equal or not. Because if they're equal, then you can essentially grab those letters and say that you're going to be looking at a smaller subsequence that is going to get you a palindrome. And you're going to be able to add these two letters that are equal on either side of the computed palindrome from the subsequence.

So if x of i equals equals x of j, then I'm going to say, if i plus 1 equals equals j, I'm going to go ahead and return 2, because at that point, I'm done. There's nothing else to do. Else I'm going to return 2 plus L of i plus 1 j minus 1. So I'm going to look inside. And I've got these two letters on either side that are equal. So I can always prepend to the palindrome I got from here the letter, and then append the same letter. And I got 2 plus whatever value I got from this quantity here.

So so far, it's not really particularly interesting from a standpoint of constructing the optimum solution. But this last line that we have here, where we have the case that the two letters are not equal is the most interesting line of code. That's the most interesting aspect of this algorithm. So does someone tell me what this line is going to be out here? Yeah, go ahead.

AUDIENCE: If x of L i plus 1j or L i [INAUDIBLE].

PROFESSOR: Beautiful. That's exactly right. So what you're going to do is you're going to say, I need to look at two different subproblems, and I need to evaluate both of these subproblems. And the first subproblem is I'm going to-- since these two letters are different, I'm going to have to drop one of them. And I'm going to look inside.

I'm going to say-- in this case, I'm going to drop the i-th letter, and I'm going to get L i plus 1j. And in the second case, I'm going to drop the j-th letter, and I'm going to get ij minus 1. And that's it.

So it's a max. And there's nothing that's being added here, because those two letters, one of them had to get dropped. They weren't equal, so one of them had to get dropped. So you're not adding anything to this. So that's good.

And at this point, you're kind of done. Right? Whoops. Oh, nice catch. But you did drop something. All right.

So the thing that we've done here is gotten to step three. So just to be clear, we're not done-done, in terms of this chart here, because we don't have the code there that corresponds to actually computing the sequence. So it's not that hard. I'm not going to go over the code here. You can certainly look at it in the notes.

But you need a little bit of tracing backwards in this recursion to actually compute things. What is the complexity of what I wrote up there, though? Yeah?

AUDIENCE: Theta n squared?

PROFESSOR: Theta n squared. Do people agree that the complexity is theta n squared? Or is this gentleman an optimist? Is the complexity theta n squared? Tell me why the complexity is theta n squared?

AUDIENCE: Because each subproblem is-- that code right there just executed by itself is constant. But then there's theta n squared subproblems.

PROFESSOR: Absolutely. But if you actually implemented this code, and you ran it, and n was 100, how long would you wait for this code to complete? Look at it. What's missing?

AUDIENCE: The cache.

PROFESSOR: The cache, exactly. Well, you fixed your own little error there. It was a trick question.

So there's no recursion-- I'm sorry, there's recursion here, but no memoization. So this is exponential complexity. You will recur. In fact, the recurrence for that is something like T of n equals 1 if n equals 1, and 2T n minus 1 if n greater than 1. And this would be 2 raised to n minus 1 in complexity.

Now there's a single line of code, and you all know this, that would fix this. And that single line of code is simply something that says, right here, look at the lij-- and I'm writing this differently. I'm calling this now a 2D array. So that's why I have the open brackets and close brackets. So I'm overloading l here. But it's a 2D array that is going to essentially be a cache for the subproblem solution values. And then if you want to do the backtracing to actually compute the solution, you could certainly have that as an additional record that's connected to this very same value. But that's implementation, and we won't really go there.

So look at lij and don't recurse if lij already computed. OK. So that's important to remember.

Now, if you actually put that, the cache lookup, hash table lookup, array lookup, whatever you want to call it, out there, then what you said is exactly correct. So our formula for computing the complexity of a DP, that you've seen a bunch of times and I mentioned in the very first lecture as well, is number of subproblems times time to solve each subproblem assuming or given that smaller ones are solved or the lookup is Order 1. So lookup.

Now you could say that hash table lookup is Order 1 on average, et cetera, et cetera. So what actually happens in the worst case, in this particular case and in most DPs, you can do things like perfect caching or, something that's even simpler in this case, is just a 2D array. There's not going to be any collisions if you just use i and j as the indices to the array. So you will definitely get an Order 1 lookup in this case, in most problems we'll look at in 046.

So if we just do that computation, which my friend over here just described, you do get your theta n squared, because you have theta n squared subproblems. And time to solve each subproblem, given that the smaller ones are solved, is simply going to be a computation of a max and an addition. So all of that is theta 1, because you're not counting the recursive calls.

So this is our first example. I'm done with it. Any questions about it? Any questions about DP in general? All right, good.

So little bit of review there. Not a particularly complicated question. Let's go to a different question corresponding to optimal binary search trees. It's a very different question. I don't think I need this anymore.

And it's kind of cute in its own way. One of things that's interesting about this question is it seems like a greedy algorithm should work. And we'll talk about that, as to why the greedy algorithm doesn't quite work.

So it's kind of similar to the interval scheduling and the weighted interval scheduling problem that we had back in February, where the regular interval scheduling problem, greedy worked, earliest finish time worked. But when it came to weights, we had to graduate to dynamic programming.

So here's our second problem, optimal BSTs. So what is an optimal BST? We have a bunch of keys that we want to store in the BST, K1 through Kn. And we'll assume that the way this is set up is that K1 is less than K2, da, da, da, less than Kn. And just to make our life easier in our examples, we just assume that Ki equals i. That's not necessarily required for anything we're going to do next. It's just for the sake of examples and keeping things manageable.

So I got a bunch of keys. And clearly there are many different binary search trees that can come, whether they're balanced or unbalanced. Many different binary search trees can be consistent with a given set of keys. If I choose the root to be Kn, then I'm going to get this horribly unbalanced tree. If I chose the root to be somewhere in the middle, then I'm going to get something that looks a little better, at least at the top level. But again, if I messed it up at the next level, I'd get something that's unbalanced.

So there's clearly many different BSTs. I'm not talking about balanced BSTs here. But we're going to define an optimality criterion that's a little bit different from balanced BSTs, because it's going to have this additional cost function associated with it that corresponds to the weight of the keys. So what is that?

Well, I'm going to have weights associated with each of these keys corresponding to W1 through Wn. And the easiest way to motivate you to think that these weights are an interesting addition to this problem is to think about these weights as being search probabilities. So what you have, for argument's sake, is a static structure that you've created. I mean, you could modify it. There's nothing that's stopping you from doing that. But let's just pretend for now that it's a static structure corresponding to this BST-- has a particular structure. And chances are you're going to be searching for some keys more frequently than others.

And the Wi's tell you what the probabilities are in terms of searching for a particular key Ki. So you can imagine, and we won't have this here, that you take-- the Wi's all sum up to 1, if you want to think of them as probabilities. Or I'm just going to give you numbers. I don't want to deal with fractions, don't particularly like fractions. So you can imagine that each probability corresponds to Wi divided by the sum of all the Wi's, if you want all of the probabilities to sum up to 1.

So think of them as search probabilities, because then you'll see what the point of this exercise is. And the point of this exercise is to find the BST T-- so we're actually constructing a binary search tree here. So it's a little more interesting than a subsequence, for example-- it has a richer structure associated with it-- than an exponential number of possible binary search trees that are associated with a given set of n keys that are all binary search trees. They're consistent, unbalanced, balanced-- unbalanced in one way versus another way, et cetera, et cetera.

So we want to find a binary search tree T that minimizes sigma i equals 1 through n. Obviously, we're going to have this Wi that's the game here. Depth in that T-- so this depth is for that T. What is the depth of the node? And K of i plus 1. And I'll explain exactly what this and tell you what precisely the depth is. But roughly speaking, the depth of the root-- not roughly speaking. The depth of the root is 0. And the depth of 1 below the root is 1, and so on and so forth.

And so, as you can see, what you want to do is collect in-- roughly speaking, you want to collect the high weight nodes to have low depth. If Wi is high, you want the multiplicative factor corresponding to depth T of that node, i, i-th node, to be small. And if Wi is small, then you don't mind the depth number to be higher.

You only have a certain number of low-depth nodes. You only have one node of depth 0. And you have two nodes of depth 1, which means that you only have one node where that quantity there is going to be 1, because you're doing 0 plus 1. And you have two nodes where the quantity is going to be 2, and so on and so forth. So you have some room here to play with corresponding to the BST structure, and you want to minimize that quantity. Any questions so far? All right, good.

So the search probabilities would be an example. So in terms of a more concrete application, you could imagine you had a dictionary, English to French, French to English, what have you. And there are obviously words that are more common, let's say, in common speech than others, and you do want to do the translation in a dynamic way using this data structure. And you could imagine that in this case, the search probability of a word is associated with the occurrence of the word, the number of times over some normalized number of words that this particular word occurs. And that would be the way. So it makes sense to create a structure that minimizes that function, because it would minimize the expected search cost when you want to take this entire essay, for example, and convert it from English to French or vice versa.

So this can-- if these are search probabilities, then this would minimize-- this cost function here would minimize expected search cost. Make sense? Yeah.

So that's the definition of the problem. And now we have to talk about why this is complicated, why this requires dynamic programming. Why can't we just do something fairly straightforward, like a greedy algorithm? And so let's look into that.

Let me give you a really good sense for what's going on here with respect to this cost function and maybe a little abstract by giving you a couple of concrete examples. First off, we got exponentially many trees, exponential in n. Let's say that n equals 2. So the number of nodes is 2. Then remember that I'm assuming that the Ki's are all i's, that 1 and 2 would be the case for n equals 2 and 1, 2, 3 for n equals 3, et cetera. So I could have a tree that looks like that. And I could have a tree that looks like this. That's it. n equals 2, I got 2 trees.

So in this case, my cost function is W1 plus 2W2, and in this case, my cost function is 2W1 plus W2. Just to be clear, what this is the K, and it's also the i. So the numbers that you see inside, those are the i numbers, which happen to be equal to the Ki numbers. And the weight itself would be the Wi. So I put W1 here, and the reason it only gets multiplied by 1 is because the depth here is 0, and I add 1 to it. The depth here is 1, and I add 1 to it, which is why I have a 2 out here.

Being a little pedantic here and pointing things out, because you're going to start seeing some equations that are a little more complicated than this, and I don't want you to get confused as to what the weight is and what the key number is and what the depth is. There's three things going on here.

So over here you see that I'm looking at W1 here, which is this key. So the 1 corresponds to the 1. And this has a depth of 1, so I put a 2 in here, and so on. So so far, so good?

When you get to n equals 3, you start getting-- well, at this point, you have 1, 2, 3, 4, 5-- you have five trees. And the trees look like this. I'll draw them really quickly just to get a sense of the variety here. But I won't write the equations down for all of them.

There you go. So those are the five binary trees associated with n equals 3, the binary search trees. And this is kind of cool. It's nice and balanced. The other ones aren't. And I'm not looking at--

Should I have another one here? I guess I should have. This is a-- oh, no, no, no. So I'm not doing mirrors. So there are a bunch of other trees that have the same equation associated with two-- no, that's not true. Because these are it. I was going to say that you could put 3 in here, but that wouldn't be a binary search tree.

So this is it. And we've got a bunch of trees. This one would be 2W1 plus W2 plus 2W3 by the same process that I used to show you that W1 plus 2W2 for the n equals 2 case. You just go off, and that's the equation.

And so your goal here in an algorithm-- clearly this is not the algorithm you want to enumerate all the exponentially many trees, compute the equations for each of those trees, and pick the minimum. I mean, that would work, but it would take exponential time. But that's what you have to do now. That's the goal. You absolutely want the best possible three that has, for the given Wi's that you're going to be assigned as constants, you do want to find the one tree that has the minimum sum, and you want to do that for arbitrary n, and you want to do that in polynomial time.

So the first thing that you do when you have something like this is forgetting about the fact that we're in a dynamic programming lecture or a dynamic programming module of this class, when you see a problem like this in the real world, you want to think about whether a greedy algorithm would work or not. And you don't want to go off and build this dynamic program solution, which is, chances are, going to be more inefficient then a greedy solution, which, if it produces the optimum answer, is the best way to go.

So an obvious greedy solution would be to pick K of r in some greedy fashion. And what is the obvious way of picking K of r to try and get a greedy solution that at least attempts to minimize that cost function that we have there?

AUDIENCE: Highest weight.

PROFESSOR: Highest weight, right? So just pick K of r to be highest weight. Because that goes back to what I said about if Wi is high, you want this value here to be small. So that's essentially your greedy heuristic. So K of r should be picked in a greedy fashion.

So what you do is you pick K of r as the root in this particular problem. And you could certainly apply this greedy technique recursively. So you do that for every subproblem that you find because when you do this choice of the root for Kr in the current problem that you have, you immediately split the keys into two sets, the sets that have to go on the left-hand side of Kr and the sets that have to go on the right-hand side of Kr.

So in this case, you know that you're going to have a bunch of keys here that correspond to Ki to K4, and over here, you're going to have-- oh, Kr minus 1, I should say, because you already have Kr out there. And the keys here are going to be Kr plus 1 to Kj, if overall I'm going to say that eij is the cost of the optimal BST on Ki, Ki plus 1, all the way to Kj.

So I'm kind of already setting myself up here for dynamic programming. But it also makes sense in the case of a greedy algorithm, where this greedy heuristic is going to be applied recursively. So initially, you're going to have E(1, n)-- so if you want to compute E(1, n), which is the cost of the optimal BST on the original problem, all the way from 1 to n, you're going to look at it, and you're going to say, I have a bunch of keys here of different weights that correspond to K1 through Kn. I'm going to pick the Kr that corresponds to the maximum weight, and I'm going to use that as the root node, and I'm going to stick that up there. And the reason I did that is because I believe that this greedy heuristic is going to work, where this maximum weight node should have the absolute minimum depth.

So I stick that up there. And then my BST invariant tells me that Ki through Kr minus 1-- remember, these are sorted, and they go in increasing order, as I have up here. You're going to have those on the left, and you're going to have those on the right. And then you've got to go solve this problem. And you could certainly apply the greedy heuristic to solve this problem.

You could go essentially find the K that has the highest weight. So you look at Wi through Wr minus 1, and you find the K that has the highest weight. Actually, you're not going midway here. It's quite possible that Kr is K of n. It just happens to be Kr is K of n. So that means that you have the highest weight node up here, but it's also the biggest key. So all of the nodes are going to be on the left. So this greedy heuristic-- and now you're starting to see, maybe there's a problem here. Because if you have a situation where the highest weight node is also the largest node, you're going to get a pretty unbalanced tree.

So I can tell you that greedy doesn't work. And when I gave this lecture, I think it was a couple of years ago, I made that statement, and this annoying student asked me for an example. And I couldn't come up with one. And so I kind of bailed on it, went on, completely dissatisfied, of course. And by the end of the lecture, the same student came up with a counter-example. So I'm going to have put that up here and thank the student who sent me an email about it.

And so here's a concrete example. It turns out I was trying to get a tree node example for a few minutes and failed. It's, I think, impossible-- I haven't proved this-- to find a tree node example with arbitrary weights for which the greedy algorithm fails. But four nodes, it fails. So that's the good news.

So here's the example, thanks to Nick Davis. And it looks like this.

So I claim that the greedy algorithm for the problem that is given to me would produce this BST. And the reason for that is simple. The highest weight node happens to be node 2, which has a weight of 10. So I would pick that, and I would stick that up here, which means, of course, that I'm going to have to have 1 on this side, and I'm going to have to have 4 and 3 on the other side. And since 4 has a higher weight than 3, I'd pick that first. And then 3 would have to go over here.

So if I go do the math, the cost is going to be 1 times 2-- and I'll explain what these numbers are. I did tell you it was going to get a little cluttered here with lots of numbers. But if you keep that equation in mind, then this should all work out.

So what do I have here? I'm just computing-- this is W1. So you see the first numbers in each of these are the weights. And so this would be W2 and et cetera. And you can see that this is at depth 1, which means I have to have 2 here, because I'm adding 1 to the depth. So I have 1 times 2, 10 times 1, because that's at the root, 9 times 2, et cetera. And I get 54.

It turns out the optimum tree-- you can do better than that if you pick 3, and you go like this. And so what you have here is the cost equals 1 times 3 plus 10 times 2 plus 8 times 1 plus 9 times 2, and that's 49. So I'll let you look at that.

The bottom line is because of the way the weights are set up here, you really want to use the top three spots, which have that the minimum depth, for the highest weight nodes. And so you can see that I could make these weights arbitrary, obviously, and I could break a greedy algorithm as this little example shows. So we got no recurs here. We got to do some more work. We're going to have to use DP to solve this problem. The good news is DP does solve this problem.

So let's take a look at the decomposition. And as with anything else, the key is the step that corresponds to breaking up the original problem into two parts or more that essentially give you this sort of decomposition. And we don't quite know. So this is really the key question here. We don't quite know what the root node is going to be.

So what do we do when we don't know what to do? What do we do in DP when we don't know what to do? This is a very profound question. What do you do?

AUDIENCE: Guess.

PROFESSOR: You guess. And not only do yo guess, you guess all outcomes. You say, is it going to come up heads? Is it going to come up tails? I'll guess both. You have a die. It's going to get rolled, a 1, 2, 3, 4, 5, 6. Guess them all, right?

So if you're going to have to guess that the root node could be any of the keys, and it still-- I mean there's a linear number of guesses there. That's the good news. It's going to stay polynomial time.

So we're going to have to guess. And once you do that guess, it turns out that decomposition that you see up there is exactly what we want. It's just that the greedy algorithm didn't bother with all of these different guesses. And the DP is different from the greedy algorithm because it's going to do each of those guesses, and it's going to pick the best one that comes out of all of those guesses. So it's really not that different from greedy. So we'll keep that up there, leave that here. Let's go over here.

So the recursion here is not going to be that hard, once you have gotten the insight that you just have to go make a linear number of guesses corresponding to the root node for your particular subproblem, and obviously this happens recursively.

So there's a little something here that that I'll point out with respect to writing these equations out. So what I have here, just to be clear, is what I wrote up there. Even when I was talking about the greedy algorithm, I had the subproblems defined.

So the original problem is E1 through n. The subproblems correspond to eij. And given a subproblem, once I make a root choice, I get two subproblems that result from the root choice. And each of those two subproblems is going to be something-- and this is something to keep in mind as we write these equations out-- they're going to be one level below the root. So keep in mind that the original eij subproblem corresponded to this level, but that Ei r minus 1 problem and the E r plus 1j problem are at one level below. So keep that in mind as we write these equations up.

And so what I want to do here is just write the recurrence. And after that, things become fairly mechanical. The fun is in the recurrence.

So I got Wi if i equals j. And so I'm at this level. And if I just have one node left, then at that level, I'm going to pay the weight associated with that node, and I'm going to have to do a certain amount of multiplication here with respect to the depth. But I'm just talking about eij as the subproblem weight, just focusing in on that problem. So if I only have one node, it's the root node. And that weight is going to be Wi, because the root node has depth of 0, and I'm going to add 1 to it. So for that subproblem, I got Wi. Keep that in mind, because that's the only subtlety here with respect to these equations, making sure that our actual cost function that we're computing has the correct depth multiplicands associated with it.

So then we have our linear guessing. And I might have said max at some point. We want to get the min here. And I had minimize here, so I think I wrote things down correct, but at some point, I think I might have said we want to maximize cost. But we do want to minimize cost here corresponding to the expected search.

So we're doing a linear, and we're doing a min. And we're going to go off and look at each of the different nodes as being the root, just like we discussed. And what I'm going to do here is I'm going to simply say, first, that I'm going to look at Ei r minus 1, which corresponds to this, and I'm going to have plus E of r plus 1, j.

And at this point, I don't quite have what I want because I haven't actually taken into account the fact that the depth of the Ei r minus 1 and the r plus 1j is 1 more than the eij. So it's 1 more than that. And I also haven't taken into account the fact that, in this case, I definitely need to add a Wi as well.

Because the root node is part of my solution in both cases. In one case, it ended my solution, the top case. But in this case, it's also the root node, as you can see over on the left, and I've got to add that in there. So I definitely need to add a Wi here. And what else do I need to add? From a standpoint of the weights, what other weights do I need to add to this line? Yeah? Go ahead.

AUDIENCE: First of all, shouldn't that be a Wr?

PROFESSOR: First of all, that should be a Wr, you're exactly correct.

AUDIENCE: And you want to add all the weights.

PROFESSOR: And you want to add all the weights, right. You're exactly right. I have two more Frisbees, but I need to use them for something else. So you get one next time. Or do I have more than that? No, no. These are precious Frisbees here. Sorry, man.

And you corrected me, too. Shoot. This is sad.

So that needed to be a Wr. But you also need to add up all of the nodes that are in here, because they're one more level. And now you see why I made the mistake. I don't usually make mistakes. But what I really want-- actually it's more like-- I don't know, a few per lecture. Constant order one mistakes.

I'm going to say this is Wi, j, where Wi, j is simply the sum of all of the weights from i to j. And this makes perfect sense, because the nice thing is that I don't even need to put an r in there. I could choose a particular r. Wr will be in there. But all of the other nodes are going to be in there anyway. So it doesn't really matter what the r selection is corresponding to this term here. I will put it inside the minimization. This bracket here is closed with that, but you can pull that out, because that doesn't depend on r. It's just going to be there for all of the cases, all of the guesses.

So that's it. That's our recurrence relationship corresponding to the DP for this particular problem. You can go figure out what the complexity is. I have other things to do, so we'll move on. But you can do the same things, and these are fairly mechanical at this point to go write code for it, trace the solution to get the optimum binary search tree, yadda, yadda, yadda. Any questions about this equation or anything else? Do people get that?

Yeah, go ahead.

AUDIENCE: What is the depth input?

PROFESSOR: So the depth is getting added by the weights. So basically what's happening is that as I go deeper into the recursion, I'm adding the weights and potentially multiple times, depending on the depth. So if you really think about it, this Wij, I added all of these weights. But when you go into this recursion, into the ei of r minus 1, well, you're going to see i to r minus 1 in the next level of recursion. So you would have added the weight again.

So it's not the case that the weights are only appearing once. They're, in fact, appearing this many times. So that's what kind of cute about this, the way we wrote it. Any other questions? All right, good. So we're done with that.

So one last example. This is, again, a little bit different from the examples of DP we've looked at up until now, because it's a game. And you have an opponent, and you have to figure out what the opponent is going to do and try to win. I suppose you could try to lose as well. But let's assume here, same thing with minimization and maximization, most of the time you can invert these cost functions, and DP will still work. But let's assume here that you want to win this game.

So the game is an alternating coins game where we have a row of n coins of values V1 through Vn. These are not necessarily in any particular order. And n is even. And the goal here is select the outer coins. So select either the first our last coin from the row, and then the opponent plays. Remove permanently, but add it to your value and receive the value.

So I need two volunteers to play this game. And you want to maximize the value. The winner gets a blue Frisbee. The loser gets a purple Frisbee, because blue is greater than purple. And you might ask, why is that.

Well, if you went to a beach, and you saw water this color, and you went to another beach, and you saw water this color, which beach would you pick? Right? This is Cozumel. This is Boston Harbor. So blue is greater than purple. Don't use this proof technique in the quiz.

So do I have a couple of volunteers to play this game? Two of them over there. Are you waving for him? Yeah. I see one. You can come down. Another volunteer? Yeah. Over there. You don't get your Frisbees yet.

So we're going to make this really fair. What's your name?

AUDIENCE: Josiah.

PROFESSOR: Josiah.

AUDIENCE: Tessa.

PROFESSOR: Tessa. Josiah and Tessa, I'm going to write out a bunch of-- it's going to be a fairly short game. And we're going to be really fair, and we're going to flip a coin to decide whether Josiah goes first-- you can pick heads or tails-- or whether Tessa goes first. Actually if you win, you can let her go first if you want. But you get to choose.

AUDIENCE: All right.

PROFESSOR: So pick.

AUDIENCE: Heads.

PROFESSOR: That's heads. Do you want to go first, or do you want to let Tessa go first?

[LAUGHTER]

AUDIENCE: You should go first.

PROFESSOR: All right. So Tessa, you get to go first.

AUDIENCE: OK, 6.

PROFESSOR: 6, OK. So let's just say T. So you get to choose either 25 or 4, Josiah.

AUDIENCE: I think I'd pick 25.

PROFESSOR: You think you'll take 25. So that's j. So it's not on the 4 and 19 over here because those are gone. So your turn again.

AUDIENCE: OK. Can I take a minute, or should I just go?

PROFESSOR: Well, you can take 30 seconds.

AUDIENCE: 19.

PROFESSOR: 19, OK.

AUDIENCE: 4.

[LAUGHTER]

AUDIENCE: All right, 4.

PROFESSOR: 4. All right. And 42?

AUDIENCE: Yeah.

PROFESSOR: All right, 42. This is one strange game. Now, we get to add up the numbers. This is going to be tight. 4 plus 39 is 43. 43 plus 25 is 68. 42 plus 19 is 61. 61 plus 6 is 67. Ooh. Well, you get Frisbees. Well, blue for you. I mean, you could give her blue if you like.

AUDIENCE: I prefer purple.

PROFESSOR: You prefer purple. Thanks. Good job.

[APPLAUSE]

No offense is intended to Josiah and Tessa, but that was a classic example of how not to play the game. First off, Josiah could have won regardless of the coins that were up there, regardless of the values of the coins, if he'd chosen to go first. So this game, the person who goes first is guaranteed to not lose. And by that, I mean you may have a situation where you can have a tie in terms of the values, but you're guaranteed to not lose.

Now, he ended up winning anyway, because there were other errors made during this, and I don't have the time to enumerate all of them.

[LAUGHTER]

So we're just going to go move on and do the right thing. So let me first tell you, outside of DP, just in case you play this game over Spring Break or something, how you can win this game without having to compute complicated recursive memoization DP programs for this case.

So let's say I have V1, V2, V3, all the way to V n minus 1 and Vn. And remember n is even. So you've got to pick n over 2 coins if you're the first player and n over 2 if you're the second player.

So what the first player does is simply compute V1-- take the odd numbers-- and n is even, so you've got V n minus 1 being odd. Compute V1 plus V3 all the way to V n minus 1, and compare that with the even positions, V2 plus V4 all the way Vn. So in this particular instance, Josiah, who went second, if you just look at the odd positions, which is, in fact, what he ended up picking, it was 4 plus 39 plus 25, which was 68. So you can do this computation beforehand. You compute this to be 68, in our case. Compare that with 67. And you don't give up your first-player advantage. So you're going with the first player.

But now you say, I know that I can set this up-- and that wasn't the order that he did this-- but I know I can set this up so I always have a chance to get V1, V3, V5, et cetera. Because let's just say that the first player Josiah started first. And he decided, in this case, that the odd values are the ones that win. So let's say he picks V1.

At this point, Tessa sees V2 through Vn. So you're just looking at that. Now Tessa could either pick V2 or she could pick Vn. In that case, Josiah, who's the first player, could pick V3 or Vn minus 1 and stick with his rule of picking the odd coins. So regardless of what Tessa does, the first player, Josiah, could pick odd if odd was the way to go or if the values were such that, even with the way to go, he could go even. So no reason for the first player to lose.

But let's just say that you're a nasty person, Michael Jordan nasty. You want to just press your opponent. You want to make sure they never want to play you again. So now you have a maximization problem. So this is clearly not DP. You don't need DP to add up a bunch of numbers.

But let's say that you want to, given a set of coins, V1 through Vn, you want a dynamic strategy that gives you the maximum value. This odd versus even is a bit constraining, because you're stuck to these positions. And it's good to be in that situation if you just want to win and you don't care how you win. But if you want to maximize, then it's a more complicated problem.

And so we have to talk about how you would do something that would give-- maybe it would be V1 and V4 and something else, depends on the values. How would you get to a situation-- if you're the first player. We'll just stick with you're the first player. You know you can't lose using this strategy. But not only do you want to not lose, you want to get as many coins as possible-- let's say this is money. More money is better, and that's your goal. So people understand that.

A little trick there with, I guess, a greedy algorithm you can think of. Or it's not really even something that you might want to call an algorithm. It's a little computation. And our goal now is to maximize the amount of money when assuming you move first. And so this is going to be a little bit different from all of the other problems we've looked at, because we have to now think about what the opponent would do. And there's going to be a sequence of moves here. You're going to move first, so that one is easy. You've got the whole problem.

But now you have to say, the opponent is going to move and pick a coin. And then you have to think of your subproblems as the potential rows of coins that are going to be different, depending on what the opponent does. And through this process, you have to maximize the value. So it's really pretty different from the other couple of examples that we looked at.

So we're going to have to do a little bit of set-up before we get to the point where we can write something like this, which is the solution to our DP. And so let me do that set-up. But it's not super complicated. And part of this is going to look kind of the same as previous problems we've looked at.

So Vij is the max value we can definitely win if it is our turn. And only coins Vi through Vj remain. And so we have Vii. Then there's only one coin left, and you just pick i.

Now, you might say, but that's never going to happen if there's an even number of coins and I'm playing first, because the other player is going to be at the end of the-- is going to be the person who picks the last coin. But we need to categorize Vii because we need to model what the other player is going to do. So we can't just say that we're going to be looking at an even number of coins in terms of the row of coins that you look at, which is true, that when you get to move, you only see an even number of coins if you're the first player. But you do have to model what the opponent does . So we need the Vii.

And what you might see, of course, is a board that looks like Vi i plus 1, for some arbitrary i. So that's why I have it up there. But remember that you're only picking coins on the outside, so it's not like you're going to have a gap. I mean, you're not going to have V3 left and V7 left. There's no way that's going to happen. You're just going to keep shrinking, taking things from the left or the right. So it's going to be i and i plus 1. That make sense?

And so in this case, what would you pick? Vi and i plus 1. You just pick the max. Because at the end of this, either you did it right or you did it wrong. Either way, you're going to improve your situation by picking the max of Vi or Vi i plus 1. So there's no two things about it.

So here, you're going to pick the maximum of the two. And you might have Vi i plus 2, which is an odd number of coins that your opponent might see. It gets more complicated for Vi and i plus 2. We're going to have to now start thinking in more general terms as to what the different moves are. But we've got the base cases here. All I did here was take care of the base case or a couple of base cases associated with a single coin, which is what your opponent will see and pick, or two coins, which is your last move. So with DP, of course, you always have to go down to your base case, and that's when things become easy.

So we have to talk about two things and put up our recurrence together. The two things we have to talk about are what you do when you move, and that's actually fairly easy, and the second thing is what the model of the opponent looks like when you're waiting for him or her to move. So let's take a look at your move. And I've got V1. Let's look at Vi. Vj here, dot dot dot, Vn. So that's what you see. And you're looking at i and j. And at this point, you're seeing all those other coins, the outer coins have disappeared into people's pockets. And you're looking at this interval.

So I'm going to write out what Vij should be. And keep in mind that we want to maximize the amount of money. And we want to say that you should be able to definitely win this amount of money, regardless of what the opponent does. So I want to do a max, obviously. And I have two choices here. I can pick Vi or I can pick Vj. It's not like there's a lot of choices here.

So if I pick Vi, then-- let me go ahead and say I pick Vi here. And here, I pick Vj. Let me draw this out a little bit better. So I've got to fill in these two arguments to my max.

So this is easy. I'm going to have a plus Vi here. Going to have a plus Vj here because I picked the appropriate value. And now, this is also not that difficult. What exactly happens? What can I put in here if I pick Vi?

AUDIENCE: Vi plus 1.

PROFESSOR: Yeah. Vi plus 1 to j. So the range becomes i plus 1 j. I have to be a little careful here in terms of whether I can argue that it's actually the V that I put in here. So the subtlety here is simply that the Vi plus 1 j is not something that I see in front of me. This is the complication. Vi plus 1 j is never a board that I see in front of me, whereas Vij was a board that I saw in front of me. So I have to model the boards that I see in front of me, because those are the boards that I have control over that I can maximize. I cannot would be Vi plus 1 j in there, simply because I don't quite know what that is, because of what I get eventually is not something I control. It's going to be the board after my opponent has moved.

So all I'm going to do is I'm going to say the range becomes-- range is i plus 1 j. And I'm going to say something else in a second here. The range is i j minus 1. And in both of these cases, the opponent moves.

So in order to actually write out my DP, I'm going to have to now look at the worst case situation in terms of the board I get back, because the only times I'm adding values is when I see a board in front of me and I pick a coin. And I'm going to have to say now the opponent is going to see an i plus 1 j-- or an i j minus 1, excuse me, and might do something. And I'm going to get something back. I'm going to assume that the opponent is just as smart as I am, knows DP, taken 6046, et cetera, et cetera. And I still want to get the maximum value that I can definitely win.

And so we need to look inside of that a little bit. And it's not that hard if you just make the assumption that I just did, which is the opponent is going to do-- I mean might not necessarily do the right thing. But you have to assume that the opponent knows just as much as you do and is going to try and do as well as possible.

So let's do that. That's the last thing that we have to do here. And it's just one more equation. So here's the solution. We now have Vi plus 1 j subproblem with the opponent picking. And the simple observation is that we are guaranteed the min of Vi plus 1 j minus 1 or Vi plus 2 j.

In this case, the opponent picks Vj, and in this case, the opponent picks Vi plus 1. And you're guaranteed the minimum of these two, because the opponent can only pick one coin. So that's the simple observation that lets you jump ahead to your next move, which is really what the DP is interested in, because that's the only time that you're actually executing something in terms of picking coins from the board and adding to your value. But we did have to model that as to what the maximum value was that you could win, jumping ahead of this move. And you have the min in here because you're assuming that this is a definite guarantee. That's what we want. It's possible that the opponent plays a different game from the one that we think he or she is going to play, which maximizes his or value. But the min is guaranteed.

So now that you've made this observation, it turns out we just have to take that, and we'll be done. So let's see. Let me erase this. It's the last set of equations. And it's just plugging in that observation into what I wrote before.

So we have Vi j equals max. This is the outer max that I had up there already, so that's the same max. And put these giant brackets here. And inside, I'm going to plug in this min, which is something that corresponds to the value that I would win in the worst case after the opponent plays the best possible move. And that would be Vi plus 1 j minus 1, V1 plus 2 j plus Vi. This Vi is the same as this one up here.

And you've got a plus Vj here. And I didn't actually do this, but in terms of writing out what the opponent would do in other subproblem case, but it's really pretty straightforward. You have a problem that corresponds to the i j minus 1 problem. And the opponent could pick the i-th or could pick the j minus 1th. If the opponent picks the j minus 1th, you get Vi j minus 2, and you need to take the min of that. In the other case, where the opponent picks i, in which case you get i plus 1 j minus 1.

And that's our DP. That's our DP. So the big difference here was the modeling that you had to do in the middle. We didn't actually have to do that in any of DPs we've covered in 046 at least up until this point.

Before we talk about complexity, that should just take a minute, people buy that? See that? Good.

So with all of these problems, the complexities are fairly straightforward to compute. The complexity here is simply, again, as before, the number of subproblems times the time it takes to solve the subproblem. You can see here are that these are all constant time operations.

So assuming that the Vij's have been computed, it's theta 1 time to compute a subproblem. So that's your theta 1. And as before, there are n squared subproblems. Sorry. Let's just do number of subproblems times theta 1. It's just theta n squared. So that was good.

All right. So good luck for the quiz. And don't worry too much about it. See you guys next week.

Free Downloads

Video


Caption

  • English-US (SRT)