Recitation 5: Dynamic Programming: More Examples

Description: In this recitation, problems related to dynamic programming are discussed.

Instructors: Ling Ren

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

LING REN: Everyone, today we're going to look at dynamic programming again. I think I have mentioned this several times, so you should all know it by heart now: the main idea of dynamic programming is to divide the problem into subproblems and reuse the results of the subproblems you have already solved. Right? And, of course, in 6.046 we always care about the runtime. So those are the two big themes for dynamic programming.

Now, let's start with a warm-up example. It's extremely simple. Let's say we have a grid, and there's a robot at, say, coordinate 1,1 and it wants to go to coordinate m,n. At every step, it can only either take a step up or take a step to the right. So how many distinct paths are there for the robot to take? Is the question clear? We have a robot at coordinate 1,1. It wants to go to coordinate m,n. And at every step, it can either take a step up or take a step to the right. How many distinct paths are there that can take the robot to its destination? Any ideas how to solve that? Go ahead.

AUDIENCE: So, we define subproblems as the number of distinct paths from some point x,y to m,n. Then the number of distinct paths from some point is the number of paths if you go up if you're allowed to go up, plus the number of paths if you go right if you're allowed to go right. So if you were on the edge, [INAUDIBLE].

LING REN: Yup, yup. Did everyone get that? So, it's very simple. I know I have only one way to get to these points: I need to go all the way right. And only one way to get to these points: I need to go all the way up. So for all the intermediate nodes, my number of choices-- is this board moving? --is just the number of distinct paths coming from my left, plus the number of distinct paths coming from the bottom. For every node, I just take the sum of the two numbers on my left and on my bottom, and go from there. OK. Is that clear?

So this example is very simple, but it illustrates the point of dynamic programming very well. You solve subproblems-- asking how many distinct paths can get me here-- and you reuse the results of, for example, this subproblem, because you are using it to compute this number and that number. If you don't do that, if you don't memoize and reuse the results, then your runtime will be worse. So what's the runtime of that? Speak up.

AUDIENCE: [INAUDIBLE]

LING REN: It's just m times n. Why? Because I have this many unique subproblems-- one at each point-- and I'm just taking the sum of two numbers at each subproblem, so it takes me constant time to merge the results from my subproblems to get my problem. So to analyze runtime, we usually ask: how many unique subproblems do I have, and what's the amount of merge work I have to do at every step? That's the toy example.
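
For reference, here is a minimal bottom-up sketch of this counting DP in Python (the function name and the example call are illustrative, not from the recitation):

```python
def count_paths(m, n):
    """Number of distinct right/up paths from (1, 1) to (m, n) on an m-by-n grid."""
    # paths[i][j] = number of distinct paths from (1, 1) to (i, j)
    paths = [[1] * (n + 1) for _ in range(m + 1)]   # one path along the bottom row / left column
    for i in range(2, m + 1):
        for j in range(2, n + 1):
            # Arrive either from the left or from below: sum the two subproblems.
            paths[i][j] = paths[i][j - 1] + paths[i - 1][j]
    return paths[m][n]

print(count_paths(3, 3))   # 6 distinct paths on a 3-by-3 grid
```

There are m times n unique subproblems and constant merge work per subproblem, matching the runtime above.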

Now let's look at some more complicated examples. Our first one is called make change. As its name suggests, we have a bunch of coins: s1, s2, all the way to, say, sm. Each coin has some value, like 1 cent, 5 cents, 10 cents. We're going to make change for a total of n cents, and ask: what is the minimum number of coins I need to make change for n cents?

So to guarantee that we can always make this change, we'll set s1 to be 1. Otherwise, there's a chance that the problem is unsolvable. Any ideas? Is the problem clear?

AUDIENCE: How do you find s1 again? Or si?

LING REN: What, these numbers? They are inputs. They are also inputs. It could be 1 cent, 5 cent, 10 cent. Or 3 cent, 7 cent. Though the smallest one is always 1. OK. I need to find a combination of them. For each of them, I have an infinite number of them. So I can find two of these, three of that, five of that, such that their sum is n. Is the problem clear? OK. Any ideas how to solve that?

So let's just use a naive, very straightforward algorithm. Go ahead.

AUDIENCE: You pick one, and then you do mc of n minus that.

LING REN: OK, great. Yeah, let's just do exhaustive search. Let's pick si. If I pick this coin, then my subproblem becomes n minus the coin value. And of course, I use the one coin, si. So then I take the min of this over all the i's, and that's the solution. So far so good?

OK. So what's the runtime of this algorithm? If it's not immediately obvious, then we ask how many unique subproblems there are, and how much work I have to do to go from my subproblems to my original problem. So how many subproblems are there? To be clear, for this one, we have to make this recursive call again: n minus si, then maybe minus sj. And if you cannot compute exactly how many subproblems there are, let's just give a bound. Any ideas? John, right?

AUDIENCE: I'm not sure there would be more than n subproblems, because the smallest amount we can subtract from the original is 1. And if we keep subtracting 1 repeatedly, we get n subproblems, and that will cover everything-- that subproblem.

LING REN: Yeah, correct. So this may not be a very tight bound, but we know we cannot have more than this number of subproblems. Actually, I don't even need to put the order there. I know we can have no more than n subproblems. They're just make change of n, n minus 1, n minus 2, all the way to make change of 1. And actually, this bound is pretty tight, because we set our smallest coin to be 1, so we will make a recursive call to make change of n minus 1, right? If I pick the 1 cent coin first. And then from there, I will pick a 1 cent coin again. That gives me a subproblem of n minus 2. So indeed, I will encounter all the n subproblems.

OK, so having realized that, how much work do I have to do to go from here to there?

AUDIENCE: [INAUDIBLE]

LING REN: Correct. Because I'm taking the min of how many terms? m terms. So that's our runtime: order n times m. Any questions so far? If not, let me take a digression. So, make change, this problem-- if you think about it, it's very similar to knapsack. Has anyone not heard of this problem? Knapsack means you have a bunch of items. You want to pack these into a bag, and the bag has a certain size. Each item has a certain size and a certain value, and you want to pack the items that have the largest combined value into your bag. So, why are they similar?

So in some sense, n is our bag size. We want to pick a bunch of coins to fill up the size n. And each coin here actually has a negative value, because we want to minimize the number of coins. If you look at it that way, then this problem is exactly knapsack. And knapsack is NP-complete. That means we don't know a polynomial solution to it yet. However, we just found one. Our input is m coins and n. Our solution is polynomial in m, and polynomial in n. If this were true, then I would have found a polynomial solution to an NP-complete problem. So P equals NP. So we should all be getting the Turing Award for that. So clearly something's wrong. But there's no problem with this solution. It covers all the cases. And our analysis is definitely correct.

So does anyone get what I'm asking? What's the contradiction here? I will probably discuss this later, in later lectures when we get to complexity or reductions. But to give a short answer, the problem is that when we say the input is n, its size is not n. I only need log n bits to represent this input. Make sense? Therefore, for an input of length log n, my runtime is n. That means my runtime is exponential in the input size. It's not polynomial. OK. Now that's the end of the digression.
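
Before moving on, here is a small memoized sketch of the make-change recurrence just analyzed (the function name and the coin values in the example call are only illustrative):

```python
from functools import lru_cache

def make_change(n, coins):
    """Minimum number of coins summing to n; assumes a 1-cent coin is available."""
    @lru_cache(maxsize=None)
    def mc(amount):
        if amount == 0:
            return 0
        # Try every coin as the next pick: min over i of 1 + mc(amount - s_i).
        return min(1 + mc(amount - s) for s in coins if s <= amount)
    return mc(n)

# At most n distinct subproblems, each taking a min over m coins: O(nm) total.
print(make_change(11, (1, 5, 10)))   # 3 coins: 5 + 5 + 1
```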

Now let's look at another example. This one is called rectangular blocks. In this problem, we have a bunch of blocks, say 1, 2, all the way to n. And each of them has a length, width, and height. So it's a three-dimensional block. I want to stack blocks on top of each other to get the maximum height. But in order for block j to be put on top of block i, I require the length of j to be smaller than the length of i, and the width of j to be smaller than the width of i.

So visually, I just mean this is a block, and I can put another block on top of it if that block is smaller in width and length. But I cannot put this guy on top, because one of its dimensions is larger than the underlying block. And to make things simple, rotating is not allowed. So OK, I can rotate it, and it still doesn't fit, but you see the complication: if you allow rotation, there are more possibilities. Length and width are fixed, so one of them is north-south, the other is east-west, and you cannot change that. OK. Is the problem clear? You want to stack one on top of another to get the maximum height.

Any ideas? Again, let's start from simple algorithm. Say, let's just try everything out. OK, go ahead.

AUDIENCE: If you try everything out, you have n factorial.

LING REN: Pardon?

AUDIENCE: It would be O of n factorial?

LING REN: You're going too fast. Let's write the algorithm first. So I want to solve my rectangle block problem, say from 1 to n. What are my subproblems?

AUDIENCE: Choose one block.

LING REN: OK. Let's choose one block.

AUDIENCE: And then you run RB of everything except that block.

LING REN: So I get its height, and then I have a subproblem. What is the subproblem? And then I'll take a max. So the difficulty here is this subproblem. So Andrew, right? So Andrew said it's just everything except i. Is that the case? Go ahead.

AUDIENCE: It's everything except i, and anything that's wider or longer than i.

LING REN: Do you get that? Not only do we have to exclude i, we also have to exclude everything longer or wider than i. So that's actually a messy subproblem. So let me define this subproblem to be a compatible set of l i, w i. And let me define that to be the set of blocks whose length is smaller than the required length, and whose width is also smaller than the required width. So this should remind you of the weighted interval scheduling problem, where we defined a compatible set once we had chosen some interval. Question?

AUDIENCE: What are we trying to do here? Are we trying to minimize h?

LING REN: Maximize h. We want to get as high as possible. I choose a block, I get its height, and then I find the compatible remaining blocks, and I want to stack them on top of it. Everyone agrees this solution is correct? OK, then let's analyze its runtime. So how do we analyze runtime? What's the first question I always ask?

AUDIENCE: How many subproblems?

LING REN: Yeah. I'm not sure who said that, but how many subproblems do we have?

AUDIENCE: At most n?

LING REN: At most n. Can you explain why is that the case? Or it's just a guess?

AUDIENCE: Because if n is compatible-- nothing in the compatible-- n will not be in the compatible set of anything that is in the compatible set of n.

LING REN: OK, that's very tricky. I didn't get that. Can you say that again?

AUDIENCE: Because for example, if you start with n, then everything that's in the compatible set of n. n won't be in the compatible set of that.

LING REN: OK. I think I got what you said. So, if we think there are only n subproblems, what are they? They have to be the compatible sets of l1, w1, then l2, w2, and so on. These are the n unique subproblems you are thinking about. Is there any chance that I will get a compatible set of something like l3 but w5? If I ever have this subproblem, then, well, my number of subproblems kind of explodes.

Yeah, I see many of you are saying no. Why not? Because if we have a subproblem, say, the compatible set of l i and w i, and we go from there and choose the next block, say t, it's guaranteed that t is shorter and narrower. That means our new subproblem, our new compatible set, needs to be compatible with t instead of i. So the only subproblems I can get are these ones. I cannot have one of those. The number of subproblems is n. And how much work do I have to do at each level?

AUDIENCE: n.

LING REN: n, because I'm just taking the max, and there are n potential choices inside my max. So the runtime is n squared. OK, we're not fully done, because there is an extra step when we're trying to do this. We have to figure out what each of these compatible sets is. Because once I go into this subproblem, I need to take a max over all the blocks in this set, so I have to know which blocks are in that set. Is that hard? So how would you do that?

AUDIENCE: You just check for all of them, and that's O of n.

LING REN: OK. So, I check all of them. That's O of n. I'm pretty sure you just meant scanning-- scan the entire thing and pick out the compatible ones. But that's for this one subproblem. We have to do it for every one. Or there may be a better way. I think the previous TA is telling me there's a better way to do that. In order to find all the compatible sets, he claims he can do it in n log n, but I haven't checked that, so I'm not sure. This is a folklore legend here. Yeah, we'll double check that offline. But if I don't have that, then figuring out all these subproblems also takes n squared. Then my total runtime is n squared plus n squared, which is still n squared. Question?

AUDIENCE: Is the n log n solution giving us sorting this by [INAUDIBLE]?

LING REN: Yeah, I think it should be something along those lines, but I haven't figured out whether you sort by length or by width. You can only sort by one of them. So after sorting-- say, let's sort by length-- I may get something like this. And if I'm asking what's the compatible set of this guy, I still have to kick all of these out. Yeah, so it's not entirely clear to me how to do it, but I think you can potentially consider having another, say, binary search tree that's sorted by width, and you can go in and just delete everything larger than a certain width. So that's the idea, yeah. OK, go ahead.

AUDIENCE: Can you convert it into a directed graph, where for each pair of shapes that's compatible you add an edge, and then find paths?

LING REN: OK. OK. But constructing that graph already takes O of n squared, correct? Yeah, OK, let's move on. I don't have time to figure this out. So, this problem is remotely similar to weighted interval scheduling, in the sense that it has some compatible set. And in the very first lecture and recitation, we had two algorithms for weighted interval scheduling, and one of them was better than the other. And this one looks like the naive algorithm. So, does anyone remember what the better algorithm is for weighted interval scheduling?

But instead of checking every block as my potential bottom one-- it really doesn't make sense to do that, because for the very small ones, I shouldn't put them at the bottom. I should try the larger ones first as the very bottom one. Go ahead. Oh, you're not--

AUDIENCE: You could create a list sorted by length and then width. So you know that items that are later in the list are not going to be at the first level of the tower.

LING REN: Yeah, correct. So, in the same line of thought as weighted interval scheduling, let's first sort them. But then it's a little tricky: do I sort by length or width? I'm not sure yet, so let's just sort by length and then width. This means if they have the same length, then I'll sort them by width. So I can create a sorted list. Let me just assume it's an in-place sort, and now I have the sorted list. Once I have that, the potential solutions I should consider are whether or not I put my first block as the bottom one. It doesn't make sense for me to put a later one at the bottom. So my original problem becomes taking the max over whether or not I choose block one. If I do, then I get its weight-- height, sorry-- and my subproblem is the ones compatible with it.

If I do not choose it, then my subproblem is like what Andrew first said: from 2 all the way to n. So why is this correct? I claim this covers all the cases. Either block 1 is chosen as the bottom one, or it's not chosen at all. It's impossible for block 1 to be somewhere in the middle, because it has the largest length. OK. So how many subproblems do I have? Go ahead. Still n. There are all of these compatible sets of l1 w1, l2 w2. But it looks like I do have some new subproblems.

These did not exist before. However, there are only n of them. They're just the suffixes of the entire set. So I still have O of n subproblems. And at each step, I'm doing a constant amount of work-- there are just two terms in the max. So we found an order n solution. Are we done? Is it really order n? OK, no.

AUDIENCE: You still have to find the c.

LING REN: Yeah. I still have to find all these c's. And first, I actually have a sort step. That sort step is n log n. Then again, if we do it naively, it's again n squared, because I have to find each of these compatible sets. But if there's an n log n solution to find these compatible sets, then my final runtime is n log n. Make sense? Any questions so far?
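
To make this concrete, here is a short sketch of the sorted formulation in Python, written bottom-up and using the naive scan for each compatible set, so it runs in n squared overall (the function name and example blocks are illustrative, not from the recitation):

```python
def max_stack_height(blocks):
    """blocks: list of (length, width, height) tuples. Returns the tallest legal stack.

    Sort by length, then width, in decreasing order, so a block can only sit on
    top of blocks that appear earlier in the sorted list.
    """
    blocks = sorted(blocks, reverse=True)
    n = len(blocks)
    best = [0] * n          # best[i] = tallest stack whose topmost block is blocks[i]
    answer = 0
    for i in range(n):
        li, wi, hi = blocks[i]
        best[i] = hi
        for j in range(i):  # naive scan over candidate blocks underneath blocks[i]
            lj, wj, hj = blocks[j]
            if li < lj and wi < wj:   # blocks[i] is strictly shorter and narrower
                best[i] = max(best[i], best[j] + hi)
        answer = max(answer, best[i])
    return answer

# Example: the second block can sit on the first, but the third fits on neither.
print(max_stack_height([(4, 3, 2), (3, 2, 5), (5, 1, 7)]))   # prints 7
```

If the n log n compatible-set trick mentioned above works out, the inner scan could in principle be replaced, but this naive version already matches the n squared bound from the first analysis.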

OK. So now we actually have a choice. We can either go through another DP example-- I do have another one. But Nancy, one of the lecturers, suggested that it seems many people had some trouble understanding yesterday's lecture on universal hashing and perfect hashing. So we can also consider going through that. And of course, the third option is to just call it a day. So, let me just take a poll. How many people prefer we go over the hashing stuff? How many people prefer another DP example? OK. Sorry guys. How many people just want to leave? It's fine. OK. Great. That's it.

OK. So, so much for DP. We do have another example. We will release it in the recitation notes. For those of you who are interested, you can take a look. I'm sure you all know that we haven't gotten into DP in the main lectures yet. So this is really just a warm-up to prepare you for the more advanced DP concepts. And also, DP will be covered in quiz 1. But the difficulty will be strictly easier than the examples we covered here. OK?

Now let's review universal and perfect hashing. It's not like I have a better way to teach it. Our advantage here is that we have fewer people, so you can ask the questions you have. Let me start with the motivating example. So why do we care about hashing? It's because we want to create a hash table of size, say, m-- it has m bins. And we will receive as input, say, k0, k1, all the way to k n minus 1: n keys. And we'll apply a hash function to each of them to map them to one of the bins. The hope is that if n is theta of m-- or, the other way around, m is theta of n-- then each bin should contain a constant number of keys.

To complete the picture, all the keys are drawn from a universe that has size u. And this u is usually pretty large-- let's say it's larger than m squared, the square of my hash table size. But let me first start with a negative result. If my hash function is deterministic, then there always exists a set of inputs that all map to the same bin. We call that the worst case. We don't like the worst case. Why? Because in that case, the hashing is not doing anything. We still have all of the items in the same list.

Why is that lemma true? By a very simple pigeonhole argument. Imagine I insert all of the keys in the universe into my hash table. I would never do that in practice; it's just a thought experiment. By a simple pigeonhole argument, if u is greater than m squared, then at least one bin will contain more than m elements. Well, if it just so happens that my inputs are keys from that bin, then my hash function will hash all of them to the same bin. Make sense? So this is the problem we're trying to solve. We don't want this worst case. And the lemma says that if h is deterministic, we cannot avoid it-- there always exists a worst case. So what's the solution? The solution is to randomize h.

However, I can't really make h behave randomly on each call. If my hash function maps a key into a certain bin, well, the next time I call this hash function, it had better give the same bin; otherwise I cannot find that item. So h needs to be deterministic. So now our only choice is to pick a random h. Make sense? Every hash function is deterministic, but we will pick a random one from a family of hash functions. In some sense, this is cheating. Why? Because all I'm saying is that I will not choose a hash function beforehand. I will wait for the user to insert inputs. If I have too many collisions, I'll choose another one; if I still have too many collisions, I'll choose yet another one.

OK. I think I forgot to mention one thing that's important. You may ask: why do I care about that worst case? What's the chance of it happening in practice? It's very low, but in algorithms, we really don't like making assumptions about inputs. Why? Imagine you're running, say, a website, a web server, and your code has some hash table in it. If your competitor, or someone who hates you, wants to put you out of business, and he knows your hash function, he can create a worst case input. That will make your website infinitely slow.

So what we are saying here is: I don't tell him what hash function I'll use. I choose one, and if he figures out the worst case input for it, I'm going to change my hash function and use another one. Make sense?

Now, the definition of a universal hash function family is that if I pick a random h from the family, the probability that any key k i maps to the same bin as any other key k j should be less than or equal to 1 over m, where m is my hash table size. This is really the best you can get. If the hash function is really evenly distributing things, you should get this property.
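
Written out, that property says (a cleaned-up statement of the definition, with H denoting the hash family):

```latex
\Pr_{h \in H}\big[\, h(k_i) = h(k_j) \,\big] \;\le\; \frac{1}{m}
\qquad \text{for every pair of distinct keys } k_i \ne k_j .
```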

So we have seen one universal hash function in the class. I'll just go over the other example, which is ak plus b modulo p, and then modulo m. Here p is a prime number that is greater than the universe size. We'll see why this is a universal hash function. To do that, we just need to analyze the collision probability. So if I have two keys, k1 and k2, that map to the same bin, they must have this property: after taking the mod p, their difference should be a multiple of m. Because if that is true, then after taking the modulo m, they will map to the same bin. Make sense?

Now I can rewrite it as a times the difference of the keys equals a multiple of m, mod p. Now, k1 and k2 are not equal, so their difference is nonzero. And in this group, based on some number theory, it has an inverse element. So, if this happens, we'll call it a bad a. How many bad a's do I have? One value of a makes this equation hold with i equals 1. Another a makes the equation hold with i equals 2. But how many such a's do I have at most? The right-hand side can be m, 2m, 3m, all the way to floor of p over m, times m. That is the total number of possible ways this equation can hold.

So how many bad a's do I have? I have floor of p over m of them, over the total number of a's, which is p minus 1. Oh, yeah, I forgot to mention that: a is from 1 to p minus 1. OK. And I can always choose my p to be not a multiple of m. If I do that, then floor of p over m is at most p minus 1 over m, so this ratio is less than or equal to 1 over m. So this is a universal hash function family.
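
Putting that counting argument together in one place (this just restates the steps above; u is the universe size and a ranges over 1 through p minus 1):

```latex
h_{a,b}(k) = \big( (ak + b) \bmod p \big) \bmod m , \qquad p \text{ prime},\; p > u .

h_{a,b}(k_1) = h_{a,b}(k_2) \;\Longrightarrow\; a\,(k_1 - k_2) \equiv i\,m \pmod{p}
\quad \text{for some } i \in \{1, \dots, \lfloor p/m \rfloor\} .

\text{Each } i \text{ determines exactly one bad } a \text{, since } (k_1 - k_2) \text{ is invertible mod } p \text{, so}

\Pr_a\big[\,\text{collision}\,\big] \;\le\; \frac{\lfloor p/m \rfloor}{p - 1}
\;\le\; \frac{(p-1)/m}{p - 1} \;=\; \frac{1}{m}
\qquad \text{when } m \nmid p .
```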

So what's the randomness here? The randomness is a. I'll pick an a to get one of my hash functions, and if it doesn't work, I pick another a.

AUDIENCE: What is b? What is b?

LING REN: p is a prime number I choose--

AUDIENCE: [INAUDIBLE]

LING REN: b?

AUDIENCE: Yeah.

LING REN: Oh, b. I think it's also a random number. Yeah, so, actually it's not needed, but I think there's some deep reason that they keep it in the hash function. I'm not sure why.
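
For concreteness, here is a small sketch of this family in Python (the helper name and the particular prime are illustrative choices, not from the recitation):

```python
import random

def make_universal_hash(m, p):
    """Draw one h(k) = ((a*k + b) mod p) mod m from the family.

    p must be a prime larger than every key in the universe; m is the table size.
    """
    a = random.randrange(1, p)   # the randomness: a is drawn from 1 .. p-1
    b = random.randrange(0, p)   # b is drawn from 0 .. p-1
    return lambda k: ((a * k + b) % p) % m

# Example: a table with m = 10 bins and keys below the prime 2**61 - 1.
h = make_universal_hash(10, 2**61 - 1)
print(h(42), h(1000003))
```

If a chosen function turns out to collide too much on the actual keys, we simply call make_universal_hash again to draw a fresh a and b.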

Now once we have that-- once we have universal hashing-- people also want perfect hashing, which means I want absolutely zero collisions. So how do I do that? Let me first give method 1. I'll just use any universal hash function, but I choose my m to be n squared. I claim this is a perfect hash function with a certain probability. Why? Let's calculate the probability of no collision. It's 1 minus the probability that I do have a collision, and I can bound that with a union bound: the probability that some pair collides, some pair with h of x equals h of y. How many pairs do I have?

AUDIENCE: n choose 2.

LING REN: Yeah, n choose 2, which is this number. So if it's a universal hash function, then for any pair, the probability that the two collide is 1 over m. And I choose my m to be n squared, so this probability of no collision is larger than 1/2. So what I'm saying is, to get a perfect hash function, I'll just use the simplest way: I select a universal hash function with m equals n squared. I have a probability of more than 1/2 to succeed. And if I don't succeed, I'll choose another one, until I succeed. So this is a randomized algorithm, and we can make it a Monte Carlo algorithm or a Las Vegas algorithm.

So I can either say: if I try alpha log n times, what's the chance that none of my choices gives perfect hashing? Each try has at least a 1/2 success rate, and I try that many times, so the chance of all of them failing is at most 1 over n raised to the alpha. Or I can say I'll keep trying until I succeed. Then I have a 100 percent success rate, but my runtime could potentially be unbounded. Make sense? OK. This sounds like a perfect solution. The only problem is that the space complexity of this method is n squared, because I chose my hash table size m to be n squared. That's the one thing we don't want in this simple method.
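
A minimal sketch of method 1, in the keep-retrying (Las Vegas) flavor, reusing the make_universal_hash helper sketched earlier:

```python
def build_quadratic_perfect_table(keys, p=2**61 - 1):
    """Method 1: a table of size m = n**2, retried until there are no collisions."""
    n = len(keys)
    m = n * n
    while True:                          # each attempt succeeds with probability > 1/2
        h = make_universal_hash(m, p)
        table = [None] * m
        ok = True
        for k in keys:
            slot = h(k)
            if table[slot] is not None:  # collision: discard this h and try another
                ok = False
                break
            table[slot] = k
        if ok:
            return h, table
```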

Our final goal is to have a perfect hash function that has space O of n, runtime some polynomial in n, and failure probability that is arbitrarily small. And the idea there is two-level hashing. I choose h1 first to hash my keys into bins. For each bin-- say I get l1 elements here, l2 elements here, and so on-- I build a second-level perfect hash table, and we can use method 1 to build these small ones. If I choose m1, the hash table size for this bin, to be l1 squared, then I know that after alpha log n trials, this one should be a perfect hash table. After another alpha log n trials, I should resolve all the conflicts in the second bin and make it a perfect hash table. Make sense?

So after n log n trials, I will resolve all the conflicts in my second level hashing. Question?

AUDIENCE: It was mentioned in the lecture that this only works if there are no inserts or deletes, or something like that?

LING REN: Let me think about that offline. I'm not sure about that. OK. So the only remaining problem is we need to figure out whether we achieve this space of O of n. What is the space complexity of this algorithm? It's n plus the sum of the l i squared, because each second-level table size is the square of the number of elements in it. And finally, we use the Markov inequality, or something like that, to prove that this space is O of n with probability greater than 1/2. So I can keep going: I'll try my first-level hash function up to alpha log n times, until my space is O of n. Once I get to that point, I'll try choosing universal hash functions for my smaller tables, until each of them succeeds.
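
And here is a sketch of the two-level construction just described, with the retry-until-the-space-bound-holds step included (the cutoff of 4n is an illustrative constant for the Markov argument, not a value from the recitation):

```python
def build_two_level_table(keys, p=2**61 - 1):
    """Two-level (FKS-style) perfect hashing: a first level with n bins, then a
    collision-free table of size l_i**2 inside each first-level bin."""
    n = len(keys)
    while True:
        h1 = make_universal_hash(n, p)
        bins = [[] for _ in range(n)]
        for k in keys:
            bins[h1(k)].append(k)
        # Retry the first level until the total second-level space, sum of l_i^2, is O(n).
        if sum(len(b) ** 2 for b in bins) <= 4 * n:
            break
    # Second level: method 1 (size l_i^2, retried until perfect) inside each nonempty bin.
    second = [build_quadratic_perfect_table(b, p) if b else None for b in bins]
    return h1, second
```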

OK? That's it for hashing and DP.
