6: Perceiving: Interpreting...
The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license, and MIT OpenCourseWare in general, is available at ocw.mit.edu.
PROFESSOR: Good afternoon. In response to public demand here-- let's see how many people do you-- again, the colors may have faded by now, but check it when you tilt your head. The effect changes. How many people still have some sort of an effect? It's really nice to know that this course has some lasting impact on people. Maybe we should try posting the PowerPoint on the web, because then if you want to build up your own McCollough effect you can do this and entertain yourselves.
If you have been following the story thus far, you should basically have the idea that at the front of the visual system, or of any sensory system, you've got all sorts of information coming in. That's the job that early processes in the visual system do. The sorts of things that the early parts of visual cortex do is say, oh look, there's a little line at this point in the visual field. Oh look, there's a dot there, and a line here, and it's moving like that. Little bits of information all over the place. Too much of it for you to handle.
So, last time I talked about this bottleneck of attention that allows only some of it through to processes that would do things like, say, recognition. And, then, I'm going to run out of places to draw. We'll take a little detour here. Somewhere up here, you're going to get perception. And the job of today's lecture is to convince you that that percept, your current understanding of the world, is always the result of many, many inferences. Many, many guesses of what the nature of the world might be. Because, not only is there too much information coming in, there's also too little information coming in to specify exactly what's going on out there in the world. Consider just a couple of the problems -- are they on the handout, I think? A couple of them are-- well, one of the obvious ones is that the world is 3D. Your job is to figure out what's going on out in the world.
The input is inherently 2D. That retina that you've got is a 2D surface. So, if you are seeing 3D, you are recovering that 3D information from essentially 2D input. You're collecting information about light intensities. You don't care about light intensity. You care about surface properties in the world. So, every patch that you're seeing -- if you're looking at this spot right here, what you're seeing is the product of the surface, the properties of the surface, and the properties of the illuminant, the properties of what's lighting it up. You don't care about the properties of the illuminant. You want to recover just the properties of the surface. So how do you successfully ignore the properties of illumination? We'll say a little bit about that.
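To make the "product of the surface and the illuminant" idea concrete, here is a minimal sketch in Python. It is not from the lecture: the function name `luminance` and all of the numbers are hypothetical, chosen only to show that very different surfaces can send identical light to the eye.

```python
# Illustrative sketch only: the function name and numbers are hypothetical.
# The light reaching the eye from a patch is treated as the product of
# the surface reflectance (0..1) and the strength of the illumination.

def luminance(reflectance, illumination):
    """Light returned by a surface patch."""
    return reflectance * illumination

# A dark surface in strong light and a lighter surface in weak light
# can produce exactly the same intensity at the retina.
dark_in_bright_light = luminance(reflectance=0.25, illumination=80.0)
light_in_dim_light = luminance(reflectance=0.5, illumination=40.0)

print(dark_in_bright_light, light_in_dim_light)
```

Both values come out the same, so intensity alone cannot say which surface is out there. That is the sense in which the input underdetermines the surface properties.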
The world you're looking at is an essentially stable world. I mean things move around in it, but the whole world doesn't jump around. But, you're looking at it from an inherently unstable vantage point. You're moving around. And more to the point, even if you're stock still, the way you're looking at the world is you're moving your eyes around. Try this for a moment. Look at the lower left hand corner of the screen. Well, actually that might be a little large, so look at the lower left hand corner of this McCollough test pattern. Look at the lower right hand corner of the McCollough test pattern. Did the McCollough effect test pattern jump when you did that? No. It didn't. Why didn't it jump? Because when you're looking here-- do I have a laser pointer today? No. Oh well -- when you're looking here, the bulk of that square is to the right of fixation. When you're looking here, the bulk of that image is to the left of fixation. So it's in two different spots on the retina. If I put something here, and something here, on your retina, it'll look like it's moving. Why didn't that look like it's moving? The reason is, that when you tell your eyes to move, you send a copy of that command, in effect, to visual centers of your brain saying, look, I just told my eyes to move, kindly ignore the resulting smear. In fact, I want you to do two things. I want you-- you, your visual system-- I want you to shut down during the course of the eye movement, and I want you to compensate for the fact that everything has been displaced.
You can see what would happen if that was not the case, by taking your finger, and poking your eye. On your eyelid. You should try this, because it's more interesting if you actually wiggle your eyeball. You can do it slowly, or you can do it quickly. Look at me, and poke your eyeball around. You will notice that all your friends look funny. But, you'll notice that things are jumping around. Why is that? Well look. Millions of years of evolution did not provide you with a mechanism that said, I am now going to poke my eye. Please send a copy of that signal to the visual centers of the brain, saying to cancel that out. There's no cancellation signal here. And so, you see the image moving around. You don't see the image moving around when you move your eyes normally, because you're compensating for it.
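The compensation story can be sketched as simple bookkeeping. This is a toy model under stated assumptions, not a description of the actual neural computation: positions are single numbers in degrees, and the names `perceived_position` and `command_copy` are hypothetical.

```python
# Toy model only: one-dimensional positions in degrees, hypothetical names.

def perceived_position(world_pos, actual_eye_pos, command_copy):
    """Where an object seems to be, given where the eye really points and
    what the visual system was told about the eye movement."""
    retinal_pos = world_pos - actual_eye_pos  # where the image lands relative to fixation
    return retinal_pos + command_copy         # add back the movement that was announced

object_at = 5

# Normal eye movement: the eye moves 10 degrees and a copy of the command is sent.
normal = perceived_position(object_at, actual_eye_pos=10, command_copy=10)

# Poking the eye: the eye is displaced 3 degrees but no copy is sent.
poked = perceived_position(object_at, actual_eye_pos=3, command_copy=0)

print(normal, poked)
```

In the normal case the object stays where it was. In the poked case it appears displaced by the amount the eye was pushed, because nothing cancels the shift on the retina.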
All right, so you're collecting all this information. You're doing your best to register it in a stable kind of a way. Oh look at that. It also says I'm going to demonstrate the vestibulo-ocular reflex. Let me do that, just because you might as well get to use your fingers some more. You also want the world not to jump around too much when you're moving your head. One of the things you do very reflexively is, if you rotate your head one way, your eyes counter-rotate the other. That's a very quick reflex. If you want to see how quick it is, try this. Hold your finger out in front of you. Look at your finger, and move your head back and forth, and just keep your eyes on the finger. No problem, right? You know you can do that. Now, at the same speed, move your finger, and try to keep your eye on the finger. Can't do it. It doesn't work, because that tracking movement -- there are more neurons involved. It's a slower process.
The vestibulo-ocular reflex is a very short latency kind of a reflex designed to keep the input relatively stable. So, you take all that lovely input in, you've got all these little bits of information all over the place, and then you've gotta make your best guess about what it is that you're looking at. The reason that this is an interesting picture, it's in the book, by the way, so if you can't see it here, you can go and study it in the book, until you can see the dalmatian dog that really is there. How many people can see it now? Oh, we got most of them. His head is right above my finger, front paws, back paws. He's on a road sloping from lower left to upper right. The point of a picture like this, is that it slows down the process of inference enough, that you can sort of feel it happen. Normally, when I look out at you, for instance, my visual system kicks up one, and only one, interpretation of what I'm looking at, so rapidly, that I never notice all the work that's involved.
The purpose of this picture, and really the purpose of this lecture, is to show some of the work that's involved. So what you've got here is a bunch of isolated little black and white regions. In order to figure out what's going on here, you've got to decide who goes together. How are we going to decide who goes together? Well let's think of this in the context of a bunch of little line segments. You can, sort of, decide that some of these guys might go with some of the other ones but what I'm going to do, is rotate all of them by 90 degrees, if I recall. No. Maybe some other orientation. But anyway, now they are the same lines on the screen, but now some of them hang together, right? Got that sort of potato shape thing there. Why? What makes these guys go along with each other now in a way that they didn't before? Well, if you're a little chunk of brain, whose job it is to figure out where contours are out in the world, what you're getting from earlier in the visual system, is word that there's a little bit of contour here, a little bit of a contour here, a little bit of a contour here. I wonder if those should go together?
Now how do we decide that the little bits of contour might go together? Well, you might do something like this. If you've got a line here, and you're asking what's the best bet about if this is really a piece of a continuing contour? Where's this likely to go next? Well, it might make a hairpin turn and go off that way. Doesn't seem really likely. More likely, it's going to go off in something like the direction that it's pointing. And that turns out to be the potato shaped ones. This is a very rule governed behavior. If you've got a bunch of little line segments, as long as the deviation here isn't more than about, as I recall, 30 degrees or so, from collinear, you're willing to string those together pretty happily. If it starts to be more than that, you're unlikely to string them together. That's the beginning of an effort to tie little pieces of information together into larger structures.
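The linking rule just described can be written down as a small check. This is a hedged sketch: the 30 degree limit is the lecturer's approximate recollection, and the function names and the representation of a path as a list of headings in degrees are assumptions made for the example.

```python
# Sketch of the "string segments together if they stay roughly collinear" rule.
# The 30 degree limit is the approximate figure quoted in the lecture;
# the function names and the list-of-headings representation are hypothetical.

def turn_angle(heading_a, heading_b):
    """Smallest absolute change in direction between two headings, in degrees."""
    diff = abs(heading_a - heading_b) % 360
    return min(diff, 360 - diff)

def groups_as_contour(headings, max_turn=30.0):
    """True if every step along the path turns by no more than max_turn degrees."""
    return all(turn_angle(a, b) <= max_turn
               for a, b in zip(headings, headings[1:]))

gentle_curve = [0, 20, 45, 60]   # each segment turns a little: the potato outline
hairpin = [0, 170]               # doubles back on itself

print(groups_as_contour(gentle_curve), groups_as_contour(hairpin))
```

The gentle curve passes and the hairpin fails, which matches the intuition that a contour most likely continues in roughly the direction it is pointing.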
All right, what do you see here? Two lines crossing each other. A reasonable interpretation. Though, it's not the only possible interpretation of this. I mean, it could be something like this. Two birds kissing each other, or something like that. Why do you see and, in fact, this is something like that. And you see that as an x with two lines crossing each other, but if I provide enough other details here, it's a sea lion? I don't know what it is. The point is, he's still a lousy artist. It hasn't gotten any better. The point is more or less the same. You've got this little process that's worrying about that little piece of line segment, gets to this junction, and is busy doing the "two roads diverged in a yellow wood" kind of thing, trying to decide which way to go. And, it's guessing that all else being equal, I should probably go with that. It sometimes goes by the name of good continuation. It's one of a variety of so-called grouping rules that were developed first by the Gestalt psychologists starting in the early part of the twentieth century, and they're really rules for figuring out how bits of the scene might hang together.
You can see a similar sort of process going on here. You can see that as three isolated line segments, but you probably don't. You see that as a curvy line that's occluded, right? If I suddenly reveal this, you don't go, ooh. Amazing! Kind of what I thought was there. That's also highly rule governed. If you've got a line segment and you've got another line segment, you're perfectly happy to see this one and this one as connected. Well how about this. This will do. If I put one up here. All the more if I erase the stuff in between. That, you're less likely to see as connected. Now why are you less likely to see that as connected? The rule turns out to be, that if I can connect two lines with a smooth curve, I'm in business. I'll be willing to see those as connected. But, if I have to put an inflection in it to make it work, then it doesn't look as convincing. I mean it's not that I deny the possibility this could ever connect with that, but if I give people a bunch of stimuli like this, and ask good connection or a bad connection? The good connections are the ones that can be done with a single smooth curve. That can't quite. I think you probably have to get an inflection in there somewhere. And if you have to inflect the curve, it fails. People report it as looking less convincingly continuous.
All right, it's voting time here. Here we've got a whole bunch of isolated guys, but they do seem to have something to do with each other. If you had to pick, this being organized by columns or rows, how many vote for columns? How many vote for rows? Ain't much to choose, right? But, if I do this, OK, how many vote for columns? How many vote for rows? So, we've now skewed it very heavily in the direction of columns. And, all that I've done is change the proximity of elements. Once the distance from one item to the next item in a vertical direction is closer than to the items in the horizontal direction, another one of these Gestalt grouping rules, proximity, takes over, and says, well, all else being equal, if I had to guess who goes with who, the guys that are close to each other, they probably go with each other. Multiple rules operate at the same time, so I'll keep the proximity rule working here. Now, if you have to vote, how many vote for columns? How many vote for rows? So now, I've skewed it very heavily in the direction of rows, even though the proximity rule is still going for columns. Those things are closer to each other in a vertical direction than horizontal. In this case, the similarity is trumping that. You can balance these off against each other. How similar does it need to be to compensate for a two to one difference in distance, for instance, or something like that? But the important point here is what you're trying to do. These are, you know, demonstration versions of presumably what you're doing all the time.
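The proximity rule on its own is simple enough to state as code. This is a minimal sketch with hypothetical names. It deliberately leaves out similarity, because the lecture does not say how the two rules trade off quantitatively.

```python
# Minimal sketch of the proximity rule alone. Names are hypothetical, and
# similarity is deliberately not modeled because no trade-off is given.

def proximity_grouping(horizontal_gap, vertical_gap):
    """Guess whether a grid of dots groups into columns or rows.

    Items that are closer together vertically link into columns;
    items that are closer together horizontally link into rows.
    """
    if vertical_gap < horizontal_gap:
        return "columns"
    if horizontal_gap < vertical_gap:
        return "rows"
    return "ambiguous"

print(proximity_grouping(horizontal_gap=1.0, vertical_gap=1.0))
print(proximity_grouping(horizontal_gap=2.0, vertical_gap=1.0))
```

Equal spacing gives the "ain't much to choose" case, and halving the vertical spacing tips the vote toward columns.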
I'm looking out there, and I'm seeing regions of redness. I see sort of disconnected regions of redness, but I'm guessing, they're all part of her top. It's your top there. It's you. No, not her. That's pink, you're wearing. The woman behind you. Well, the woman next to you is also red. But, in this case, you can see the role of proximity here. So, the similarity thing is telling me, all those red things are tied together, and I'm making it into sort of one piece of clothing. The proximity thing is saying, well I don't think her red top, and her red top are the same red top. That would be a very weird assumption. On the other hand, the guy she's sitting next to is also wearing red. So maybe they're just wearing one garment. No. I'm probably not going to come up with that assumption either. But what my visual system is doing, is continuously trying to cut the world up into meaningful chunks, that are going to be worth subsequent analysis. I don't want to go off and analyze every little pixel in this scene. I don't have the brain power to do that. I want to have meaningful chunks that are worth analyzing, so here I might decide that the meaningful chunks were the rows. They are a something or other. OK.
There's a couple of reasons why you need to do this grouping business over little elements in the world. One of them-- I was sort of cartooning over here-- is that early in the visual system, the chunks of the brain that are looking at bits of the world are only looking at very teeny bits, and you're going to have to tie those together. The other reason is that out in the world, contours don't behave well. They tend to do awkward things like disappear on you. And you don't want to get the idea that these are-- if I've got a contour like that, if for some reason bits of it are deleted, maybe because there's an occluder, or maybe because something just bad happened in the image, you don't want to lose this whole structure, because bits of it have been deleted. So you have a lot of clever mechanisms designed to help you find where the edges are of things out there in the scene, and to put the bits of edges together into a long coherent one. If you play with Photoshop, you can go and find the edges I think. Isn't there like a find edges filter, or something? So, try that sometime. Do find edges on an image of, say, a person, and you'll see that it comes up with a lot of edges that you recognize as being related to this person, but it's fragmented all over the place. And, in fact, getting your computer to figure out which bits go with each other is a tricky piece of work. It's tricky for you too. You just don't know that it's tricky, because it works all the time. All right. So, this beautifully boring stimulus is there, because you see this beautiful vertical contour, right?
Now, all I'm going to do, I'm going to leave it there, but I'm going to put a new background on it. Isn't that lovely? Why is that interesting? Well, among the reasons it is interesting, is because it's the same gray stuff that was there before. So, the top of that bar, and the bottom of that bar are still identical. All four of those little rectangles are all identical. Even though they no longer look identical. Because what the system is doing, is-- actually can I do this? I forget what I programmed. Oh, there we go. little key's going to turn into that one. Isn't that fun? Get back there. There we go. -- but my real point -- so what that is, is a simultaneous contrast effect, by the way. This square looks bright, because it's surrounded by darker stuff. That square looks dark, because it's surrounded by brighter stuff. And even though this bar is continuous with its gray level from bottom to top, it picks up its apparent brightness from the immediately surrounding contours. The interesting aspect of this from the point of view of understanding where edges are, is that if the bar is brighter than the background here, and darker than the background up there, there must be a place in the middle where it's gone. Where there is no contour. But you don't see that. I was too busy making this little thing move, and I forgot to do that. Oh well. It's not a small region. There's a fairly sizable region of that middle there where there is no physical contour. But you fill it in. You know, in some fashion, that contour is there. That's called a subjective contour, where you're completing a contour that doesn't have any real support in the image. That's the piece that your Photoshop filter will have a hard time doing, by the way.
Oh, we can also do a second order effect here, that's kind of -- well, no, we'll do that later. Oh. Come on. Go away. You've moved often enough. OK.
Let's continue the same point here. All right, everybody sees this rectangle, right? And what are those black things? Three quarter circles. Yes. The literalists figured out that these are Pacmen or three quarter circles, or something like that, but you didn't really see that when it came up. You said, oh that's a rectangle sitting on top of four circles of some variety, and, in fact, you still see it as a rectangle sitting on top of four circles. In fact, you're probably reasonably convinced that you can see the contour. That's weird. I'm reasonably convinced that there's an interesting artifact on the screen that's creating contours. I don't know what that's about. It looks better over there. It looks less bogus over there. You can probably see the whole rectangle. The white on white borders are not there. There simply is no physical contour there. There may be here, because the projector's doing something mutant. But, there's certainly no contour there, even though the rectangle at the center looks somewhat brighter. Again, what you're doing is, making a guess about what is it that actually created that image that's landing on my retina now? It could be a little conference of pac-people. Four little three quarter circles that got together to talk to each other. But that doesn't seem the most likely possibility. What seems more likely here, is that it's a white rectangle sitting on top of four black circles. And you end up completing that contour. Or this circle for that matter. What are you doing here? You don't need to have fancy computer graphics to do this.
One of the advantages of the material in this particular lecture is that it provides great material for doodling in other lectures. So, if you like subjective contours, you can make your own. And, so how's that look? Got a subjective contour there? It's not perfectly circular. That's OK. But what seems to happen is that you generate a hypothesis that says, the whole problem I got here, is contours don't just end in the world, they tend to continue. Well, if this guy's continuing, well, what happened here? Well. Maybe it ran into another edge, it's being hidden. All else being equal, it probably ran into an edge that's orthogonal to the direction it's going, let's guess that. And, if we guess a whole bunch of little orthogonal edges, we're back to that earlier demonstration with a bunch of little line segments. I can tie those little bits together. They make a kind of a circle thing. So I end up seeing that imaginary circle. And, in fact, I'm going to start filling in the contour all the way around. That suggests, by the way, that if I was to tilt all these lines a little bit off of straight radial, so that the virtual line segment was not forming a nice, neat circle, that the impression of a subjective circle would get weaker.
And you can decide whether or not that's true here. So, see this circle? Now, the question is, we'll give you three choices here. Stronger, weaker or about the same? How many vote that this one is stronger than the previous one? How many vote for just about the same? How many vote for weaker? Just about the same. That's a boring demo. I'll have to change that next year. Because, it didn't work well enough. OK.
Another example. If you're following along on the notes, we have now gotten to the so-called Craik-O'Brien-Cornsweet illusion, named after Craik, O'Brien and Cornsweet. We're continuing the edge business, but, now, what I want to do is tie that into that topic that I mentioned early in the lecture, about how it is that what you're interested in is the surface properties in the world, and you're not interested in lighting. Lighting is very boring. So, what do you see here? Some bold soul describe this complicated image. Boy, slow group. Gray thing. Two gray things. One is darker. Oh boy, I'm going to bring my pliers next time. This is like pulling teeth. Yes.
AUDIENCE: It's like the top of the pyramid, and there's shading on the other side.
PROFESSOR: Oh. Yeah. OK, so the light's coming from the left, or something. OK. Yeah. That's a complicated inference about these. OK. But what you don't particularly see, is this. If I take the edge out, if I take that edge away from the middle here, what you discover is the whole thing is the same gray. That's not obvious. If you were to draw the luminance profile, drag a photo detector across this thing, what you would get is something like. Which side is bright? OK. That side bright. So it rises, then drops across the edge, and then rises back like that. So, that if you take out the actual edge, it's equal on the two sides. So, that's kind of weird. Why does it look like, now, they didn't believe me. They thought I was doing something evil here. Here we go. Look. That's why I had this thing slowly sneak up, so you could, well, semi slowly, fast-ly sneak up. What's going on here is that the visual system knows something about edges and about lighting. Edges in the world tend to be fairly abrupt. Lighting changes, so it's bright here, and dimmer over here, lighting changes tend to be fairly gradual. So, if what I'm interested in is seeing what's on the surface, as opposed to seeing the product of boring light and shade variations, what I might want to do is to look for abrupt changes, and to, in effect, suppress gradual changes. And that's what's going on here. We might as well do an entertaining second order effect here. Remember that negative after image thing? You know, look at red, you see green, and stuff like that? What I should be able to do here, is produce a negative version of this effect, where you'll end up seeing this side is dark and this side is light, even though I'm not going to change anything over here, over here. Stare at that center line, right. Stare rigorously at the center line, and keep staring. And then when I do that, so, this is two illusions concentrates on top of that. Do it again, was that? All right.
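The luminance profile being drawn here can be approximated numerically. The following is a sketch under stated assumptions, not the stimulus used in the lecture: the exponential ramps, the parameter values, the choice of the left side as the bright side, and the function name `cornsweet_profile` are all hypothetical.

```python
import math

# Sketch of a Cornsweet-style luminance profile. The exponential ramp shape,
# the parameter values, and the function name are assumptions for
# illustration, not the stimulus shown in the lecture.

def cornsweet_profile(n=200, base=0.5, amplitude=0.1, decay=10.0):
    """Luminance along a horizontal scan: it rises gradually toward the
    central edge, drops abruptly across it, then rises gradually back."""
    mid = n // 2
    profile = []
    for i in range(n):
        distance = abs(i + 0.5 - mid)                 # distance from the edge
        bump = amplitude * math.exp(-distance / decay)
        profile.append(base + bump if i < mid else base - bump)
    return profile

profile = cornsweet_profile()

# Differences between neighboring samples: gradual ramps give small values,
# the edge gives one large value.
jumps = [abs(b - a) for a, b in zip(profile, profile[1:])]
edge_index = jumps.index(max(jumps))

print(abs(profile[0] - profile[-1]), edge_index, max(jumps))
```

Far from the edge the two sides have essentially the same luminance, and the only large sample-to-sample change sits at the center. A system that keeps abrupt changes and suppresses gradual ones would report a step there, which fits the description of one side looking brighter than the other.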
For the people incapable of following instructions the first time, try it again. So stare at the center and hold your fixation there. Actually, people who got it the first time, keep staring at the center, but the people who got it the first time can try something trickier, which is move your fixation a little bit, and you'll change the proportion that looks light or dark. Isn't that fun? You should understand why that works. If you don't, write yourself a little note on your paper saying I don't understand why that works and figure it out.
All right. So the important thing is that what you're trying to do here, is you're trying to get rid of information about the light. You don't care about the light levels. What you care about is what's going on in the world. Now Ted Adelson in the brain and cogs department here has exploited this fact brilliantly in a variety of gorgeous demos. One of which is this. It is extremely difficult, even if you've seen this before, to convince yourself that the gray levels of A and B are identical. Which they are. In fact, it's so difficult that even if I stick a bar across it that's clearly the same thing, it's still kind of hard to see that as identical. Your brain wants to do all sorts of things to deny that possibility. What's going on here? What's going on is that virtual cylinder is casting a virtual shadow, that you are busy discounting. You're saying, I don't care about the shadow. There's a checkerboard. I know about checkerboards. Checkerboards go dark square, light square. I can figure this out. That means B is a light square, because it's surrounded by dark squares, and A is a dark square because it's surrounded by light squares. And, it's a particularly lovely example of, among other things, this ability to get rid of information about the illumination. This isn't to say that what you do is somehow just run some sort of, throw away the light source information, and don't do anything with it. Get back there.
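Discounting the shadow can be sketched as a division. This is an illustration with made-up numbers and a hypothetical function name, not a measurement of the actual demo: two squares send the same light to the eye, but the one assumed to sit in shadow is credited with a lighter surface.

```python
# Illustration only: made-up numbers and a hypothetical function name.
# If luminance is reflectance times illumination, then a guess about the
# illumination turns a measured luminance into an inferred reflectance.

def inferred_reflectance(luminance, assumed_illumination):
    """Surface lightness implied by a luminance and an assumed light level."""
    return luminance / assumed_illumination

same_luminance = 30.0  # squares A and B send identical light to the eye

a_surface = inferred_reflectance(same_luminance, assumed_illumination=100.0)  # A: full light
b_surface = inferred_reflectance(same_luminance, assumed_illumination=50.0)   # B: in shadow

print(a_surface, b_surface)
```

With the shadow taken into account, B comes out as the lighter surface even though the input was the same for both squares.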
What's it say? Cow. There's only white and black on that screen. Look at the C and ask where that contour is coming from. That contour, particularly the outside of that C, has extremely little support in the image. What you are doing is making an inference. This time, you're using the shadow information. You don't want to see the shadow. How many people see it not only as cow, also as an embossed word cow, sticking out a little bit? The reason you're seeing the cow at all, is you're assuming that those black things are shadows. If those black things are shadows, it follows the light is coming from the upper left. If the light is coming from the upper left, well, we can go and figure out what shape object must have been producing those shadows, and the answer spells cow. But, if you were to take a look at the C, it is about the clearest. Well, I don't know, the O is pretty good too, and so is the W, for that matter. Anyway, look at any of those guys, and look at the shapes of the black bits, which is the only thing that stands out from the background, of course. There's nothing else there, except the black bits on a white background. None of those bits say C or O or W. It's a construction based on the assumption that the black bits are shadows. You use that sort of information all the time to do things like see faces.
These are so-called Mooney faces, named after a guy named Mooney. You can tell me a lot about the curvature of these faces, even though, again, there's nothing on the screen except for black regions and white regions. None of which are, themselves, particularly face shaped. I mean look at the eye. Are any of those eyes actually shaped -- look at the guy on the right-- I mean his eye, he's only got one apparently, that eye looks like, I don't know, a mutant bunny or something. If I just presented the eye piece on the guy on the right, that black glob that's defining his eye, if I just presented that in isolation, nobody would say, oh yeah, sure. That's an eye. You'd all be saying dalmatian dog before, mutant bunny this time, or something. And, it relies on these assumptions about shadow, is what you're doing here. It survives inversion reasonably well. But, those don't look like faces very much. What went wrong? You might think, I know that shadows aren't red and blue. But that's actually not the problem. Here they look pretty good, right. Those faces look OK, but these faces look lousy. Why do they look lousy? Somebody raise a hand, or something, yeah there's a theory.
AUDIENCE: The shadows are lighter than the [INAUDIBLE]
PROFESSOR: Yes, if you make the shadow regions lighter than the lit regions, the brain says, that's not a shadow. I don't care about shadows. I don't want to see shadows as entities in their own right particularly, most of the time. But I'll tell you one thing I know about shadows. Shadows are darker than the other stuff. If the shadows are lighter than the other stuff, it's not shadow. It's something else, and something weird. So this doesn't work. But that works. Oh, I suppose the fact that it said on this one, shadow must be darker, might have tipped some people off.
OK. Let's see here. I think what I will do -- this makes a very natural break point -- where it says Mooney face and we'll go on to this question about making the best guess you can make in the context of going from 2D to 3D. But before we go on to that, let's take our brief, stretch your limbs kind of break, here. OK. Let us gather back together here. By the way, it looks like it's getting to be that time in the term, where people are abusing their natural sleep mechanism that I will talk about later on in the term. But looking around at this crowd, I would say that you're not getting the seven to eight you need. Or, if you are, maybe you really need ten and you're catching the extra two here.
I've been talking about sort of little, almost like atomic small scale examples, of these sort of inferences that you make. And, now what I want to do, is sort of head for the larger picture of how you make an inference about the whole scene. I'm not going to get all the way there, and there are many realms I could talk about this in. So I'm going to talk about it in one restricted area, which is this question of going from a 2D image to 3D inferences about the world. Something that you do automatically, all the time. I want to explain a bit about how you do it.
You will see, on your handout, this very useless, blank region, that says one, two, three, four, five. I'm going to go through a series of depth cues. Probably more than five of them. And, that's what's supposed to go in there. They're all different sources of information that you use to go from the 2D world to the 3D world, many of them seen here in this lovely piece of Renaissance art that we will come back to. But, here's a much more boring piece of art. What do you see? Circle, square and a diamond. And, then there's the clever person, who's trying to figure out, I've described them as pac-man before, but these aren't. But, you see a circle, square, and a diamond. You don't have any serious difficulty inferring that -- triangle, sorry. You don't see this. For present purposes, the important point here is, you can also tell me their depth order. It's in some sense so obvious that you never think about it, but it's a very important source of information about depth order, that you get simply because you know that solid objects occlude each other. So you firmly believe that I am standing in front of the screen. I could have, well no, I couldn't have, it'd be theoretically possible that I had suddenly cut a cunningly wolf shaped hole in the screen and I'm A, very large, and B, standing over towards east campus somewhere. And, you're looking at me through this set of holes. No. You automatically leap to the assumption that if A looks like it's occluding B, A is in front of B.
Ah, my bunnies. I don't know what happened to them. They got kind of pixelated. But, all right, which bunnies are closer to you? The big bunnies are closer to you. Right. Why do you think the big bunnies are closer to you? Because you're making an inference that bunnies are roughly bunny sized. And, in the same way, I'm currently making the assumption that you guys are all more or less people sized. If I did not make that assumption, I would come to some very odd inferences about the current view that I'm looking at. So, the people in the front row -- there's a person in the front row -- her head is about two degrees of visual angle. Remember, there are 360 degrees around my head, and each degree is about the width of my thumb, so her head takes up about two degrees. And, let's see, there's this guy in the cheap seats back there; his head is only about half a degree. I could make the assumption that he's a pinhead. A guy with a real small head sitting out in the deep back out there. Well actually, I wouldn't make the assumption that he was sitting in the back. He's a pinhead sitting at the same distance as the large-headed woman here in the front. But that's dumb. Right? Your visual system knows that's dumb. Your visual system knows people are roughly people sized. Not exactly people sized, but roughly people sized, and if I see a bunch of small things there and a bunch of big things here, odds are that this is closer than that. And part of what's giving me my current inference, that I'm looking at a tilted plane of people in purple seats, is this information about size. If I organize the bunnies the way you guys are organized, I get a much clearer sensation of depth from this texture gradient. So, now you should be able to see a sort of tilted rabbit plane, right? Even though you know, objectively, it's just sitting flat on the screen, it looks tilted.
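To put rough numbers on the size cue, here is a minimal Python sketch of the geometry. It is an illustration, not something from the lecture: the 20 cm head width and the function name are assumptions, and the only claim is that a known physical size plus a measured visual angle pins down a distance.

```python
import math


def distance_from_visual_angle(assumed_size_m, visual_angle_deg):
    """Distance at which an object of the assumed size subtends the given angle."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return assumed_size_m / (2.0 * math.tan(half_angle))


# Assumption for illustration: heads are roughly 20 cm across.
HEAD_SIZE_M = 0.2

# Visual angles quoted in the lecture: about 2 degrees and about half a degree.
front_row_m = distance_from_visual_angle(HEAD_SIZE_M, 2.0)
back_row_m = distance_from_visual_angle(HEAD_SIZE_M, 0.5)

print(f"front row: about {front_row_m:.1f} m away")
print(f"back row:  about {back_row_m:.1f} m away")
print(f"back row is about {back_row_m / front_row_m:.1f} times farther")
```

If heads are all about the same size, a head that takes up a quarter of the visual angle comes out about four times as far away, which is the "people are roughly people sized" inference rather than the pinhead one.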
Is that a hand up there? That was a hand. Sorry I missed that. My previous image. We can do that.
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yeah. That's another possibility. It could be that. And, in fact, it is just a flat image. And you actually sound like you're getting a sort of hybrid of a tilted plane with bunnies of a range of sizes. Actually, we can see that combination here. So, we've got a whole bunch of big bunnies, and two little bunnies. Which is the smallest bunny in this display? The bottom right bunny, right? These guys are identical in size. It's a very minimal display. There's a much more vivid version of this illusion in the book, as I recall. It's a very minimal version of the illusion. Because you assume, if you are -- All right. So I still can't draw. -- If I'm looking at two bunnies, if I'm looking at a ground plane, closer is also lower in the visual field. So, you make an automatic assumption, and that was what was giving this woman that notion of a fairly tilted plane in the first bunny example. You make the assumption that the bottom of the image is closer to you than the top of the image. Well, if the bottom of the image -- let's go back here -- if the bottom of the image is closer than the top, and these two bunny images are the same size, then if this guy's closer, it must be really small.
Suppose I took the guy from the back row here. His whole upper body fills sort of the top joint of my thumb in visual angle terms. If I moved that image to the front, and sat him in a seat here, he would be about the size of this woman's upper arms. And, I would think, that's a really small guy. She could wear him on her --instead of wearing your heart on your sleeve, you could wear the whole guy on your sleeve. So, that's what's going on here. You're making the assumption that small bunny one is closer than small bunny two. They're the same image size. You therefore infer that out in the world, this must be a really small bunny, and that one is just a reasonably small bunny. Now the bunny part turns out to be not that critical.
Here you can also see a nice plane going off into the distance with objects that are clearly not meaningful. Right. Big stuff, front and low. Small stuff, high in the image. And you simply infer this is close, and stuff up there is far, and you get a nice impression of a tilted plane. This next one is one of the cues that is interestingly variable depending on where you're from. The atmosphere scatters light, particularly water in the atmosphere scatters light; that's why the sky is blue. The result is that objects that are far away tend to be both hazier and bluer. Something that you can see in all sorts of works of art. Go to the museum, and you can see artists take advantage of this right, left and center. And you can probably get it even in my pathetically reduced version of it. You probably get a sensation of depth here that you don't get here. The geographic aspect of it -- anybody here from Arizona? Doesn't work well in Arizona. I know this because I went to a meeting in Tucson and it was boring, so I went out for a walk, and I saw this hill, and I said to the guy who was at the street corner with me, how long would it take me to walk to that hill? And he said, three days, four days? It's like 50 miles away and it's like 6000 feet high. But it was extremely crisp. And there's no water in Arizona. I don't know why people live there. It's, like, hot all the time. And there's no water. But anyway, aerial perspective cues don't work, so not only are you thirsty, but you're short one depth cue. Anyway, this works much better in a humid setting than in a non-humid setting, but the point is that, again, this is your visual system making use of the physics of the situation in order to infer something about the depth of the situation. You also know about the geometry of the world. So, this is an extremely limited picture. If I say this is a highway going off to infinity somewhere in Arizona, or something like that, that's not a great picture. But you can believe that.
Because you know implicitly that parallel lines in the world, if they're in depth, will look like they are converging towards a vanishing point somewhere. Now it is sometimes claimed that this is known as linear perspective, that linear perspective was discovered by artists during the renaissance. That's only sort of true. What happened in the renaissance, was that they made this knowledge explicit, and became able to use it for instance to make their art works.
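The convergence of parallel lines falls out of simple projection geometry. The Python sketch below assumes an idealized pinhole eye; the rail spacing, eye height, and function names are hypothetical values picked for illustration.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole projection of a 3D point onto the image plane.

    x is left/right, y is up/down, z is distance straight ahead of the eye.
    """
    return (focal_length * x / z, focal_length * y / z)


def rail_gap_in_image(z, half_width=1.0, eye_height=1.5):
    """Image-plane gap between two parallel ground rails at distance z.

    Returns (gap, image_y). The rails sit eye_height below the eye.
    """
    left = project(-half_width, -eye_height, z)
    right = project(half_width, -eye_height, z)
    return right[0] - left[0], left[1]


for z in (2, 10, 50, 1000):
    gap, image_y = rail_gap_in_image(z)
    print(f"{z:>5} m ahead: rail gap {gap:.3f}, image height {image_y:.3f}")
```

The rails stay the same distance apart in the world, but their separation in the image shrinks as 1/z and their image height climbs toward zero, the horizon. The same arithmetic is why, on a ground plane, closer things sit lower in the visual field.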
But your cat and your lizard and stuff know about linear perspective. They just know it implicitly. The same way they know about aerial perspective and size cues and occlusion cues, and things like that. It wasn't that we woke up one day in Renaissance Italy and suddenly we could use this depth cue. What we figured out was how to paint with this depth cue. And you can do all sorts of amusing things with the depth cue. So, for instance: these lines look more or less the same size, right? Let me see if I can change that here.
All right. Even though we're using crude materials, let's do a forced choice vote here. You know they're the same size, so it's boring to ask if they're the same size. But if you had to vote bigger or smaller, how many people would vote that the bottom line now looks bigger? How many would vote that the bottom line now looks smaller? Well, I guess that worked, cheap chalk and all. This is an illusion known as the Ponzo illusion. There are a number of ways to account for it, but one of the intuitively appealing ones, at least, is to say that what this is doing, even though you're not particularly seeing it as a depth cue, is telling the chunks of your brain that are trying to figure out 3D: I see these two converging lines. If they're parallel lines in the world, they must be going off into depth. If they're going off into depth, then this thing is further away than this one. Well, if this one is further away, and they are the same size on my retina, this must be bigger. If that's not intuitively obvious to you, think about this as train tracks. So here are train tracks going off into the distance, and ask yourself, which maiden here, tied to the tracks, is in more distress? It's obvious that this must be the bigger person if we interpret this as train tracks going off into the distance. Even though we know that they're essentially the same size. The results of the inference are then influencing what you see. They are influencing other inferences that you make about the image.
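That chain of reasoning (same retinal size, different inferred distance, therefore different inferred size) can be written down directly. This Python sketch follows the depth-cue account described above, which is only one of the accounts of the illusion; the 5 degree angle and the two distances are made-up values for illustration.

```python
import math


def inferred_size(visual_angle_deg, inferred_distance_m):
    """Physical size implied by a visual angle at an inferred distance."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return 2.0 * inferred_distance_m * math.tan(half_angle)


# Hypothetical numbers: both lines subtend the same 5 degrees on the retina,
# but the converging "tracks" suggest one of them is twice as far away.
SAME_ANGLE_DEG = 5.0
near_line_m = inferred_size(SAME_ANGLE_DEG, 10.0)
far_line_m = inferred_size(SAME_ANGLE_DEG, 20.0)

print(f"line judged near: {near_line_m:.2f} m")
print(f"line judged far:  {far_line_m:.2f} m")
```

With the retinal angle held fixed, the inferred size scales with the inferred distance, so the line taken to be farther away comes out bigger.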
Now you can exploit these rules of linear perspective in a much more elaborate fashion than that. And the great master of that game is Escher. Here is one of Escher's pictures. What you want to do, again, and I think the image looks rather sharper up on these side guys, is ask yourself where the vanishing point is. So, look at the first floor of that structure. It's pretty clear that those railings are converging to a vanishing point off to the right somewhere. Well, now look at the top floor. That's converging to a vanishing point off to the left somewhere. And then when you try to put the whole thing together, you've got a structure that doesn't quite hang together right. This tells you a couple of things. Thing one, it tells you Escher was a very clever draftsman. Thing two, it tells you that you do a lot of these calculations about perspective very locally. What you do is, you're attending, let's say, to the lower floor there, and you say, yeah, this all adds up. It makes sense. And then you direct your attention to the upper floor. And it all adds up. It makes sense. It's only when you try to combine all of that across the whole image that you realize that the whole image somehow doesn't quite make sense. And that's how Escher can do things like have staircases that always go up, and waterfalls that fall apparently in an infinite loop, and things like that. Grab yourself your favorite Escher website or your favorite Escher book, and you can watch him manipulating these depth cues endlessly. It's great entertainment.
Now what I want to do is to bring together the themes of the lectures to this point in a single demo. At the moment, that doesn't look like much of nothing, except that, well, I don't know, what does it look like? Cubes. All right. That's interesting, because I don't see no cubes! Where's that inference coming from? What you really have is a bunch of Y junctions. And you know about Y junctions. Those are probably corners. And they look like they might be kind of cube-y corners. But what I'm going to do is rotate each of those. Now, this is a cool stimulus for a variety of reasons. First of all, now you're really seeing a cube. Right? No problem. Second of all, there are two cubes. There's the cube with its face pointing. Let's try this. There's that face, pointing down and to the right. There's that face, pointing up and to the left. So, you've got two cubes. This is an ambiguous, bistable figure. It's known as a Necker cube. We can quickly draw one of those. Endless fun for doodling, again. You can make yourself ambiguous figures instantly. That figure by itself is known as the Necker cube, after a guy named Necker. So, you're inferring two cubes. You're inferring one of two cubes. The cube isn't particularly there. The black stuff -- all you're seeing is the vertices of this cube -- but you're still managing to infer the rest of the cube. You can probably see the lines of the cube in the black region, right? In fact, you can see the intersection, if you go straight up from here, you can see the intersection of two lines that aren't there. I can make those lines go away. Now take a look at that cube, and imagine that what you're looking at is a cube -- sort of a wire frame cube -- that's behind a sheet of sort of black Swiss cheese. You're looking at it through holes. Can you get it to go back there? If you get it to go back there, and you can hold it back there, you've probably noticed that the subjective contours pretty much disappear on you.
Why is that? Well, if it's behind, there's no reason that you should be seeing those contours. They would be invisible. And so the invisible contours become invisible. Now if you bring the cube back in front in your perception, you'll see, oh yeah, now that cube's floating in front. I ought to be able to see the whole cube. And now I can see the subjective contours. So you can make the subjective contours contingent on which particular interpretation you care to give to the image. So, I think this illustrates very nicely the notion that what you are seeing is your current hypothesis about what might be generating the image on the screen.
Another thing that it points out is that you are only willing to entertain a limited set of hypotheses. It is extremely hard to look at this and see it as however many it is. Eight little disks with chicken feed in them. With little Y's in them of some sort. It's very, very hard to see that. Even though that is a hypothesis perfectly consistent with the image. You are out there trying to make a guess about the world. And you're not willing to entertain all of them. At least, most of them are immediately relegated to the realm of the very unlikely. And normally, out in the world, what happens is that a single hypothesis immediately pops to the fore, and you accept it. In weird situations like this, you can entertain a few of them. You immediately narrow down the realm of possibilities to a few, not to an infinity. Even though there's an infinite number of possible ways to generate this thing.
Shadows, I've already shown, are a depth cue. The reason for putting this nice piece of Renaissance art up there is to point out that, while shadows are a depth cue, you're not actually terribly picky about the physics of the situation. At least not the global physics of the situation. So where's the sun here? This is an outdoor scene. Where's the sun coming from? Well, if you look at the people in the lower left -- the people on the ground plane there -- it's pretty clear that the sun must be down and to the left somewhere. Right? Well, look at the shadow underneath that portico. Well, the sun must be up and to the right there somewhere. There's no consistent source of illumination in this image, but you're perfectly willing to use the shadow information to give you depth information. It's giving it to you locally. The fact that it doesn't add up globally doesn't bother you. And it doesn't even bother you to the extent that it bothers you in the Escher picture, where the building was actually impossible. Here, you don't recognize the impossibility at all, unless it's pointed out to you directly. And, finally, I should add to the list three more that are rather hard to demonstrate. That are hard to put up just as PowerPoint slides.
When people think about depth perception, if they think about depth perception, it's stereopsis, binocular vision, that they typically think of. Your two eyes are in two different places in your head, for most of us. If you blink from eye to eye, or just cover one eye after the other, and it's actually better if you hold one finger out in front of you and blink from eye to eye, you'll see the image in the two eyes is not the same. The difference in those two images is highly geometrically regular. And you make use of that regularity as a depth cue. It's called binocular disparity. Very useful depth cue. It's what's giving you the Magic Eye demos that you get. Those posters that, if you cross your eyes just right, jump out in depth. Those still around? It's a very useful depth cue. It's a little on the overrated side, because it's a lot of fun to study. People who think that binocular vision is the be-all and end-all depth cue should cover one eye, and ask if the world suddenly looks very flat. It looks a little flatter, but I can still perfectly well tell who's in front of who. On the other hand, if you want to see what stereo is doing for you, on a beautiful day like today, go outside, lie under a tree, close one eye, and look up into the branches, and try to figure out which twigs are in front of other twigs. You'll have a very hard time doing it. Open the eye, and the whole thing will jump out at you in depth. That's the sort of information that stereo is giving you.
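Here is a rough Python sketch of why that difference between the two eyes' images is "geometrically regular". It assumes points straight ahead of the viewer and an interocular distance of about 6.3 cm; that value, the twig distances, and the sign convention are assumptions for illustration, not figures from the lecture.

```python
import math

# Assumption for illustration: eyes about 6.3 cm apart.
INTEROCULAR_M = 0.063


def vergence_angle_deg(distance_m, interocular_m=INTEROCULAR_M):
    """Angle between the two lines of sight to a point straight ahead."""
    return math.degrees(2.0 * math.atan(interocular_m / (2.0 * distance_m)))


def disparity_deg(fixated_m, other_m):
    """Disparity of a second point while fixating the first.

    Sign convention chosen here: positive means the other point is nearer.
    """
    return vergence_angle_deg(other_m) - vergence_angle_deg(fixated_m)


# Hypothetical twigs: one fixated at 5.0 m, neighbors a bit nearer and farther.
for other in (4.8, 5.0, 5.2):
    arcmin = disparity_deg(5.0, other) * 60.0
    print(f"twig at {other} m: disparity {arcmin:+.2f} arcmin")
```

The sign of the disparity says which twig is in front and its magnitude grows with the depth gap, which is the kind of ordering information that is hard to get from one eye when the twigs overlap in a cluttered tangle.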
You don't have to do it with stereo. Motion parallax, to jump to the bottom of that list, is a similar sort of geometric cue. If I'm here, and then I'm here, the image changes in a geometrically regular way. You guys are sliding around on my retina in such a way that these guys are moving more than you guys out in the back. And I know that, again implicitly, like I know linear perspective, and I can use that to infer depth. Try this under the tree, and you can see the same thing. Close one eye, look up into the branches. The branches look relatively flat. Now rather than opening this eye, just move your head back and forth, and the tree will jump out at you in depth. Actually quite striking; you should try this sometime. I don't know if anybody ever does take me up on this suggestion. So if you actually try it, let me know so that I know that somebody actually tried it.
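The regularity in motion parallax can be sketched the same way. In this Python example the half-meter sidestep and the row distances are hypothetical numbers; the point is only that the same head movement shifts near things through a bigger angle than far things.

```python
import math


def angular_shift_deg(head_shift_m, distance_m):
    """Change in direction to a point that started straight ahead,
    after the observer steps sideways by head_shift_m."""
    return math.degrees(math.atan2(head_shift_m, distance_m))


# Hypothetical lecture hall: front row 3 m away, back row 20 m away.
STEP_M = 0.5
front_shift = angular_shift_deg(STEP_M, 3.0)
back_shift = angular_shift_deg(STEP_M, 20.0)

print(f"front row slides about {front_shift:.1f} degrees")
print(f"back row slides about {back_shift:.1f} degrees")
```

Reading that relationship backwards, from how far something slides to how far away it must be, is the inference the lecture describes.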
Oh, and vergence is another one of these geometrical cues. Hold your finger out in front of you. Look at your finger. Now move the finger towards you, holding it as a single finger as long as you can. So I can watch you go cross-eyed. What you are doing is converging your eyes. And if you could move your finger further out, you would be diverging your eyes. Well, what you can think of that as doing is taking, like, a pair of sticks and pointing them at the object. It's your visual axes, in a sense, that you're pointing at the object there. And the angle formed by those sticks is narrower if you're looking far away than it is if you're looking close up. And you have a fairly impoverished ability to use that as a depth cue too. If you were a chameleon, you'd be much better at this. Chameleons have eyes that move independently and are very sensitive to the angle that their eyes are pointing. And, in fact, you've seen the Animal Planet kind of videos where the chameleon's tongue goes out the length of its body, and it grabs a fly or something like that. How does it know where the fly is? It knows by measuring the angle of its eyes. How do we know that? Well, we know that because somebody went and put glasses on a chameleon that diverged the eyes. So, in order to point its eyes at the fly, it had the angle wrong, basically. So, you put a fly on a popsicle stick, you put the glasses on the chameleon, and then you film the chameleon's tongue, and the chameleon keeps missing.
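A small Python sketch of the "pair of sticks" geometry, and of how shifting the angle throws off the distance estimate. Everything numeric here is hypothetical: the 1 cm eye separation, the 10 cm fly distance, and the 2 degree offset standing in for the glasses were picked for illustration and do not come from the actual experiment.

```python
import math


def vergence_angle_deg(distance_m, eye_separation_m):
    """Angle between the two visual axes when both point at the target."""
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * distance_m)))


def distance_from_vergence(angle_deg, eye_separation_m):
    """Invert the geometry: the distance implied by a vergence angle."""
    return eye_separation_m / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


# Hypothetical chameleon and fly.
EYE_SEPARATION_M = 0.01
TRUE_DISTANCE_M = 0.10
true_angle = vergence_angle_deg(TRUE_DISTANCE_M, EYE_SEPARATION_M)

# Hypothetical effect of the glasses: the eyes end up converged
# 2 degrees less than the target really requires.
GLASSES_OFFSET_DEG = -2.0
judged_distance = distance_from_vergence(
    true_angle + GLASSES_OFFSET_DEG, EYE_SEPARATION_M
)

print(f"angle needed without glasses: {true_angle:.2f} degrees")
print(f"distance judged with glasses: {judged_distance:.3f} m")
```

Under these made-up numbers the narrower angle reads as a farther target, so a tongue aimed by eye angle alone would miss the fly.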
If I put those prisms on your eyes, you will adapt, and you will eventually be able to catch the fly again. Should you be so inclined. The chameleon turns out to be a less adaptable creature than you, and will not adapt. You can do the same game with chickens. Put a pair of prisms on your eyes to divert everything off, say, fifteen degrees to the left, and then if I tell you to pick this up, you'll reach fifteen degrees in the wrong direction. But eventually you'll learn. Put the prisms on a chicken. Put some grain down. Here are the grains here. The chicken sees it over there. Chicken's pecking over there. Chicken's not getting any grain. Chicken will do that forever and apparently not learn to get it right.
Oh, good. I left myself enough time to talk about inferences in a more global sense, of combining information from across the senses. So, you've got this job to try to figure out what's going on. I've been talking about doing that specifically with the visual system. But you're collecting information from multiple sources, and taking whatever the best information is. So if I show you a movie, you'll get captured by the visual information, and you're perfectly happy to hear the words coming out of the mouth of the guy on the screen, even though it's coming out of some speaker on the side. And it doesn't matter if they've got fancy Dolby stereo or something like that. Use a cheap, simple speaker sitting off to the side and you'll still hear it as coming out of the guy's mouth if you're watching it. So you're combining information from multiple senses. The particular example I thought I would discuss with you is the ever pleasant example of motion sickness. So, when I first came to graduate school, my lab was doing research for NASA on the effects of looking at large fields that were rotating. Sort of Omni theater stuff. If you look at a whole field that's rotating counterclockwise around your line of sight, you feel like you're rotating clockwise. Meantime, guys up at Brandeis, who we were collaborating with, were doing the same sort of thing, but their dependent measure was how long it took you to throw up. Fortunately, that was not my introduction to graduate school. But it will make you sick. I might as well collect some data here. How many people have ever been motion sick here? OK. What made you sick? Motion. Yes. Thank you. Could we get a little more specific, while keeping within the realm of good taste here? Oh. Reading on a bumpy bus. That's one good example. Anybody got another good one?
We'll take one more here. Oh, spinning around in a circle. Did you get sick while you were spinning around in a circle, or after you stopped? After you stopped. And then when you stopped, you felt like you were spinning around in the other direction. So when you're spinning yourself around in the circle -- you have, in your inner ears, these tubes filled with fluid -- and quick drawing here, quick bit of vestibular physiology -- need a nice fat piece of chalk -- oh, it's yellow -- you've got these tubes inside your ears that are filled with a fluid, and inside a little space in there are these hairs. If you bend the hairs, it sends a signal off to your brain. That's the transducer, the equivalent of the photoreceptors in the eyes. You can imagine taking a bucket and starting to move: the fluid stays behind for a little while, and sloshes around. That's why it's hard to carry buckets of liquid around if they are too full. So if you rotate your head, the fluid tends to stay put. And it moves over the hairs, bending them and telling you that you moved your head. That's what's telling you about this sort of motion. Well, you spin around for a while, and the fluid in here eventually catches up with you and starts moving. Then you stop. And the fluid keeps going, and you say, oh no. The other way you can do this, by the way -- so it's a very carefully calibrated system -- turns out that alcohol is lighter than this fluid. If you drink, the reason you get dizzy is not because you pickled your brain, which is also true, but because you've diluted this fluid with alcohol and this stuff is uncalibrated. And now you move your head a little, and the brain says, oh, man. We just moved a lot. Don't move that head. What she was pointing out is that what turns out to be the great stimulus for making yourself motion sick is a mismatch between the information that this vestibular system in particular, this balance system, and your eyes are giving you.
So the example given here of reading on the bus is a marvelous example. Because what you're doing, in order to read, you're holding this thing so it's a relatively stable visual stimulus. Nothing's happening here. But your vestibular system is saying [? bumbidingdibingdaeda-- ?] lots of stuff is happening. And your digestive system is busy saying [? Blech. ?]
Same thing happens on a plane, right? The problem on a plane is that when the plane bounces up and down, I mean short of catastrophic bouncing up and down, when a plane hits turbulence and it's bouncing up and down, what do you see? Nothing. Right. There's nothing happening visually. What do you feel? You feel [? bompitdumitbomp. ?] and you know, [? bloop. ?]
And the interesting question is, why do you get sick? It's fine to say that the mismatch of visual and vestibular information turns out to be nauseating. But why should it make you sick? The answer is, it's another inference. So, the next question then is, what's the inference? What are you guessing? You're guessing that the airline food was lousy. Yeah. Yeah, you're guessing that you were poisoned. Why are you guessing that you were poisoned? It's the mismatch that's critical. If I just make your vision strange, I could show you weird stuff you hadn't seen before, and you wouldn't throw up. And I can also bounce you around, and until I do fairly dramatic stuff, you won't throw up.
But if I mismatch what your visual system is telling you about your body and what your vestibular system is telling you, what this is, is a protection of sorts against neurotoxins. How do poisons work? Well, some of them work by attacking your nervous system. A lot of them work by attacking your nervous system. How are you, the owner of the nervous system, going to know? You're going to think, I can't integrate anymore? I can no longer remember that I love my mother? The sorts of things that immediately present themselves to you as you are being poisoned are that your senses are coming unglued from each other. And so if your eyes are saying bong-de-bong bong, and your vestibular system is saying something else, your body infers that you have been poisoned, and a useful idea, if you've been poisoned, is to get rid of whatever you just ate. So, why does this happen on airplanes and stuff like that? You weren't built to be flying around at 30,000 feet, bouncing around in the clouds. It just wasn't what nature set you up to do, and so you get the unfortunate situation where, even though you haven't been poisoned, you get sick. OK. Enough of that cheery topic.