Lecture 5 Summary


Floating-point arithmetic, continued. The key point is that the nearest floating-point number to x, denoted fl(x), has the property that |fl(x) − x| ≤ ε_machine|x|, where ε_machine is the relative "machine precision" (about 10⁻¹⁶ for double precision). Moreover, the IEEE standard guarantees that the result of x ♦ y, where ♦ is addition, subtraction, multiplication, or division, is equivalent to computing fl(x ♦ y), i.e. computing it in infinite precision and then rounding (this is called "exact rounding" or "correct rounding").
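As a quick numerical check of correct rounding (a sketch of my own, not from the lecture; Python floats are IEEE double precision, and the fractions module gives exact rational arithmetic):

```python
import sys
from fractions import Fraction

# Spacing of doubles near 1 is 2**-52 ~ 2.2e-16; the worst-case relative
# rounding error ε_machine is half that, 2**-53 ~ 1.1e-16.
print(sys.float_info.epsilon)        # 2.220446049250313e-16

x, y = 0.1, 0.7
exact = Fraction(x) + Fraction(y)    # exact rational sum of the two doubles
# Correct rounding: the hardware result x + y equals the exact sum
# rounded to the nearest double (float(Fraction) rounds correctly).
assert x + y == float(exact)
```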

Briefly discussed some myths about floating point (from the Kahan handout, last time), especially the pernicious myth that floating-point arithmetic is somehow a little bit "random", and the related myth that integer arithmetic is more accurate. Discussed decimal versus binary floating point, and contrasted floating point with fixed-point arithmetic.
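To illustrate the binary-versus-decimal point (my example, not from the lecture): the double stored for the literal 0.1 is a nearby binary fraction, but nothing about it is random; every double is an exact rational number and every operation is deterministic.

```python
from decimal import Decimal

# The double nearest to 0.1 is an exact binary fraction, printed here in
# its full decimal expansion; it is close to, but not equal to, 1/10.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The rounding is deterministic, not random: the same inputs always
# produce bit-identical results.
print(repr(0.1 + 0.2))   # 0.30000000000000004, every time
```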

Gave the obvious definition of accuracy, known more technically as "forwards stability": almost the right answer for the right input. Showed that this requirement is often too strong; e.g. adding a sequence of numbers is not forwards stable (see the sketch below).
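A minimal sketch of how forwards stability fails for summation (illustrative numbers of my own choosing): when cancellation makes the true sum small, the relative forward error can be enormous even though each individual addition is correctly rounded.

```python
xs = [1e16, 1.0, -1e16]

# Naive left-to-right summation: 1e16 + 1.0 rounds back to 1e16
# (the spacing of doubles near 1e16 is 2.0), so the 1.0 is lost.
s = 0.0
for x in xs:
    s += x
print(s)   # 0.0, but the true sum is 1.0: 100% relative error
```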

More generally, we apply a weaker condition: "stability" = almost the right answer for almost the right input. (Gave the technical version of this, from the book; sketched below.) Forwards stability implies stability, but not conversely.
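For reference, here is a sketch of that technical definition in the notation of Trefethen & Bau (assuming that is "the book" meant here), where f is the exact problem and f̃ is the algorithm implemented in floating-point arithmetic:

```latex
% Stability: for each input x, the computed result is (nearly) the exact
% result for some nearby input.
\[
  \frac{\|\tilde f(x) - f(\tilde x)\|}{\|f(\tilde x)\|}
    = O(\varepsilon_{\mathrm{machine}})
  \quad \text{for some } \tilde x \text{ with }
  \frac{\|\tilde x - x\|}{\|x\|} = O(\varepsilon_{\mathrm{machine}}).
\]
% Backwards stability strengthens the first condition to exact equality:
%   \tilde f(x) = f(\tilde x) for some such \tilde x,
% i.e. exactly the right answer for almost the right input.
```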

Often, it is sufficient to prove "backwards stability" = right answer for almost the right input. Showed that, in our example of adding a sequence of numbers, backwards stability seems to work where forwards stability failed. (Will give a rigorous proof next time.)
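A numerical sketch of that claim (again my own example, not the lecture's proof; the fractions module supplies exact rational arithmetic for measuring the errors): the smallest uniform relative perturbation of the inputs that reproduces the computed sum, |s̃ − s| / Σ|xᵢ|, stays near ε_machine even when the forward error |s̃ − s| / |s| is huge.

```python
from fractions import Fraction

def naive_sum(xs):
    s = 0.0
    for x in xs:
        s += x            # one rounding error per addition
    return s

xs = [1e16, 1.0, -1e16, 0.5]
s_float = naive_sum(xs)                  # computed in doubles: 0.5
s_exact = sum(Fraction(x) for x in xs)   # exact rational sum: 3/2

err = abs(Fraction(s_float) - s_exact)
fwd = err / abs(s_exact)                         # forward error
bwd = err / sum(abs(Fraction(x)) for x in xs)    # backward error
print(float(fwd))   # ~0.67: terrible forward error
print(float(bwd))   # ~5e-17: tiny backward error, near ε_machine
```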