In this lecture, we will continue our discussion of data models and database system architecture, looking in more detail at the relational model.
There is a lot of reading for this lecture. You should start early and try to digest it all, as it lays the foundation for much of what is to come. The papers are:
- Stonebraker, Michael, and Joseph Hellerstein. "What Goes Around Comes Around." In Readings in Database Systems (aka the Red Book), or online (PDF). Read sections 1-4 (if you know something about XML, you may also enjoy reading sections 10 and 11).
- Codd, E. F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM 13, no. 6 (1970): 377-387. (Focus on section 1.3 and all of section 2.)
You may also find it useful to read pages 57-63 of Ramakrishnan and Gehrke for a brief overview of the relational model.
As you read these papers, think about the following questions and be prepared to answer them in lecture:
- What is the notion of data independence? Why is it important?
- Codd spends a fair amount of time talking about "normal forms". Why is it important that a database be stored in a normal form?
- What are the key ideas behind the relational model? Why are they an improvement over what came before? In what ways is the relational model restrictive?
- What, according to Codd, are the most important differences between the "hierarchical" model (as exemplified by systems like IMS) and the relational model that Codd proposes? Make sure you understand what Codd means by "Data Dependencies".