Syllabus

Course Meeting Times

Class Sessions: 2 sessions / week, 1.5 hours / session

Studio Sessions: 1 session / week, 1.5 hours / session

Course Arc

  • Probability (uncertain world, perfect knowledge of the uncertainty)
    • Counting
    • Random variables, distributions, quantiles, mean, variance
    • Conditional probability, Bayes' theorem, base rate fallacy
    • Joint distributions, covariance, correlation, independence
    • Central limit theorem
  • Statistics I: pure applied probability (data in an uncertain world, perfect knowledge of the uncertainty)
    • Bayesian inference with known priors, probability intervals
    • Conjugate priors
  • Statistics II: applied probability (data in an uncertain world, imperfect knowledge of the uncertainty)
    • Bayesian inference with unknown priors
    • Frequentist significance tests and confidence intervals
    • Resampling methods: bootstrapping
    • Linear regression

We will use R and applets for computation, simulation, and visualization throughout the course.

Broad Course Objectives

  • Learn the language and core concepts of probability theory.
  • Understand basic principles of statistical inference (both Bayesian and frequentist).
  • Build a starter statistical toolbox with appreciation for both the utility and limitations of these techniques.
  • Use software and simulation to do statistics (R).
  • Become an informed consumer of statistical information.
  • Prepare for further coursework or on-the-job study.

Specific Learning Objectives

Probability

Students completing the course will be able to:

  • Use basic counting techniques (multiplication rule, combinations, permutations) to compute probability and odds.
  • Use R to run basic simulations of probabilistic scenarios.
  • Compute conditional probabilities directly and using Bayes' theorem, and check for independence of events.
  • Set up and work with discrete random variables. In particular, understand the Bernoulli, binomial, geometric and Poisson distributions.
  • Work with continuous random variables. In particular, know the properties of the uniform, normal and exponential distributions.
  • Know what expectation and variance mean and be able to compute them.
  • Understand the law of large numbers and the central limit theorem.
  • Compute the covariance and correlation between jointly distributed variables.
  • Use available resources (the internet or books) to learn about and use other distributions as they arise.

Statistics

Students completing the course will be able to:

  • Create and interpret scatter plots and histograms.
  • Understand the difference between probability and likelihood functions, and find the maximum likelihood estimate for a model parameter.
  • Do Bayesian updating with discrete priors to compute posterior distributions and posterior odds.
  • Do Bayesian updating with continuous priors.
  • Construct estimates and predictions using the posterior distribution.
  • Find credible intervals for parameter estimates.
  • Use null hypothesis significance testing (NHST) to test the significance of results, and understand and compute the p-value for these tests.
  • Use specific significance tests, including the z-test, t-test (one and two sample), and chi-squared test.
  • Find confidence intervals for parameter estimates.
  • Use bootstrapping to estimate confidence intervals.
  • Compute and interpret simple linear regression between two variables.
  • Set up a least squares fit of data to a model.

Basic Course Structure

Before Class

You must do the reading and answer reading questions before each class, as lectures will be given under the assumption that you have completed the reading. We do not expect that you will have mastered the material on first reading. The goal is to start the process, so class will be more productive. The reading questions will prepare you for the harder questions we will work during class and on the problem sets.

Class Sessions

Class sessions will be a blend of lecture, concept questions and group problem solving. In-class group work will be done in groups of three of your choosing. We will use "clicker questions" in class.

Studio Sessions

Studio sessions will involve longer problems and the use of R for computation, simulation and visualization. You will need to bring your laptop to these sessions. We will teach you everything you need to know to use R as a tool, and you will not be expected to use R to do any hardcore computer programming.

Collaboration

MIT has a culture of teamwork, so we encourage you to work with study partners. Collaboration on homework is encouraged, but you must write your solutions yourself, in your own words. You must also list all collaborators and outside sources of information.

Discussion Boards

This course makes use of discussion boards, which can be a great resource for helping each other understand the material and problem sets. We encourage collaboration and learning communities, but please do not ask for or post answers to assignments. You may help clarify what's being asked, shed light on a concept, or direct others to relevant material. You may not provide solutions to problem sets.

Grading

ACTIVITIES                                          PERCENTAGES
Reading questions and in-class clicker questions    10%
Problem sets (with lowest score dropped)            25%
Exam 1                                              15%
Exam 2                                              15%
Final exam                                          35%