1. Geolocation
2. Cherry Blossom Runners
3. Spam
4. Robots
5. Airline Delays
6. Pairs Trading
7. Branching process
8. BML cars
9. Blackjack and CSV files
10. Baseball
11. CIA Factbook
12. Scraping jobs

This is a book containing 12 comprehensive case studies focused primarily on data manipulation, programming and computational aspects of statistical topics in authentic research applications. The aim is to provide students, researchers and faculty with exposure to the entire thought process of approaching the computations of a complete data analysis project. This differs from teaching a programming language. Instead, it illustrates how to think about programming with very concrete and complete examples. We also emphasize testing and validating computations, when and how to make them faster, and give the reader insight into how high-level programming evolves.

Each chapter works through all of the computations and programming to acquire, transform and explore the data or create the simulations. We discuss different aspects of the analysis and show results. However, readers have the opportunity to take the analyses much further, building on the core computational work described in the chapters.

Using KML and Google Earth to visualize data

Occurrences of technical words in Kaggle Job Posts

The case studies form 3 basic groups (with overlap in most chapters)

The chapters within these 3 groups illustrate the use of a range of useful topics including

The chapters also provide rich examples of some more advanced aspects of R, including

Duncan Temple Lang
Last modified: Sun Nov 15 06:52:02 PST 2015