Teaching professional data analysis skills has never been easier! Murach’s R for Data Analysis covers everything your students need to hit the ground running with R and RStudio, even if they’ve never programmed before. Then, it presents a thorough course in data analysis. And it includes three real-world case studies that tie all the coursework together.
The Canvas course file contains all the objectives, quizzes, assignments, and slides that you need to run an effective course. It only takes a few clicks to import it into the Canvas LMS. Then, you can customize it for your course.
“I really appreciated the case studies. They were a big help for my students as they illustrated all phases of data analysis and visualization.”
As we see it, this is the best primary text for any course in which the focus is on using R for data analysis. But it is also the ideal supplementary text for a general course on data analysis because it shows how to use R to apply the concepts and statistical methods to real-world data sets.
Like all our books, this book is designed to make it as easy as possible for your students to learn new skills faster and retain them better. Here are a few of those features:
To present the essential R and data analysis skills in a manageable progression and at the right pace, this book is divided into 5 sections.
This section gets your students off to a fast start. First, they’ll learn how to use RStudio, a popular program for coding in R. Then, they’ll learn the parts of the R language that they’ll need to analyze data. Next, they’ll learn how to use R with the tidyverse package to create their first analysis.
Most analysis is descriptive analysis, in which you analyze data to better understand it. That’s why section 2 of this book presents the critical descriptive analysis skills that your students need. That includes how to:
This section presents three complete analyses that show your students how the skills presented in the first two sections can be applied to real-world data sets:
These in-depth analyses make sure that your students master the professional skills they’re going to need.
Predictive analysis uses statistical models to predict unknown or future values. Although predictive analysis is a large topic that could be an entire course of its own, this section presents the concepts your students need to get started with it. More specifically, it shows your students how to use linear regression models to predict continuous numeric values and how to use classification models to predict categorical values.
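As a rough illustration of the regression half of that description, here is a minimal sketch in base R. It is not taken from the book: the built-in mtcars data set, the 24-row training split, and the variable names are all assumptions chosen only to keep the example self-contained.

```r
# Minimal sketch: simple linear regression with base R.
# Assumption: the built-in mtcars data set stands in for real course data.
data(mtcars)

set.seed(42)                                   # make the split reproducible
train_rows <- sample(nrow(mtcars), size = 24)  # 24 of the 32 rows for training
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit a model that predicts fuel economy (mpg) from weight (wt)
model <- lm(mpg ~ wt, data = train)

# Predict the continuous numeric value for rows the model has not seen
predictions <- predict(model, newdata = test)

# Put the actual and predicted values side by side for comparison
results <- data.frame(actual = test$mpg, predicted = round(predictions, 1))
print(head(results))

# R-squared on the training data gives a first rough measure of fit
print(summary(model)$r.squared)
```

A classification model would follow the same split, fit, and predict pattern, but with a categorical target instead of a numeric one.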
Section 5 shows how to present an analysis. To do that, your students can use R Markdown to convert an analysis into an HTML document, PDF file, or PowerPoint slideshow. This is an important skill because the value of an analysis comes from being able to present the insights gained from it to a target audience.
The only prerequisite for this book is basic computer literacy. That’s because chapters 1 and 2 present the parts of the R language that your students need to start using R for data analysis. However, it’s helpful for your students to have some background in statistics.
To analyze data with R as shown in this book, your students just need to download and install the RStudio program and the R language. Both are available for free. Then, they can install some R packages for data analysis that are also freely available. For information about how to do this, they can consult appendix A for Windows or appendix B for macOS.
“I really appreciated the four case studies. They were a big help for my students as they illustrated all phases of data analysis and visualization.”
— J. Jasperson – Texas A&M University
“In his first at-bat, Scott McCoy smashes this one out of the park! This book is not just informative, it is exciting.”
— Scott Spurlock, Software Engineer, Georgia
“Unlike some other books on data analysis with Python, the explanations of how to perform data analysis are thorough rather than terse or with no explanations.”
— Posted at an online bookseller
“I really like the paired-pages format of detailed information on the left and quick notes on the right. This helps me to quickly find the information I’m looking for.”
— Roxanne T., Student, Washington
“Another awesome book from Murach. Their format makes learning new material easier, and their code examples WORK.”
— Posted at an online bookseller
“I can’t praise this book highly enough. The clarity used in picking what to include, when to introduce it, and how to do so is remarkable.”
— Charles Ferguson, Software Developer, Australia
“This book is very well-organized and easy to follow. It covers the perfect amount of description, and it does not make you bored by providing unnecessary details.”
— Posted at an online bookseller
“You folks make the hard stuff seem easy.”
— Thomas Finn, Sr. Software Developer, Illinois
“This is my first exposure to Murach’s books, and I love them. I like the organization of the content, the consistent approach in each book, and the accuracy of the material.”
— Bob L., Michigan
“Another thing I like is the exercises at the end of each chapter. They’re a great way to reinforce the main points of each chapter and force you to get your hands dirty.”
— Hien Luu, SD Forum/Java SIG
“Your book was indispensable to me. The answers were right there at every turn. All the examples made sense, and they all worked!”
— Alan Vogt, ETL Consultant, Massachusetts
“Your books shine out from the rest—the quality of writing and presentation of information is topnotch, and the consistency of quality across books is impressive.”
— Nolan Tamashiro, Developer
View the table of contents for this book in a PDF: Table of Contents (PDF)
What data analysis is
The five phases of data analysis
Introduction to RStudio
How to run code in the Console pane
How to run code in the Source pane
How to view variables in the Environment pane
How to create variables
How to work with variables
How to code arithmetic expressions
How to use arithmetic expressions in statements
How to interpret error messages
How to call functions
How to use functions to work with strings
How to use functions to work with numbers
How to work with vectors
How to work with data frames
How to work with lists
How to add values to data structures
How to use the relational operators
How to use the logical operators
How to code if statements
How to code nested if statements
How to code for loops
How to define functions
The child mortality data
How to set the working directory
How to work with packages
How to read the data into a tibble
How to select the top and bottom rows
How to view summary statistics
How to melt the data
How to add, modify, and rename columns
How to save a tibble as an RDS file
How to calculate summary columns
How to create a line plot
How to use the datasets package
How to get the irises data
How to get the chicks data
How to select rows based on a condition
How to create a base plot
Functions for common plot types
How to create a line plot
How to create a scatter plot
How to create a bar plot
How to create a box plot
How to create a histogram
How to create a KDE plot
How to create an ECDF plot
How to create a 2D KDE plot
How to combine plots
How to create a grid of plots
How to view documentation
How to save a plot
How to find the data you want
How to read data from CSV and Excel files
How to download data
How to work with a zip file
How to connect to a database
How to list the tables in a database
How to list the columns of a table
How to code a query
How to use a query to read data
How to read a JSON file into a list
How to get the index for a list
How to get data from a list
How to build a tibble from the data in a list
A general plan for cleaning data
How to display column names and data types
How to examine the unique values for a single column
How to display the unique values for all columns
How to count the unique values for all columns
How to display the value counts
How to sort the data
How to filter and drop rows
How to drop columns
How to rename columns
How to find missing values
How to fix missing values
How to select columns by data type
How to convert strings to numbers
How to convert strings to dates and times
How to work with the factor type
How to assess outliers
How to calculate quartiles and quantiles
How to calculate the fences for the box plot
How to fix the outliers
How to work with date columns
How to use stringr to work with strings
How to work with string and numeric columns
How to use statistical functions
How to summarize data
How to group and summarize data
Another way to group and summarize data
How to rank rows
How to add a cumulative sum
How to bin data
How to define functions that operate on rows
How to define functions that operate on columns
How to use lambda expressions instead of functions
How to add columns by joining tibbles
How to add rows
Get the data
More skills for working with scatter plots
More skills for working with bar plots
How to add an error bar to a bar plot
More skills for working with line plots
How to create a smooth line plot
How to add labels to plots
How to plot shapes
How to plot a baseball field
How to return plot components from a function
How to plot hits on a baseball field
How to plot maps
How to add data to a map
How to zoom in on part of a plot
How to adjust the limits of a plot
How to work with the plot title and axes labels
How to change the position of the legend
How to edit the legend
How to hide the text and ticks for each axis
How to set the colors for the plot
How to change the theme of the plot
How to create a pairwise grid of scatter plots
How to use other plot types in the grid
Load the packages
Get the data
Examine the data
Select and rename the columns
Sort the rows
Select the rows
Improve some columns
Add columns
Pivot the data
Plot the national polls
Plot the polls for swing states
Analyze the polls by voter type
Plot the gap for the last week of the election
Plot the weekly gap over time
Load the packages for this analysis
Unzip the database file
Read the data from the database
Improve column names and data types
Drop duplicate rows
Select rows for large fires
Examine NA values
Add, modify, and select columns
Sort the rows
Plot the largest fire per year in California
Plot the mean and median acres burned in California
Plot the fires per month in California
Plot the total acres burned for the top 10 states
Plot the acres burned per year for the top 4 states
Plot the 20 largest fires in California
Plot all fires in California larger than 500 acres
Plot all fires in the U.S. larger than 100,000 acres
Load the packages
Read the data
Build the tibble
Examine the unique values
Select and rename the columns
Improve the data types for two columns
Add a Season column
Add a Points column
Add some summary columns
Plot shots made per game by season
Plot shots attempted vs. made per game
Plot shots made per game for all seasons
Plot shot statistics by season
Plot shooting percentages per season
Plot shot locations for two games
Define a function for drawing the court
Plot shot locations for two games on a court
Plot shots by zone for one season
Plot shot count by zone
Plot shooting percentage by zone
Plot shot density
Compare shot locations and density for two seasons
Types of predictive models
Introduction to regression analysis
How to get the data
How to examine and clean the data
How to interpret correlation coefficients
How to identify correlations with r-values
How to identify correlations visually
A procedure for working with a regression model
How to split the data
How to drop outliers from the training data set
How to create a model
How to use a model to make predictions
How to plot an equation
How to plot an equation on a scatter plot
How to code formulas
How to plot a formula on a scatter plot
How to create a model for a curved line
How to create and fit the model
How to judge the model by its R² value
How to judge the model by its residuals
More formula operators
How to create and fit the model
How to view the model’s terms
How to remove insignificant terms
How to plot regression coefficients
Five common nonlinear patterns
How to transform variables
How to create, fit, and judge the model
How to examine ordinal variables
How to create, fit, and judge the model
Introduction to classification analysis
How to get the data for this chapter
How to visually investigate the data
How to create a decision tree
How to plot a decision tree
How to judge a model with a confusion matrix
How to use variable importance to select variables
How to adjust the hyperparameters
How to compare decision trees
How to cross validate a model
How to tune hyperparameters with a grid search
How to create an R Markdown file
How to render an R Markdown file
How to code the YAML header
How to add headings and paragraphs
How to add chunks of code
How to run chunks of code
How to format text
How to create dynamic documents
How to specify multiple output formats
The HTML document displayed in a browser
The PDF and Word documents for the same markdown
The R Markdown
How to start a presentation
The first two slides of a presentation
How to install R
How to install RStudio
How to install the files for this book
How to install the packages for this book
How to install R
How to install RStudio
How to install the files for this book
How to install the packages for this book
To learn about the supporting courseware that we provide for our books, please visit About our Courseware.
This download includes files for:
Appendixes A and B show how to install these files on Windows and macOS.
For a more detailed description of the courseware for this book, please read the Instructor’s Summary.
On this page, we’ll be posting answers to the questions that come up most often about this book. So if you have any questions that you haven’t found answered here at our site, please email us. Thanks!
To view the corrections for this book in a PDF, just click on this link: View the corrections
Then, if you find any other errors, please email us so we can correct them in the next printing of the book. Thank you!
This is our site for college instructors. To buy Murach books, please visit our retail site.