Our Ironclad Guarantee
You must be satisfied. Try our print books for 30 days or our eBooks for 14 days. If they aren't the best you've ever used, you can return the books or cancel the eBooks for a prompt refund. No questions asked!
From its start, the R programming language was designed to be used for statistical analysis. Today, it’s one of the top languages used by data analysts and statisticians. With this book, you’ll learn the R skills you need to become a successful data analyst, even if you’re new to programming or have never studied statistics.
Go to our instructor’s site to learn more about this book and its instructor materials.
This book is for anyone who wants to learn how to visualize and present data professionally. The only prerequisite is basic computer literacy. That’s because chapters 1 and 2 present the parts of the R programming language that you need to get started with data analysis. Then, the rest of the book shows how to use R to analyze data.
Thanks to its unique paired-pages format, this book works equally well if you’re new to programming or if you’re an experienced programmer. Each figure is paired with explanatory text in clear, easy-to-understand language. If you’re new to programming, you’ll want to read each chapter carefully, following along with the code examples. If you’re an experienced programmer, you can read more quickly and apply your new skills on the job.
This section gets you started programming right away. First, you’ll learn how to use RStudio, a popular program for coding in R that’s available for free. Then, you’ll learn the parts of the R language that you need to analyze data. Next, you’ll learn how to use R with the tidyverse collection of packages to create your first analysis.
Most analysis is descriptive analysis, in which you analyze data to better understand it. That’s why section 2 of this book presents the critical descriptive analysis skills that you need for success on the job. That includes:
This section presents three complete analyses that show how the skills presented in the first two sections can be applied to real-world data sets: election polls, wildfires, and basketball shots.
These in-depth analyses make sure that you master the professional skills you need.
Predictive analysis takes data analysis to another level by using statistical models to predict unknown or future values. Although predictive analysis is a large and complex topic, this section presents the concepts you need to get started with it. More specifically, this section shows how to use linear regression models to predict continuous numeric values and how to use classification models to predict categorical values.
Section 5 shows how to present an analysis. To do that, you can use R Markdown to convert your analysis into an HTML document, PDF file, or PowerPoint slideshow. This is an important skill because the value of an analysis comes from being able to present the insights gained from it to your target audience, whether that’s your boss, your clients, or the general public.
Like all our books, this book is designed to make it as easy as possible for you to learn new skills faster and retain them better. Here are a few of those features:
To use R for data analysis, you only need to download and install the RStudio program and the R programming language. Both are available for free. Appendix A shows how to install them on Windows, and appendix B shows how to install them on macOS.
“In his first at-bat, Scott McCoy smashes this one out of the park! This book is not just informative, it is exciting.”
— Scott Spurlock, Software Engineer, Georgia
“Unlike some other books on data analysis with Python, the explanations of how to perform data analysis are thorough rather than terse or with no explanations.”
— Posted at an online bookseller
“This is my first exposure to Murach’s books, and I love them. I like the organization of the content, the consistent approach in each book, and the accuracy of the material.”
— Bob L., Michigan
“I can’t praise this book highly enough. The clarity used in picking what to include, when to introduce it, and how to do so is remarkable.”
— Charles Ferguson, Software Developer, Australia
“Another thing I like is the exercises at the end of each chapter. They’re a great way to reinforce the main points of each chapter and force you to get your hands dirty.”
— Hien Luu, SD Forum/Java SIG
“Throughout the entire project, your book was indispensable to me. The answers were right there at every turn. All the examples made sense, and they all worked!”
— Alan Vogt, ETL Consultant, Massachusetts
“This book covers the perfect amount of description, and it does not make you bored by providing unnecessary details.”
— Posted at an online bookseller
“I picked up my first Murach book at a local bookstore in 2006, not knowing what was inside or what level of knowledge it would require of me, and it has changed my life since, literally. Your format (the paired pages) made it easy for me, an accountant with no IT or software development background, to understand databases and gain skills that proved useful throughout my entire career.”
— Giovanni Galope, Accountant, Philippines
“Your books shine out from the rest—the quality of writing and presentation of information is topnotch, and the consistency of quality across books is impressive.”
— Nolan Tamashiro, Developer
View the table of contents for this book in a PDF: Table of Contents (PDF)
What data analysis is
The five phases of data analysis
Introduction to RStudio
How to run code in the Console pane
How to run code in the Source pane
How to view variables in the Environment pane
How to create variables
How to work with variables
How to code arithmetic expressions
How to use arithmetic expressions in statements
How to interpret error messages
How to call functions
How to use functions to work with strings
How to use functions to work with numbers
How to work with vectors
How to work with data frames
How to work with lists
How to add values to data structures
How to use the relational operators
How to use the logical operators
How to code if statements
How to code nested if statements
How to code for loops
How to define functions
The child mortality data
How to set the working directory
How to work with packages
How to read the data into a tibble
How to select the top and bottom rows
How to view summary statistics
How to melt the data
How to add, modify, and rename columns
How to save a tibble as an RDS file
How to calculate summary columns
How to create a line plot
How to use the datasets package
How to get the irises data
How to get the chicks data
How to select rows based on a condition
How to create a base plot
Functions for common plot types
How to create a line plot
How to create a scatter plot
How to create a bar plot
How to create a box plot
How to create a histogram
How to create a KDE plot
How to create an ECDF plot
How to create a 2D KDE plot
How to combine plots
How to create a grid of plots
How to view documentation
How to save a plot
How to find the data you want
How to read data from CSV and Excel files
How to download data
How to work with a zip file
How to connect to a database
How to list the tables in a database
How to list the columns of a table
How to code a query
How to use a query to read data
How to read a JSON file into a list
How to get the index for a list
How to get data from a list
How to build a tibble from the data in a list
A general plan for cleaning data
How to display column names and data types
How to examine the unique values for a single column
How to display the unique values for all columns
How to count the unique values for all columns
How to display the value counts
How to sort the data
How to filter and drop rows
How to drop columns
How to rename columns
How to find missing values
How to fix missing values
How to select columns by data type
How to convert strings to numbers
How to convert strings to dates and times
How to work with the factor type
How to assess outliers
How to calculate quartiles and quantiles
How to calculate the fences for the box plot
How to fix the outliers
How to work with date columns
How to use stringr to work with strings
How to work with string and numeric columns
How to use statistical functions
How to summarize data
How to group and summarize data
Another way to group and summarize data
How to rank rows
How to add a cumulative sum
How to bin data
How to define functions that operate on rows
How to define functions that operate on columns
How to use lambda expressions instead of functions
How to add columns by joining tibbles
How to add rows
Get the data
More skills for working with scatter plots
More skills for working with bar plots
How to add an error bar to a bar plot
More skills for working with line plots
How to create a smooth line plot
How to add labels to plots
How to plot shapes
How to plot a baseball field
How to return plot components from a function
How to plot hits on a baseball field
How to plot maps
How to add data to a map
How to zoom in on part of a plot
How to adjust the limits of a plot
How to work with the plot title and axes labels
How to change the position of the legend
How to edit the legend
How to hide the text and ticks for each axis
How to set the colors for the plot
How to change the theme of the plot
How to create a pairwise grid of scatter plots
How to use other plot types in the grid
Load the packages
Get the data
Examine the data
Select and rename the columns
Sort the rows
Select the rows
Improve some columns
Add columns
Pivot the data
Plot the national polls
Plot the polls for swing states
Analyze the polls by voter type
Plot the gap for the last week of the election
Plot the weekly gap over time
Load the packages for this analysis
Unzip the database file
Read the data from the database
Improve column names and data types
Drop duplicate rows
Select rows for large fires
Examine NA values
Add, modify, and select columns
Sort the rows
Plot the largest fire per year in California
Plot the mean and median acres burned in California
Plot the fires per month in California
Plot the total acres burned for the top 10 states
Plot the acres burned per year for the top 4 states
Plot the 20 largest fires in California
Plot all fires in California larger than 500 acres
Plot all fires in the U.S. larger than 100,000 acres
Load the packages
Read the data
Build the tibble
Examine the unique values
Select and rename the columns
Improve the data types for two columns
Add a Season column
Add a Points column
Add some summary columns
Plot shots made per game by season
Plot shots attempted vs. made per game
Plot shots made per game for all seasons
Plot shot statistics by season
Plot shooting percentages per season
Plot shot locations for two games
Define a function for drawing the court
Plot shot locations for two games on a court
Plot shots by zone for one season
Plot shot count by zone
Plot shooting percentage by zone
Plot shot density
Compare shot locations and density for two seasons
Types of predictive models
Introduction to regression analysis
How to get the data
How to examine and clean the data
How to interpret correlation coefficients
How to identify correlations with r-values
How to identify correlations visually
A procedure for working with a regression model
How to split the data
How to drop outliers from the training data set
How to create a model
How to use a model to make predictions
How to plot an equation
How to plot an equation on a scatter plot
How to code formulas
How to plot a formula on a scatter plot
How to create a model for a curved line
How to create and fit the model
How to judge the model by its R² value
How to judge the model by its residuals
More formula operators
How to create and fit the model
How to view the model’s terms
How to remove insignificant terms
How to plot regression coefficients
Five common nonlinear patterns
How to transform variables
How to create, fit, and judge the model
How to examine ordinal variables
How to create, fit, and judge the model
Introduction to classification analysis
How to get the data for this chapter
How to visually investigate the data
How to create a decision tree
How to plot a decision tree
How to judge a model with a confusion matrix
How to use variable importance to select variables
How to adjust the hyperparameters
How to compare decision trees
How to cross validate a model
How to tune hyperparameters with a grid search
How to create an R Markdown file
How to render an R Markdown file
How to code the YAML header
How to add headings and paragraphs
How to add chunks of code
How to run chunks of code
How to format text
How to create dynamic documents
How to specify multiple output formats
The HTML document displayed in a browser
The PDF and Word documents for the same markdown
The R Markdown
How to start a presentation
The first two slides of a presentation
How to install R
How to install RStudio
How to install the files for this book
How to install the packages for this book
How to install R
How to install RStudio
How to install the files for this book
How to install the packages for this book
To get a better idea of how well this book can work for you regardless of your level of experience, you can download the third chapter of this book in PDF format.
The goal of this chapter is to give you a taste of how data analysis works. In addition, it’s designed to introduce you to some of the most important R packages for working with data analysis. To do that, this chapter presents the R code for a simple but complete analysis of child mortality data. The code for this analysis uses a collection of packages known as the tidyverse.
This download includes files for:
Appendixes A and B show how to install and use these files on Windows and macOS.
On this page, we’ll be posting answers to the questions that come up most often about our R data analysis book. If you have any questions that you haven’t found answered here, please email us. Thanks!
To view the corrections for this book in a PDF, just click on this link: View the corrections
Then, if you find any other errors, please email us so we can correct them in the next printing of the book. Thank you!
For orders and customer service:
1-800-221-5528
Weekdays, 8 to 4 Pacific Time
If you're a college instructor who would like to consider a book for a course, please visit our website for instructors to learn how to get a complimentary review copy and the full set of instructional materials.