Our Ironclad Guarantee
You must be satisfied. Try our print books for 30 days or our eBooks for 14 days. If they aren't the best you've ever used, you can return the books or cancel the eBooks for a prompt refund. No questions asked!
Today, data analysts are in demand in all types of fields, with Python as a preferred language. And now, with this book, you can gain the Python data analysis skills you need to broaden your career opportunities more quickly and easily than you ever thought possible, using the proven Murach approach. What’s more, after you’ve used this book to master those skills, it will become your all-time favorite on-the-job reference.
Go to our instructor’s site to learn more about this book and its instructor’s materials.
“I got my very first Murach book back in 2006 from a local bookstore, not knowing what was inside, and it has changed my life since, literally. Your book format made it easy for me, an accountant with no IT background, to gain skills that proved to be useful throughout my career.”
To make this book work as effectively as possible for you, the content is divided into 4 sections.
Section 1 consists of 4 chapters that get you started with data analysis as quickly and effectively as possible.
You’ll learn how to use JupyterLab and Jupyter Notebooks to organize and develop your analyses. You’ll learn how to use a subset of the Pandas module for data analysis and visualization. And you’ll learn how to use a subset of the Seaborn module to create professional data visualizations that can be used for presentations.
When you’re done with this section, you’ll be able to start doing analyses of your own.
Most analysis is descriptive analysis in which you analyze past data to help you gain new insights. That’s why section 2 of this book presents the critical descriptive analysis skills that you need for success on the job. That includes:
Predictive analysis takes data analysis to another level by using statistical models to predict unknown or future values. Although a complete treatment of predictive analysis is far beyond the scope of this book, all analysts should know the basic concepts and skills. That’s why section 3 of this book presents those concepts and gets you started doing your own predictions.
This introduction includes how to find the correlations between variables, how to use Scikit-learn to work with linear regression models, and how to use Seaborn to create and plot linear regression models. It also shows you how to select the right variables and the right number of variables for multiple regressions, which is one of the critical skills for making effective predictions.
Section 4 presents 4 case studies that show you how the skills you’ve been learning can be applied to real-world datasets: the Polling, Forest Fires, Social Survey, and Sports Analytics case studies.
Frankly, you can’t master on-the-job skills by working with toy datasets, and these case studies help make sure that you will master the professional skills that you need.
This book is for anyone who wants to become a data analyst, no matter what the field. The only prerequisite is some programming experience, although it doesn’t have to be in Python.
That’s because chapter 1 presents the minimal set of Python skills that you need for this book: how to import modules; how to call and chain methods; how to code lists, slices, tuples, and dictionaries; and how to continue statements over two lines.
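As a rough illustration of the skills listed above, the following sketch touches each one in a few lines. It is not taken from the book: the variable names and values are hypothetical, and it uses only the standard library so that it runs without Anaconda.

```python
# Import a standard-library module under a short alias.
import statistics as stats

# A list of hypothetical temperature readings (illustrative values only).
temps = [68.5, 70.1, 71.3, 69.8, 72.4, 73.0]

# Slices: the first three items and the last two items of the list.
first_three = temps[:3]
last_two = temps[-2:]

# A tuple is an immutable sequence, handy for a fixed pair of values.
temp_range = (min(temps), max(temps))

# A dictionary maps keys to values.
summary = {"count": len(temps), "mean": round(stats.mean(temps), 2)}

# Method chaining: each string method returns a new string,
# so the calls can be strung together in a single expression.
column_name = "  Avg Temp (F) ".strip().lower().replace(" ", "_")

# Continuing a statement over two lines: an expression inside
# parentheses can be split across lines without a backslash.
label = (column_name
         + ": " + str(summary["mean"]))

print(first_three, last_two, temp_range)
print(summary)
print(label)
```

The same patterns show up constantly in data analysis code, where methods of a DataFrame are typically chained and long chains are split over several lines.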
Of course, the more programming experience you have, the faster you’ll move through this book. In fact, our unique presentation methods let you set your own pace. If you have relatively little experience, you can move more slowly and do the exercises at the ends of the chapters. If you have a lot of experience, you can move quickly and apply your new skills on the job right away.
Like all our books, this one has features that ease the learning curve for you, even though you won’t find them in competing books. Here are a few of those features.
If you haven’t done that much Python programming before you read this book, we would like to recommend the perfect companion book: Murach’s Python Programming. It will help you raise your Python skills to a professional level, and it too is a terrific on-the-job reference.
To do data analysis with Python as shown in this book, you just need to download and install the Anaconda distribution of Python. It includes JupyterLab, Pandas, Seaborn, Scikit-learn, and more. To help you install it, appendixes A and B present the procedures you need for both Windows and macOS systems. Then, chapter 1 shows you how to get started with JupyterLab.
“This is my first exposure to Murach’s books, and I love them. I like the organization of the content, the consistent approach in each book, and the accuracy of the material.”
—Bob L., Michigan
“I really like the paired-pages format of detailed information on the left and quick notes on the right. This helps me to quickly find the information I’m looking for.”
—Roxanne T., Student, Washington
“I can’t praise this book highly enough. The clarity used in picking what to include, when to introduce it, and how to do so is remarkable.”
—Charles Ferguson, Software Developer, Australia
“Another thing I like is the exercises at the end of each chapter. They’re a great way to reinforce the main points of each chapter and force you to get your hands dirty.”
—Hien Luu, SD Forum/Java SIG
“Throughout the entire project, your book was indispensable to me. The answers were right there at every turn. All the examples made sense, and they all worked!”
—Alan Vogt, ETL Consultant, Massachusetts
“This book covers the perfect amount of description, and it does not make you bored by providing unnecessary details.”
—Posted at an online bookseller
“I picked up my first Murach book at a local bookstore in 2006, not knowing what was inside or what level of knowledge it would require of me, and it has changed my life since, literally. Your format (the paired pages) made it easy for me, an accountant with no IT or software development background, to understand databases and gain skills that proved useful throughout my entire career.”
—Giovanni Galope, Accountant, Philippines
On Murach’s Python Programming: “This is now my third book for Python, and it is the ONLY one that has made me feel comfortable solving problems and reading code. The paired pages approach is fantastic, and it makes learning the syntax, rules, and conventions understandable for me.”
—Posted at an online bookseller
“Your books shine out from the rest—the quality of writing and presentation of information is topnotch, and the consistency of quality across books is impressive.”
—Nolan Tamashiro, Developer
View the table of contents for this book in a PDF: Table of Contents (PDF)
What data analysis is
The five phases of data analysis and visualization
The IDEs for Python data analysis
How to install and import the Python modules for data analysis
How to call and chain methods
The coding basics for Python data analysis
How to start JupyterLab and work with a Notebook
How to edit and run the cells in a Notebook
How to use the Tab completion and tooltip features
How syntax and runtime errors work
How to use Markdown language
How to get reference information
How to split the screen between two Notebooks
How to use Magic Commands
The Polling case study
The Forest Fires case study
The Social Survey case study
The Sports Analytics case study
The DataFrame structure
Two ways to get data into a DataFrame
How to save and restore a DataFrame
How to display the data in a DataFrame
How to use the attributes of a DataFrame
How to use the info(), nunique(), and describe() methods
How to access columns
How to access rows
How to access a subset of rows and columns
Another way to access a subset of rows and columns
How to sort the data
How to use the statistical methods
How to use Python for column arithmetic
How to modify the string data in columns
How to use indexes
How to pivot the data
How to melt the data
How to group the data
How to aggregate the data
How to plot the data
The Python libraries for data visualization
Long vs. wide data for data visualization
How the Pandas plot() method works by default
The three basic parameters for the Pandas plot() method
How to create a line plot or an area plot
How to create a scatter plot
How to create a bar plot
How to create a histogram or a density plot
How to create a box plot or a pie plot
How to improve the appearance of a plot
How to work with subplots
How to use chaining to get the plots you want
The Seaborn methods for plotting
The general methods vs. the specific methods
How to use the basic Seaborn parameters
How to use the Seaborn parameters for working with subplots
How to set the title, x label, and y label
How to set the ticks, x limits, and y limits
How to set the background style
How to work with subplots
How to save a plot
How to create a line plot
How to create a scatter plot
How to create a bar plot
How to create a box plot
How to create a histogram
How to create a KDE or ECDF plot
How to enhance a distribution plot
How to use other Axes methods to enhance a plot
How to annotate a plot
How to set the color palette
How to enhance a plot that has subplots
How to customize the titles for subplots
How to set the size of a specific plot
Common data sources
How to find and select the data that you want
How to import data directly into a DataFrame
How to download a file to disk before importing it
How to work with a zip file on disk
How to run queries against a database
How to use a SQL query to import data into a DataFrame
How to get and explore the metadata of a Stata file
How to build DataFrames for the metadata and the data
How to download a JSON file to disk
How to open a JSON file in JupyterLab
How to drill down into the data
How to build a DataFrame for the data
A general plan for cleaning the data
What the info() method can tell you
What the unique values can tell you
What the value counts can tell you
How to drop rows based on conditions
How to drop duplicate rows
How to drop columns
How to rename columns
How to find missing values
How to drop rows with missing values
How to fill missing values
How to find dates and numbers that are imported as objects
How to convert date and time strings to the datetime data type
How to convert object columns to numeric data types
How to work with the category data type
How to replace invalid values and convert a column’s data type
How to fix data problems when you import the data
How to find outliers
How to fix outliers
How to work with datetime columns
How to work with string columns
How to work with numeric columns
How to add a summary column to a DataFrame
How to apply functions to rows or columns
How to apply user-defined functions
How lambda expressions work with DataFrames
How to apply lambda expressions
How to set and remove an index
How to unstack indexed data
How to join DataFrames with an inner join
How to join DataFrames with a left or outer join
How to merge DataFrames
How to concatenate DataFrames
What the warning is telling you
What to do when the warning is displayed
What to watch for when the warning isn’t displayed
How to melt columns to create long data
How to plot melted columns
How to group and apply a single aggregate method
How to work with a DataFrameGroupBy object
How to apply multiple aggregate methods
How to use the pivot() method
How to use the pivot_table() method
How to create bins of equal size
How to create bins with equal numbers of values
How to plot binned data
How to select the rows with the largest values
How to calculate the percent change
How to rank rows
How to find other methods for analysis
How to generate time periods
How to reindex with datetime indexes
How to reindex with a semi-month index
How a user-defined function can improve a datetime index
How reindexing with an improved index can improve plots
How to use the resample() method
How to use the label and closed parameters when you downsample
How downsampling can improve plots
The concept of rolling windows
How to create rolling windows
How to plot rolling window data
How to create running totals
How to plot running totals
Types of predictive models
Introduction to regression analysis
The Housing dataset
How to identify correlations with a scatter plot
How to identify correlations with a grid of scatter plots
How to identify correlations with r-values
How to identify correlations with a heatmap
A procedure for creating and using a regression model
The function and methods for linear regression models
How to create, validate, and use a linear regression model
How to plot the predicted data
How to plot the residuals
The lmplot() method and some of its parameters
How to plot a simple linear regression
How to plot a logistic regression
How to plot a polynomial regression
How to plot a lowess regression
How to use the residplot() method to plot the residuals
The Cars dataset
How to create a simple regression model
How to plot the residuals of a simple regression
How to create a multiple regression model
How to plot the residuals of a multiple regression
How to identify categorical variables
How to review categorical variables
How to create dummy variables
How to rescale the data and check the correlations
How to create a multiple regression that includes dummy variables
How to select the independent variables
How to test different combinations of variables
How to use Scikit-learn to select the variables
How to select the right number of variables
Import the modules that you will need
Get the data
Display the data
Examine the data
Drop columns and rows
Rename columns
Fix object types
Fix data
Take an early plot with Pandas
Save the DataFrame
Add columns for grouping and filtering
Create a new DataFrame in long form
Take an early plot of the long data with Seaborn
Add monthly bins to the DataFrame
Add an average percent column for each month
Save the wide and long DataFrames
Plot the national and swing state polls
Plot the voter types
Plot the last two months of polling
Plot the gap changes in selected states
Prepare the gap data for the last week of polling
Plot the gap data for the last week of polling
Prepare the weekly gap data for the swing states
Plot the weekly gap data for the swing states
Download and unzip the SQLite database
Connect and query the database
Import the data into a DataFrame
Examine the data
Improve the readability of the data
Drop unnecessary rows
Drop duplicate rows
Convert dates to datetime objects
Check for missing contain dates
Add fire_month and days_burning columns
Examine the contain_date and days_burning columns
Analyze the data for California
Two more plots for California fires
Rank the states by total acres burned
Prepare a DataFrame for total acres burned by year within state
Prepare a DataFrame for the top 4 states
Plot the acres burned total by year for the top 4 states
Review the 20 largest fires in California
Use GeoPandas to plot the California map
Use GeoPandas or Seaborn to plot the California fires on a map
Plot the fires in the continental United States
Download and unzip the zip file for the data
Build a DataFrame for the metadata
Use the codebook and read the data that you want
Prepare the data
Plot the data and reduce the number of categories
Plot the total counts of the responses
Convert the counts to percents and plot them
Search the codebook for small question sets
Read and review the work-life data
Plot the responses for the first question
Plot the responses for the second and third questions
Use the codebook to find related columns
Use the codebook to find follow-up questions
Select the columns for an expanded DataFrame
Bin the data for a column
Develop and test a first hypothesis
Develop and test a second hypothesis
Develop and test a third hypothesis
Get the data
Build the DataFrame
Locate and drop unneeded rows
Locate and drop unneeded columns
Convert the game_date column to datetime data
Add a column for the season
Add a column for the shot result
Add a column for points made for each shot
Add three summary columns
Plot the points per game by season
Plot the averages of shots, shots made, and points per game by season
Plot the shot locations for two games
Plot the shot locations for two seasons
Plot the shot density for one season
Plot the shot density for two seasons
How to install Anaconda
How to use the Anaconda Prompt
How to use the Anaconda Navigator
How to install the files for this book
How to make sure Anaconda is installed correctly
How to download the large data files for this book
How to install Anaconda
How to run conda commands
How to use the Anaconda Navigator
How to install the files for this book
How to make sure Anaconda is installed correctly
How to download the large data files for this book
To get a better idea of how well this book can work for you regardless of your level of experience, you can download the first chapter of this book in PDF format.
This chapter gives you an overview of what data analysis entails and what you’ll learn in the rest of the book. So after a quick review of the required Python skills, you’ll learn how to use JupyterLab for developing data analyses, and you’ll be introduced to the 4 case studies that are used throughout the book.
Chapter 1 PDF Download Now
This download includes:
Appendix A for Windows and appendix B for macOS show how to install and use these files.
Zip file Download Now
In December 2023, we updated the download for this book to work with the latest versions of Pandas, Seaborn, and scikit-learn. If you want to use this download, it's available here:
Updated zip file Download Now
The code in the updated download doesn't always match the code presented in the book, but the changes are summarized here:
PDF Summary of updates Download Now
To view the "Frequently Asked Questions" for this book in a PDF, just click on this link: View the questions
Then, if you have any questions that aren't answered here, please email us. Thanks!
To view the corrections for this book in a PDF, just click on this link: View the corrections
Then, if you find any other errors, please email us so we can correct them in the next printing of the book. Thank you!
For orders and customer service:
1-800-221-5528
Weekdays, 8 to 4 Pacific Time
If you're a college instructor who would like to consider a book for a course, please visit our website for instructors to learn how to get a complimentary review copy and the full set of instructional materials.