Pandas for Everyone: Python Data Analysis

Sharing practical insights into solving real-world data science problems using the Pandas library and the Python programming language.

(PYTHON-PANDAS.AP1) / ISBN: 978-1-64459-413-1

About This Course

This course, Pandas for Everyone: Python Data Analysis, teaches how to tackle real-world data analysis problems using the popular Pandas library. You'll begin with the fundamentals: loading data sets, exploring their structure, and creating basic visualizations with Matplotlib, Seaborn, and Pandas' own plotting methods. As you progress, you'll learn data manipulation techniques and work with tools for cleaning and transforming data, including tidy-data reshaping, merging, groupby operations, and handling missing values. The later lessons apply these skills to statistical modeling (linear and generalized linear models, survival analysis, regularization, and clustering) and briefly introduce the broader Python data science ecosystem, including scikit-learn for machine learning.
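
As a rough illustration of the first steps described above (loading a data set and exploring its structure), here is a minimal sketch. The inline CSV text and its column names are made up for this example; the course works with its own data sets, which would normally be read from a file path rather than from a string.

```python
import io

import pandas as pd

# Made-up data standing in for a CSV file on disk
csv_text = """name,group,year,score
Ana,A,2020,71.5
Ben,A,2021,68.0
Cruz,B,2020,80.25
Dee,B,2021,77.0
"""

# read_csv accepts a file path or any file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # the inferred data type of each column
print(df.head(2))  # the first two rows

# Look at a column (a Series), a row by index label, and a single cell
scores = df["score"]
first_row = df.loc[0]
cell = df.loc[2, "name"]
print(cell)
```

Checking `.shape`, `.dtypes`, and `.head()` right after loading is a quick way to confirm that the file was parsed the way you expected before doing any analysis.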

Skills You’ll Get

  • Load, explore, and manipulate data using Pandas DataFrames
  • Create basic data visualizations with Pandas, Matplotlib, and Seaborn
  • Combine and clean messy datasets
  • Handle missing values and work with different data types
  • Perform groupby operations and data normalization
  • Apply functions and regular expressions for data transformation
  • Conduct statistical modeling using techniques like linear regression and logistic regression
  • Gain exposure to the broader Python data science ecosystem
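
To give a feel for how several of these skills fit together, here is a minimal sketch that combines two tables, uses a regular expression to pull numbers out of text, converts the data type, fills missing values, and summarizes with a groupby. The tables, column names (`site`, `reading`, `region`), and the choice to fill gaps with the column mean are all hypothetical, picked only for illustration; mean imputation is one option among many, not a recommendation.

```python
import pandas as pd

# Two small, made-up tables that share a "site" key
visits = pd.DataFrame(
    {
        "site": ["s1", "s1", "s2", "s2", "s3"],
        "reading": ["12.5 mg", "n/a", "9.0 mg", "11.0 mg", "7.5 mg"],
    }
)
sites = pd.DataFrame(
    {
        "site": ["s1", "s2", "s3"],
        "region": ["north", "south", "south"],
    }
)

# Combine: attach each site's region to its visits (left join on "site")
combined = visits.merge(sites, on="site", how="left")

# Regular expression: capture the leading number in each reading;
# text that does not match (such as "n/a") becomes a missing value
combined["value"] = combined["reading"].str.extract(r"^(\d+(?:\.\d+)?)", expand=False)

# Data types: convert the extracted text to floating-point numbers
combined["value"] = combined["value"].astype("float64")

# Missing data: count the gaps, then fill them with the column mean
n_missing = combined["value"].isna().sum()
combined["value"] = combined["value"].fillna(combined["value"].mean())

# Groupby (split-apply-combine): mean value per region
region_means = combined.groupby("region")["value"].mean()
print(region_means)
```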

Lessons

1

Preface

  • Breakdown of the Course
  • How to Read This Course
  • Setup
2

Pandas DataFrame Basics

  • Introduction
  • Load Your First Data Set
  • Look at Columns, Rows, and Cells
  • Grouped and Aggregated Calculations
  • Basic Plot
  • Conclusion
3

Pandas Data Structures Basics

  • Create Your Own Data
  • The Series
  • The DataFrame
  • Making Changes to Series and DataFrames
  • Exporting and Importing Data
  • Conclusion
4

Plotting Basics

  • Why Visualize Data?
  • Matplotlib Basics
  • Statistical Graphics Using matplotlib
  • Seaborn
  • Pandas Plotting Method
  • Conclusion
5

Tidy Data

  • Columns Contain Values, Not Variables
  • Columns Contain Multiple Variables
  • Variables in Both Rows and Columns
  • Conclusion
6

Apply Functions

  • Primer on Functions
  • Apply (Basics)
  • Vectorized Functions
  • Lambda Functions (Anonymous Functions)
  • Conclusion
7

Data Assembly

  • Combine Data Sets
  • Concatenation
  • Observational Units Across Multiple Tables
  • Merge Multiple Data Sets
  • Conclusion
8

Data Normalization

  • Multiple Observational Units in a Table (Normalization)
  • Conclusion
9

Groupby Operations: Split-Apply-Combine

  • Aggregate
  • Transform
  • Filter
  • The pandas.core.groupby.DataFrameGroupBy object
  • Working With a MultiIndex
  • Conclusion
10

Missing Data

  • What Is a NaN Value?
  • Where Do Missing Values Come From?
  • Working With Missing Data
  • Pandas' Built-In NA Missing Value
  • Conclusion
11

Data Types

  • Data Types
  • Converting Types
  • Categorical Data
  • Conclusion
12

Strings and Text Data

  • Introduction
  • Strings
  • String Methods
  • More String Methods
  • String Formatting (F-Strings)
  • Regular Expressions (RegEx)
  • The regex Library
  • Conclusion
13

Dates and Times

  • Python's datetime Object
  • Converting to datetime
  • Loading Data That Include Dates
  • Extracting Date Components
  • Date Calculations and Timedeltas
  • Datetime Methods
  • Getting Stock Data
  • Subsetting Data Based on Dates
  • Date Ranges
  • Shifting Values
  • Resampling
  • Time Zones
  • Arrow for Better Dates and Times
  • Conclusion
14

Linear Regression (Continuous Outcome Variable)

  • Simple Linear Regression
  • Multiple Regression
  • Models with Categorical Variables
  • One-Hot Encoding in scikit-learn with Transformer Pipelines
  • Conclusion
15

Generalized Linear Models

  • About This Lesson
  • Logistic Regression (Binary Outcome Variable)
  • Poisson Regression (Count Outcome Variable)
  • More Generalized Linear Models
  • Conclusion
16

Survival Analysis

  • Survival Data
  • Kaplan-Meier Curves
  • Cox Proportional Hazards Model
  • Conclusion
17

Model Diagnostics

  • Residuals
  • Comparing Multiple Models
  • k-Fold Cross-Validation
  • Conclusion
18

Regularization

  • Why Regularize?
  • LASSO Regression
  • Ridge Regression
  • Elastic Net
  • Cross-Validation
  • Conclusion
19

Clustering

  • k-Means
  • Hierarchical Clustering
  • Conclusion
20

Life Outside of Pandas

  • The (Scientific) Computing Stack
  • Performance
  • Dask
  • Siuba
  • Ibis
  • Polars
  • PyJanitor
  • Pandera
  • Machine Learning
  • Publishing
  • Dashboards
  • Conclusion
21

It’s Dangerous To Go Alone!

  • Local Meetups
  • Conferences
  • The Carpentries
  • Podcasts
  • Other Resources
  • Conclusion
A

Appendix A: Concept Maps

B

Appendix B: Installation and Setup

  • B.1 Install Python
  • B.2 Install Python Packages
  • B.3 Download Book Data
C

Appendix C: Command Line

  • C.1 Installation
  • C.2 Basics
D

Appendix D: Project Templates

E

Appendix E: Using Python

  • E.1 Command Line and Text Editor
  • E.2 Python and IPython
  • E.3 Jupyter
  • E.4 Integrated Development Environments (IDEs)
F

Appendix F: Working Directories

G

Appendix G: Environments

  • G.1 Conda Environments
  • G.2 Pyenv + Pipenv
H

Appendix H: Install Packages

  • H.1 Updating Packages
I

Appendix I: Importing Libraries

J

Appendix J: Code Style

  • J.1 Line Breaks in Code
K

Appendix K: Containers: Lists, Tuples, and Dictionaries

  • K.1 Lists
  • K.2 Tuples
  • K.3 Dictionaries
L

Appendix L: Slice Values

M

Appendix M: Loops

N

Appendix N: Comprehensions

O

Appendix O: Functions

  • O.1 Default Parameters
  • O.2 Arbitrary Parameters
P

Appendix P: Ranges and Generators

Q

Appendix Q: Multiple Assignment

R

Appendix R: NumPy ndarray

S

Appendix S: Classes

T

Appendix T: SettingWithCopyWarning

  • T.1 Modifying a Subset of Data
  • T.2 Replacing a Value
  • T.3 More Resources
U

Appendix U: Method Chaining

V

Appendix V: Timing Code

W

Appendix W: String Formatting

  • W.1 C-Style
  • W.2 String Formatting: .format() Method
  • W.3 Formatting Numbers
X

Appendix X: Conditionals (if-elif-else)

Y

Appendix Y: New York ACS Logistic Regression Example

Z

Appendix Z: Replicating Results in R

  • Z.1 Linear Regression
  • Z.2 Logistic Regression
  • Z.3 Poisson Regression

Lab

1

Pandas DataFrame Basics

  • Performing Grouped and Aggregated Calculations Using the .groupby() Method
2

Pandas Data Structures Basics

  • Creating a DataFrame and Making Changes to it
3

Plotting Basics

  • Creating a Scatter Plot Using Multivariate Data
  • Creating a Density Plot Using Bivariate Data
4

Tidy Data

  • Using Functions and Methods to Process and Tidy Data
5

Apply Functions

  • Performing Calculations Across DataFrames
  • Vectorizing Functions
6

Data Assembly

  • Performing Concatenation Using the concat() Function
  • Merging Multiple Data Sets Using the .merge() Function
7

Data Normalization

  • Understanding Multiple Observational Units in a Data Set
8

Groupby Operations: Split-Apply-Combine

  • Performing Data Summarization Using Group-by Operations
  • Performing Boolean Subsetting on the Data
  • Performing Operations on Grouped Objects
9

Missing Data

  • Finding and Cleaning Missing Data
10

Data Types

  • Performing Data Type Conversion
11

Strings and Text Data

  • Finding and Substituting a Pattern
12

Dates and Times

  • Converting an Object Type into a datetime Type
  • Extracting Date Components from the Data
  • Getting Stock Data and Subsetting it Based on Dates
  • Resampling Dates Using the .resample() Method
13

Linear Regression (Continuous Outcome Variable)

  • Performing Linear Regression
  • Performing Multiple Regression
14

Generalized Linear Models

  • Performing Logistic Regression
  • Performing Poisson Regression Using the poisson() Function
15

Survival Analysis

  • Performing Survival Analysis Using the KaplanMeierFitter() Function
16

Model Diagnostics

  • Comparing Models Using Cross-Validation
17

Regularization

  • Performing L1 Regularization Using the Lasso() Function
  • Performing L2 Regularization Using the Ridge() Function
18

Clustering

  • Performing k-Means Clustering
  • Using Hierarchical Clustering Algorithms

Frequently Asked Questions

Pandas is a powerful open-source Python library for data analysis. It provides data structures such as the DataFrame, along with tools to manipulate, clean, and visualize data.
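
To make that description concrete, here is a minimal sketch of the two core data structures and a couple of everyday operations. The weather values and column names are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional labeled values (None becomes a missing NaN)
temps = pd.Series([21.0, 23.5, None], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame: a table of columns that share one row index
weather = pd.DataFrame({"temp_c": temps, "rain_mm": [0.0, 4.2, 1.0]})

# Manipulate: derive a new column from an existing one (vectorized, no loop)
weather["temp_f"] = weather["temp_c"] * 9 / 5 + 32

# Clean: drop rows whose temperature is missing
clean = weather.dropna(subset=["temp_c"])

# Selecting a single column returns a Series
print(type(clean["temp_f"]))
print(clean)
```

The arithmetic on `weather["temp_c"]` is applied to every row at once, and the missing value simply propagates to `temp_f` until it is dropped.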

Python is excellent for data analysis: it's easy to learn, has versatile libraries (like Pandas), and has a large, supportive community. Its flexibility makes it useful for a wide range of data science tasks.

While some basic programming experience can be helpful, this course is designed to be accessible to beginners. We'll start with the fundamentals of Python and Pandas and gradually build your skills throughout the course.