We’ve all heard the cliché “garbage in, garbage out.” It applies to data as well: before data can be analyzed properly, it needs to be cleaned. The fictional Twitter character @BigDataBorat puts it this way: “In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.” This week we will learn about “prepare data.”

We are going to use pandas to load and clean data.

Here are some resources for that:
  • Effective Pandas Book - Chapters on Series
  • Effective Pandas Video Course
    • Loading Data
    • Exploring Data
    • Tweaking Data
  • Applied Pandas - Survey Analysis
    • Loading Data
    • Types
    • Cleanup


This week's demo walks through cleaning some meteorological data. We will look at where data is missing, since handling missing data is one aspect of cleaning. We will also deal with categorical data and convert text data into numeric data.
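As a small preview of those three steps, here is a minimal sketch on a made-up table. The column names and the placeholder codes ("M" for a missing reading, "T" for a trace amount) are assumptions for illustration, not the demo's actual dataset, and treating a trace as 0.0 is just one possible choice.

```python
import pandas as pd

# Made-up weather readings; names and codes are illustrative assumptions.
raw = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C"],
    "temp_f": ["71.5", "68.0", "M", "55.2", "60.1"],   # "M": hypothetical missing-value code
    "precip_in": ["0.0", "T", "0.25", None, "0.1"],    # "T": hypothetical trace-amount code
    "sky": ["clear", "cloudy", "cloudy", None, "clear"],
})

# Where is data missing? "M" is still plain text here, so it is not counted yet.
missing_before = raw.isna().sum()

clean = raw.assign(
    # Text to numbers: anything that cannot be parsed (such as "M") becomes NaN.
    temp_f=pd.to_numeric(raw["temp_f"], errors="coerce"),
    # Assumption: count a trace amount as 0.0 before converting to numbers.
    precip_in=pd.to_numeric(raw["precip_in"].replace("T", "0.0"), errors="coerce"),
    # Repeated text labels are stored as a categorical type.
    station=raw["station"].astype("category"),
    sky=raw["sky"].astype("category"),
)

# After conversion, the hidden "M" shows up as a real missing value.
missing_after = clean.isna().sum()
print(missing_before, missing_after, clean.dtypes, sep="\n\n")
```

Comparing the two counts shows why type conversion and missing-data checks go together: a placeholder stored as text is invisible to `isna()` until the column is numeric.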

Cleaning data can take a bit of time, but once it is done, later analysis goes much more smoothly.

This week we will leverage the pandas, missingno, and matplotlib libraries to deal with data quality.
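To give a rough idea of how these libraries fit together, the sketch below computes the fraction of missing values per column with pandas and draws it with matplotlib. The data and column names are invented for illustration. missingno offers ready-made views of the same information (for example, its `matrix` function); it is left out here so the snippet depends only on pandas and matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented readings with a few gaps; column names are illustrative assumptions.
df = pd.DataFrame({
    "temp_c": [21.5, np.nan, 19.0, 18.2],
    "humidity": [0.40, 0.42, np.nan, np.nan],
    "wind_kph": [12.0, 8.0, 15.0, 9.5],
})

# isna() marks each missing cell as True; the column mean is the missing fraction.
missing_fraction = df.isna().mean()

# One bar per column, drawn through pandas' matplotlib-based plotting.
ax = missing_fraction.plot.bar(title="Missing values by column")
ax.set_ylabel("fraction missing")
ax.set_ylim(0, 1)
plt.tight_layout()
```

In a notebook the chart appears on its own; in a script, call `plt.show()` or `plt.savefig(...)` afterwards.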