Over 10 Modules with Practical Code on Real Datasets
Loading Data
In the real world, data is messy. Often it doesn't come in a format that gives you everything on a silver platter. Using real datasets, we will learn how to load data from common formats. We will explain and demo how to read CSV and Excel files. We will also show how to read from and write to databases.
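As a taste of what loading looks like, here is a minimal sketch. The CSV text, the column names, and the `weather` table are made up for illustration, and an in-memory SQLite database stands in for a real one.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "city,temp_c\nOslo,4.5\nLima,19.0\nCairo,\n"

# read_csv accepts a path or a file-like object; the blank field becomes NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Write the frame to an in-memory SQLite database, then read it back with SQL.
conn = sqlite3.connect(":memory:")
try:
    df.to_sql("weather", conn, index=False)
    from_db = pd.read_sql("SELECT city, temp_c FROM weather ORDER BY city", conn)
finally:
    conn.close()

print(df)
print(from_db)
```

Excel files follow the same pattern through `pd.read_excel`, which needs an Excel engine such as openpyxl installed.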
Inspecting Data
Getting your data to load is not enough. We will start exploring the data, diving into types, shapes, memory usage, and more. Jupyter also plays a role here, and we will show how to adjust settings to view more data.
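A first inspection pass might look something like the sketch below. The small DataFrame is invented for illustration; the display options are the kind of setting you can adjust in Jupyter to see more rows and columns.

```python
import pandas as pd

# A small, made-up frame standing in for a freshly loaded dataset.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bo", "Cy"],
        "age": [31, 45, 27],
        "score": [88.5, 92.0, 79.25],
    }
)

shape = df.shape                    # (rows, columns)
dtypes = df.dtypes                  # one dtype per column
mem = df.memory_usage(deep=True)    # bytes per column, counting string contents

# Temporarily widen what pandas will display; pd.set_option makes it permanent.
with pd.option_context("display.max_rows", 100, "display.max_columns", None):
    preview = repr(df)

print(shape)
print(dtypes)
print(mem)
print(preview)
```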
Tweaking Data
Big Data Borat claimed that Data Scientists spend 80% of their time cleaning up the data. We will show common manipulations for numeric, string, and date types. We will discuss optimized operations that get your code to run 10-20 times faster while still retaining that easy-to-read quality. Big Data Borat is going to be upset!
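Here is a rough sketch of the kind of cleanup we mean, using invented columns. It relies on vectorized column operations rather than Python loops, which is one common route to faster pandas code; the exact speedup depends on your data, so treat it as an illustration and not a benchmark.

```python
import pandas as pd

# Made-up messy input: numbers with commas, ragged names, dates stored as text.
raw = pd.DataFrame(
    {
        "price": ["1,200", "950", "3,400"],
        "name": ["  alice ", "BOB", "Carol  "],
        "joined": ["2021-01-15", "2020-07-04", "2022-11-30"],
    }
)

# Each step operates on a whole column at once instead of row by row.
clean = raw.assign(
    price=raw["price"].str.replace(",", "", regex=False).astype("int64"),
    name=raw["name"].str.strip().str.title(),
    joined=pd.to_datetime(raw["joined"]),
)

# Date columns expose their parts through the .dt accessor.
clean["year"] = clean["joined"].dt.year

print(clean)
```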
Statistics
To really understand your data, you are going to need to dig into it. If you want to grok your data, this module will be a treat for you. We will look into understanding numeric and categorical data. In addition, we will discuss how to quantify the relationship between columns. You will start to feel like your data is a familiar book, dog-eared and underlined!
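A minimal sketch of these ideas, on made-up study data: `describe` summarizes a numeric column, `value_counts` summarizes a categorical one, and `corr` quantifies how two columns move together (Pearson correlation by default).

```python
import pandas as pd

# Invented data: hours studied, exam score, and letter grade.
df = pd.DataFrame(
    {
        "hours": [1, 2, 3, 4, 5],
        "score": [52, 58, 65, 71, 80],
        "grade": ["C", "C", "B", "B", "A"],
    }
)

summary = df["score"].describe()       # count, mean, std, min, quartiles, max
counts = df["grade"].value_counts()    # frequency of each category
r = df["hours"].corr(df["score"])      # Pearson correlation coefficient

print(summary)
print(counts)
print(round(r, 3))
```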
Plotting
Summarizing your data with numbers is one way to understand it. However, we will not end there. It has been said that a picture is worth a thousand words, and this holds for data as well. When you master visualization, you will be able to picture what relationships really look like. And you will have insight into your data that you wouldn't have even thought about until you actually saw it. In this case, seeing is believing.
Filtering Data
After you understand your data, you will want to slice and dice it. The many ways to do this can be really confusing. However, we will walk through real-world examples and help you understand the options that you have. Faster than a speeding SQL query, you will have the data that you want.
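To give a flavor of the options, here is a small sketch with an invented sales table. It shows three common styles: a boolean mask with `.loc`, the `query` method, and `isin` for membership tests.

```python
import pandas as pd

# Made-up sales records.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Cairo", "Oslo"],
        "year": [2020, 2021, 2021, 2022],
        "sales": [100, 250, 175, 300],
    }
)

# Boolean mask: combine conditions with & and |, each wrapped in parentheses.
mask = (df["sales"] > 150) & (df["year"] == 2021)
high_2021 = df.loc[mask, ["city", "sales"]]

# query: the same idea written as a string expression.
recent_big = df.query("sales >= 200 and year > 2020")

# isin: keep rows whose value appears in a list.
subset = df[df["city"].isin(["Lima", "Cairo"])]

print(high_2021)
print(recent_big)
print(subset)
```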
Grouping Data
Masters of Excel know pivot tables like the back of their hand. They can quickly whip up summaries along multiple dimensions. We will show you how to one-up them in Pandas. Pandas has powerful aggregation tooling for summarizing along any axis you can think of. Of course, you can always export the result to a spreadsheet afterward if your boss really wants one.
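As a hedged sketch of what this looks like, the example below summarizes an invented sales table two ways: `groupby` with several aggregations, and `pivot_table` for an Excel-style cross-tab.

```python
import pandas as pd

# Made-up sales by region and product.
df = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "product": ["A", "B", "A", "A", "B"],
        "sales": [10, 20, 30, 40, 50],
    }
)

# One row per region, one column per aggregation.
by_region = df.groupby("region")["sales"].agg(["sum", "mean"])

# Regions down the side, products across the top, summed sales in the cells.
pivot = df.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum", fill_value=0
)

print(by_region)
print(pivot)
```

Either result can be written out with `DataFrame.to_excel`, provided an Excel writer engine such as openpyxl is installed.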
Joining Data
Data doesn't live in isolation. It has friends and enemies. Sometimes these need to get together and have a pow-wow. We will demonstrate how to do this in Pandas as well as show some of the pitfalls you will want to look out for.
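A minimal sketch of a join, with two invented tables. It also shows two guards against common surprises: `indicator=True` flags rows that found no match, and `validate` raises an error if the key relationship is not what you assumed.

```python
import pandas as pd

# Made-up orders and customers; customer 99 has no matching customer record.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 99],
        "total": [25.0, 40.0, 15.0],
    }
)
customers = pd.DataFrame(
    {
        "customer_id": [10, 20, 30],
        "name": ["Ann", "Bo", "Cy"],
    }
)

# Left join keeps every order; validate checks each order maps to at most one customer.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    indicator=True,
    validate="many_to_one",
)

# Rows marked left_only are orders with no customer record.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```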