Exploratory Data Analysis

 

A Data Scientist’s Essential Guide to Exploratory Data Analysis





Exploratory Data Analysis (EDA) is the single most important task to conduct at the beginning of every data science project.

In essence, it involves thoroughly examining your data in order to uncover its underlying characteristics, possible anomalies, and hidden patterns and relationships.

This understanding of your data is what will ultimately guide you through the following steps of your machine learning pipeline, from data preprocessing to model building and analysis of results.

As a rule of thumb, we traditionally start by characterizing the data in terms of the number of observations, the number and types of features, the overall missing rate, and the percentage of duplicate observations.

With some pandas manipulation and the right cheatsheet, we can print out the above information with a few short snippets of code.
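As a minimal sketch of what such snippets might look like, the example below computes each of the characteristics listed above with standard pandas calls. The helper name `summarize` and the toy DataFrame are hypothetical and only illustrate the idea. The missing rate is assumed here to mean the share of missing cells over the whole table, and the duplicate rate the share of rows that repeat an earlier row.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect basic dataset characteristics (hypothetical helper)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    return {
        "n_observations": n_rows,
        "n_features": n_cols,
        # Count how many features there are of each dtype
        "feature_types": df.dtypes.astype(str).value_counts().to_dict(),
        # Share of missing cells across the whole table, in percent
        "missing_rate_pct": (
            100 * float(df.isna().sum().sum()) / n_cells if n_cells else 0.0
        ),
        # Share of rows that exactly repeat an earlier row, in percent
        "duplicate_rate_pct": (
            100 * float(df.duplicated().mean()) if n_rows else 0.0
        ),
    }


# Toy data, made up purely for illustration
df = pd.DataFrame(
    {
        "age": [25, 32, None, 32],
        "city": ["Lisbon", "Porto", "Porto", "Porto"],
    }
)

summary = summarize(df)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On a real project, `df` would typically come from something like `pd.read_csv(...)`, and `df.info()` or `df.describe()` can complement this summary with per-feature detail.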
