Data Cleaning & Preparation

What is Data Cleaning & Preparation, and why is it so important?

Data cleaning is the process of editing, correcting, and structuring the data in a dataset so that it is consistent and ready for analysis. This includes removing corrupt or irrelevant records and converting what remains into a uniform format that analysis tools can process reliably.

Data cleaning is often tedious, but it is essential for getting reliable results and useful insights from your data. The 1-10-100 principle illustrates why: it costs €1 to prevent bad data, €10 to correct bad data, and €100 to fix a downstream problem created by bad data.

Data Cleaning and Preparation steps:
  • Step 1: Removing irrelevant data
  • Step 2: Deduplicating data
  • Step 3: Fixing structural errors
  • Step 4: Dealing with missing data
  • Step 5: Filtering out data outliers
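
The five steps above can be sketched as one small pipeline. This is a minimal illustration, not a definitive implementation. It uses only the Python standard library so each step stays visible, whereas real projects often use a dataframe library such as pandas. The records, the field names (`id`, `country`, `age`, `notes`), the `clean` function, the median fill for missing ages, and the 0 to 120 plausibility range are all assumptions invented for this example.

```python
from statistics import median

# Hypothetical raw records; every field name and value is invented for illustration.
raw_rows = [
    {"id": 1, "country": " germany ", "age": "34", "notes": "call back"},
    {"id": 2, "country": "Germany", "age": "29", "notes": ""},
    {"id": 2, "country": "Germany", "age": "29", "notes": ""},    # exact duplicate
    {"id": 3, "country": "FRANCE", "age": None, "notes": "n/a"},  # missing age
    {"id": 4, "country": "france", "age": "31", "notes": ""},
    {"id": 5, "country": "Spain", "age": "410", "notes": ""},     # implausible age
    {"id": 6, "country": "Italy", "age": "40", "notes": ""},
]


def clean(rows, keep_fields=("id", "country", "age"), age_range=(0, 120)):
    """Apply the five cleaning steps in order and return new, cleaned rows."""
    # Step 1: remove irrelevant data by keeping only the fields we analyse.
    rows = [{k: r[k] for k in keep_fields} for r in rows]

    # Step 2: deduplicate exact repeats, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[k] for k in keep_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Step 3: fix structural errors (stray whitespace, inconsistent casing,
    # numbers stored as text).
    for r in unique:
        r["country"] = r["country"].strip().title()
        r["age"] = int(r["age"]) if r["age"] is not None else None

    # Step 4: deal with missing data. Here missing ages are filled with the
    # median of the known ages (an assumed strategy; dropping the row is another).
    fill_value = median(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_value

    # Step 5: filter out outliers using an assumed plausible range.
    low, high = age_range
    return [r for r in unique if low <= r["age"] <= high]


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

In this sketch the median is used for filling because it is barely affected by the implausible value that is only removed in Step 5. Note also that Step 2 runs before Step 3, so it catches only exact repeats; duplicates that differ in spacing or casing would need a second deduplication pass after normalisation.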