Handling Missing Data with Python: Basics of Data Analytics with Python

print('Hello')
Data Analytics is one of the most in-demand skills in today’s world. Solving your own puzzles with data feels thrilling and creative. I’ve always wanted to learn data analytics, but in truth, even after enrolling in many courses, I found myself stuck at the same starting point again and again. My mind felt so unstructured, and I didn’t know where to focus.
So, I decided to begin properly this time, one clear step at a time. Since I already know the basics of Python, I’m now focusing on Data Analytics with Python, along with practising on Kaggle.
And today, I’m starting with one of the most important basics every data analyst needs to know:
How to handle missing or null values in a dataset.
LIBRARIES
Let’s begin by importing the necessary libraries:
import pandas as pd
import numpy as np
NumPy - great for numerical operations. Here we use NumPy to represent missing values (np.nan).
Pandas - helps to create, clean, and analyze data in table-like formats.
CREATE A SAMPLE DATASET
num = pd.DataFrame(
    {
        'LETTERS': ['TEN', 'TWENTY', 'THIRTY', 'FOURTY', 'HUNDRED'],
        'NUMERICS': [10, 20, 30, 40, 100],
        'INCREMENT': [11, 21, 31, 41, 101],
        'DECREMENT': [9, np.nan, 29, 39, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)
Here I have created a small DataFrame called num that has some missing values (NaN). When you practice, you can choose a much bigger DataFrame than this. But I would recommend smaller ones, because it is easier to notice the differences, especially if you are someone who tends to skim over large outputs.
So what is a DataFrame?
A DataFrame is a labeled table of data that you can easily clean, analyze, and manipulate using pandas. We usually create a new one with the pd.DataFrame() function from the pandas library.

This is a dataframe.
- np.nan - a special constant in NumPy used to represent missing or undefined values.
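To get a feel for how np.nan behaves, here is a small sketch I would try in a notebook. The variable name decrement is just my own choice for illustration, mirroring the DECREMENT column above.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
print(type(np.nan))        # <class 'float'>

# NaN is never equal to anything, not even itself,
# so `==` cannot be used to find missing values
print(np.nan == np.nan)    # False

# pd.isna() is the reliable way to test for a missing value
print(pd.isna(np.nan))     # True

# A column of integers that also holds np.nan is stored as float64
decrement = pd.Series([9, np.nan, 29, 39, np.nan])
print(decrement.dtype)     # float64
```

This is also why the DECREMENT values show up as 9.0, 29.0 and so on instead of plain integers.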
DETECTING MISSING VALUES
There are two common ways to detect missing data in a DataFrame:
isnull()
It shows where values are missing.
True → The value is missing
False → The value is not missing
num.isnull()

.isnull().sum()
.isnull() - returns a table of True/False values
.sum() - adds up all the True values in each column
num.isnull().sum()
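The count per column can be extended into a small summary table. This is only a sketch: the names missing_count, missing_percent and summary are mine, and the percentage step is an extra that relies on the fact that the mean of a True/False column is the share of True values.

```python
import numpy as np
import pandas as pd

num = pd.DataFrame(
    {
        'LETTERS': ['TEN', 'TWENTY', 'THIRTY', 'FOURTY', 'HUNDRED'],
        'NUMERICS': [10, 20, 30, 40, 100],
        'INCREMENT': [11, 21, 31, 41, 101],
        'DECREMENT': [9, np.nan, 29, 39, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# Number of missing values in each column
missing_count = num.isnull().sum()

# Share of missing values in each column, as a percentage
missing_percent = num.isnull().mean() * 100

summary = pd.DataFrame({'missing': missing_count, 'percent': missing_percent})
print(summary)
# Only DECREMENT has missing values: 2 out of 5 rows
```

pandas also offers isna(), which is an alias of isnull(), so num.isna().sum() gives the same counts.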

HANDLING MISSING VALUES
After identifying missing values, the next step is to decide what to do with them.
Removing missing values:
num.dropna()
The output would be:

dropna() - by default, dropna() removes every row that contains at least one missing value. You can see the difference in the output, right? Rows 2 and 5 are dropped because they contained null values.
If you want to remove entire columns that have missing values:
num.dropna(axis=1)
The entire column ‘DECREMENT’ is dropped, because it had null values.
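One thing worth checking for yourself: dropna() returns a new DataFrame and leaves num untouched, so you need to assign the result if you want to keep it. A minimal sketch, where rows_dropped and cols_dropped are names I made up for illustration:

```python
import numpy as np
import pandas as pd

num = pd.DataFrame(
    {
        'LETTERS': ['TEN', 'TWENTY', 'THIRTY', 'FOURTY', 'HUNDRED'],
        'NUMERICS': [10, 20, 30, 40, 100],
        'INCREMENT': [11, 21, 31, 41, 101],
        'DECREMENT': [9, np.nan, 29, 39, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

rows_dropped = num.dropna()        # drop rows with any missing value
cols_dropped = num.dropna(axis=1)  # drop columns with any missing value

print(rows_dropped.index.tolist())    # [1, 3, 4]
print(cols_dropped.columns.tolist())  # ['LETTERS', 'NUMERICS', 'INCREMENT']

# The original DataFrame still has all 5 rows and 4 columns
print(num.shape)                      # (5, 4)
```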
Let me show you the difference
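num_cleaned = num.dropna()          # rows with null values removed
num_cleancol = num.dropna(axis=1)   # columns with null values removed
print("original dataset🌕:\n", num)
print("\nAfter removing rows with null values🌒:\n", num_cleaned)
print("\nAfter removing columns with null values🌘:\n", num_cleancol)
output: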

Filling Missing Values:
Instead of removing data, we can replace missing values with something meaningful or something specific. For that we use fillna().
fillna() - replaces missing values with the value you give. You can give any value; here I chose 100. Pass it as a number rather than the string '100', so the filled column stays numeric.
num.fillna(100)
The output will be:

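A fixed number like 100 is "something specific". For "something meaningful", one common choice is the column's own mean. That choice is my assumption for this sketch, not the only option, and the names mean_decrement and filled are just for illustration.

```python
import numpy as np
import pandas as pd

num = pd.DataFrame(
    {
        'LETTERS': ['TEN', 'TWENTY', 'THIRTY', 'FOURTY', 'HUNDRED'],
        'NUMERICS': [10, 20, 30, 40, 100],
        'INCREMENT': [11, 21, 31, 41, 101],
        'DECREMENT': [9, np.nan, 29, 39, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# mean() skips NaN by default, so this is the mean of 9, 29 and 39
mean_decrement = num['DECREMENT'].mean()

# Work on a copy so the original DataFrame stays as it was
filled = num.copy()
filled['DECREMENT'] = filled['DECREMENT'].fillna(mean_decrement)

print(filled)
print(filled.isnull().sum().sum())  # 0, no missing values left
```

Whether the mean is a sensible filler depends on the data, so treat it as a starting point to experiment with.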
There are some other options you can use to fill missing values:
Forward Fill and Backward Fill
Instead of using a fixed number, these fill missing values based on nearby values in the same column.
- Forward fill (ffill): copies the value from the cell above, so the previous value fills the missing one.
num.ffill()

- Backward fill (bfill): copies the value from the cell below, so the next value fills the missing one.
num.bfill()
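To compare the two directly, here is a small sketch on just the DECREMENT column (forward and backward are names I picked for illustration). It also shows an edge case: a missing value in the last row has no value below it, so bfill leaves it as NaN.

```python
import numpy as np
import pandas as pd

num = pd.DataFrame(
    {
        'LETTERS': ['TEN', 'TWENTY', 'THIRTY', 'FOURTY', 'HUNDRED'],
        'NUMERICS': [10, 20, 30, 40, 100],
        'INCREMENT': [11, 21, 31, 41, 101],
        'DECREMENT': [9, np.nan, 29, 39, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

forward = num['DECREMENT'].ffill()   # fill each gap with the previous value
backward = num['DECREMENT'].bfill()  # fill each gap with the next value

print(forward.tolist())   # [9.0, 9.0, 29.0, 39.0, 39.0]
print(backward.tolist())  # [9.0, 29.0, 29.0, 39.0, nan]
```

In the same way, ffill would leave a missing value in the first row untouched, because there is nothing above it to copy.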

That concludes today’s learning journal. This is how I learned to handle missing values using Pandas. It’s the first step every data analyst learns before moving on to transformations, grouping, and visualizations. I’ll be sharing more of my learning journals in the coming days.
TIP 💡: If you want to practice working with real data, explore Kaggle. It’s a great platform where you can find tons of free datasets to experiment with. You can practice cleaning and analyzing data, and even build small projects.
RESOURCES
📊 Kaggle
📊 GitHub link for the notebook: Missing Value handling
📊 Watch the explanation reel: Missing data handling for beginner data analytics


