Descriptive Statistics in Machine Learning Using Python

Descriptive statistics can give you great insight into the data and its nature. With a single line of code, you can produce a whole set of summaries of the input data. The pandas describe() function does exactly this: for each numeric attribute, it lists eight statistical properties:

Count
Mean
Standard deviation
Min value
25th percentile
50th percentile
75th percentile
Max value
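To see those eight rows in practice, here is a minimal sketch on a small, made-up DataFrame. The names `df_demo`, `a` and `b` and their values are illustrative only and are not part of the airquality example:

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical values, not the airquality data)
df_demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# describe() returns one row per statistic and one column per numeric attribute
summary = df_demo.describe()
print(summary)

# The row labels correspond to the eight properties listed above
print(list(summary.index))
```

Note that the `std` row is the sample standard deviation (divided by n - 1).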

Descriptive Statistics of Airquality dataset

In this section, we are going to use the describe() function to get statistical insights into the airquality dataset.

# Import the required library
import pandas as pd
# Read the dataset into a DataFrame
df = pd.read_csv('airquality.csv')
# View the data (in a notebook, evaluating df displays the table)
df
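The `Unnamed: 0` column that appears in the skew() output later in this article is most likely a row index saved into the CSV file along with the data (R's write.csv, for example, writes one with a blank header). If you don't want it treated as a data column, read_csv's `index_col=0` uses it as the row index instead. A minimal sketch on a tiny in-memory CSV whose rows are hypothetical:

```python
import io

import pandas as pd

# Tiny in-memory CSV (hypothetical rows) with a blank first header,
# mimicking a file that was saved together with its row index
csv_text = """,Ozone,Solar.R,Wind
1,20,150,9.7
2,35,210,6.3
3,,95,11.5
"""

# Default read: the blank-header column becomes a data column named "Unnamed: 0"
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())

# index_col=0 turns that first column into the row index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
```

With `index_col=0`, the index column no longer shows up in the describe() or skew() output.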

Well, we have the input DataFrame now. Let's draw some statistical inferences from the data.

# Compute summary statistics for every numeric column
df.describe()
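describe() also takes optional arguments. As a sketch on hypothetical data (`df_demo`, `reading` and `site` are made-up names), `percentiles=` changes which percentiles are reported, and `include="all"` adds non-numeric columns, which are skipped by default:

```python
import pandas as pd

# Hypothetical data with one numeric and one text column
df_demo = pd.DataFrame({
    "reading": [2.0, 4.0, 6.0, 8.0, 10.0],
    "site": ["A", "B", "A", "A", "C"],
})

# Custom percentiles; the 50% row (median) is always included
custom = df_demo.describe(percentiles=[0.1, 0.9])
print(custom)

# include="all" also summarises the text column
everything = df_demo.describe(include="all")
print(everything)
```

For object columns, describe() reports count, unique, top (the most frequent value) and freq instead of the numeric statistics.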

As you can see, there is a lot to understand and analyze. The describe() function has returned a compact summary of every numeric column. For the Ozone column, for example, you can make the following inferences:

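The dataset has 153 rows, but the Ozone count is 116, because describe() counts only non-missing values (37 Ozone readings are missing).
The mean Ozone value is about 42.1.
The min value is 1 and the max value is 168.
25% of the Ozone values are at or below 18 (the 25th percentile).
50% of the values are at or below 31.5 (the median).
75% of the values are at or below 63.25 (the 75th percentile).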
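Two details are worth keeping in mind when reading these numbers: count includes only non-missing values, and the percentiles are computed by linear interpolation between the sorted values (pandas' default). A minimal sketch with a hypothetical, Ozone-like series that has one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical Ozone-like readings with one missing value
ozone = pd.Series([1, 18, 31, 50, 168, np.nan], name="Ozone")

stats = ozone.describe()

# count ignores the NaN: 6 rows in total, but only 5 non-missing values
print(len(ozone), stats["count"], ozone.isna().sum())

# The percentile rows match Series.quantile with its default linear interpolation
print(stats["25%"], stats["50%"], stats["75%"])
print(ozone.quantile([0.25, 0.5, 0.75]).tolist())
```

On the real file, df.isna().sum() shows how many values each column is missing.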

That's fantastic. With a single line of code, you get a concise overview of the data using Python.

Skewness of the data using skew() function

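Skewness measures the asymmetry of a distribution, that is, how far it leans in one direction or the other compared with a symmetric distribution such as the Gaussian (normal) distribution. Many machine learning algorithms assume, or work best with, approximately Gaussian-distributed data.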

Analyzing skew is an important part of the data preparation process: a strongly skewed feature may need to be transformed before modelling, which can eventually improve model accuracy.
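As an illustration of that data-preparation step, a log transform is one common, general-purpose way to reduce strong right skew (it is not a step this article itself prescribes). The sketch below uses pandas' skew() method, introduced next, on a hypothetical feature:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature: most values are small, a few are very large
feature = pd.Series([1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144], dtype=float)

# log1p computes log(1 + x), which compresses the long right tail
transformed = np.log1p(feature)

print(f"skew before: {feature.skew():.3f}")
print(f"skew after:  {transformed.skew():.3f}")
```

log1p requires values greater than -1, and it only helps with right skew; left-skewed features need a different transform.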

Let's see how we can measure it using skew().

#Returns the skewness of the data
df.skew()

Unnamed: 0    0.000000
Ozone         1.241796
Solar.R      -0.428045
Wind          0.347818
Temp         -0.377884
Month        -0.002391
Day           0.002652
dtype: float64
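To our understanding, pandas' skew() returns the adjusted Fisher-Pearson standardized moment coefficient, G1 = sqrt(n(n - 1)) / (n - 2) * m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the non-missing values. The sketch below re-implements that formula with NumPy and compares it with pandas. The `adjusted_skew` helper and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd


def adjusted_skew(values):
    """Adjusted Fisher-Pearson skewness (G1), re-implemented for illustration."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]            # skew() skips NaN by default
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)         # biased second central moment
    m3 = np.mean(dev ** 3)         # biased third central moment
    g1 = m3 / m2 ** 1.5            # population (biased) skewness
    return np.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment


# Hypothetical sample: one large value creates a long right tail
sample = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 20.0])
print(adjusted_skew(sample), sample.skew())
```

The adjustment factor approaches 1 as n grows, so for large datasets G1 and the plain moment ratio g1 are nearly identical.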

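Things to consider:

A positive value indicates right skew (a longer tail toward larger values).
A negative value indicates left skew (a longer tail toward smaller values).
Values closer to zero indicate a more symmetric distribution.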
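To turn the skew values into labels, the sketch below copies them from the df.skew() output and classifies each by sign and size. The `describe_skew` helper and its 0.5 cutoff are assumptions: treating |skew| < 0.5 as "approximately symmetric" is only a common rule of thumb, not a standard.

```python
import pandas as pd

# Skew values copied from the df.skew() output (Unnamed: 0, the row index, left out)
skew = pd.Series({
    "Ozone": 1.241796,
    "Solar.R": -0.428045,
    "Wind": 0.347818,
    "Temp": -0.377884,
    "Month": -0.002391,
    "Day": 0.002652,
})


def describe_skew(value, threshold=0.5):
    """Label a skew value; `threshold` is a rule-of-thumb assumption."""
    if abs(value) < threshold:
        return "approximately symmetric"
    return "right-skewed" if value > 0 else "left-skewed"


labels = skew.apply(describe_skew)
print(labels)
```

With this cutoff, only Ozone is flagged (as right-skewed), which makes it the first candidate for a transform during data preparation.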

Final words

When you deal with large datasets, it's hard to inspect the data by eye and generate insights. With the describe() function, you can easily get the key statistical properties of the data.

But getting the insights is not enough unless you take a moment to understand the behavior and distribution of the data.

Ask yourself as many questions as you can and try to answer them. Write down your observations. Think about the reasons behind each insight; over time, this will reveal hidden patterns and behaviours that will be helpful in the later steps of your analysis.
