You use .loc and .iloc to select rows and columns from a dataset. This post explains the difference between loc and iloc in a Pandas DataFrame. Pandas is one of the most important Python libraries: it covers many basic and mathematical operations and is heavily used in machine learning. Two of its most frequently used features are loc and iloc. Both are indexers, so they are written with square brackets rather than called like functions. They are used for slicing a dataframe, and they help you select and filter the data according to your conditions and needs.

  • loc: select by labels of rows and columns.
  • iloc: select by positions of rows and columns.

What is .loc()?

loc is a label-based selection method, which means that we pass the name (label) of the row or column we want to select. Unlike iloc, it includes the last element of a range passed to it. loc also accepts a boolean Series as a mask, which iloc does not.
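
As a minimal sketch of the label-based behaviour described above, the snippet below uses a tiny made-up frame (the row labels "a" to "d" and all values are assumptions for illustration, not the Telco data):

```python
import pandas as pd

# Small made-up frame with string row labels (illustrative data only)
df = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Male", "Female"],
        "Partner": ["Yes", "No", "No", "Yes"],
    },
    index=["a", "b", "c", "d"],
)

# Label slice: both endpoints are included, so rows "a", "b" and "c" come back
subset = df.loc["a":"c", "gender"]
print(subset)

# A boolean Series aligned on the index also works with loc
partnered = df.loc[df["Partner"] == "Yes"]
print(partnered)
```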

What is .iloc()?

iloc is an integer position-based selection method, which means that we pass integer positions to select a specific row or column. Unlike loc, it does not include the last element of a range passed to it. iloc also does not accept a boolean Series; the mask has to be converted to a boolean array first.
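
A minimal sketch of position-based selection, again on a made-up frame (labels and values are assumptions for illustration):

```python
import pandas as pd

# Small made-up frame with string row labels (illustrative data only)
df = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Male", "Female"],
        "Partner": ["Yes", "No", "No", "Yes"],
    },
    index=["a", "b", "c", "d"],
)

# Position slice 0:2 returns positions 0 and 1 only; position 2 is excluded
first_two = df.iloc[0:2, 0]
print(first_two)

# A single cell: row position 3, column position 1 (the "Partner" column)
cell = df.iloc[3, 1]
print(cell)
```

Note that the row labels play no role here: iloc only looks at where a row or column sits.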

loc() vs iloc():

The loc indexer can also do boolean selection. With iloc we cannot pass a boolean Series; we must first convert the boolean Series into a NumPy array. loc gets rows (or columns) with particular labels from the index. iloc gets rows (or columns) at particular positions in the index, so it only takes integers.
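
The boolean case can be sketched as follows. The frame is made up for illustration, and to_numpy() is just one way to turn the mask into an array:

```python
import pandas as pd

# Small made-up frame with the default integer index (illustrative data only)
df = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Male", "Female"],
        "Partner": ["Yes", "No", "No", "Yes"],
    }
)

mask = df["Partner"] == "Yes"  # a boolean Series

# loc takes the boolean Series directly
by_label = df.loc[mask]

# iloc needs a plain boolean array, so convert the Series first
by_position = df.iloc[mask.to_numpy()]

print(by_label)
print(by_position)
```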

Example to clarify the difference between loc() and iloc() in Pandas DataFrame:

We will start by importing pandas and numpy.

import pandas as pd
import numpy as np

Now let's work through an example on the Telco customer churn dataset, which is available on Kaggle. Here the dataset is read into a Pandas dataframe.

df = pd.read_csv("Projects/churn_prediction/Telco-Customer-Churn.csv")

When the dataframe is displayed, we only see the columns that fit on the screen, but it actually contains a total of 21 columns.

loc is used for selecting data by label. The labels for columns are the column names, for instance customerID, gender, etc. Row labels need more care. Since we did not specify an index, Pandas created a default integer index on its own, so the row labels run from 0 up to the number of rows minus one. With this default index the row labels happen to match the row positions, which is why loc and iloc can look alike at first. We will show with examples how loc and iloc handle rows differently.

Select row “2” and column “gender”.
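
A minimal sketch of this selection. Since the CSV is not bundled here, the frame below is a made-up stand-in with a few of the article's column names; the values are assumptions for illustration only:

```python
import pandas as pd

# Made-up stand-in for the Telco churn data (illustrative values only)
df = pd.DataFrame(
    {
        "customerID": ["C0", "C1", "C2", "C3"],
        "gender": ["Female", "Male", "Male", "Female"],
        "Partner": ["Yes", "No", "No", "Yes"],
    }
)

# Row label 2, column label "gender"
value_by_label = df.loc[2, "gender"]
print(value_by_label)

# The same cell by position: row position 2, column position 1 in this toy frame
value_by_position = df.iloc[2, 1]
print(value_by_position)
```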

It gives the value of the ‘gender’ column for row ‘2’.

Now select the rows with labels up to “5” and the columns “gender” and “Partner”.
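
One way this could look, using a made-up stand-in for the dataset (the values are assumptions for illustration only):

```python
import pandas as pd

# Made-up stand-in for the Telco churn data (illustrative values only)
df = pd.DataFrame(
    {
        "customerID": [f"C{i}" for i in range(8)],
        "gender": ["Female", "Male"] * 4,
        "Partner": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)

# Labels 0 through 5 inclusive, so six rows are returned
result = df.loc[:5, ["gender", "Partner"]]
print(result)
```

Because loc includes the end label, this returns six rows, not five.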

Next, select the rows with labels “2”, “4” and “5” and the column “InternetService”.
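
A list of labels can be passed in place of a slice. A minimal sketch on made-up data (the values are assumptions for illustration only):

```python
import pandas as pd

# Made-up stand-in for the Telco churn data (illustrative values only)
df = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
        "InternetService": ["DSL", "DSL", "Fiber optic", "No", "DSL", "Fiber optic"],
    }
)

# A list of row labels picks exactly those rows, in the order given
picked = df.loc[[2, 4, 5], "InternetService"]
print(picked)
```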

Also, you can filter the dataframe and then apply loc or iloc.

Now select the row labels up to “10” and the “PhoneService” and “InternetService” columns of customers with a partner (Partner == ‘Yes’).

Filtering the dataframe does not change the index. Hence, the filtered dataframe keeps only the labels of the rows that were not omitted. So loc[:10] selects the rows with labels up to “10”, which can be fewer than 11 rows. In contrast, iloc[:10] applied after the filter returns 10 rows, because iloc selects by position regardless of the labels.
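
The effect of filtering can be sketched on a made-up frame in which every third customer has a partner (the data and the column positions are assumptions for illustration only):

```python
import pandas as pd

# Made-up stand-in for the Telco churn data (illustrative values only)
n = 40
df = pd.DataFrame(
    {
        "Partner": ["Yes" if i % 3 == 0 else "No" for i in range(n)],
        "PhoneService": ["Yes"] * n,
        "InternetService": ["DSL" if i % 2 == 0 else "Fiber optic" for i in range(n)],
    }
)

# Filtering keeps the original row labels: 0, 3, 6, 9, 12, ...
partnered = df[df["Partner"] == "Yes"]

# loc: labels up to 10, so only the surviving labels 0, 3, 6 and 9
by_label = partnered.loc[:10, ["PhoneService", "InternetService"]]

# iloc: the first 10 rows of the filtered frame; columns given by position
# (positions 1 and 2 in this toy frame)
by_position = partnered.iloc[:10, [1, 2]]

print(by_label)
print(by_position)
```

In this toy frame loc returns 4 rows and iloc returns 10, even though both were given :10.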

As you may have noticed, the way we select the columns has changed as well: with iloc, the columns must also be given by position rather than by name.

Select the first 5 rows and first 5 columns:
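
A minimal sketch on a made-up stand-in frame. The column names beyond those mentioned in this post are assumptions, and all values are illustrative only:

```python
import pandas as pd

# Made-up stand-in for the Telco churn data: some column names are assumptions
# and every value is illustrative only
n = 8
df = pd.DataFrame(
    {
        "customerID": [f"C{i:02d}" for i in range(n)],
        "gender": ["Female" if i % 2 == 0 else "Male" for i in range(n)],
        "SeniorCitizen": [0] * n,
        "Partner": ["Yes" if i % 3 == 0 else "No" for i in range(n)],
        "Dependents": ["No"] * n,
        "tenure": [i + 1 for i in range(n)],
        "PhoneService": ["Yes"] * n,
    }
)

# Row positions 0-4 and column positions 0-4 (position 5 is excluded)
top_left = df.iloc[:5, :5]
print(top_left)
```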

Select the last 5 rows and last 5 columns:

Positions start from 0 at the beginning. Negative positions count from the end, which is why we start from -5 to get the last five rows and the last five columns.
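
Negative positions can be sketched on a made-up stand-in frame. The column names beyond those mentioned in this post are assumptions, and all values are illustrative only:

```python
import pandas as pd

# Made-up stand-in for the Telco churn data: some column names are assumptions
# and every value is illustrative only
n = 8
df = pd.DataFrame(
    {
        "customerID": [f"C{i:02d}" for i in range(n)],
        "gender": ["Female" if i % 2 == 0 else "Male" for i in range(n)],
        "SeniorCitizen": [0] * n,
        "Partner": ["Yes" if i % 3 == 0 else "No" for i in range(n)],
        "Dependents": ["No"] * n,
        "tenure": [i + 1 for i in range(n)],
        "PhoneService": ["Yes"] * n,
    }
)

# -5: means "from the fifth-to-last position through the end"
bottom_right = df.iloc[-5:, -5:]
print(bottom_right)
```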

We can also apply lambda functions, or use a step in the slice, as per our requirement.

Select every third row up to the 15th row and show only the “Partner” and “InternetService” columns.
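
One way to do this is a label slice with a step of 3. The sketch below uses a made-up stand-in frame (values are assumptions for illustration only):

```python
import pandas as pd

# Made-up stand-in for the Telco churn data (illustrative values only)
n = 20
df = pd.DataFrame(
    {
        "gender": ["Female" if i % 2 == 0 else "Male" for i in range(n)],
        "Partner": ["Yes" if i % 3 == 0 else "No" for i in range(n)],
        "InternetService": ["DSL" if i % 2 == 0 else "Fiber optic" for i in range(n)],
    }
)

# Start at the first label, stop at label 15 (included), take every third row
every_third = df.loc[:15:3, ["Partner", "InternetService"]]
print(every_third)
```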

We can also select a range of positions or labels from the middle of the dataframe.

Select the row positions between 20 and 25 and the column positions between 4 and 6.
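
A minimal sketch, assuming the end positions are meant to be excluded as iloc does by default. The frame is a made-up stand-in; the column names beyond those mentioned in this post are assumptions and all values are illustrative only:

```python
import pandas as pd

# Made-up stand-in for the Telco churn data: some column names are assumptions
# and every value is illustrative only
n = 30
df = pd.DataFrame(
    {
        "customerID": [f"C{i:02d}" for i in range(n)],
        "gender": ["Female" if i % 2 == 0 else "Male" for i in range(n)],
        "SeniorCitizen": [0] * n,
        "Partner": ["Yes" if i % 3 == 0 else "No" for i in range(n)],
        "Dependents": ["No"] * n,
        "tenure": [i + 1 for i in range(n)],
        "PhoneService": ["Yes"] * n,
    }
)

# Row positions 20-24 and column positions 4-5; the end positions are excluded
block = df.iloc[20:25, 4:6]
print(block)
```

If row 25 and column 6 should be part of the result, the slices would be 20:26 and 4:7 instead.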

Conclusion:

We hope that this blog has made the difference between loc and iloc clear for you. With practice on a variety of examples you can learn them well and even excel at them. All it takes is a little effort.

Thank you for reading, and please share any feedback.
