Having trouble reading only specific columns from a CSV file in Python with Pandas? We have a solution for you. Many people who work with CSV files that have a lot of columns struggle to find an easy way to read in only the columns they need. You can always pre-filter the data using cut or awk on the command line, but a simple and easy tool for this is Pandas. There are a few ways in which you can read in only the columns you need.

First, you can filter the data while you are reading it, using the usecols parameter of read_csv. Second, you can read the whole file and then select only the columns you need. Filtering with read_csv is generally the better way, because columns that are never loaded take up no memory. So, here we go:
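
To make the two approaches concrete, here is a minimal sketch. The sample data and the column names (id, name, city, score) are made up for illustration, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file with many columns.
csv_text = "id,name,city,score\n1,Ann,Oslo,90\n2,Bob,Rome,85\n"

# Approach 1: filter while reading, so only the listed columns are parsed.
filtered = pd.read_csv(io.StringIO(csv_text), usecols=["name", "score"])

# Approach 2: read everything, then keep only the columns you need.
full = pd.read_csv(io.StringIO(csv_text))
selected = full[["name", "score"]]

# Both approaches end up with the same two columns.
print(filtered.columns.tolist())
print(filtered.equals(selected))
```

With a real file you would pass its path instead of the io.StringIO object.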


What is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses a specific structure to arrange tabular data. Because it is a plain text file, it can contain only actual text data, in other words, printable ASCII or Unicode characters. The structure of a CSV file is given away by its name: it holds structured data arranged in rows and columns. A new line terminates each row and starts the next one. Similarly, a comma, also known as the delimiter, separates the columns within each row.

Here is an example:

[Image: example of a CSV file]
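
As a small illustration of the row and column structure described above, the sketch below parses a made-up piece of CSV text with Python's built-in csv module. The names and values are invented for this example.

```python
import csv
import io

# Hypothetical CSV text: the first line is the header, each later line is one row.
csv_text = "name,city,score\nAnn,Oslo,90\nBob,Rome,85\n"

# csv.reader splits each line into columns at the comma delimiter.
rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

print(header)   # the column names
print(records)  # one list of strings per row
```

Note that every value comes back as a string here; Pandas, discussed next, also infers numeric types for you.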

Now that you understand a little about CSV files, it is time to talk about Pandas and how to read CSV files in Python.

Installing Pandas !!

You have to install Pandas before you can use the library. One of the easiest ways to get Pandas is to install Anaconda, a cross-platform Python distribution for tasks like scientific computing and data analysis. Once you install Anaconda, you have access to Pandas and other libraries such as SciPy and NumPy without doing anything else. If you already have Python installed, you can instead install Pandas with pip:

pip install pandas
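
One simple way to confirm the installation worked is to import the library and print its version. This is just a quick check; the variable name installed_version is our own.

```python
import pandas as pd

# If the import succeeds, Pandas is installed; the version string tells you which release.
installed_version = pd.__version__
print(installed_version)
```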

Reading CSV Files with read_csv() !!

Now that the installation is done, let's see how to read a CSV file in Python. We will read a particular CSV file as an example. First, import Pandas:

import pandas as pd

Now copy and paste the following code to parse the file:

Python_OL_data = pd.read_csv('Python_OL.csv')

When we execute this code, it reads the CSV file “Python_OL.csv” from the current directory. As the script above shows, to read a CSV file you pass the file path to the read_csv() function of the Pandas library. read_csv() then returns a Pandas DataFrame that contains the data of the CSV file.
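
Since Python_OL.csv is not included here, the sketch below writes a small made-up file (sample.csv, a hypothetical name) to a temporary folder and reads it back by path, to show what read_csv() returns.

```python
import os
import tempfile

import pandas as pd

# Write a small hypothetical CSV file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w", newline="") as f:
    f.write("name,city,score\nAnn,Oslo,90\nBob,Rome,85\n")

# Pass the file path to read_csv(); the result is a DataFrame.
df = pd.read_csv(csv_path)

print(type(df).__name__)     # the class of the returned object
print(df.columns.tolist())   # column names taken from the header line
print(df.shape)              # (number of rows, number of columns)
```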

You can also view the first 5 rows of the data by calling head(), a method of the Pandas DataFrame. It returns 5 rows by default; pass a number, for example head(10), to get a different number of rows.

Python_OL_data.head()
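
The following sketch shows head() with and without an argument. The ten-row table is generated just for this example.

```python
import io

import pandas as pd

# Hypothetical data: ten rows and two columns.
csv_text = "n,square\n" + "".join(f"{i},{i * i}\n" for i in range(10))
df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()    # default: the first 5 rows
first_three = df.head(3)  # the first 3 rows

# head() limits the rows; all columns are kept.
print(len(first_five), len(first_three))
print(first_five.columns.tolist())
```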

We are now done with the basics of CSV files, how to read one, and how to check the top rows. Next, we will talk about reading specific columns from a CSV file in Pandas.

Read specific columns from csv in python pandas

To read a specific column from a CSV file, you need to understand a little about .iloc, the DataFrame indexer for selection purely by integer position (from 0 to length-1 of the axis). It can also be used with a boolean array.

The allowed inputs to .iloc are:

  • An integer, e.g. 5.
  • A list or array of integers, e.g. [4, 3, 0].
  • A slice object with ints, e.g. 1:7.
  • A boolean array.
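
Each of the input types listed above can be tried on a small table. This is a sketch with made-up data; the column names a, b, c, d are placeholders.

```python
import io

import pandas as pd

# Hypothetical table with four columns and three rows.
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12\n"
df = pd.read_csv(io.StringIO(csv_text))

# An integer selects a single column and returns a Series.
one_col = df.iloc[:, 2]

# A list of integers selects those columns, in the order given.
picked = df.iloc[:, [3, 0]]

# A slice selects a range of columns; the end position is excluded.
sliced = df.iloc[:, 1:3]

# A boolean list keeps the columns marked True (one flag per column).
masked = df.iloc[:, [True, False, True, False]]

print(one_col.tolist())
print(picked.columns.tolist())
print(sliced.columns.tolist())
print(masked.columns.tolist())
```

The colon before the comma means "all rows"; the part after the comma chooses the columns.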

Here is an example that uses .iloc to give you an idea of how to read a specific column from a CSV file with Pandas.

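import pandas as pd

# Read the whole CSV file into a DataFrame
dataset = pd.read_csv("C:/Users/harsh/Desktop/uppu/Python_OL.csv")
# Name of the second column (positions start at 0)
print("Column Name: " + dataset.columns[1])
# All rows of the second column, as a 2-D NumPy array
value = dataset.iloc[:, 1:2].values
print("Value")
print(value)
# Show the first 5 rows (a bare dataset.head() only displays in a notebook)
print(dataset.head())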

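The first print shows the name of the column at position 1, which is the second column because positions start at 0. dataset.iloc[:, 1:2] then selects all rows of the columns in the slice 1:2. Because the end of a slice is excluded, this skips the first column and picks only the second one. Finally, .values converts the selection to a 2-dimensional NumPy array (a matrix with one column).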

The output of reading a specific column with the help of .iloc:

[Image: output of the script above]
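
To see why the slice 1:2 gives a 2-dimensional result, the sketch below compares it with a plain integer index on a small made-up table.

```python
import io

import pandas as pd

# Hypothetical table with three columns and two rows.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
df = pd.read_csv(io.StringIO(csv_text))

# A slice keeps the result as a one-column DataFrame, so .values is 2-D.
as_matrix = df.iloc[:, 1:2].values

# A single integer returns a Series, so .values is 1-D.
as_vector = df.iloc[:, 1].values

print(as_matrix.shape)  # (2, 1)
print(as_vector.shape)  # (2,)
```

Which form you want depends on what consumes the data; some libraries expect a 2-D array even for a single column.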

Conclusion:

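Filtering for the columns that we need with the usecols parameter of read_csv (rather than reading everything and selecting with .iloc afterwards) was about 4 times faster and used almost half the memory in the test this comparison is based on. Exact numbers will depend on your file and on how many columns you skip. There also doesn't seem to be a big performance difference between the ways of selecting columns after loading, such as df.loc[:, cols].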
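
If you want to check the memory difference on your own data, a rough sketch like the one below can help. It is not the benchmark referred to above: the table is randomly generated, and the sizes (1,000 rows, 50 columns) and column names are assumptions chosen for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical wide table: 1,000 rows and 50 numeric columns.
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.random((1000, 50)),
    columns=[f"col{i}" for i in range(50)],
)
csv_text = wide.to_csv(index=False)

needed = ["col0", "col1"]

# Read only the needed columns versus reading everything.
subset = pd.read_csv(io.StringIO(csv_text), usecols=needed)
everything = pd.read_csv(io.StringIO(csv_text))

# Compare how much memory each DataFrame occupies, in bytes.
subset_bytes = subset.memory_usage(deep=True).sum()
full_bytes = everything.memory_usage(deep=True).sum()
print(subset_bytes, full_bytes)
```

For timing, you could wrap each read_csv call with the standard timeit module; results vary between machines, so treat any single measurement with caution.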

We have covered the basics of CSV files, how to read them with Pandas, and how to check the first few rows. Most importantly, we have shown how to read specific columns from a CSV file in Pandas, with an example that should make it easy to understand. Thank you for reading.
