In this blog, we will explore social media data accessed from the Twitter API with Python. We will use Twitter’s RESTful API to find out about Twitter users and what they are tweeting about. To get started, you need to do the following:
- Firstly, if you don’t have a Twitter account, set one up.
- Secondly, apply for Developer Access using your Twitter account. Then create an application that will generate the API credentials you will use to access Twitter from Python.
- Lastly, import the required Python packages.
Once you have done these things, you can begin to query Twitter’s API and learn about tweets.
Setting up Twitter App
After applying for Developer Access, you can easily create an application on Twitter through which you can access tweets. Please ensure that you have a Twitter account already.
Note: You have to give Twitter a mobile number that can receive text messages so that it can verify your use of the API.
Access Twitter API in Python
After setting up the Twitter app, you are ready to access tweets in Python. Let’s start by importing the necessary packages:
import os
import tweepy as tw
import pandas as pd
You need four values from your Twitter app page to access the Twitter API. You will find them in the Keys and Access Tokens tab of your Twitter app settings:
- access token key
- consumer key
- access token secret key
- consumer secret key
We recommend you do not share these with anyone as these values are specific to your app.
Firstly, define your keys:
consumer_key = 'yourkeyhere'
consumer_secret = 'yourkeyhere'
access_token = 'yourkeyhere'
access_token_secret = 'yourkeyhere'

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
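Since these keys should not be shared, one option is to keep them out of the script entirely and read them from environment variables. The following is a minimal sketch of that idea, not part of the original walkthrough; the environment variable names and the load_credentials helper are assumptions chosen for illustration.

```python
import os

# Hypothetical variable names; choose whatever fits your own setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

The returned values can then be passed to tw.OAuthHandler and auth.set_access_token in place of the hard-coded strings.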
Send a Tweet
With your API access, you can send tweets. Make sure that each tweet is 280 characters or fewer.
# Post a tweet from Python
api.update_status("Look, I'm tweeting from #Python in my #earthanalytics class! @EarthLabCU")
# Your tweet has been posted!
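To guard against the length limit before posting, a rough check can be sketched as follows. This is only an approximation: it counts Python characters with len(), whereas Twitter's own counting rules treat links and some characters differently. The fits_in_tweet helper is a hypothetical name introduced here.

```python
MAX_TWEET_LENGTH = 280  # the limit mentioned above


def fits_in_tweet(text, limit=MAX_TWEET_LENGTH):
    """Rough check: True if text has at most `limit` characters."""
    return len(text) <= limit


message = "Look, I'm tweeting from #Python in my #earthanalytics class! @EarthLabCU"
print(fits_in_tweet(message))
print(fits_in_tweet("x" * 281))
```

You could call such a check before api.update_status and shorten the message if it fails.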
Search Twitter for Tweets
You are now ready to search Twitter for recent tweets. Start by finding recent tweets that use the #wildfires hashtag. You can use the .Cursor method to get an object containing tweets that include the hashtag #wildfires.
To create this query, you have to define:
- the search term, in this case #wildfires
- the start date of the search
Keep in mind that the Twitter API only gives you access to recent tweets (the standard search API covers roughly the past week), so you will not be able to dig too far into the history.
# Define the search term and the date_since date as variables
search_words = "#wildfires"
date_since = "2018-11-16"
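Because only recent tweets are searchable, a hard-coded start date goes stale quickly. As a small sketch, the date can instead be computed relative to today using the standard library; the days_ago helper is a hypothetical name, and the seven-day window is just an example value.

```python
from datetime import date, timedelta


def days_ago(n, today=None):
    """Return the date n days before today as a 'YYYY-MM-DD' string."""
    today = today or date.today()
    return (today - timedelta(days=n)).isoformat()


date_since = days_ago(7)
print(date_since)
```

The resulting string has the same format as the date_since value defined above.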
Now use .Cursor() to search Twitter for tweets containing the search term #wildfires. You can restrict the number of tweets returned by specifying a number in the .items() method.
# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since).items(5)
tweets
<tweepy.cursor.ItemIterator at 0x7fafc296e400>
.Cursor() returns an object that you can iterate or loop over to access the data collected. Each item in the iterator has various attributes that give you information about each tweet, including:
- content of the tweet,
- name of the sender of the tweet,
- the date on which tweet was sent, etc.
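To make the attributes listed above concrete, here is a minimal sketch that pulls them out of a tweet-like object. It uses a stand-in object with made-up values rather than a real API response, so it runs without credentials; the summarize helper is a hypothetical name. The attribute names text, user.screen_name and created_at are the ones Tweepy exposes on each item.

```python
from datetime import datetime
from types import SimpleNamespace

# Stand-in for one item from the iterator (made-up values, not real data).
fake_tweet = SimpleNamespace(
    text="Example tweet about #wildfires",
    created_at=datetime(2018, 11, 16, 12, 30),
    user=SimpleNamespace(screen_name="example_user"),
)


def summarize(tweet):
    """Pick out the content, sender and date of a tweet-like object."""
    return {
        "text": tweet.text,
        "user": tweet.user.screen_name,
        "date": tweet.created_at,
    }


print(summarize(fake_tweet))
```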
The code below loops through the object and prints the text associated with every tweet.
# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since).items(5)

# Iterate and print tweets
for tweet in tweets:
    print(tweet.text)
2/2 provide forest products to local mills, provide jobs to local communities, and improve the ecological health of… https://t.co/XemzXvyPyX
1/2 Obama's Forest Service Chief in 2015 -->"Treating these acres through commercial thinning, hazardous fuels remo… https://t.co/01obvjezQW
RT @EnviroEdgeNews: US-#Volunteers care for abandoned #pets found in #California #wildfires; #Dogs, #cats, [#horses], livestock get care an…
RT @FairWarningNews: The wildfires that ravaged CA have been contained, but the health impacts from the resulting air pollution will be sev…
RT @chiarabtownley: If you know anybody who has been affected by the wildfires, please refer them to @awarenow_io It is one of the companie…
The approach above uses a standard for loop. However, this is a great place to use a Python list comprehension. A list comprehension provides an efficient way to collect the elements contained within an iterator as a list.
# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since).items(5)

# Collect a list of tweets
[tweet.text for tweet in tweets]
['Expert insight on how #wildfires impact our environment: https://t.co/sHg6PcC3R3', 'Lomakatsi crews join the firefight: \n\n#wildfires #smoke #firefighter\n\nhttps://t.co/DcI2uvmKQv', 'RT @rpallanuk: Current @PHE_uk #climate extremes bulletin: #Arctic #wildfires & Greenland melt, #drought in Australia/NSW; #flooding+#droug…', "RT @witzshared: And yet the lies continue. Can't trust a corporation this deaf dumb and blind -- PG&E tells court deferred #Maintenance did…", 'The #wildfires have consumed an area twice the size of Connecticut, and their smoke is reaching Seattle. Russia isn… https://t.co/SgoF6tds1s']
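You may have noticed that each example rebuilds the cursor before using it. A likely reason is that an iterator can only be consumed once: after one full pass it is empty. The sketch below illustrates this with a plain Python generator standing in for the Twitter cursor (fake_cursor is a hypothetical name, and the strings are made up), so it runs without API access.

```python
def fake_cursor():
    """Stand-in generator that yields a few strings and can be read once."""
    yield from ["first tweet", "second tweet", "third tweet"]


tweets = fake_cursor()
first_pass = [text for text in tweets]
second_pass = [text for text in tweets]  # the iterator is already used up

print(first_pass)
print(second_pass)

# Storing the results in a list lets you reuse them as often as you like.
saved = list(fake_cursor())
print(len(saved), saved[0])
```

If you want to work with the same tweets several times, collecting them into a list once avoids repeated requests.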
To remove or keep Retweets
A retweet is when someone else shares your tweet. It is similar to sharing on Facebook or other social media platforms. Sometimes you will want to remove retweets because they contain duplicate content; at other times you will want to keep them. Below, you ignore all retweets by adding -filter:retweets to your query. The Twitter API documentation describes other ways to customize your queries.
new_search = search_words + " -filter:retweets"
new_search

'#wildfires -filter:retweets'
tweets = tw.Cursor(api.search,
                   q=new_search,
                   lang="en",
                   since=date_since).items(5)

[tweet.text for tweet in tweets]

['@HARRISFAULKNER over 10% of a entire state (#Oregon) has been displaced due to #wildfires which is unprecedented, a… https://t.co/SJPyDw2vGZ',
 'I left a small window open last night and the smoke from the outside #wildfires made our smoke alarm go off at 4 am… https://t.co/qj79wtXZ7o',
 '5 of the 10 biggest #wildfires in California history are burning right now.\n\nFossil fuels brought the… https://t.co/BqRZvnj7Ir',
 '#Wildfires are part of a vicious cycle: their #emissions fuel global heating, leading to ever-worse fires, which re… https://t.co/OA4UZoFbn8',
 'This could be helpful if you need to evacuate!\n#wildfires #OregonIsBurning https://t.co/7F
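If you switch between keeping and removing retweets often, the string concatenation can be wrapped in a small helper. This is a sketch only; build_query and its include_retweets parameter are hypothetical names, and the function simply produces the query string used above.

```python
def build_query(term, include_retweets=False):
    """Return a search string, adding -filter:retweets unless retweets are wanted."""
    if include_retweets:
        return term
    return term + " -filter:retweets"


print(build_query("#wildfires"))
print(build_query("#wildfires", include_retweets=True))
```

The result can be passed as the q argument of the search in the same way as new_search.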
Who is tweeting about wildfires?
You can access a lot of information associated with every tweet. The example below shows how to access the users who are sending tweets related to #wildfires, along with their locations. Note that because user locations are entered manually by each user, you will see a lot of variation in the format of this value.
tweet.user.screen_name gives the Twitter handle of the user associated with each tweet.
tweet.user.location gives the location provided by the user.
To experiment with other items available in every tweet, type tweet. and then use the tab key to see all of the available attributes.
tweets = tw.Cursor(api.search,
                   q=new_search,
                   lang="en",
                   since=date_since).items(5)

users_locs = [[tweet.user.screen_name, tweet.user.location] for tweet in tweets]
users_locs

[['J___D___B', 'United States'],
 ['KelliAgodon', 'S E A T T L E ☮ ?️\u200d?'],
 ['jpmckinnie', 'Los Angeles, CA'],
 ['jxnova', 'Harlem, USA'],
 ['momtifa', 'Portland, Oregon, USA']]
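Because locations are free text, the same place can show up in several spellings. One simple way to reduce that variation before counting is sketched below. The sample locations are made up for illustration, and the normalize helper is a hypothetical name; it only handles case and spacing, not real geocoding.

```python
from collections import Counter


def normalize(location):
    """Collapse runs of whitespace and ignore case so near-duplicates match."""
    return " ".join(location.split()).casefold()


# Made-up sample values showing the kind of variation users enter.
locations = [
    "Portland, Oregon, USA",
    "portland,  oregon, usa ",
    "Los Angeles, CA",
    "",
]

# Skip empty locations, then count the normalized values.
counts = Counter(normalize(loc) for loc in locations if loc.strip())
print(counts.most_common())
```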
Create a Pandas dataframe from a list of Tweet data
Once you have a list of items that you want to work with, you can create a pandas DataFrame that contains that data.
tweet_text = pd.DataFrame(data=users_locs,
                          columns=['user', "location"])
tweet_text
|user|location|
|momtifa|Portland, Oregon, USA|
|KelliAgodon|S E A T T L E ☮ ?️?|
|jpmckinnie|Los Angeles, CA|
Customizing Twitter queries
As mentioned above, you can customize your Twitter search queries by following the Twitter API documentation. For instance, if you search for climate+change, Twitter will return all tweets that contain both of those words (in a row). The code below creates a list that can be queried using Python indexing to return the first five tweets.
new_search = "climate+change -filter:retweets"

tweets = tw.Cursor(api.search,
                   q=new_search,
                   lang="en",
                   since='2018-04-23').items(1000)

all_tweets = [tweet.text for tweet in tweets]
all_tweets[:5]
['They care so much for these bears, but climate change is altering their relationship with them. It’s getting so dan… https://t.co/D4wLNhhsdt', 'Prediction any celebrity/person in government that preaches about climate change probably is blackmailed… https://t.co/TM64QukGhy', '@RichardBurgon Brain washed and trying to do the same to others. Capitalism is ALL that "Climate Change" is about. https://t.co/GbNE87luVx', "We're in a climate crisis, but Canada's handing out billions to fossil fuel companies. Click to change this:… https://t.co/oQZXUfOWe8", 'Hundreds Of Starved Reindeer Found Dead In Norway, Climate Change Blamed - Forbes #nordic #norway https://t.co/9XLS8yi72l']
In this blog, we discussed how to use the Twitter API with Python, and we hope that it is clear to you now. Thank you for reading the blog.