Twitter Sentiment Analysis part 4: Loading Tweets from Twitter using Tweepy

Hello guys and welcome to the 4th part of this series on Twitter sentiment analysis using NLTK.

In the previous parts, we learned how to create the dataset for prediction and we also predicted some reviews. In this tutorial, we will load some tweets from Twitter and then predict their sentiment.

For this, we need API authentication keys and tokens so that we can access tweets from Twitter. The process to get these keys is very simple, but you have to answer a lot of questions.

First of all, we have to register as an app developer. To register, click on this link and you will see a page like this.

Then fill in the form and click on Create, and you will get a window like this.

Then click on Keys and Tokens, and you will get a window like this.

Save these authentication keys, as we need them in our program.

Now that we have the keys, let's move on to the program and see how it works. I am adding the full code here; if you want to understand a specific function or line, just navigate to it in the explanation below.

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
import time
from nltk.corpus import stopwords
from nltk import word_tokenize
import pickle

stop_words= set(stopwords.words("english"))


consumer_key="key"
consumer_secret="secret key"
access_token="token"
access_token_secret="secret token"

pickle_in=open("/Users/pushkarsingh/Desktop/twitter/pos_adj.pickle","rb")
pos_adj=pickle.load(pickle_in)
pickle_in.close()

pickle_in=open("/Users/pushkarsingh/Desktop/twitter/neg_adj.pickle","rb")
neg_adj=pickle.load(pickle_in)
pickle_in.close()

def check(example):
    pos_count=0
    neg_count=0
    ex_words=word_tokenize(example)

    for ex_word in ex_words:
        if ex_word not in stop_words:
            if ex_word.lower() in pos_adj:
                pos_count+=1
            if ex_word.lower() in neg_adj:
                neg_count+=1

    if pos_count>neg_count:
        checker="pos"
        conf=pos_count-neg_count
    elif pos_count<neg_count:
        checker="neg"
        conf=neg_count-pos_count
    else:
        checker="None"
        conf=0
    return checker, conf

class listener(StreamListener):

    def on_data(self, data):
        all_data = json.loads(data)
        tweet = all_data["text"]
        label,_=check(tweet)
        output = open("/Users/pushkarsingh/Desktop/twitter/test_twitter.txt","a")
        output.write(label)
        output.write('\n')
        output.close()
        return True

    def on_error(self, status):
        print(status)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

twitterStream = Stream(auth, listener())
twitterStream.filter(track=["India"])

On successful execution, we get a file containing one label ("pos", "neg", or "None") per line.

Explanation

First of all, we import the necessary dependencies. We already have the NLTK and pickle libraries, as we installed them in the previous tutorials, but tweepy is new; json ships with Python's standard library.

We can install tweepy using pip. Just type and run the command below:

pip install tweepy

(Note: this series targets tweepy 3.x. Tweepy 4.0 removed tweepy.streaming.StreamListener, so if you are following along verbatim, pin the version with pip install "tweepy<4".)

json is part of Python's standard library, so there is nothing to install for it.
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
from nltk.corpus import stopwords
from nltk import word_tokenize
import pickle


In this step, we fill in the keys and tokens that we obtained in the process above.

consumer_key="key"
consumer_secret="secret key"
access_token="token"
access_token_secret="secret token"
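Hardcoding credentials works for a quick demo, but a common alternative is to read them from environment variables so they never end up in your source file. A minimal sketch, assuming you export the variables yourself (the TWITTER_* names below are an illustrative convention, not anything Twitter or tweepy requires):

```python
import os

# Fall back to placeholder strings when a variable is not set;
# the TWITTER_* names are made up for this example.
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "key")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "secret key")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "token")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "secret token")
```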

This is the prediction function which we discussed in part 3.

stop_words= set(stopwords.words("english"))

pickle_in=open("/Users/pushkarsingh/Desktop/twitter/pos-yy_adj.pickle","rb")
pos_dict=pickle.load(pickle_in)
pickle_in.close()

pickle_in=open("/Users/pushkarsingh/Desktop/twitter/neg-yy_adj.pickle","rb")
neg_dict=pickle.load(pickle_in)
pickle_in.close()

def predict(example):
    pos_count=0
    neg_count=0
    ex_words=word_tokenize(example)

    for ex_word in ex_words:
        if ex_word.lower() not in stop_words:
            for key, value in pos_dict.items():
                if key==ex_word.lower():
                    pos_count+=value
            for key, value in neg_dict.items():
                if key==ex_word.lower() :
                    neg_count+=value

    if pos_count>neg_count:
        checker="pos"
        conf=pos_count-neg_count
    elif pos_count<neg_count:
        checker="neg"
        conf=neg_count-pos_count
    else:
        checker="None"
        conf=0

    return checker, conf
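To see the scoring logic in isolation, here is a minimal, self-contained sketch of the same idea. The tiny dictionaries are made-up stand-ins for the pickled pos_dict/neg_dict, and a plain str.split() stands in for word_tokenize so it runs without any NLTK data:

```python
pos_dict = {"good": 2, "great": 3}   # made-up stand-in for the pickled pos_dict
neg_dict = {"bad": 2, "awful": 3}    # made-up stand-in for the pickled neg_dict
stop_words = {"the", "is", "a"}      # tiny stand-in for NLTK's English stop words

def predict(example):
    pos_count = neg_count = 0
    for word in example.lower().split():   # word_tokenize in the real code
        if word in stop_words:
            continue
        # dict.get avoids looping over every key as the loop above does
        pos_count += pos_dict.get(word, 0)
        neg_count += neg_dict.get(word, 0)
    if pos_count > neg_count:
        return "pos", pos_count - neg_count
    if neg_count > pos_count:
        return "neg", neg_count - pos_count
    return "None", 0

print(predict("the movie is great"))  # ('pos', 3)
print(predict("good but awful"))      # ('neg', 1)
```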

Now we have three labels we can assign to a tweet: if the tweet has more positive words, we assign "pos"; if it has more negative words, we assign "neg"; and if it contains positive and negative words in equal measure, we assign "None".

After assigning the labels, we open a file and save them in it.

We will use this file to read the labels and plot a graph in real time. Now, when we change the subject, say from Trump to Modi, we need to delete the previous file so that the old analysis does not affect the ups and downs of our new subject.

On the first run there will be no file with this name, so we wrap the delete in a try/except and simply ignore the error; opening the file in append mode later creates it. (This snippet also needs import os at the top of the script.)

import os

try:
    os.remove("/Users/pushkarsingh/Desktop/twitter/test_twitter.txt")
except OSError:
    # First run: nothing to delete yet; on_data's open(..., "a") creates the file.
    pass
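The intent of this block (delete the old file if present, then let append mode recreate it) can be demoed in a self-contained way with a temporary path instead of the hardcoded Desktop one:

```python
import os
import tempfile

# A throwaway path standing in for the hardcoded Desktop path.
path = os.path.join(tempfile.mkdtemp(), "test_twitter.txt")

try:
    os.remove(path)      # first run: the file does not exist yet
except OSError:
    pass                 # ignore; append mode below creates it

with open(path, "a") as output:   # "a" creates the file if missing
    output.write("pos\n")

print(open(path).read())  # pos
```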

Here we are creating a class which we will use to load the tweets. There are two functions in the class: the first is on_data, which receives each tweet and runs the analysis; the second is on_error, which is called if we encounter any error.

These classes and callbacks are the standard ones from Tweepy. I suggest going through Tweepy's documentation to learn more, because here we only cover it from the point of view of Twitter sentiment analysis, but there is a lot more you can do with this module.

class listener(StreamListener):

Here each tweet arrives as a JSON payload. Along with the tweet text, the payload carries other data too, such as the username, the created date and time, and the screen name.

So we pick out the fields we want to use; here we load the tweet text (and, commented out, the created date).

In the next line, we are calling our predict function to predict the nature of tweets.

Then we open a file and save the label, either “pos” or “neg”, in that file in a new line.

    def on_data(self, data):
        tweets = json.loads(data)
        tweet = tweets["text"]
        #print(tweet)
        #created_date=tweets["created_at"]
        label,_=predict(tweet)
        output = open("/Users/pushkarsingh/Desktop/twitter/test_twitter.txt","a")
        output.write(label)
        output.write('\n')
        output.close()
        return True
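To see what on_data receives, here is a minimal sketch with a made-up, heavily truncated payload ("text" and "created_at" are real fields in Twitter's stream messages, but real payloads carry many more, and the values here are invented):

```python
import json

# Fabricated, truncated example of the per-tweet JSON the stream delivers.
data = ('{"created_at": "Mon Jan 01 00:00:00 +0000 2024", '
        '"text": "Hello India!", '
        '"user": {"screen_name": "example"}}')

tweets = json.loads(data)          # JSON string -> Python dict
tweet = tweets["text"]
created_date = tweets["created_at"]
print(tweet)          # Hello India!
print(created_date)   # Mon Jan 01 00:00:00 +0000 2024
```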

Here is the error handler, which prints the status code of whatever error occurred. A common one is 420, which Twitter returns when too many connections are opened too quickly. (The 'text' KeyError you may also run into actually comes from on_data, when the stream delivers a message, such as a rate-limit notice, that has no "text" field.)

    def on_error(self, status):
        if status == 420:   # tweepy passes the HTTP status code as an int
            print(status)
            print("This error is due to multiple connections, wait for a few seconds before connecting again.")
        else:
            print(status)

Here is the authorization process, which validates the API keys; again, this is the standard procedure from the Tweepy library.

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

On successful authentication, the stream connects and starts running. We can also filter the tweets by a word which must appear in them.

In the end, we start the stream and filter for tweets mentioning "India". (The label file is opened and closed inside on_data for every tweet, so there is nothing left to close at this level.)

twitterStream = Stream(auth, listener())
twitterStream.filter(track=["India"])

To print the tweets, just uncomment the print(tweet) line in on_data.

That's all from my side. In the next part, we will learn how to show this data on a live graph using matplotlib.

If you have any doubt, concern or suggestion till this point, feel free to comment below.

Thanks for reading 😀

