Covid-19 Vaccine Stance Detection

Created Using: Python, Tweepy, NetworkX

The code for this project is available at: github.com/HitanshShah/covid-19-twitter. Reach out to me via email if you have any queries related to this project.

Social media gives users not just a huge source of information, but also the freedom to post their own views and opinions on a given subject. Everyone is entitled to their own opinions. One such subject was the Covid-19 pandemic and the vaccines developed to fight the virus. Some people believed the vaccines would put an end to the global suffering, whereas others thought the vaccines were ploys by governments and medical institutions. This project analyzes sentiment around Covid-19 vaccines on Twitter by examining tweets from both pro-vaccine and anti-vaccine campaigns.

The first step was to scrape data from Twitter using the Twitter Developer API and the Tweepy library. I collected tweets for popular hashtags such as #WearAMask, #GetVaccinated, #MaskUp, #NoVaccinMandate, and #NoVaccineForMe. I deliberately chose hashtags that were not the most common or famous ones, because that made it easier to get precise insights. For instance, hashtags like #VaccineIsPoison or #VaccinesWork sound strongly opinionated, yet many of the tweets I found for them were neither pro nor anti, but rather general discussion starters.
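The scraping step, together with the "200 tweets, no retweets" rule used for each hashtag, could look roughly like the sketch below. It assumes Tweepy 4.x and the Twitter API v2 recent-search endpoint with a valid bearer token; the original project may have used a different Tweepy version or endpoint. The helper names build_query and fetch_tweets are hypothetical. Retweets are filtered out server-side with the v2 query operator -is:retweet.

```python
def build_query(hashtag):
    """Search query for one hashtag, with retweets filtered out server-side."""
    # Accept both "WearAMask" and "#WearAMask" as input.
    return f"#{hashtag.lstrip('#')} -is:retweet"


def fetch_tweets(bearer_token, hashtag, limit=200):
    """Return up to `limit` recent tweet texts for `hashtag` (assumes Tweepy 4.x)."""
    import tweepy  # imported here so build_query can be used without Tweepy installed

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    paginator = tweepy.Paginator(
        client.search_recent_tweets,
        query=build_query(hashtag),
        max_results=100,  # largest page size allowed by v2 recent search
    )
    # flatten() walks across pages and stops after `limit` tweets.
    return [tweet.text for tweet in paginator.flatten(limit=limit)]
```

For example, fetch_tweets(token, "WearAMask") would request pages of up to 100 tweets matching "#WearAMask -is:retweet" until 200 tweets have been collected or no more results are available.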

For each of these hashtags, I used the following approach:

Extract 200 tweets using the hashtag as the search keyword, excluding retweets: popular tweets can be retweeted many times, which would skew the results significantly.
From these tweets, generate a set of unique hashtags. Most users tend to use more than one hashtag in their tweets or posts, so it is interesting to see how many unique hashtags are associated with the hashtag being analyzed and whether any patterns emerge.
Create a co-occurrence matrix for all of these hashtags and normalize its values.
Using this co-occurrence matrix and the number of occurrences of each individual hashtag, create a network graph with a circular layout, in which hashtags that occur more often are displayed in the central region of the network.
Plot the connected components of these graphs to analyze their sparsity or density.
Plot the degree rank and degree distribution of these networks for statistical comparison.
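The steps above can be sketched with NetworkX as follows. This is a minimal illustration under stated assumptions, not the project's code: hashtags are pulled from raw tweet text with a regular expression and lowercased, "normalize" is assumed to mean dividing each co-occurrence count by the largest one, and plotting is left out (the functions return the data behind each plot instead). All helper names are hypothetical. Because nx.circular_layout puts every node on one circle, the sketch assumes the "frequent hashtags in the middle" effect comes from splitting nodes into an inner and an outer shell for nx.shell_layout.

```python
import re
from collections import Counter
from itertools import combinations

import networkx as nx

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercase hashtags in one tweet (without the '#')."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def cooccurrence(tweets):
    """Count single-hashtag occurrences and pairwise co-occurrences per tweet."""
    counts = Counter()
    pairs = Counter()
    for text in tweets:
        tags = extract_hashtags(text)
        counts.update(tags)
        # Sorting makes each unordered pair a single canonical key.
        pairs.update(combinations(sorted(tags), 2))
    return counts, pairs


def build_graph(counts, pairs):
    """Build a hashtag graph; edge weights are co-occurrences scaled to (0, 1]."""
    graph = nx.Graph()
    for tag, n in counts.items():
        graph.add_node(tag, count=n)
    # Assumption: normalize by the largest co-occurrence count.
    max_pair = max(pairs.values(), default=1)
    for (a, b), n in pairs.items():
        graph.add_edge(a, b, weight=n / max_pair)
    return graph


def shells_by_frequency(counts, top_k=5):
    """Inner shell = top_k most frequent hashtags, outer shell = the rest."""
    ranked = [tag for tag, _ in counts.most_common()]
    # Drop empty shells; nx.shell_layout draws the first shell innermost.
    return [shell for shell in (ranked[:top_k], ranked[top_k:]) if shell]


def network_stats(graph):
    """Data behind the component, degree-rank and degree-distribution plots."""
    components = sorted(nx.connected_components(graph), key=len, reverse=True)
    degree_rank = sorted((d for _, d in graph.degree()), reverse=True)
    return {
        "component_sizes": [len(c) for c in components],
        "degree_rank": degree_rank,
        # Index i holds the number of nodes with degree i.
        "degree_histogram": nx.degree_histogram(graph),
    }
```

To draw the network, nx.shell_layout(graph, nlist=shells_by_frequency(counts)) returns node positions that can be passed to nx.draw, with node sizes taken from each node's count attribute.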

Results

Left: Network graph for #WearAMask. Right: Network graph for #NoVaccinesMandate.

Left: #WearAMask. Right: #NoVaccinesMandate.

From the above visualizations, we can draw the following conclusions:

Generally, the network graphs for positive, pro-vaccine campaign hashtags are dense, with nodes of higher degree. The opposite holds for negative, anti-vaccine campaign hashtags.
Pro-vaccine tweets revolve around more central themes, whereas anti-vaccine tweets originate from fewer central themes and usually follow the same train of thought.
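These conclusions are read off the plots. A hypothetical helper like density_summary below (an illustrative name, not from the project) could put numbers on "dense" versus "sparse" by reporting NetworkX's graph density, which for an undirected graph is 2m / (n(n - 1)), alongside the mean node degree and the number of connected components. It is a sketch for comparing two hashtag networks, not a reproduction of the project's measurements.

```python
import networkx as nx


def density_summary(graph):
    """Summary numbers for comparing how dense two hashtag networks are."""
    n = graph.number_of_nodes()
    m = graph.number_of_edges()
    return {
        "nodes": n,
        "edges": m,
        # Fraction of all possible edges that are present.
        "density": nx.density(graph),
        # Each undirected edge adds 1 to the degree of both endpoints.
        "mean_degree": 2 * m / n if n else 0.0,
        "components": nx.number_connected_components(graph),
    }
```

As a synthetic check, a complete graph on four nodes has density 1.0 and mean degree 3.0, while a four-node path has density 0.5 and mean degree 1.5.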