Sunday, February 1, 2015

Analyzing Super Bowl Sentiment with the Twitter API and R



As marketers, being able to capture the sentiments and perceptions related to our brand helps us get valuable insights. It allows us to determine if our campaigns are getting the response that we planned for, while also giving us an opportunity to improve our strategy in case our ideas are not resonating well with our customers. In a 2014 Marketing Trends Survey, marketers ranked sentiment as the third-most valuable element to be extracted within data-driven marketing strategies, after web behavior and browsing behavior.


Today I thought I would do some digging into what Twitterites were thinking before and after the Super Bowl... and let's not forget the halftime show.

Pre-game: A sentiment algorithm was set up before the game to scan all the tweets, remove the non-ASCII words, and assign each tweet a score based on the number of positive and negative words it contains. To count the positive and negative words, I used the English opinion lexicon provided by Hu and Liu: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

The Twitter API and an R program were used to extract tweets hashtagged #superbowl. The sentiment algorithm gave each tweet a score on a scale from -5 to 5. Negative scores indicated that a tweet was associated with unhappiness, zero meant neutral feelings, and positive scores meant viewers were happy.

For example: "Awesome Party!" receives a score of +1 (Awesome); "I hate this game." receives -1; "commercials were bad and useless." receives -3, and so on.
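The scoring rule described above (positive words minus negative words) can be sketched in a few lines. This is an illustrative Python sketch, not the R code used for this post: the tiny `POSITIVE` and `NEGATIVE` sets are stand-ins for the much larger Hu and Liu lexicon, and the function name `score_tweet` is made up for the example.

```python
import re

# Tiny stand-in word lists (assumption); the real analysis used the
# full Hu and Liu opinion lexicon.
POSITIVE = {"awesome", "super", "delicious", "fun"}
NEGATIVE = {"hate", "terrible", "boring"}

def score_tweet(text):
    """Return (number of positive words) - (number of negative words)."""
    # Drop whitespace-separated tokens that contain non-ASCII characters.
    tokens = [token for token in text.split() if token.isascii()]
    # Lowercase and keep only runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", " ".join(tokens).lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

print(score_tweet("Awesome Party!"))     # 1
print(score_tweet("I hate this game."))  # -1
```

A word that appears several times counts each time, which is how a tweet repeating "super" three times can reach +5.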


Game on!


People were feeling positive before the start of the game, and this tweet received a +5:
Super*3 + Delicious + Fun = 5




Overall score distribution at 2:16 pm PST: Out of 3000 tweets, 198 got a negative score and the rest were positive or neutral. As the chart below shows, about 93% of tweets were positive/neutral and the rest (about 7%) were negative.
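The percentage split is simple arithmetic on the per-tweet scores. Here is a minimal Python sketch (again, not the original R code); the `sentiment_split` helper is hypothetical, and the score list is made-up data shaped like the pre-game sample, with 198 negative scores out of 3000.

```python
def sentiment_split(scores):
    """Return (% positive or neutral, % negative) for a list of tweet scores."""
    if not scores:
        raise ValueError("need at least one score")
    negative_count = sum(1 for score in scores if score < 0)
    pct_negative = 100 * negative_count / len(scores)
    return 100 - pct_negative, pct_negative

# Made-up scores: only the counts matter for the split, not the exact values.
scores = [-1] * 198 + [0] * 1000 + [1] * 1802
pos_neutral, negative = sentiment_split(scores)
print(f"{pos_neutral:.1f}% positive/neutral, {negative:.1f}% negative")
```

Neutral tweets (score 0) are grouped with the positive ones, matching how the charts in this post are labeled.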


Sorry, no points for sarcasm. It's difficult to create an algorithm for that.

















Halftime: According to Twitter, that was a pretty spectacular halftime show by Katy Perry.















The share of positive/neutral tweets jumped to about 99.3% just after her show: out of 3000 tweets, only 22 had a negative score.






Two-minute warning: The tense final moments... and we still get a score of +1! Hope the Victoria's Secret ad eased some of the tension.










Final scores were 90% positive/neutral and 10% negative. You can clearly see happy Patriots fans and some dejected Seahawks fans. Also, with the Super Bowl taking a violent turn at the end of the game, the number of negative tweets shot up.

Here is a box plot of the overall sentiment for the game. In the figure, the box (the middle half of the scores) spans 0 to 1, the upper whisker reaches 2, and scores of 3 and 4 show up as positive outliers.
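For readers who want to see where the box, whiskers, and outliers come from, here is a small Python sketch of the usual box plot arithmetic. It assumes the conventional 1.5 x IQR rule for flagging outliers; the plotting tool behind the chart may compute quartiles slightly differently. The helper name `box_plot_summary` and the sample scores are made up for illustration.

```python
import statistics

def box_plot_summary(scores):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values still inside the fences.
    inside = [s for s in scores if low_fence <= s <= high_fence]
    outliers = sorted(s for s in scores if s < low_fence or s > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Made-up scores, not the actual Super Bowl data.
sample = [-1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3, 4]
print(box_plot_summary(sample))
```

With these sample scores the box runs from 0 to 1, so the upper fence sits at 2.5: a score of 2 is the top of the whisker, while 3 and 4 fall outside it and are drawn as outliers.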





There you go, folks... Super Bowl XLIX goes to the New England Patriots!


Integrating R with Tableau allows me to create more interactive charts. Below is an analysis of tweets mentioning "Franklin Templeton", with the positive tweets from the past week displayed on a tree map/packed bubble chart.



















References:



1. Jeffrey Breen's approach to Twitter text mining.

2. Opinion lexicon in English, provided by Hu and Liu: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.

Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.


Saturday, January 24, 2015

Twitter Word Cloud on #Infosec

What did I do?

Step 1: Scanned the most recent 1500 tweets with the hashtag #Infosec and exported them to R.
Step 2: Data cleaning to remove punctuation and stop-words. 
Step 3: Text mining to extract the most frequently used words.
Step 4: Made a word cloud out of the remaining words. 
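Steps 2 and 3 boil down to cleaning each tweet and counting what is left. Below is a minimal Python sketch of that idea (the post itself used R); the short `STOP_WORDS` set, the `word_frequencies` helper, and the sample tweets are all made up for illustration. Drawing the cloud itself is left out, since it only needs these counts as input.

```python
import re
from collections import Counter

# Short illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "for", "on"}

def word_frequencies(tweets):
    """Count words across tweets after stripping punctuation and stop-words."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, then replace anything that is not a letter or space.
        text = re.sub(r"[^a-z\s]", " ", tweet.lower())
        counts.update(word for word in text.split() if word not in STOP_WORDS)
    return counts

# Made-up tweets, not the actual #Infosec sample.
tweets = [
    "New #infosec report on phishing!",
    "Phishing is the top infosec threat.",
    "Patch early, patch often #infosec",
]
print(word_frequencies(tweets).most_common(3))
```

The larger a word's count, the bigger it would be drawn in the cloud. Note that stripping punctuation turns "#infosec" into "infosec", so the hashtag and the plain word are counted together.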

Why build the word cloud in real time?

1. Keeps me up to date with what's happening around me
2. Allows me to see the most-tweeted terms and topics
3. Real-time brand monitoring: I can engage the right team to react to brand mentions and defuse any possible negative situations.