
A tweet is just a tweet...until you analyse it!

by Super Contributor on ‎06-02-2017 03:05 PM (1,741 Views)

I was recently asked to start exploring Twitter analysis. I must admit I was nervous, as I’ve tried text analytics in the past. Beyond Perl regular expressions in SQL, more complex analysis of text has always seemed like magic to me.

 

My good friend @Reeza provided a link to her code, which did what I needed. Now I share it with you so you can perform the same basic analyses.

 

Get the Data

Kaggle is quickly becoming one of my favourite sites for data; this dataset is no exception.  You can get the file here.

 

How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions in the PDF carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.

 

Getting the data ready

The data was a straight import into SAS University Edition.  It took a little longer than normal on my computer, but that was due to the size of the dataset.  

 

The Results

Airlines have been getting pretty bad press lately, and I wanted to see if this was evident in tweets about them. There is a column called "Negative Reason Confidence" which is an indicator of how certain we can be that any tweet labelled as "Negative" is actually negative.  I used a simple bar chart and set it up like so:

 

Image1.png

 image2.png

 

A couple of things to note: 1) I'm using a Where clause to limit my data output, and 2) I've selected the Show Bar Labels option. When I run the task, I get the following graph:

 

iamge3.png

 

A friend who does sentiment analysis for a company says anything over 70% indicates very strong confidence in the tone of the tweet. It's a good bet that the messages flagged as Negative are from unhappy customers.  In this analysis, US Airways leads the others.
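For readers who want to reproduce the numbers behind the bar chart outside the point-and-click task, here is a minimal sketch in Python (not the SAS task used in this post). It assumes the Where clause keeps only tweets labelled negative and that the chart shows the mean confidence per airline. The column names `airline`, `airline_sentiment` and `negativereason_confidence` are assumptions about the imported file, and the rows are made-up toy values, not figures from the dataset.

```python
from collections import defaultdict

# Toy rows standing in for the imported table. The values are made up for
# illustration and the column names are assumptions about the file.
rows = [
    {"airline": "United", "airline_sentiment": "negative", "negativereason_confidence": 0.70},
    {"airline": "United", "airline_sentiment": "positive", "negativereason_confidence": None},
    {"airline": "US Airways", "airline_sentiment": "negative", "negativereason_confidence": 1.00},
    {"airline": "US Airways", "airline_sentiment": "negative", "negativereason_confidence": 0.50},
]


def mean_confidence_by_airline(rows):
    """Average the confidence score per airline, keeping only negative tweets."""
    totals = defaultdict(lambda: [0.0, 0])  # airline -> [running sum, count]
    for row in rows:
        # Stand-in for the Where clause: keep negative tweets only.
        if row["airline_sentiment"] != "negative":
            continue
        score = row["negativereason_confidence"]
        if score is None:  # skip rows with a missing score
            continue
        totals[row["airline"]][0] += score
        totals[row["airline"]][1] += 1
    return {airline: total / count for airline, (total, count) in totals.items()}


means = mean_confidence_by_airline(rows)

# Print airlines from highest to lowest mean confidence, like bar labels.
for airline, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{airline}: {mean:.2f}")
```

With real data, the toy list would be replaced by rows read from the downloaded file, for example with `csv.DictReader`, converting the confidence column to a number first.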

 

Next, I want to take a look at the actual tweets themselves. I first create a new table containing just the text of each tweet, which is in the column "Text":

 

image4.png

Then I run it through the code generously provided by @Reeza:

image5.png

 

This code splits the tweet text from a horizontal string and transposes it to one word per row, making analysis much easier.
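The same idea can be sketched in a few lines of Python. This is not @Reeza's SAS code, only an illustration of the technique: each tweet is split on whitespace and every word becomes its own row. The function name `tweets_to_words` and the sample tweets are hypothetical.

```python
def tweets_to_words(tweets):
    """Return one (tweet_id, position, word) row per whitespace-separated word."""
    rows = []
    for tweet_id, text in enumerate(tweets, start=1):
        # split() with no argument breaks on any run of whitespace.
        for position, word in enumerate(text.split(), start=1):
            rows.append((tweet_id, position, word))
    return rows


# Made-up sample tweets, for illustration only.
sample = [
    "@united thanks for the help",
    "@JetBlue delayed again #neveragain",
]

word_rows = tweets_to_words(sample)
for row in word_rows:
    print(row)
```

Keeping the tweet number and word position in each row makes it possible to trace any word back to the tweet it came from.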

 

With the new output, I then run two SQL queries: one counting the number of times each airline was tweeted, and a second counting the number of times each hashtag was used:

 

image4.png

 

United clearly gets the majority of tweets:

image6.png

Here is the breakdown by hashtag; the hashtags appear to fall largely into three groups: references to the airline (#Jetblue), generic tags (#travel) and negative tags (#badservice, #neveragain).

 

image8.png
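The two counts above can be approximated with a short Python sketch rather than SQL. It assumes airlines are identified by @-mentions and hashtags by a leading #, which may differ from how the queries in the screenshot work. The word list is a made-up stand-in for the one-word-per-row table.

```python
from collections import Counter

# Made-up stand-in for the one-word-per-row table.
words = [
    "@united", "late", "again", "#badservice",
    "@united", "#travel", "@JetBlue", "#travel",
]

# Count words that start with @ (mentions) or # (hashtags); the length
# check ignores a bare "@" or "#" with nothing after it.
mention_counts = Counter(w for w in words if w.startswith("@") and len(w) > 1)
hashtag_counts = Counter(w for w in words if w.startswith("#") and len(w) > 1)

# most_common() lists entries from most to least frequent.
print(mention_counts.most_common())
print(hashtag_counts.most_common())
```

Note that this sketch is case-sensitive, so "#Travel" and "#travel" would be counted as two different hashtags.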

When I have more time I'd love to explore this data more and see what other, more interesting things I can find. Suggestions for analyses are more than welcome! 

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Need data for learning?

 

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

 

4.png

 

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

 

9.png

 

Click Analytics U, then select "Subscribe" from the Options menu.

 

Happy Learning!

 

 

 

Comments
by Trusted Advisor
on ‎06-02-2017 07:01 PM

FYI, there are some papers on using SAS Visual Analytics to do Twitter analysis which also include network analysis which may be of interest too:

 

From Traffic to Twitter - Exploring Networks with SAS Visual Analytics® from @FalkoSchulz and @Nascif_SAS

 

Bringing Google Analytics, Facebook and Twitter Data to SAS Visual Analytics from I-Kong Fu

by Super Contributor
on ‎06-05-2017 08:45 PM

@MichelleHomes - very cool!  One day I will play with VA, I have made that promise to myself :-) 

 

Hope all is well with you and the family!

Chris

by Trusted Advisor
on ‎06-06-2017 12:30 AM

#lifelearner!!! 

 

Cheers,

Michelle

by New Contributor Prof_Jim
on ‎07-04-2017 02:16 PM

Great job!

 

Transform all text to uppercase (or lowercase) to further combine the hashtag frequency output?

 

Just a thought...

 

Jim

by Super Contributor
on ‎07-09-2017 09:03 PM

Thanks @Prof_Jim, that's a really good suggestion and one I should have thought of :-)  If I can, I'll rerun the code and update the post!

 

Appreciate your time!

Chris

by New Contributor Prof_Jim
on ‎07-09-2017 11:03 PM
My pleasure, Chris.

Best Regards,

Jim