shefer
Calcite | Level 5

Ultimately I am aiming to identify social media topics that will be trending in a few hours (for instance, to help bloggers write about topics their fan base will be interested in). Not only does the streaming API allow access to a greater sample of the data than the REST API, it also supplies current tweets, as opposed to the past 100 tweets on a topic, which could go back as far as 2 days. I will also be using the REST API in my analysis, but getting that data was the easy part :)

As mentioned above, I have already succeeded in getting the streaming API data using PROC HTTP. The problem is that I cannot create an automated process where the data is downloaded, imported and predictions are made if I can never get to the importing stage. Since my previous post, I have also tried using HTTPBuilder as a replacement for PROC HTTP, hoping that Groovy would allow me to stop downloading at some stage and reconnect to the API later, but I keep getting a “401 Unauthorized” error. Once I figure that out I might find a solution using Groovy, but I would really appreciate other suggestions.

FriedEgg
SAS Employee

PROC HTTP is really not designed to deal with a stream, as you have now witnessed. Since Twitter will basically never stop delivering data, you will never move past this step in your code unless some unexpected exception occurs. Before you proceed, you will need to determine how you wish to interact with the streaming data and under what conditions you want to move from data collection to analysis.
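
To make the stopping conditions concrete, here is a minimal sketch in plain Java, assuming the tweets arrive on a queue that some listener fills. The class name BoundedCollector and its parameters are hypothetical and not part of SAS or any Twitter library. It stops at a maximum item count or a time budget, whichever comes first, so the job can always move on to analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (an assumption for illustration): drains a queue that a
// stream listener fills, and stops on whichever limit is reached first.
final class BoundedCollector {

    static <T> List<T> collect(BlockingQueue<T> source, int maxItems, long maxMillis) {
        List<T> collected = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
        while (collected.size() < maxItems) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // time budget spent: continue with what we have
            }
            try {
                T item = source.poll(remaining, TimeUnit.NANOSECONDS);
                if (item == null) {
                    break; // poll timed out, so the deadline has passed
                }
                collected.add(item);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
        }
        return collected;
    }
}
```

With a count limit only, a quiet stream would keep the job waiting indefinitely; the deadline is what guarantees the collection step ends.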

FriedEgg
SAS Employee

Here is something I quickly worked up for the Twitter Streaming API:

filename cp  temp;
filename ivy "%sysfunc(pathname(work,l))/ivy.jar";

proc http
   method = 'get'
   url    = 'http://central.maven.org/maven2/org/apache/ivy/ivy/2.3.0-rc1/ivy-2.3.0-rc1.jar'
   out    = ivy
   ;
run;

proc groovy classpath=cp;
   add classpath=ivy;
   add sasjar="groovy_2.1.3" version="2.1.3.0_SAS_20130517000930";
   submit parseonly;
      import twitter4j.Status
      import twitter4j.StatusListener
      import twitter4j.TwitterStreamFactory
      import twitter4j.conf.ConfigurationBuilder

      import java.util.concurrent.LinkedBlockingQueue
      import java.util.concurrent.TimeUnit

      @Grapes([
              @Grab(group='org.twitter4j', module='twitter4j-core', version='3.0.3'),
              @Grab(group='org.twitter4j', module='twitter4j-stream', version='3.0.3'),
      ])
      class TwitterStream {

          final private TOTAL_TWEETS = 1000

          public ArrayList<Status> open() {
              def cb = new ConfigurationBuilder()
              cb.setDebugEnabled(false)
                      .setOAuthConsumerKey("--YOUR API key--")
                      .setOAuthConsumerSecret("--YOUR API secret--")
                      .setOAuthAccessToken("--YOUR Access token--")
                      .setOAuthAccessTokenSecret("--YOUR Access token secret--")

              def twitter = new TwitterStreamFactory(cb.build()).getInstance()

              def tweets = new LinkedBlockingQueue<Status>(10000)

              def listener = [
                      onStatus: { Status st ->
                          if (st.user.lang == 'en') {
                              tweets.offer(st)
                          }
                      },
                      onDeletionNotice: {},
                      onException: {}
              ] as StatusListener
              twitter.addListener(listener)
              twitter.sample()

              def collected = new ArrayList<Status>(TOTAL_TWEETS)
              while (collected.size() < TOTAL_TWEETS) {
                  final Status status = tweets.poll(10, TimeUnit.SECONDS)
                  if (status == null) {
                      continue
                  }
                  collected.add(status)
              }
              twitter.cleanUp()

              return collected
          }
      }
   endsubmit;

   submit;
      import java.util.ArrayList;
      import java.util.Iterator;
      import twitter4j.Status;

      public class TwitterStreamSAS {
          // Collect the tweets and prepare to iterate over them.
          public void main() {
              TwitterStream stream = new TwitterStream();
              tweets = stream.open();
              iter = tweets.iterator();
          }

          public boolean hasNext() {
              return iter.hasNext();
          }

          // Advance to the next tweet whose text is printable ASCII.
          // Returns false once the collected tweets are exhausted.
          public boolean getNext() {
              if (!hasNext()) {
                  return false;
              }
              tweet = ((Status) (iter.next()));
              String text = tweet.getText();
              if (!isAsciiPrintable(text)) {
                  if (!hasNext()) {
                      return false;
                  }
                  // Propagate the result so a failed skip is not reported as success.
                  return getNext();
              }
              return true;
          }

          public String getText() { return tweet.getText(); }

          public String getScreenName() { return tweet.getUser().getScreenName(); }

          protected ArrayList tweets;
          protected Iterator iter;
          protected Status tweet;

          private static boolean isAsciiPrintable(String str) {
              if (str == null) {
                  return false;
              }
              int sz = str.length();
              for (int i = 0; i < sz; i++) {
                  if (!isAsciiPrintable(str.charAt(i))) {
                      return false;
                  }
              }
              return true;
          }

          private static boolean isAsciiPrintable(char ch) {
              return ch >= 32 && ch < 127;
          }
      }
   endsubmit;
run;

options set=classpath "%sysfunc(pathname(cp,f))";

data twitter_stream;
   length screenName text $ 140;
   dcl javaobj stream("TwitterStreamSAS");
   stream.callVoidMethod("main");
   stream.callBooleanMethod("hasNext", rc);
   do while(rc);
      stream.callBooleanMethod("getNext",rc);
      if rc=0 then leave;
      stream.callStringMethod("getScreenName", screenName);
      stream.callStringMethod("getText", text);
      output;
      stream.callBooleanMethod("hasNext", rc);
   end;
run;

shefer
Calcite | Level 5

Thank you so much - that is exactly what I need!! You are a genius! :)

FriedEgg
SAS Employee

I'm glad you find it useful.

William
Calcite | Level 5

Hi FriedEgg,

Do you know how to add a date range to the search query?

For example, in the Twitter search URL, I can type in

...q=%23SASGF13%20since%3A2014-12-01%20until%3A2014-12-31

But if I assign

%let search_query = %23SASGF13%20since%3A2014-12-01%20until%3A2014-12-31;

the program always gives an empty result.

Thanks, William

FriedEgg
SAS Employee

The search query looks correct to me, but I am not surprised by the empty result.  I don't think (m)any people were tweeting about the SAS Global Forum 2013 at the end of 2014...
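
For what it's worth, the percent-encoding itself can be checked mechanically. Below is a minimal Java sketch, assuming UTF-8; the helper name SearchQueryEncoder is hypothetical and only for illustration. It turns the readable query into the encoded form used in the %let statement.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (an assumption for illustration), not part of the SAS program.
final class SearchQueryEncoder {

    static String encode(String rawQuery) {
        try {
            // URLEncoder produces form encoding and writes spaces as '+'.
            // A literal '+' in the input is already encoded as %2B at this point,
            // so replacing '+' with %20 only affects the encoded spaces.
            return URLEncoder.encode(rawQuery, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }
}
```

Calling SearchQueryEncoder.encode("#SASGF13 since:2014-12-01 until:2014-12-31") returns %23SASGF13%20since%3A2014-12-01%20until%3A2014-12-31, which matches the value assigned to search_query above.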

William
Calcite | Level 5

Sorry, I might not have expressed myself correctly.

First I searched on Twitter directly, and the results looked good.

WSIB since:2014-12-01 until:2014-12-31 - Twitter Search

(No idea why, but the link above was automatically converted into this form.)

Then I copied the query into the SAS program

%let search_query = WSIB%20since%3A2014-12-01%20until%3A2014-12-31;

and submitted the SAS program. The SAS program ran fine, with no error or warning messages, but produced an empty result data set.

NOTE: The data set WORK.TWITTER has 0 observations and 16 variables.

NOTE: DATA statement used (Total process time):

      real time           1.14 seconds

      cpu time            0.01 seconds

So I guess the search_query syntax might be incorrect. (With %let search_query = WSIB; the SAS program correctly creates a result data set with 100 observations.)

FriedEgg
SAS Employee

Okay, I recall something else now. The Twitter Search API only provides data for the last 6-9 days. I would look over their API documentation to confirm:

https://dev.twitter.com/rest/public/search
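
A quick sanity check along these lines can be run before submitting a dated query. This is only a sketch in Java: the class name SearchWindow is hypothetical, and the window length is passed in as a parameter because I am only recalling "6-9 days" and have not confirmed the exact figure.

```java
import java.time.LocalDate;

// Hypothetical helper (an assumption for illustration), not part of any Twitter library.
final class SearchWindow {

    // Returns true if a query ending on 'until' can still overlap the period
    // the search index is assumed to cover, given an assumed window in days.
    static boolean mayReturnResults(LocalDate until, LocalDate today, int assumedWindowDays) {
        LocalDate oldestSearchable = today.minusDays(assumedWindowDays);
        return !until.isBefore(oldestSearchable);
    }
}
```

For a date range that ended in December 2014, a check like this made months later returns false for any window in the 6-9 day range, which is consistent with an empty result data set.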

William
Calcite | Level 5

Thanks FriedEgg, you are right.

I checked the Twitter API page:

The Twitter Search API is part of Twitter’s v1.1 REST API. It allows queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like the Search feature available in Twitter mobile or web clients.

Now I am looking for other ways to retrieve old tweets.

FriedEgg
SAS Employee

Twitter has a number of data partners that provide access to older tweets. One such company is DataSift, which has approximately two years' worth of tweets.

McStagger
Calcite | Level 5

Will this SAS code work in JMP?

FriedEgg
SAS Employee

Assuming your JMP can submit SAS code to a local/remote installation of SAS 9.3+, then yes.

