BookmarkSubscribeRSS Feed

Accessing Social Media data (and more) from SAS Viya with embedded CData JDBC drivers

Started ‎03-30-2022 by
Modified ‎03-30-2022 by
Views 2,356

Back in version 2021.2.1, several third-party JDBC drivers have been included in SAS Viya. These drivers come from CData Software, a company providing data access and connectivity solutions. This allowed SAS Viya to expand quickly its connectivity with social media platforms and file hosting services.

 

So, now with SAS Viya, both in SAS Compute Server and CAS, using JDBC under the covers, we can transparently access data from:

 

  • Facebook
  • Google Analytics
  • Google Drive
  • Microsoft OneDrive
  • OData
  • Twitter
  • YouTube Analytics

 

Unless you've been living under a rock for the past 2 decades, Facebook and Twitter are the most famous social media platforms. They enclose lots of text data that companies are eager to analyze to monitor the reputation and success of their brand/products, using for example SAS Text Analytics capabilities.

 

Google Analytics is a service used to track website activity. It collects a lot of metrics on your website that you can further analyze in SAS Viya.

 

YouTube Analytics is a similar service dedicated to YouTube video consumption. This allows to check the relevance of your videos and performance of your channels.

 

Google Drive and Microsoft OneDrive are popular file hosting services. The CData JDBC drivers enable users to interact with their drive's contents from SAS Viya.

 

OData is a protocol that defines rules for building and consuming RESTful APIs (https://www.odata.org/). When a REST API supports it, the CData driver can leverage it in a relational manner, making it easy to work with.

 

How to validate you have them?

 

In a SAS Viya Compute session, you can simply run this code to get the current CLASSPATH:

 

options sastrace="d,,," sastraceloc=saslog nostsuffix ;
libname test jdbc url="jdbc:dummy" ;

 

It should give the following result:

 

JDBC: attempting JDBC connection: jdbc:dummy
JDBC: CLASSPATH:
/opt/sas/viya/home/SASFoundation/lib/access/hadoop/access-hadoop-hivehelper-2.1.7.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.facebook.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.googleanalytics.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.googledrive.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.odata.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.onedrive.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.twitter.jar
/opt/sas/viya/home/lib64/accessclients/jdbc/cdata.jdbc.youtubeanalytics.jar

 

You can then notice the presence of the 7 CData JDBC drivers. You can use them in SAS Compute as well as in CAS.

 

Where to find documentation?

 

From a SAS perspective, the drivers are described here. This page provides the appropriate links to the corresponding CData driver documentation which is very helpful. You will find information about the different driver's properties and also about the Data Model which helps you figure out which tables you want to access.

 

nir_post_74_01_cdata_doc.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

 

The tricky part is probably to setup the connection. The authentication against those platforms can be complex, can require elevated privileges and can vary from platform to platform.

 

An example

 

Let's try to access some tweets! My JDBC connection string looks like:

 

jdbc:twitter:UseAppOnlyAuthentication="True";InitiateOAuth="GETANDREFRESH";OAuthClientSecret="9CySY7Ph1YfW6N4K...";_persist_oauthaccesstokensecret=WhY5Rs6iStfp...

 

I can then use that string simply in my libname statement (through a macro-variable in my case), and list the tables that are part of that library:

 

%let tw_string=%nrbquote(jdbc:twitter:UseAppOnlyAuthentication="True";InitiateOAuth="GETANDREFRESH";OAuthClientSecret="9CySY7Ph1YfW6N4K...";_persist_oauthaccesstokensecret=WhY5Rs6iStfp...) ;

libname tw jdbc
   url="&tw_string" 
   preserve_names=yes ;

/* List tables and views */
proc datasets lib=tw ;
quit ;

 

 

The CData JDBC driver for Twitter exposes the following tables:

 

nir_post_74_02_twitter_tables.png

 

I can then extract tweets containing a specific string:

 

/* Search a string */
/* Pseudo-column to specify a search at run time */
data tweets ;
   set tw.tweets(obs=100) ;
   where SearchTerms="Olympics" ;
run ;

 

or discover what are the Twitter trending topics in a particular location:

 

/* Extract Trends from another country */
/* WoeId = WhereOnEarth Id */
/* https://www.findmecity.com/index.html */
data trends ;
   set tw.trends ;
   where WoeId="23424819" ;
run ;

 

Some of the CData drivers are bi-directional. Twitter is in this case. So, I can inject data in the Data Model to send some tweets with dynamic text from SAS Viya!

 

/* Get table information */
data _null_ ;
   dsid=open("sashelp.prdsale") ;
   call symput("nbvars",strip(put(attrn(dsid,"NVARS"),8.))) ;
   call symput("nbobs",strip(put(attrn(dsid,"NOBS"),8.))) ;
   dsid=close(dsid) ;
run ;

/* Insert a Tweet */
proc sql ;
   insert into tw.tweets(Text) values ("Hello from GEL: my favorite SASHELP.PRDSALE table has &nbvars variables and &nbobs observations !") ;
quit ;

 

Which will post this:

nir_post_74_03_write_tweet.png

 

 

Thanks to my colleague Robin Thomson for his great help on this topic. Thanks for reading.  

 

Find more articles from SAS Global Enablement and Learning here.

Comments

Do you also have an example for the OneDrive connector? 

During the Authentication process SAS crashes with the message "ERROR: Out of memory."

 

An example on how to do the authentication would be highly appreciated

Version history
Last update:
‎03-30-2022 01:36 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels