SAS and Databricks: Your Practical Guide to Data Access and Analysis

Let's get practical! Whether your goal is to build an analytics pipeline with advanced feature engineering or a data engineering pipeline for monitoring data quality, generating synthetic data, or other tasks, machine learning plays a crucial role, either directly or behind the scenes. It's important to aim for cost-effectiveness, so that you can accomplish more, faster, without incurring high cloud charges for you and your department.

 

In a previous blog post on Databricks and SAS, Jarno Lindqvist referred to an independent study conducted by the Futurum Group, which demonstrated that the SAS in-memory analytics computational engine outperformed all competitors across the test scenarios by an average factor of 30. Moving data that resides on a Databricks platform into SAS memory is therefore what you want to do, and it is as simple as flipping a switch. Here's how it's done.

 

Create a connection to Databricks from SAS Viya

This is what I do to create a connection to Databricks from SAS Viya:

I sign in to SAS Data Explorer on the SAS Viya server (March 2024 release).

I select Add to create a new data connection.

I select Databricks (in this case the Azure variant because Azure is where my demo Databricks environment resides).

 

I fill out connection details. In my case, I have privileges to create a shared Global library.

I authenticate with a user name and access token; you will most likely authenticate through single sign-on.

I test the connection and then I save and connect. The Databricks tables become visible.

I can view a sample of the Databricks table that I want to use. Notice that a Databricks Spark engine is used for that, as the engine icon indicates.

I can conduct in-database processing directly within Databricks, enabling me to perform tasks such as joining and subsetting data. However, my primary focus now is on loading my Databricks data into SAS memory. To accomplish this, all I need to do is click on the lightning icon located in the upper left corner.

 

The “umbilical cord” that connects the Databricks source with its SAS in-memory counterpart is now established, and data can flow freely to the highly parallelized, multi-mode in-memory analytics engine that SAS offers.

Transferring data into SAS memory, whether manually or through an automated script, truly is as simple as flipping a switch!
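
For the scripted route, a minimal sketch in Python might look like the following. It assumes the open-source SWAT package, whose session objects expose the `loadtable` action, and it assumes a caslib named `dbxlib` that already points at Databricks and a table named `customers`. All of these names are hypothetical placeholders, not taken from my demo environment. A small stand-in connection is used so that the sketch runs without a server.

```python
# Sketch only: the caslib and table names are hypothetical placeholders.
# With a real server, `conn` would be a swat.CAS session instead of the
# stand-in class below.


class RecordingConnection:
    """Stand-in for a CAS session that records the action calls it receives."""

    def __init__(self):
        self.calls = []

    def loadtable(self, **params):
        # A real session would send the action to the server; here the
        # parameters are only stored so they can be inspected.
        self.calls.append(("loadtable", params))
        return {"status": "recorded"}


def load_into_memory(conn, caslib, table, promote=True):
    """Ask the server to load `table` from `caslib` into memory.

    promote=True requests global scope so that other sessions can see the
    in-memory table (assumption: the caller is allowed to promote tables).
    """
    return conn.loadtable(
        path=table,
        caslib=caslib,
        casout={"name": table, "caslib": caslib, "promote": promote},
    )


conn = RecordingConnection()
result = load_into_memory(conn, caslib="dbxlib", table="customers")
print(conn.calls[0])
```

Against a real environment, the same function would receive a session created with `swat.CAS(...)`. Whether this corresponds exactly to what the lightning icon does behind the scenes is an assumption on my part.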

 

Other noteworthy benefits of Databricks data in SAS memory

Apart from the remarkable speed highlighted in the previously mentioned independent study, SAS in-memory execution offers a plethora of other advantages. One notable benefit is the ability to share in-memory data among users, eliminating the need for each user to maintain their own copy of the data, thus optimizing memory usage and reducing the need for additional expensive cloud resources. Additionally, this data is readily accessible within SAS Analytical Lifecycle applications, enabling functions such as data cataloging, dashboard creation, advanced analytics for informed decision-making, and more. This accessibility promotes collaboration among diverse stakeholders, including business users, data scientists, data engineers, and others, facilitated by the smooth collaboration features of these applications.
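
To make the memory argument concrete, here is a back-of-the-envelope sketch in Python. The table size and user count are hypothetical figures chosen for illustration, and the model is deliberately simple: it ignores any overhead and assumes every user would otherwise hold a full private copy.

```python
def memory_needed_gb(table_size_gb, users, shared):
    """Return the memory needed to serve `users` with one table.

    Simplified model: a shared table is held once, otherwise every user
    holds a full private copy. Overhead is ignored.
    """
    if table_size_gb < 0 or users < 0:
        raise ValueError("table_size_gb and users must be non-negative")
    if users == 0:
        return 0
    copies = 1 if shared else users
    return table_size_gb * copies


# Hypothetical figures, for illustration only.
table_size_gb = 8
users = 25

private_copies_gb = memory_needed_gb(table_size_gb, users, shared=False)
shared_copy_gb = memory_needed_gb(table_size_gb, users, shared=True)

print(f"Private copies: {private_copies_gb} GB")
print(f"Shared table:   {shared_copy_gb} GB")
```

Under these assumed figures, 25 private copies would need 200 GB, while a single shared table needs 8 GB.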

 

Working smart with your data

Once the connection to Databricks is in place and the data is loaded into SAS memory, a world of possibilities unfolds! You can establish data and model pipelines, trace lineage, conduct comprehensive statistical analyses, and create extensive reports and dynamic dashboards, all with lightning speed.

Figure 1: Overview of actions in SAS Data Explorer


Here's an example of an extensive data analysis that uses decision trees, logistic regression, and various other techniques. Automatic explanations, including information on fairness and bias for trustworthy AI, accompany the generated analytics and make the results easier to understand. The entire analysis runs on the SAS in-memory server.

Figure 2: Overview of SAS Visual Analytics report with tabs for various explorations

 

The lineage view below has been expanding incrementally and reveals the progression of my work so far: establishing a connection to Databricks, transferring data to SAS memory, and running visual analytical explorations within SAS memory.

Figure 3: Overview of SAS Lineage and relationship between visualization report and data.


If, a month from now, I find myself needing to refresh my memory about the many valuable assets I've developed or seeking access to assets crafted by my colleagues (subject to appropriate permissions), I can effortlessly locate them through the SAS Information Catalog.
Additionally, I can delve deeper into the business metadata applied by the data owner to gain further insights.

Figure 4: Overview of SAS Information Catalog with info on privacy, contacts, tags, glossary terms and more.

 

Summary

Loading Databricks data into SAS memory is a powerful route to efficient analytics. It allows for lightning-fast processing and collaborative exploration, with benefits including optimized memory usage and shared data access. Users can establish connections, transfer data, and conduct in-depth analyses using SAS Visual Analytics and more. The SAS Information Catalog supports asset retrieval and metadata exploration, helping users make effective use of their data assets.

 

Learn more about SAS and Databricks

Expand your SAS and Databricks knowledge further by exploring these blogs by my colleagues, and look out for our upcoming blogs, posted every Wednesday in SAS Communities.

Harness the analytical power of your Databricks platform with SAS - SAS Support Communities

Data everywhere and anyhow! Gain insights from across the clouds with SAS

Elevated efficiency and reduced cost: SAS in the era of Cloud Adoption
