Using Databricks or MEMSQL for Distributed Data Layer in AWS

Occasional Contributor
Posts: 10

My organization recently purchased Databricks as a complementary analytic platform to run alongside SAS. Using Databricks natively through its notebooks yields awesome performance via Spark, and I am now wondering if I can use Databricks to replace all my current Oracle connections. Initial tests using pass-thru via ODBC have shown awesome performance gains (queries that would take hours if run in an Oracle database finish in one to two minutes on an 11-node cluster). In spite of the potential, performance is only good when SAS does not try to orchestrate the completion of a query, and expecting users to only use pass-thru when connecting to Databricks is unrealistic. We are also doing a POC for MEMSQL as an alternative solution.

My question: has anyone tried using Databricks or MEMSQL to underlay SAS 9.4 or SAS Viya? (Margret or Tony, if you're reading this post, please let me know if you've done any testing with these technologies :-).)

We recently purchased Viya and will be deploying it in prod in late summer, so we need to decide on a data layer: Oracle, Databricks, MEMSQL, or HDFS? So many options, and I am hopeful someone has already blazed a path. If not, I guess this will be a whitepaper for next year's SASGF!
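To make the pass-thru point concrete, here is a minimal sketch of the difference between letting the database finish a query and letting the client orchestrate it. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for the Databricks ODBC connection, and the `sales` table and all function names are hypothetical, not anything from our actual environment.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table (hypothetical stand-in for a remote source)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 5.0), ("west", 7.5)],
    )
    conn.commit()
    return conn


def totals_pushed_down(conn):
    """Pass-thru style: the database aggregates and returns only the summary rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows), len(rows)


def totals_client_side(conn):
    """Client-orchestrated style: every detail row is transferred, then aggregated locally."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)


conn = make_demo_db()
pushed, pushed_rows = totals_pushed_down(conn)
pulled, pulled_rows = totals_client_side(conn)

# Same answer either way; only the number of rows crossing the connection differs.
print(pushed == pulled, pushed_rows, pulled_rows)
```

Both paths give the same totals, but the second one moves every detail row across the connection before any work happens. On a table with billions of rows, that transfer is presumably where the time goes when SAS orchestrates the query.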

 

  Thanks everyone! 

Super Contributor
Posts: 276

Re: Using Databricks or MEMSQL for Distributed Data Layer in AWS

Hi @rkbright,

 

My personal views are as follows:

- Databricks (Spark) / Apache Spark could, in a sense, be considered a "competing" and/or "complementing" product to SAS Viya. Spark and Viya are both distributed in-memory data grid and processing systems, providing programming features beyond SQL.

 

- MEMSQL uses memory for transactional workloads and cleanup, and disk for historical data and analysis. SQL is its only programming interface, with extended capabilities and support for JSON and geospatial data.

 

- HDFS can be used to cheaply store historical data on disk, and can act as the data source for loading data into memory for all three technologies (Databricks Spark, Viya, and MEMSQL) instead of a relational database such as Oracle.

 

All three technologies can read from and write to HDFS. Ultimately, each has its own strengths, and one could be more suitable for certain tasks than the others, but overall, the combination of the three can give your organization the ultimate analytical platform :-)

 

Hope this helps,

Ahmed
