We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Moving Your Programs to the World of DS2 - Ask The Expert

by SAS Employee JeffSimpsonSAS ‎06-06-2017 10:15 AM - edited ‎06-08-2017 11:00 AM (1,179 Views)

Did you miss the Ask the Expert session on Moving Your SAS Programs to the World of DS2?  Not to worry, you can catch it on-demand at your leisure.

 

This session introduces you to the SAS DS2 programming language and describes how you can take advantage of its many benefits.

 

You’ll learn how to:

  • Parallelize your data step processing.
  • Work more closely with third-party RDBMS with in-database processing 
  • Convert your SAS programs to threads for parallel processing in Base SAS.
  • Use the SAS In-Database Code Accelerators for Hadoop, Teradata and Greenplum.

I have added the Q&A from the session below and attached you will find the slides:

 

Q: ­A MERGE statement can take advantage of the physical order of the merged SAS data sets, but the FedSQL SET with a full join component is going to do a Cartesian cross before generating the results.  I presume this will be far slower for SAS datasets, yes?­

 

A: ­The SQL will do a Cartesian product of the matching values only.  SAS SQL also has an optimizer that will optimize the process for maximum efficiency.  If you are creating a production job, benchmarking is recommended.­‑

________________________________________________________________

 

Q: ­When do you need to use an ENDDATA statement?­

 

A: The ENDDATA statement terminates the DS2 code block. The statements between the DATA and ENDDATA statement are submitted for execution. The DS2 run, init, and term methods are specified between the DATA and ENDDATA statements.

________________________________________________________________

 

Q: ­When proc ds2 includes a data step with by processing, I understand that record order is preserved WITHIN each by group, but the order of the by groups may not be preserved.  Is that correct?  (I am referring to SAS data sets as the data source here).­

 

A: Actually, it works the other way.  Record order is established by the By Grouping but the order of the records within are not preserved because the By Group processing moves the records around. 

________________________________________________________________

 

Q: ­When you wrote the code statement that says "Threads=4", I assume this is because there are 4 quarters in a year, but how would SAS know how to divide the task into threads?  Is it always related to variable arrays?­

 

A: In my example, Threads=4 was not specific to the number of quarters, it was just a coincidence.  Oftentimes, you need to test using different values of threads= and compare benchmarks.

________________________________________________________________

 

Q: ­Is the processing faster to convert data step merge code to DS2 code versus just doing a proc sql join?­

 

A: It can vary by a number of factors and there is no golden run.  Proc SQL has a built in optimizer, but you oftentimes need to look at bench marks to see which is best for your data and environment.

________________________________________________________________

 

Q: Is there a difference between the SAS In-Database Code Accelerator and the SAS Scoring accelerator?

 

A: Yes, the SAS In-Database Code Accelerator is an add-on product for Hadoop, Pivotal (Greenplum) and Teradata. It allows you to run DS2 directly in the database.

 

The SAS Scoring Accelerator enables customers to translate scoring models created in SAS® Enterprise Miner™ or SAS/STAT into database-specific functions to be deployed and then executed directly within the database environment. SAS Scoring Accelerators include Hadoop, Cloudera, Pivotal (Greenplum), Aster, Teradata, Netezza, Oracle, SAP HANA and IBM DB2)

________________________________________________________________

 

Q: Is it possible to run the DS2 code in Base SAS without any additional products?

 

A: Absolutely. The functionality is part of the Base SAS language and does not require SAS/Access as long as you are working with SAS datasets.

________________________________________________________________

 

Q: So, DS2 works with a number of relational databases, but there seems to be some specific advantages to having a code accelerator for Teradata, Hadoop and Greenplum.  Can you clarify the differences in how SAS and DS2 interact with the various data sources?

 

A: There are two ways we push DS2 code down to a MPP platform

            1 SAS In-Database Code Accelerator for (Hadoop, Teradata, and Greenplum)

  • This the only way end-users can write DS2 code and have it run In-Database  and or MapReduce
  1. SAS Scoring Accelerator for ….. which has more platforms supported than In-Data Base Code Accelerator
  • Only Score code generated via EM and Factory Miner.

 

For all other databases (Oracle, SQL Server, DB2…) the end-user written DS2 code will run on the SAS Server i.e. all data is returned to that SAS Server for processing by the DS2 program.

 

 

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow up Q/A, slides and recordings from other SAS Ask the Expert webinars. From Ask the Expert Library, just click Subscribe from the orange bar underneath the list of the recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.

Comments
by Contributor RedPlanet
‎06-07-2017 11:25 AM - edited ‎06-12-2017 02:36 PM

Thanks for the excellent webinar!  Is that really a reduction of the 1 1/2 day course on DS2 programming into one hour?  Wow on that!

 

The questions from the recorded webinar did not match the questions answered above.  Can you augment the questions list?

 

Also, Jeff Simpson was mentioned as a contact for any further questions.  How should we contact Jeff, and is his email address available?

 

I want to post this article, the WebEx URL, and other relevant the URLs in an online community on our corporate website.  Please confirm that it's OK to post that information.

 

by SAS Employee JeffSimpsonSAS
on ‎06-09-2017 04:30 PM

 

This is highly condensed material of about 1 1/2 days, so this is not a replacement for the course materials.  I would highly recommend takeing the actual course if possible. You can even take it live witrh an instructor online.

 

Let's converse in the comments section right here on the public article, or you can Private Message me on the community.

 

Yes, feel free to post this Article and the On-Demand version of the Webex: .  Please cite this original source.  Thank you.

 

 

 

 

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.