Did you miss the Ask the Expert session on Moving Your SAS Programs to the World of DS2? Not to worry, you can catch it on-demand at your leisure.
This session introduces you to the SAS DS2 programming language and describes how you can take advantage of its many benefits.
You’ll learn how to:
I have added the Q&A from the session below and attached you will find the slides:
A MERGE statement can take advantage of the physical order of the merged SAS data sets, but the FedSQL SET with a full join component is going to do a Cartesian cross before generating the results. I presume this will be far slower for SAS datasets, yes?
The SQL will do a Cartesian product of the matching values only. SAS SQL also has an optimizer that will optimize the process for maximum efficiency. If you are creating a production job, benchmarking is recommended.‑
When do you need to use an ENDDATA statement?
The ENDDATA statement terminates the DS2 code block. The statements between the DATA and ENDDATA statement are submitted for execution. The DS2 run, init, and term methods are specified between the DATA and ENDDATA statements.
When proc ds2 includes a data step with by processing, I understand that record order is preserved WITHIN each by group, but the order of the by groups may not be preserved. Is that correct? (I am referring to SAS data sets as the data source here).
Actually, it works the other way. Record order is established by the By Grouping but the order of the records within are not preserved because the By Group processing moves the records around.
When you wrote the code statement that says "Threads=4", I assume this is because there are 4 quarters in a year, but how would SAS know how to divide the task into threads? Is it always related to variable arrays?
In my example, Threads=4 was not specific to the number of quarters, it was just a coincidence. Oftentimes, you need to test using different values of threads= and compare benchmarks.
Is the processing faster to convert data step merge code to DS2 code versus just doing a proc sql join?
A: It can vary by a number of factors and there is no golden run. Proc SQL has a built in optimizer, but you oftentimes need to look at bench marks to see which is best for your data and environment.
Is there a difference between the SAS In-Database Code Accelerator and the SAS Scoring accelerator?
Yes, the SAS In-Database Code Accelerator is an add-on product for Hadoop, Pivotal (Greenplum) and Teradata. It allows you to run DS2 directly in the database.
The SAS Scoring Accelerator enables customers to translate scoring models created in SAS® Enterprise Miner™ or SAS/STAT into database-specific functions to be deployed and then executed directly within the database environment. SAS Scoring Accelerators include Hadoop, Cloudera, Pivotal (Greenplum), Aster, Teradata, Netezza, Oracle, SAP HANA and IBM DB2)
Is it possible to run the DS2 code in Base SAS without any additional products?
Absolutely. The functionality is part of the Base SAS language and does not require SAS/Access as long as you are working with SAS datasets.
So, DS2 works with a number of relational databases, but there seems to be some specific advantages to having a code accelerator for Teradata, Hadoop and Greenplum. Can you clarify the differences in how SAS and DS2 interact with the various data sources?
There are two ways we push DS2 code down to a MPP platform
For all other databases (Oracle, SQL Server, DB2…) the end-user written DS2 code will run on the SAS Server i.e. all data is returned to that SAS Server for processing by the DS2 program.
Course: DS2 Programming: Essentials
Book: The DS2 Procedure: SAS® Programming Methods at Work
Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow up Q/A, slides and recordings from other SAS Ask the Expert webinars. From Ask the Expert Library, just click Subscribe from the orange bar underneath the list of the recent articles.
NOTE: For best results when opening the attached slides, click on the “download” icon.