BookmarkSubscribeRSS Feed

Moving Your Programs to the World of DS2

Started ‎06-06-2017 by
Modified ‎05-17-2018 by
Views 9,033

Did you miss the Ask the Expert session on Moving Your SAS Programs to the World of DS2?  Not to worry, you can catch it on-demand at your leisure.

 

Watch the webinar

 

This session introduces you to the SAS DS2 programming language and describes how you can take advantage of its many benefits.

 

You’ll learn how to:

  • Parallelize your data step processing.
  • Work more closely with third-party RDBMS with in-database processing 
  • Work with new data types not previously available in Base SAS.
  • Convert your SAS programs to threads for parallel processing in Base SAS.
  • Use the SAS In-Database Code Accelerators for Hadoop, Teradata and Greenplum.

 

New Data Types via DS2New Data Types via DS2

 

I have added the Q&A from the session below and attached you will find the slides:

 

­A MERGE statement can take advantage of the physical order of the merged SAS data sets, but the FedSQL SET with a full join component is going to do a Cartesian cross before generating the results.  I presume this will be far slower for SAS datasets, yes?­

 

­The SQL will do a Cartesian product of the matching values only.  SAS SQL also has an optimizer that will optimize the process for maximum efficiency.  If you are creating a production job, benchmarking is recommended.­‑

 

 When do you need to use an ENDDATA statement?­

 

The ENDDATA statement terminates the DS2 code block. The statements between the DATA and ENDDATA statement are submitted for execution. The DS2 run, init, and term methods are specified between the DATA and ENDDATA statements.

 

­When proc ds2 includes a data step with by processing, I understand that record order is preserved WITHIN each by group, but the order of the by groups may not be preserved.  Is that correct?  (I am referring to SAS data sets as the data source here).­

 

Actually, it works the other way.  Record order is established by the By Grouping but the order of the records within are not preserved because the By Group processing moves the records around. 

 

When you wrote the code statement that says "Threads=4", I assume this is because there are 4 quarters in a year, but how would SAS know how to divide the task into threads?  Is it always related to variable arrays?­

 

In my example, Threads=4 was not specific to the number of quarters, it was just a coincidence.  Oftentimes, you need to test using different values of threads= and compare benchmarks.

 

­Is the processing faster to convert data step merge code to DS2 code versus just doing a proc sql join?­

 

A: It can vary by a number of factors and there is no golden rule.  Proc SQL has a built in optimizer, but you oftentimes need to look at bench marks to see which is best for your data and environment.

 

Q: ­We use "in database" processing with large oracle databases, can we continue to do oracle "in database" with DS2?­

A: ­There is not a Code Accelerator for Oracle. You will need to pull the data out of the database and then process the DS2 code in SAS.­

 

­You could use your SQL pass through queries and then use what is returned to SAS for processing on the SAS side. DS2 that process SAS data sets does not require the

In-Database Code Accelerator­.  Using DS2 allows you to apply threads to SAS data, thus threading the data step, which can help from an efficiency standpoint.

 

 Q: ­Does the Init and Term replace the First. & Last. statements? or is it additional and we could use the first and last inside the run?­

A: ­The INIT method is similar to using _n_=1 for the 1st iteration of the data step.

 

The TERM method is equivalent to the END= dataset option that can be used to track the last iteration of the data step­.

 

Q: ­Do user defined functions need to be declared before INIT? Or just before the method where they are called?­

A: It depends on what you are trying to do.  If the user-defined method needs to have an

INIT method in its code, you would include that.  If you are calling the user-defined function at a later time, it just needs to be inside one of the methods.

 

 

Q: ­is there a cheat sheet somewhere that shows what is compatible in DS2 and what is not?­

A: ­http://go.documentation.sas.com/?docsetId=ds2ref&docsetTarget=n0yp3l3ohcnz3kn1wuy8p5sl4r7d.htm&docse...  ­

 

Q: ­Do SAS ACESS modules achieve threading when query is passed through to a database such as Oracle or SQL Server?­

 

A: SAS/Access can access the data using Explicit SQL Passthrough.  Oracle offers internal parallel reads.   http://www2.sas.com/proceedings/sugi28/151-28.pdf

 

DS2 queries are single threaded unless you are using the In-Database code accelerator, which is not available for Oracle or SQL Server.­

 

Q: ­How does writing passthrough Proc SQL different from DS2 proc SQL?­

A: ­DS2 is frequently used in combination with FedSQL, which is an RDBMs

ANSI standard new language in Version 9.4.  You do not need to know database specific functions in order to process in-database.  ­

 

Q: My organization is in the midst of a very major move from Teradata to Oracle.   So will we always have to transport data to SAS and will not be able to process on databases?

A: We do not offer a Code Accelerator for Oracle as we do for Teradata.  However SAS/Access can access the data using Explicit SQL Passthrough and Oracle can process perform the SQL query in-database(internall).  Oracle offers internal parallel reads.   http://www2.sas.com/proceedings/sugi28/151-28.pdf

 

 

 

Q: ­How useful is PROC DStoDS2 for doing the conversions? ­

A: Proc DStoDS2 doesn’t do a complete code rewrite because not all code will translate, especially SAS statements that do not have a Ds2 equivalent such as Infile and Input.

Further, you might have to clean up the output code to create a DS2 program that can be compiled and executed.

 

Q: ­Are there any cases where a data step will run as quickly as a proc ds2 threaded step?­

A: ­Benchmarking is always suggested for production jobs.  In general, if the data is small, SAS can read the data sequentially in a very efficient manner, so it may not be necessary to include threads.

 

Q:­Actually, I've tested using a large dataset, the data step ran a little faster than the ds2 threaded step.  Not what I expected. ­

 

Q: Is the processing faster to convert data step merge code to DS2 code versus just doing a proc sql join?­

 

A: PROC DS2 isn't necessarily the code of choice in all instances compared to the DATA step or Proc SQL code.  This link takes you to a section in the PROC DS2 documentation about when it's best to consider using PROC DS2.

http://support.sas.com/documentation/cdl/en/ds2ref/67313/HTML/default/viewer.htm#p1shb3eskw9xh9n1orh...


The threads that are mentioned are compute threads, so your application would need to be computationally intensive before it would generally benefit from being written in PROC DS2 code.

The SAS In-Database Code Accelerator enables you to publish a DS2 thread program to the database and execute that thread program in parallel inside the database. Currently we offer the Code Accelerator to be used with PROC DS2 on Teradata, Greenplum, and Hadoop.

 

 

Is there a difference between the SAS In-Database Code Accelerator and the SAS Scoring accelerator?

 

Yes, the SAS In-Database Code Accelerator is an add-on product for Hadoop, Pivotal (Greenplum) and Teradata. It allows you to run DS2 directly in the database.

 

The SAS Scoring Accelerator enables customers to translate scoring models created in SAS® Enterprise Miner™ or SAS/STAT into database-specific functions to be deployed and then executed directly within the database environment. SAS Scoring Accelerators include Hadoop, Cloudera, Pivotal (Greenplum), Aster, Teradata, Netezza, Oracle, SAP HANA and IBM DB2)

 

Is it possible to run the DS2 code in Base SAS without any additional products?

 

Absolutely. The functionality is part of the Base SAS language and does not require SAS/Access as long as you are working with SAS datasets.

 

So, DS2 works with a number of relational databases, but there seems to be some specific advantages to having a code accelerator for Teradata, Hadoop and Greenplum.  Can you clarify the differences in how SAS and DS2 interact with the various data sources?

 

There are two ways we push DS2 code down to a MPP platform

  • SAS In-Database Code Accelerator for (Hadoop, Teradata, and Greenplum)
    • This the only way end-users can write DS2 code and have it run In-Database  and or MapReduce
  • SAS Scoring Accelerator for ….. which has more platforms supported than In-Data Base Code Accelerator
    • Only Score code generated via EM and Factory Miner.

 

For all other databases (Oracle, SQL Server, DB2…) the end-user written DS2 code will run on the SAS Server i.e. all data is returned to that SAS Server for processing by the DS2 program.

 

 

Recommended Resources
Course: DS2 Programming: Essentials
Book: The DS2 Procedure: SAS® Programming Methods at Work

 

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow up Q/A, slides and recordings from other SAS Ask the Expert webinars. From Ask the Expert Library, just click Subscribe from the orange bar underneath the list of the recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.

Comments

Thanks for the excellent webinar!  Is that really a reduction of the 1 1/2 day course on DS2 programming into one hour?  Wow on that!

 

The questions from the recorded webinar did not match the questions answered above.  Can you augment the questions list?

 

Also, Jeff Simpson was mentioned as a contact for any further questions.  How should we contact Jeff, and is his email address available?

 

I want to post this article, the WebEx URL, and other relevant the URLs in an online community on our corporate website.  Please confirm that it's OK to post that information.

 

 

This is highly condensed material of about 1 1/2 days, so this is not a replacement for the course materials.  I would highly recommend takeing the actual course if possible. You can even take it live witrh an instructor online.

 

Let's converse in the comments section right here on the public article, or you can Private Message me on the community.

 

Yes, feel free to post this Article and the On-Demand version of the Webex: .  Please cite this original source.  Thank you.

 

 

 

 

The link to this webinar is not working.

fja

Both links to the webinars are not working. Seminar would really be of interest ... even after 5y ... 

@fja 

Unfortunately, this webinar is no longer available. You can find the slides for the webinar attached at the top of the post. I apologize for any inconvenience.

Version history
Last update:
‎05-17-2018 03:24 PM
Updated by:
Contributors

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Article Labels
Article Tags