SAS versus SQL
It's a question as old as SAS itself. Especially when you think about the fact that SQL was first created by IBM back in 1975 and SAS was incorporated in 1976. Coincidence? Yes. But, it's not a coincidence that both languages have prevailed until this day. SAS, in my humble opinion, is one of the most flexible tools ever created and SQL is a “standard” that has weathered the test of time. |
SQL - Structured Query Language
|
SAS - Statistical Analysis System
|
The attached paper, that was presented at a CMS SAS Users Group (CMSSUG) meeting, describes the use of the DATA step to group and summarize information. Note how the data must first be sorted. This is not the case when using SQL. The FIRST (dot) notation is used in the DATA step to set the value of the first record in the BY group to zero. Accumulation of values occurs and the LAST record of the group is OUTPUT. SQL may be easier to follow as the COUNT and SUM functions, accompanied by the GROUP BY, perform the same function. The SUMMARY procedure may be a better choice if multiple variables and statistics are desired.
|
|
These functions enable On-Line Analytical Processing (OLAP) requests containing ANSI SQL: 2003 window functions, which greatly enhance the functionality of the Structured Query Language. This article discusses how SAS programming techniques can be imitated within a database environment using SAS/ACCESS pass-though queries to simulate By-Group processing with (dot) notation. As mentioned above, SAS does not yet support window functions for SAS data sets.
The three examples below use SAS/ACCESS to summarize Medicare claims. Both examples to the left use "Implicit" pass-through and utilize more traditional programming. In these examples, we are summarizing all of the records and placing the total onto each record in order to calculate a percent of total. Example 4 at the top left uses two DATA steps and Example 5 at the bottom left uses a derived table within an ANSI SQL:1999 style query. Example 6 on the right uses "Explicit" pass-through containing a window function. The function is initiated by the OVER clause. Notice how only one pass of the data is required.
|
|
A full explanation of these functions and many other examples can be found at the 2011 SAS Global Forum paper listed below.
Teradata® for the SAS® Programmer: Ordered Analytical Functions, Hash Objects, ANSI SQL: 2003
http://support.sas.com/resources/papers/proceedings11/019-2011.pdf
We also have a video on this topic, check it out:
Good afternoon,
where i can find all the datasets thata are used in the examples above (e.g. table1, Mdcr_Clm) ?
Best regards,
Massimo
I have a bad feeling about example 4. The first obs of mdcr_clm would be overwritten with data of the last observation.
Edit: <grin> someone added the necessary keep statement. That was quick 🙂
That does look to be true. Thanks for pointing that out Kurt!
Because of SAS SQL "remerging", I think example 5 could also be written as:
SELECT Bene_Id, Thru_Dt, Clm_Type, Paid_Amt, SUM(Paid_Amt) AS Total_Paid FROM Mdcr_Clm
Of course many SQL folks don't like remerging.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.