BookmarkSubscribeRSS Feed

Traditional SAS Programming versus SQL

Started ‎04-06-2018 by
Modified ‎09-17-2020 by
Views 11,025

 

joins sm.png

SAS versus SQL

 

It's a question as old as SAS itself. Especially when you think about the fact that SQL was first created by IBM back in 1975 and SAS was incorporated in 1976. Coincidence? Yes. But, it's not a coincidence that both languages have prevailed until this day. SAS, in my humble opinion, is one of the most flexible tools ever created and SQL is a “standard” that has weathered the test of time.

  

SQL - Structured Query Language
  • 1975 - Created by IBM®
  • 1986 - American National Standards Institute (ANSI)
  • 1999 - Major revision, adopted as FIPS 127-2
  • 2003 - Ordered Analytical (a.k.a. Window) Functions
SAS - Statistical Analysis System
  • 1976 - Incorporated by Jim & Co.
  • 1990 - SAS Implements SQL in version 6
  • 2000 - SAS Implements SQL:1999 in version 8
  • SAS does not yet utilize Window Functions

 

Traditional Programming

 

The attached paper, that was presented at a CMS SAS Users Group (CMSSUG) meeting, describes the use of the DATA step to group and summarize information.  Note how the data must first be sorted. This is not the case when using SQL. The FIRST (dot) notation is used in the DATA step to set the value of the first record in the BY group to zero. Accumulation of values occurs and the LAST record of the group is OUTPUT. SQL may be easier to follow as the COUNT and SUM functions, accompanied by the GROUP BY, perform the same function. The SUMMARY procedure may be a better choice if multiple variables and statistics are desired.

 

/* Example 1 */
PROC SORT DATA=table1; 
 BY var1;
RUN;

DATA table2 ( DROP= numvar );
 SET table1 ( KEEP= var1 numvar);
 BY var1;
 IF FIRST.var1 THEN DO;
 cntvar = 0;
 sumvar = 0;
 END;
 cntvar + 1;
 sumvar + numvar;
 IF LAST.var1 THEN OUTPUT;
RUN;
/* Example 2 */
PROC SQL;
 CREATE TABLE table2 AS 
 SELECT var1, 
 COUNT(*) AS cntvar,
 SUM(numvar) AS sumvar
 FROM table1
 GROUP BY var1;
QUIT;
/* Example 3 */
PROC SUMMARY DATA=table1 NWAY MISSING;
 CLASS var1;
 VAR numvar;
 OUTPUT OUT=table3 
 N=cntvar 
 SUM=sumvar; 
RUN;

 

Ordered Analytical Functions

 

These functions enable On-Line Analytical Processing (OLAP) requests containing ANSI SQL: 2003 window functions, which greatly enhance the functionality of the Structured Query Language. This article discusses how SAS programming techniques can be imitated within a database environment using SAS/ACCESS pass-though queries to simulate By-Group processing with (dot) notation. As mentioned above, SAS does not yet support window functions for SAS data sets.

 

The three examples below use SAS/ACCESS to summarize Medicare claims. Both examples to the left use "Implicit" pass-through and utilize more traditional programming.  In these examples, we are summarizing all of the records and placing the total onto each record in order to calculate a percent of total. Example 4 at the top left uses two DATA steps and Example 5 at the bottom left uses a derived table within an ANSI SQL:1999 style query. Example 6 on the right uses "Explicit" pass-through containing a window function. The function is initiated by the OVER clause. Notice how only one pass of the data is required.

 

/* Example 4 */
DATA Derived_Table;
 SET Mdcr_Clm END=EndOfFile;
 Total_Paid + Paid_Amt;

KEEP Total_Paid; IF EndOfFile THEN OUTPUT; RUN; DATA Mdcr_Final; SET Mdcr_Clm; IF _N_=1 THEN SET Derived_Table; RUN;
/* Example 5 */
PROC SQL;
 CREATE TABLE Mdcr_Final AS
 SELECT 
 Bene_Id, Thru_Dt, Clm_Type, Paid_Amt,
 FROM Mdcr_Clm,
 ( SELECT SUM(Paid_Amt) AS Total_Paid
 FROM Mdcr_Clm ) AS Derived_Table;
QUIT;
/* Example 6 */
SELECT 
 Bene_Id, Thru_Dt, Clm_Type, Paid_Amt, 
 SUM(Paid_Amt) OVER () AS Total_Paid 
FROM Mdcr_Clm

Bene Id

Thru   Date

Clm Type

Paid Amt

Total Paid

1

01Jan11

IP

$500

$2300

1

05Jan11

SNF

$300

$2300

1

30Jan11

PHY

$175

$2300

2

02Feb11

IP

$750

$2300

2

14Feb11

SNF

$300

$2300

2

21Feb11

PHY

$125

$2300

2

17Mar11

HHA

$150

$2300

 

A full explanation of these functions and many other examples can be found at the 2011 SAS Global Forum paper listed below.

 

Teradata® for the SAS® Programmer: Ordered Analytical Functions, Hash Objects, ANSI SQL: 2003

http://support.sas.com/resources/papers/proceedings11/019-2011.pdf

 

We also have a video on this topic, check it out:

Comments

Good afternoon,

where i can find all the datasets thata are used in the examples above (e.g. table1, Mdcr_Clm) ?

 

Best regards,

Massimo

 

I have a bad feeling about example 4. The first obs of mdcr_clm would be overwritten with data of the last observation.

 

Edit: <grin> someone added the necessary keep statement. That was quick 🙂

That does look to be true.  Thanks for pointing that out Kurt!

Thanks for pointing that out Kurt! I have added a KEEP statement.

Because of SAS SQL "remerging", I think example 5 could also be written as:

SELECT 
 Bene_Id, Thru_Dt, Clm_Type, Paid_Amt, 
 SUM(Paid_Amt)  AS Total_Paid 
FROM Mdcr_Clm

Of course many SQL folks don't like remerging.  

Version history
Last update:
‎09-17-2020 09:22 AM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags