BookmarkSubscribeRSS Feed
alepage
Barite | Level 11

Hello,

 

I our old SAS programs, we can see merge statements to make some join. Now I using proc SQL with join statement which I prefer.

 

But I have found other SAS programs  that are using many set statements to make some kind of join.

Could it be possible to provide me information on that subject , how do is works ?  Also, the use of word key / unique with the set statement and so on.

 

 

 

 

7 REPLIES 7
ballardw
Super User

You would have to provide example data and code because there are multiple things that get involved with multiple Sets and not even going to attempt to summarize them in a generic question. That would be the subject of one or more papers.

 

Some of those codes could very well take PAGES of sql code to do what is done in five or six lines of data step code.

 

Please do not think of "data step" and "join" in the same context as there are so many differences that you will confuse yourself thinking "this is an xxxx join done in data step".

Reeza
Super User

See page 7 here

https://support.sas.com/resources/papers/proceedings15/2219-2015.pdf

 

Read the documentation for the KEY option and see the example in the link

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p00hxg3x8lwivcn1f0e9axziw57y.htm#n...

 

Another possibly useful paper:

https://support.sas.com/resources/papers/proceedings/proceedings/sugi25/25/po/25p234.pdf

 


@alepage wrote:

Hello,

 

I our old SAS programs, we can see merge statements to make some join. Now I using proc SQL with join statement which I prefer.

 

But I have found other SAS programs  that are using many set statements to make some kind of join.

Could it be possible to provide me information on that subject , how do is works ?  Also, the use of word key / unique with the set statement and so on.

 

 

 

 


 

alepage
Barite | Level 11
I saw the second reference on the Key option. Does this approach is still in use or it has been replaced by the use of proc sql; ... left join and so on
Reeza
Super User
Still in use - it's quite efficient resource wise, as data steps process row by row, whereas SQL will load into memory. Basically you are more likely to run out memory with SQL than SET statements. FYI I used this method to join two datasets with hundreds of millions of rows with a 4GB RAM machine (took 20 hours) but it worked. This was almost two decades ago though.
habeebao
Calcite | Level 5
A join through proc SQL allows you to join on a variable. I've found it's more efficient when I'm creating a table with shared observations; whereas a set will stack my tables, not necessarily joining them. It's helpful if my information is already formatted for existing variables.
ballardw
Super User

SET does much more than "stack". Depends on specific syntax used which is why we asked for examples.

 

Consider this code that you should be able to run as it uses two sets SAS supplies for practice:

data junk;
   set sashelp.class (where=(sex='F'));
   set sashelp.cars  (where=(type='Sedan'));
run;

No "stacking" going on there.

 

mkeintz
PROC Star

@habeebao wrote:
A join through proc SQL allows you to join on a variable. I've found it's more efficient when I'm creating a table with shared observations; whereas a set will stack my tables, not necessarily joining them. It's helpful if my information is already formatted for existing variables.

SET does a lot more than stack.

 

Consider two datasets QTR (4 per year) and YEAR, each sorted by ID/DATE.   And you want to create a new quarterly dataset with the current QTR record and the most recent YEAR record.  The problem is that QTR has 4 distinct dates, but YEAR has only one date, so joining on the variable DATE is not so simple.

 

You might use 

 

proc sql noprint;
 create table yrqtr as
 select * 
   from qtr
 left join 
   year (rename=(date=yr_date))
 on year.id=qtr.id
 and yr_date<=date<intnx('year',yr_date,1,'s')
 order by id,date;
quit;

which is not too complex.  But perhaps you also want to join a MONTH data set (12 dates per year).  And maybe a YEAR record is missing, so you want to carry it forward until the next YEAR record appears.

 

With conditional SET statements, and sorted (by ID/DATE) datasets, this is trivial in a DATA step with SET statements:

 

data YQM ;
  merge YEAR (in=iny keep=id date)
        QTR (in=inq keep=id date)
        MONTH (in=inm keep=id date);
  by id date;
  if iny then set YEAR (rename=(date=YR_date));
  if inq then set QTR (rename=(date=QTR_date));
  if inm then set MONTH (rename=(date=MON_date));
  if inm then output;
  if last.id then call missing(of _all_);
run;

This simple code produces one record per month, containing the current month, most recent quarter, and most recent year.  It is not just a stacking of the three datasets.  It is a (complex) join based on ID and DATE.  

 

I suspect this is even more beneficial when you don't have intervals nested within each other (month within qtr within year).  Say you have three event-driven data sets (admission/discharge, services, and tests), each sorted by ID/DATE.  Consider the SQL code you would need to generate, for each event, no matter what type, a record with the most recent data from each of the three data sets.

 

data event_carried_forward;
  merge admdis (in=inA keep=id date)
        srvcs (in=inS keep=id date)
        tests (in=inT keep=id date);
  by id date;
  if inA then set admdis (rename=(date=date_A));
  if inS then set srvcs (rename=(date=date_S));
  if inT then set tests (rename=(date=date_T));
  output; /* Output a record for every event. No IF test needed*/
  if last.id then call missing (of _all_);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1344 views
  • 7 likes
  • 5 in conversation