BookmarkSubscribeRSS Feed
elsfy
Quartz | Level 8

I have this code : 

data want ; update have result ; /* updates table have with table result values */ 
by var ; 
run ; 


i would like to know the equivalent way to do this with proc SQL in order to avoid the proc sort step before the data step. 

2 REPLIES 2
PaigeMiller
Diamond | Level 26

Surely you realize that SQL still has to do a sort in order to do the update. You just don't have to program it in SAS.

--
Paige Miller
mkeintz
PROC Star

I agree with @PaigeMiller that SQL needs to do a sort under the hood.  It may be a bit more efficient than PROC SORT because the sorting only generates utility files (or maybe even just memory objects) rather than making a complete SAS dataset file.

 

But consider the far more wordy code needed to replicate the UPDATE statement, which only replaces a value from HAVE with a value from RESULT when the new value is not missing.  As far as I know, this means coding explicit COALESCE (or COALESCEC) functions for each variable that is present in both datasets.  For example, when you have a one-to-one join, you would need to do something like this:

 

data HAVE;
input id  a b c;
datalines;
1  11 111 1111
2  .  222 2222
3  33  .  3333
run;
data RESULT;
input id a b c ;
datalines;
1 -11 -111 -1111 
2 -22 -222 -2222
3  .  -333 -3333
run;

proc sql;
  create table ljoin as
  select coalesce(h.a,r.a) as a, coalesce(h.b,r.b) as b, coalesce(h.c,r.c) as c
  from HAVE as h left join RESULT as r on h.id=r.id;
quit;

 

And the above wouldn't replicate UPDATE for a one-to-many join (UPDATE would generate only 1 record per by-group).  I leave it to those more expert in SQL to show how that would be done.

 

The point is there is far more coding required to avoid a preliminary PROC SORT when replicating UPDATE.  I'm not even sure it can be done, because the UPDATE statement depends on the within-bygroup order of incoming RESULT records.  SQL makes no commitment to honor such record order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 812 views
  • 2 likes
  • 3 in conversation