DATA Step, Macro, Functions and more

Simplify data merge code

Reply
Frequent Contributor
Posts: 99

Simplify data merge code

[ Edited ]
proc sort data=have1;by var1;run;
data have2 (rename =(var1er=var1));set have2; run;
proc sort data=have2;by var1;run;

data want;
merge have1(in=a) have2(in=b);
by var1; if a; run;


Is it possible to simplify this?

Trusted Advisor
Posts: 1,137

Re: Simplify data merge code

Posted in reply to afiqcjohari
Not sure, in the code the statement 'if a' is doing anything.

we could remove the same as there is no by statement.
Thanks,
Jag
Frequent Contributor
Posts: 99

Re: Simplify data merge code

Posted in reply to Jagadishkatam
Edited for the by statement.
Super User
Posts: 7,863

Re: Simplify data merge code

Posted in reply to afiqcjohari

No extra step for the rename is needed:

proc sort data=have1;
by var1;
run;

proc sort data=have2;
by var1er;
run;

data want;
merge
  have1 (in=a)
  have2 (
    in=b
    rename=(var1er=var1)
  )
;
by var1;
if a;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 99

Re: Simplify data merge code

Posted in reply to KurtBremser
Is it possible to take out the sorting part? Could it be done within the data step merge?
Super User
Posts: 7,863

Re: Simplify data merge code

Posted in reply to afiqcjohari

afiqcjohari wrote:
Is it possible to take out the sorting part? Could it be done within the data step merge?

To merge, you always have to sort. Even if you use proc sql and you do not have to sort explicitly, the sql procedure will do it automatically, which basically uses the same resources and needs the same time. If you need the same order repeatedly, an explicit sort is the way to go.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 99

Re: Simplify data merge code

Posted in reply to KurtBremser

KurtBremser wrote:

afiqcjohari wrote:
Is it possible to take out the sorting part? Could it be done within the data step merge?

To merge, you always have to sort. Even if you use proc sql and you do not have to sort explicitly, the sql procedure will do it automatically, which basically uses the same resources and needs the same time. If you need the same order repeatedly, an explicit sort is the way to go.


The initial code was in proc sql, but I prefer data step merge because it can create multiple tables. Had it been in proc sql, I need to write separate proc sql for each different table. But I really want to avoid writing the 2 proc sorts, if possible. But it seems that it's not. What I don't get is, why if data merge necessitates the 2 tables to be sorted by the common variable, shouldn't data merge internally 'know' that the 2 tables will need to be sorted? Meaning, data merge should by default sort these 2 variables implicitly.

 

Is there a case where data merge could be called without the tables being sorted in the first place?

Super User
Posts: 7,863

Re: Simplify data merge code

Posted in reply to afiqcjohari

The merge statement in a data step is very simple and relies on the given order of the datasets. This is its natural behaviour. Therefore the sort is necessary if you use by with a merge.

This simplicity is one of the reasons why sort & merge often outperforms SQL (which sorts implicitly).

 

A proc sort is usually three lines of code. I type that without really thinking in < 10 seconds. And I'm no touch-typist. To make your work easier, you can define a keyboard macro in EG that writes the sort step for you, so you only have to fill in the dataset and the by variables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,997

Re: Simplify data merge code

Posted in reply to afiqcjohari

Something like:

proc sql;
  create table WANT as
  select A.VAR1ER as VAR1,
           B.*
  from   HAVE1 A
  left join HAVE2 B
  on      A.VAR1=B.VAR1;
quit;
Frequent Contributor
Posts: 99

Re: Simplify data merge code

Thank you. But one of the feature I like about SAS is the data merge step. It's actually pretty elegant minus the need to sort step. Smiley Happy. Actually the real code outputs to many tables which data merge is very good at. Writing in proc sql would mean 2 different proc sql for 2 different tables.
Super User
Super User
Posts: 7,997

Re: Simplify data merge code

Posted in reply to afiqcjohari

Well, SQL was developed to handle relational databases, so combining data is the key functioanlity of it.  True there are times when either can be better.  As far as this question, we only see a tiny part of the problem apparently, so can only guess.

Frequent Contributor
Posts: 99

Re: Simplify data merge code

[ Edited ]

I really appreciate your answer. What I don't understand is why does SAS require the programmer to explicitly write the 2 proc sorts on the 2 tables? Why can't data merge does it at one go? Because as per my understanding, data merge requires the tables to be sorted. It could have been more concise if data merge can implicity sort the tables by itself.

Super User
Posts: 7,863

Re: Simplify data merge code

Posted in reply to afiqcjohari

afiqcjohari wrote:

I really appreciate your answer. What I don't understand is why does SAS require the programmer to explicitly write the 2 proc sorts on the 2 tables? Why can't data merge does it at one go? Because as per my understanding, data merge requires the tables to be sorted. It could have been more concise if data merge can implicity sort the tables by itself.


Misconception. Merge on its own does NOT require the datasets to be sorted. Merge with a by does need sorting.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,518

Re: Simplify data merge code

Posted in reply to KurtBremser

Perhaps an important point .... MERGE with a BY statement requires that your data set is already in order before the DATA step begins.  If it is already in order without running PROC SORT, that is sufficient.  You only need to run PROC SORT when the data is not in the right order.

Super User
Posts: 11,343

Re: Simplify data merge code

Posted in reply to Astounding

Also the BY statement in a data step is used with other things than just MERGE, UPDATE or MODIFY statements in a data step.

And the BY statement supports the NOTSORTED and DESCENDING options and trying to "guess" all the time which combination a user "intended" might be very problematical.

 

And since a basic sort statement is 3 lines of code including the "run;" I am surprised this is much of an issue at all.

Ask a Question
Discussion stats
  • 21 replies
  • 193 views
  • 5 likes
  • 7 in conversation