SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

What is more efficient way to perform merge?

proc sql or data step? and why?

 /***Way1-Proc sql****/
proc sql;
create table want1 as
select a.*,b.Numerator 
from t1 as a
left join t2  as b
on a.Customer_id=b.Customer_id  and a.branch_ID=b.branch_ID
;
quit;


/***Way2-Data Step****/
proc sort data=t1;
by Customer_id branch_ID;
Run;
proc sort data=t2;
by Customer_id branch_ID;
Run;
Data want;
Retain Numerator;
Merge t1(in=a) t2(in=b);
by Customer_id branch_ID;
if a;
Run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Run both steps and look at the log. You may need sufficiently large datasets to see a difference.

If t2 (using only the three variables needed) can fit into memory, a single DATA step with a hash object will be most efficient:

data want;
set t1;
if _n_ = 1
then do;
  length numerator /* insert proper attribute here */;
  declare hash t2 (dataset:"t2 (keep=customer_id branch_id numerator)");
  t2.definekey("customer_id","branch_id");
  t2.definedata("numerator");
  t2.definedone();
  call missing("numerator");
end;
rc = t2.find();
drop rc;
run;

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Run both steps and look at the log. You may need sufficiently large datasets to see a difference.

If t2 (using only the three variables needed) can fit into memory, a single DATA step with a hash object will be most efficient:

data want;
set t1;
if _n_ = 1
then do;
  length numerator /* insert proper attribute here */;
  declare hash t2 (dataset:"t2 (keep=customer_id branch_id numerator)");
  t2.definekey("customer_id","branch_id");
  t2.definedata("numerator");
  t2.definedone();
  call missing("numerator");
end;
rc = t2.find();
drop rc;
run;
Ronein
Meteorite | Level 14

Let's say that t2 data set is located in permanent library r_r.

so the data set name is r_r.t2

does it mean that I need to write r_r.t2 in your code instead of t2?

ballardw
Super User

<Pedantic mode: on>

Any discussion of "efficient" should include a definition of what is measured and how it is used to determine efficiency. If you don't provide something to measure how do you know if something is "more efficient".

<Pedantic mode:off>

Programming time

Program maintenance

Single Run time

Data storage space

Network traffic

 

Just a few different things that might be considered in "efficient".

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 638 views
  • 1 like
  • 3 in conversation