DATA Step, Macro, Functions and more

Optimize proc sql joins

Reply
Occasional Contributor
Posts: 10

Optimize proc sql joins

I have a left join, is there a way to optimize it, i have used keyword magic=102, but did not see any improvement.

Can Proc ds2 help is this case

proc sql magic=102;
create table old_tran_prod_sync_master1 as 
select 
a.*,
b.Gluten_free,
b.Non_GMO,
b.Organic,
b.Organic_Validated,
b.Whole_Trade_Guarantee
from old_tran_prod_sync_master a 
left join product_master_file b on a.UPC = b.NAT_UPC;
quit;

::::::::::::::::::::::::LOG:::::::::::::::::::::::;;

NOTE: PROC SQL planner chooses merge join.
NOTE: Table WORK.OLD_TRAN_PROD_SYNC_MASTER1 created, with 181692824 rows and 22 columns.

842 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 15:31.88
user cpu time 5:40.43
system cpu time 5:08.82
memory 1059492.75k
OS Memory 1077260.00k
Timestamp 07/15/2017 08:59:10 AM
Step Count 78 Switch Count 231

Highlighted
Super User
Posts: 7,447

Re: Optimize proc sql joins

As already stated in your other thread (https://communities.sas.com/t5/Base-SAS-Programming/SAS-Code-Optimization/m-p/376456), 15 minutes for ~200 million observations isn't that bad.

Since you seem to have a one-to-many relationship, try this code:

proc sort data=old_tran_prod_sync_master;
by UPC;
run;

proc sort data=product_master_file;
by UPC;
run;

data old_tran_prod_sync_master1;
merge
  old_tran_prod_sync_master (in=a)
  product_master_file (
    in=b
    keep=UPC Gluten_free Non_GMO Organic Organic_Validated Whole_Trade_Guarantee
  )
;
by UPC;
if a;
run;

The sorts will only be necessary if the datasets are not already sorted by UPC.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 1 reply
  • 75 views
  • 0 likes
  • 2 in conversation