I have a left join, is there a way to optimize it, i have used keyword magic=102, but did not see any improvement.
Can Proc ds2 help is this case
proc sql magic=102;
create table old_tran_prod_sync_master1 as
select
a.*,
b.Gluten_free,
b.Non_GMO,
b.Organic,
b.Organic_Validated,
b.Whole_Trade_Guarantee
from old_tran_prod_sync_master a
left join product_master_file b on a.UPC = b.NAT_UPC;
quit;
::::::::::::::::::::::::LOG:::::::::::::::::::::::;;
NOTE: PROC SQL planner chooses merge join.
NOTE: Table WORK.OLD_TRAN_PROD_SYNC_MASTER1 created, with 181692824 rows and 22 columns.
842 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 15:31.88
user cpu time 5:40.43
system cpu time 5:08.82
memory 1059492.75k
OS Memory 1077260.00k
Timestamp 07/15/2017 08:59:10 AM
Step Count 78 Switch Count 231
As already stated in your other thread (https://communities.sas.com/t5/Base-SAS-Programming/SAS-Code-Optimization/m-p/376456), 15 minutes for ~200 million observations isn't that bad.
Since you seem to have a one-to-many relationship, try this code:
proc sort data=old_tran_prod_sync_master;
by UPC;
run;
proc sort data=product_master_file;
by UPC;
run;
data old_tran_prod_sync_master1;
merge
old_tran_prod_sync_master (in=a)
product_master_file (
in=b
keep=UPC Gluten_free Non_GMO Organic Organic_Validated Whole_Trade_Guarantee
)
;
by UPC;
if a;
run;
The sorts will only be necessary if the datasets are not already sorted by UPC.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.