Hi all,
I'm currently running
data compiled_ip;
set raw.ccaes103 raw.ccaes113 raw.ccaes122 raw.ccaes132;
keep dx1 dobyr age svcdate enrolid;
run;
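But each of the files is over 35 GB, so the step takes a long time. Is there any way I can make this more efficient? I've already tried to reduce the size by keeping only the 5 essential variables.
You could try the syntax below, which loads only the desired variables into the PDV. (A KEEP statement only limits what is written to the output; the KEEP= data set option limits what is read from each input.)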
data compiled_ip;
set
raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid)
raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid)
raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
;
run;
I would expect some performance improvement but not a big one.
All other measures depend on your environment and could require quite a bit of coding and testing, which is IMHO only worth doing if this is a production job on the critical path.
As the bottleneck is most likely disk or network I/O, my first measure would be to ensure that your source data sets are stored compressed.
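To make the compression suggestion concrete, here is a minimal sketch. It assumes you have write access to the raw library and enough disk space for a second copy; the name raw.ccaes103_c is hypothetical. Whether compression actually shrinks the file depends on the data, so check the note in the log that reports the size change before converting the other files.

```sas
/* Check whether the source is already compressed
   (see the "Compressed" attribute in the output) */
proc contents data=raw.ccaes103;
run;

/* Write a compressed copy; raw.ccaes103_c is a hypothetical name.
   COMPRESS=BINARY is the other common choice and may suit
   mostly numeric data better. */
data raw.ccaes103_c (compress=yes);
  set raw.ccaes103;
run;

/* Alternatively, compress every data set created later in this session */
options compress=yes;
```

Rewriting a 35 GB file is itself a long read, so this only pays off if the files are read more than once.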
PROC APPEND, or the APPEND statement in PROC DATASETS, can be more efficient than the SET statement in certain circumstances because it can use block I/O. Something like:
proc sql;
/* create an empty table with the same columns as the source */
create table work.compiled_ip like raw.ccaes103;
quit;
proc datasets library=raw nolist;
/* one-level data= names are looked up in library=raw */
append base=work.compiled_ip data=ccaes103;
append base=work.compiled_ip data=ccaes113;
append base=work.compiled_ip data=ccaes122;
append base=work.compiled_ip data=ccaes132;
run;
quit;
(untested; note that as written this copies every variable, not only the five you need)
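The same idea with PROC APPEND, restricted to the five variables from the original question, might look like the sketch below. The macro variable keepvars is introduced here only for illustration. This is untested against files of this size, and I have not verified whether a KEEP= option on DATA= still lets the procedure take the block I/O path, so time it on one file first.

```sas
/* keepvars is a hypothetical helper holding the variable list */
%let keepvars = dx1 dobyr age svcdate enrolid;

/* Empty base table with only the five variables:
   OBS=0 copies the structure but reads no rows */
data work.compiled_ip;
  set raw.ccaes103 (keep=&keepvars obs=0);
run;

/* Append each source file, reading only the kept variables */
proc append base=work.compiled_ip data=raw.ccaes103 (keep=&keepvars);
run;
proc append base=work.compiled_ip data=raw.ccaes113 (keep=&keepvars);
run;
proc append base=work.compiled_ip data=raw.ccaes122 (keep=&keepvars);
run;
proc append base=work.compiled_ip data=raw.ccaes132 (keep=&keepvars);
run;
```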