
Data set keep -- any way to speed up the code and processing time?

cdubs — Thu, 28 Dec 2017 01:04:30 GMT

Hi all,

I'm currently running:

	data compiled_ip;
		set raw.ccaes103 raw.ccaes113 raw.ccaes122 raw.ccaes132; 
		
		keep dx1 dobyr age svcdate enrolid;
	run;

But each of the files is >35 GB, and the step takes a long time. Is there any way I can make this more efficient? I've already tried to reduce the size by keeping only the 5 essential variables.

Re: Data set keep -- any way to speed up the code and processing time?

Patrick — Thu, 28 Dec 2017 01:26:27 GMT

@cdubs

You could try the syntax below, which loads only the desired variables into the PDV (program data vector).

data compiled_ip;
  set 
    raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
  ;
run;

I would expect some performance improvement but not a big one.

All other measures depend on your environment and could require quite a bit of coding and testing, which IMHO is only worth doing if this is a production job on the critical path.

As the bottleneck is most likely disk or network I/O, my first measure would be to ensure that your source data sets are stored compressed.
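
A minimal sketch of that compression step, assuming you have write access to the RAW library and room for a second copy of each file. The name ccaes103_cmp is hypothetical, and COMPRESS=YES is only one choice; COMPRESS=BINARY may suit mostly numeric data better. Whether compression helps at all depends on the data, so compare the log timings before and after.

```sas
/* Print detailed real-time and CPU-time statistics in the log. */
options fullstimer;

/* One-time cost: write a compressed copy of one source data set. */
/* ccaes103_cmp is a hypothetical name, not part of the original. */
data raw.ccaes103_cmp (compress=yes);
  set raw.ccaes103;
run;

/* Later reads fetch fewer pages from disk if compression paid off. */
data compiled_ip;
  set raw.ccaes103_cmp (keep= dx1 dobyr age svcdate enrolid);
run;
```

The log of the first step reports how much compression changed the file size, which is a quick way to judge whether repeating it for the other three files is worthwhile.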

Re: Data set keep -- any way to speed up the code and processing time?

PGStats — Thu, 28 Dec 2017 07:16:35 GMT

PROC APPEND, or the APPEND statement in PROC DATASETS, can be more efficient than the SET statement in certain circumstances because it can use block I/O. Something like:

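/* Create an empty table with the same columns as the source (LIKE copies all columns). */
proc sql;
  create table work.compiled_ip like raw.ccaes103;
quit;

/* Append each source table from the RAW library to the empty base table. */
proc datasets library=raw nolist;
  append base=work.compiled_ip data=ccaes103;
  append base=work.compiled_ip data=ccaes113;
  append base=work.compiled_ip data=ccaes122;
  append base=work.compiled_ip data=ccaes132;
run;
quit;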

(untested)
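
If only the five variables from the original question are wanted, the same idea might be sketched with PROC APPEND and a KEEP= data set option on each input. This is also untested against the real files, and whether the faster I/O path still applies with KEEP= in place is an assumption worth checking with a timing run. The sketch relies on PROC APPEND creating the BASE= data set when it does not exist yet, and it assumes the five variables have matching attributes in all four files.

```sas
/* Start clean so the first APPEND creates work.compiled_ip.      */
/* NOWARN avoids error processing if the table is not there yet.  */
proc datasets library=work nolist nowarn;
  delete compiled_ip;
run;
quit;

/* The first call creates the base table with only the kept variables. */
proc append base=work.compiled_ip
            data=raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid);
run;

/* The remaining calls add rows to the end of the base table. */
proc append base=work.compiled_ip
            data=raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid);
run;

proc append base=work.compiled_ip
            data=raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid);
run;

proc append base=work.compiled_ip
            data=raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid);
run;
```

Running this next to the SET version with options fullstimer turned on shows whether the append route actually saves time in a given environment.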