Data set keep -- any way to speed up the code and processing time?

Contributor
Posts: 53
Hi all,

I'm currently running

data compiled_ip;
  set raw.ccaes103 raw.ccaes113 raw.ccaes122 raw.ccaes132;
  keep dx1 dobyr age svcdate enrolid;
run;

But each of the files is >35 GB and the step takes a long time. Is there any way I can make this more efficient? I have already tried to reduce the size by keeping only the 5 essential variables.


Accepted Solutions
Solution
‎12-29-2017 12:27 AM
Respected Advisor
Posts: 4,797

Re: Data set keep -- anyway to speed up the code and procssing time?

@cdubs

You could try the syntax below, which loads only the desired variables into the PDV (program data vector).

data compiled_ip;
  set 
    raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
  ;
run;

I would expect some performance improvement but not a big one.
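
To see how much the KEEP= data set option helps on a given system, the timing notes in the log can be compared between the two versions of the step. The following is only a sketch: the macro variable name keepvars is my own assumption (it is not in the original post), and FULLSTIMER simply asks SAS to write more detailed timing and resource statistics to the log.

```sas
/* Hypothetical helper: hold the variable list once so the four KEEP= lists stay in sync */
%let keepvars = dx1 dobyr age svcdate enrolid;

/* Write detailed real time / CPU time statistics for each step to the log */
options fullstimer;

data compiled_ip;
  set
    raw.ccaes103 (keep=&keepvars)
    raw.ccaes113 (keep=&keepvars)
    raw.ccaes122 (keep=&keepvars)
    raw.ccaes132 (keep=&keepvars)
  ;
run;

/* Switch back to the shorter timing notes */
options nofullstimer;
```

As a rough rule of thumb, if the real time reported in the log is much larger than the CPU time, the step is probably waiting on disk or network I/O rather than on the processor.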

 

All other measures depend on your environment and could require quite a bit of coding and testing, which is IMHO only worth doing if this is a production job on the critical path.

As the bottleneck is most likely disk or network I/O, my first measure would be to ensure that your source data sets are stored compressed.
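
A minimal sketch of what checking and applying compression could look like. The data set name ccaes103_c is hypothetical, and the sketch assumes you have write access to the raw library and enough disk space for a second copy. Writing the compressed copy still means reading the full source file once, so this is likely to pay off only if the files are read repeatedly.

```sas
/* Check whether a source data set is already compressed:
   look at the "Compressed" attribute in the PROC CONTENTS output */
proc contents data=raw.ccaes103;
run;

/* Write a compressed copy; ccaes103_c is a hypothetical name.
   The log notes by how much compression changed the file size. */
data raw.ccaes103_c (compress=yes);
  set raw.ccaes103;
run;

/* Alternatively, make compression the default for data sets
   created later in this session */
options compress=yes;
```

Whether compression shrinks the files depends on the data. If the log reports that the size increased, the uncompressed version is the better choice.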



Esteemed Advisor
Posts: 5,625

Re: Data set keep -- anyway to speed up the code and procssing time?

PROC APPEND, or the APPEND statement in PROC DATASETS, can be more efficient than the SET statement in certain circumstances because it can use block I/O. Something like:

proc sql;
  /* create an empty table with the same columns as the first source
     (LIKE copies all columns, not only the five of interest) */
  create table work.compiled_ip like raw.ccaes103;
quit;

proc datasets library=raw nolist;
  /* one-level DATA= names resolve to the RAW library */
  append base=work.compiled_ip data=ccaes103;
  append base=work.compiled_ip data=ccaes113;
  append base=work.compiled_ip data=ccaes122;
  append base=work.compiled_ip data=ccaes132;
run;
quit;

(untested)
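
A variation on the same idea that carries only the five variables from the original question might look like the sketch below. It is equally untested. It assumes the five variables have the same type and length in all four files (otherwise APPEND may refuse to run without the FORCE option), and whether the block I/O advantage is retained when KEEP= is applied to the DATA= data sets is something to verify on your own system.

```sas
/* Create an empty base table holding only the five variables
   (OBS=0 reads the variable attributes but no rows) */
data work.compiled_ip;
  set raw.ccaes103 (keep=dx1 dobyr age svcdate enrolid obs=0);
run;

proc datasets library=raw nolist;
  /* KEEP= restricts each source to the variables present in the base table */
  append base=work.compiled_ip data=ccaes103 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes113 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes122 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes132 (keep=dx1 dobyr age svcdate enrolid);
run;
quit;
```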

PG
☑ This topic is solved.
