cdubs
Quartz | Level 8

Hi all,

I'm currently running:

data compiled_ip;
  set raw.ccaes103 raw.ccaes113 raw.ccaes122 raw.ccaes132;
  keep dx1 dobyr age svcdate enrolid;
run;

But each of the files is >35 GB, so the step takes a long time. Is there any way I can make this more efficient? I already tried to reduce the size by keeping only the 5 essential variables.

Accepted Solution
Patrick
Opal | Level 21

@cdubs

You could try the syntax below, which loads only the desired variables into the PDV (program data vector).

data compiled_ip;
  set 
    raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
  ;
run;

I would expect some performance improvement but not a big one.
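To check how much the KEEP= version actually gains, one approach (a sketch, not part of the original reply) is to switch on the FULLSTIMER system option and compare the log statistics of the original step with this one. As a rough rule of thumb, a real time far above the CPU time suggests the step spends most of its time waiting on disk or network I/O. Which statistics appear besides real time, CPU time and memory depends on your host.

```sas
/* Write detailed resource statistics to the log for every following step */
options fullstimer;

data compiled_ip;
  set
    raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
  ;
run;

/* Go back to the default, shorter log statistics */
options nofullstimer;
```

Running the original step with the same option on gives a like-for-like comparison.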

All other measures depend on your environment and could require quite a bit of coding and testing, which IMHO is only worth doing if this is a production job on the critical path.

As the bottleneck is most likely disk or network I/O, my first measure would be to make sure your source data sets are stored compressed.
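A minimal sketch of what storing a source table compressed could look like, assuming you have write access to the RAW library; the output name ccaes103_c is hypothetical. COMPRESS=BINARY tends to suit wide observations with many numeric variables, while COMPRESS=CHAR (or YES) targets character data, so which setting shrinks these tables more is something to test. Alternatively, OPTIONS COMPRESS=BINARY; makes data sets created afterwards in the session compressed by default.

```sas
/* Hypothetical example: write a compressed copy of one source table.
   This is a one-time cost, since the full table is read and rewritten once. */
data raw.ccaes103_c (compress=binary);
  set raw.ccaes103;
run;
/* The log should contain a NOTE on how much compression changed the size */

/* PROC CONTENTS reports the compression setting of the new data set */
proc contents data=raw.ccaes103_c;
run;
```

Later reads of the compressed copy move fewer bytes from disk in exchange for some CPU time spent decompressing, which is the trade-off that pays off when I/O is the bottleneck.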

PGStats
Opal | Level 21

PROC APPEND, or the APPEND statement in PROC DATASETS, can be more efficient than the SET statement in certain circumstances because it can use block I/O. Something like:

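/* Create an empty base table with the same structure as raw.ccaes103
   (LIKE copies all of its columns, not only the five needed ones) */
proc sql;
  create table work.compiled_ip like raw.ccaes103;
quit;

/* Append each source table; one-level DATA= names resolve to LIBRARY=RAW */
proc datasets library=raw nolist;
  append base=work.compiled_ip data=ccaes103;
  append base=work.compiled_ip data=ccaes113;
  append base=work.compiled_ip data=ccaes122;
  append base=work.compiled_ip data=ccaes132;
run;
quit;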

(untested) 
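Because CREATE TABLE ... LIKE copies every column of raw.ccaes103, the result above holds all variables, not only the five the question asks for. A possible variation (an untested sketch, not part of the original reply) builds the empty base table from just those five columns and puts KEEP= on each DATA= table. Whether APPEND can still use block I/O with KEEP= applied is not confirmed here, so compare timings against the plain version. If a variable differs in type or length between the yearly tables, APPEND may refuse to append without the FORCE option, so check the log.

```sas
/* Empty base table (OBS=0) containing only the five needed columns */
data work.compiled_ip;
  set raw.ccaes103 (obs=0 keep=dx1 dobyr age svcdate enrolid);
run;

/* Append each yearly table, reading only the same five columns.
   One-level DATA= names resolve to LIBRARY=RAW. */
proc datasets library=raw nolist;
  append base=work.compiled_ip data=ccaes103 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes113 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes122 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes132 (keep=dx1 dobyr age svcdate enrolid);
run;
quit;
```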

PG