This topic is solved and locked.
cdubs
Quartz | Level 8

Hi all,

 

I'm currently running

 

data compiled_ip;
  set raw.ccaes103 raw.ccaes113 raw.ccaes122 raw.ccaes132;
  keep dx1 dobyr age svcdate enrolid;
run;

 

But each of the files is >35 GB, so this takes a long time. Is there any way I can make this more efficient? I have already tried to reduce the size by selecting only the 5 essential variables.

1 ACCEPTED SOLUTION

Patrick
Opal | Level 21

@cdubs

You could try the syntax below, which loads only the desired variables into the PDV (program data vector).

data compiled_ip;
  set 
    raw.ccaes103 (keep= dx1 dobyr age svcdate enrolid)
    raw.ccaes113 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes122 (keep= dx1 dobyr age svcdate enrolid) 
    raw.ccaes132 (keep= dx1 dobyr age svcdate enrolid)
  ;
run;

I would expect some performance improvement but not a big one.

 

All other measures depend on your environment and could require quite a bit of coding and testing, which is IMHO only worth doing if this is a production job on the critical path.

 

As the bottleneck is most likely disk or network I/O, my first measure would be to ensure that your source data sets are stored compressed.
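As a rough illustration of storing a data set compressed, here is a minimal sketch. It assumes you have write access to the RAW library and enough disk space for a second copy; the name ccaes103_c is hypothetical and not from the original post. Whether compression actually shrinks these files depends on the data, and SAS reports the size change in the log.

```sas
/* Sketch only: assumes write access to RAW and room for a second copy. */
/* The data set name ccaes103_c is hypothetical.                        */

/* Option 1: compress every data set created later in this session */
options compress=yes;

/* Option 2: compress one output data set with a data set option */
data raw.ccaes103_c (compress=yes);
  set raw.ccaes103;
run;

/* Check the "Compressed" attribute in the output to confirm the result */
proc contents data=raw.ccaes103_c;
run;
```

COMPRESS=BINARY is an alternative worth testing for wide, mostly numeric records. A compressed source only pays off if the data sets are read more often than they are rewritten.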


2 REPLIES 2

PGStats
Opal | Level 21

PROC APPEND, or the APPEND statement in PROC DATASETS, can be more efficient than the SET statement in certain circumstances because it can use block I/O. Something like:

 

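/* Create an empty table with the same columns as the source (all columns, not only the five needed) */
proc sql;
  create table work.compiled_ip like raw.ccaes103;
quit;

/* One-level names in DATA= resolve to the procedure library (RAW) */
proc datasets library=raw nolist;
  append base=work.compiled_ip data=ccaes103;
  append base=work.compiled_ip data=ccaes113;
  append base=work.compiled_ip data=ccaes122;
  append base=work.compiled_ip data=ccaes132;
  run;
quit;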

(untested) 
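Since the original question needs only five variables, a variant of the same idea could restrict both the empty base table and each appended data set with KEEP=. This is also an untested sketch. It assumes the five variables have identical types and lengths in all four files (otherwise APPEND may need the FORCE option), and it is not certain that APPEND retains its block I/O advantage once data set options are applied, so it is worth timing against the SET version.

```sas
/* Empty shell with only the five needed variables:          */
/* OBS=0 copies the variable attributes but reads no rows.   */
data work.compiled_ip;
  set raw.ccaes103 (keep=dx1 dobyr age svcdate enrolid obs=0);
run;

/* Append only the kept variables from each source data set */
proc datasets library=raw nolist;
  append base=work.compiled_ip data=ccaes103 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes113 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes122 (keep=dx1 dobyr age svcdate enrolid);
  append base=work.compiled_ip data=ccaes132 (keep=dx1 dobyr age svcdate enrolid);
  run;
quit;
```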

PG
