tej123
Fluorite | Level 6

I created a file (metadata.sas7bdat) that has 100 entries and two columns: the directory and the file name of each SAS data set.

 

Directory        filename
/home            abc.sas7bdat
/home            def.sas7bdat
/home/sub_dir    ghj.sas7bdat

 

Assume I have sample.sas which just gets the means of each dataset.

 

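/* Summarize all numeric variables; OUTPUT OUT= writes the statistics to data set x */
proc means data="/home/abc.sas7bdat" noprint;
  output out=x;
run;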

 

The above sample.sas should run on each dataset listed in the metadata file. Right now I have written it the traditional way: it loops through all the files in the metadata, runs PROC MEANS on each one, and appends the results. The final dataset has the means of all the files listed in the metadata.

 

I believe that if I split the metadata file into 4 parts of 25 entries each (100/4 = 25), submit them as 4 programs, and finally merge the output from all 4, it would reduce the processing time considerably (think of 10,000 entries, and assume there is more processing than PROC MEANS). It's just that I am not well versed in which options to use to submit it as 4 programs, or in how to sync the output from the 4 processes.

 

Can you provide a skeleton of how I should construct this program? I have some vague thoughts, but I am sure I can take away some elegant answers.

3 REPLIES
Satish_Parida
Lapis Lazuli | Level 10
  1. Put your code in a macro. Give it 3 macro parameters:
    1. FirstObs: the first metadata row in the range to process
    2. LastObs: the last metadata row in the range to process
    3. A serial number for the iteration (1, 2, 3, etc.), used to name the result data set
  2. Use %sysget to read these 3 values. %sysget reads environment variables, so the values need to be set on the SAS command line (the linked post uses the -SET option for this).
    1. https://blogs.sas.com/content/iml/2015/03/16/pass-params-sysget.html
    2. Pass the retrieved values to the macro from step 1
  3. Now you have a process ready. You can use an external program to call the above SAS code with different arguments in parallel.
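The steps above could be sketched roughly as follows. This is only an outline under several assumptions: the file name worker.sas, the librefs meta and res with their paths, the macro name one_file, and the environment variable names FIRSTOBS, LASTOBS and SERIAL are all made up for illustration, and the metadata columns are assumed to be called directory and filename. It also assumes every data set has the same numeric variables, because PROC APPEND with FORCE drops variables that are not in the base table.

```sas
/* worker.sas (hypothetical name): processes one slice of the metadata.  */
/* FIRSTOBS, LASTOBS and SERIAL are assumed to be set with -SET on the   */
/* command line, so %SYSGET can read them as environment variables.      */
%let first  = %sysget(FIRSTOBS);
%let last   = %sysget(LASTOBS);
%let serial = %sysget(SERIAL);

libname meta "/path/to/metadata";  /* assumption: folder holding metadata.sas7bdat */
libname res  "/path/to/results";   /* assumption: output folder shared by all workers */

%macro one_file(path);
  /* Summarize one data set, referenced by its physical path */
  proc means data="&path" noprint;
    output out=work._m;
  run;
  /* Append to this worker's own result table */
  proc append base=res.means_&serial data=work._m force;
  run;
%mend one_file;

/* Read only rows FIRST..LAST and queue one macro call per row. */
/* %NRSTR delays macro execution until the DATA step has ended. */
data _null_;
  set meta.metadata(firstobs=&first obs=&last);
  call execute(cats('%nrstr(%one_file(', catx('/', directory, filename), '))'));
run;
```

One worker would then be started with something like `sas -sysin worker.sas -set FIRSTOBS 1 -set LASTOBS 25 -set SERIAL 1 -log worker_1.log`. The "external program" can be a shell script, or, as one alternative that stays inside SAS (and requires the XCMD option to be enabled), a launcher built on SYSTASK and WAITFOR:

```sas
/* launcher.sas (hypothetical name): start the workers, wait, then combine. */
%macro launch(nparts=4, total=100);
  %local i first last size;
  %let size = %sysevalf(&total / &nparts, ceil);
  %do i = 1 %to &nparts;
    %let first = %eval((&i - 1) * &size + 1);
    %let last  = %eval(&i * &size);
    %if &last > &total %then %let last = &total;
    /* SYSTASK returns immediately, so the workers run side by side */
    systask command
      "sas -sysin worker.sas -set FIRSTOBS &first -set LASTOBS &last -set SERIAL &i -log worker_&i..log"
      taskname=w&i status=rc&i;
  %end;
  /* Block until every worker has finished */
  waitfor _all_ %do i = 1 %to &nparts; w&i %end; ;
  /* Combine the per-worker tables */
  data all_means;
    set %do i = 1 %to &nparts; res.means_&i %end; ;
  run;
%mend launch;

libname res "/path/to/results";    /* same assumed folder as in worker.sas */
%launch(nparts=4, total=100)
```

Giving each worker its own result table avoids several processes writing to the same data set at once; the tables are only combined after WAITFOR returns.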
Astounding
PROC Star

One of the biggest time-wasters in this sort of program is the combining of the means into one large file.  Many times, that step looks something like this:

 

data all_means;

set all_means means_from_the_next_dataset;

run;

 

You haven't shown your code, nor the structure of the data sets that come out of PROC MEANS, so this is a "just in case" sort of note.  It would be much faster to use PROC APPEND for that step:

 

proc append base=all_means data=means_from_the_next_dataset;

run;

 

Just imagine ... the DATA step reads the means from the first data set you process 10,000 times, while PROC APPEND reads that data set just once.  The data set may be small, but that sort of thing adds up.

Reeza
Super User

Unless your records are in the millions or more you're not going to gain any efficiency by modifying your program. It will take longer for you to develop the 'efficient' code than you will ever save. 
