I created a dataset (metadata.sas7bdat) that has 100 entries and two columns, directory and filename (the SAS dataset in that directory).
directory        filename
/home            abc.sas7bdat
/home            def.sas7bdat
/home/sub_dir    ghj.sas7bdat
Assume I have sample.sas, which just computes the means of each dataset.
libname src '/home';   /* libref for the directory holding abc.sas7bdat */
proc means data=src.abc noprint;
  output out=x;
run;
The sample.sas program above should run on every dataset listed in the metadata file. Right now I do this the traditional way: the program loops through all the files in the metadata, runs PROC MEANS on each one, and appends the results. The final dataset holds the means for all the files listed in the metadata.
I believe that if I split the metadata file into 4 parts of 25 entries each (100/4 = 25), submit them as 4 programs, and finally merge the output from all 4, the processing time would drop substantially. (Think of 10,000 entries, and assume there is more processing than PROC MEANS.) I am just not well versed in which options to use to submit the work as 4 programs, or in how to synchronize the output from 4 different processes.
Can you give me a skeleton of how I should construct this program? I have some vague ideas, but I am sure I can pick up some more elegant answers.
One of the biggest time-wasters in this sort of program is the combining of the means into one large file. Many times, that step looks something like this:
data all_means;
set all_means means_from_the_next_dataset;
run;
You haven't shown your code, nor the structure of the data sets that come out of PROC MEANS, so this is a "just in case" sort of note. It would be much faster to use PROC APPEND for that step:
proc append base=all_means data=means_from_the_next_dataset;
run;
Just imagine ... the DATA step reads the means from the first data set you process 10,000 times, while PROC APPEND reads that data set just once. The data set may be small, but that sort of thing adds up.
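Since the original loop was not posted, here is a minimal sketch of how the PROC APPEND version of the loop might look. It assumes the metadata data set is WORK.METADATA with character columns DIRECTORY and FILENAME, as in the sample rows above. The macro name RUN_MEANS, the libref SRC, and the data set names ONE_MEANS and ALL_MEANS are made up for illustration.

```sas
/* Sketch only: assumes WORK.METADATA has character columns        */
/* DIRECTORY and FILENAME. All other names are hypothetical.       */
%macro run_means;
  %local i n;
  %let n = 0;

  /* Store one directory/member pair per metadata row in local macro variables. */
  data _null_;
    set metadata end=last;
    call symputx(cats('dir', _n_), directory, 'L');
    call symputx(cats('mem', _n_), scan(filename, 1, '.'), 'L');
    if last then call symputx('n', _n_, 'L');
  run;

  %do i = 1 %to &n;
    /* Point the libref at the directory for this row. */
    libname src "&&dir&i" access=readonly;

    proc means data=src.&&mem&i noprint;
      output out=one_means;
    run;

    /* Only the new rows are read here; ALL_MEANS itself is not re-read. */
    /* FORCE lets the append proceed if the variable lists differ.       */
    proc append base=all_means data=one_means force;
    run;
  %end;

  libname src clear;
%mend run_means;

%run_means
```

If the input data sets do not share the same variables, FORCE drops any variable that is missing from the base data set, so the structure of ALL_MEANS is worth checking after the first few iterations.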
Unless your records number in the millions or more, you're not going to gain any efficiency by modifying your program. It will take you longer to develop the 'efficient' code than you will ever save.
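If the per-file work does grow to the point where splitting pays off, one possible skeleton uses SYSTASK to launch the parts as separate SAS sessions and WAITFOR to synchronize them. This is a sketch under several assumptions: the session is allowed to run operating system commands (XCMD), the command `sas` starts SAS on your system, and chunk1.sas through chunk4.sas are hypothetical programs that each process one quarter of the metadata rows and save their results as MEANS_1 through MEANS_4 in a shared permanent library. All paths shown are placeholders.

```sas
/* Sketch only: program names, paths, and output data set names are hypothetical. */

/* Launch the four parts asynchronously; each runs in its own SAS session. */
systask command "sas /path/to/chunk1.sas -log /path/to/chunk1.log" taskname=t1 status=rc1;
systask command "sas /path/to/chunk2.sas -log /path/to/chunk2.log" taskname=t2 status=rc2;
systask command "sas /path/to/chunk3.sas -log /path/to/chunk3.log" taskname=t3 status=rc3;
systask command "sas /path/to/chunk4.sas -log /path/to/chunk4.log" taskname=t4 status=rc4;

/* Block until all four tasks have finished. */
waitfor _all_ t1 t2 t3 t4;

/* Inspect the return codes before trusting the output. */
%put NOTE: return codes: &rc1 &rc2 &rc3 &rc4;

/* Combine the four partial results, which each chunk wrote to this library. */
libname out '/path/to/results';

data all_means;
  set out.means_1 out.means_2 out.means_3 out.means_4;
run;
```

The final concatenation runs only once, so a DATA step with SET costs little here. Sites that license SAS/CONNECT could use SIGNON and RSUBMIT with WAIT=NO instead of SYSTASK to get the same pattern.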