02-04-2018 10:48 AM
I created a metadata file (metadata.sas7bdat) which has 100 entries and two columns, directory and sasdataset.
Assume I have a program, sample.sas, which just computes the means of a dataset, something like:
proc means data=abc.sas noprint;
   output out=x;
run;
The above sample.sas should run on each dataset listed in the metadata file. Right now I have written it in a very traditional way: it loops through all the files in the metadata, runs PROC MEANS on each one, and appends the results. The final dataset has the means of all the files present in the metadata.
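Simplified, my current serial loop looks roughly like this (the macro and variable names here are illustrative, not my actual code):

```sas
/* Serial version: process every row of the metadata file one at a time. */
%macro run_all;
   %local i n;
   /* Load the metadata rows into macro variables */
   data _null_;
      set metadata end=last;
      call symputx(cats('dir', _n_), directory);
      call symputx(cats('ds',  _n_), sasdataset);
      if last then call symputx('n', _n_);
   run;

   %do i = 1 %to &n;
      libname in "&&dir&i";                 /* point at this row's directory */
      proc means data=in.&&ds&i noprint;
         output out=x;
      run;
      proc append base=all_means data=x force;   /* accumulate the results */
      run;
      libname in clear;
   %end;
%mend run_all;
%run_all
```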
I believe that if I split the metadata file into 4 parts of 100/4 = 25 entries each, submit them as 4 programs, and finally merge the output from all 4 programs, it would reduce the processing time by a large amount. (Think of 10,000 entries, and also assume there is more processing than just PROC MEANS.) It's just that I am not well versed in what kind of options to use to submit it as 4 programs, or how to sync the output from the 4 different processes.
Can you provide a skeleton of how I should construct this program? I have my own vague thoughts, but I am sure I can take away some elegant answers.
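One common pattern for this, assuming Base SAS with XCMD enabled (so the session may spawn OS commands), is to split the metadata into chunks, launch one batch SAS session per chunk with SYSTASK, wait for them all with WAITFOR, and then append the partial outputs. All paths, file names, and the existence of worker1.sas through worker4.sas below are assumptions for the sketch, not tested code:

```sas
/* Sketch only: requires XCMD; paths and worker program names are assumed. */

/* 1. Split the metadata into 4 chunks of ~25 rows each */
data part1 part2 part3 part4;
   set metadata;
   select (mod(_n_, 4));
      when (0) output part1;
      when (1) output part2;
      when (2) output part3;
      otherwise output part4;
   end;
run;

/* 2. Launch one batch session per chunk; worker&i..sas reads part&i
      and writes its results to a shared library as means_&i */
%macro launch;
   %do i = 1 %to 4;
      systask command "sas /myproject/worker&i..sas" taskname=job&i;
   %end;
   waitfor _all_ job1 job2 job3 job4;   /* block until all 4 sessions finish */
%mend launch;
%launch

/* 3. Combine the four partial outputs */
libname out "/myproject/results";
%macro combine;
   %do i = 1 %to 4;
      proc append base=all_means data=out.means_&i force;
      run;
   %end;
%mend combine;
%combine
```

WAITFOR is what handles the syncing: the parent session does not touch the partial result files until every child session has exited.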
02-04-2018 12:57 PM
One of the biggest time-wasters in this sort of program is the combining of the means into one large file. Many times, that step looks something like this:
data all_means;
   set all_means means_from_the_next_dataset;
run;
You haven't shown your code, nor the structure of the data sets that come out of PROC MEANS, so this is a "just in case" sort of note. It would be much faster to use PROC APPEND for that step:
proc append base=all_means data=means_from_the_next_dataset;
run;
Just imagine: with 10,000 data sets, the DATA step re-reads the means from the first data set you processed nearly 10,000 times, while PROC APPEND reads each data set just once. The individual data sets may be small, but that sort of thing adds up.
02-04-2018 01:41 PM
Unless your record counts are in the millions or more, you're not going to gain any efficiency by modifying your program. It will take longer for you to develop the 'efficient' code than you will ever save.