Splitting a program into multiple programs

Occasional Contributor
Posts: 9

I created a file (metadata.sas7bdat) which has 100 entries and two columns: the directory and the SAS data set file name.


Directory        filename
/home            abc.sas7bdat
/home            def.sas7bdat
/home/sub_dir    ghj.sas7bdat


Assume I have a program which just gets the means of each dataset.


proc means;
  output out=x;
run;



The above should run on each dataset listed in the metadata file. Right now I have written it in a very traditional way: it loops through all the files in the metadata, runs proc means on each file, and appends all the data. The final dataset has the means of all the files listed in the metadata.


I believe that if I can split the metadata file into 4 parts, each having 100/4 = 25 entries, submit them as 4 programs, and finally merge the output from all 4 programs, it would reduce the processing time by a large amount (think of 10,000 entries, and also assume there is more processing than proc means). It's just that I am not well versed in what kind of options to use to submit it as 4 programs, or how to sync the output from 4 different processes.


Can you provide me with a skeleton of how I should construct this program? I have my own vague thoughts, but I am sure I can take away some elegant answers.

Frequent Contributor
Posts: 109

Re: Splitting a program into multiple programs

  1. Put your code in a macro that takes 3 macro variables as input:
    1. FirstObs: the first row of your metadata in that range
    2. LastObs: the last row of your metadata in that range
    3. The serial number of the iteration (1, 2, 3, etc.), used to name your result data set
  2. Use %sysget to read the above 3 values, which need to be passed in from the command line.
    1. Pass the values read from the command line to the macro from step 1.
  3. Now you have a process ready. You can use an external program to call the above SAS code with different arguments in parallel.
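
The steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions that are not in the thread: the librefs META and RESULTS and their paths, the macro name %run_chunk, and the variable names FIRSTOBS, LASTOBS and PART are all made up for illustration. It also assumes that every input data set has the same numeric variables, so the PROC MEANS outputs can be stacked, and that the values are supplied as environment variables (for example with the -set option at SAS invocation), since %sysget reads environment variables.

```sas
/* chunk.sas: process one range of rows from the metadata table.          */
/* Hypothetical names: librefs META/RESULTS, their paths, %run_chunk,     */
/* and the environment variables FIRSTOBS, LASTOBS and PART.              */
libname meta    '/home/meta';      /* assumed location of metadata.sas7bdat */
libname results '/home/results';   /* assumed shared output location        */

%macro run_chunk(first=, last=, part=);
  %local i n;
  %let n = 0;

  /* Remove output left over from an earlier run of this part. */
  proc datasets lib=results nolist;
    delete means_&part;
  quit;

  /* Load this range of metadata rows into numbered macro variables. */
  data _null_;
    set meta.metadata(firstobs=&first obs=&last) end=eof;
    call symputx(cats('dir', _n_), directory);
    /* Drop the .sas7bdat extension to get the member name. */
    call symputx(cats('mem', _n_), scan(filename, 1, '.'));
    if eof then call symputx('n', _n_);
  run;

  %do i = 1 %to &n;
    libname src "&&dir&i" access=readonly;

    proc means data=src.&&mem&i noprint;
      output out=work.one_mean;
    run;

    /* PROC APPEND creates BASE= on the first pass and adds rows after that. */
    proc append base=results.means_&part data=work.one_mean;
    run;
  %end;
%mend run_chunk;

%run_chunk(first=%sysget(FIRSTOBS), last=%sysget(LASTOBS), part=%sysget(PART))
```

The "external program" can be anything that starts several SAS sessions at once (a shell script, a scheduler, and so on). As one possible alternative, a small SAS driver can do the launching itself with SYSTASK and then wait with WAITFOR. Again this is a sketch with assumptions: the sas command is on the PATH, the XCMD option is allowed, and chunk.sas sits in the current directory.

```sas
/* driver.sas: start the parts in parallel, wait, then combine the results. */
libname results '/home/results';   /* same assumed location as in chunk.sas */

%macro launch(nparts=4, nrows=100);
  %local p first last size tasks;
  %let size = %sysevalf(&nrows / &nparts, ceil);   /* rows per part */

  %do p = 1 %to &nparts;
    %let first = %eval((&p - 1) * &size + 1);
    %let last  = %eval(&p * &size);
    %if &last > &nrows %then %let last = &nrows;

    /* NOWAIT returns immediately, so the sessions run side by side. */
    systask command
      "sas chunk.sas -log chunk_&p..log -set FIRSTOBS &first -set LASTOBS &last -set PART &p"
      nowait taskname=part&p;
    %let tasks = &tasks part&p;
  %end;

  /* Block until every child session has finished. */
  waitfor _all_ &tasks;

  /* Stack the partial results into one data set. */
  data results.all_means;
    set results.means_1 - results.means_&nparts;
  run;
%mend launch;

%launch(nparts=4, nrows=100)
```

With nparts=4 and nrows=100 the four sessions get rows 1-25, 26-50, 51-75 and 76-100, matching the 25-entry split described in the question.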
Super User
Posts: 6,638

Re: Splitting a program into multiple programs

One of the biggest time-wasters in this sort of program is the combining of the means into one large file.  Many times, that step looks something like this:


data all_means;
  set all_means means_from_the_next_dataset;
run;



You haven't shown your code, nor the structure of the data sets that come out of PROC MEANS, so this is a "just in case" sort of note.  It would be much faster to use PROC APPEND for that step:


proc append base=all_means data=means_from_the_next_dataset;
run;



Just imagine ... the DATA step reads the means from the first data set you process 10,000 times, while PROC APPEND reads that data set just once.  The data set may be small, but that sort of thing adds up.
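
To make the pattern concrete, here is a small self-contained sketch of an append loop. The macro name %append_all and the use of sashelp.class as a stand-in for "the next data set" are assumptions for illustration only. One practical detail it shows: PROC APPEND creates the BASE= data set if it does not exist yet, so the first iteration needs no special handling, but a leftover all_means from an earlier run should be deleted first.

```sas
/* Start clean so rows from an earlier run are not appended to again. */
proc datasets lib=work nolist;
  delete all_means;
quit;

%macro append_all(n=3);   /* hypothetical macro name */
  %local i;
  %do i = 1 %to &n;
    /* Stand-in for the real per-file step; sashelp.class is just sample data. */
    proc means data=sashelp.class noprint;
      output out=means_from_the_next_dataset;
    run;

    /* Only the new rows are read; all_means itself is not re-read. */
    proc append base=all_means data=means_from_the_next_dataset;
    run;
  %end;
%mend append_all;

%append_all(n=3)
```

With the DATA step version, pass number k re-reads everything accumulated in the previous k-1 passes, so the total rows read grows roughly with the square of the number of files, while the PROC APPEND version reads each set of means once.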

Super User
Posts: 23,332

Re: Splitting a program into multiple programs

Unless your records are in the millions or more, you're not going to gain any efficiency by modifying your program. It will take longer for you to develop the 'efficient' code than you will ever save.
