Desktop productivity for business analysts and programmers

Parallel processing for a do loop

Reply
New Contributor
Posts: 2

Parallel processing for a do loop

I'm trying to reduce the computational time of a program which uses large datasets. The outline of the program is as follows:

 

1.create an empty base table A.

2. do the following for I=0 to 35;

3.    create a table B_I;

4.    append the table B_I to A;

5. end do;

 

Can someone suggest me some possible techniques to reduce the computational time of the code or if the code can be run parallel by breaking it into small steps?

 

Super User
Posts: 8,053

Re: Parallel processing for a do loop

Posted in reply to quarterpastix

Write the program in such a manner that table A is created in one step.

You could only parallelize the creation of the individual datasets, but the append/concatenation has to be done by one process, as only one process can have write access to table A at any given time.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 8,174

Re: Parallel processing for a do loop

Posted in reply to quarterpastix

Thats all well and good, but you haven't really shown us anything.  From what you post:

data a;
  /* create empty table part */
  length var1 $20 var2 8;

  /* append data part */
  do i=1 to 35;
    /* insert tableb creation here */
  end;
run;

By creatinig data separately and then appending you are creating 35 * read/writes plus headers for each of them.

New Contributor
Posts: 2

Re: Parallel processing for a do loop

The  code roughly looks like this.

 

%macro account ();
	
	PROC SQL;
		create table temp.base(
		    var1 length=8,
			var2 length=8,
			var3 length=8,
			var4 length=8,
			var5 length=8,
			var6 length=8,
			var7 length=8,
			);
	QUIT;

	%do i = 0 %to &number.;

		DATA _NULL_;
			*--- determine year-month of loop in sas date;
			YYYY_MM_SAS = INTNX("MONTH", "&MORT_S_DATE."D, &I.);
			CALL SYMPUTX ('YYYY_MM_SAS', YYYY_MM_SAS);

			*--- determine month of loop in YYYY_MM format;
			CALL SYMPUTX ('YYYY_MM', tranwrd(put(YYYY_MM_SAS, yymmd8.),"-","_"));

			*--- determine month of loop in YYYYMM format;
			CALL SYMPUTX ('YYYYMM', put(YYYY_MM_SAS, yymmn6.));
		RUN;

		DATA add&YYYY_MM._PERF;
			LENGTH 
				var1 length=8
				var2 length=8
				var3 length=8
				var4 length=8
				var5 length=8
				var6 length=8
				var7 length=8
			;
			SET arrears_credit (keep=var1 var2 
				where=(year_month=&YYYYMM.));
			SET ACCOUNT_&YYYY_MM. (KEEP =  var 1 var3 var4 var5 var6 var7
				)
				KEY = var1 / UNIQUE;

			
		RUN;

		PROC APPEND BASE = base DATA = add&YYYY_MM._PERF;
		RUN;

		PROC SQL;
			DROP TABLE add&YYYY_MM._PERF;
		QUIT;

	%end;
%mend account;
Super User
Super User
Posts: 8,174

Re: Parallel processing for a do loop

Posted in reply to quarterpastix

Yes, unfortunately a common issue this one, and all caused by the fact that date is a part of the dataset name.  Think how much simpler the world would be if you had one dataset, with a fixed name and structure, and a column which contained the date, no looping or appending, just simple data.  I don't know where this thinking came in but it does seem to account for more than half the questions on this forum. 

Anyways (not tested):

%let mort_s_date=12JAN2014;
%let number=5;

data _null_;
  call execute('data base;');
  do i=1 to &number.;
    tmp=intnx("month","&mort_s_date."d, &i.);
    call execute('set arrears_credit (keep=var1 var2 where=(yearmonth='||put(tmp,yymm6.)||'));');
    call execute('set account_'||put(year(tmp),z4.)||'_'||put(month(tmp),z2.)||' (keep=var1 var3 var4 var5 var6 var7);');
  end;
run;

 

 

Ask a Question
Discussion stats
  • 4 replies
  • 113 views
  • 0 likes
  • 3 in conversation