BookmarkSubscribeRSS Feed
quarterpastix
Calcite | Level 5

I'm trying to reduce the computational time of a program which uses large datasets. The outline of the program is as follows:

 

1.create an empty base table A.

2. do the following for I=0 to 35;

3.    create a table B_I;

4.    append the table B_I to A;

5. end do;

 

Can someone suggest me some possible techniques to reduce the computational time of the code or if the code can be run parallel by breaking it into small steps?

 

4 REPLIES 4
Kurt_Bremser
Super User

Write the program in such a manner that table A is created in one step.

You could only parallelize the creation of the individual datasets, but the append/concatenation has to be done by one process, as only one process can have write access to table A at any given time.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Thats all well and good, but you haven't really shown us anything.  From what you post:

data a;
  /* create empty table part */
  length var1 $20 var2 8;

  /* append data part */
  do i=1 to 35;
    /* insert tableb creation here */
  end;
run;

By creatinig data separately and then appending you are creating 35 * read/writes plus headers for each of them.

quarterpastix
Calcite | Level 5

The  code roughly looks like this.

 

%macro account ();
	
	PROC SQL;
		create table temp.base(
		    var1 length=8,
			var2 length=8,
			var3 length=8,
			var4 length=8,
			var5 length=8,
			var6 length=8,
			var7 length=8,
			);
	QUIT;

	%do i = 0 %to &number.;

		DATA _NULL_;
			*--- determine year-month of loop in sas date;
			YYYY_MM_SAS = INTNX("MONTH", "&MORT_S_DATE."D, &I.);
			CALL SYMPUTX ('YYYY_MM_SAS', YYYY_MM_SAS);

			*--- determine month of loop in YYYY_MM format;
			CALL SYMPUTX ('YYYY_MM', tranwrd(put(YYYY_MM_SAS, yymmd8.),"-","_"));

			*--- determine month of loop in YYYYMM format;
			CALL SYMPUTX ('YYYYMM', put(YYYY_MM_SAS, yymmn6.));
		RUN;

		DATA add&YYYY_MM._PERF;
			LENGTH 
				var1 length=8
				var2 length=8
				var3 length=8
				var4 length=8
				var5 length=8
				var6 length=8
				var7 length=8
			;
			SET arrears_credit (keep=var1 var2 
				where=(year_month=&YYYYMM.));
			SET ACCOUNT_&YYYY_MM. (KEEP =  var 1 var3 var4 var5 var6 var7
				)
				KEY = var1 / UNIQUE;

			
		RUN;

		PROC APPEND BASE = base DATA = add&YYYY_MM._PERF;
		RUN;

		PROC SQL;
			DROP TABLE add&YYYY_MM._PERF;
		QUIT;

	%end;
%mend account;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, unfortunately a common issue this one, and all caused by the fact that date is a part of the dataset name.  Think how much simpler the world would be if you had one dataset, with a fixed name and structure, and a column which contained the date, no looping or appending, just simple data.  I don't know where this thinking came in but it does seem to account for more than half the questions on this forum. 

Anyways (not tested):

%let mort_s_date=12JAN2014;
%let number=5;

data _null_;
  call execute('data base;');
  do i=1 to &number.;
    tmp=intnx("month","&mort_s_date."d, &i.);
    call execute('set arrears_credit (keep=var1 var2 where=(yearmonth='||put(tmp,yymm6.)||'));');
    call execute('set account_'||put(year(tmp),z4.)||'_'||put(month(tmp),z2.)||' (keep=var1 var3 var4 var5 var6 var7);');
  end;
run;

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2131 views
  • 0 likes
  • 3 in conversation